docker-llama2-chat by soulteary

Docker image for LLaMA2 inference

Created 2 years ago
541 stars

Top 58.8% on SourcePulse

Project Summary

This project provides a streamlined Docker-based solution for deploying and interacting with Meta's LLaMA2 models, including official English versions, a Chinese variant, and quantized INT4 versions for reduced VRAM usage. It targets developers and researchers seeking a quick, reproducible setup for local LLM experimentation, enabling chat functionalities with minimal configuration.

How It Works

The project leverages Docker to encapsulate LLaMA2 models and their dependencies, simplifying deployment across different environments. It supports multiple model variants: official Hugging Face releases (7B and 13B), a community-contributed Chinese version, and INT4 quantized models using the Transformers library for reduced VRAM requirements. For CPU-only inference, it integrates with llama.cpp for GGML quantized models.
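Whichever variant is deployed, chat interaction depends on feeding the model prompts in the instruction format the LLaMA2 chat checkpoints were fine-tuned on. A minimal sketch of assembling a single-turn prompt (the helper function name is illustrative, not from this project):

```python
def build_llama2_chat_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama-2-chat format.

    The <<SYS>> and [INST] markers are the delimiters the official chat
    checkpoints expect; a serving layer typically builds this string
    before tokenization, regardless of whether inference runs on GPU
    (Transformers) or CPU (llama.cpp).
    """
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_chat_prompt("You are a helpful assistant.", "Hello!")
```

Multi-turn conversations extend this pattern by appending each assistant reply and wrapping every subsequent user message in its own `[INST] ... [/INST]` pair.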

Quick Start & Requirements

  • Install/Run: Execute provided shell scripts (e.g., scripts/make-7b.sh, scripts/run-7b.sh).
  • Prerequisites: Docker, sufficient disk space for model downloads, and VRAM: roughly 5GB for the INT4 models, 8-14GB for the official and Chinese 7B/13B models. CPU-only inference is also supported.
  • Setup: Model download and Docker image build are automated by scripts.
  • Docs: English README.
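The VRAM figures above follow directly from parameter count and weight precision. A back-of-the-envelope sketch (weights only; KV cache and activations add overhead, which is why real usage exceeds these numbers):

```python
def approx_weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate VRAM consumed by model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return total_bytes / (1024 ** 3)

# A 7B model in FP16 needs ~13 GiB just for weights; INT4 quantization
# cuts that to ~3.3 GiB, leaving headroom inside a ~5GB VRAM budget.
print(f"{approx_weight_vram_gb(7, 16):.1f}")  # ~13.0
print(f"{approx_weight_vram_gb(7, 4):.1f}")   # ~3.3
```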

Highlighted Details

  • Supports official LLaMA2 7B/13B chat models.
  • Includes a Chinese LLaMA2 7B variant.
  • Offers INT4 quantization for 5GB VRAM usage.
  • Enables CPU-only inference via GGML (llama.cpp).

Maintenance & Community

The project is maintained by soulteary; the README links to the Hugging Face repositories for each supported model.

Licensing & Compatibility

The README does not state a license for the project itself. The LLaMA2 models it deploys are subject to Meta's own license and acceptable use policy, so suitability for commercial use depends on the underlying LLaMA2 model license.

Limitations & Caveats

No license is stated for the project's scripts or Dockerfiles, and users must adhere to Meta's LLaMA2 license and Responsible Use Guide. Model performance and capabilities depend on the specific LLaMA2 variant chosen.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
