docker-llama2-chat by soulteary

Docker image for LLaMA2 inference

created 2 years ago
542 stars

Top 59.5% on sourcepulse

Project Summary

This project provides a streamlined Docker-based solution for deploying and interacting with Meta's LLaMA2 models, including official English versions, a Chinese variant, and quantized INT4 versions for reduced VRAM usage. It targets developers and researchers seeking a quick, reproducible setup for local LLM experimentation, enabling chat functionalities with minimal configuration.

How It Works

The project leverages Docker to encapsulate LLaMA2 models and their dependencies, simplifying deployment across different environments. It supports multiple model variants: official Hugging Face releases (7B and 13B), a community-contributed Chinese version, and INT4 quantized models using the Transformers library for reduced VRAM requirements. For CPU-only inference, it integrates with llama.cpp for GGML quantized models.
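As an illustration of the CPU-only path described above, a llama.cpp invocation with a GGML-quantized model could look like the following sketch. The model filename and paths are assumptions for illustration; this project's scripts may wrap or automate these steps differently.

```shell
# Hypothetical sketch: CPU-only chat with a GGML-quantized LLaMA2 model
# via llama.cpp's `main` binary. The model path is an assumption.
./main -m ./models/llama-2-7b-chat.ggmlv3.q4_0.bin \
       -p "User: Hello! What can you do?\nAssistant:" \
       -n 128 \
       --temp 0.7
```

Here `-m` selects the quantized model file, `-p` supplies the prompt, and `-n` caps the number of generated tokens; quantization trades some output quality for a much smaller memory footprint.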

Quick Start & Requirements

  • Install/Run: Execute provided shell scripts (e.g., scripts/make-7b.sh, scripts/run-7b.sh).
  • Prerequisites: Docker, sufficient disk space for the models, and VRAM as follows: roughly 5 GB for the INT4 models, 8-14 GB for the official and Chinese 7B/13B models. CPU-only inference is also supported.
  • Setup: Model download and Docker image build are automated by scripts.
  • Docs: English README.
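Putting the steps above together, a typical session with the documented scripts might look like the following sketch (a GPU with sufficient VRAM is assumed for the official 7B model):

```shell
# Download the 7B model and build the Docker image (automated by the script)
bash scripts/make-7b.sh

# Launch the chat container; the official 7B model needs roughly 8-14 GB VRAM
bash scripts/run-7b.sh
```

The other variants follow the same pattern with their corresponding make/run scripts.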

Highlighted Details

  • Supports official LLaMA2 7B/13B chat models.
  • Includes a Chinese LLaMA2 7B variant.
  • Offers INT4 quantization for 5GB VRAM usage.
  • Enables CPU-only inference via GGML (llama.cpp).

Maintenance & Community

The project is maintained by soulteary. Links to Hugging Face repositories for models are provided.

Licensing & Compatibility

The README does not declare a license for the project's own scripts and Dockerfiles. The LLaMA2 models it packages are subject to Meta's license and acceptable use policy, so suitability for commercial use depends on the underlying LLaMA2 model license.

Limitations & Caveats

Because no project license is stated in the README, reuse of the scripts and Dockerfiles is legally ambiguous. Users must adhere to Meta's LLaMA2 license and Responsible Use Guide, and model performance and capabilities depend on the specific LLaMA2 variant chosen.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.

LLaMA-Adapter by OpenGVLab: efficient fine-tuning for instruction-following LLaMA models. 6k stars; created 2 years ago, updated 1 year ago.