Docker image for LLaMA2 inference
This project provides a streamlined Docker-based solution for deploying and interacting with Meta's LLaMA2 models, including the official English releases, a Chinese variant, and INT4-quantized versions for reduced VRAM usage. It targets developers and researchers who want a quick, reproducible setup for local LLM experimentation, enabling chat functionality with minimal configuration.
How It Works
The project leverages Docker to encapsulate LLaMA2 models and their dependencies, simplifying deployment across different environments. It supports multiple model variants: official Hugging Face releases (7B and 13B), a community-contributed Chinese version, and INT4 quantized models using the Transformers library for reduced VRAM requirements. For CPU-only inference, it integrates with llama.cpp for GGML quantized models.
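To illustrate the CPU-only path, a GGML-era llama.cpp invocation looks roughly like the sketch below. The binary name, model filename, and flag values are assumptions based on llama.cpp's classic CLI, not commands taken from this project:

```bash
# Sketch of a CPU-only llama.cpp run (GGML era); the binary path and
# model filename are hypothetical, following common community releases.
# -m: 4-bit GGML quantized weights  -p: prompt
# -n: max tokens to generate        -t: CPU threads
./main -m ./models/llama-2-7b-chat.ggmlv3.q4_0.bin \
       -p "Hello, how are you?" -n 128 -t 8
```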
Quick Start & Requirements
Each model variant ships with paired build and run scripts (e.g., scripts/make-7b.sh to build the Docker image, scripts/run-7b.sh to launch the container).
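Assuming the script names above and a standard git clone (the repository URL here is a guess, not stated in this summary), a quick start might look like:

```bash
# Hypothetical quick start; repository URL assumed, scripts named per the summary.
git clone https://github.com/soulteary/docker-llama2-chat.git
cd docker-llama2-chat
bash scripts/make-7b.sh   # build the Docker image for the 7B model
bash scripts/run-7b.sh    # start the container and begin chatting
```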
Maintenance & Community
The project is maintained by soulteary, and the README links to the Hugging Face repositories for the supported models.
Licensing & Compatibility
No license is stated in the README for the project itself. The LLaMA2 models it deploys, however, are subject to Meta's own license and acceptable use policy, so suitability for commercial use depends on the underlying LLaMA2 model license.
Limitations & Caveats
Because the README does not explicitly license the project's scripts or Dockerfiles, reuse terms are unclear; users must also adhere to Meta's LLaMA2 license and Responsible Use Guide. Model performance and capabilities depend on the specific LLaMA2 variant chosen.