*Deprecated* minimal example for loading and running Llama 3 models
This repository provides a minimal example for loading and running inference with Meta's Llama 3 large language models, available in 8B and 70B parameter sizes. It targets developers, researchers, and businesses seeking to integrate advanced LLM capabilities into their applications, offering a starting point for experimentation and scaling.
How It Works
The project leverages PyTorch and CUDA for efficient inference. It provides scripts for loading pre-trained and instruction-tuned models; the instruction-tuned variants expect chat dialogs in a specific special-token prompt format (sketched below). The architecture supports model parallelism (MP) for scaling inference across multiple GPUs: the 8B models use an MP value of 1 and the 70B models use 8, so the number of processes launched must match.
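As a minimal sketch of that chat format, the snippet below assembles a prompt using the published Llama 3 special tokens. The helper name `format_dialog` is illustrative only; in practice the repo's example scripts apply this template for you.

```python
# Illustrative sketch of the Llama 3 chat prompt template.
# Token strings follow the published Llama 3 format; the function
# name is a hypothetical helper, not part of this repo's API.
def format_dialog(messages: list[dict[str, str]]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += f"{msg['content']}<|eot_id|>"
    # Cue the model to generate the assistant's reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_dialog([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]))
```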
Quick Start & Requirements
- Install with `pip install -e .` within a Conda environment with PyTorch/CUDA.
- Download model weights with `huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*"`.
- Required tools: `wget`, `md5sum`, PyTorch, CUDA.
- Run inference with `torchrun --nproc_per_node <MP_value> example_chat_completion.py --ckpt_dir <model_path> --tokenizer_path <tokenizer_path> ...` (a Python sketch follows this list).
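For orientation, here is a minimal sketch in the spirit of the repo's `example_chat_completion.py`, using the `Llama.build` / `chat_completion` API from this repository. The checkpoint paths are placeholders for wherever you downloaded the weights, and the script must be launched with `torchrun` as shown above.

```python
# Minimal sketch following this repo's example_chat_completion.py.
# Launch with: torchrun --nproc_per_node 1 chat_demo.py
# Paths below are placeholders for your downloaded 8B-Instruct checkpoint.
from llama import Llama

generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B-Instruct/",
    tokenizer_path="Meta-Llama-3-8B-Instruct/tokenizer.model",
    max_seq_len=512,
    max_batch_size=1,
)

# Each dialog is a list of {"role", "content"} messages.
dialogs = [[{"role": "user", "content": "What is the capital of France?"}]]
results = generator.chat_completion(
    dialogs,
    max_gen_len=64,
    temperature=0.6,
    top_p=0.9,
)
print(results[0]["generation"]["content"])
```

For the 70B models, the same script is launched with `--nproc_per_node 8` to match their MP value.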
Maintenance & Community
This repository is marked as deprecated, with functionality migrated to several new repositories: llama-models, PurpleLlama, llama-toolchain, llama-agentic-system, and llama-cookbook. Users are directed to these new repos for ongoing development and support.
Licensing & Compatibility
The models and weights are licensed for both researchers and commercial entities, subject to an accompanying Acceptable Use Policy.
Limitations & Caveats
This repository is deprecated and serves only as a minimal example. All active development and support have moved to newer, specialized repositories within the Llama Stack.