llama-chat by randaller

Chat app for local LLaMA model inference

created 2 years ago
837 stars

Top 43.4% on sourcepulse

Project Summary

This repository provides an easy-to-use interface for running Meta's LLaMA large language models on home PCs. It targets users with NVIDIA GPUs and sufficient RAM, enabling local chat interactions and fine-tuning capabilities.

How It Works

The project leverages PyTorch and Hugging Face Transformers for LLaMA model inference and training. It supports both raw model weights requiring manual merging and a Hugging Face version that handles automatic downloading and caching. The implementation allows for flexible generation parameter tuning, including temperature, top-p, and top-k sampling, with options for repetition penalty and custom stop sequences.
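The sampling knobs mentioned above (temperature, top-k, top-p) can be illustrated with a self-contained sketch. This is plain NumPy, independent of the repo's actual implementation; the function name and default values are illustrative only:

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.95, top_k=40, rng=None):
    """Pick the next token id from raw logits: temperature, then top-k, then top-p."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)

    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        k = min(top_k, logits.size)
        kth = np.sort(logits)[-k]
        logits = np.where(logits < kth, -np.inf, logits)

    # Softmax over the surviving logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus): smallest set of tokens whose cumulative mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()

    return int(rng.choice(keep, p=p))
```

Lower temperature or smaller top-k/top-p make output more deterministic; with `top_k=1` the call always returns the argmax token.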

Quick Start & Requirements

  • Install: Clone the repo, create and activate a Conda environment (conda create -n llama python=3.10; conda activate llama), install PyTorch with CUDA 11.7 (conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia), install the requirements (pip install -r requirements.txt), then install the package itself (pip install -e .).
  • Prerequisites: NVIDIA GPU (2GB VRAM minimum, more recommended), 64-128GB RAM (192GB ideal for 65B models), Python 3.10.
  • Model Download: For raw weights, download via torrent and merge using python merge-weights.py. For HF version, models are downloaded automatically.
  • Run: python example-chat.py ./model ./tokenizer/tokenizer.model (for raw weights) or python hf-chat-example.py (for HF version).
  • Docs: https://github.com/randaller/llama-chat
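The install steps above, collected into one shell session (CUDA 11.7 as stated in the instructions; adjust the pytorch-cuda version to match your driver):

```shell
# Clone and enter the repo
git clone https://github.com/randaller/llama-chat.git
cd llama-chat

# Isolated environment with the required Python version
conda create -n llama python=3.10 -y
conda activate llama

# PyTorch built against CUDA 11.7
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia -y

# Project dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```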

Highlighted Details

  • Supports LLaMA models from 7B to 65B parameters.
  • Offers both inference and fine-tuning capabilities via Hugging Face integration.
  • Allows offloading to GPU using accelerate for memory optimization.
  • Demonstrates fine-tuning for Stable Diffusion prompt generation.
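How the accelerate offloading mentioned above might be wired up is sketched below. The helper function is hypothetical, but device_map, max_memory, and low_cpu_mem_usage are standard Hugging Face from_pretrained options that require accelerate to be installed:

```python
def offload_load_kwargs(max_gpu_memory: str = "2GiB", max_cpu_memory: str = "64GiB") -> dict:
    """Build from_pretrained kwargs that let accelerate spill layers to CPU RAM."""
    return {
        "device_map": "auto",                            # let accelerate place layers
        "max_memory": {0: max_gpu_memory, "cpu": max_cpu_memory},
        "low_cpu_mem_usage": True,                       # stream weights, avoid a full extra copy
    }

# Usage (would download multi-GB weights, so shown for illustration only):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("<hf-model-id>", **offload_load_kwargs())
```

Capping GPU memory at a small value is what lets the 2GB-VRAM minimum work: layers that do not fit are kept in system RAM and moved to the GPU as needed, at a significant speed cost.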

Maintenance & Community

The project is based on several foundational LLaMA repositories. Community interaction and prompt sharing occur via GitHub Issues.

Licensing & Compatibility

The repository itself appears to have no license, and it is heavily based on Meta's LLaMA, which carries its own usage restrictions. The Hugging Face models are subject to their respective licenses. Commercial use is likely restricted by the underlying LLaMA license.

Limitations & Caveats

Running larger models (30B+) requires substantial RAM (48GB+), and inference can be slow on lower-end hardware or with limited VRAM. The project relies on downloading LLaMA weights, whose distribution may be restricted.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.

  • alpaca-lora by tloen: LoRA fine-tuning for LLaMA. 19k stars; created 2 years ago, updated 1 year ago.