cog-llama-template by replicate

Cog template for LLaMA model deployment

Created 2 years ago · 307 stars · Top 88.4% on sourcepulse

Project Summary

This repository provides a template for packaging Meta's LLaMA language models with Cog, so they can be deployed as a web service or API on Replicate. It targets researchers and developers who need cloud-based LLaMA inference without building the serving stack themselves.

How It Works

The template uses Cog to containerize LLaMA models for deployment. It includes scripts that convert Meta's original PyTorch checkpoints to the Hugging Face Transformers format and then "tensorize" them for faster cold starts. Cog abstracts away Docker and dependency management, letting users focus on model integration and prompt engineering.
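The two-step weight pipeline described above can be sketched as follows. The conversion module is the one that ships with the Hugging Face transformers library; the tensorize script name, flags, and paths are illustrative, not the repo's exact CLI:

```shell
# 1. Convert Meta's original PyTorch checkpoint to Hugging Face format.
#    (convert_llama_weights_to_hf ships with the transformers library.)
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir ./llama-original --model_size 7B --output_dir ./llama-hf

# 2. Serialize the converted model for fast cold-start loading.
#    (Script name and flags are hypothetical stand-ins for the repo's scripts.)
python tensorize_weights.py --model_dir ./llama-hf --output ./llama-7b.tensors
```

Tensorized weights can be streamed into GPU memory at load time, which is what cuts the cold-start penalty when a container is scheduled fresh.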

Quick Start & Requirements

  • Install Cog: sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)" && sudo chmod +x /usr/local/bin/cog
  • Prerequisites: LLaMA weights (apply via Meta Research), Linux machine with NVIDIA GPU and NVIDIA Container Toolkit, Docker.
  • Setup: Requires downloading and converting LLaMA weights, then tensorizing them.
  • Docs: replicate.com/meta
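Once Cog is installed and the weights are in place, a typical local smoke test uses the standard Cog workflow. The input name `prompt` is an assumption about this template's predictor; adjust it to the actual `predict.py` signature:

```shell
# Build the container image from cog.yaml
# (requires Docker and the NVIDIA Container Toolkit).
cog build -t llama-7b

# Run one local prediction; -i sets a named input on the predictor.
cog predict -i prompt="Simply put, the theory of relativity states that"
```

`cog push` then publishes the same image to Replicate for API access.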

Highlighted Details

  • Supports LLaMA 1 & 2 models (7B, 13B, 70B parameters).
  • Includes weight conversion to Hugging Face format and tensorization for performance.
  • Facilitates deployment to Replicate for cloud-based API access.
  • Designed for research purposes; LLaMA 1's license prohibits commercial use, while LLaMA 2's community license permits it under conditions.

Maintenance & Community

  • Contributors include Marco Mascorro (@mascobot).
  • Follows the all-contributors specification; contributions are welcome.

Licensing & Compatibility

  • License: The template itself appears to be open source, but use of the LLaMA weights is governed by Meta's licenses: LLaMA 1 is research-only, while LLaMA 2's community license permits commercial use with conditions.
  • Compatibility: Designed for use with Replicate's platform.

Limitations & Caveats

  • This is an experimental branch that depends on exllama and requires a specific Git checkout.
  • LLaMA 1 models are licensed for research use only; LLaMA 2 use is governed by Meta's community license.
  • Users must obtain LLaMA weights separately from Meta Research.
  • Prompts require careful construction, since base LLaMA models are not fine-tuned for conversational Q&A.
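Because the base models are not instruction-tuned, prompt construction matters. For the chat-tuned LLaMA 2 variants, Meta documents an `[INST]`/`<<SYS>>` wrapping (the BOS token is normally added by the tokenizer); a minimal helper for building such a prompt might look like:

```shell
# Wrap a system message and a user instruction in the LLaMA 2 chat template.
build_prompt() {
    local system="$1" instruction="$2"
    printf '[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]' "$system" "$instruction"
}

# Example: print a fully formatted prompt.
build_prompt "You are a concise assistant." "Explain tensorizing in one sentence."
```

For the base (non-chat) models, skip the template entirely and phrase the prompt as text to be continued.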
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.