cog-llama-template by replicate

Cog template for LLaMA model deployment

Created 2 years ago · 307 stars · Top 88.4% on sourcepulse

Project Summary

This repository provides a template for packaging Meta's LLaMA language models with Cog, so they can be deployed as a web service or API on Replicate. It targets researchers and developers who need cloud-based LLaMA inference without building the serving stack themselves.

How It Works

The template uses Cog to containerize LLaMA models for deployment. It includes scripts that convert Meta's original PyTorch checkpoints to the Hugging Face Transformers format and then "tensorize" them for faster cold starts. Cog abstracts away Docker and dependency management, letting users focus on model integration and prompt engineering.
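The two-step weight pipeline described above can be sketched as follows. The conversion module is the one that ships with the Hugging Face transformers library; the tensorize script name, flags, and paths are illustrative, not the repo's exact CLI:

```shell
# 1. Convert Meta's original PyTorch checkpoint to Hugging Face format.
#    (convert_llama_weights_to_hf ships with the transformers library.)
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir ./llama-original --model_size 7B --output_dir ./llama-hf

# 2. Serialize the converted model for fast cold-start loading.
#    (Script name and flags are hypothetical stand-ins for the repo's scripts.)
python tensorize_weights.py --model_dir ./llama-hf --output ./llama-7b.tensors
```

Tensorized weights can be streamed into GPU memory at load time, which is what cuts the cold-start penalty when a container is scheduled fresh.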

Quick Start & Requirements

  • Install Cog: sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)" && sudo chmod +x /usr/local/bin/cog
  • Prerequisites: LLaMA weights (apply via Meta Research), Linux machine with NVIDIA GPU and NVIDIA Container Toolkit, Docker.
  • Setup: Requires downloading and converting LLaMA weights, then tensorizing them.
  • Docs: replicate.com/meta
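Once Cog is installed and the weights are in place, a typical local smoke test uses the standard Cog workflow. The input name `prompt` is an assumption about this template's predictor; adjust it to the actual `predict.py` signature:

```shell
# Build the container image from cog.yaml
# (requires Docker and the NVIDIA Container Toolkit).
cog build -t llama-7b

# Run one local prediction; -i sets a named input on the predictor.
cog predict -i prompt="Simply put, the theory of relativity states that"
```

`cog push` then publishes the same image to Replicate for API access.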

Highlighted Details

  • Supports LLaMA 1 & 2 models (7B, 13B, 70B parameters).
  • Includes weight conversion to Hugging Face format and tensorization for performance.
  • Facilitates deployment to Replicate for cloud-based API access.
  • Designed for research purposes; LLaMA 1's license prohibits commercial use, while LLaMA 2's community license permits it under conditions.

Maintenance & Community

  • Contributors include Marco Mascorro (@mascobot).
  • Follows the all-contributors specification; contributions are welcome.

Licensing & Compatibility

  • License: The template itself appears to be open source, but use of the LLaMA weights is governed by Meta's licenses: LLaMA 1 is research-only, while LLaMA 2's community license permits commercial use with conditions.
  • Compatibility: Designed for use with Replicate's platform.

Limitations & Caveats

  • This is an experimental branch that depends on exllama and requires a specific Git checkout.
  • LLaMA 1 models are licensed for research use only; LLaMA 2 use is governed by Meta's community license.
  • Users must obtain LLaMA weights separately from Meta Research.
  • Prompts require careful construction, since base LLaMA models are not fine-tuned for conversational Q&A.
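Because the base models are not instruction-tuned, prompt construction matters. For the chat-tuned LLaMA 2 variants, Meta documents an `[INST]`/`<<SYS>>` wrapping (the BOS token is normally added by the tokenizer); a minimal helper for building such a prompt might look like:

```shell
# Wrap a system message and a user instruction in the LLaMA 2 chat template.
build_prompt() {
    local system="$1" instruction="$2"
    printf '[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]' "$system" "$instruction"
}

# Example: print a fully formatted prompt.
build_prompt "You are a concise assistant." "Explain tensorizing in one sentence."
```

For the base (non-chat) models, skip the template entirely and phrase the prompt as text to be continued.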
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.