torchchat by pytorch

PyTorch-native SDK for local LLM inference across diverse platforms

Created 1 year ago
3,609 stars

Top 13.5% on SourcePulse

Project Summary

torchchat enables running PyTorch Large Language Models (LLMs) locally across servers, desktops, and mobile devices. It targets developers and power users seeking a flexible, PyTorch-native solution for LLM deployment, offering Python, C++, and mobile (iOS/Android) interfaces with performance optimizations.

How It Works

torchchat leverages PyTorch's native capabilities, including eager execution, compilation via AOT Inductor for optimized desktop/server deployment, and ExecuTorch for mobile optimization. This PyTorch-centric approach emphasizes simplicity, extensibility, and correctness, allowing for modular integration and customization of LLM execution.
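The three execution paths can be sketched with torchchat's CLI. The subcommands (`generate`, `export`) follow the project README, but the exact flag names vary between versions, so treat the output-path flags below as illustrative and check the current docs:

```shell
# Eager / torch.compile execution in Python (server/desktop)
python3 torchchat.py generate llama3.1 --prompt "Hello, world"

# AOT Inductor: export a compiled artifact consumed by the C++ runner
# (output flag is illustrative; older versions used a DSO path flag)
python3 torchchat.py export llama3.1 --output-aoti-package-path ./llama3_1.pt2

# ExecuTorch: export a .pte program for mobile (iOS/Android)
python3 torchchat.py export llama3.1 --output-pte-path ./llama3_1.pte
```

The same model definition backs all three paths, which is what the "PyTorch-centric" design buys: one source of truth, multiple deployment targets.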

Quick Start & Requirements

  • Install: Clone the repo, create a virtual environment, and run ./install/install_requirements.sh.
  • Prerequisites: Python 3.10+, Hugging Face account and CLI login for model downloads.
  • Resources: Requires sufficient RAM for the chosen LLM (e.g., 8GB+ for some models).
  • Docs: Customization Guide, Multimodal Guide
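The install steps above can be sketched as follows (commands mirror the repository README; verify paths against the current docs):

```shell
# Clone the repo and enter it
git clone https://github.com/pytorch/torchchat.git
cd torchchat

# Create and activate a virtual environment (Python 3.10+)
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies via the provided script
./install/install_requirements.sh

# Log in to Hugging Face so gated model weights can be downloaded
huggingface-cli login
```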

Highlighted Details

  • Multimodal support for Llama 3.2 11B Vision.
  • Command-line interface for popular LLMs (Llama 3, Mistral, etc.).
  • Support for various data types (FP32, FP16, BF16) and quantization schemes.
  • Native execution via AOT Inductor (for C++ runner) and ExecuTorch (for mobile).
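As a sketch of the dtype and quantization options, the commands below assume torchchat's documented `--dtype` and `--quantize` flags; the quantization config path is hypothetical, so substitute one of the JSON configs shipped with the repo:

```shell
# Run with an explicit data type (supported values include fp32, fp16, bf16)
python3 torchchat.py generate llama3.1 --dtype bf16 --prompt "Hello"

# Apply a quantization scheme from a JSON config
# (path/to/quant_config.json is a placeholder for a config from the repo)
python3 torchchat.py generate llama3.1 --quantize path/to/quant_config.json --prompt "Hello"
```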

Maintenance & Community

  • Active development with recent updates for DeepSeek R1 Distill and Llama 3.2 multimodal support.
  • Community engagement via Discord for support and contributions.
  • CONTRIBUTING guide available.

Licensing & Compatibility

  • BSD 3-Clause license for torchchat, with MIT and Apache licenses for additional code.
  • The licenses are generally permissive for commercial use, but users must comply with the terms of service of any third-party models they download.

Limitations & Caveats

The eval feature is noted as a work in progress. Some model access requires requesting permission via Hugging Face. The README includes a disclaimer about potential performance and compatibility differences compared to original model versions.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 12 stars in the last 30 days
