torchchat by pytorch

PyTorch-native SDK for local LLM inference across diverse platforms

created 1 year ago
3,602 stars

Top 13.7% on sourcepulse

Project Summary

torchchat enables running PyTorch Large Language Models (LLMs) locally across servers, desktops, and mobile devices. It targets developers and power users seeking a flexible, PyTorch-native solution for LLM deployment, offering Python, C++, and mobile (iOS/Android) interfaces with performance optimizations.

How It Works

torchchat leverages PyTorch's native capabilities, including eager execution, compilation via AOT Inductor for optimized desktop/server deployment, and ExecuTorch for mobile optimization. This PyTorch-centric approach emphasizes simplicity, extensibility, and correctness, allowing for modular integration and customization of LLM execution.
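The three execution paths can be sketched as command lines. The helper below is a minimal illustration, not part of torchchat: it only builds argument lists and does not run anything. It assumes you are inside a torchchat checkout, and the subcommands and flags (generate, export, --prompt, --output-aoti-package-path, --output-pte-path) are taken from the project README as I recall it, so they may differ between versions. The function names and the example model alias and paths are hypothetical.

```python
import shlex

# Assumption: commands are run from the root of a torchchat checkout.
BASE = ["python3", "torchchat.py"]


def eager_generate(model: str, prompt: str) -> list[str]:
    """Eager mode: run the model directly in Python, no export step."""
    return BASE + ["generate", model, "--prompt", prompt]


def export_aoti(model: str, package_path: str) -> list[str]:
    """AOT Inductor: compile ahead of time for desktop/server use."""
    # Assumption: flag name as documented in the README; may vary by version.
    return BASE + ["export", model, "--output-aoti-package-path", package_path]


def export_executorch(model: str, pte_path: str) -> list[str]:
    """ExecuTorch: export a .pte program for mobile runtimes."""
    # Assumption: flag name as documented in the README; may vary by version.
    return BASE + ["export", model, "--output-pte-path", pte_path]


if __name__ == "__main__":
    # "llama3.1" and the output paths are illustrative placeholders.
    commands = (
        eager_generate("llama3.1", "Hello, my name is"),
        export_aoti("llama3.1", "exportedModels/llama3_1_artifacts.pt2"),
        export_executorch("llama3.1", "llama3.1.pte"),
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping the commands as lists rather than strings means they can be passed to subprocess.run without shell quoting issues once the flags have been checked against the installed version.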

Quick Start & Requirements

  • Install: Clone the repo, create a virtual environment, and run ./install/install_requirements.sh.
  • Prerequisites: Python 3.10+, Hugging Face account and CLI login for model downloads.
  • Resources: Requires sufficient RAM for the chosen LLM (e.g., 8GB+ for some models).
  • Docs: Customization Guide, Multimodal Guide
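
The prerequisites above can be checked before running the install script. This is a minimal sketch, not something torchchat ships: the function names are hypothetical, and the tool name huggingface-cli is an assumption about how the Hugging Face CLI is installed on your system (newer releases may expose a different command name).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the prerequisites: Python 3.10+


def python_ok(version=None) -> bool:
    """Return True if the given (or current) interpreter is new enough."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=("git", "huggingface-cli")) -> list[str]:
    """Return the tools from `tools` that are not found on PATH."""
    # Assumption: these executable names match your installation.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10 or newer is required.")
    for tool in missing_tools():
        print(f"Missing required tool: {tool}")
```

The check only confirms that the tools are present; it does not verify that you are logged in to Hugging Face or that a gated model has been approved for your account.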

Highlighted Details

  • Multimodal support for Llama 3.2 11B Vision.
  • Command-line interface for popular LLMs (Llama 3, Mistral, etc.).
  • Support for various data types (FP32, FP16, BF16) and quantization schemes.
  • Native execution via AOT Inductor (for C++ runner) and ExecuTorch (for mobile).
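
The choice of data type largely determines how much RAM the weights need. The sketch below is a back-of-the-envelope estimate only: it counts weight storage and ignores the KV cache, activations, and quantization metadata such as scales. The int8 and int4 entries are assumptions standing in for "quantization schemes" in general and are not a description of torchchat's specific quantizers.

```python
# Bytes needed to store one parameter in each format.
# Assumption: int8/int4 are idealized and ignore per-group scale overhead.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate weight storage in GiB for a model with `num_params` parameters."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # An 8-billion-parameter model is used purely as an illustration.
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(8e9, dtype):.1f} GiB")
```

For an 8-billion-parameter model this gives roughly 14.9 GiB in BF16 or FP16 and about twice that in FP32, which is why lower-precision or quantized weights are usually needed on desktops and mobile devices.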

Maintenance & Community

  • Active development with recent updates for DeepSeek R1 Distill and Llama 3.2 multimodal support.
  • Community engagement via Discord for support and contributions.
  • CONTRIBUTING guide available.

Licensing & Compatibility

  • BSD 3-Clause license for torchchat, with MIT and Apache licenses for additional code.
  • These licenses generally permit commercial use, but users must comply with the terms of service of any third-party models.

Limitations & Caveats

The eval feature is noted as a work in progress. Some model access requires requesting permission via Hugging Face. The README includes a disclaimer about potential performance and compatibility differences compared to original model versions.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 50 stars in the last 90 days
