High-performance LLM inference framework
Chitu is a high-performance inference framework for large language models, designed for production-grade deployment with a focus on efficiency, flexibility, and scalability. It targets enterprises and researchers needing to deploy LLMs from small-scale experiments to large-scale clusters, offering optimized performance across diverse hardware and deployment scenarios.
How It Works
Chitu employs a highly optimized inference engine that supports quantization techniques, including FP8 and FP4, to reduce memory footprint and increase throughput. It combines advanced parallelism strategies (tensor parallelism and pipeline parallelism) with efficient operator implementations for both NVIDIA GPUs and Chinese domestic accelerators. The framework prioritizes long-term stability for production environments and also supports CPU+GPU heterogeneous inference.
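The memory savings from low-precision quantization can be illustrated with a per-tensor scaling sketch. This is a simplified stand-in, not Chitu's actual kernels: it uses float16 storage to emulate FP8 E4M3's reduced precision, and the function names are hypothetical.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_fp8_e4m3(x: np.ndarray):
    """Illustrative per-tensor quantization: rescale the tensor into the
    FP8 E4M3 range and round to a narrower dtype (float16 stands in for
    FP8 storage here). Returns the quantized tensor and its scale."""
    scale = float(np.abs(x).max()) / FP8_E4M3_MAX  # assumes x is not all zeros
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX).astype(np.float16)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original tensor."""
    return q.astype(np.float32) * scale

x = np.random.randn(8, 8).astype(np.float32)
q, s = quantize_fp8_e4m3(x)
x_hat = dequantize(q, s)
```

In a real FP8 pipeline the scale is stored alongside the tensor and folded into subsequent matrix multiplications, so weights and activations travel through memory at half (or a quarter of) the width of FP16.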
Quick Start & Requirements
Clone the repository with --recursive to fetch its submodules, then:
1. Install build dependencies: pip install -r requirements-build.txt
2. Install a CUDA-matched PyTorch: pip install -U torch --index-url https://download.pytorch.org/whl/cu124 (replace cu124 with your CUDA version)
3. Build and install Chitu: pip install --no-build-isolation .
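The cu124 suffix in the index URL is simply the CUDA version with the dot removed; a minimal sketch of that mapping (torch_index_url is a hypothetical helper for illustration, not part of Chitu or PyTorch):

```python
def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version string such as '12.4' to the PyTorch wheel
    index URL tag such as 'cu124' (illustrative helper)."""
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

print(torch_index_url("12.4"))  # https://download.pytorch.org/whl/cu124
```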
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats