z-lab
Efficient LLM inference via novel quantization
Summary
ParoQuant, presented at ICLR 2026, is a state-of-the-art INT4 quantization method for efficient Large Language Model (LLM) inference. Aimed at engineers and researchers, it reduces model size and computational overhead while preserving accuracy, making LLM deployment for reasoning tasks faster and more accessible.
How It Works
The core technique is learned pairwise rotations, which suppress outliers in LLM weights. With the outliers suppressed, ParoQuant's INT4 models reach accuracy comparable to FP16 models, an improvement over traditional quantization methods. The implementation is also optimized for inference speed, rivaling established techniques such as AWQ.
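To make the idea of a pairwise rotation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not ParoQuant's implementation: the weight group is hypothetical, the rotation is a single 2x2 (Givens-style) rotation on one pair of entries, the quantizer is simple symmetric round-to-nearest INT4 with one scale per group, and a grid search over the angle stands in for learning it.

```python
import math

def quantize_int4(group):
    """Symmetric round-to-nearest INT4 with one scale per group; returns dequantized values."""
    max_abs = max(abs(v) for v in group)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    return [max(-8, min(7, round(v / scale))) * scale for v in group]

def rotate_pair(group, i, j, theta):
    """Apply a 2x2 rotation to entries i and j, leaving the other entries unchanged."""
    out = list(group)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * group[i] - s * group[j]
    out[j] = s * group[i] + c * group[j]
    return out

def squared_error(weights, i, j, theta):
    """Rotate, quantize, rotate back, and measure the error against the original weights."""
    dequantized = quantize_int4(rotate_pair(weights, i, j, theta))
    restored = rotate_pair(dequantized, i, j, -theta)
    return sum((w - r) ** 2 for w, r in zip(weights, restored))

# Hypothetical weight group: one outlier (8.0) next to small values.
weights = [8.0, 0.0, 0.6, -0.5, 0.4, -0.3]

# Assumption for brevity: a grid search over the angle stands in for learning it.
angles = [k * math.pi / 16 for k in range(16)]
baseline_error = squared_error(weights, 0, 1, 0.0)
best_angle = min(angles, key=lambda t: squared_error(weights, 0, 1, t))
best_error = squared_error(weights, 0, 1, best_angle)

print(f"no rotation: squared error {baseline_error:.3f}")
print(f"angle {best_angle:.3f} rad: squared error {best_error:.3f}")
```

In this toy group, rotating the outlier against its neighbor spreads its magnitude over two entries, so the shared scale shrinks and the small weights are rounded less coarsely. Because a rotation is orthogonal, undoing it after dequantization leaves the size of the quantization error unchanged.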
Quick Start & Requirements
Install via pip: pip install "paroquant[vllm]" for NVIDIA GPUs or pip install "paroquant[mlx]" for Apple Silicon. NVIDIA GPU setups require CUDA 12.9 or 13.0 together with the associated vLLM and PyTorch versions. Docker images are available for chat and API serving on NVIDIA GPUs, and models are hosted on Hugging Face.
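The choice between the two install commands can be sketched as a small helper. This is a hypothetical convenience function, not part of ParoQuant: it only inspects the operating system and CPU architecture reported by Python's standard `platform` module, and it does not check that an NVIDIA GPU or a supported CUDA version is actually present.

```python
import platform

def paroquant_pip_command(system=None, machine=None):
    """Return the pip command from the README summary that matches this platform.

    Hypothetical helper: Apple Silicon (macOS on arm64) maps to the "mlx" extra;
    everything else is assumed to be an NVIDIA GPU setup and maps to "vllm".
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    extra = "mlx" if (system == "Darwin" and machine == "arm64") else "vllm"
    return f'pip install "paroquant[{extra}]"'

if __name__ == "__main__":
    print(paroquant_pip_command())
```

On an NVIDIA machine, the CUDA version (12.9 or 13.0) still needs to be confirmed separately before running the printed command.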
Highlighted Details
Maintenance & Community
The project's main branch is under active development. Reproducibility for the ICLR 2026 paper is guaranteed on a legacy branch. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. This absence prevents a definitive assessment of compatibility for commercial use or integration into closed-source projects.
Limitations & Caveats
The primary limitation is that reproducibility is not guaranteed on the main development branch; users who need stable, paper-verified results must use the legacy branch. The absence of explicit licensing information also poses a significant adoption risk for commercial applications.