xgrammar by mlc-ai

Library for efficient structured generation

Created 1 year ago
1,252 stars

Top 31.6% on SourcePulse

Project Summary

XGrammar is an open-source library designed for efficient, flexible, and portable structured generation with Large Language Models (LLMs). It targets developers and researchers seeking to enforce specific output formats, such as JSON or code, by leveraging context-free grammars. The library aims to provide zero-overhead integration into LLM inference engines, enabling faster and more reliable structured outputs.

How It Works

XGrammar utilizes context-free grammars to guide the generation process, supporting a wide array of output structures. Its core advantage lies in a minimal, portable C++ backend that is co-designed with LLM inference engines. This tight integration allows for zero-overhead structured generation, meaning the grammar constraints are applied directly within the inference loop without significant performance penalties.
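The masking step can be illustrated with a toy sketch in plain Python (this is not xgrammar's actual API or implementation, just the core idea): at every decoding step, a bitmask over the vocabulary marks which tokens keep the partial output inside the grammar, and disallowed logits are set to negative infinity before the sampler runs.

```python
# Toy sketch of grammar-constrained decoding (illustrative only; the names,
# vocabulary, and "grammar" here are hypothetical, not from xgrammar).
VOCAB = ["true", "false", "tr", "ue", "fa", "lse", "yes", "no", "<eos>"]

def allowed(prefix, legal=frozenset({"true", "false"})):
    """Bitmask: which tokens keep `prefix` a prefix of some legal string,
    plus <eos> once a complete legal string has been produced."""
    mask = []
    for tok in VOCAB:
        if tok == "<eos>":
            mask.append(prefix in legal)
        else:
            mask.append(any(s.startswith(prefix + tok) for s in legal))
    return mask

def constrained_decode(logits_fn):
    out = ""
    while True:
        mask = allowed(out)
        # Apply the bitmask: disallowed tokens get -inf before "sampling"
        # (greedy argmax here for simplicity).
        logits = [l if ok else float("-inf")
                  for l, ok in zip(logits_fn(out), mask)]
        tid = max(range(len(VOCAB)), key=logits.__getitem__)
        if VOCAB[tid] == "<eos>":
            return out
        out += VOCAB[tid]

# Even a "model" that strongly prefers the illegal token "yes" is forced
# onto a legal string, because "yes" is masked out at every step.
biased = lambda prefix: [1.0 if t == "yes" else 0.5 if t == "tr" else 0.0
                         for t in VOCAB]
print(constrained_decode(biased))  # prints "true"
```

A real engine-integrated implementation computes this mask directly on the logits tensor inside the inference loop, which is where the zero-overhead claim comes from.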

Quick Start & Requirements

  • Installation: pip install xgrammar
  • Prerequisites: Python; the README lists no mandatory hardware or CUDA version for basic installation.
  • Documentation: https://xgrammar.mlc.ai/docs/

Highlighted Details

  • Zero-overhead structured generation through tight integration with LLM inference.
  • Supports general context-free grammars for broad structural flexibility.
  • Minimal and portable C++ backend for easy integration across environments.
  • Official integrations with TensorRT-LLM, vLLM, and SGLang.
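Context-free grammar support means output constraints are written as production rules. As an illustration, a grammar restricting output to a tiny JSON object might look like the following GBNF-flavored EBNF (the rule names and exact dialect are illustrative; consult the xgrammar documentation for the syntax it actually accepts):

```
root    ::= "{" ws "\"answer\"" ws ":" ws boolean "}"
boolean ::= "true" | "false"
ws      ::= [ \t\n]*
```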

Maintenance & Community

XGrammar is actively developed by the mlc-ai team, with recent integrations into major LLM inference frameworks like vLLM and TensorRT-LLM indicating strong community adoption and development momentum. Further details on talks and presentations are available in the README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

While the library supports general context-free grammars, generation speed and output quality still depend on the complexity of the grammar and on the underlying LLM's capabilities. The project is also relatively young, with its first official release in late 2024.

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 32
  • Issues (30d): 5
  • Star History: 83 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

Top 10.6% on SourcePulse
2k stars
Speculative decoding research paper for faster LLM inference
Created 1 year ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Lei Zhang (Director of Engineering, AI at AMD), and 23 more.

gpt-fast by meta-pytorch

Top 0.2% on SourcePulse
6k stars
PyTorch text generation for efficient transformer inference
Created 1 year ago
Updated 3 weeks ago