aici by microsoft

AICI constrains LLM output using WebAssembly (Wasm) programs

created 1 year ago
2,042 stars

Top 22.2% on sourcepulse

View on GitHub
Project Summary

AICI (Artificial Intelligence Controller Interface) provides a framework for real-time control and constraint of Large Language Model (LLM) output. It enables developers to build flexible "Controllers" that dictate token-by-token generation, manage state, and integrate custom logic, targeting researchers and developers seeking fine-grained control over LLM responses.

How It Works

Controllers are implemented as WebAssembly (Wasm) modules, allowing them to run efficiently on the CPU in parallel with the LLM's GPU-based token generation. This approach minimizes overhead and lets controllers be written in languages that compile to Wasm, such as Rust or C++, or in interpreted languages like Python and JavaScript whose interpreters run inside the Wasm sandbox. AICI abstracts LLM inference details, aiming for portability across different inference engines.
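For illustration, a minimal Python controller in the style of the project's pyctrl examples is sketched below. The pyaici.server module and the FixedTokens/gen_text helpers are taken from the repository's QuickStart material but should be treated as assumptions; consult the repository for the current API.

    # Minimal pyctrl-style controller (helper names assumed from the QuickStart).
    # The script runs inside the Wasm sandbox on the CPU, alongside GPU decoding.
    import pyaici.server as aici

    async def main():
        # Force a fixed prefix into the token stream.
        await aici.FixedTokens("The capital of France is")
        # Let the model generate a short completion, stopping at a newline.
        answer = await aici.gen_text(max_tokens=10, stop_at="\n")
        print("model answered:", answer.strip())

    # Hand the coroutine to the AICI runtime, which advances it token by token.
    aici.start(main())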

Quick Start & Requirements

  • Installation: Requires Rust toolchain, Python 3.11+, and specific system dependencies (e.g., build-essential, cmake, clang).
  • LLM Backend: Integrates with llama.cpp (via rllm-llamacpp) and libtorch/CUDA (via rllm-cuda). The CUDA backend requires NVIDIA GPUs with compute capability 8.0+.
  • Setup: Detailed setup instructions are provided for WSL/Linux/macOS, with a recommended devcontainer for easier CUDA setup.
  • Running: Use ./server.sh to start the rLLM server and ./aici.sh run <script> to execute controllers (a hypothetical example script is sketched after this list).
  • Documentation: the QuickStart example walkthrough in the repository.
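As a concrete, hypothetical walkthrough of the commands above: save a controller script (the file name and contents below are illustrative only) and pass it to ./aici.sh while the rLLM server started by ./server.sh is running.

    # qa.py -- hypothetical controller script.
    # With the rLLM server running (./server.sh), it would be executed with:
    #   ./aici.sh run qa.py
    import pyaici.server as aici

    async def main():
        await aici.FixedTokens("Q: What is 2 + 2?\nA:")
        # Generation helper name assumed from the pyctrl examples.
        result = await aici.gen_text(max_tokens=5, stop_at="\n")
        print("result:", result.strip())

    aici.start(main())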

Highlighted Details

  • Controllers are sandboxed Wasm modules, enhancing security by restricting filesystem, network, and other resource access.
  • Supports multiple controller implementations (e.g., pyctrl for Python, jsctrl for JavaScript) and aims to support higher-level libraries like Guidance and LMQL.
  • The project reports minimal overhead (0.2-2.0 ms per token for common constraints) on an AMD EPYC 7V13 CPU with an NVIDIA A100 GPU.
  • Offers flexibility for complex control strategies, including backtracking the KV-cache, forking generations, and inter-fork communication (see the sketch below).
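To make the constraint and forking features concrete, here is a sketch in the same pyctrl style; the regex= parameter and fork() helper are assumptions based on the project's examples, not a verified API reference.

    # Sketch: constrained generation plus forking (names assumed, not verified).
    import pyaici.server as aici

    async def main():
        await aici.FixedTokens("Pick a number between 1 and 100: ")
        # Constrain the next tokens with a regular expression (assumed kwarg).
        number = await aici.gen_text(regex=r"\d{1,3}", max_tokens=4)

        # Fork generation into two branches that continue independently;
        # fork() is assumed to return the index of the current branch.
        branch = await aici.fork(2)
        if branch == 0:
            await aici.FixedTokens(f"\nWhy {number}? Because")
        else:
            await aici.FixedTokens(f"\nAn argument against {number}:")
        await aici.gen_text(max_tokens=20)

    aici.start(main())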

Maintenance & Community

  • Actively maintained by Microsoft Research.
  • Contributions are welcome via pull requests, requiring agreement to a CLA.
  • Follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • AICI is described as a prototype.
  • The vLLM integration is noted as out-of-date; rllm-cuda or rllm-llamacpp is recommended instead.
  • Native Windows support is tracked as a future enhancement.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 28 stars in the last 90 days
