aici by microsoft

AICI constrains LLM output using WebAssembly (Wasm) programs

created 1 year ago
2,042 stars

Top 22.2% on sourcepulse

View on GitHub
Project Summary

AICI (Artificial Intelligence Controller Interface) provides a framework for real-time control and constraint of Large Language Model (LLM) output. It enables developers to build flexible "Controllers" that dictate token-by-token generation, manage state, and integrate custom logic, targeting researchers and developers seeking fine-grained control over LLM responses.

How It Works

Controllers are implemented as WebAssembly (Wasm) modules, allowing them to run efficiently on the CPU in parallel with the LLM's GPU-based token generation. This approach minimizes overhead and lets controllers be written in languages that compile to Wasm, such as Rust or C++, or in interpreted languages like Python and JavaScript whose interpreters run inside the Wasm sandbox. AICI abstracts LLM inference details, aiming for portability across different inference engines.
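For illustration, a minimal Python controller in the style of the project's pyctrl examples is sketched below. The pyaici.server module and the FixedTokens/gen_text helpers are taken from the repository's QuickStart material but should be treated as assumptions; consult the repository for the current API.

    # Minimal pyctrl-style controller (helper names assumed from the QuickStart).
    # The script runs inside the Wasm sandbox on the CPU, alongside GPU decoding.
    import pyaici.server as aici

    async def main():
        # Force a fixed prefix into the token stream.
        await aici.FixedTokens("The capital of France is")
        # Let the model generate a short completion, stopping at a newline.
        answer = await aici.gen_text(max_tokens=10, stop_at="\n")
        print("model answered:", answer.strip())

    # Hand the coroutine to the AICI runtime, which advances it token by token.
    aici.start(main())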

Quick Start & Requirements

  • Installation: Requires Rust toolchain, Python 3.11+, and specific system dependencies (e.g., build-essential, cmake, clang).
  • LLM Backend: Integrates with llama.cpp (via rllm-llamacpp) and libtorch/CUDA (via rllm-cuda). The CUDA backend requires NVIDIA GPUs with compute capability 8.0+.
  • Setup: Detailed setup instructions are provided for WSL/Linux/macOS, with a recommended devcontainer for easier CUDA setup.
  • Running: Use ./server.sh to start the rLLM server and ./aici.sh run <script> to execute controllers (a hypothetical example script is sketched after this list).
  • Documentation: the QuickStart example walkthrough in the repository.
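As a concrete, hypothetical walkthrough of the commands above: save a controller script (the file name and contents below are illustrative only) and pass it to ./aici.sh while the rLLM server started by ./server.sh is running.

    # qa.py -- hypothetical controller script.
    # With the rLLM server running (./server.sh), it would be executed with:
    #   ./aici.sh run qa.py
    import pyaici.server as aici

    async def main():
        await aici.FixedTokens("Q: What is 2 + 2?\nA:")
        # Generation helper name assumed from the pyctrl examples.
        result = await aici.gen_text(max_tokens=5, stop_at="\n")
        print("result:", result.strip())

    aici.start(main())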

Highlighted Details

  • Controllers are sandboxed Wasm modules, enhancing security by restricting filesystem, network, and other resource access.
  • Supports multiple controller implementations (e.g., pyctrl for Python, jsctrl for JavaScript) and aims to support higher-level libraries like Guidance and LMQL.
  • The project reports minimal overhead (0.2-2.0 ms per token for common constraints) on an AMD EPYC 7V13 CPU with an NVIDIA A100 GPU.
  • Offers flexibility for complex control strategies, including backtracking the KV-cache, forking generations, and inter-fork communication (see the sketch below).
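To make the constraint and forking features concrete, here is a sketch in the same pyctrl style; the regex= parameter and fork() helper are assumptions based on the project's examples, not a verified API reference.

    # Sketch: constrained generation plus forking (names assumed, not verified).
    import pyaici.server as aici

    async def main():
        await aici.FixedTokens("Pick a number between 1 and 100: ")
        # Constrain the next tokens with a regular expression (assumed kwarg).
        number = await aici.gen_text(regex=r"\d{1,3}", max_tokens=4)

        # Fork generation into two branches that continue independently;
        # fork() is assumed to return the index of the current branch.
        branch = await aici.fork(2)
        if branch == 0:
            await aici.FixedTokens(f"\nWhy {number}? Because")
        else:
            await aici.FixedTokens(f"\nAn argument against {number}:")
        await aici.gen_text(max_tokens=20)

    aici.start(main())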

Maintenance & Community

  • Actively maintained by Microsoft Research.
  • Contributions are welcome via pull requests, requiring agreement to a CLA.
  • Follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • AICI is described as a prototype.
  • The vLLM integration is noted as out-of-date; rllm-cuda or rllm-llamacpp is recommended instead.
  • Native Windows support is tracked as a future enhancement.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 28 stars in the last 90 days
