algorithmicsuperintelligence: Optimizing inference proxy for LLMs
Top 15.6% on SourcePulse
OptiLLM is an OpenAI API-compatible inference proxy designed to enhance LLM performance and accuracy, particularly for coding, logical, and mathematical tasks. It targets developers and researchers seeking to improve LLM reasoning capabilities through advanced inference-time techniques.
How It Works
OptiLLM implements state-of-the-art techniques like Mixture of Agents (MoA), Monte Carlo Tree Search (MCTS), and Chain-of-Thought (CoT) decoding. These methods augment LLM responses by performing additional computations at inference time, aiming to surpass frontier models on complex queries. The proxy supports various optimization approaches, selectable via model name prefixes, `extra_body` parameters, or prompt tags.
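A minimal sketch of selecting an approach through the OpenAI-compatible interface, assuming the proxy is running locally on the default port; the `moa-` model-name prefix and the `optillm_approach` field in `extra_body` follow the conventions described above, and the model names are placeholders.

```python
from openai import OpenAI

# Point the stock OpenAI client at the OptiLLM proxy (assumed local default port).
client = OpenAI(api_key="sk-...", base_url="http://localhost:8000/v1")

# Option 1: select the technique with a model-name prefix (here, Mixture of Agents).
moa_response = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # "moa-" prefix routes the request through MoA
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

# Option 2: keep the plain model name and pass the approach via extra_body.
mcts_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    extra_body={"optillm_approach": "mcts"},  # assumed field name per the README convention
)

print(moa_response.choices[0].message.content)
```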
Quick Start & Requirements
- Install: `pip install optillm`, or use Docker (`docker pull ghcr.io/codelion/optillm:latest`).
- Set `HF_TOKEN` for local inference with Hugging Face models. Supports various LLM providers via environment variables (e.g., `OPENAI_API_KEY`, `GEMINI_API_KEY`).
- Run: `python optillm.py`, or `docker run -p 8000:8000 ghcr.io/codelion/optillm:latest`.
- Set `base_url` to `http://localhost:8000/v1` in the OpenAI client, as shown in the sketch below.
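A usage sketch of the last step, assuming the proxy has already been started with `python optillm.py` (or the Docker image) on the default port and that `OPENAI_API_KEY` is set in the environment; the model name is a placeholder for whatever the configured provider serves.

```python
import os
from openai import OpenAI

# OptiLLM is OpenAI API-compatible, so only base_url changes;
# the API key is forwarded to the upstream provider.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is 23 * 17?"}],
)
print(response.choices[0].message.content)
```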
Maintenance & Community
The project is actively developed, with contributions from Asankhaya Sharma. Community channels are not explicitly mentioned in the README.
Licensing & Compatibility
The project is available under an unspecified license. Its OpenAI API compatibility allows integration with existing tools and frameworks.
Limitations & Caveats
Some optimization techniques (e.g., cot_decoding, entropy_decoding) are not supported when using external servers like Anthropic API, llama.cpp, or Ollama due to their lack of multi-response sampling.