squeeze-evolve by squeeze-evolve

Multi-model orchestration for verifier-free evolutionary LLM scaling

Created 1 month ago
261 stars

Top 97.2% on SourcePulse

Project Summary

Squeeze-Evolve is an open-source framework that aims to cut LLM inference costs through a verifier-free evolutionary approach and multi-model orchestration. It routes each inference task to the most cost-effective model for its difficulty, with the goal of matching or exceeding accuracy at a fraction of the expense. The project targets researchers and power users optimizing LLM deployments.

How It Works

The system runs an evolutionary loop that refines candidate solutions. Its core innovation is "fitness-based routing": candidate groups are scored for difficulty (using confidence or diversity proxies) and dynamically routed across model tiers, with expensive models for hard problems, cheaper models for easy ones, and a lightweight aggregator for consensus. Routing is parallel and adaptive, which is intended to improve resource utilization and reduce overall inference cost.
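The routing step described above can be sketched roughly as follows. This is a minimal illustration, not Squeeze-Evolve's actual code: the function names (`diversity_score`, `route_group`, `route_all`), the tier labels, and the thresholds are all assumptions made for the example. The diversity proxy here is simply the share of distinct answers in a group, and sending near-consensus groups to the aggregator is one plausible reading of the description.

```python
# Hypothetical tier labels; the project's real model names and config are not shown here.
TIERS = ("aggregator", "cheap", "expensive")


def diversity_score(answers):
    """Difficulty proxy: 0.0 when all candidates agree, 1.0 when all differ."""
    if not answers:
        raise ValueError("group must contain at least one candidate answer")
    distinct = len(set(answers))
    # Normalize by the maximum possible number of extra distinct answers.
    return (distinct - 1) / max(len(answers) - 1, 1)


def route_group(answers, low=0.25, high=0.75):
    """Pick a tier for one candidate group (thresholds are illustrative assumptions)."""
    score = diversity_score(answers)
    if score <= low:
        return "aggregator"  # near-consensus: assumed to need only the aggregator
    if score < high:
        return "cheap"       # moderate disagreement: treated as an easier problem
    return "expensive"       # high disagreement: treated as a hard problem


def route_all(groups):
    """Map each tier to the indices of the candidate groups routed to it."""
    plan = {tier: [] for tier in TIERS}
    for index, answers in enumerate(groups):
        plan[route_group(answers)].append(index)
    return plan
```

For example, `route_all([["42"] * 4, ["42", "41", "42", "40"], ["1", "2", "3", "4"]])` sends the unanimous group to the aggregator, the partly divided group to the cheap tier, and the fully divided group to the expensive tier. A confidence-based proxy could replace `diversity_score` without changing the routing logic.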

Quick Start & Requirements

Installation:

  • Clone the repository with submodules: git clone --recurse-submodules
  • Install dependencies: uv sync --dev or pip install -e ".[dev]"
  • Optional cloud storage (AWS, GCS): install squeeze-evolve[aws] or squeeze-evolve[gcs]
  • Forked vllm with a custom confidence engine: VLLM_USE_PRECOMPILED=1 uv pip install --editable external/vllm

CLI tools (squeeze-evolve-client, squeeze-evolve-server) and benchmarks (AIME 2025, HMMT 2025, GPQA-Diamond) are included. The vllm dependency implies a GPU with CUDA support.

Highlighted Details

  • Verifier-free evolutionary framework for LLM inference scaling.
  • Multi-model orchestration with adaptive, fitness-based routing.
  • Custom vllm fork with GPU-accelerated confidence engine (4-10x lower scoring latency).
  • Pluggable storage: local, S3, GCS.
  • Extensible operator registry for custom fitness, selection, recombination, etc.
  • Pre-configured benchmarks for academic math and QA datasets.
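An operator registry of the kind listed above is commonly built as a name-to-function mapping populated by a decorator. The sketch below shows only that general pattern. `OperatorRegistry`, `register`, `get`, and the operator kinds and names are hypothetical and are not taken from Squeeze-Evolve's API.

```python
from collections import Counter


class OperatorRegistry:
    """Maps (kind, name) pairs to callables, e.g. ("fitness", "majority_share")."""

    def __init__(self):
        self._ops = {}

    def register(self, kind, name):
        """Decorator that stores the function under (kind, name) and returns it unchanged."""
        def decorator(fn):
            key = (kind, name)
            if key in self._ops:
                raise KeyError(f"operator already registered: {kind}/{name}")
            self._ops[key] = fn
            return fn
        return decorator

    def get(self, kind, name):
        """Look up a registered operator, raising KeyError if it is unknown."""
        try:
            return self._ops[(kind, name)]
        except KeyError:
            raise KeyError(f"unknown operator: {kind}/{name}") from None


registry = OperatorRegistry()


@registry.register("fitness", "majority_share")
def majority_share(answers):
    """Share of candidates that agree with the most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


@registry.register("selection", "top_k")
def top_k(scored, k=2):
    """Keep the k highest-scoring (candidate, score) pairs."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With this pattern, a configuration file could name an operator by string (for instance "majority_share") and the framework would resolve it with `registry.get("fitness", "majority_share")`, so custom operators can be added without touching the core loop.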

Maintenance & Community

Described as "actively evolving research code" with ongoing productionization efforts. Contributions and feedback are welcomed via issues. No specific community channels or detailed contributor information are provided.

Licensing & Compatibility

Licensed under the Apache License 2.0, permissive for commercial use and closed-source integration.

Limitations & Caveats

Because the project is "actively evolving research code," users should expect some instability or incomplete features. The forked vllm submodule may complicate builds. No unsupported platforms or known bugs are documented.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

267 stars in the last 30 days
