squeeze-evolve by squeeze-evolve

Multi-model orchestration for verifier-free evolutionary LLM scaling

Created 1 month ago

261 stars

Top 97.2% on SourcePulse

Project Summary

Squeeze-Evolve is an open-source framework that aims to cut LLM inference cost by combining a verifier-free evolutionary approach with multi-model orchestration. It routes each inference task to the cheapest model suited to the task's difficulty, aiming for equal or better accuracy at a fraction of the expense. It targets researchers and power users optimizing LLM deployments.

How It Works

The system uses an evolutionary loop to refine candidate solutions. Its core innovation is "fitness-based routing": groups of candidates are scored for difficulty (using confidence or diversity proxies) and dynamically routed to tiered models. Hard groups go to an expensive model, easy groups go to a cheaper one, and groups that have already reached consensus go to a lightweight aggregator. Because groups are routed in parallel and by difficulty, the expensive model is used only where it is needed, which lowers overall inference cost.
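The routing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the tier names, the thresholds, and the functions `diversity_difficulty` and `route` are all hypothetical, and the diversity proxy used here (one minus the share of the most common answer) is just one plausible choice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tier:
    name: str              # hypothetical model label, not a real endpoint
    max_difficulty: float  # a group is routed here if its difficulty <= this bound


def diversity_difficulty(answers: list[str]) -> float:
    """Diversity proxy: 1 minus the share of the most common answer in the group."""
    if not answers:
        raise ValueError("candidate group must not be empty")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


# Ordered from cheapest to most expensive; the thresholds are illustrative only.
TIERS = [
    Tier("aggregator", 0.0),       # full consensus: only aggregation is needed
    Tier("cheap-model", 0.5),      # mostly agreeing candidates
    Tier("expensive-model", 1.0),  # highly divergent candidates
]


def route(answers: list[str], tiers: list[Tier] = TIERS) -> str:
    """Return the name of the cheapest tier whose bound covers the group's difficulty."""
    difficulty = diversity_difficulty(answers)
    for tier in tiers:
        if difficulty <= tier.max_difficulty:
            return tier.name
    return tiers[-1].name  # fall back to the most expensive tier


if __name__ == "__main__":
    print(route(["42", "42", "42", "42"]))  # unanimous group
    print(route(["42", "42", "42", "7"]))   # one dissenting candidate
    print(route(["1", "2", "3", "4"]))      # no agreement at all
```

A confidence-based proxy would fit the same structure: only `diversity_difficulty` would change, while the tier table and `route` stay the same.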

Quick Start & Requirements

Installation: Clone the repository with submodules (git clone --recurse-submodules), then install with either uv sync --dev or pip install -e ".[dev]". Optional cloud storage support (AWS, GCS) requires the squeeze-evolve[aws] or squeeze-evolve[gcs] extra. The forked vllm with its custom confidence engine is installed separately via VLLM_USE_PRECOMPILED=1 uv pip install --editable external/vllm. CLI tools (squeeze-evolve-client, squeeze-evolve-server) and benchmarks (AIME 2025, HMMT 2025, GPQA-Diamond) are included. The vllm dependency implies GPU/CUDA requirements.

Highlighted Details

Verifier-free evolutionary framework for LLM inference scaling.
Multi-model orchestration with adaptive, fitness-based routing.
Custom vllm fork with GPU-accelerated confidence engine (4-10x lower scoring latency).
Pluggable storage: local, S3, GCS.
Extensible operator registry for custom fitness, selection, recombination, etc.
Pre-configured benchmarks for academic math and QA datasets.
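To illustrate what an extensible operator registry of this kind might look like, here is a minimal sketch. Every name in it (`register`, `get_operator`, the operator kinds, and the two example operators) is hypothetical and chosen for illustration; the project's actual registry API may differ.

```python
from typing import Callable, Dict

# Maps operator kind (e.g. "fitness", "selection") to a name -> function table.
_REGISTRY: Dict[str, Dict[str, Callable]] = {}


def register(kind: str, name: str):
    """Decorator that stores a function in the registry under (kind, name)."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY.setdefault(kind, {})[name] = fn
        return fn
    return decorator


def get_operator(kind: str, name: str) -> Callable:
    """Look up a registered operator, with a readable error if it is missing."""
    try:
        return _REGISTRY[kind][name]
    except KeyError:
        raise KeyError(f"no {kind!r} operator named {name!r}") from None


@register("fitness", "mean_confidence")
def mean_confidence(confidences: list[float]) -> float:
    """Example fitness operator: average of per-candidate confidence scores."""
    return sum(confidences) / len(confidences)


@register("selection", "top_k")
def top_k(scored: list[tuple[str, float]], k: int = 2) -> list[str]:
    """Example selection operator: keep the k highest-scoring candidates."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:k]]


if __name__ == "__main__":
    select = get_operator("selection", "top_k")
    print(select([("a", 0.1), ("b", 0.9), ("c", 0.5)], k=2))
```

With this pattern, a configuration file can refer to operators by name, and a custom fitness or recombination function is added by decorating it rather than by editing the evolutionary loop.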

Maintenance & Community

The project describes itself as "actively evolving research code" with ongoing productionization efforts. Contributions and feedback are welcomed via issues. No specific community channels or detailed contributor information are provided.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, which permits commercial use and closed-source integration.

Limitations & Caveats

Because the project is "actively evolving research code," users should expect potential instability or incomplete features. The forked vllm submodule may complicate the build. The project does not list unsupported platforms or known bugs.
