ATLAS by itigges22

Boosts frozen LLM performance for efficient, self-hosted AI

Created 5 months ago

2,061 stars

Top 20.9% on SourcePulse

3 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Daniel Han

Cofounder of Unsloth

Dan Guido

Cofounder of Trail of Bits

Project Summary

Adaptive Test-time Learning and Autonomous Specialization (ATLAS) provides a self-hosted framework for running large language models locally, achieving competitive performance against frontier API models without fine-tuning or cloud reliance. It targets power users and researchers seeking cost-effective, private AI solutions on single consumer GPUs. The system wraps a frozen, quantized model within an intelligent infrastructure, enabling autonomous specialization and iterative refinement for complex tasks.

How It Works

ATLAS employs a multi-phase pipeline: Phase 1 generates candidate solutions using PlanSearch and BudgetForcing. Phase 2 scores and tests these candidates via a Geometric Lens (using self-embeddings for scoring) and sandbox execution. Tasks failing verification proceed to Phase 3, where the model generates its own test cases and iteratively refines solutions using PR-CoT (self-verified repair). This approach leverages a frozen, quantized model (e.g., Qwen3-14B-Q4_K_M) and avoids external API calls, data exfiltration, or usage metering, running entirely on local hardware.
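The three phases above can be read as a generate, verify, repair loop. The following is a minimal sketch of that control flow only, not the project's actual code: `run_pipeline`, `Candidate`, and the four callables (`generate`, `score`, `passes_sandbox`, `repair`) are hypothetical names standing in for PlanSearch/BudgetForcing generation, Geometric Lens scoring, sandbox execution, and PR-CoT repair respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    code: str
    score: float = 0.0


def run_pipeline(
    task: str,
    generate: Callable[[str, int], List[str]],    # hypothetical: Phase 1 generator
    score: Callable[[str, str], float],           # hypothetical: Phase 2 scorer
    passes_sandbox: Callable[[str, str], bool],   # hypothetical: Phase 2 sandbox run
    repair: Callable[[str, str], str],            # hypothetical: Phase 3 repair step
    k: int = 3,
    max_repairs: int = 2,
) -> Optional[str]:
    """Return the first solution that passes verification, or None."""
    # Phase 1: produce k candidate solutions.
    candidates = [Candidate(code=c) for c in generate(task, k)]
    if not candidates:
        return None

    # Phase 2: score every candidate, then test them best-first.
    for cand in candidates:
        cand.score = score(task, cand.code)
    candidates.sort(key=lambda c: c.score, reverse=True)
    for cand in candidates:
        if passes_sandbox(task, cand.code):
            return cand.code

    # Phase 3: nothing passed, so iteratively repair the top-scored candidate.
    current = candidates[0].code
    for _ in range(max_repairs):
        current = repair(task, current)
        if passes_sandbox(task, current):
            return current
    return None
```

Passing the phases in as callables keeps the sketch independent of any model runtime; in the real system each would presumably call the frozen local model rather than a stub.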

Quick Start & Requirements

Install and run: clone the repo; copy atlas.conf.example to atlas.conf and set MODEL_PATH, DATA_DIR, and GPU; run sudo ./scripts/install.sh; verify with ./scripts/verify-install.sh; then run the benchmarks with python3 benchmark/v3_runner.py.
Prerequisites: Minimum 16 GB GPU VRAM, 14 GB System RAM, Python 3.10+. Tested on RTX 5060 Ti 16GB, RHEL 9/Ubuntu 24, CUDA 12.8.
Links: Full installation: docs/SETUP.md.
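Before running the install script, it can help to confirm that the three settings named above are filled in and that the interpreter meets the Python 3.10+ requirement. The sketch below is a hypothetical pre-flight check, not part of the repository. It assumes atlas.conf uses simple KEY=VALUE lines with `#` comments, which the summary does not confirm; consult docs/SETUP.md for the real format.

```python
import sys
from typing import Dict, List

# Settings the quick start says must be set in atlas.conf.
REQUIRED_KEYS = ("MODEL_PATH", "DATA_DIR", "GPU")


def parse_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (assumed format), skipping blanks and comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(text: str) -> List[str]:
    """Return the required keys that are absent or empty."""
    settings = parse_conf(text)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


def python_ok(version=sys.version_info) -> bool:
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version[:2]) >= (3, 10)
```

A typical use would be to read the file with `open("atlas.conf").read()`, pass the text to `missing_settings`, and stop if the returned list is non-empty.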

Highlighted Details

Achieves 74.6% LiveCodeBench pass@1-v(k=3) on a frozen 14B model using a single RTX 5060 Ti 16GB GPU.
Estimated cost is ~$0.004 per task, mostly local electricity, which is significantly cheaper than API alternatives.
Fully self-hosted: no data leaves the machine, no API keys required.
The pipeline includes structured generation, energy-based verification, and self-verified iterative repair.
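Since the per-task cost is described as mostly local electricity, it reduces to energy used times electricity price. The sketch below shows that arithmetic. The function name and the example inputs (150 W average draw, 600 s per task, $0.15 per kWh) are illustrative assumptions, not measurements from the project; they only show one combination that lands in the same range as the quoted figure.

```python
def electricity_cost_per_task(
    avg_power_watts: float,
    seconds_per_task: float,
    price_per_kwh: float,
) -> float:
    """Electricity cost of one task in the same currency as price_per_kwh."""
    # 1 kWh = 1000 W * 3600 s = 3,600,000 watt-seconds.
    energy_kwh = avg_power_watts * seconds_per_task / 3_600_000
    return energy_kwh * price_per_kwh


# Illustrative inputs only: 150 W for 600 s is 0.025 kWh,
# which at $0.15/kWh comes to about $0.00375.
example_cost = electricity_cost_per_task(150, 600, 0.15)
```

Substituting the measured GPU power draw, wall-clock time per task, and local tariff gives an estimate for any particular setup.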

Maintenance & Community

No specific community links (Discord/Slack) or details on notable contributors/sponsorships are provided in the README.

Licensing & Compatibility

Licensed under the A.T.L.A.S Source Available License v1.0. This license may have restrictions on commercial use or redistribution; consult the LICENSE file for specifics.

Limitations & Caveats

The current V3.0 release is primarily optimized for LiveCodeBench; other benchmarks (GPQA, SciCode) require further tuning before the approach generalizes across domains. The Geometric Lens candidate discrimination is limited by an undertrained scoring model, and the G(x) metric tensor is currently dormant or undergoing redesign. The task pipeline is single-threaded, and a stdio handling bug exists in the SandboxAdapter. V3.1 is planned to address these limitations through model swaps, pipeline redesigns, and parallel task execution.

Health Check

Last Commit

2 days ago

Responsiveness

Inactive

Star History

32 stars in the last 30 days