ATLAS by itigges22

Boosts frozen LLM performance for efficient, self-hosted AI

Created 1 month ago
933 stars

Top 39.0% on SourcePulse

Project Summary

Adaptive Test-time Learning and Autonomous Specialization (ATLAS) is a self-hosted framework for running large language models locally. It reports performance competitive with frontier API models without fine-tuning or cloud reliance. It targets power users and researchers who want cost-effective, private AI on a single consumer GPU. The system wraps a frozen, quantized model in supporting infrastructure that generates, verifies, and repairs candidate solutions, enabling autonomous specialization and iterative refinement on complex tasks.

How It Works

ATLAS employs a multi-phase pipeline: Phase 1 generates candidate solutions using PlanSearch and BudgetForcing. Phase 2 scores and tests these candidates via a Geometric Lens (using self-embeddings for scoring) and sandbox execution. Tasks failing verification proceed to Phase 3, where the model generates its own test cases and iteratively refines solutions using PR-CoT (self-verified repair). This approach leverages a frozen, quantized model (e.g., Qwen3-14B-Q4_K_M) and avoids external API calls, data exfiltration, or usage metering, running entirely on local hardware.
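
The control flow of the three phases can be sketched in Python as follows. This is a minimal illustration, not the project's code: the function name run_pipeline, the Candidate class, and the four callables (generate, score, passes_sandbox, repair) are hypothetical stand-ins for PlanSearch/BudgetForcing, the Geometric Lens, sandbox execution, and PR-CoT repair. The ordering of candidates by score and the repair budget are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    code: str
    score: float = 0.0


def run_pipeline(
    task: str,
    generate: Callable[[str, int], List[str]],   # hypothetical Phase 1 generator
    score: Callable[[str, str], float],          # hypothetical Phase 2 scorer
    passes_sandbox: Callable[[str, str], bool],  # hypothetical Phase 2 executor
    repair: Callable[[str, str], str],           # hypothetical Phase 3 repair step
    k: int = 3,
    max_repairs: int = 2,
) -> Optional[str]:
    """Return the first solution that passes verification, or None."""
    # Phase 1: generate k candidate solutions.
    candidates = [Candidate(code) for code in generate(task, k)]
    if not candidates:
        return None

    # Phase 2: score every candidate, then test them best-first.
    for cand in candidates:
        cand.score = score(task, cand.code)
    candidates.sort(key=lambda c: c.score, reverse=True)
    for cand in candidates:
        if passes_sandbox(task, cand.code):
            return cand.code

    # Phase 3: no candidate verified, so iteratively repair the best one.
    current = candidates[0].code
    for _ in range(max_repairs):
        current = repair(task, current)
        if passes_sandbox(task, current):
            return current
    return None
```

With toy callables, for example a generator that always returns "bad" and a repair step that returns "good", the function falls through Phases 1 and 2 and returns "good" from Phase 3. Passing the phases in as callables keeps the sketch independent of any particular model or sandbox.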

Quick Start & Requirements

  • Primary install/run command: Clone the repo, copy atlas.conf.example to atlas.conf (setting MODEL_PATH, DATA_DIR, GPU), run sudo ./scripts/install.sh, verify with ./scripts/verify-install.sh, and execute benchmarks with python3 benchmark/v3_runner.py.
  • Prerequisites: Minimum 16 GB GPU VRAM, 14 GB System RAM, Python 3.10+. Tested on RTX 5060 Ti 16GB, RHEL 9/Ubuntu 24, CUDA 12.8.
  • Links: Full installation guide in docs/SETUP.md.
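
Before running the install script, it can help to confirm that the three settings named above are filled in and that the interpreter meets the Python 3.10+ requirement. The sketch below assumes atlas.conf uses simple KEY=VALUE lines with # comments; the actual file format is not described here, so treat the parser as hypothetical. The helper names (parse_conf, missing_settings, python_ok) are illustrative and not part of ATLAS.

```python
import sys

# Settings the quick-start steps say must be set in atlas.conf.
REQUIRED_KEYS = ("MODEL_PATH", "DATA_DIR", "GPU")


def parse_conf(text: str) -> dict:
    """Parse KEY=VALUE lines (assumed format), skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_settings(text: str) -> list:
    """Return the required keys that are absent or empty."""
    values = parse_conf(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def python_ok(version=sys.version_info) -> bool:
    """Check the Python 3.10+ prerequisite."""
    return tuple(version[:2]) >= (3, 10)
```

A typical use would be to read atlas.conf with pathlib.Path("atlas.conf").read_text(), pass the text to missing_settings, and stop if the returned list is non-empty or python_ok() is false.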

Highlighted Details

  • Achieves 74.6% LiveCodeBench pass@1-v(k=3) on a frozen 14B model using a single RTX 5060 Ti 16GB GPU.
  • Estimated cost of ~$0.004 per task, primarily local electricity, significantly cheaper than API alternatives.
  • Fully self-hosted: no data leaves the machine, no API keys required.
  • The pipeline includes structured generation, energy-based verification, and self-verified iterative repair.
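
Since the per-task cost is described as mostly local electricity, it can be approximated as energy used times the electricity price. The sketch below shows only that arithmetic. The function name and all three input values are assumptions: the power draw, task duration, and price are round placeholder numbers, not measurements from the project.

```python
def electricity_cost_per_task(
    avg_power_watts: float,
    seconds_per_task: float,
    price_per_kwh: float,
) -> float:
    """Cost of one task = energy in kWh times price per kWh."""
    energy_kwh = (avg_power_watts / 1000.0) * (seconds_per_task / 3600.0)
    return energy_kwh * price_per_kwh


# Placeholder inputs chosen for round arithmetic (not measured values):
# 200 W average draw, 360 s per task, $0.20 per kWh.
example = electricity_cost_per_task(200.0, 360.0, 0.20)
print(f"${example:.4f} per task")
```

Substituting a measured power draw, the observed time per task, and the local electricity rate gives an estimate for a specific setup.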

Maintenance & Community

No specific community links (Discord/Slack) or details on notable contributors/sponsorships are provided in the README.

Licensing & Compatibility

Licensed under the A.T.L.A.S Source Available License v1.0. This license may have restrictions on commercial use or redistribution; consult the LICENSE file for specifics.

Limitations & Caveats

The current V3.0 release is primarily optimized for LiveCodeBench, with other benchmarks (GPQA, SciCode) requiring further tuning for cross-domain generalization. The Geometric Lens candidate discrimination is limited by an undertrained scoring model, and the G(x) metric tensor is currently dormant or undergoing redesign. The task pipeline is single-threaded, and a stdio handling bug exists in the SandboxAdapter. V3.1 is planned to address these limitations, including model swaps, pipeline redesigns, and parallel task execution.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 7

Star History

1,017 stars in the last 30 days
