Avarok-Cybersecurity
High-performance LLM inference engine in pure Rust
Atlas is a pure Rust LLM inference engine designed to provide high-performance, stable, and cost-effective local inference, addressing the dependency hell and ecosystem instability common in Python-based engines. It targets engineers and researchers seeking to run powerful LLMs locally without premium cloud API costs, offering significant speedups through hardware-specific optimizations and advanced techniques.
How It Works
Atlas employs a monorepo architecture in Rust for enhanced stability and community contribution. Its core innovation lies in hardware- and model-specific kernels, meticulously tuned for each combination to maximize performance, reportedly achieving 2-3x speedups. The system features a plug-and-play design with well-defined abstraction boundaries (traits) for models, layers, GPU backends, communication, and storage, enabling modularity and extensibility. An HTTP server interfaces with a scheduler that orchestrates batched decoding, speculative execution, and sampling, dispatching computations to hardware-specific CUDA kernels.
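The trait boundaries and scheduler loop described above can be illustrated with a small, self-contained sketch. It is not taken from the Atlas codebase: the names `Backend`, `Sampler`, `ToyBackend`, `Greedy`, `Request`, and `Scheduler` are hypothetical, the "backend" is a CPU stand-in that produces fake logits rather than a CUDA kernel, and speculative execution is left out. It only shows how a scheduler might batch requests and dispatch each decode step through trait objects it knows nothing else about.

```rust
use std::collections::VecDeque;

/// Hypothetical backend boundary (not the Atlas API): one decode step
/// returns next-token logits for every sequence in the batch.
trait Backend {
    fn forward(&self, batch: &[Vec<u32>]) -> Vec<Vec<f32>>;
}

/// Hypothetical sampling boundary: turns one logits row into a token id.
trait Sampler {
    fn sample(&self, logits: &[f32]) -> u32;
}

/// Toy CPU stand-in so the sketch runs without a GPU.
struct ToyBackend {
    vocab: usize,
}

impl Backend for ToyBackend {
    fn forward(&self, batch: &[Vec<u32>]) -> Vec<Vec<f32>> {
        batch
            .iter()
            .map(|seq| {
                // Deterministic fake logits: favour (last token + 1) mod vocab.
                let next = (seq.last().copied().unwrap_or(0) as usize + 1) % self.vocab;
                (0..self.vocab)
                    .map(|t| if t == next { 1.0 } else { 0.0 })
                    .collect()
            })
            .collect()
    }
}

/// Greedy sampling: pick the index of the largest logit.
struct Greedy;

impl Sampler for Greedy {
    fn sample(&self, logits: &[f32]) -> u32 {
        let mut best = 0;
        for (i, &v) in logits.iter().enumerate() {
            if v > logits[best] {
                best = i;
            }
        }
        best as u32
    }
}

/// One in-flight generation request.
struct Request {
    id: u64,
    tokens: Vec<u32>,
    remaining: usize,
}

/// Scheduler that is generic over the backend and sampler traits.
struct Scheduler<B: Backend, S: Sampler> {
    backend: B,
    sampler: S,
    max_batch: usize,
    queue: VecDeque<Request>,
}

impl<B: Backend, S: Sampler> Scheduler<B, S> {
    fn new(backend: B, sampler: S, max_batch: usize) -> Self {
        Scheduler { backend, sampler, max_batch, queue: VecDeque::new() }
    }

    /// Queue a prompt; `max_new_tokens` must be at least 1.
    fn submit(&mut self, id: u64, prompt: Vec<u32>, max_new_tokens: usize) {
        assert!(max_new_tokens > 0);
        self.queue.push_back(Request { id, tokens: prompt, remaining: max_new_tokens });
    }

    /// Run one batched decode step and return the requests that finished.
    fn step(&mut self) -> Vec<Request> {
        let n = self.max_batch.min(self.queue.len());
        let active: Vec<Request> = self.queue.drain(..n).collect();
        let inputs: Vec<Vec<u32>> = active.iter().map(|r| r.tokens.clone()).collect();
        let logits = self.backend.forward(&inputs);
        let mut done = Vec::new();
        for (mut req, row) in active.into_iter().zip(logits) {
            req.tokens.push(self.sampler.sample(&row));
            req.remaining -= 1;
            if req.remaining == 0 {
                done.push(req);
            } else {
                // Unfinished requests rejoin the queue for a later batch.
                self.queue.push_back(req);
            }
        }
        done
    }

    /// Step until every queued request has finished.
    fn run(&mut self) -> Vec<Request> {
        let mut finished = Vec::new();
        while !self.queue.is_empty() {
            finished.extend(self.step());
        }
        finished
    }
}
```

Because `Scheduler` depends only on the two traits, swapping `ToyBackend` for a different implementation would not require changes to the scheduling code, which is the kind of modularity the plug-and-play design aims for.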
Quick Start & Requirements
The project provides a Docker image (avarok/atlas-gb10:latest) pre-compiled for NVIDIA GB10 hardware (SM121).
docker pull avarok/atlas-gb10:latest
sudo docker run -d --name atlas \
--network host --gpus all --ipc=host \
-v ~/.cache/huggingface:/root/.cache/huggingface \
avarok/atlas-gb10:latest \
serve Qwen/Qwen3.6-35B-A3B-FP8 \
--port 8888 \
--max-seq-len 65536 \
--kv-cache-dtype fp8 \
--gpu-memory-utilization 0.90 \
--speculative
QUICKSTART.md and CONTRIBUTING.md.
Highlighted Details
--high-speed-swap) for long contexts.
Maintenance & Community
Atlas emphasizes a "Community-First" philosophy, encouraging contributions via its Discord server. The monorepo design aims to facilitate meaningful PRs, including AI-generated ones. The project actively integrates research from papers and welcomes community efforts to expand hardware and model support.
Licensing & Compatibility
The project uses a dual-license model:
Limitations & Caveats
The primary limitation is that the provided Docker image is pre-compiled and optimized specifically for NVIDIA GB10 hardware. While the architecture is designed for extensibility, adding support for other hardware (e.g., AMD, Apple Silicon, Intel) or new models requires significant community or commercial effort. Some advanced KV cache quantization methods (e.g., turbo3) are noted as experimental.