radixark: Enterprise RL for large-scale MoE models
Miles is a reinforcement learning framework for large-scale MoE post-training and production workloads. It addresses the need for stable, controllable RL on new hardware and in production environments. Forked from and co-evolving with slime, it targets enterprise users and researchers who need repeatable, auditable experimentation with large models in high-stakes settings.
How It Works
Miles inherits slime's modular, decoupled architecture, which separates training (Megatron), rollout/sample generation (SGLang + router), and data management (Data Buffer). Because the components are decoupled, the training and rollout engines can be scaled and customized independently, and algorithms can be swapped without touching core code. For performance and for numerical alignment between training and inference, it uses FlashAttention-3, DeepGEMM, batch-invariant kernels, and torch.compile.
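The decoupling described above can be illustrated with a minimal sketch. This is not Miles' or slime's actual API: every name here (`Sample`, `DataBuffer`, `RolloutEngine`, `Trainer`, `EchoRollout`, `MeanRewardTrainer`, `run`) is hypothetical, and the toy engines stand in for SGLang and Megatron only to show how a buffer can sit between two independently replaceable components.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Protocol


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float


class RolloutEngine(Protocol):
    """Hypothetical interface for the sample-generation side."""
    def generate(self, prompts: List[str]) -> List[Sample]: ...


class Trainer(Protocol):
    """Hypothetical interface for the training side."""
    def train_step(self, batch: List[Sample]) -> float: ...


class DataBuffer:
    """Hypothetical FIFO buffer sitting between rollout and training."""

    def __init__(self, prompts: Iterable[str]) -> None:
        self._prompts: Deque[str] = deque(prompts)
        self._samples: Deque[Sample] = deque()

    def next_prompts(self, n: int) -> List[str]:
        n = min(n, len(self._prompts))
        return [self._prompts.popleft() for _ in range(n)]

    def add(self, samples: List[Sample]) -> None:
        self._samples.extend(samples)

    def next_batch(self, n: int) -> List[Sample]:
        n = min(n, len(self._samples))
        return [self._samples.popleft() for _ in range(n)]


class EchoRollout:
    """Toy rollout engine: upper-cases the prompt, rewards by length."""
    def generate(self, prompts: List[str]) -> List[Sample]:
        return [Sample(p, p.upper(), float(len(p))) for p in prompts]


class MeanRewardTrainer:
    """Toy trainer: returns the mean reward instead of updating weights."""
    def train_step(self, batch: List[Sample]) -> float:
        return sum(s.reward for s in batch) / len(batch)


def run(buffer: DataBuffer, rollout: RolloutEngine, trainer: Trainer,
        steps: int, batch_size: int) -> List[float]:
    """Alternate rollout and training; the two sides only share the buffer."""
    metrics: List[float] = []
    for _ in range(steps):
        prompts = buffer.next_prompts(batch_size)
        if not prompts:  # no prompts left to roll out
            break
        buffer.add(rollout.generate(prompts))
        metrics.append(trainer.train_step(buffer.next_batch(batch_size)))
    return metrics


if __name__ == "__main__":
    buf = DataBuffer(["ab", "abcd", "x"])
    print(run(buf, EchoRollout(), MeanRewardTrainer(), steps=5, batch_size=2))
```

Because `run` depends only on the two small interfaces, either engine could be replaced (for example, a different rollout policy or a different loss) without changing the loop or the buffer, which is the property the architecture description attributes to the decoupled design.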
Quick Start & Requirements
Miles is under active development, so commands and examples may change. The README points to a "Quick Start Guide" and to the provided "examples" for environment setup, data preparation, and training startup. It highlights support for specific hardware such as "GB300". It also mentions a pre-commit hook, which suggests a Python development environment. Links to official documentation are pending.
Maintenance & Community
Contributions are welcome, particularly for new hardware backends, MoE RL recipes, stability improvements, and multimodal/speculative training use cases. The README links to the slime GitHub repository but does not list community channels or maintainers.
Licensing & Compatibility
The provided README does not specify a software license. Compatibility for commercial use or closed-source linking cannot be determined without this information.
Limitations & Caveats
The project is explicitly "under active development," so commands and examples may change. FAQs and developer guides are marked "coming soon."