radixark
Enterprise RL for large-scale MoE models
Top 38.0% on SourcePulse
Miles is a reinforcement learning framework for large-scale MoE post-training and production workloads. It addresses the need for stable, controllable RL on new hardware and in production environments. Forked from and co-evolving with slime, it targets enterprise users and researchers who need repeatable, auditable experimentation with large models in high-stakes settings.
How It Works
Miles inherits slime's modular, decoupled architecture, separating training (Megatron), rollout/sample generation (SGLang + router), and data management (Data Buffer). This design allows independent scaling and customization of training and rollout engines, facilitating algorithm swaps without touching core code. It leverages advanced techniques like FlashAttention-3, DeepGEMM, batch-invariant kernels, and torch.compile for performance and numerical alignment between training and inference.
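The decoupling described above can be illustrated with a toy sketch in Python. This is not Miles or slime code: `Sample`, `DataBuffer`, `run_loop`, and the `generate`, `score`, and `train_step` callables are all hypothetical names chosen for illustration. The only point is that the rollout side and the training side communicate through a buffer, so either can be replaced without changing the other.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Sample:
    """One rollout result (hypothetical shape, for illustration only)."""
    prompt: str
    response: str
    reward: float


class DataBuffer:
    """Hypothetical stand-in for a data buffer sitting between rollout and training."""

    def __init__(self) -> None:
        self._queue: Deque[Sample] = deque()

    def put(self, samples: List[Sample]) -> None:
        self._queue.extend(samples)

    def get_batch(self, size: int) -> List[Sample]:
        # Return up to `size` samples; an empty list means the buffer is drained.
        n = min(size, len(self._queue))
        return [self._queue.popleft() for _ in range(n)]


def run_loop(
    prompts: List[str],
    generate: Callable[[str], str],               # rollout engine (assumed interface)
    score: Callable[[str, str], float],           # reward function (assumed interface)
    train_step: Callable[[List[Sample]], None],   # training engine (assumed interface)
    buffer: DataBuffer,
    batch_size: int,
) -> int:
    # Rollout stage: generate responses, score them, and write them to the buffer.
    samples = []
    for prompt in prompts:
        response = generate(prompt)
        samples.append(Sample(prompt, response, score(prompt, response)))
    buffer.put(samples)

    # Training stage: read batches from the buffer only, never from the rollout engine.
    steps = 0
    while True:
        batch = buffer.get_batch(batch_size)
        if not batch:
            break
        train_step(batch)
        steps += 1
    return steps


# Toy usage: swapping `generate` or `train_step` requires no change to run_loop.
seen = []
steps = run_loop(
    prompts=["a", "bb", "ccc"],
    generate=lambda p: p.upper(),
    score=lambda p, r: float(len(r)),
    train_step=lambda batch: seen.append([s.reward for s in batch]),
    buffer=DataBuffer(),
    batch_size=2,
)
print(steps, seen)
```

In a real system the two stages would run on separate engines and likely asynchronously; the sketch runs them sequentially only to keep the interface boundaries visible.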
Quick Start & Requirements
Miles is under active development, and its commands and examples are subject to change. The README directs users to a "Quick Start Guide" and to the provided "examples" for environment setup, data preparation, and training startup. It highlights support for specific hardware such as "GB300". It also mentions a pre-commit hook, which suggests a Python development environment. Official documentation links are pending.
Highlighted Details
Maintenance & Community
Contributions are welcome, particularly for new hardware backends, MoE RL recipes, stability improvements, and multimodal/speculative training use cases. Links to the slime GitHub repository are provided. The README does not list community channels or maintainers.
Licensing & Compatibility
The provided README does not specify a software license. Compatibility for commercial use or closed-source linking cannot be determined without this information.
Limitations & Caveats
The README explicitly describes the project as "under active development," so commands and examples may change. FAQ and developer-guide documentation is marked "coming soon."