ArcticTraining by snowflakedb

LLM post-training acceleration framework

Created 10 months ago
254 stars

Top 99.0% on SourcePulse

Summary

ArcticTraining is an open-source framework for simplifying and accelerating LLM post-training. It targets two common gaps in existing tooling: weak support for rapid prototyping and the absence of native synthetic data generation. Through modular trainer designs, streamlined code, and integrated pipelines for creating and cleaning synthetic data, it helps users improve LLM capabilities in areas such as code generation and complex reasoning.

How It Works

The framework emphasizes modularity and customization: trainers are built as modular components with deliberately compact code paths, enabling rapid iteration. A key differentiator is the integrated pipeline for native synthetic data generation and cleaning. Users extend ArcticTraining by subclassing Trainer or SFTTrainer to swap in custom loss functions or training methodologies, as sketched below. This flexible design targets efficiency gains in tasks like code generation and complex reasoning.
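
As a concrete illustration, here is a minimal sketch of a custom trainer. It assumes the documented pattern of subclassing SFTTrainer, registering a name, and overriding a loss(self, batch) hook; exact attribute names and signatures may differ between versions, so treat this as illustrative rather than authoritative:

    import torch.nn.functional as F
    from arctic_training import SFTTrainer

    class LabelSmoothedSFTTrainer(SFTTrainer):
        # The registered name is what a YAML recipe would reference
        # (assumption, following the documented trainer-registration pattern).
        name = "label-smoothed-sft"

        def loss(self, batch):
            # Standard causal-LM shift: predict token t+1 from tokens <= t.
            outputs = self.model(**batch, use_cache=False)
            logits = outputs.logits[..., :-1, :].contiguous()
            labels = batch["labels"][..., 1:].contiguous()
            # Replace the default cross-entropy with a label-smoothed variant.
            return F.cross_entropy(
                logits.view(-1, logits.size(-1)),
                labels.view(-1),
                ignore_index=-100,
                label_smoothing=0.1,
            )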

Quick Start & Requirements

Installation: pip install arctic-training. Training is driven by a YAML recipe file passed to the arctic_training CLI, which leverages DeepSpeed for distributed training. Customization ranges from editing the YAML recipe to developing new trainers via subclassing; a minimal recipe is sketched below. Further details are in the project's blog and documentation.
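
A minimal SFT recipe, based on the example format in the project's README (field names may have evolved, so verify against the current docs):

    # sft-recipe.yaml
    type: sft
    micro_batch_size: 2
    model:
      name_or_path: meta-llama/Meta-Llama-3.1-8B-Instruct
    data:
      sources:
        - HuggingFaceH4/ultrachat_200k
    checkpoint:
      - type: huggingface
        save_end_of_training: true
        output_dir: ./fine-tuned-model

Training then launches with a single CLI call, with DeepSpeed handling the distributed setup underneath:

    arctic_training sft-recipe.yaml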

Highlighted Details

  • Arctic Long Sequence Training (ALST): Enables scalable training for LLMs handling multi-million token sequences.
  • Arctic Inference & Speculative Decoding: Integrates with vLLM for accelerated inference, featuring fast speculative decoding.
  • Arctic-Embed: Tools for simple, scalable training of embedding models.
  • SwiftKV: Accelerates enterprise LLM workloads via knowledge-preserving compute reduction.
  • DeepSpeed Integration: Seamlessly leverages DeepSpeed for robust distributed training.

Maintenance & Community

GPU CI for the project is funded by Modal. The README does not link to community channels (e.g., Discord or Slack) or mention a public roadmap.

Licensing & Compatibility

The provided README text does not state the project's open-source license, so compatibility with commercial use or closed-source linking cannot be assessed from it alone.

Limitations & Caveats

The README does not call out explicit limitations, unsupported platforms, or known bugs; the documentation focuses on features and extension points rather than constraints.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 13
  • Issues (30d): 2
  • Star History: 20 stars in the last 30 days
