flame by fla-org

Minimal, efficient framework for LLM training

Created 9 months ago
263 stars

Top 97.0% on SourcePulse

1 Expert Loves This Project
Project Summary

Summary

Flame is a minimal, efficient training framework built on torchtitan for scaling large language models (LLMs). It targets engineers and researchers seeking high performance and ease of use, offering features like zero-cost data preprocessing and advanced parallelism for faster LLM development.

How It Works

Flame leverages torchtitan to provide a streamlined training experience. Its core design emphasizes efficiency through zero-cost data preprocessing, including online tokenization and dataset shuffling, and supports multiple datasets. The framework is built for scalability, with features like 4D parallelism planned for future releases, aiming to accelerate LLM training pipelines.
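To make the zero-cost preprocessing idea concrete, here is a minimal sketch of online tokenization and shuffling using the Hugging Face datasets library in streaming mode. It illustrates the concept only and is not Flame's internal pipeline; the gpt2 tokenizer is a stand-in.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the corpus so no offline preprocessing pass is needed.
dataset = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

# Shuffle with a bounded buffer; order is randomized on the fly.
dataset = dataset.shuffle(seed=42, buffer_size=10_000)

# Tokenize lazily as the training loop iterates (gpt2 is only a stand-in tokenizer).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = dataset.map(lambda batch: tokenizer(batch["text"]), batched=True)

# Each yielded example now carries input_ids produced online.
first = next(iter(dataset))
print(len(first["input_ids"]))
```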

Quick Start & Requirements

Installation involves cloning the repository and running pip install . from the repo root. Key dependencies include specific versions of flash-linear-attention and torchtitan (commit 0b44d4c). Dataset preparation uses the datasets library to load corpora such as HuggingFaceFW/fineweb-edu. Training is launched via bash train.sh and configured through numerous command-line arguments. For torch.compile usage, torch>=2.6 and triton>=3.0 are recommended. Multi-node training is supported; environment variables such as MASTER_ADDR and MASTER_PORT must be set manually or are provided by job schedulers.
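As a hedged illustration of the multi-node variables mentioned above, the snippet below shows how MASTER_ADDR and MASTER_PORT feed PyTorch's default env:// rendezvous. The hostname is hypothetical, and in practice torchrun or the job scheduler sets these (plus RANK and WORLD_SIZE) rather than the training code itself.

```python
import os
import torch.distributed as dist

# Placeholders only: torchrun or schedulers such as Slurm normally export these.
os.environ.setdefault("MASTER_ADDR", "node-0.example.com")  # hypothetical head node
os.environ.setdefault("MASTER_PORT", "29500")

# The default env:// init also reads RANK and WORLD_SIZE from the environment.
dist.init_process_group(backend="nccl")
print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
```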

Highlighted Details

  • Zero-Cost Data Preprocessing: Enables online tokenization, dataset shuffling, and support for multiple datasets without upfront processing costs.
  • Variable-Length Training: Utilizes --training.varlen to pack variable-length documents into fixed sequences, eliminating padding and improving efficiency (a conceptual sketch follows this list).
  • torch.compile Integration: Supports PyTorch 2.0+ compilation via --training.compile for potential speedups, though conflicts with fused kernels may arise.
  • Advanced Parallelism: Features include support for tensor parallelism, pipeline parallelism (requiring manual split point specification), and planned 4D parallelism.
  • Checkpointing & Conversion: Manages distributed checkpoints (DCPs) and provides scripts to convert between DCP and Hugging Face formats for seamless training resumption and model sharing.
  • Float8 Support: Integrates Float8 precision via torchao for potential memory and speed benefits.
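To give a sense of what the --training.varlen flag accomplishes, here is a conceptual sketch of greedy document packing. It is not Flame's implementation; the real pipeline additionally records document boundaries (e.g. cumulative sequence lengths) so attention never crosses documents.

```python
import torch

def pack_documents(token_lists, seq_len):
    """Greedily concatenate variable-length token lists into fixed-length
    sequences so no position is spent on padding (conceptual sketch only)."""
    buffer, packed = [], []
    for tokens in token_lists:
        buffer.extend(tokens)
        while len(buffer) >= seq_len:
            packed.append(torch.tensor(buffer[:seq_len]))
            buffer = buffer[seq_len:]
    return packed  # leftover tokens in `buffer` would carry over to the next batch

docs = [[1, 2, 3], [4, 5, 6, 7, 8], [9] * 12]
print([seq.tolist() for seq in pack_documents(docs, seq_len=8)])
```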

Maintenance & Community

The provided README does not detail specific community channels (e.g., Discord, Slack), active maintainers beyond the authors listed in the citation, or sponsorship information.

Licensing & Compatibility

The repository's license is not specified in the provided README content. This lack of information presents an adoption blocker, particularly for commercial use or integration into closed-source projects.

Limitations & Caveats

torch.compile may conflict with Flame's fused kernels, so up-to-date dependencies are required. Dataset streaming can be unstable due to network dependencies; local downloads are recommended for reliable training. 4D parallelism is listed as "coming soon," and pipeline parallelism requires manual definition of split points. The absence of explicit licensing information is a significant caveat.
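Where streaming is unreliable, one workaround is to snapshot the dataset locally before training. The call below uses huggingface_hub and is a generic example rather than a Flame utility; the allow_patterns filter is illustrative, since the full corpus is very large.

```python
from huggingface_hub import snapshot_download

# Fetch (part of) the dataset into the local Hugging Face cache once,
# then train from disk instead of streaming over the network.
local_path = snapshot_download(
    "HuggingFaceFW/fineweb-edu",
    repo_type="dataset",
    allow_patterns=["sample/10BT/*"],  # illustrative subset filter; adjust as needed
)
print(f"dataset cached at {local_path}")
```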

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 14 more.

torchtitan by pytorch

0.6%
5k
PyTorch platform for generative AI model training research
Created 1 year ago
Updated 23 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

0.0%
41k
AI system for large-scale parallel training
Created 4 years ago
Updated 1 day ago