OLMoE by allenai

Open MoE language model research paper

Created 1 year ago

948 stars

Top 38.6% on SourcePulse

View on GitHub

3 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Ji Yichao

Cofounder of Manus

Nathan Lambert

Research Scientist at AI2

Project Summary

OLMoE provides a fully open, state-of-the-art Mixture-of-Experts (MoE) language model with 1.3 billion active and 6.9 billion total parameters. It offers comprehensive resources including data, code, logs, and checkpoints for pretraining, supervised fine-tuning (SFT), and preference tuning (DPO/KTO), targeting researchers and developers working with large language models.

How It Works

OLMoE is built upon the OLMo framework, leveraging a Mixture-of-Experts architecture. This design allows for a significantly larger total parameter count while maintaining a smaller active parameter set during inference, leading to potentially more efficient computation and improved performance on complex tasks. The project emphasizes open access to all artifacts, enabling reproducibility and further research.

Quick Start & Requirements

Inference: Recommended via vLLM (pip install vllm) or llama.cpp (requires downloading GGUF checkpoints). Transformers integration is available but noted as slower.
Pretraining: Requires cloning the OLMo repository, installing dependencies (pip install -e ., pip install git+https://github.com/Muennighoff/megablocks.git@olmoe), setting up a configuration file, and tokenizing data using dolma tokens.
Adaptation (SFT/DPO/KTO): Requires cloning open-instruct and installing transformers and torch. Training commands utilize accelerate launch with DeepSpeed for distributed training.
Hardware: GPU acceleration is essential for efficient operation, particularly for training and inference.

Highlighted Details

State-of-the-art Mixture-of-Experts model with 1.3B active / 6.9B total parameters.
Full release of pretraining, SFT, and DPO/KTO checkpoints, data, and logs.
Integration with popular inference engines: vLLM, SGLang, llama.cpp, and Hugging Face Transformers.
Detailed instructions for pretraining, adaptation, and evaluation, including sparse upcycling and expert choice implementations.

Maintenance & Community

The project is associated with Allen Institute for AI (AI2). Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. However, given the association with Allen Institute for AI and the release of all artifacts, it is likely intended for research and non-commercial use, but commercial compatibility should be verified.

Limitations & Caveats

The transformers implementation for inference is noted as slow. Reproducing specific experimental configurations, such as sparse upcycling or expert choice, requires careful adherence to detailed instructions and potentially specific code branches or PRs.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

24 stars in the last 30 days