OLMoE by allenai

Open MoE language model research paper

created 1 year ago
823 stars

Top 44.0% on sourcepulse

Project Summary

OLMoE provides a fully open, state-of-the-art Mixture-of-Experts (MoE) language model with 1.3 billion active and 6.9 billion total parameters. It offers comprehensive resources including data, code, logs, and checkpoints for pretraining, supervised fine-tuning (SFT), and preference tuning (DPO/KTO), targeting researchers and developers working with large language models.

How It Works

OLMoE is built on the OLMo framework and uses a Mixture-of-Experts architecture: each token is routed to a small subset of experts, so the model carries a large total parameter count (6.9B) while only 1.3B parameters are active for any given token, keeping inference cost closer to that of a much smaller dense model. The project emphasizes open access to all artifacts (data, code, logs, and checkpoints), enabling reproducibility and further research.
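
To make the active-vs-total parameter distinction concrete, here is a small, self-contained sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and top-k value are illustrative placeholders, not OLMoE's actual configuration.

```python
# Illustrative top-k Mixture-of-Experts layer (hyperparameters are hypothetical,
# not OLMoE's real configuration). Each token runs through only top_k of the
# num_experts feed-forward blocks, so active parameters per token << total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, expert_ids = torch.topk(gate, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Find which tokens routed to expert e and with what gate weight.
            token_idx, slot = (expert_ids == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:         # expert received no tokens
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

x = torch.randn(16, 512)                       # 16 token embeddings
print(TopKMoE()(x).shape)                      # torch.Size([16, 512])
```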

Quick Start & Requirements

  • Inference: Recommended via vLLM (pip install vllm) or llama.cpp (requires downloading GGUF checkpoints); Hugging Face Transformers integration is available but noted as slower. A minimal vLLM sketch follows this list.
  • Pretraining: Requires cloning the OLMo repository, installing dependencies (pip install -e ., pip install git+https://github.com/Muennighoff/megablocks.git@olmoe), setting up a configuration file, and tokenizing data using dolma tokens.
  • Adaptation (SFT/DPO/KTO): Requires cloning open-instruct and installing transformers and torch. Training commands utilize accelerate launch with DeepSpeed for distributed training.
  • Hardware: GPU acceleration is essential for efficient operation, particularly for training and inference.
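
For the inference route, here is a minimal sketch using vLLM. It assumes the Hugging Face model ID allenai/OLMoE-1B-7B-0924 and arbitrary sampling settings; verify the exact checkpoint name against the README and model card.

```python
# Minimal vLLM inference sketch (model ID and sampling settings are assumptions;
# check the OLMoE README / Hugging Face model card for the checkpoint to use).
from vllm import LLM, SamplingParams

llm = LLM(model="allenai/OLMoE-1B-7B-0924")              # loads the model onto the GPU
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Mixture-of-Experts language models are"], params)
print(outputs[0].outputs[0].text)
```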

Highlighted Details

  • State-of-the-art Mixture-of-Experts model with 1.3B active / 6.9B total parameters.
  • Full release of pretraining, SFT, and DPO/KTO checkpoints, data, and logs.
  • Integration with popular inference engines: vLLM, SGLang, llama.cpp, and Hugging Face Transformers.
  • Detailed instructions for pretraining, adaptation, and evaluation, including sparse upcycling and expert choice implementations.

Maintenance & Community

The project is associated with the Allen Institute for AI (AI2). Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Check the repository's license file and the model cards of the released checkpoints before assuming commercial compatibility.

Limitations & Caveats

The transformers implementation for inference is noted as slow. Reproducing specific experimental configurations, such as sparse upcycling or expert choice, requires careful adherence to detailed instructions and potentially specific code branches or PRs.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 91 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

0.3%
1k
Transformer library for flexible model development
created 3 years ago
updated 7 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2%
25k
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago