OLMoE by allenai

Open MoE language model research paper

Created 1 year ago
865 stars

Top 41.5% on SourcePulse

View on GitHub
1 Expert Loves This Project: Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).
Project Summary

OLMoE provides a fully open, state-of-the-art Mixture-of-Experts (MoE) language model with 1.3 billion active and 6.9 billion total parameters. It offers comprehensive resources including data, code, logs, and checkpoints for pretraining, supervised fine-tuning (SFT), and preference tuning (DPO/KTO), targeting researchers and developers working with large language models.

How It Works

OLMoE is built on the OLMo framework and uses a Mixture-of-Experts architecture: a router sends each token to a small subset of experts, so only the 1.3B active parameters (of 6.9B total) are used per token. This keeps per-token compute low relative to the model's total capacity, which can yield more efficient computation and improved performance on complex tasks. The project emphasizes open access to all artifacts, enabling reproducibility and further research.
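
The sketch below illustrates the routing idea in PyTorch: a router scores each token against all experts and only the top-k experts run for that token, so per-token compute tracks the active parameter count rather than the total. This is a minimal illustration, not the OLMoE implementation; the hidden size, expert count, and k are placeholders.

    # Minimal top-k MoE routing sketch (illustrative only; not the OLMoE code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    hidden, num_experts, k = 64, 8, 2           # placeholder sizes
    router = nn.Linear(hidden, num_experts)     # scores each token against every expert
    experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_experts))

    def moe_layer(x):                           # x: (tokens, hidden)
        probs = F.softmax(router(x), dim=-1)    # (tokens, num_experts)
        weights, idx = probs.topk(k, dim=-1)    # keep only the k best experts per token
        out = torch.zeros_like(x)
        for t in range(x.size(0)):              # naive loop; real kernels batch by expert
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * experts[int(e)](x[t])  # only k of num_experts run for this token
        return out

    print(moe_layer(torch.randn(4, hidden)).shape)  # torch.Size([4, 64])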

Quick Start & Requirements

  • Inference: Recommended via vLLM (pip install vllm) or llama.cpp (requires downloading GGUF checkpoints); Transformers integration is available but noted as slower. A minimal vLLM sketch appears after this list.
  • Pretraining: Requires cloning the OLMo repository, installing dependencies (pip install -e ., then pip install git+https://github.com/Muennighoff/megablocks.git@olmoe), setting up a configuration file, and tokenizing data with the dolma tokens command.
  • Adaptation (SFT/DPO/KTO): Requires cloning open-instruct and installing transformers and torch. Training commands utilize accelerate launch with DeepSpeed for distributed training.
  • Hardware: GPU acceleration is essential for efficient operation, particularly for training and inference.
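
A minimal vLLM inference sketch, assuming the Hugging Face checkpoint name allenai/OLMoE-1B-7B-0924 (verify the exact model ID on the Hub before use):

    # Inference via vLLM; the model ID below is an assumption, check the Hub.
    from vllm import LLM, SamplingParams

    llm = LLM(model="allenai/OLMoE-1B-7B-0924")             # downloads the checkpoint
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["Mixture-of-Experts models are"], params)
    print(outputs[0].outputs[0].text)

The same checkpoint should also load through Hugging Face Transformers (AutoTokenizer / AutoModelForCausalLM), though the README notes that path is slower.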

Highlighted Details

  • State-of-the-art Mixture-of-Experts model with 1.3B active / 6.9B total parameters.
  • Full release of pretraining, SFT, and DPO/KTO checkpoints, data, and logs.
  • Integration with popular inference engines: vLLM, SGLang, llama.cpp, and Hugging Face Transformers.
  • Detailed instructions for pretraining, adaptation, and evaluation, including sparse upcycling and expert choice implementations.

Maintenance & Community

The project is associated with Allen Institute for AI (AI2). Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the Allen Institute for AI affiliation and the open release of all artifacts, the project is likely intended for research and non-commercial use; verify the actual license terms before any commercial use.

Limitations & Caveats

The transformers implementation for inference is noted as slow. Reproducing specific experimental configurations, such as sparse upcycling or expert choice, requires careful adherence to detailed instructions and potentially specific code branches or PRs.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

dots.llm1 by rednote-hilab

0.2%
462 stars
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

10.6%
2k stars
Speculative decoding research paper for faster LLM inference
Created 1 year ago
Updated 1 week ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks

0%
3k stars
Large language model for research/commercial use
Created 1 year ago
Updated 1 year ago