Open MoE language model research paper
OLMoE provides a fully open, state-of-the-art Mixture-of-Experts (MoE) language model with 1.3 billion active and 6.9 billion total parameters. It offers comprehensive resources including data, code, logs, and checkpoints for pretraining, supervised fine-tuning (SFT), and preference tuning (DPO/KTO), targeting researchers and developers working with large language models.
How It Works
OLMoE is built on the OLMo framework and uses a Mixture-of-Experts architecture: each token is routed to a small subset of expert feed-forward networks, so the model carries a much larger total parameter count while only a fraction of it is active during inference. This can yield more efficient computation and stronger performance than a dense model with the same number of active parameters. The project emphasizes open access to all artifacts, enabling reproducibility and further research.
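The sketch below illustrates the routing idea with a generic top-k MoE feed-forward layer in PyTorch. It is a minimal example, not OLMoE's actual implementation; the expert count, hidden sizes, and top-k value are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    """Generic top-k routed feed-forward layer (illustrative, not OLMoE's code)."""

    def __init__(self, d_model: int = 64, d_hidden: int = 256, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network; only top_k of them run per token.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                            # (num_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # both (num_tokens, top_k)
        weights = F.softmax(weights, dim=-1)               # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


layer = TopKMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Production MoE kernels, such as the megablocks package that the pretraining setup installs, batch tokens per expert instead of looping over experts, but the routing idea is the same.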
Quick Start & Requirements
Inference can be run with vLLM (`pip install vllm`) or llama.cpp (requires downloading GGUF checkpoints); a Transformers integration is also available but noted as slower.
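As a concrete starting point, a minimal vLLM inference sketch is shown below. The model ID allenai/OLMoE-1B-7B-0924 is an assumption based on the released checkpoints, and a vLLM version with OLMoE support is required.

```python
# Minimal vLLM inference sketch; the model ID is an assumption and may need adjusting.
from vllm import LLM, SamplingParams

llm = LLM(model="allenai/OLMoE-1B-7B-0924")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Bitcoin is"], params)
print(outputs[0].outputs[0].text)
```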
Pretraining uses the OLMo codebase: install it with `pip install -e .` and `pip install git+https://github.com/Muennighoff/megablocks.git@olmoe`, set up a configuration file, and tokenize the data using `dolma tokens`.
Adaptation (SFT and DPO) uses `open-instruct` and requires installing `transformers` and `torch`. Training commands utilize `accelerate launch` with DeepSpeed for distributed training.
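For the adapted checkpoints, a minimal Transformers chat sketch (the slower path noted above) might look like the following; the instruct model ID is an assumption, and a recent Transformers release with OLMoE support is needed.

```python
# Hedged sketch of prompting an adapted checkpoint via Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Explain Mixture-of-Experts in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```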
Highlighted Details
Maintenance & Community
The project is associated with the Allen Institute for AI (AI2). Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given the association with the Allen Institute for AI and the fully open release of artifacts, it is likely intended for research and non-commercial use, but commercial compatibility should be verified against the repository's license files.
Limitations & Caveats
The `transformers` implementation for inference is noted as slow. Reproducing specific experimental configurations, such as sparse upcycling or expert choice, requires careful adherence to detailed instructions and potentially specific code branches or PRs.