Open-source MoE LLM for research
OpenMoE provides a family of open-sourced Mixture-of-Experts (MoE) Large Language Models, aiming to foster community research in this promising area. The project offers fully shared training data, strategies, architecture, and weights, targeting researchers and developers interested in MoE LLMs.
How It Works
OpenMoE models use a decoder-only architecture, a departure from the encoder-decoder design of ST-MoE. Training initially uses a modified UL2 objective and transitions to standard next-token prediction in later stages. Key architectural components include RoPE positional embeddings, SwiGLU activations, and a 2K-token context length. The project emphasizes releasing intermediate checkpoints to facilitate the study of MoE training dynamics.
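To make the architecture description concrete, the sketch below shows a top-2 routed Mixture-of-Experts feed-forward layer built from SwiGLU experts. It is an illustrative PyTorch example of the general technique, not OpenMoE's actual implementation; all class and parameter names (SwiGLUExpert, Top2MoELayer, num_experts) are invented for this sketch.

```python
# Illustrative sketch of a top-2 routed MoE layer with SwiGLU experts.
# Not OpenMoE's code; names and sizes here are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(x W_gate) * (x W_up), projected back to d_model
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            SwiGLUExpert(d_model, d_ff) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); each token is sent to its top-2 experts
        gates = F.softmax(self.router(x), dim=-1)
        weights, indices = gates.topk(2, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = Top2MoELayer(d_model=512, d_ff=2048)
    tokens = torch.randn(16, 512)
    print(layer(tokens).shape)  # torch.Size([16, 512])
```

Only the selected experts run for each token, which is what lets MoE models grow parameter count without a proportional increase in per-token compute.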
Quick Start & Requirements
Requires a forked version of ColossalAI and the transformers library. Install the dependencies with pip install ./ColossalAI and pip install -r ./ColossalAI/examples/language/openmoe/requirements.txt. An inference example using the transformers library is provided.
Highlighted Details
Maintenance & Community
The project is driven by a student team, with active development noted in recent news. Links to Discord and Twitter are provided for community engagement.
Licensing & Compatibility
Code is licensed under Apache 2.0. Model usage is subject to the licenses of the RedPajama and The Stack datasets.
Limitations & Caveats
The README notes potential convergence issues with the current GPU training implementation (referencing GitHub issues #5163, #5212) and states that the OpenMoE-base
model is for debugging only and not suitable for practical applications.