SkyworkAI: Accelerating Mixture-of-Experts models with zero-computation techniques
MoE++ addresses the computational inefficiency of Mixture-of-Experts (MoE) models by introducing "zero-computation experts" and "gating residuals." These additions cut the work spent on simpler tokens, yielding performance gains and higher throughput than comparable vanilla MoE models. The project targets researchers and engineers who want to optimize large language models and provides a foundation for more efficient MoE architectures.
How It Works
MoE++ integrates three types of zero-computation experts: the zero expert (discard), copy expert (skip), and constant expert (replace). These experts require negligible computation, allowing for flexible allocation of computational resources. The system also employs gating residuals, which enable tokens to consider previous layer routing decisions when selecting experts. This mechanism facilitates reduced computation for simpler tokens and allows more complex tokens to utilize a greater number of experts, thereby enhancing overall performance and efficiency.
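The sketch below illustrates this mechanism in PyTorch. It is not the official implementation: the class names, the top-k token-choice router, the behavior of the constant expert (returning a single learned vector), and the learned projection applied to the previous layer's routing logits are all illustrative assumptions.

```python
# Minimal sketch of MoE++-style routing with zero-computation experts and
# gating residuals. Illustrative only; names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNExpert(nn.Module):
    """Ordinary feed-forward expert (the only expert type that costs FLOPs)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.net(x)


class MoEPlusPlusLayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_ffn_experts: int, top_k: int = 2):
        super().__init__()
        self.ffn_experts = nn.ModuleList(
            FFNExpert(d_model, d_hidden) for _ in range(n_ffn_experts))
        # Zero-computation experts: zero (discard), copy (skip), constant (replace).
        self.constant = nn.Parameter(torch.zeros(d_model))
        self.n_experts = n_ffn_experts + 3
        self.top_k = top_k
        self.router = nn.Linear(d_model, self.n_experts, bias=False)
        # Gating residual: mix in the previous layer's routing logits (assumed
        # here to pass through a small learned projection).
        self.residual_proj = nn.Linear(self.n_experts, self.n_experts, bias=False)

    def forward(self, x, prev_logits=None):
        # x: (n_tokens, d_model)
        logits = self.router(x)                                # (n_tokens, n_experts)
        if prev_logits is not None:
            logits = logits + self.residual_proj(prev_logits)  # gating residual
        weights = F.softmax(logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)     # renormalize over top-k

        out = torch.zeros_like(x)
        n_ffn = len(self.ffn_experts)
        for expert_id in range(self.n_experts):
            mask = topk_idx == expert_id                       # (n_tokens, top_k)
            if not mask.any():
                continue
            token_ids, slot = mask.nonzero(as_tuple=True)
            w = topk_w[token_ids, slot].unsqueeze(-1)
            if expert_id < n_ffn:                              # normal FFN expert
                y = self.ffn_experts[expert_id](x[token_ids])
            elif expert_id == n_ffn:                           # zero expert: discard
                y = torch.zeros_like(x[token_ids])
            elif expert_id == n_ffn + 1:                       # copy expert: skip (identity)
                y = x[token_ids]
            else:                                              # constant expert: replace
                y = self.constant.expand_as(x[token_ids])
            out.index_add_(0, token_ids, w * y)
        # Return the logits so the next layer can apply its own gating residual.
        return out, logits
```

Only the FFN experts incur meaningful compute, so tokens routed to zero, copy, or constant experts are effectively cheaper, which is where the throughput gain comes from.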
Quick Start & Requirements
Inference can be performed using the Hugging Face transformers library. The base model MoE++7B-Base is available at Chat-UniVi/MoE-Plus-Plus-7B. A Python snippet demonstrates loading the model and tokenizer, requiring trust_remote_code=True and device_map='auto' for potential multi-GPU utilization. Training code is built upon Skywork-MoE and will be released after approval; evaluation uses the Eleuther AI Language Model Evaluation Harness.
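A hedged version of that snippet follows. The model ID and the trust_remote_code/device_map arguments come from the description above; the prompt, dtype, and generation settings are illustrative assumptions.

```python
# Illustrative inference example; generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Chat-UniVi/MoE-Plus-Plus-7B"  # MoE++7B-Base on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # required: the MoE++ architecture ships custom modeling code
    device_map="auto",           # lets accelerate spread experts across available GPUs
    torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
)

inputs = tokenizer("The mixture-of-experts architecture", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```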
Highlighted Details
Maintenance & Community
The repository encourages users to watch it for the latest updates and tracks requests through GitHub issues. Related projects such as Skywork-MoE, MoH, and Chat-UniVi are also highlighted. The last recorded activity was about a year ago, and the project is currently listed as inactive.
Licensing & Compatibility
The project is primarily licensed under Apache 2.0. However, it is designated as a research preview for non-commercial use only, subject to the LLaMA model license, OpenAI's data terms of use, and ShareGPT's privacy practices.
Limitations & Caveats
The chat model inference is marked as "Coming Soon." The release of training code is contingent on the open-sourcing of Skywork-MoE. The non-commercial use restriction is a significant caveat for adoption.