deepseek-ai/ESFT: Fine-tuning method for Mixture-of-Experts (MoE) LLMs
Top 48.3% on SourcePulse
Expert-Specialized Fine-Tuning (ESFT) offers an efficient method for customizing Mixture-of-Experts (MoE) Large Language Models (LLMs). It targets researchers and practitioners who want to adapt LLMs to specific tasks with lower compute and storage costs by fine-tuning only the task-relevant experts.
How It Works
ESFT works in stages. First, it scores how relevant each expert is to the target task by running task data through the model and recording routing statistics. Based on these scores, it generates an expert configuration that specifies which experts should be fine-tuned for that task. Finally, it fine-tunes the LLM under this configuration, updating only the selected expert pathways while keeping the rest of the model frozen. By avoiding full-model fine-tuning, this keeps both compute and storage requirements low.
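To make the pipeline concrete, here is a minimal sketch of the selection-and-freezing idea. The score format, the cumulative threshold, and the parameter-name pattern are assumptions for illustration and do not mirror the repository's actual implementation.

```python
# Illustrative sketch only (not the repository's code): given precomputed
# per-expert relevance scores for each MoE layer, keep the highest-scoring
# experts until a cumulative share `top_p` is covered, then freeze every
# parameter except those experts so training only touches the selected pathways.
from typing import Dict, List

import torch.nn as nn


def build_expert_config(relevance: Dict[int, List[float]], top_p: float = 0.2) -> Dict[int, List[int]]:
    """relevance maps layer index -> average routing score per expert on task data.
    Returns layer index -> indices of the experts selected for fine-tuning."""
    config: Dict[int, List[int]] = {}
    for layer, scores in relevance.items():
        total = sum(scores) or 1.0
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen, covered = [], 0.0
        for idx in ranked:
            chosen.append(idx)
            covered += scores[idx] / total
            if covered >= top_p:
                break
        config[layer] = chosen
    return config


def freeze_all_but_selected_experts(model: nn.Module, expert_config: Dict[int, List[int]]) -> None:
    """Freeze the whole model, then re-enable gradients only for the selected experts.
    Assumes (hypothetically) parameter names of the form 'layers.<L>.mlp.experts.<E>.<...>'."""
    for p in model.parameters():
        p.requires_grad = False
    selected_prefixes = {
        f"layers.{layer}.mlp.experts.{expert}."
        for layer, experts in expert_config.items()
        for expert in experts
    }
    for name, param in model.named_parameters():
        if any(prefix in name for prefix in selected_prefixes):
            param.requires_grad = True
```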
Quick Start & Requirements
pip install transformers torch safetensors accelerate
bash scripts/download_adapters.sh
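After installing the dependencies and fetching the adapters, a typical next step is to load the base model and overlay an adapter. The snippet below is a hedged sketch of that pattern using transformers and safetensors; the model ID and adapter path are placeholders rather than verified names, and the repository's own scripts handle this end to end.

```python
# Hedged sketch: load a MoE base checkpoint with transformers, then merge a
# downloaded ESFT expert adapter (assumed here to be a partial state dict in
# safetensors format) into it. Model ID and adapter path are placeholders.
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "deepseek-ai/ESFT-vanilla-lite"        # placeholder base checkpoint ID
ADAPTER_PATH = "adapters/token/intent.safetensors"  # placeholder adapter file

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# If the adapter only contains weights for the fine-tuned experts, a non-strict
# load leaves every other parameter at its base-model value.
adapter_state = load_file(ADAPTER_PATH)
model.load_state_dict(adapter_state, strict=False)
model.eval()
```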
Highlighted Details
Multi-GPU training is provided via train_ep.py.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is recent: code has been available since August 2024, and the to-do list indicates ongoing development. Specific hardware requirements for multi-GPU training (e.g., world_size, gpus_per_rank) are detailed in the scripts but not summarized upfront.