ESFT by deepseek-ai

Fine-tuning method for Mixture-of-Experts (MoE) LLMs

created 1 year ago · 654 stars · Top 52.0% on sourcepulse

View on GitHub
Project Summary

Expert-Specialized Fine-Tuning (ESFT) is an efficient method for customizing Mixture-of-Experts (MoE) Large Language Models (LLMs). It targets researchers and practitioners who want to adapt MoE LLMs to specific tasks with reduced compute and storage costs by fine-tuning only the experts relevant to each task.

How It Works

ESFT is a multi-stage process. First, it scores individual experts on task data to identify which experts each task relies on most. Based on these scores, it generates a task-specific configuration that records the most relevant experts for a given task. Finally, it fine-tunes the LLM using this configuration, updating only the selected expert pathways while keeping the rest of the model frozen. This focuses compute and storage on the parts of the model the task actually uses and avoids full-model fine-tuning.
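
As a rough illustration of the selection step, the sketch below (not the repository's implementation) assumes you have already collected router (gate) probabilities for each MoE layer on a sample of task data; it keeps the smallest set of experts covering most of the routing mass, with an illustrative 0.9 threshold.

    import json
    import torch

    def select_experts(gate_probs: torch.Tensor, threshold: float = 0.9) -> list[int]:
        """Pick the smallest set of experts whose summed average gate
        probability covers `threshold` of the routing mass in one layer.

        gate_probs: [num_tokens, num_experts] router probabilities collected
        on task data beforehand (assumed input, not produced here).
        """
        avg = gate_probs.mean(dim=0)                 # mean affinity per expert
        order = torch.argsort(avg, descending=True)  # most-used experts first
        chosen, mass = [], 0.0
        for idx in order:
            chosen.append(int(idx))
            mass += float(avg[idx])
            if mass >= threshold * float(avg.sum()):
                break
        return sorted(chosen)

    # Toy usage: 3 MoE layers, 8 experts each, random stand-in routing stats.
    torch.manual_seed(0)
    config = {
        f"layer_{i}": select_experts(torch.rand(1024, 8).softmax(dim=-1))
        for i in range(3)
    }
    print(json.dumps(config, indent=2))  # this expert config drives fine-tuning

On real task data the resulting configuration typically names only a small fraction of the experts per layer, which is where the compute and storage savings come from.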

Quick Start & Requirements

  • Install dependencies: pip install transformers torch safetensors accelerate
  • Download adapters: bash scripts/download_adapters.sh (see the loading sketch after this list)
  • Requires Python 3.x, PyTorch, Transformers, Safetensors, and Accelerate.
  • Evaluation and training scripts are provided.
  • Official paper: https://arxiv.org/abs/2407.01906
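
For orientation, the hedged sketch below shows how a downloaded expert adapter could be overlaid on the base model. The model name and adapter path are placeholders (assumptions, not taken from the README); the repository's evaluation and training scripts handle this properly.

    import torch
    from safetensors.torch import load_file
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder identifiers: substitute the actual base model and the
    # adapter file fetched by scripts/download_adapters.sh.
    BASE_MODEL = "deepseek-ai/DeepSeek-V2-Lite"          # assumption
    ADAPTER_PATH = "path/to/expert_adapter.safetensors"  # placeholder path

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True
    )

    # An ESFT adapter stores weights only for the task-relevant experts,
    # so it is loaded non-strictly on top of the base weights.
    adapter_state = load_file(ADAPTER_PATH)
    missing, unexpected = model.load_state_dict(adapter_state, strict=False)
    print(f"{len(missing)} missing keys, {len(unexpected)} unexpected keys after overlay")

Because only the fine-tuned experts are stored, the adapter file is small relative to a full checkpoint.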

Highlighted Details

  • Enables efficient customization of MoE LLMs.
  • Reduces resource and storage requirements compared to full fine-tuning.
  • Selectively fine-tunes task-relevant expert components (sketched after this list).
  • Supports multi-GPU training with train_ep.py.
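
To make the selective fine-tuning bullet concrete, here is a minimal toy sketch. It assumes a generic layout in which each layer holds a list of expert modules; the class and attribute names are illustrative, not the repository's, and the real training scripts (e.g., train_ep.py) do this work for you.

    import torch
    from torch import nn

    class ToyMoELayer(nn.Module):
        """Stand-in for one MoE decoder layer: just a list of expert MLPs."""
        def __init__(self, num_experts: int = 8, dim: int = 16):
            super().__init__()
            self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    class ToyMoEModel(nn.Module):
        def __init__(self, num_layers: int = 3):
            super().__init__()
            self.layers = nn.ModuleList(ToyMoELayer() for _ in range(num_layers))

    def freeze_all_but_selected(model: nn.Module, expert_config: dict[int, list[int]]) -> None:
        """Freeze everything, then re-enable gradients only for the experts
        listed in expert_config ({layer index: [expert ids]})."""
        for p in model.parameters():
            p.requires_grad_(False)
        for layer_idx, expert_ids in expert_config.items():
            for j in expert_ids:
                for p in model.layers[layer_idx].experts[j].parameters():
                    p.requires_grad_(True)

    model = ToyMoEModel()
    freeze_all_but_selected(model, {0: [1, 5], 1: [3], 2: [0, 2, 7]})

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"training {trainable}/{total} parameters")
    # Only the still-trainable parameters would go to the optimizer, e.g.:
    # torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-5)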

Maintenance & Community

  • Project accepted to EMNLP 2024 Main Conference.
  • Code released August 11, 2024.
  • Support via GitHub Issues.
  • The to-do list indicates ongoing work on models, scripts, and features.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is relatively new, with code released in August 2024, and its to-do list indicates that development is ongoing. Distributed-training settings for multi-GPU runs (e.g., world_size, gpus_per_rank) are detailed in the scripts rather than summarized upfront.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

54 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch

  • LLM research codebase for training and inference
  • Top 0.1% · 5k stars
  • Created 9 months ago · updated 2 weeks ago

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

  • Training codebase for instruction-following language models
  • Top 0.2% · 3k stars
  • Created 2 years ago · updated 21 hours ago