dFactory (inclusionAI): Fine-tuning framework for large language models
Summary
dFactory is a framework for fine-tuning large language models (LLMs), with a focus on Mixture-of-Experts (MoE) architectures. It targets engineers and researchers who want to customize LLMs. Its performance gains come from an optimized expert-weight format and integrated fine-tuning methods.
How It Works
The core innovation is a "merged-expert" weight format for MoE models. It consolidates the individual expert weights into single tensors, so a GPU can process all experts with batched matrix multiplication rather than one multiplication per expert, which yields substantial speedups. dFactory includes a utility (moe_convertor.py) that converts models between the standard Hugging Face "separate-expert" format and the optimized "merged-expert" format, so the same weights can be used for both training and inference. It supports continuous supervised fine-tuning (SFT) with methods like block-diffusion and full attention.
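The merged-expert idea can be illustrated with a minimal NumPy sketch. It is not dFactory's code: the function names (merge_experts, split_experts, forward_separate, forward_merged), the tensor layout (num_experts, d_in, d_out), and the simplification that every expert receives the same number of tokens are all assumptions made for illustration.

```python
import numpy as np

def merge_experts(separate):
    """Stack per-expert matrices into one (num_experts, d_in, d_out) tensor."""
    return np.stack([separate[i] for i in range(len(separate))], axis=0)

def split_experts(merged):
    """Inverse of merge_experts: recover one matrix per expert."""
    return {i: merged[i] for i in range(merged.shape[0])}

def forward_separate(separate, x):
    # Separate-expert layout: one matrix multiplication per expert, in a loop.
    return np.stack([x[i] @ separate[i] for i in range(len(separate))], axis=0)

def forward_merged(merged, x):
    # Merged-expert layout: np.matmul treats the leading axis as a batch,
    # so all experts are handled by a single batched multiplication.
    return np.matmul(x, merged)

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
num_experts, tokens, d_in, d_out = 4, 3, 8, 5
separate = {i: rng.standard_normal((d_in, d_out)) for i in range(num_experts)}
x = rng.standard_normal((num_experts, tokens, d_in))  # tokens already routed per expert

merged = merge_experts(separate)
out_loop = forward_separate(separate, x)
out_batched = forward_merged(merged, x)
restored = split_experts(merged)
```

Both layouts hold the same numbers and produce the same outputs; the merged layout simply lets the hardware see one large batched operation, which is where the speedup described above would come from.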
Quick Start & Requirements
Installation is recommended via uv (uv sync --extra gpu); pip is also supported. Setup involves cloning the repo, creating the environment, downloading the base model weights (./scripts/download_hf_model.py), and converting them to the "merged-expert" format (scripts/moe_convertor.py --mode merge). Training data must be prepared beforehand (e.g., ./scripts/build_gsm8k_dataset.py). To start fine-tuning, edit a configuration file (e.g., configs/sft/llada2_mini_bd_sft.yaml) and run the train.sh script. A tutorial is available at https://inclusionai.github.io/dFactory/.
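The order of the setup steps can be summarized as a dry-run plan. The sketch below only prints commands and runs nothing. The script names and the two flags shown come from the summary above; the python and bash launchers, the step labels, and the render_plan helper are assumptions, and any further arguments the scripts may need (model name, output paths, how train.sh picks up the config) are not given in the summary and are left out.

```python
import shlex

# Commands as named in the summary; extra arguments are intentionally omitted
# because the summary does not specify them.
SETUP_STEPS = [
    ("install dependencies", ["uv", "sync", "--extra", "gpu"]),
    ("download base weights", ["python", "./scripts/download_hf_model.py"]),
    ("convert to merged-expert format",
     ["python", "scripts/moe_convertor.py", "--mode", "merge"]),
    ("build training data", ["python", "./scripts/build_gsm8k_dataset.py"]),
    # Edit a config such as configs/sft/llada2_mini_bd_sft.yaml before this step.
    ("start fine-tuning", ["bash", "train.sh"]),
]

def render_plan(steps):
    """Return one numbered, shell-quoted line per step (hypothetical helper)."""
    return [
        f"{n}. {label}: {shlex.join(cmd)}"
        for n, (label, cmd) in enumerate(steps, start=1)
    ]

plan = render_plan(SETUP_STEPS)
for line in plan:
    print(line)
```

Consult the tutorial linked above for the exact invocation of each script before running anything.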
Maintenance & Community
The project is actively developed, with a roadmap including comprehensive documentation and trainable parallel decoding. No specific community channels are listed.
Licensing & Compatibility
Licensed under the Apache 2.0 license, which permits use in commercial and closed-source applications.
Limitations & Caveats
Comprehensive documentation is still in progress. Features like trainable parallel decoding are planned for future releases. The workflow requires explicit steps for converting model weights between separate and merged formats, adding complexity to setup and inference.