Full-parameter fine-tuning guide for DeepSeek-V3/R1 671B
This repository provides a comprehensive guide and open-source solution for full-parameter fine-tuning of the DeepSeek-V3/R1 671B large language model. It targets researchers and engineers who want to adapt this powerful model to specific tasks, offering complete code from training to inference along with practical insights and troubleshooting advice.
How It Works
The project leverages an extended xtuner framework, incorporating data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP) to enable efficient full-parameter fine-tuning of the 671B model. It implements custom modeling logic based on the DeepSeek-V3 paper and the DeepSeek-V2 architecture, facilitating adaptation to reasoning tasks with a structured data format that supports multi-turn dialogues and selective loss calculation.
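As a rough illustration of the multi-turn, selective-loss idea, the hypothetical Python sketch below shows a per-message loss flag and how such a flag is typically turned into masked labels. The field names, the loss key, and the IGNORE_INDEX convention are assumptions for illustration; consult the repository's data code for the actual schema.

```python
# Hypothetical multi-turn SFT sample with a per-message loss switch.
# Field names are illustrative, not the repository's actual schema.
sample = {
    "messages": [
        {"role": "system",    "content": "You are a helpful assistant.", "loss": False},
        {"role": "user",      "content": "Solve 12 * 7.",                "loss": False},
        {"role": "assistant", "content": "12 * 7 = 84.",                 "loss": True},
        {"role": "user",      "content": "And 84 / 4?",                  "loss": False},
        {"role": "assistant", "content": "84 / 4 = 21.",                 "loss": True},
    ]
}

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss

def build_labels(token_ids_per_message, loss_flags):
    """Keep labels only for messages flagged for loss; mask everything else."""
    labels = []
    for ids, keep in zip(token_ids_per_message, loss_flags):
        labels.extend(ids if keep else [IGNORE_INDEX] * len(ids))
    return labels

# Toy token ids standing in for a real tokenizer's output.
token_ids_per_message = [[1, 2], [3, 4, 5], [6, 7], [8, 9], [10, 11, 12]]
loss_flags = [m["loss"] for m in sample["messages"]]
print(build_labels(token_ids_per_message, loss_flags))
# -> [-100, -100, -100, -100, -100, 6, 7, -100, -100, 10, 11, 12]
```

Only messages flagged for loss contribute to training, which is the usual way to supervise assistant turns while ignoring system prompts and user inputs.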
Quick Start & Requirements
- Setup uses a conda environment, with installation via pip install -r requirements.txt; the core xtuner code for DeepseekV3ForCausalLM needs to be manually copied into the installed package.
- Training is configured in sft_deepseek.py and launched with the sft_deepseek.sh startup script, which requires manually adjusting NODE_RANK on each machine (see the schematic sketch after this list).
- Multi-node runs use pdsh-based setups.
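NODE_RANK has to differ on every machine because each process derives its global rank from the node index and its local GPU index. The sketch below is schematic and is not code from the repository's launch scripts; the environment variable names follow common torchrun/DeepSpeed conventions and may differ from those used in sft_deepseek.sh.

```python
# Schematic illustration of why NODE_RANK must be unique per machine.
import os

nnodes = int(os.environ.get("NNODES", "4"))               # e.g. 4 machines
gpus_per_node = int(os.environ.get("GPUS_PER_NODE", "8"))
node_rank = int(os.environ.get("NODE_RANK", "0"))         # must differ on every machine
local_rank = int(os.environ.get("LOCAL_RANK", "0"))       # set per GPU by the launcher

world_size = nnodes * gpus_per_node
global_rank = node_rank * gpus_per_node + local_rank
print(f"world_size={world_size}, global_rank={global_rank}")

# Two machines that accidentally share the same NODE_RANK would claim identical
# global ranks, and the distributed initialization would hang or fail.
```

This is why the startup script has to be edited (or parameterized) separately on each node.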
Highlighted Details
Maintenance & Community
Developed jointly by the Institute of Automation of the Chinese Academy of Sciences and Beijing Wenge Technology Co. Ltd.
Licensing & Compatibility
Licensed under Apache-2.0. Compatible with commercial use.
Limitations & Caveats
The setup is resource-intensive, requiring substantial GPU, memory, and storage capacity. The manual code-overwriting step in xtuner may be fragile across different xtuner versions, and training requires careful configuration of distributed execution across multiple nodes.
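A small, hypothetical safeguard against the fragility of the manual overwrite is to record which xtuner version the copied DeepseekV3ForCausalLM files were written against and warn when the installed version differs. Nothing like this ships with the repository, and the pinned version string below is a placeholder.

```python
# Hypothetical version guard for the manual xtuner overwrite; not part of the
# repository. The pinned version is a placeholder, not the actually tested one.
from importlib.metadata import PackageNotFoundError, version

TESTED_XTUNER_VERSION = "0.1.23"  # placeholder: set to the version the patch was validated on

def check_xtuner_version() -> None:
    try:
        installed = version("xtuner")
    except PackageNotFoundError:
        raise SystemExit("xtuner is not installed; run `pip install -r requirements.txt` first.")
    if installed != TESTED_XTUNER_VERSION:
        print(
            f"Warning: installed xtuner {installed} != tested {TESTED_XTUNER_VERSION}; "
            "the manually overwritten modeling files may no longer match the framework."
        )

if __name__ == "__main__":
    check_xtuner_version()
```

Pinning the exact xtuner version in requirements.txt achieves the same effect with less machinery.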