DeepSeek-671B-SFT-Guide by ScienceOne-AI

Full-parameter fine-tuning guide for DeepSeek-V3/R1 671B

Created 6 months ago
758 stars

Top 45.8% on SourcePulse

View on GitHub
Project Summary

This repository provides a comprehensive guide and open-source solution for full parameter fine-tuning of the DeepSeek-V3/R1 671B large language model. It targets researchers and engineers aiming to adapt this powerful model for specific tasks, offering complete code from training to inference, along with practical insights and troubleshooting advice.

How It Works

The project leverages an extended xtuner framework, incorporating data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP) to enable efficient full parameter fine-tuning of the 671B model. It implements custom modeling logic based on the DeepSeek-V3 paper and DeepSeek-V2 architecture, facilitating adaptation for reasoning tasks with a structured data format that supports multi-turn dialogues and selective loss calculation.
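
To make the two parallel axes concrete, here is a framework-free sketch of the sequence-parallel idea; it is illustrative only (the repository's actual implementation lives in its extended xtuner code), and the group size and helper names below are assumptions.

```python
# Sketch of sequence parallelism (SP): each SP rank gets a contiguous slice of
# one long tokenized sample. Illustrative only, not the repo's implementation.

def shard_sequence(token_ids, sp_rank, sp_world_size):
    """Give each SP rank an equal contiguous slice of one long sequence."""
    chunk = len(token_ids) // sp_world_size
    return token_ids[sp_rank * chunk:(sp_rank + 1) * chunk]

tokens = list(range(16))      # stand-in for one tokenized training sample
SP_WORLD_SIZE = 4             # hypothetical SP group size
for sp_rank in range(SP_WORLD_SIZE):
    print(sp_rank, shard_sequence(tokens, sp_rank, SP_WORLD_SIZE))

# Data parallelism (DeepSpeed ZeRO) operates on the other axis: each DP replica
# sees different samples, while ZeRO shards optimizer state, gradients, and
# (at stage 3) parameters across the DP group.
```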

Quick Start & Requirements

  • Installation: Requires Python 3.10+, a conda environment, and pip install -r requirements.txt. The repository's custom xtuner code for DeepseekV3ForCausalLM must be copied manually over the installed xtuner package.
  • Hardware: A minimum of 8 x NVIDIA H100 80GB GPUs with CUDA 12.6, 2.0 TB of RAM, and 100 TB of NVMe SSD storage is recommended for training; inference deployment is suggested on 4 machines with 32 GPUs in total.
  • Data Format: Supports the standard OpenAI messages format, extended for reasoning, with an option to merge reasoning content into the assistant's response (see the sample after this list).
  • Training: Uses sft_deepseek.py for configuration and sft_deepseek.sh as the startup script; NODE_RANK must be set manually on each machine (see the rank-mapping sketch after this list).
  • Inference: Uses vLLM for deployment, with scripts provided for SLURM- or pdsh-based multi-node setups (a client-side example follows this list).
  • Resources: Training requires significant storage (7.4 TB per intermediate checkpoint) and potentially large swap files for model conversion.
  • Documentation: README_zh.md (Chinese), README.md (English).
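
For orientation, a hypothetical training sample in the extended OpenAI-style format might look like the following; the reasoning_content field name and the <think> wrapping are assumptions for illustration, not taken from the repository.

```python
# Hypothetical sample in the extended OpenAI-style messages format.
# Field names such as "reasoning_content" and the <think> tags are assumptions.
sample = {
    "messages": [
        {"role": "user", "content": "What is 17 * 24?"},
        {
            "role": "assistant",
            "content": "17 * 24 = 408.",
            "reasoning_content": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        },
    ]
}

def merge_reasoning(message):
    """One way the 'merge reasoning into the response' option could be realized."""
    if message.get("role") == "assistant" and message.get("reasoning_content"):
        merged = f"<think>\n{message['reasoning_content']}\n</think>\n\n{message['content']}"
        return {"role": "assistant", "content": merged}
    return message

print(merge_reasoning(sample["messages"][-1])["content"])
```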
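
Regarding the per-machine NODE_RANK adjustment, the snippet below sketches how such a value conventionally maps to global process ranks in a multi-node launch; the GPUS_PER_NODE constant and the mapping are generic assumptions, not taken from the repo's sft_deepseek.sh.

```python
# Illustrative only: conventional NODE_RANK -> global rank mapping in a
# multi-node launch, showing why each machine needs a distinct NODE_RANK.
import os

GPUS_PER_NODE = 8                                  # assumed 8 GPUs per machine
node_rank = int(os.environ.get("NODE_RANK", "0"))  # set to 0, 1, 2, ... per machine

for local_rank in range(GPUS_PER_NODE):
    global_rank = node_rank * GPUS_PER_NODE + local_rank
    print(f"node {node_rank}, local rank {local_rank} -> global rank {global_rank}")
```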
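
Once the vLLM service is running, it can be queried through vLLM's standard OpenAI-compatible HTTP API; the host, port, and served model name below are placeholders rather than values from the repository's scripts.

```python
# Minimal client-side sketch against vLLM's OpenAI-compatible endpoint.
# Host, port (vLLM's default is 8000), and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-r1-sft",  # placeholder served-model name
        "messages": [{"role": "user", "content": "Give a one-sentence summary of ZeRO-3."}],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```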

Highlighted Details

  • Full parameter fine-tuning of DeepSeek-V3/R1 671B.
  • Supports data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP).
  • Includes detailed experimental results on feasibility across different parallel strategies and configurations.
  • Provides scripts for model weight conversion to Huggingface format and vLLM deployment (a conversion sketch follows this list).
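
As a rough idea of what checkpoint conversion can involve, the sketch below mirrors DeepSpeed's generic ZeRO-to-fp32 utility rather than the repository's own scripts; the checkpoint path, model id, and output directory are placeholders.

```python
# Hedged sketch: gather a DeepSpeed ZeRO checkpoint into one fp32 state dict and
# save it in Huggingface format. Not necessarily the repo's conversion path;
# paths and the model id are placeholders.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
from transformers import AutoConfig, AutoModelForCausalLM

state_dict = get_fp32_state_dict_from_zero_checkpoint("work_dirs/checkpoint_dir")

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)  # no weights downloaded
model.load_state_dict(state_dict, strict=False)
model.save_pretrained("converted_hf_model")
# For a 671B model this needs enormous host memory/swap, as noted in the
# resource requirements above.
```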

Maintenance & Community

Developed jointly by the Institute of Automation of the Chinese Academy of Sciences and Beijing Wenge Technology Co. Ltd.

Licensing & Compatibility

Licensed under Apache-2.0. Compatible with commercial use.

Limitations & Caveats

The setup is resource-intensive, requiring substantial GPU, memory, and storage capacity. The manual step of overwriting code inside xtuner may be fragile across xtuner versions, and training requires careful configuration of distributed execution across multiple nodes.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462 stars
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 20 more.

open-r1 by huggingface

0.2%
25k stars
SDK for reproducing DeepSeek-R1
Created 7 months ago
Updated 1 week ago