DeepSeek-671B-SFT-Guide by ScienceOne-AI

Full-parameter fine-tuning guide for DeepSeek-V3/R1 671B

created 4 months ago
737 stars

Top 48.0% on sourcepulse

View on GitHub
Project Summary

This repository provides a comprehensive guide and open-source solution for full-parameter fine-tuning of the DeepSeek-V3/R1 671B large language model. It targets researchers and engineers who want to adapt this model to specific tasks, offering complete code from training to inference along with practical insights and troubleshooting advice.

How It Works

The project extends the xtuner framework with data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP) to enable efficient full-parameter fine-tuning of the 671B model. It implements custom modeling logic based on the DeepSeek-V3 paper and the DeepSeek-V2 architecture, and uses a structured data format that supports multi-turn dialogues and selective loss calculation to adapt the model for reasoning tasks.
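To make the data format and selective loss calculation concrete, here is a minimal sketch. It is illustrative only: the reasoning_content field name, the <think> wrapping tags, and the label-masking helper are assumptions about a typical setup, not the repository's exact schema or code.

```python
# Hypothetical multi-turn sample in OpenAI-style chat format, extended with a
# "reasoning_content" field on assistant turns (field name assumed; check the
# repository's data documentation for the exact schema).
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 17 * 24?"},
        {
            "role": "assistant",
            "content": "17 * 24 = 408.",
            "reasoning_content": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        },
    ]
}


def merge_reasoning(messages, open_tag="<think>", close_tag="</think>"):
    """Optionally fold reasoning content into the assistant response.

    The wrapping tags are illustrative; the repository may use a different
    convention when merging reasoning into the assistant turn.
    """
    merged = []
    for msg in messages:
        msg = dict(msg)
        reasoning = msg.pop("reasoning_content", None)
        if msg["role"] == "assistant" and reasoning:
            msg["content"] = f"{open_tag}{reasoning}{close_tag}\n{msg['content']}"
        merged.append(msg)
    return merged


def build_labels(token_spans, ignore_index=-100):
    """Selective loss: keep labels only on assistant tokens, mask everything else.

    `token_spans` is a list of (role, token_ids) pairs produced by whatever
    chat-template tokenization the training pipeline applies; ignore_index
    follows PyTorch's cross-entropy convention.
    """
    input_ids, labels = [], []
    for role, ids in token_spans:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [ignore_index] * len(ids))
    return input_ids, labels
```

With this shape, only assistant-turn tokens contribute to the loss, which is what the selective loss calculation described above refers to.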

Quick Start & Requirements

  • Installation: Requires Python 3.10+, a conda environment, and pip install -r requirements.txt. The repository's core xtuner code for DeepseekV3ForCausalLM must be manually copied into the installed xtuner package.
  • Hardware: A minimum of 8 x NVIDIA H100 80GB GPUs with CUDA 12.6, 2.0 TB RAM, and 100 TB of NVMe SSD storage is recommended for training; inference deployment is suggested on 4 machines (32 GPUs).
  • Data Format: Supports OpenAI standard format, extended for reasoning, with an option to merge reasoning content into the assistant's response.
  • Training: sft_deepseek.py holds the configuration and sft_deepseek.sh is the startup script; NODE_RANK must be set manually on each machine.
  • Inference: Deployment uses vLLM, with scripts provided for SLURM- or pdsh-based multi-node setups (see the query sketch after this list).
  • Resources: Training requires significant storage (7.4 TB per intermediate checkpoint) and potentially large swap files for model conversion.
  • Documentation: README_zh.md (Chinese), README.md (English).
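As a quick smoke test once the vLLM multi-node deployment is running, the server exposes an OpenAI-compatible API. The host, port, and served model name below are placeholders, not values taken from the repository's scripts.

```python
# Query the deployed model through vLLM's OpenAI-compatible endpoint.
# "http://<head-node>:8000/v1" and "deepseek-v3-sft" are placeholders; use the
# address and model name/path from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://<head-node>:8000/v1",  # vLLM's OpenAI-compatible server
    api_key="EMPTY",                        # vLLM does not check the key by default
)

response = client.chat.completions.create(
    model="deepseek-v3-sft",
    messages=[{"role": "user", "content": "Briefly explain sequence parallelism."}],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```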

Highlighted Details

  • Full-parameter fine-tuning of DeepSeek-V3/R1 671B.
  • Supports data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP).
  • Includes detailed experimental results on feasibility across different parallel strategies and configurations.
  • Provides scripts for converting model weights to Hugging Face format and for vLLM deployment (see the sanity-check sketch below).
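After converting weights to the Hugging Face format, a lightweight sanity check is to load only the config and tokenizer rather than the full 671B weights. The checkpoint path is a placeholder, and the chat-template call assumes the converted checkpoint ships a chat template.

```python
# Inspect a converted Hugging Face checkpoint without loading the full weights.
# "/path/to/converted-deepseek-v3-hf" is a placeholder for the conversion output.
from transformers import AutoConfig, AutoTokenizer

ckpt = "/path/to/converted-deepseek-v3-hf"

config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)

print(config.model_type, config.num_hidden_layers)
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,
    add_generation_prompt=True,
))
```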

Maintenance & Community

Developed jointly by the Institute of Automation of the Chinese Academy of Sciences and Beijing Wenge Technology Co. Ltd.

Licensing & Compatibility

Licensed under Apache-2.0. Compatible with commercial use.

Limitations & Caveats

The setup is resource-intensive, requiring substantial GPU, memory, and storage capacity. The manual step of overwriting code inside xtuner may be fragile across xtuner versions, and training requires careful configuration of distributed execution across multiple nodes.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

74 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

0.1% · 806 stars · created 5 months ago · updated 2 weeks ago
Pretraining code for depth-recurrent language model research
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

0.1% · 5k stars · created 1 year ago · updated 10 months ago
MoE language model for research/API use
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4% · 6k stars · created 1 year ago · updated 10 months ago
Open-source code language model comparable to GPT4-Turbo
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2% · 25k stars · created 6 months ago · updated 3 days ago
SDK for reproducing DeepSeek-R1