DeepSeek-671B-SFT-Guide by ScienceOne-AI

Full-parameter fine-tuning guide for DeepSeek-V3/R1 671B

Created 6 months ago
758 stars

Top 45.8% on SourcePulse

View on GitHub
Project Summary

This repository provides a comprehensive guide and open-source solution for full parameter fine-tuning of the DeepSeek-V3/R1 671B large language model. It targets researchers and engineers aiming to adapt this powerful model for specific tasks, offering complete code from training to inference, along with practical insights and troubleshooting advice.

How It Works

The project leverages an extended xtuner framework, incorporating data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP) to enable efficient full parameter fine-tuning of the 671B model. It implements custom modeling logic based on the DeepSeek-V3 paper and DeepSeek-V2 architecture, facilitating adaptation for reasoning tasks with a structured data format that supports multi-turn dialogues and selective loss calculation.
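
To make the two parallel axes concrete, here is a framework-free sketch of the sequence-parallel idea; it is illustrative only (the repository's actual implementation lives in its extended xtuner code), and the group size and helper names below are assumptions.

```python
# Sketch of sequence parallelism (SP): each SP rank gets a contiguous slice of
# one long tokenized sample. Illustrative only, not the repo's implementation.

def shard_sequence(token_ids, sp_rank, sp_world_size):
    """Give each SP rank an equal contiguous slice of one long sequence."""
    chunk = len(token_ids) // sp_world_size
    return token_ids[sp_rank * chunk:(sp_rank + 1) * chunk]

tokens = list(range(16))      # stand-in for one tokenized training sample
SP_WORLD_SIZE = 4             # hypothetical SP group size
for sp_rank in range(SP_WORLD_SIZE):
    print(sp_rank, shard_sequence(tokens, sp_rank, SP_WORLD_SIZE))

# Data parallelism (DeepSpeed ZeRO) operates on the other axis: each DP replica
# sees different samples, while ZeRO shards optimizer state, gradients, and
# (at stage 3) parameters across the DP group.
```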

Quick Start & Requirements

  • Installation: Requires Python 3.10+, a conda environment, and pip install -r requirements.txt. The repository's custom xtuner code for DeepseekV3ForCausalLM must be copied manually over the installed xtuner package.
  • Hardware: A minimum of 8 x NVIDIA H100 80GB GPUs with CUDA 12.6, 2.0 TB of RAM, and 100 TB of NVMe SSD storage is recommended for training; inference deployment is suggested on 4 machines with 32 GPUs in total.
  • Data Format: Supports the standard OpenAI messages format, extended for reasoning, with an option to merge reasoning content into the assistant's response (see the sample after this list).
  • Training: Uses sft_deepseek.py for configuration and sft_deepseek.sh as the startup script; NODE_RANK must be set manually on each machine (see the rank-mapping sketch after this list).
  • Inference: Uses vLLM for deployment, with scripts provided for SLURM- or pdsh-based multi-node setups (a client-side example follows this list).
  • Resources: Training requires significant storage (7.4 TB per intermediate checkpoint) and potentially large swap files for model conversion.
  • Documentation: README_zh.md (Chinese), README.md (English).
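
For orientation, a hypothetical training sample in the extended OpenAI-style format might look like the following; the reasoning_content field name and the <think> wrapping are assumptions for illustration, not taken from the repository.

```python
# Hypothetical sample in the extended OpenAI-style messages format.
# Field names such as "reasoning_content" and the <think> tags are assumptions.
sample = {
    "messages": [
        {"role": "user", "content": "What is 17 * 24?"},
        {
            "role": "assistant",
            "content": "17 * 24 = 408.",
            "reasoning_content": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        },
    ]
}

def merge_reasoning(message):
    """One way the 'merge reasoning into the response' option could be realized."""
    if message.get("role") == "assistant" and message.get("reasoning_content"):
        merged = f"<think>\n{message['reasoning_content']}\n</think>\n\n{message['content']}"
        return {"role": "assistant", "content": merged}
    return message

print(merge_reasoning(sample["messages"][-1])["content"])
```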
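
Regarding the per-machine NODE_RANK adjustment, the snippet below sketches how such a value conventionally maps to global process ranks in a multi-node launch; the GPUS_PER_NODE constant and the mapping are generic assumptions, not taken from the repo's sft_deepseek.sh.

```python
# Illustrative only: conventional NODE_RANK -> global rank mapping in a
# multi-node launch, showing why each machine needs a distinct NODE_RANK.
import os

GPUS_PER_NODE = 8                                  # assumed 8 GPUs per machine
node_rank = int(os.environ.get("NODE_RANK", "0"))  # set to 0, 1, 2, ... per machine

for local_rank in range(GPUS_PER_NODE):
    global_rank = node_rank * GPUS_PER_NODE + local_rank
    print(f"node {node_rank}, local rank {local_rank} -> global rank {global_rank}")
```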
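
Once the vLLM service is running, it can be queried through vLLM's standard OpenAI-compatible HTTP API; the host, port, and served model name below are placeholders rather than values from the repository's scripts.

```python
# Minimal client-side sketch against vLLM's OpenAI-compatible endpoint.
# Host, port (vLLM's default is 8000), and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-r1-sft",  # placeholder served-model name
        "messages": [{"role": "user", "content": "Give a one-sentence summary of ZeRO-3."}],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```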

Highlighted Details

  • Full parameter fine-tuning of DeepSeek-V3/R1 671B.
  • Supports data parallelism (DeepSpeed ZeRO) and sequence parallelism (SP).
  • Includes detailed experimental results on feasibility across different parallel strategies and configurations.
  • Provides scripts for model weight conversion to Huggingface format and vLLM deployment (a conversion sketch follows this list).
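
As a rough idea of what checkpoint conversion can involve, the sketch below mirrors DeepSpeed's generic ZeRO-to-fp32 utility rather than the repository's own scripts; the checkpoint path, model id, and output directory are placeholders.

```python
# Hedged sketch: gather a DeepSpeed ZeRO checkpoint into one fp32 state dict and
# save it in Huggingface format. Not necessarily the repo's conversion path;
# paths and the model id are placeholders.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
from transformers import AutoConfig, AutoModelForCausalLM

state_dict = get_fp32_state_dict_from_zero_checkpoint("work_dirs/checkpoint_dir")

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)  # no weights downloaded
model.load_state_dict(state_dict, strict=False)
model.save_pretrained("converted_hf_model")
# For a 671B model this needs enormous host memory/swap, as noted in the
# resource requirements above.
```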

Maintenance & Community

Developed jointly by the Institute of Automation of the Chinese Academy of Sciences and Beijing Wenge Technology Co. Ltd.

Licensing & Compatibility

Licensed under Apache-2.0. Compatible with commercial use.

Limitations & Caveats

The setup is resource-intensive, requiring substantial GPU, memory, and storage capacity. The manual step of overwriting code inside xtuner may be fragile across xtuner versions, and training requires careful configuration of distributed execution across multiple nodes.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462 stars
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 20 more.

open-r1 by huggingface

0.2%
25k stars
SDK for reproducing DeepSeek-R1
Created 7 months ago
Updated 1 week ago