HongriJiujiu
Models for diagnosing silent inconsistency in distributed fine-tuning
Summary
This repository offers three experimental fine-tuned models designed to diagnose "Silent Inconsistency" in synchronous distributed data-parallel (DDP) full-parameter fine-tuning. It targets researchers and engineers working on distributed training, and provides a way to detect subtle worker-level optimization divergences that are invisible in global metrics, with the aim of making training more reliable.
How It Works
The project addresses hidden divergences in worker-level optimization dynamics during synchronous DDP training, where parameter synchronization does not guarantee consistent internal states across workers. It introduces three lightweight online monitoring metrics that can be computed with negligible overhead: Loss Dispersion, Gradient-Norm Dispersion, and Gradient-Direction Consistency. These metrics expose per-worker loss and gradient behavior that global loss curves hide, offering a new approach to debugging distributed training.
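The README names the three metrics but does not give their formulas. The sketch below assumes plausible definitions: Loss Dispersion as the population standard deviation of per-worker losses at one step, Gradient-Norm Dispersion as the population standard deviation of per-worker gradient L2 norms, and Gradient-Direction Consistency as the mean pairwise cosine similarity between per-worker gradients. These definitions and all function names are hypothetical; the project's actual formulas may differ (for example, normalized variants instead of raw standard deviations). Gradients are flattened Python lists for clarity, and collecting per-worker values in a real DDP job (before gradients are all-reduced) is not shown.

```python
import math
from statistics import mean, pstdev


def _l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def loss_dispersion(worker_losses):
    """Assumed definition: population std of the per-worker losses at one step."""
    return pstdev(worker_losses)


def grad_norm_dispersion(worker_grads):
    """Assumed definition: population std of per-worker gradient L2 norms."""
    return pstdev([_l2_norm(g) for g in worker_grads])


def grad_direction_consistency(worker_grads):
    """Assumed definition: mean pairwise cosine similarity of worker gradients."""
    n = len(worker_grads)
    if n < 2:
        return 1.0  # a single worker is trivially consistent with itself
    sims = []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = worker_grads[i], worker_grads[j]
            denom = _l2_norm(a) * _l2_norm(b)
            dot = sum(x * y for x, y in zip(a, b))
            # Zero-norm gradients have no direction; count them as orthogonal.
            sims.append(dot / denom if denom > 0 else 0.0)
    return mean(sims)


if __name__ == "__main__":
    losses = [2.31, 2.29, 2.30, 2.95]  # hypothetical per-worker losses
    grads = [[0.1, 0.2], [0.1, 0.21], [0.09, 0.2], [-0.1, 0.05]]
    print(loss_dispersion(losses))
    print(grad_norm_dispersion(grads))
    print(grad_direction_consistency(grads))
```

Under these assumed definitions, identical per-worker gradients give a consistency score of 1.0 (up to floating-point rounding) and zero dispersion; as workers drift apart, the dispersions grow and the consistency score falls, even when the averaged loss looks normal.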
Quick Start & Requirements
Fine-tuned models are available on Hugging Face (https://huggingface.co/jiujiudahaozi/op_pangu). They are fully fine-tuned from openPangu-Embedded-1B-V1.1 (~1B parameters) with bf16 mixed precision on the tatsu-lab/alpaca dataset (https://huggingface.co/datasets/tatsu-lab/alpaca). Training used an Instruction-Input-Response prompt template and a maximum sequence length of 1024 tokens, with the loss computed only on response tokens. Inference requires a GPU environment suitable for a ~1B-parameter model; in bf16 the weights alone take roughly 2 GB of memory.
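The training recipe (Instruction-Input-Response template, 1024-token limit, loss on response tokens only) can be sketched as follows. The exact template wording used for these models is not documented, so the `### Instruction:` / `### Input:` / `### Response:` headers are assumptions, as are all function names. Prompt positions receive the label -100, the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so they contribute nothing to the loss. A character-level toy tokenizer stands in for the real openPangu tokenizer.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
MAX_LEN = 1024       # maximum sequence length reported for these models


def format_prompt(instruction, input_text=""):
    """Build a hypothetical Instruction-Input-Response prompt (wording assumed)."""
    prompt = f"### Instruction:\n{instruction}\n\n"
    if input_text:  # many Alpaca rows have an empty input field
        prompt += f"### Input:\n{input_text}\n\n"
    return prompt + "### Response:\n"


def build_example(tokenize, instruction, input_text, response, max_len=MAX_LEN):
    """Return (input_ids, labels) with prompt tokens masked out of the loss."""
    prompt_ids = tokenize(format_prompt(instruction, input_text))
    response_ids = tokenize(response)
    input_ids = (prompt_ids + response_ids)[:max_len]
    # Only response tokens keep their ids as labels; prompt tokens are ignored.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return input_ids, labels


def toy_tokenize(text):
    """Character-level stand-in for a real tokenizer (hypothetical)."""
    return [ord(ch) for ch in text]


if __name__ == "__main__":
    ids, labels = build_example(toy_tokenize, "Add the numbers.", "2 + 2", "4")
    n_masked = sum(1 for t in labels if t == IGNORE_INDEX)
    print(len(ids), n_masked)
```

Tokenizing the prompt and response separately keeps the boundary between them exact; with a real subword tokenizer this can differ slightly from tokenizing the concatenated string, and an end-of-sequence token would normally be appended to the response (omitted here).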
Highlighted Details
Fine-tuned from the openPangu-Embedded-1B-V1.1 causal LM (~1B parameters) with bf16 mixed precision.
Fine-tuning data: tatsu-lab/alpaca instruction dataset.
Maintenance & Community
Contributors include Hong Li, Zhen Zhou, Honggang Zhang, Yuping Luo, Xinyue Wang, Han Gong, and Zhiyuan Liu. No community channels or roadmap links are provided.
Licensing & Compatibility
The README does not state a license, so suitability for commercial use or for linking into closed-source software cannot be assessed without clarification from the authors.
Limitations & Caveats
This repository provides experimental models for diagnosing DDP silent inconsistencies, not training scripts or a general diagnostic toolkit. Its focus is on the phenomenon and resulting models, not on enabling users to reproduce experiments or apply diagnostics broadly.
1 month ago
Inactive