LLM-Dojo by mst272

LLM training framework covering SFT and RLHF

Created 1 year ago
892 stars

Top 40.6% on SourcePulse

View on GitHub
Project Summary

LLM-Dojo is an open-source framework designed for learning and experimenting with large language models (LLMs) and vision-language models (VLMs). It offers a clean, readable codebase for training, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methods like DPO, CPO, and KTO. The project targets researchers and developers looking for a flexible platform to customize and test various LLM training techniques.

How It Works

LLM-Dojo is built upon the Hugging Face ecosystem, providing modular components for SFT, VLM, and RLHF. Its SFT framework supports training strategies such as LoRA, QLoRA, and full-parameter tuning, with automatic chat-template adaptation. The RLHF module integrates multiple reinforcement learning algorithms and supports efficient training on a single A100 GPU using DeepSpeed and LoRA. The VLM module enables multimodal training for tasks like Visual Question Answering.
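The README does not document the framework's internal API here, but the pattern it wraps is the standard Hugging Face one. Below is a minimal, illustrative sketch of LoRA fine-tuning with automatic chat-template handling; the model name, target modules, and hyperparameters are assumptions, not LLM-Dojo's defaults.

```python
# Illustrative only: the Hugging Face LoRA pattern that frameworks like
# LLM-Dojo build on. Model name and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2-7B-Instruct"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA: train small low-rank adapters instead of all base weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of base parameters

# Chat-template adaptation: each tokenizer ships its own template, so a
# framework can format conversations without per-model string hacking.
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```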

Quick Start & Requirements

  • Installation/Execution: Primarily uses deepspeed for multi-GPU training or standard Python commands for single-GPU training. Example commands are provided in run_example.sh.
  • Prerequisites: Python, Hugging Face libraries, and DeepSpeed; a CUDA-capable GPU is recommended for efficient training.
  • Resources: Claims a single A100 GPU is sufficient for RLHF training; LoRA on Qwen-7B uses ~16GB VRAM with DeepSpeed ZeRO-3 (a hedged config sketch follows this list).
  • Documentation: Detailed explanations for RLHF, SFT, and VLM are available within their respective directories. Technical tricks and explanations are in the llm_tricks folder.
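For context on the ZeRO-3 setting mentioned above, here is a hedged sketch of what such a DeepSpeed configuration typically looks like, written as a Python dict; the repository's actual configs and launch flags live alongside run_example.sh and may differ.

```python
# Hypothetical ZeRO-3 config of the kind the VRAM figure above refers to;
# the project's real config files (referenced from run_example.sh) may differ.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # ZeRO-3 shards parameters, gradients, and optimizer state
        "offload_param": {"device": "none"},      # set "cpu" to trade speed for VRAM
        "offload_optimizer": {"device": "none"},
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "gradient_clipping": 1.0,
}

# Typical launch shape (script name and flags are assumptions, not the
# repo's actual commands; those are in run_example.sh):
#   deepspeed --num_gpus=2 train.py --deepspeed ds_config.json
# Single-GPU runs use a plain `python train.py ...` invocation instead.
```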

Highlighted Details

  • Supports a wide range of LLMs including Qwen, Llama, Yi, Gemma, and Phi-3.
  • RLHF framework includes Knowledge Distillation, DPO, CPO, SimPO, KTO, and Rejected Sampling (a minimal DPO-loss sketch follows this list).
  • VLM module supports Qwen2-VL and LLaVA for Visual Question Answering.
  • Includes a llm_tricks section for implementing and explaining advanced LLM techniques from scratch.
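To make the DPO entry above concrete, here is a from-scratch sketch of the standard DPO objective (Rafailov et al., 2023) in PyTorch, in the spirit of the llm_tricks folder; it is illustrative only and not LLM-Dojo's implementation.

```python
# Sketch of the standard DPO objective; not LLM-Dojo's code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each input is the summed log-probability of a chosen/rejected
    response under the trained policy or the frozen reference model."""
    # How strongly the policy prefers the chosen response over the rejected one.
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    # The same preference measured under the frozen reference model.
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Push the policy's margin beyond the reference's via a logistic loss.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

# Toy usage: fake log-probs for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.9]))
print(loss)  # scalar; shrinks as the policy's preference margin grows
```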

Maintenance & Community

The project actively lists recent updates and new feature implementations, indicating ongoing development. Contributions via Issues and Pull Requests are encouraged. Links to related technical articles on Zhihu and Medium are provided for deeper understanding.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is under active development; documentation for some features, such as Rejected Sampling, is still pending. While it aims for ease of use and modification, the rapid addition of new techniques may lead to occasional breaking changes or incomplete documentation for the latest additions.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Yaowei Zheng (Author of LLaMA-Factory), and 2 more.

Explore Similar Projects

tunix by google

Top 1.4% on SourcePulse · 2k stars
JAX-native library for efficient LLM post-training
Created 7 months ago · Updated 19 hours ago
Starred by Jeff Huber (Cofounder of Chroma), Vincent Weisser (Cofounder of Prime Intellect), and 31 more.

trl by huggingface

Top 0.6% on SourcePulse · 16k stars
Library for transformer reinforcement learning (RL)
Created 5 years ago · Updated 1 day ago