MPP-LLaVA by Coobiw

An MLLM framework for training LLaVA-like models on limited hardware

Created: 1 year ago
Stars: 472
Top 64.7% on SourcePulse

Project Summary

This project provides a framework for training and deploying multimodal large language models (MLLMs) built on the QwenLM architecture, specifically targeting users with limited hardware resources (e.g., 24GB RTX 3090/4090 GPUs). It supports single-image, multi-image, and video question answering as well as multi-turn conversation, with the goal of making advanced MLLM training accessible for personal projects.

How It Works

The core innovation is its implementation of Pipeline Parallelism (PP) combined with Data Parallelism (DP) via DeepSpeed. Pipeline parallelism splits the model's layers across multiple GPUs, so models that would not fit in a single GPU's memory can still be trained. The framework supports a two-stage training process mirroring LLaVA's pretrain and supervised fine-tuning (SFT) stages, and offers custom data formats for continued training.
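To make the PP+DP idea concrete, here is a minimal, self-contained sketch of how a DeepSpeed pipeline can be assembled. It is not taken from this repository: the Block module, layer and stage counts, and the training config values are illustrative placeholders.

```python
# Minimal sketch of DeepSpeed pipeline parallelism (PP) combined with data
# parallelism (DP). Illustrative only, not the repository's code: the Block
# module, layer count, stage count, and config values are placeholders.
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec


class Block(nn.Module):
    """Stand-in for one transformer layer of the language model."""
    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.ff = nn.Linear(hidden, hidden)

    def forward(self, x):
        return torch.relu(self.ff(x))


def build_pipeline(num_layers: int = 24, num_stages: int = 2) -> PipelineModule:
    # LayerSpec delays construction until the owning pipeline stage builds its
    # partition, so the whole model never has to fit on one GPU.
    layers = [LayerSpec(Block) for _ in range(num_layers)]
    return PipelineModule(layers=layers, num_stages=num_stages, loss_fn=nn.MSELoss())


if __name__ == "__main__":
    deepspeed.init_distributed()  # required before building the PipelineModule
    model = build_pipeline()

    # With world_size GPUs and num_stages pipeline stages, DeepSpeed forms
    # world_size // num_stages data-parallel replicas of the pipeline.
    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=[p for p in model.parameters() if p.requires_grad],
        config={
            "train_batch_size": 8,
            "train_micro_batch_size_per_gpu": 1,
            "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
        },
    )

    # train_batch() schedules the forward/backward micro-batches across stages.
    fake_data = [(torch.randn(1, 1024), torch.randn(1, 1024)) for _ in range(64)]
    for _ in range(4):
        engine.train_batch(data_iter=iter(fake_data))
```

Launched with the DeepSpeed runner (e.g. `deepspeed --num_gpus=2 train_pp_sketch.py`, where the script name is hypothetical), the pipeline stages are placed on separate GPUs and any remaining GPUs provide data-parallel replicas.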

Quick Start & Requirements

Health Check

Last Commit: 6 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 30 days
