MPP-LLaVA by Coobiw

A framework for training LLaVA-like multimodal LLMs on limited hardware

Created 1 year ago · 463 stars · Top 66.4% on sourcepulse

Project Summary

This project provides a framework for training and deploying multimodal large language models (MLLMs) based on the QwenLM architecture, specifically targeting users with limited hardware resources (e.g., RTX 3090/4090 24GB GPUs). It enables multimodal capabilities including single-image, multi-image, and video-based question answering and multi-turn conversations, with the goal of making advanced MLLM training accessible for personal projects.

How It Works

The framework's core technique is Pipeline Parallelism (PP) combined with Data Parallelism (DP), implemented with DeepSpeed. Pipeline parallelism partitions the model's layers across multiple GPUs, so models too large for a single GPU's memory can still be trained; data parallelism then replicates the pipeline to scale throughput. Training follows a two-stage process mirroring LLaVA's pretraining and supervised fine-tuning (SFT) stages, and the framework supports custom data formats for continued training. A minimal sketch of the layer partitioning appears below.
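To make the mechanism concrete, here is a minimal sketch of DeepSpeed pipeline parallelism. The `Block` class, layer count, dimensions, and config path are illustrative stand-ins, not the repository's actual Qwen modules or files:

```python
# Minimal sketch of DeepSpeed pipeline parallelism (illustrative only;
# Block, the layer count, and sizes stand in for the repo's Qwen modules).
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec


class Block(nn.Module):
    """Stand-in transformer-style block with a residual feed-forward."""
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.ff(x)


# The model is expressed as a flat list of layers; DeepSpeed slices this
# list into `num_stages` contiguous partitions, one per GPU.
layers = [LayerSpec(Block, 1024) for _ in range(24)]

model = PipelineModule(
    layers=layers,
    num_stages=2,                  # e.g. two 24GB consumer GPUs
    loss_fn=nn.MSELoss(),
)

# deepspeed.initialize combines the pipeline stages with data parallelism
# across replicas, driven by a JSON/dict config (micro-batch size,
# gradient accumulation, precision, optimizer, ...).
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",       # hypothetical path, not the repo's file
)
# Run under the DeepSpeed launcher, e.g.: deepspeed --num_gpus 2 train.py
```

With two stages, each GPU holds only half the layers, which is what lets a model larger than 24GB of parameters and activations train on consumer cards.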

Quick Start & Requirements
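
See the repository for the exact setup commands. As a rough sketch, a DeepSpeed configuration for memory-constrained pipeline training on 24GB GPUs typically looks like the following; every value here is an assumption, not the repository's shipped config:

```python
# Illustrative DeepSpeed config for memory-constrained pipeline training;
# all values are assumptions, not the repository's actual settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # small micro-batches fit 24GB cards
    "gradient_accumulation_steps": 16,     # recover a useful effective batch size
    "bf16": {"enabled": True},             # reduce weight/activation memory
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4, "weight_decay": 0.01},
    },
}
# Pass via deepspeed.initialize(..., config=ds_config) in place of a JSON file.
```

Small micro-batches plus gradient accumulation is the usual lever for fitting pipeline stages into consumer-GPU memory.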

Health Check

Last commit: 4 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star history: 24 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), and 3 more.

pixel-cnn by openai

Top 0.1% · 2k stars
TensorFlow implementation of the PixelCNN++ research paper
Created 9 years ago · updated 5 years ago