MLLM for training LLaVA-like models on limited hardware
This project provides a framework for training and deploying multimodal large language models (MLLMs) based on the QwenLM architecture, specifically targeting users with limited hardware resources (e.g., RTX 3090/4090 24GB GPUs). It enables multimodal capabilities including single-image, multi-image, and video-based question answering and multi-turn conversations, with the goal of making advanced MLLM training accessible for personal projects.
How It Works
The core innovation lies in its implementation of Pipeline Parallelism (PP) combined with Data Parallelism (DP) using DeepSpeed. This approach distributes the model's layers across multiple GPUs, allowing models to be trained that would not otherwise fit in a single GPU's memory. The framework supports a two-stage training process, mirroring LLaVA's pretrain and supervised fine-tuning (SFT) stages, and offers custom data formats for continued training.
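As a rough illustration of how such a pipeline split can be expressed with DeepSpeed, the sketch below partitions a toy decoder-only model across two stages. The layer classes, stage count, and config path are assumptions for illustration only, not the project's actual model or partitioning.

```python
# Minimal sketch: pipeline parallelism with DeepSpeed's PipelineModule.
# Layer classes, stage count, and "ds_config.json" are illustrative assumptions.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec


class EmbeddingLayer(nn.Module):  # hypothetical stand-in for the LLM embedding
    def __init__(self, vocab_size=32000, hidden=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids):
        return self.embed(input_ids)


class DecoderBlock(nn.Module):  # hypothetical stand-in for one transformer block
    def __init__(self, hidden=4096):
        super().__init__()
        self.ff = nn.Linear(hidden, hidden)

    def forward(self, hidden_states):
        return self.ff(hidden_states)


# Distributed setup is required before building the pipeline
# (normally handled by launching with the `deepspeed` CLI).
deepspeed.init_distributed()

# Express the model as an ordered list of layers so DeepSpeed can split it
# across GPUs; num_stages=2 would map to e.g. two 24 GB cards.
layers = [LayerSpec(EmbeddingLayer)] + [LayerSpec(DecoderBlock) for _ in range(32)]
model = PipelineModule(layers=layers, num_stages=2)

# deepspeed.initialize wires up the combined pipeline + data-parallel engine
# from a JSON/dict config (micro-batch size, ZeRO stage, optimizer, etc.).
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config="ds_config.json",  # hypothetical config path
)
```

With this layout, DeepSpeed handles micro-batch scheduling between the stages, so data parallelism can be layered on top simply by launching more ranks per stage.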
Quick Start & Requirements
Install the package with `pip install -e .` within a Python 3.8 conda environment. See WEIGHT.md and DATA.md for preparing model weights and training data. Inference supports automatic device placement (`device_map="auto"` for the LLM) and CPU inference.
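As a rough sketch of the two inference placements mentioned above, the LLM component can be loaded via Hugging Face Transformers either with `device_map="auto"` or on CPU only. The checkpoint path below is a placeholder assumption; substitute the weights prepared per WEIGHT.md.

```python
# Sketch of the inference options above. The checkpoint path is a placeholder.
import torch
from transformers import AutoModelForCausalLM

ckpt = "path/to/your/mllm-checkpoint"  # hypothetical path

# Option 1: let accelerate spread the model across available GPUs.
model_gpu = AutoModelForCausalLM.from_pretrained(
    ckpt,
    device_map="auto",          # automatic layer placement across devices
    torch_dtype=torch.float16,  # half precision to fit 24 GB cards
)

# Option 2: CPU-only inference (slower, but no GPU required);
# omitting device_map loads the model on CPU by default.
model_cpu = AutoModelForCausalLM.from_pretrained(
    ckpt,
    torch_dtype=torch.float32,
)
```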