InternVL-U by OpenGVLab

Unified multimodal AI for understanding, generation, and editing

Created 1 month ago
263 stars

Top 96.8% on SourcePulse


Summary

InternVL-U is a 4B-parameter Unified Multimodal Model (UMM) designed to democratize advanced multimodal AI capabilities. It integrates understanding, reasoning, image generation, and editing into a single, efficient framework, targeting researchers and developers seeking a versatile tool for complex visual-AI tasks.

How It Works

The model employs a unified yet modular design, combining a state-of-the-art MLLM backbone with a specialized MMDiT-based visual generation head. It utilizes decoupled visual representations and modality-specific modules for flexibility. A key innovation is its high-quality data synthesis pipeline, leveraging Chain-of-Thought (CoT) to align abstract user intent with precise visual execution, particularly for challenging tasks like text rendering and scientific reasoning. This approach enables strong performance across generation, editing, understanding, and reasoning within a practical parameter scale.
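The "unified yet modular" layout described above can be sketched in plain Python. This is an illustrative toy only: the class and method names are hypothetical, not taken from the InternVL-U codebase. It shows the shape of the design, with one shared MLLM backbone handling understanding and reasoning, and a separate MMDiT-style head invoked only on the generation and editing paths.

```python
# Hypothetical sketch of a unified-but-modular multimodal model.
# All names below are illustrative placeholders, not InternVL-U APIs.

class MLLMBackbone:
    """Shared multimodal language model: encodes text and images for all tasks."""
    def encode(self, text, images=None):
        # Stands in for tokenization, vision encoding, and cross-modal fusion.
        return {"text": text, "images": images or []}

class MMDiTHead:
    """Diffusion-transformer head used only for image generation and editing."""
    def generate(self, condition):
        # Stands in for a denoising loop conditioned on backbone features.
        return f"<image conditioned on: {condition['text']}>"

class UnifiedModel:
    """Routes each task through the backbone, adding the visual head as needed."""
    def __init__(self):
        self.backbone = MLLMBackbone()
        self.gen_head = MMDiTHead()

    def __call__(self, task, text, images=None):
        cond = self.backbone.encode(text, images)
        if task in ("generate", "edit"):
            return self.gen_head.generate(cond)            # visual output path
        return f"<answer about {len(cond['images'])} image(s)>"  # text output path

model = UnifiedModel()
print(model("generate", "a cat reading a book"))
print(model("understand", "what is shown?", images=["img0", "img1"]))
```

The routing step is the point: decoupled visual representations mean the generation head can be swapped or dropped without touching the understanding path.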

Quick Start & Requirements

Installation requires pip install -r requirements.txt. Model checkpoints are available on Hugging Face. Inference requires a CUDA-enabled GPU and a PyTorch build that supports bfloat16 (models are loaded with torch_dtype=torch.bfloat16).
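A minimal setup sketch based on the summary above. The repository URL is an assumption inferred from the project and organization names; check the project's README for the authoritative clone path and checkpoint instructions.

```shell
# Hypothetical clone URL -- confirm against the project page.
git clone https://github.com/OpenGVLab/InternVL-U.git
cd InternVL-U

# Install dependencies as stated in the summary.
pip install -r requirements.txt
```

Checkpoints are then pulled from Hugging Face per the project's own instructions; a CUDA GPU and bfloat16-capable PyTorch are needed for inference.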

Highlighted Details

  • A 4B-parameter Unified Multimodal Model (UMM) supporting understanding, reasoning, generation, and editing.
  • Achieves performance exceeding open-source UMM baselines in generation and editing at its parameter scale.
  • Features a strong MLLM backbone integrated with an MMDiT visual generator.
  • Supports multi-image understanding inference.
  • Associated with dedicated evaluation tools (GenEditEvalKit, TextEdit Benchmark).

Maintenance & Community

Developed by the InternVL-U Team at Shanghai AI Laboratory. Recent updates in March 2026 indicate active development. No dedicated community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The software license is not specified. This omission requires clarification for any adoption decision, especially concerning commercial use or derivative works.

Limitations & Caveats

Inference requires a CUDA-enabled GPU. Other potential limitations, unsupported platforms, or known bugs are not detailed.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
9
Star History
134 stars in the last 30 days
