Unified multimodal model for understanding, generation, and editing
Ovis-U1 is a 3-billion-parameter unified model for multimodal understanding, text-to-image generation, and image editing. It targets researchers and developers working with complex visual and textual data, offering a single framework for all three tasks with performance the authors report as state-of-the-art.
How It Works
Ovis-U1 employs a diffusion-based visual decoder (MMDiT) and a bidirectional token refiner. This architecture facilitates high-fidelity image synthesis and improved interaction between text and vision modalities. The model is trained synergistically on a mixed dataset encompassing understanding, generation, and editing tasks, which the authors claim enhances generalization and accuracy in real-world scenarios.
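As a rough illustration of that data flow, the sketch below models a bidirectional token refiner as a non-causal transformer encoder whose output conditions a noise-prediction decoder. This is not the project's actual implementation: module names, dimensions, and the simple MLP decoder are assumptions made for clarity.

```python
# Minimal, illustrative sketch of the refine-then-condition data flow.
# All module names, sizes, and the MLP decoder are assumptions; the actual
# Ovis-U1 visual decoder is an MMDiT-style diffusion transformer.
import torch
import torch.nn as nn


class BidirectionalTokenRefiner(nn.Module):
    """Refines fused text/vision tokens with non-causal (bidirectional) attention."""

    def __init__(self, dim: int = 512, num_heads: int = 8, depth: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # No causal mask, so every token attends to every other token.
        return self.encoder(tokens)


class VisualDecoderStub(nn.Module):
    """Stand-in for the diffusion decoder: predicts the noise added to an
    image latent, conditioned on the refined multimodal tokens."""

    def __init__(self, latent_dim: int = 512, cond_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, noisy_latent: torch.Tensor, cond_tokens: torch.Tensor) -> torch.Tensor:
        pooled = cond_tokens.mean(dim=1)  # pool the conditioning tokens
        return self.net(torch.cat([noisy_latent, pooled], dim=-1))


# One denoising step on dummy data, just to show the shapes and data flow.
tokens = torch.randn(1, 16, 512)    # fused text + vision tokens
noisy_latent = torch.randn(1, 512)  # noised image latent
refined = BidirectionalTokenRefiner()(tokens)
noise_pred = VisualDecoderStub()(noisy_latent, refined)
print(noise_pred.shape)             # torch.Size([1, 512])
```

In the real model the decoder operates on image latents over many denoising steps; the point here is only the refine-then-condition structure that links the language and vision sides.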
Quick Start & Requirements
Install the dependencies listed in requirements.txt, and then install the package in editable mode (pip install -e .).
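After installation, the following is a minimal loading sketch. It assumes the checkpoint is published on the Hugging Face Hub as AIDC-AI/Ovis-U1-3B and loads through transformers with remote code enabled, as earlier Ovis releases do; check the project README for the authoritative entry point.

```python
# Sketch only: the repo id and loading pattern are assumptions modeled on
# earlier Ovis releases, not a guarantee of Ovis-U1's exact API.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis-U1-3B",        # assumed Hugging Face model id
    torch_dtype=torch.bfloat16,  # a 3B model fits comfortably in bf16 on one GPU
    trust_remote_code=True,      # Ovis models ship custom modeling code
)
model = model.to("cuda").eval()
```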
Highlighted Details
Maintenance & Community
The project is associated with AIDC-AI and acknowledges contributions from the Ovis and FLUX projects. Hiring information for researchers in multimodal AI is provided.
Licensing & Compatibility
Released under the Apache License 2.0, permitting commercial use and linking with closed-source projects.
Limitations & Caveats
The authors provide a disclaimer regarding potential copyright issues or improper content generation due to the complexity of the data and usage scenarios, advising users to contact them for any concerns.
Last update: 1 month ago. Activity status: inactive.