Ovis-U1 by AIDC-AI

Unified multimodal model for understanding, generation, and editing

Created 3 months ago
414 stars

Top 70.7% on SourcePulse

Project Summary

Ovis-U1 is a 3-billion-parameter unified model for multimodal understanding, text-to-image generation, and image editing. It targets researchers and developers who work with combined visual and textual data, and it handles all three tasks in a single framework. The authors describe its benchmark performance as state of the art.

How It Works

Ovis-U1 pairs a diffusion-based visual decoder (MMDiT) with a bidirectional token refiner. The decoder handles high-fidelity image synthesis, and the refiner improves interaction between the text and vision modalities. The model is trained jointly on a mixed dataset covering understanding, generation, and editing tasks, which the authors claim improves generalization and accuracy in real-world scenarios.

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment with Python 3.10, activate it, install dependencies from requirements.txt, and then install the package in editable mode (pip install -e .).
  • Prerequisites: Python 3.10, Torch 2.4.0, Transformers 4.51.3, DeepSpeed 0.15.4.
  • Demo: A browser-based demo is available.
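
Once the environment is set up, the pinned prerequisites above can be checked with a short script. The following is a minimal sketch, not part of the project: the function names (installed_version, check_environment) are hypothetical, the PyPI distribution names torch, transformers, and deepspeed are assumed, and treating the listed versions as exact pins is also an assumption, since other versions may work.

```python
import sys
from importlib import metadata

# Pins copied from the prerequisites list; the distribution names are assumed.
PINNED = {
    "torch": "2.4.0",
    "transformers": "4.51.3",
    "deepspeed": "0.15.4",
}
REQUIRED_PYTHON = (3, 10)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(pins=PINNED, lookup=installed_version, python=sys.version_info[:2]):
    """Return a list of human-readable mismatches; an empty list means the pins are met."""
    problems = []
    if tuple(python) != REQUIRED_PYTHON:
        problems.append(f"python: expected 3.10, found {python[0]}.{python[1]}")
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        # Local build tags such as "+cu121" are ignored when comparing.
        elif found.split("+")[0] != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment matches the listed prerequisites.")
```

Running the script inside the activated conda environment prints either a confirmation or one line per mismatch. The lookup parameter exists only so the check can be exercised without the packages installed.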

Highlighted Details

  • Reported benchmark scores: 69.6 on OpenCompass, 83.72 on DPG-Bench, and 4.00 on ImgEdit-Bench.
  • Reported GenEval score: 0.89 overall.
  • Offers example scripts for single-image understanding, multi-image understanding, text-to-image generation, and image editing.

Maintenance & Community

The project is associated with AIDC-AI and acknowledges contributions from the Ovis and FLUX projects. Hiring information for researchers in multimodal AI is provided.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and linking with closed-source projects.

Limitations & Caveats

The authors include a disclaimer: given the complexity of the data and of usage scenarios, the model may raise copyright issues or generate improper content. They ask users to contact them with any concerns.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 14 stars in the last 30 days
