UniPic by SkyworkAI

Unified visual model for understanding and generation

Created 7 months ago

854 stars

Top 41.7% on SourcePulse

Project Summary

SkyworkAI/UniPic presents a 1.5B-parameter unified autoregressive model for visual tasks, including image understanding, text-to-image generation, and image editing. It aims to provide a single, cohesive architecture for diverse visual AI applications, targeting researchers and developers in computer vision and generative AI.

How It Works

UniPic employs a unified autoregressive modeling approach, treating visual tasks as sequence-to-sequence problems. This allows a single model to handle diverse inputs and outputs, from image captions to generated images, by tokenizing and processing visual information alongside text. This unified architecture simplifies deployment and potentially improves cross-task generalization.
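As a rough illustration of the sequence framing described above, the sketch below packs text tokens and discrete image codes into one shared id space, so that a single autoregressive model could in principle read or emit either. This is a generic technique, not UniPic's confirmed implementation: the vocabulary sizes, the BOI/EOI marker tokens, and the helper names are hypothetical, and UniPic's actual visual tokenization may differ (for example, it may not use discrete codes at all).

```python
from typing import List, Tuple

# Hypothetical sizes, chosen only for illustration (not UniPic's real values).
TEXT_VOCAB_SIZE = 32_000     # ids 0..31_999 are reserved for text tokens
IMAGE_CODEBOOK_SIZE = 8_192  # image codes 0..8_191 from some visual tokenizer

# Hypothetical marker tokens placed after both vocabularies.
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # begin-of-image marker
EOI = BOI + 1                                # end-of-image marker


def pack_sequence(text_ids: List[int], image_codes: List[int]) -> List[int]:
    """Concatenate text ids and offset image codes into one id sequence."""
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_ids):
        raise ValueError("text id outside the text vocabulary")
    if any(not 0 <= c < IMAGE_CODEBOOK_SIZE for c in image_codes):
        raise ValueError("image code outside the codebook")
    # Shift image codes past the text range so the two id spaces never collide.
    shifted = [TEXT_VOCAB_SIZE + c for c in image_codes]
    return list(text_ids) + [BOI] + shifted + [EOI]


def split_sequence(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Inverse of pack_sequence: recover the text ids and raw image codes."""
    start = ids.index(BOI)
    end = ids.index(EOI, start)
    text_ids = ids[:start]
    image_codes = [i - TEXT_VOCAB_SIZE for i in ids[start + 1:end]]
    return text_ids, image_codes
```

With a layout like this, a captioning task would place image codes first and predict text, while text-to-image generation would place text first and predict image codes; the model and its vocabulary stay the same in both directions.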

Quick Start & Requirements

Create a Python 3.10.14 virtual environment, then install dependencies: pip install -r requirements.txt
Download the model checkpoints: huggingface-cli download Skywork/Skywork-UniPic-1.5B --local-dir checkpoint --repo-type model
Requires PyTorch and Hugging Face Hub.
Text-to-image generation and image editing require an image size of 1024.
Official checkpoints: 🤗 UniPic checkpoint
Tech Report: 📖 Tech Report
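The checkpoint download can also be scripted. The sketch below only assembles the huggingface-cli command listed in the steps above and runs it through subprocess. The function names and the default directory argument are illustrative, and it assumes huggingface-cli is already installed and on PATH.

```python
import subprocess
from typing import List

# Repository id taken from the download command in the quick-start steps.
REPO_ID = "Skywork/Skywork-UniPic-1.5B"


def build_download_command(local_dir: str = "checkpoint") -> List[str]:
    """Return the huggingface-cli invocation as an argument list."""
    return [
        "huggingface-cli", "download", REPO_ID,
        "--local-dir", local_dir,
        "--repo-type", "model",
    ]


def download_checkpoint(local_dir: str = "checkpoint") -> None:
    """Run the download; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_download_command(local_dir), check=True)
```

Calling download_checkpoint() performs the actual download into the checkpoint directory. Passing the command as an argument list rather than a single string avoids shell quoting issues if the target directory contains spaces.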

Highlighted Details

Achieves an overall score of 0.86 on GenEval, competitive with state-of-the-art models like BAGEL† (0.88) and Ovis-U1 (0.89).
Demonstrates strong performance on DPG-Bench, with an overall score of 85.50, outperforming many diffusion and autoregressive models.
Image editing capabilities are experimental and not production-ready, facing challenges in precision, control, and consistency.
Supports image generation, image editing, and image-to-text tasks.

Maintenance & Community

The project is associated with SkyworkAI. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The image editing functionality is explicitly described as an exploratory research module that is not production-ready, with known issues in precision, control, and consistency.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

19 stars in the last 30 days