UniPic by SkyworkAI

Unified visual model for understanding and generation

Created 1 month ago
785 stars

Top 44.6% on SourcePulse

Project Summary

SkyworkAI/UniPic presents a 1.5B-parameter unified autoregressive model for visual tasks, including image understanding, text-to-image generation, and image editing. It aims to provide a single, cohesive architecture for diverse visual AI applications, targeting researchers and developers in computer vision and generative AI.

How It Works

UniPic employs a unified autoregressive modeling approach, treating visual tasks as sequence-to-sequence problems. This allows a single model to handle diverse inputs and outputs, from image captions to generated images, by tokenizing and processing visual information alongside text. This unified architecture simplifies deployment and potentially improves cross-task generalization.
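As a rough illustration of the sequence-to-sequence framing described above, the toy sketch below packs text tokens and image tokens into one flat sequence and derives next-token prediction pairs from it. It is a generic illustration, not UniPic's actual tokenizer or training code: the vocabulary size, the marker ids (BOI, EOI), and the helper names are all hypothetical.

```python
# Toy illustration only; every constant and helper here is hypothetical
# and not taken from the UniPic code base.
TEXT_VOCAB = 1000           # assumed number of text token ids (0..999)
IMAGE_OFFSET = TEXT_VOCAB   # image codes are shifted past the text ids
BOI, EOI = 2000, 2001       # assumed begin/end-of-image marker ids


def build_sequence(text_ids, image_codes, image_first):
    """Merge text ids and image codes into one flat token sequence."""
    image_part = [BOI] + [IMAGE_OFFSET + c for c in image_codes] + [EOI]
    text_part = list(text_ids)
    # Understanding (image -> text) places the image first;
    # generation (text -> image) places the text first.
    return image_part + text_part if image_first else text_part + image_part


def next_token_pairs(sequence):
    """Return (context, target) pairs as used in next-token prediction."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Text-to-image ordering: the prompt tokens come before the image tokens.
t2i = build_sequence([5, 6], [0, 1], image_first=False)
# Image-to-text ordering: the image tokens come before the caption tokens.
i2t = build_sequence([5, 6], [0, 1], image_first=True)
```

In this framing the only difference between the two tasks is the order of the segments, which is what lets one autoregressive model serve both directions.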

Quick Start & Requirements

  • Create a Python 3.10.14 virtual environment, then install dependencies via pip install -r requirements.txt.
  • Download model checkpoints using huggingface-cli download Skywork/Skywork-UniPic-1.5B --local-dir checkpoint --repo-type model.
  • Requires PyTorch and Hugging Face Hub.
  • Text-to-image generation and image editing require an image size of 1024.
  • Official checkpoints: 🤗 UniPic checkpoint
  • Tech Report: 📖 Tech Report
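
The checkpoint download listed above can also be scripted from Python. The sketch below is one possible equivalent of the huggingface-cli command, assuming the huggingface_hub package is installed; the helper names download_kwargs and download_checkpoint are hypothetical and do not come from the UniPic repository.

```python
# Repository id and target directory are taken from the CLI command above.
REPO_ID = "Skywork/Skywork-UniPic-1.5B"
LOCAL_DIR = "checkpoint"


def download_kwargs(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Build the keyword arguments mirroring the CLI flags."""
    return {"repo_id": repo_id, "repo_type": "model", "local_dir": local_dir}


def download_checkpoint(**overrides):
    """Download the checkpoint files and return the local folder path.

    Hypothetical helper: it needs network access and the huggingface_hub
    package, which is imported lazily so the module loads without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(**{**download_kwargs(), **overrides})
```

Calling download_checkpoint() should place the files in the checkpoint directory, and passing local_dir="some/other/path" overrides the target location.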

Highlighted Details

  • Achieves an overall score of 0.86 on GenEval, competitive with state-of-the-art models like BAGEL† (0.88) and Ovis-U1 (0.89).
  • Demonstrates strong performance on DPG-Bench, with an overall score of 85.50, outperforming many diffusion and autoregressive models.
  • Image editing capabilities are experimental and not production-ready, facing challenges in precision, control, and consistency.
  • Supports image generation, image editing, and image-to-text tasks.

Maintenance & Community

The project is associated with SkyworkAI. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README describes image editing as an exploratory research module that is not production-ready, with known issues in precision, control, and consistency.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 6
  • Star history: 58 stars in the last 30 days
