pico-banana-400k  by apple

Dataset advances text-guided image editing capabilities

Created 2 weeks ago

1,469 stars

Top 27.9% on SourcePulse

Project Summary

Pico-Banana-400K is a large-scale dataset of ~400K text-image-edit triplets for text-guided image editing research. It supports training and evaluating image manipulation models, and targets researchers and engineers working on controllable, instruction-aware, and conversational editing.

How It Works

Dataset construction uses a two-stage pipeline. Gemini-2.5-Flash generates natural-language editing instructions for Open Images photos, accompanied by Qwen-2.5-Instruct-7B summaries. Nano-Banana then performs the edits, and Gemini-2.5-Pro scores each result automatically on four criteria: Instruction Compliance, Realism, Preservation, and Technical Quality. Edits scoring roughly 0.7 or higher are included; failure cases are retained for robustness and preference learning, giving quality-controlled supervision.
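
The score-based filtering described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the equal weights, the hard 0.7 cutoff, and the names QualityScores, overall_score, and route_edit are all assumptions, since the summary only names the four criteria and an approximate threshold.

```python
from dataclasses import dataclass

# Hypothetical equal weights: the summary names the four criteria
# but does not say how they are combined.
WEIGHTS = {
    "instruction_compliance": 0.25,
    "realism": 0.25,
    "preservation": 0.25,
    "technical_quality": 0.25,
}
THRESHOLD = 0.7  # approximate cutoff ("~0.7+") taken from the summary


@dataclass
class QualityScores:
    """Per-criterion scores in [0, 1], as an automated judge might return them."""
    instruction_compliance: float
    realism: float
    preservation: float
    technical_quality: float


def overall_score(scores: QualityScores) -> float:
    """Weighted sum of the four criterion scores."""
    return sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())


def route_edit(scores: QualityScores) -> str:
    """Label an edit 'accepted' or 'failure'; failures are kept, not discarded."""
    return "accepted" if overall_score(scores) >= THRESHOLD else "failure"
```

Under these assumptions, an edit labelled "failure" would still be stored alongside its instruction, which is what makes it usable as the rejected side of a preference pair.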

Quick Start & Requirements

To acquire the dataset, download the manifest files from Apple's public CDN. Retrieving the source images requires awscli for the Open Images archives (train_0.tar.gz, train_1.tar.gz) and wget for the metadata. A Python script (map_openimage_url_to_local.py) maps image URLs to local paths. Prerequisites are awscli, wget, Python, and archive utilities. Expect substantial download and extraction time, plus the storage to hold the extracted archives. No direct links to quick-start guides, documentation, or demos are provided.
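
To give a feel for the URL-to-local-path step, here is a minimal sketch of one way such a mapping could work. It is not the repository's map_openimage_url_to_local.py, whose internals this summary does not describe. The functions build_index and map_url_to_local are hypothetical, and the sketch assumes that the last path segment of each image URL matches the file name of an extracted .jpg.

```python
from pathlib import Path
from typing import Dict, Optional
from urllib.parse import urlparse


def build_index(image_root: Path) -> Dict[str, Path]:
    """Index every extracted .jpg under image_root by its file name.

    Assumption: file names are unique across the extracted archives.
    """
    return {path.name: path for path in image_root.rglob("*.jpg")}


def map_url_to_local(url: str, index: Dict[str, Path]) -> Optional[Path]:
    """Return the local file matching the URL's last path segment, or None."""
    filename = Path(urlparse(url).path).name  # query string is ignored
    return index.get(filename)
```

Building the index once and reusing it avoids rescanning the extracted directories for each manifest entry; a None result flags an image that still needs to be downloaded or extracted.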

Highlighted Details

  • Scale: ~400K data points (~257K SFT, ~56K preference, ~72K multi-turn).
  • Edit Diversity: 35 operations across 8 categories, from pixel adjustments to scene composition and style transfer.
  • Image Quality: Source images from Open Images; edits/results are 512–1024 px.
  • Generation & Evaluation: Gemini-2.5-Flash for instructions, Gemini-2.5-Pro for quality assessment.

Maintenance & Community

No specific details on maintenance, community channels, or active development beyond the initial release are provided.

Licensing & Compatibility

Released under CC BY-NC-ND 4.0, which permits research and other non-commercial use only. Commercial use and redistribution of derivatives are prohibited. Source images follow CC BY 2.0. The dataset is therefore not suitable for commercial applications or closed-source product integration.

Limitations & Caveats

The CC BY-NC-ND 4.0 license is the primary limitation, since it prohibits commercial use and derivative works. Data acquisition is also non-trivial: it requires specific tools and large downloads.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 12

Star History

1,491 stars in the last 14 days
