apple: Dataset advances text-guided image editing capabilities
Top 27.9% on SourcePulse
Pico-Banana-400K is a large-scale dataset of ~400K text-image-edit triplets, advancing text-guided image editing research. It enables training and evaluation of sophisticated image manipulation models, targeting researchers and engineers seeking controllable, instruction-aware, and conversational editing capabilities.
How It Works
Dataset construction follows a two-stage pipeline. In the first stage, Gemini-2.5-Flash generates natural-language editing instructions for images from Open Images, with Qwen-2.5-Instruct-7B supplying summarized versions. In the second stage, Nano-Banana performs the edits and Gemini-2.5-Pro scores each result automatically on four criteria: Instruction Compliance, Realism, Preservation, and Technical Quality. Edits scoring roughly 0.7 or higher are included; failure cases are retained to support robustness and preference learning, so the dataset offers quality-controlled supervision for both successful and failed edits.
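The filtering step can be sketched as follows. This is a minimal illustration, not the project's actual scoring code: the `EditScores` class, the `partition_edits` function, and the unweighted mean used to combine the four criteria are all assumptions, and the real pipeline may weight the criteria differently. Only the four criterion names and the approximate 0.7 cutoff come from the description above.

```python
from dataclasses import dataclass
from statistics import mean

# Approximate cutoff from the description ("~0.7+"); the exact value is not given.
QUALITY_THRESHOLD = 0.7


@dataclass
class EditScores:
    """Hypothetical container for the four judged criteria, each in [0, 1]."""
    instruction_compliance: float
    realism: float
    preservation: float
    technical_quality: float

    def overall(self) -> float:
        # Assumption: unweighted mean. The real aggregation may use weights.
        return mean([
            self.instruction_compliance,
            self.realism,
            self.preservation,
            self.technical_quality,
        ])


def partition_edits(scored):
    """Split (edit_id, EditScores) pairs into accepted ids and retained failures."""
    accepted, failures = [], []
    for edit_id, scores in scored:
        if scores.overall() >= QUALITY_THRESHOLD:
            accepted.append(edit_id)
        else:
            # Failures are kept rather than discarded, e.g. for preference learning.
            failures.append(edit_id)
    return accepted, failures


if __name__ == "__main__":
    sample = [
        ("edit-a", EditScores(0.9, 0.8, 0.9, 0.8)),
        ("edit-b", EditScores(0.4, 0.6, 0.7, 0.5)),
    ]
    print(partition_edits(sample))
```

Keeping the failures in a separate list rather than dropping them mirrors the description: a rejected edit can be paired with an accepted edit of the same source image as a negative example.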
Quick Start & Requirements
Dataset acquisition starts with downloading manifest files from Apple's public CDN. Retrieving the source images requires awscli for the Open Images archives (train_0.tar.gz, train_1.tar.gz) and wget for the metadata. A Python script (map_openimage_url_to_local.py) maps image URLs to local paths. Prerequisites are awscli, wget, Python, and archive utilities. Expect substantial download and extraction time, plus significant storage. No direct links to quick-start guides, documentation, or demos are provided.
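The URL-to-local-path mapping can be illustrated roughly as below. This is a hypothetical sketch, not the contents of map_openimage_url_to_local.py: it assumes each extracted image keeps the file name that appears at the end of its URL, and the function names, directory layout, and example URL are invented for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def url_to_local_path(url: str, image_root: Path) -> Path:
    """Map an image URL to a path under image_root using the URL's file name.

    Assumption: extracted files are named after the last URL path segment.
    """
    filename = PurePosixPath(urlparse(url).path).name
    if not filename:
        raise ValueError(f"URL has no file name component: {url!r}")
    return image_root / filename


def build_mapping(urls, image_root: Path):
    """Return ({url: local_path}, [urls with no matching local file])."""
    mapping, missing = {}, []
    for url in urls:
        local = url_to_local_path(url, image_root)
        if local.is_file():
            mapping[url] = local
        else:
            # Report gaps instead of failing, since archives may be partial.
            missing.append(url)
    return mapping, missing


if __name__ == "__main__":
    # Hypothetical URL and directory, for illustration only.
    urls = ["https://example.com/images/abc123.jpg"]
    found, not_found = build_mapping(urls, Path("open_images/train_0"))
    print(f"{len(found)} mapped, {len(not_found)} missing")
```

Collecting unmatched URLs in a list makes it easy to see which archives still need to be downloaded or extracted before training.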
Highlighted Details
Maintenance & Community
No specific details on maintenance, community channels, or active development beyond the initial release are provided.
Licensing & Compatibility
Released under CC BY-NC-ND 4.0, which permits research and non-commercial use only. Commercial use and redistribution of derivatives are prohibited. Source images follow CC BY 2.0. Use in commercial applications or closed-source products is therefore not supported.
Limitations & Caveats
The CC BY-NC-ND 4.0 license is the primary limitation, since it prohibits commercial use and derivative works. Data acquisition is also a non-trivial hurdle: it requires specific tools and large downloads.