HorizonWind2004
Self-supervised learning for enhanced unified multimodal models
Top 89.3% on SourcePulse
This repository implements "Reconstruction Alignment" (RecA), a self-supervised learning technique that unlocks the zero-shot potential of Unified Multimodal Models (UMMs). RecA significantly enhances task performance and image editing capabilities, targeting researchers and engineers aiming to maximize UMM efficiency.
How It Works
RecA applies a self-supervised reconstruction-alignment objective to boost UMM performance. It has been applied to the BAGEL, Harmon, Show-o, and OpenUni architectures, with substantial improvements reported on each. The method is described as reaching state-of-the-art results efficiently, often surpassing larger models on zero-shot benchmarks.
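The summary does not spell out the training objective, so the following is only a toy sketch of what a self-supervised reconstruction objective can look like, assuming the idea is to reconstruct an input from the model's own embedding of that input (an interpretation of the name, not confirmed here). The `encoder`, `generator`, and `train_reconstruction` names are hypothetical, both components are plain linear maps on small vectors, and nothing below comes from the RecA code base.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_reconstruction(steps=300, dim=4, emb=6, lr=0.02, seed=0):
    """Toy loop: learn to rebuild each input from its own embedding.

    Assumption: a frozen linear "understanding encoder" and a trainable
    linear "generator" stand in for the two halves of a unified model.
    No labels or captions are used; the input itself is the target.
    """
    rng = random.Random(seed)
    # Hypothetical frozen encoder: a fixed random emb x dim matrix.
    encoder = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(emb)]
    # Hypothetical trainable generator: a dim x emb matrix, starting at zero.
    generator = [[0.0] * emb for _ in range(dim)]
    # Stand-in "images": small random vectors.
    images = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]

    losses = []
    for _ in range(steps):
        total = 0.0
        for x in images:
            z = matvec(encoder, x)        # embedding of the input
            x_hat = matvec(generator, z)  # reconstruction from the embedding
            total += mse(x_hat, x)
            # Gradient of the MSE with respect to generator[i][j].
            for i in range(dim):
                grad_i = 2.0 * (x_hat[i] - x[i]) / dim
                for j in range(emb):
                    generator[i][j] -= lr * grad_i * z[j]
        losses.append(total / len(images))
    return losses
```

Calling `train_reconstruction()` returns the average reconstruction loss per pass over the toy data. In this setup the loss should fall as the generator learns to invert the frozen encoder, which is the only behavior the sketch is meant to illustrate.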
Quick Start & Requirements
Notebook: BAGEL/inference.ipynb.
Links: arXiv (arxiv.org/pdf/2509.07295), Project Page (reconstruction-alignment.github.io/), HF Models (huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068), HF Demo (huggingface.co/spaces/sanaka87/BAGEL-RecA).
Highlighted Details
Maintenance & Community
Recent September 2025 updates indicate active development. Contact via email (sanaka@berkeley.edu, xdwang@eecs.berkeley.edu); issues recommended for implementation questions. No dedicated community channels or roadmap links are provided.
Licensing & Compatibility
The repository uses mixed licensing. Most of it, including the BAGEL and Show-o components, is under the Apache License, while the Harmon and OpenUni components use the S-Lab license. Users must comply with both sets of terms and should note that the S-Lab license may restrict commercial use.
Limitations & Caveats
Training code for Show-o and OpenUni architectures is pending release. Future work includes scaling BAGEL training and supporting new UMM architectures like Show-o2. The S-Lab license terms for Harmon/OpenUni require further investigation for commercial applications.