ByteDance-Seed
Unified multimodal foundation model
Top 9.4% on SourcePulse
BAGEL is an open-source unified multimodal foundation model designed for both understanding and generation tasks. It targets state-of-the-art performance on benchmarks for visual understanding, text-to-image generation, and image editing, and is intended for researchers and developers working with multimodal AI.
How It Works
BAGEL is a model with 7B active parameters (14B total), trained on large-scale interleaved multimodal data. Its architecture supports advanced capabilities such as free-form visual manipulation, multiview synthesis, and world navigation, which positions it as a "world-modeling" system that goes beyond traditional image editing. The model offers fine-grained control over generation through parameters such as cfg_text_scale and cfg_image_scale, plus several cfg_renorm_type options, which together manage text and image guidance during the diffusion process.
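To make the guidance parameters more concrete, here is a minimal sketch of how two guidance scales and a renormalization step can interact in classifier-free guidance. It is a generic formulation, not BAGEL's actual code: the function names guide and renorm_global, the order in which the two scales are applied, and the choice of a whole-vector norm rescale are all assumptions made for illustration. The text_scale and image_scale arguments are only loosely analogous to cfg_text_scale and cfg_image_scale.

```python
import math


def guide(cond, no_text, no_image, text_scale, image_scale):
    """Combine three model predictions using two guidance scales.

    cond     : prediction with both text and image context
    no_text  : prediction with the text prompt dropped
    no_image : prediction with the image context dropped
    All names here are hypothetical; this is a generic sketch.
    """
    # Step 1: move away from the text-free prediction.
    text_guided = [u + text_scale * (c - u) for c, u in zip(cond, no_text)]
    # Step 2: move away from the image-free prediction.
    return [u + image_scale * (t - u) for t, u in zip(text_guided, no_image)]


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def renorm_global(guided, cond, min_ratio=0.0):
    """Shrink the guided prediction so its norm does not exceed cond's norm.

    One possible renormalization (assumed): a single ratio for the whole
    vector, clamped to the range [min_ratio, 1.0].
    """
    ratio = l2_norm(cond) / (l2_norm(guided) + 1e-8)
    ratio = max(min_ratio, min(1.0, ratio))
    return [x * ratio for x in guided]


if __name__ == "__main__":
    cond = [1.0, 2.0]
    no_text = [0.0, 0.0]
    no_image = [0.5, 0.5]
    guided = guide(cond, no_text, no_image, text_scale=4.0, image_scale=2.0)
    print(guided)
    print(renorm_global(guided, cond))
```

With both scales set to 1.0 the guided prediction equals the fully conditioned one, so guidance is effectively off. Larger scales amplify the difference between predictions, and the renormalization step keeps the amplified result from growing far beyond the magnitude of the conditioned prediction.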
Quick Start & Requirements
Set up a Python environment (python=3.10), and install requirements (pip install -r requirements.txt flash_attn==2.5.8 --no-build-isolation).
Download the model weights with snapshot_download.
Run python app.py with options for VRAM (32GB+ for full, 12-32GB with NF4 quantization).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats