Advanced multimodal LLM for text, image, and video
Valley is a multimodal large language model developed by ByteDance, designed to process and understand text, images, and video data. It targets researchers and developers seeking advanced capabilities in multimodal AI, offering strong performance on e-commerce and short-video benchmarks, and achieving top rankings on leaderboards like OpenCompass for models under 10 billion parameters.
How It Works
The base Valley model pairs a Siglip vision encoder with a Qwen2.5 language model and builds its projector from LargeMLP and ConvAdapter. The "Valley-Eagle" variant adds a second VisionEncoder (Qwen2vl) that runs in parallel with the first and allows the number of visual tokens to be adjusted. This addition is intended to improve performance in challenging or "extreme" scenarios.
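The two-branch idea can be illustrated with a toy sketch in plain Python. It is not Valley's code: the function names, the two-number "embeddings", and the choice to concatenate the two token sequences are all assumptions made for illustration, since the summary does not say how the branches are merged. The point is only that one branch emits a fixed number of tokens while the parallel branch emits as many as a caller-supplied budget allows.

```python
from typing import List

Token = List[float]  # one visual token, modeled as a tiny toy embedding


def fixed_branch(image: List[float], num_tokens: int = 4) -> List[Token]:
    """Hypothetical stand-in for the first encoder plus projector.

    Always emits `num_tokens` tokens, regardless of the image size.
    """
    mean = sum(image) / len(image)
    return [[mean, float(i)] for i in range(num_tokens)]


def adjustable_branch(image: List[float], token_budget: int) -> List[Token]:
    """Hypothetical stand-in for the parallel encoder.

    Splits the image into `token_budget` chunks and emits one token per chunk,
    so the caller controls how many tokens this branch contributes.
    """
    n = len(image)
    budget = max(1, min(token_budget, n))  # clamp so every chunk is non-empty
    tokens = []
    for i in range(budget):
        start, end = i * n // budget, (i + 1) * n // budget
        part = image[start:end]
        tokens.append([sum(part) / len(part), float(i)])
    return tokens


def visual_tokens(image: List[float], token_budget: int) -> List[Token]:
    """Assumed merge step: concatenate the tokens from both branches."""
    return fixed_branch(image) + adjustable_branch(image, token_budget)


if __name__ == "__main__":
    image = [float(x) for x in range(16)]  # a flat toy "image"
    for budget in (2, 8):
        print(budget, len(visual_tokens(image, budget)))
```

Raising the budget lengthens the visual sequence passed to the language model, which is the cost side of the trade-off: more tokens can carry more detail but take more compute.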
Quick Start & Requirements
Installation involves setting up PyTorch 2.4.0 with CUDA 12.1 support and installing dependencies via requirements.txt. The repository provides Python code examples for performing inference with single images, multiple images, and video data, utilizing Hugging Face Transformers and a custom ValleyEagleChat class. Official links are available for Hugging Face, ModelScope, and the research paper.
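Before running the repository's examples, it can help to confirm that the installed PyTorch matches the version named above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the "cu121" build tag is an assumption: wheels from the PyTorch download index carry a suffix such as "+cu121", while other builds may have no suffix, in which case the CUDA version cannot be read from the version string.

```python
from importlib import metadata
from typing import Optional, Tuple

REQUIRED_TORCH = "2.4.0"     # version named in the Quick Start
REQUIRED_CUDA_TAG = "cu121"  # assumption: build tag used for CUDA 12.1 wheels


def parse_version(version: str) -> Tuple[Tuple[int, ...], Optional[str]]:
    """Split a string like '2.4.0+cu121' into ((2, 4, 0), 'cu121')."""
    release, _, local = version.partition("+")
    numbers = tuple(int(p) for p in release.split(".") if p.isdigit())
    return numbers, (local or None)


def check_torch(version: Optional[str]) -> str:
    """Return 'ok' or a short description of the mismatch."""
    if version is None:
        return "torch is not installed"
    numbers, local = parse_version(version)
    expected, _ = parse_version(REQUIRED_TORCH)
    if numbers != expected:
        return f"torch {version} found, {REQUIRED_TORCH} expected"
    if local is None:
        return "ok (CUDA build not visible in version string)"
    if local != REQUIRED_CUDA_TAG:
        return f"torch {version} found, build tag {REQUIRED_CUDA_TAG} expected"
    return "ok"


def installed_torch_version() -> Optional[str]:
    """Read the installed torch version without importing torch itself."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(check_torch(installed_torch_version()))
```

This only inspects package metadata. It does not verify that a GPU or a working CUDA driver is present.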
Highlighted Details
Maintenance & Community
The project is developed by ByteDance's Tiktop-Ecommerce Team, with hiring efforts noted for Beijing, Shanghai, Hangzhou, and Singapore locations. No specific community channels (e.g., Discord, Slack) or public roadmaps are mentioned in the README.
Licensing & Compatibility
The open-source models are licensed under the Apache-2.0 license, which generally permits commercial use and integration into closed-source projects.
Limitations & Caveats
The provided README does not explicitly detail any limitations, known bugs, or alpha status. Performance claims are primarily based on internal benchmarks and specific leaderboard evaluations.