tulerfeng: An all-in-one model for multimodal reasoning across image and video
Top 78.1% on SourcePulse
OneThinker is an all-in-one multimodal reasoning generalist designed for image and video analysis. It targets researchers and engineers who need a unified model for diverse visual tasks, offering cross-task knowledge transfer and zero-shot generalization.
How It Works
This project introduces OneThinker, a unified multimodal reasoning model built upon Qwen3-VL-8B. It leverages a large-scale OneThinker-600k multi-task corpus and a high-quality OneThinker-SFT-340k dataset with Chain-of-Thought (CoT) annotations. A novel EMA-GRPO reinforcement learning method balances heterogeneous reward signals across tasks, enabling effective cross-task and cross-modality knowledge transfer.
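The paper's EMA-GRPO method is described only at a high level above. As a loose illustration of the underlying idea of balancing heterogeneous reward signals, the sketch below keeps an exponential moving average of each task's reward statistics and standardizes rewards against them, so tasks with different reward scales contribute comparably to the policy update. The class name, decay value, and update rule are assumptions for illustration, not the project's actual implementation.

```python
from collections import defaultdict


class EMARewardNormalizer:
    """Hypothetical sketch of per-task reward balancing in the spirit of
    EMA-GRPO: each task tracks an EMA of its reward mean and variance,
    and raw rewards are standardized against those running statistics."""

    def __init__(self, decay=0.99, eps=1e-8):
        self.decay = decay
        self.eps = eps
        # Per-task running statistics, keyed by task name.
        self.mean = defaultdict(float)
        self.var = defaultdict(lambda: 1.0)

    def normalize(self, task, reward):
        d = self.decay
        # Update the task's running mean, then its running variance.
        self.mean[task] = d * self.mean[task] + (1 - d) * reward
        diff = reward - self.mean[task]
        self.var[task] = d * self.var[task] + (1 - d) * diff * diff
        # Standardized reward for this task's advantage estimate.
        return diff / (self.var[task] ** 0.5 + self.eps)
```

In a GRPO-style loop, the normalized value would replace the raw reward when computing group-relative advantages, preventing a task with large raw rewards from dominating the gradient.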
Quick Start & Requirements
Set up llamafactory (Python 3.11) with pip install -e ".[torch,metrics]" and easyr1 (Python 3.11) with pip install -e . Then download the required datasets.
Highlighted Details
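The quick-start requirements above might translate into shell steps like the following. This is a sketch under assumptions: the conda environment names are invented, and it assumes the repository ships llamafactory and easyr1 as separate directories, each installed into its own environment.

```shell
# Hypothetical setup sketch; environment names are illustrative.

# SFT environment (Python 3.11)
conda create -n onethinker-sft python=3.11 -y
conda activate onethinker-sft
cd llamafactory
pip install -e ".[torch,metrics]"
cd ..

# RL environment (Python 3.11)
conda create -n onethinker-rl python=3.11 -y
conda activate onethinker-rl
cd easyr1
pip install -e .
cd ..
```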
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Some components depend on sam2, and some QA evaluations depend on VLMEvalKit. The project mandates a specific Python version (3.11) and distinct environment setups for SFT and RL.
2 days ago
Inactive