Discover and explore top open-source AI tools and projects—updated daily.
OsillyMultimodal LLM for deep research and extensive search
New!
Top 77.2% on SourcePulse
This project introduces Vision-DeepResearch, a multimodal large language model (MLLM) designed for extended, long-horizon deep-research tasks. It targets researchers and advanced users needing to perform complex, multi-turn reasoning and extensive web-based information retrieval. The primary benefit is enabling MLLMs to engage in significantly deeper and more interactive research processes than previously possible.
How It Works
Vision-DeepResearch extends traditional MLLM capabilities by enabling dozens of reasoning turns and hundreds of search engine interactions. Its core innovation lies in facilitating a "deep-research" workflow, allowing the model to iteratively refine queries, analyze search results, and synthesize information over extended periods. This approach is advantageous for tackling complex problems that require sustained investigation and information gathering, moving beyond single-turn question-answering.
Quick Start & Requirements
Setup involves cloning the repository and installing several core components: verl, Megatron-LM, mbridge, and rllm via pip. Data preparation requires converting provided Parquet datasets to JSONL format using provided scripts. Training involves distinct SFT and RL phases, with RL training necessitating the deployment of vLLM-served Extract and Judge models, and configuration of API keys (SERP_API_KEY, JINA_API_KEY) and OSS settings. Evaluation requires running an OpenAI-compatible API service for the model and configuring evaluation scripts with specific data formats and model endpoints. Significant GPU resources (e.g., 8x GPUs for vLLM serving) and specific dependencies like vLLM are required.
Highlighted Details
Maintenance & Community
The project has seen recent releases of SFT and RL code, along with datasets and an 8B model, as of early February 2026. No explicit community channels (e.g., Discord, Slack) or detailed roadmap are provided in the README.
Licensing & Compatibility
The README does not specify a software license. This lack of explicit licensing information presents a significant adoption blocker, as it leaves the terms for use, modification, and distribution unclear, particularly for commercial applications.
Limitations & Caveats
The Vision-DeepResearch-30B-A3B model weights are listed as "coming soon." While demo datasets are available, the full datasets might require further preparation or are not yet fully released. The setup process is complex, requiring the installation of multiple specialized libraries and the deployment of external model services, indicating a steep learning curve and potentially high resource requirements for users.
2 weeks ago
Inactive
salesforce