OmniSearch by Alibaba-NLP

Multimodal RAG benchmark with a self-adaptive planning agent

created 9 months ago
354 stars

Top 79.9% on sourcepulse

Project Summary

OmniSearch introduces a novel self-adaptive planning agent for multimodal retrieval-augmented generation (mRAG), specifically addressing the limitations of existing benchmarks in reflecting real-world dynamic knowledge retrieval needs. It targets researchers and developers working with multimodal question-answering systems, offering a framework to benchmark and improve mRAG performance on complex, evolving queries.

How It Works

OmniSearch functions as a planning agent that dynamically determines retrieval actions based on the current stage of question resolution and the content retrieved so far. This approach contrasts with static retrieval methods by enabling real-time adaptation, aiming to provide more relevant and sufficient knowledge for dynamic questions. The project also introduces the Dyn-VQA dataset, designed to capture these dynamic question characteristics.
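The adaptive loop described above can be sketched in a few lines. This is a toy illustration only: the function names (`plan_next_action`, `retrieve`, `solve`) and the trivial stopping heuristic are assumptions, not the actual OmniSearch API.

```python
# Toy sketch of a self-adaptive planning loop for multimodal RAG.
# All names and the stopping heuristic are illustrative assumptions.

def plan_next_action(question, retrieved):
    """Decide the next action from the current state of resolution.
    Toy heuristic: keep searching until a snippet mentions a question term."""
    terms = question.lower().split()
    if any(t in snippet.lower() for snippet in retrieved for t in terms):
        return ("answer", None)
    return ("search", question)

def retrieve(query):
    # Stand-in for a real web/image search call (e.g. via a search API).
    return [f"snippet about {query}"]

def solve(question, max_steps=5):
    retrieved = []
    for _ in range(max_steps):
        action, arg = plan_next_action(question, retrieved)
        if action == "answer":
            break
        retrieved.extend(retrieve(arg))
    return retrieved

print(solve("capital of France"))
```

The point of the structure is that each retrieval action is chosen from the evolving state (`retrieved`), rather than being fixed up front as in static mRAG pipelines.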

Quick Start & Requirements

  • Installation: pip install -r requirement.txt
  • Prerequisites: Python 3.11.9, PyTorch >= 2.0.0, requests, google-search-results, serpapi. An OpenAI API key and a Google Search API key are required for the GPT-4V implementation.
  • Running GPT-4V: python main.py --test_dataset 'path/to/dataset.jsonl' --dataset_name NAME --meta_save_path 'path/to/results'
  • Running Qwen-VL: python Omnisearch_qwen.py --test_dataset '/path/to/dataset.jsonl' --dataset_name NAME --meta_save_path '/path/to/results' --model_path '/local/path/to/OmniSearch-Qwen-Chat-VL-weight'
  • Demo: Chinese Web Demo available on ModelScope.
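The `--test_dataset` flag expects a `.jsonl` file (one JSON object per line). The exact record schema is not documented in this summary; the field names below (`question`, `image_url`, `answer`) are illustrative assumptions only.

```python
# Write a minimal .jsonl test dataset; field names are assumed, not the
# documented OmniSearch schema.
import json

examples = [
    {
        "question": "Who won the most recent race shown in the image?",
        "image_url": "https://example.com/race.jpg",
        "answer": "unknown",
    },
]

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

A file like this would then be passed as `--test_dataset 'dataset.jsonl'`.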

Highlighted Details

  • First planning agent for multimodal RAG.
  • Introduces Dyn-VQA dataset with three types of dynamic questions.
  • Benchmarks various mRAG methods with leading MLLMs on Dyn-VQA.
  • Supports GPT-4V and Qwen-VL models.

Maintenance & Community

The project is contributed by Alibaba-NLP researchers. Inspiration is drawn from ReACT, SelfAsk, and FreshLLMs. TODO items include releasing Qwen-VL-Chat code and weights and creating a benchmark for Dyn-VQA.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research artifact, with several TODO items outstanding before a full release. Hardware requirements beyond the standard Python dependencies are not documented, and obtaining and configuring API keys for the external search and model services is a necessary setup step.
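For the API key setup, something like the following is typically needed before running the GPT-4V pipeline. The exact environment variable names OmniSearch reads are not stated in this summary, so treat these as placeholders and check the repository's README or source.

```shell
# Placeholder setup; variable names are assumptions, not confirmed by the repo.
export OPENAI_API_KEY="sk-..."     # for the GPT-4V implementation
export SERPAPI_API_KEY="..."       # for Google Search via serpapi
```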

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 90 days

Explore Similar Projects

Starred by Jason Liu (author of Instructor) and Ross Taylor (cofounder of General Reasoning; creator of Papers with Code).

Search-R1 by PeterGriffinJin (top 1.1%, 3k stars): RL framework for training LLMs to use search engines. Created 5 months ago, updated 3 weeks ago.