OmniSearch by Alibaba-NLP

Multimodal RAG benchmark with a self-adaptive planning agent

created 9 months ago
354 stars

Top 79.9% on sourcepulse

Project Summary

OmniSearch introduces a novel self-adaptive planning agent for multimodal retrieval-augmented generation (mRAG), specifically addressing the limitations of existing benchmarks in reflecting real-world dynamic knowledge retrieval needs. It targets researchers and developers working with multimodal question-answering systems, offering a framework to benchmark and improve mRAG performance on complex, evolving queries.

How It Works

OmniSearch functions as a planning agent that dynamically determines retrieval actions based on the current stage of question resolution and the content retrieved so far. This approach contrasts with static retrieval methods by enabling real-time adaptation, aiming to provide more relevant and sufficient knowledge for dynamic questions. The project also introduces the Dyn-VQA dataset, designed to capture these dynamic question characteristics.
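The adaptive loop described above can be sketched in a few lines. This is a toy illustration only: the function names (`plan_next_action`, `retrieve`, `solve`) and the trivial stopping heuristic are assumptions, not the actual OmniSearch API.

```python
# Toy sketch of a self-adaptive planning loop for multimodal RAG.
# All names and the stopping heuristic are illustrative assumptions.

def plan_next_action(question, retrieved):
    """Decide the next action from the current state of resolution.
    Toy heuristic: keep searching until a snippet mentions a question term."""
    terms = question.lower().split()
    if any(t in snippet.lower() for snippet in retrieved for t in terms):
        return ("answer", None)
    return ("search", question)

def retrieve(query):
    # Stand-in for a real web/image search call (e.g. via a search API).
    return [f"snippet about {query}"]

def solve(question, max_steps=5):
    retrieved = []
    for _ in range(max_steps):
        action, arg = plan_next_action(question, retrieved)
        if action == "answer":
            break
        retrieved.extend(retrieve(arg))
    return retrieved

print(solve("capital of France"))
```

The point of the structure is that each retrieval action is chosen from the evolving state (`retrieved`), rather than being fixed up front as in static mRAG pipelines.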

Quick Start & Requirements

  • Installation: pip install -r requirement.txt
  • Prerequisites: Python 3.11.9, PyTorch >= 2.0.0, requests, google-search-results, serpapi. An OpenAI API key and a Google Search API key are required for the GPT-4V implementation.
  • Running GPT-4V: python main.py --test_dataset 'path/to/dataset.jsonl' --dataset_name NAME --meta_save_path 'path/to/results'
  • Running Qwen-VL: python Omnisearch_qwen.py --test_dataset '/path/to/dataset.jsonl' --dataset_name NAME --meta_save_path '/path/to/results' --model_path '/local/path/to/OmniSearch-Qwen-Chat-VL-weight'
  • Demo: Chinese Web Demo available on ModelScope.
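The `--test_dataset` flag expects a `.jsonl` file (one JSON object per line). The exact record schema is not documented in this summary; the field names below (`question`, `image_url`, `answer`) are illustrative assumptions only.

```python
# Write a minimal .jsonl test dataset; field names are assumed, not the
# documented OmniSearch schema.
import json

examples = [
    {
        "question": "Who won the most recent race shown in the image?",
        "image_url": "https://example.com/race.jpg",
        "answer": "unknown",
    },
]

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

A file like this would then be passed as `--test_dataset 'dataset.jsonl'`.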

Highlighted Details

  • First planning agent for multimodal RAG.
  • Introduces Dyn-VQA dataset with three types of dynamic questions.
  • Benchmarks various mRAG methods with leading MLLMs on Dyn-VQA.
  • Supports GPT-4V and Qwen-VL models.

Maintenance & Community

The project is contributed by Alibaba-NLP researchers. Inspiration is drawn from ReACT, SelfAsk, and FreshLLMs. TODO items include releasing Qwen-VL-Chat code and weights and creating a benchmark for Dyn-VQA.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research artifact, with several TODO items outstanding before a full release. Hardware requirements beyond the standard Python dependencies are not documented, and obtaining and configuring API keys for the external search and model services is a necessary setup step.
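For the API key setup, something like the following is typically needed before running the GPT-4V pipeline. The exact environment variable names OmniSearch reads are not stated in this summary, so treat these as placeholders and check the repository's README or source.

```shell
# Placeholder setup; variable names are assumptions, not confirmed by the repo.
export OPENAI_API_KEY="sk-..."     # for the GPT-4V implementation
export SERPAPI_API_KEY="..."       # for Google Search via serpapi
```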

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 90 days

Explore Similar Projects

Starred by Jason Liu (author of Instructor) and Ross Taylor (cofounder of General Reasoning; creator of Papers with Code).

Search-R1 by PeterGriffinJin (top 1.1%, 3k stars): RL framework for training LLMs to use search engines. Created 5 months ago, updated 3 weeks ago.