MMSearch by CaraJ7

Multimodal search engine pipeline and benchmark for large multimodal models (LMMs)

created 10 months ago
457 stars

Top 67.1% on sourcepulse

Project Summary

MMSearch provides a pipeline and benchmark for evaluating large multimodal models (LMMs) as multimodal search engines. It addresses the lack of a standardized evaluation for LMMs on search tasks, giving researchers and developers a common framework for assessing and comparing model performance in this domain.

How It Works

MMSearch introduces a pipeline, MMSearch-Engine, that enables LMMs to function as multimodal search engines. The benchmark comprises 300 manually curated instances across 14 subfields, designed to avoid overlap with existing LMM training data. Evaluation is step-wise: models are assessed individually on requery, rerank, and summarization tasks, and then on the complete end-to-end search process. This gives a granular view of LMM capabilities at each stage of a search query.
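
The staged flow described above can be sketched as a chain of three model-driven steps around a retrieval call. This is a minimal illustration and not the MMSearch-Engine implementation: the `Page` type, the `run_search_pipeline` function, and the toy stand-in functions are all hypothetical names introduced here, and real stages would call an LMM and a live search engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical container for one retrieved search result."""
    url: str
    text: str


def run_search_pipeline(
    query: str,
    requery: Callable[[str], str],
    search: Callable[[str], List[Page]],
    rerank: Callable[[str, List[Page]], int],
    summarize: Callable[[str, Page], str],
) -> str:
    """Chain requery -> search -> rerank -> summarize and return the answer."""
    search_query = requery(query)        # stage 1: rewrite the user query
    candidates = search(search_query)    # retrieval step (not a model stage)
    if not candidates:
        return ""                        # nothing retrieved, nothing to summarize
    best = rerank(query, candidates)     # stage 2: index of the most useful result
    return summarize(query, candidates[best])  # stage 3: answer from that page


# Toy stand-ins so the sketch runs without a model or network access.
def toy_requery(q: str) -> str:
    return q.lower().strip("?")


def toy_search(q: str) -> List[Page]:
    return [Page("a", "cats purr"), Page("b", f"answer about {q}")]


def toy_rerank(q: str, pages: List[Page]) -> int:
    # Pretend the longest page is the most relevant one.
    return max(range(len(pages)), key=lambda i: len(pages[i].text))


def toy_summarize(q: str, page: Page) -> str:
    return page.text


answer = run_search_pipeline(
    "Who won?", toy_requery, toy_search, toy_rerank, toy_summarize
)
```

Passing each stage as a callable means a single stage can be swapped out or scored on its own, which mirrors the idea of evaluating requery, rerank, and summarization separately from the end-to-end run.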

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt, then playwright install.
  • Evaluation supports models from VLMEvalKit (requires separate installation) or custom LMMs by implementing an infer function.
  • Evaluation tasks: scripts/run_end2end.sh, scripts/run_rerank.sh, scripts/run_summarization.sh. Final score calculation: scripts/run_get_final_score.sh.
  • Demo: demo/run_demo_cli.sh.
  • Dataset: load_dataset("CaraJ/MMSearch") from Hugging Face (via the datasets library).
  • Project page: 🌐 Webpage
  • Paper: 📖 Paper
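
As a rough idea of what plugging in a custom LMM might look like, the sketch below wraps a placeholder model behind an infer method. The exact signature the repository expects is not stated in this summary, so the class name `EchoModel`, the parameter names `image_files` and `text_query`, and the string return type are all assumptions; check the repository before relying on them.

```python
from typing import List, Optional


class EchoModel:
    """Placeholder model; replace the body of infer with a real LMM call."""

    def infer(self, image_files: Optional[List[str]], text_query: str) -> str:
        # Hypothetical signature: image_files may be None or empty when a
        # step supplies only text.
        n_images = len(image_files or [])
        return f"[{n_images} image(s)] {text_query}"


model = EchoModel()
reply = model.infer(["query.png"], "What is shown here?")
text_only_reply = model.infer(None, "Rewrite this query.")
```

Handling the no-image case explicitly is a deliberate choice in this sketch, since some evaluation steps may supply text without an image.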

Highlighted Details

  • Benchmark designed to ensure correct answers require search, not memorization.
  • Step-wise evaluation strategy (requery, rerank, summarization) for detailed analysis.
  • Supports evaluation of custom LMMs with minimal effort.
  • Command-line demo available for new queries.
  • Leaderboard for community contributions.
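
One plausible way to turn per-task scores from a step-wise evaluation into a single number is a weighted average, sketched below. This is not taken from scripts/run_get_final_score.sh: the function name `weighted_final_score`, the task keys, the weights, and the example scores are placeholders chosen for illustration only.

```python
from typing import Dict


def weighted_final_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-task scores (weights are normalized)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing task scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[task] * w for task, w in weights.items()) / total_weight


# Placeholder weights and scores, for illustration only.
example_weights = {"end2end": 0.75, "requery": 0.05, "rerank": 0.10, "summarization": 0.10}
example_scores = {"end2end": 50.0, "requery": 40.0, "rerank": 80.0, "summarization": 60.0}

final_score = weighted_final_score(example_scores, example_weights)
```

Keeping the per-task scores alongside the aggregate preserves the detail that the step-wise strategy is meant to provide.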

Maintenance & Community

  • Project accepted to ICLR 2025.
  • Recent updates include the evaluation code and the Hugging Face dataset release.
  • Leaderboard contributions are welcome via email.

Licensing & Compatibility

  • Neither the repository nor the associated Hugging Face dataset explicitly states a license.

Limitations & Caveats

  • The end-to-end task requires internet interaction and may be subject to search engine rate limits.
  • Some VLMEvalKit models may not support text-only inference, impacting end-to-end task performance.
  • The demo requires the search engine (Google Lens) to be in English for proper functionality.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 31 stars in the last 90 days
