MMSearch by CaraJ7

Multimodal search engine pipeline and benchmark for large multimodal models (LMMs)

Created 1 year ago

482 stars

Top 63.6% on SourcePulse

Project Summary

MMSearch provides a pipeline and benchmark for evaluating large multimodal models (LMMs) as multimodal search engines. It addresses the lack of standardized evaluation for LMMs on search tasks, giving researchers and developers a common framework to assess and compare model performance.

How It Works

MMSearch introduces a pipeline, MMSearch-Engine, to enable LMMs to function as multimodal search engines. The benchmark comprises 300 manually curated instances across 14 subfields, designed to avoid overlap with existing LMM training data. Evaluation employs a step-wise strategy, assessing models on requery, rerank, and summarization tasks, culminating in an end-to-end search process. This approach allows for granular understanding of LMM capabilities in different stages of a search query.
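The step-wise structure described above can be sketched as a chain of three stages. This is a minimal illustration, not the MMSearch-Engine code: `SearchResult`, `run_search_pipeline`, and the `toy_*` helpers are hypothetical names, the search step is stubbed so nothing touches the network, and having rerank return a single index is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchResult:
    title: str
    snippet: str
    content: str  # full page text; a real pipeline would fetch this from the web


def run_search_pipeline(
    query: str,
    requery: Callable[[str], str],
    search: Callable[[str], List[SearchResult]],
    rerank: Callable[[str, List[SearchResult]], int],
    summarize: Callable[[str, SearchResult], str],
) -> str:
    """Chain the stages: requery -> (search) -> rerank -> summarization."""
    search_query = requery(query)      # stage 1: rewrite the question as a search query
    candidates = search(search_query)  # retrieval, stubbed in this sketch
    if not candidates:
        raise ValueError("search returned no candidates")
    best = rerank(query, candidates)   # stage 2: index of the most useful result
    return summarize(query, candidates[best])  # stage 3: answer from that result


# Toy stand-ins so the sketch runs offline; an LMM would back each stage.
def toy_requery(q: str) -> str:
    return q.lower().rstrip("?")


def toy_search(q: str) -> List[SearchResult]:
    return [
        SearchResult("A", "unrelated", "nothing here"),
        SearchResult("B", q, f"answer about {q}"),
    ]


def toy_rerank(q: str, results: List[SearchResult]) -> int:
    needle = q.lower().rstrip("?")
    return max(range(len(results)), key=lambda i: needle in results[i].snippet)


def toy_summarize(q: str, result: SearchResult) -> str:
    return result.content


answer = run_search_pipeline(
    "Who won the race?", toy_requery, toy_search, toy_rerank, toy_summarize
)
print(answer)
```

Passing each stage in as a callable mirrors the idea of scoring the stages separately: any one of them can be swapped for a model under test while the others stay fixed.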

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt, then playwright install.
Evaluation supports models from VLMEvalKit (requires separate installation) or custom LMMs by implementing an infer function.
Evaluation tasks: scripts/run_end2end.sh, scripts/run_rerank.sh, scripts/run_summarization.sh. Final score calculation: scripts/run_get_final_score.sh.
Demo: demo/run_demo_cli.sh.
Dataset: load_dataset("CaraJ/MMSearch") from Hugging Face.
Project page: 🌐 Webpage
Paper: 📖 Paper
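Plugging in a custom LMM comes down to exposing an infer function, as noted above. The sketch below shows one way such an adapter might look. The signature `infer(image_files, text_query)` is an assumption for illustration and may not match what the repository's scripts expect; `SupportsInfer`, `EchoLMM`, and `answer_all` are hypothetical names, and the model body is a placeholder that only echoes its input.

```python
from typing import List, Protocol, Sequence, Tuple


class SupportsInfer(Protocol):
    """Assumed interface: image paths plus a text prompt in, a string out."""

    def infer(self, image_files: List[str], text_query: str) -> str: ...


class EchoLMM:
    """Placeholder model; a real adapter would call the LMM here."""

    def infer(self, image_files: List[str], text_query: str) -> str:
        if not image_files:
            # Text-only prompts need their own path in some models.
            return f"[text-only] {text_query}"
        return f"[{len(image_files)} image(s)] {text_query}"


def answer_all(
    model: SupportsInfer, samples: Sequence[Tuple[List[str], str]]
) -> List[str]:
    """Run the model over (image_files, text_query) pairs."""
    return [model.infer(images, query) for images, query in samples]


outputs = answer_all(
    EchoLMM(),
    [([], "What is MMSearch?"), (["query.png"], "Where was this taken?")],
)
print(outputs)
```

Handling the empty image list explicitly is worth doing early, since some stages of a search pipeline send text with no image attached.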

Highlighted Details

Benchmark designed to ensure correct answers require search, not memorization.
Step-wise evaluation strategy (requery, rerank, summarization) for detailed analysis.
Supports evaluation of custom LMMs with minimal effort.
Command-line demo available for new queries.
Leaderboard for community contributions.

Maintenance & Community

Project accepted to ICLR 2025.
Recent updates include the evaluation code and the Hugging Face dataset release.
Contribution to the leaderboard is welcomed via email.

Licensing & Compatibility

Neither the repository nor the associated Hugging Face dataset explicitly states a license.

Limitations & Caveats

The end-to-end task requires internet interaction and may be subject to search engine rate limits.
Some VLMEvalKit models may not support text-only inference, impacting end-to-end task performance.
The demo requires the search engine (Google Lens) to be set to English to work properly.
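One common way to cope with rate limits like those mentioned above is to retry with exponential backoff. The following is a generic sketch and not part of MMSearch: `RateLimited` and `with_backoff` are hypothetical names, and a real integration would need to map the search client's own rate-limit error onto the exception used here.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error standing in for a search-engine rate-limit response."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demo: a search stub that is rate limited twice, then succeeds.
delays: List[float] = []
calls = {"n": 0}


def flaky_search() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("try again later")
    return "ok"


# Record the waits instead of sleeping so the demo finishes instantly.
result = with_backoff(flaky_search, sleep=delays.append)
print(result, delays)
```

Injecting the `sleep` function keeps the retry logic testable without real waiting.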

Health Check

Last Commit: 11 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 1
Star History: 2 stars in the last 30 days
