Gen-Searcher by tulerfeng

Agentic search framework for knowledge-grounded image generation

Created 2 months ago

347 stars

Top 80.0% on SourcePulse

Project Summary

Gen-Searcher is a multimodal deep research agent that improves image generation for prompts requiring complex real-world knowledge. Before generating an image, the agent searches the web, browses evidence, reasons across multiple sources, and retrieves visual references, which makes the synthesized images more accurate and contextually relevant. The project targets researchers and developers in AI and computer vision who want to ground image generation in real-world information.

How It Works

Gen-Searcher trains a multimodal deep research agent for image generation tasks that require complex real-world knowledge. Its core is an agentic search loop that runs before image synthesis: web search, evidence browsing, multi-source reasoning, and visual reference retrieval. Grounding generation in the retrieved real-world context yields more accurate and up-to-date results, which the project presents as a novel capability for such agents. To support this research, the project also introduces two training datasets (Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k) and a new benchmark (KnowGen).
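The order of the stages described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name `run_search_then_generate`, the `SearchContext` class, and the four tool callables are hypothetical, and the stages run in a fixed order here, whereas the trained agent presumably decides for itself when to call each tool. The "reasoning" step is reduced to concatenating the collected notes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SearchContext:
    """Evidence gathered before image synthesis (hypothetical structure)."""
    prompt: str
    notes: List[str] = field(default_factory=list)             # text evidence from browsed pages
    reference_images: List[str] = field(default_factory=list)  # identifiers of visual references


def run_search_then_generate(
    prompt: str,
    web_search: Callable[[str], List[str]],
    browse: Callable[[str], str],
    find_images: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    max_pages: int = 3,
) -> str:
    """Run the four search stages in order, then call the image generator."""
    ctx = SearchContext(prompt=prompt)

    # 1. Web search: get candidate pages for the prompt.
    urls = web_search(prompt)

    # 2. Evidence browsing: read at most max_pages of the results.
    for url in urls[:max_pages]:
        ctx.notes.append(browse(url))

    # 3. Multi-source reasoning: a placeholder that simply joins the notes.
    grounded_prompt = prompt
    if ctx.notes:
        grounded_prompt = prompt + "\nKnown facts: " + " ".join(ctx.notes)

    # 4. Visual reference retrieval.
    ctx.reference_images = find_images(prompt)

    # 5. Image synthesis, conditioned on the grounded prompt and references.
    return generate(grounded_prompt, ctx.reference_images)
```

Passing the tools in as callables keeps the sketch independent of any particular search or generation backend; stub functions are enough to exercise the control flow.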

Quick Start & Requirements

Installation involves cloning the repository and setting up two separate Conda environments, one for SFT and one for RL training, each with its own pip installations of libraries such as LLaMA-Factory, rllm, and vllm. Key prerequisites include Python 3.11, substantial GPU resources (at least 8x 80GB for SFT and 4x 80GB for RL), and API keys for services such as Serper and Jina. Official project pages, paper, models, and datasets are available on Hug
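The prerequisites listed above can be checked before starting a training run. The following is a minimal sketch and not part of the repository: the function `check_prerequisites` and the environment variable names `SERPER_API_KEY` and `JINA_API_KEY` are assumptions chosen for illustration, and GPU memory is passed in as nominal card sizes in GB rather than queried from the driver.

```python
from typing import List, Mapping, Sequence

# Minimum GPU counts per training stage, as stated in the summary above.
MIN_GPUS = {"sft": 8, "rl": 4}
MIN_GPU_MEMORY_GB = 80

# Hypothetical variable names; check the repository for the real ones.
REQUIRED_ENV_VARS = ("SERPER_API_KEY", "JINA_API_KEY")


def check_prerequisites(
    stage: str,
    python_version: Sequence[int],
    gpu_memory_gb: Sequence[float],
    env: Mapping[str, str],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    if stage not in MIN_GPUS:
        raise ValueError(f"unknown stage {stage!r}, expected one of {sorted(MIN_GPUS)}")

    problems: List[str] = []

    # Python 3.11 is the version named in the summary.
    if tuple(python_version[:2]) != (3, 11):
        problems.append("Python 3.11 is required")

    # Count only the GPUs that meet the per-card memory requirement.
    large_gpus = [m for m in gpu_memory_gb if m >= MIN_GPU_MEMORY_GB]
    if len(large_gpus) < MIN_GPUS[stage]:
        problems.append(
            f"{stage} needs {MIN_GPUS[stage]} GPUs with {MIN_GPU_MEMORY_GB}GB, "
            f"found {len(large_gpus)}"
        )

    # API keys must be present and non-empty.
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")

    return problems
```

A typical call would pass `sys.version_info`, the nominal memory of each installed GPU, and `os.environ`, for example `check_prerequisites("rl", sys.version_info, [80, 80, 80, 80], os.environ)`.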


Explore Similar Projects

metaquery by facebookresearch

InternVL-U by OpenGVLab

SEED-X by AILab-CVC

OmniGen2 by VectorSpaceLab

SEED by AILab-CVC

Liquid by FoundationVision

deepgen by deepgenteam

MiniGPT-5 by UCSB-AI

BLIP3o by JiuhaiChen

RPG-DiffusionMaster by YangLing0818

NExT-GPT by NExT-GPT

Janus by deepseek-ai