multi-modal-researcher by langchain-ai

Research workflow for topic analysis and podcast generation

Created 8 months ago

587 stars

Top 55.4% on SourcePulse

3 Experts Love This Project

Paige Bailey

DevRel Lead at Google DeepMind

Philipp Schmid

DevRel at Google DeepMind

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

This project provides an automated research and podcast generation workflow built on LangGraph and Google's Gemini 2.5 models. It is designed for users who need to quickly synthesize information from web searches and YouTube videos into a written report and an audio podcast voiced by multiple speakers through text-to-speech.

How It Works

The system orchestrates a LangGraph workflow that uses Gemini's native capabilities for web search and YouTube analysis. It takes a research topic and an optional YouTube URL as input. A search node queries Google through Gemini's search tool, and if a video URL is provided, a video analysis node processes the video. The gathered insights are then synthesized into a markdown report and a podcast script, and the script is converted to audio by a multi-speaker text-to-speech model. The result is a single automated pipeline from information gathering to finished content.
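The control flow described above can be sketched in plain Python without LangGraph itself. This is a minimal illustration, assuming a shared state dictionary that each node updates, as LangGraph nodes do; the state keys, node function names, and the strict search → video → report → podcast ordering are hypothetical, and the Gemini calls are replaced with string stubs.

```python
from typing import Optional, TypedDict


class ResearchState(TypedDict, total=False):
    # Hypothetical state keys; the project's actual schema may differ.
    topic: str
    video_url: Optional[str]
    search_text: str
    video_text: str
    report: str
    podcast_script: str


def search_node(state: ResearchState) -> ResearchState:
    # Stub standing in for a Gemini call with Google Search enabled.
    return {"search_text": f"web findings on {state['topic']}"}


def analyze_video_node(state: ResearchState) -> ResearchState:
    # Stub standing in for Gemini's YouTube understanding of video_url.
    return {"video_text": f"video insights from {state['video_url']}"}


def create_report_node(state: ResearchState) -> ResearchState:
    # Merge whatever sources were gathered into a markdown report.
    sources = [state.get("search_text", ""), state.get("video_text", "")]
    body = "\n".join(s for s in sources if s)
    return {"report": f"# Research Report: {state['topic']}\n\n{body}"}


def create_podcast_node(state: ResearchState) -> ResearchState:
    # A real implementation would send this script to a multi-speaker TTS model.
    script = f"Speaker A: Today we discuss {state['topic']}.\nSpeaker B: Let's dig in."
    return {"podcast_script": script}


def route_after_search(state: ResearchState) -> str:
    # Conditional edge: analyze a video only when a URL was supplied.
    return "analyze_video" if state.get("video_url") else "create_report"


def run_workflow(topic: str, video_url: Optional[str] = None) -> ResearchState:
    state: ResearchState = {"topic": topic, "video_url": video_url}
    # Each node returns a partial update that is merged into the shared state.
    state.update(search_node(state))
    if route_after_search(state) == "analyze_video":
        state.update(analyze_video_node(state))
    state.update(create_report_node(state))
    state.update(create_podcast_node(state))
    return state
```

In LangGraph proper, the `if` on `route_after_search` would instead be a conditional edge registered on the graph, and the framework would perform the state merging.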

Quick Start & Requirements

Install: Clone the repository, create a .env file containing your GEMINI_API_KEY, install the uv package manager, and start the development server with: uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
Prerequisites: Python 3.11+, the uv package manager, and a Google Gemini API key.
Access: The API is served at http://127.0.0.1:2024, and the Studio UI is available at https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024.
Docs: API docs are available at http://127.0.0.1:2024/docs.
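Once the server is running, a research run can be requested over HTTP from the API at http://127.0.0.1:2024. The sketch below only builds the request with the standard library. It assumes the LangGraph server's stateless `POST /runs/wait` endpoint, and the input keys `topic` and `video_url` as well as the assistant (graph) id are hypothetical; confirm the real schema at http://127.0.0.1:2024/docs.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:2024"


def build_run_request(
    assistant_id: str, topic: str, video_url: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a stateless run request for the local server.

    Assumptions: the /runs/wait endpoint and the "topic"/"video_url" input
    keys are illustrative; check the server's /docs page for the real schema.
    """
    run_input: Dict[str, Any] = {"topic": topic}
    if video_url:
        # The video is optional; omit the key when no URL is given.
        run_input["video_url"] = video_url
    payload = {"assistant_id": assistant_id, "input": run_input}
    return urllib.request.Request(
        f"{BASE_URL}/runs/wait",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With `langgraph dev` running, the request can be sent with `urllib.request.urlopen(build_run_request(...))` and the JSON response decoded with `json.loads`.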

Highlighted Details

Leverages Gemini 2.5's native YouTube understanding and Google Search tool integration.
Generates a research report with citations and a 2-speaker podcast with distinct voices.
Configurable models and temperature settings for search, synthesis, video analysis, and TTS.
Workflow defined using LangGraph, with nodes for search, video analysis (conditional), report creation, and podcast creation.
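The per-stage model and temperature settings listed above can be pictured as a single configuration object. This is a minimal sketch: every field name and default below (including the model ids `gemini-2.5-flash` and `gemini-2.5-flash-preview-tts` and the voice names `Kore` and `Puck`) is an assumption for illustration, not the project's actual configuration. In LangGraph, such overrides are typically passed under the `configurable` key of a run's config.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ResearchConfig:
    # Hypothetical field names and defaults; the project's real ones may differ.
    search_model: str = "gemini-2.5-flash"
    search_temperature: float = 0.0
    synthesis_model: str = "gemini-2.5-flash"
    synthesis_temperature: float = 0.3
    video_model: str = "gemini-2.5-flash"
    video_temperature: float = 0.0
    tts_model: str = "gemini-2.5-flash-preview-tts"
    # Two distinct voices for the two podcast speakers (assumed names).
    speaker_voices: Tuple[str, str] = ("Kore", "Puck")

    @classmethod
    def from_overrides(
        cls, overrides: Optional[Mapping[str, Any]] = None
    ) -> "ResearchConfig":
        """Apply only known keys from a run-time mapping; ignore the rest."""
        known = {f.name for f in fields(cls)}
        values = {k: v for k, v in (overrides or {}).items() if k in known}
        return cls(**values)
```

Ignoring unknown keys keeps a stray setting from crashing a run; a stricter variant could raise on unknown keys instead.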

Maintenance & Community

The project is part of the langchain-ai organization. Further community and roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project depends heavily on Google's Gemini API and requires an active API key. Performance characteristics and rate limits are not documented. The TTS model is a "preview" release, so its behavior or availability may change.

Health Check

Last Commit

8 months ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days