yt-fts by NotJoeMartinez

CLI tool for YouTube full-text search and semantic analysis

created 2 years ago
1,722 stars

Top 25.3% on sourcepulse

Project Summary

yt-fts provides command-line tools for searching YouTube channel transcripts, enabling users to find specific keywords or phrases within videos. It supports both traditional full-text search and advanced semantic search using OpenAI embeddings, making it valuable for researchers, content creators, and anyone needing to quickly locate information within extensive video archives.

How It Works

The tool uses yt-dlp to download subtitles for specified YouTube channels and stores them in a SQLite database for querying. For semantic search, it calls the OpenAI API to generate embeddings for the transcripts, which are then managed by ChromaDB. This dual approach supports both exact keyword matching and context-based semantic retrieval. An LLM chat interface uses the retrieved passages for interactive Q&A.
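
The semantic half of this pipeline can be illustrated with a minimal sketch. It is not the project's code: the OpenAI embedding call and ChromaDB storage are replaced here by hand-written three-dimensional toy vectors and a plain cosine-similarity ranking, and the names `segments`, `cosine_similarity`, and `rank` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for transcript embeddings (assumption: real embeddings
# would come from an embedding model and have far more dimensions).
segments = {
    "how to tune a guitar": [0.9, 0.1, 0.0],
    "baking sourdough bread": [0.0, 0.2, 0.9],
    "replacing guitar strings": [0.8, 0.3, 0.1],
}

def rank(query_vec, top_k=2):
    """Return the top_k segment texts most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in segments.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]

# A query vector that points in the "guitar" direction of the toy space.
print(rank([1.0, 0.2, 0.0]))
```

The point of the sketch is that retrieval depends on vector proximity rather than shared keywords, which is why a semantic query can surface segments that never contain the literal search term.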

Quick Start & Requirements

  • Install via pip: pip install yt-fts
  • Requires Python 3.x.
  • OpenAI API key is necessary for semantic search and LLM features (set as OPENAI_API_KEY environment variable or via --openai-api-key flag).
  • Browser cookies can be used for authentication (--cookies-from-browser).
  • Official documentation: https://github.com/NotJoeMartinez/yt-fts
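
The two ways of supplying the API key listed above could be reconciled as in the following sketch. It is illustrative only: the helper name `resolve_api_key` is hypothetical, and the precedence shown (an explicit flag value wins over the environment variable) is an assumption, not documented behavior of yt-fts.

```python
import os

def resolve_api_key(flag_value=None, env=None):
    """Pick an API key from an explicit flag value or OPENAI_API_KEY.

    Assumption: the flag takes precedence over the environment variable.
    """
    env = os.environ if env is None else env
    key = flag_value or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No OpenAI API key found: set OPENAI_API_KEY or pass one explicitly"
        )
    return key

# Example with an explicit mapping standing in for the real environment.
print(resolve_api_key(None, {"OPENAI_API_KEY": "sk-example"}))
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.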

Highlighted Details

  • Full-text search supports SQLite's enhanced query syntax (AND, OR, wildcards).
  • Semantic search enables context-aware retrieval and LLM-powered Q&A.
  • Video summarization feature provides time-stamped transcript snippets.
  • Supports parallel subtitle downloads (--jobs) for faster ingestion.
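
The AND, OR, and wildcard operators mentioned in the first bullet can be tried directly against SQLite from Python. The sketch below assumes the FTS5 extension is available in the local `sqlite3` build; the table name `subtitles`, its columns, and the sample rows are invented for illustration and do not reflect the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: only the text column is indexed for search.
conn.execute(
    "CREATE VIRTUAL TABLE subtitles "
    "USING fts5(video_id UNINDEXED, start_time UNINDEXED, text)"
)
conn.executemany(
    "INSERT INTO subtitles VALUES (?, ?, ?)",
    [
        ("vid1", "00:00:05", "the rocket launch was delayed by weather"),
        ("vid1", "00:01:10", "engineers inspected the rocket engine"),
        ("vid2", "00:02:30", "the satellite reached orbit after launching"),
    ],
)

def search(query):
    """Run an FTS5 MATCH query and return (video_id, start_time) pairs."""
    cur = conn.execute(
        "SELECT video_id, start_time FROM subtitles "
        "WHERE subtitles MATCH ? ORDER BY rowid",
        (query,),
    )
    return cur.fetchall()

print(search("rocket AND launch"))  # both terms must appear
print(search("engine OR orbit"))    # either term may appear
print(search("launch*"))            # prefix wildcard
```

With the default tokenizer there is no stemming, so `launch` alone does not match `launching`, while the prefix query `launch*` matches both.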

Maintenance & Community

The project is maintained by NotJoeMartinez. Community interaction channels are not explicitly listed in the README.

Licensing & Compatibility

The project appears to be licensed under the MIT License, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

Semantic search and LLM features depend on the OpenAI API, so using them may incur API costs. The update command currently refreshes only full-text search data, not semantic embeddings. Search strings for full-text search are limited to 40 characters.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 5
  • Issues (30d): 5

Star History

  • 41 stars in the last 90 days
