yt-fts by NotJoeMartinez

CLI tool for YouTube full-text search and semantic analysis

Created 2 years ago
1,741 stars

Top 24.6% on SourcePulse

Project Summary

yt-fts provides command-line tools for searching YouTube channel transcripts, enabling users to find specific keywords or phrases within videos. It supports both traditional full-text search and advanced semantic search using OpenAI embeddings, making it valuable for researchers, content creators, and anyone needing to quickly locate information within extensive video archives.

How It Works

The tool leverages yt-dlp to download subtitles for specified YouTube channels, storing them in a SQLite database for efficient querying. For semantic search, it integrates with the OpenAI API to generate embeddings for transcripts, which are then managed by ChromaDB. This dual approach allows for precise keyword matching and contextually relevant semantic retrieval, with an LLM chat interface for interactive Q&A powered by the retrieved information.
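The keyword half of that design can be sketched with Python's built-in sqlite3 module and an FTS5 virtual table. This is a minimal illustration, not yt-fts's actual code. The table name `subtitles`, its columns, and the sample rows are hypothetical. The sketch also assumes the local SQLite build has FTS5 enabled, which is true of most standard Python distributions.

```python
import sqlite3

# Hypothetical sample data: (video_id, start_time, subtitle text).
SAMPLE_ROWS = [
    ("vid001", "00:01:05", "today we talk about rust ownership"),
    ("vid001", "00:02:40", "borrowing and lifetimes explained"),
    ("vid002", "00:00:12", "a quick python tutorial"),
]


def build_index(rows):
    """Store subtitle lines in an in-memory FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE subtitles USING fts5(video_id, start_time, text)"
    )
    conn.executemany("INSERT INTO subtitles VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query):
    """Run an FTS5 MATCH query and return matching rows, best matches first."""
    return conn.execute(
        "SELECT video_id, start_time, text FROM subtitles "
        "WHERE subtitles MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


conn = build_index(SAMPLE_ROWS)
print(search(conn, "rust OR python"))  # boolean OR across lines
print(search(conn, "borrow*"))         # prefix (wildcard) match
```

Each hit carries its video ID and start time, which is what lets a transcript search jump straight to the relevant moment in a video.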

Quick Start & Requirements

  • Install via pip: pip install yt-fts
  • Requires Python 3.x.
  • OpenAI API key is necessary for semantic search and LLM features (set as OPENAI_API_KEY environment variable or via --openai-api-key flag).
  • Browser cookies can be used for authentication (--cookies-from-browser).
  • Official documentation: https://github.com/NotJoeMartinez/yt-fts
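A tool that accepts the key either as a flag or as an environment variable has to decide which source wins. The helper below is a hypothetical sketch, not yt-fts's implementation. It *assumes* that an explicit `--openai-api-key` flag overrides `OPENAI_API_KEY`, a common CLI convention that the README summary does not confirm.

```python
import argparse
import os


def resolve_openai_key(argv=None, env=None):
    """Return the OpenAI key from --openai-api-key, else from OPENAI_API_KEY.

    Assumption: the command-line flag takes precedence over the environment.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--openai-api-key", default=None)
    # parse_known_args ignores unrelated arguments instead of erroring.
    args, _ = parser.parse_known_args(argv)
    key = args.openai_api_key or env.get("OPENAI_API_KEY")
    if not key:
        raise SystemExit(
            "No OpenAI API key: set OPENAI_API_KEY or pass --openai-api-key"
        )
    return key
```

Passing `argv` and `env` explicitly keeps the helper testable without touching the real process environment.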

Highlighted Details

  • Full-text search supports SQLite's enhanced query syntax (AND, OR, wildcards).
  • Semantic search enables context-aware retrieval and LLM-powered Q&A.
  • Video summarization feature provides time-stamped transcript snippets.
  • Supports parallel subtitle downloads (--jobs) for faster ingestion.
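Semantic search ranks transcript chunks by how close their embeddings are to the query's embedding. yt-fts delegates this to ChromaDB with OpenAI embeddings. The brute-force cosine-similarity sketch below only illustrates the idea. Its 3-dimensional vectors are made up for illustration, whereas real embeddings have hundreds or thousands of dimensions. ChromaDB's own index and distance metric may also differ.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k (text, embedding) pairs closest to query_vec."""
    scored = [(cosine(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy vectors standing in for real embeddings (made up for illustration).
chunks = [
    ("rust ownership rules", [0.9, 0.1, 0.0]),
    ("python list comprehensions", [0.1, 0.9, 0.1]),
    ("borrow checker errors", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]
print(top_k(query, chunks))
```

In a retrieval-augmented Q&A flow like the LLM chat feature described above, the top-ranked chunks would be inserted into the model's prompt as context. The keyword search, by contrast, would miss "borrow checker errors" for a query about ownership, because the two share no words.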

Maintenance & Community

The project is maintained by NotJoeMartinez. Community interaction channels are not explicitly listed in the README.

Licensing & Compatibility

The project appears to be licensed under the MIT License, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

Semantic search and LLM features are dependent on the OpenAI API, incurring potential costs. The update command currently only refreshes full-text search data, not semantic embeddings. Search strings for full-text search are limited to 40 characters.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 30 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX) and Andrew Kane (Author of pgvector).

chatgpt-pgvector by gannonh

0%
938
Domain-specific chat completions app
Created 2 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zack Li (Cofounder of Nexa AI), and 12 more.

search_with_lepton by leptonai

0.0%
8k
Conversational search engine demo
Created 1 year ago
Updated 2 weeks ago