transitive-bullshit: Semantic search tool for YouTube playlists
Top 59.0% on SourcePulse
This project provides a semantic search engine for YouTube playlists. Users can find specific moments within videos by asking natural language queries. It is aimed at podcast listeners and content creators who want video content to be easier to discover and navigate.
How It Works
The system uses OpenAI's text-embedding-ada-002 model to generate 1536-dimensional embeddings for chunks of each video's transcript. Because these embeddings capture semantic meaning rather than exact keywords, a query can match a passage that uses different wording. A hosted Pinecone vector database runs k-nearest-neighbor (k-NN) searches over the embeddings to retrieve the most relevant video segments. Transcripts are currently obtained by scraping YouTube's HTML; a TODO proposes integrating Whisper for more accurate transcription.
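The retrieval idea above can be illustrated with a minimal TypeScript sketch. It is not the project's code: the names `chunkTranscript`, `cosineSimilarity`, `topK`, and `momentUrl`, the transcript shape, and the 500-character chunk size are all hypothetical. Embeddings are assumed to be computed elsewhere (for example with text-embedding-ada-002), and an exact brute-force cosine search over an in-memory array stands in for the hosted Pinecone index.

```typescript
// Hypothetical transcript shape: one caption line with its start time in seconds.
interface TranscriptPart {
  text: string;
  start: number;
}

// A chunk of transcript text plus the moment in the video where it begins.
interface Chunk {
  videoId: string;
  text: string;
  start: number;
}

// A chunk paired with its (precomputed) embedding vector.
interface IndexedChunk extends Chunk {
  embedding: number[];
}

// Group consecutive caption lines into chunks of at least maxChars characters
// (the last chunk may be shorter), remembering each chunk's start time.
function chunkTranscript(videoId: string, parts: TranscriptPart[], maxChars = 500): Chunk[] {
  const chunks: Chunk[] = [];
  let text = "";
  let start = 0;
  for (const part of parts) {
    if (text.length === 0) start = part.start;
    text = text.length === 0 ? part.text : `${text} ${part.text}`;
    if (text.length >= maxChars) {
      chunks.push({ videoId, text, start });
      text = "";
    }
  }
  if (text.length > 0) chunks.push({ videoId, text, start });
  return chunks;
}

// Cosine similarity between two vectors of equal length (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-NN: score every chunk against the query embedding, keep the k best.
function topK(query: number[], index: IndexedChunk[], k: number): Array<IndexedChunk & { score: number }> {
  return index
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Link to the moment a chunk starts, using YouTube's `t` parameter (whole seconds).
function momentUrl(chunk: Chunk): string {
  return `https://www.youtube.com/watch?v=${chunk.videoId}&t=${Math.floor(chunk.start)}s`;
}
```

Brute force is exact but its cost grows linearly with the number of chunks. A vector database such as Pinecone keeps queries fast as a playlist grows, which is presumably why the project hosts its index there.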
Quick Start & Requirements
npm install
npx tsx src/bin/resolve-yt-playlist.ts
npx tsx src/bin/process-yt-playlist.ts
npx tsx src/bin/query.ts
npx tsx src/bin/generate-thumbnails.ts (approx. 2 hours)
npm run dev
Highlighted Details
text-embedding-ada-002 for deep semantic understanding.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project scrapes YouTube's HTML for transcripts. This approach may be fragile, and it misses episodes that lack automated captions. A TODO item suggests using Whisper for more robust transcription. Thumbnail generation is resource-intensive and slow (approximately 2 hours).
2 years ago
Inactive
Dicklesworthstone
gannonh
freedmand
oramasearch