feedgrab  by iBigQiang

Universal content fetcher and processor

Created 1 month ago
297 stars

Top 89.5% on SourcePulse

Project Summary

feedgrab is a universal content aggregator that fetches, normalizes, and digests content from seven or more platforms, including WeChat, XHS, X/Twitter, YouTube, Bilibili, Telegram, and RSS feeds. It returns structured output for any supported URL and targets developers, researchers, and power users who want to automate content collection and analysis. It can be used in three ways: as a Python CLI/library, as Claude Code skills for AI-powered transcription and analysis, or as an MCP server that exposes its reading capabilities as tools.

How It Works

feedgrab layers several retrieval methods per platform to maximize the chance of a successful fetch. For X/Twitter it uses a six-tier fallback: the GraphQL API first (cookies are required for full data), then FxTwitter, Syndication, oEmbed, Jina Reader, and finally Playwright. Other platforms use platform-specific APIs (e.g., YouTube's InnerTube API, Bilibili's API, GitHub's REST API) or fall back to browser automation with Playwright when necessary. Fetched content then passes through text extraction (Jina Reader), video/audio transcription (Whisper via Groq), and unified data structuring, and is written out as Markdown files with YAML front matter.
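The tiered fallback described above can be illustrated with a minimal sketch. This is not feedgrab's actual code: the function `fetch_with_fallback`, the `AllTiersFailed` exception, and the stub fetchers are hypothetical names introduced here, and only the tier order is taken from the description. The idea is simply to try each tier in turn and return the first non-empty result, recording why earlier tiers failed.

```python
from typing import Callable, List, Tuple

# A fetcher takes a URL and returns extracted fields, or raises on failure.
Fetcher = Callable[[str], dict]


class AllTiersFailed(Exception):
    """Raised when every tier has failed; carries the per-tier errors."""


def fetch_with_fallback(url: str, tiers: List[Tuple[str, Fetcher]]) -> dict:
    """Try each (name, fetcher) pair in order and return the first success."""
    errors = []
    for name, fetcher in tiers:
        try:
            data = fetcher(url)
        except Exception as exc:  # any failure moves on to the next tier
            errors.append((name, repr(exc)))
            continue
        if data:
            return {"tier": name, **data}
        errors.append((name, "empty result"))
    raise AllTiersFailed(errors)


# Stub fetchers standing in for the real tiers (hypothetical behavior).
def failing(url: str) -> dict:
    raise RuntimeError("unavailable")


def syndication_stub(url: str) -> dict:
    return {"url": url, "text": "hello"}


# Tier order follows the six-tier strategy described in the text.
TIERS = [
    ("graphql", failing),
    ("fxtwitter", failing),
    ("syndication", syndication_stub),
    ("oembed", failing),
    ("jina", failing),
    ("playwright", failing),
]

result = fetch_with_fallback("https://x.com/example/status/1", TIERS)
print(result["tier"])  # syndication
```

Recording which tier produced the result is a design choice of this sketch; it makes it easy to tell when output came from a lower-fidelity source.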

Quick Start & Requirements

The recommended installation is via pip: pip install git+https://github.com/iBigQiang/feedgrab.git. Optional extras add stealth browsing ([stealth]), Twitter search ([twitter]), or the XHS API ([xhs]). Video/audio transcription additionally requires yt-dlp, ffmpeg, and a GROQ_API_KEY. The feedgrab setup command provides a guided setup: it checks the environment, configures settings, detects the browser UA, handles platform logins (e.g., X/Twitter, XHS via browser or CDP), and enables features. To install locally from source, clone the repository and run pip install -e ".[all]".
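Before attempting transcription, it can help to confirm that the three prerequisites named above are present. The following is a small standalone sketch using only the Python standard library; the function name `check_transcription_prereqs` is hypothetical and is not part of feedgrab, whose own check runs through feedgrab setup.

```python
import os
import shutil


def check_transcription_prereqs(env=None, which=shutil.which) -> dict:
    """Report whether yt-dlp, ffmpeg, and GROQ_API_KEY are available.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    return {
        "yt-dlp": which("yt-dlp") is not None,
        "ffmpeg": which("ffmpeg") is not None,
        "GROQ_API_KEY": bool(env.get("GROQ_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_transcription_prereqs().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```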

Highlighted Details

  • Supports 7+ platforms, with a six-tier fallback strategy for X/Twitter that aims to keep retrieving data even when individual APIs change.
  • Integrates with Claude Code skills for AI-powered video/podcast transcription and content analysis.
  • Outputs content into structured Markdown files compatible with Obsidian, including rich YAML front matter with metadata like likes, views, published dates, and source URLs.
  • Offers advanced CLI features for batch operations, including fetching user timelines, bookmarks, search results, and public WeChat articles, with options for filtering and merging.
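To make the output format concrete, here is a minimal sketch of writing a Markdown note with YAML front matter. The field names (`title`, `source`, `likes`, `views`, `published`) and the function `render_note` are assumptions chosen for illustration; feedgrab's actual schema may differ. Values are serialized with `json.dumps`, relying on JSON scalars also being valid YAML.

```python
import json


def render_note(meta: dict, body: str) -> str:
    """Render metadata as YAML front matter followed by a Markdown body."""
    lines = ["---"]
    for key, value in meta.items():
        # JSON-encoded scalars (quoted strings, numbers) are valid YAML.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


# Hypothetical metadata for illustration only.
note = render_note(
    {
        "title": "Example post",
        "source": "https://example.com/post/1",
        "likes": 42,
        "views": 1000,
        "published": "2024-01-01",
    },
    "Post text goes here.",
)
print(note)
```

Notes in this shape open directly in Obsidian, which reads the block between the `---` markers as note properties.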

Maintenance & Community

The project is maintained by @iBigQiang and is a fusion and upgrade of x-reader by @runes_leo and baoyu-danger-x-to-markdown by @dotey. Specific community links (Discord/Slack) are not detailed in the README.

Licensing & Compatibility

The project is released under the MIT License, which permits commercial use and integration into closed-source projects, subject to the license terms.

Limitations & Caveats

Full functionality, particularly for X/Twitter's GraphQL API and advanced features like bookmark fetching, requires obtaining and configuring authentication cookies. Video transcription via Whisper necessitates a Groq API key. Some advanced browser-based features might require specific browser configurations (e.g., Chrome CDP for cookie extraction). The reliance on third-party APIs means functionality can be subject to changes or deprecations by the platform providers.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1
Star History: 274 stars in the last 30 days
