jiggy-ai
LLM-powered summarization bot for Hacker News
Summary
This project offers an open-source bot designed to summarize top stories from Hacker News using OpenAI's gpt-3.5-turbo model, automatically posting the results to a Telegram channel. It serves as a practical demonstration of current LLM capabilities for content distillation and provides a platform for further experimentation, benefiting users seeking curated news digests and developers exploring LLM applications.
How It Works
The bot monitors the Hacker News API (/topstories.json) for new top stories. When a new story appears, it uses a basic, error-prone HTML parser to extract text from the story's URL, then passes that text to gpt-3.5-turbo for summarization. Prompt engineering mitigates some problems with paywalled or difficult-to-parse sites, but text extraction remains fragile. The final output (title, summary, URL) is posted to a designated Telegram channel. Summaries do not work for links that are neither PDF nor HTML, or for certain commercial sites such as Reddit and Twitter.
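The first two stages described above (spotting new top stories and pulling text out of HTML) can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the function names `fetch_top_story_ids`, `find_new_stories`, and `extract_text` are hypothetical, and the choice to skip `script` and `style` content is an assumption about what a basic extractor might do.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen

# Public Hacker News API base URL; /topstories.json returns a list of story IDs.
HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_top_story_ids(limit=30):
    """Fetch the current top story IDs (performs a network request)."""
    with urlopen(f"{HN_API}/topstories.json", timeout=10) as resp:
        return json.load(resp)[:limit]


def find_new_stories(top_ids, seen_ids):
    """Return the IDs in top_ids that are not in seen_ids, preserving order."""
    return [story_id for story_id in top_ids if story_id not in seen_ids]


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Naive text extraction: join every visible text node with spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

An extractor this simple keeps navigation menus, cookie banners, and paywall notices along with the article body, which is one plausible reason extraction of this kind is fragile.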
Quick Start & Requirements
Setup requires several environment variables for authentication and connection: OPENAI_API_KEY, PostgreSQL credentials (HNSUM_POSTGRES_HOST, HNSUM_POSTGRES_USER, HNSUM_POSTGRES_PASS), and Telegram API details (HNSUM_TELEGRAM_API_TOKEN, HNSUM_TELEGRAM_CHANNEL_ID). A running PostgreSQL instance is required for state management. Users can watch the bot's output live on the HN Summary Telegram channel and view curated summaries on the project's website.
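One way to validate these variables at startup is sketched below. The variable names come from the list above; the `BotConfig` dataclass and `load_config` function are hypothetical helpers for illustration and are not taken from the project.

```python
import os
from dataclasses import dataclass

# Environment variable names as listed in the setup requirements.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "HNSUM_POSTGRES_HOST",
    "HNSUM_POSTGRES_USER",
    "HNSUM_POSTGRES_PASS",
    "HNSUM_TELEGRAM_API_TOKEN",
    "HNSUM_TELEGRAM_CHANNEL_ID",
)


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical container for the bot's settings."""

    openai_api_key: str
    postgres_host: str
    postgres_user: str
    postgres_pass: str
    telegram_api_token: str
    telegram_channel_id: str


def load_config(env=None):
    """Read settings from env (defaults to os.environ); fail fast if any are unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return BotConfig(
        openai_api_key=env["OPENAI_API_KEY"],
        postgres_host=env["HNSUM_POSTGRES_HOST"],
        postgres_user=env["HNSUM_POSTGRES_USER"],
        postgres_pass=env["HNSUM_POSTGRES_PASS"],
        telegram_api_token=env["HNSUM_TELEGRAM_API_TOKEN"],
        telegram_channel_id=env["HNSUM_TELEGRAM_CHANNEL_ID"],
    )
```

Checking everything up front reports all missing names in one error, rather than failing later on the first database or Telegram call.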
Maintenance & Community
The project welcomes community contributions through issues and pull requests. Direct feedback or inquiries can be sent to the maintainer @wskish via Telegram or Twitter. The operational status and output of the bot can be monitored on its Telegram channel.
Licensing & Compatibility
The specific open-source license governing this project is not detailed in the provided README. Potential adopters should investigate and confirm licensing terms, particularly concerning commercial use or integration into proprietary systems.
Limitations & Caveats
The core summarization relies on LLMs, which are prone to factual hallucinations that are often presented in an authoritative tone. The text extraction module is basic and error-prone, especially with paywalled sites or non-standard HTML, and can lead to fanciful summaries based on the title alone. Text extraction for Reddit, Twitter, and other commercial links is explicitly broken. Telegram's message length limit (4K characters) results in truncated output.
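To illustrate the length limit, the sketch below shows one possible way to fit a post into a single Telegram message by trimming only the summary, so the title and URL survive. It is not how the project handles truncation: `format_message` is a hypothetical helper, the title/summary/URL layout is assumed, and the limit is taken as Telegram's 4096-character cap. It counts Python string length, which may differ from how Telegram counts some characters.

```python
# Telegram's per-message text limit (the "4K characters" mentioned above).
TELEGRAM_LIMIT = 4096


def format_message(title, summary, url, limit=TELEGRAM_LIMIT):
    """Build 'title / summary / url', shortening the summary to fit the limit."""
    header = f"{title}\n\n"
    footer = f"\n\n{url}"
    budget = limit - len(header) - len(footer)
    if budget <= 0:
        # Title and URL alone exceed the limit; fall back to a hard cut.
        return (header + url)[:limit]
    if len(summary) > budget:
        # Reserve one character for the ellipsis that marks the cut.
        summary = summary[: budget - 1].rstrip() + "…"
    return header + summary + footer
```

Trimming the summary instead of cutting the whole message keeps the link to the original story intact, which matters most when the summary itself is unreliable.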