hn_summary  by jiggy-ai

LLM-powered summarization bot for Hacker News

Created 3 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

Summary

This open-source bot summarizes top stories from Hacker News using OpenAI's gpt-3.5-turbo model and automatically posts the results to a Telegram channel. It serves as a practical demonstration of current LLM capabilities for content distillation and as a platform for further experimentation, which benefits readers seeking curated news digests and developers exploring LLM applications.

How It Works

The bot monitors the Hacker News API (/topstories.json) for new top stories. When one appears, it uses a basic, error-prone HTML parser to extract text from the story's URL and passes that text to gpt-3.5-turbo for summarization. Prompt engineering mitigates some problems with paywalled or difficult-to-parse sites, but text extraction remains fragile. The final output (title, summary, URL) is posted to a designated Telegram channel. Summaries do not work for non-PDF/HTML links or for certain commercial sites such as Reddit and Twitter.
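
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `TextExtractor`, `extract_text`, `new_story_ids`, and `format_post` are hypothetical, the extractor uses only Python's standard `html.parser`, and the network fetch and the gpt-3.5-turbo call are left out. The only external detail assumed is the public Hacker News endpoint URL.

```python
from html.parser import HTMLParser

# Public Hacker News API endpoint that returns a JSON list of story IDs.
HN_TOPSTORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


class TextExtractor(HTMLParser):
    """Naive extractor (hypothetical): keeps visible text, skips <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collect non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def new_story_ids(top_ids, seen_ids):
    """Keep only story IDs that have not been processed yet, preserving order."""
    return [story_id for story_id in top_ids if story_id not in seen_ids]


def format_post(title, summary, url):
    """Assemble the (title, summary, URL) output as a single message."""
    return f"{title}\n\n{summary}\n\n{url}"
```

An extractor this simple shows why the approach is fragile: a paywall stub or a script-rendered page yields little or no article text, so the model has almost nothing beyond the title to work from.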

Quick Start & Requirements

Setup requires several environment variables for authentication and connection: OPENAI_API_KEY, PostgreSQL credentials (HNSUM_POSTGRES_HOST, HNSUM_POSTGRES_USER, HNSUM_POSTGRES_PASS), and Telegram API details (HNSUM_TELEGRAM_API_TOKEN, HNSUM_TELEGRAM_CHANNEL_ID). A running PostgreSQL database instance is required for state management. The bot's live output can be followed on the HN Summary Telegram channel, and curated summaries are available on the project's website.
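
Before starting the bot, it can help to check that every required variable is set. The snippet below is a small sketch of such a check, not part of the project: `REQUIRED_VARS` and `load_config` are hypothetical names, and only the variable names themselves come from the list above.

```python
import os

# Variable names as listed above; everything else here is illustrative.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "HNSUM_POSTGRES_HOST",
    "HNSUM_POSTGRES_USER",
    "HNSUM_POSTGRES_PASS",
    "HNSUM_TELEGRAM_API_TOKEN",
    "HNSUM_TELEGRAM_CHANNEL_ID",
]


def load_config(env=None):
    """Return the required settings as a dict, or fail listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_config()` with no argument reads from the process environment; passing a plain dict makes the check easy to exercise without touching real credentials.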

Highlighted Details

  • Demonstrates practical application of large language models like GPT-3.5-turbo for automated content summarization.
  • Offers a foundation for experimenting with advanced LLM features, including potential integration of semantic search capabilities.
  • Aims to broaden access to and understanding of top Hacker News content through automated distillation.

Maintenance & Community

The project welcomes community contributions through issues and pull requests. Direct feedback or inquiries can be sent to the maintainer @wskish via Telegram or Twitter. The operational status and output of the bot can be monitored on its Telegram channel.

Licensing & Compatibility

The specific open-source license governing this project is not detailed in the provided README. Potential adopters should investigate and confirm licensing terms, particularly concerning commercial use or integration into proprietary systems.

Limitations & Caveats

The core summarization relies on LLMs that are prone to factual hallucinations, often delivered in an authoritative tone. The text extraction module is basic and error-prone, especially with paywalled sites or non-standard HTML, so a summary may be a fanciful guess based on the title alone. Text extraction for Reddit, Twitter, and other commercial links is explicitly broken. Telegram's message length limit (4K characters) results in truncated output.
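
One way to make the truncation less abrupt is to shorten only the summary so the title and URL always survive. The sketch below is an assumption-laden illustration rather than the project's behavior: `fit_message` and `TELEGRAM_MAX_CHARS` are hypothetical names, the limit is taken to be 4096 characters (the "4K" mentioned above), and Python's `len` is assumed to count characters the same way Telegram does, which may not hold for every input.

```python
# Assumed limit for a single Telegram text message (the "4K" noted above).
TELEGRAM_MAX_CHARS = 4096


def fit_message(title, summary, url, limit=TELEGRAM_MAX_CHARS):
    """Trim the summary so that title, summary, and URL fit within the limit."""
    separators = 2 * len("\n\n")
    budget = max(limit - len(title) - len(url) - separators, 0)
    if len(summary) > budget:
        # Reserve one character for the ellipsis that marks the cut.
        cut = summary[: max(budget - 1, 0)].rstrip()
        summary = cut + ("…" if budget > 0 else "")
    message = f"{title}\n\n{summary}\n\n{url}"
    # Final guard in case the title and URL alone exceed the limit.
    return message[:limit]
```

Messages that already fit are returned unchanged; longer ones end the summary with an ellipsis so readers can tell that text was cut.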

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days
