hn_summary by jiggy-ai

LLM-powered summarization bot for Hacker News

Created 3 years ago

250 stars

Top 100.0% on SourcePulse

Project Summary

Summary

This project is an open-source bot that summarizes top Hacker News stories with OpenAI's gpt-3.5-turbo model and automatically posts the results to a Telegram channel. It serves as a practical demonstration of what current LLMs can do for content distillation and as a platform for further experimentation. It is useful both to readers who want curated news digests and to developers exploring LLM applications.

How It Works

The bot monitors the Hacker News API (/topstories.json) for new top stories. When it detects one, it uses a basic, error-prone HTML parser to extract text from the page at the story's URL, then passes that text to gpt-3.5-turbo for summarization. Prompt engineering mitigates some problems with paywalled or hard-to-parse sites, but text extraction remains fragile. The final output (title, summary, URL) is posted to a designated Telegram channel. Summaries do not work for links that are neither PDF nor HTML, or for certain commercial sites such as Reddit and Twitter.
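The first two steps above (spotting new top stories and pulling text out of a page) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the helper names `fetch_top_story_ids`, `new_story_ids`, and `extract_text` are hypothetical, and the full API URL is the public Hacker News Firebase endpoint rather than something quoted in the summary above.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen

# Public Hacker News API endpoint (assumed; the summary only names /topstories.json).
HN_TOPSTORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


def fetch_top_story_ids(url=HN_TOPSTORIES_URL, timeout=10):
    """Return the current list of top story IDs (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def new_story_ids(current_ids, seen_ids):
    """Return IDs that have not been processed yet, keeping ranking order."""
    return [sid for sid in current_ids if sid not in seen_ids]


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_text(html):
    """Naive text extraction; like the bot's parser, it is easy to fool."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A polling loop would call `fetch_top_story_ids()` periodically, pass the result through `new_story_ids()` together with the set of IDs already stored, and run `extract_text()` on each new story's page before summarization. An extractor this simple returns little or nothing for paywalled or script-rendered pages, which is the fragility described above.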

Quick Start & Requirements

Setup requires several environment variables for authentication and connection: OPENAI_API_KEY, the PostgreSQL credentials (HNSUM_POSTGRES_HOST, HNSUM_POSTGRES_USER, HNSUM_POSTGRES_PASS), and the Telegram API details (HNSUM_TELEGRAM_API_TOKEN, HNSUM_TELEGRAM_CHANNEL_ID). A running PostgreSQL instance is required for state management. The bot's live output can be seen on the HN Summary Telegram channel, and curated summaries are available on the project's website.
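As an illustration of the configuration described above, the sketch below reads the six variables and fails early if any is missing. The variable names come from the summary; the `BotConfig` class and `load_config` function are hypothetical and are not taken from the project.

```python
import os
from dataclasses import dataclass

# Variable names as listed in the summary above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "HNSUM_POSTGRES_HOST",
    "HNSUM_POSTGRES_USER",
    "HNSUM_POSTGRES_PASS",
    "HNSUM_TELEGRAM_API_TOKEN",
    "HNSUM_TELEGRAM_CHANNEL_ID",
)


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical container for the bot's settings."""

    openai_api_key: str
    postgres_host: str
    postgres_user: str
    postgres_pass: str
    telegram_api_token: str
    telegram_channel_id: str


def load_config(env=None):
    """Build a BotConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return BotConfig(
        openai_api_key=env["OPENAI_API_KEY"],
        postgres_host=env["HNSUM_POSTGRES_HOST"],
        postgres_user=env["HNSUM_POSTGRES_USER"],
        postgres_pass=env["HNSUM_POSTGRES_PASS"],
        telegram_api_token=env["HNSUM_TELEGRAM_API_TOKEN"],
        telegram_channel_id=env["HNSUM_TELEGRAM_CHANNEL_ID"],
    )
```

Calling `load_config()` at startup reports every missing variable in one error instead of failing later on the first database or API call.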

Highlighted Details

Demonstrates practical application of large language models like GPT-3.5-turbo for automated content summarization.
Offers a foundation for experimenting with advanced LLM features, including potential integration of semantic search capabilities.
Aims to broaden access to and understanding of top Hacker News content through automated distillation.

Maintenance & Community

The project welcomes community contributions through issues and pull requests. Direct feedback or inquiries can be sent to the maintainer @wskish via Telegram or Twitter. The operational status and output of the bot can be monitored on its Telegram channel.

Licensing & Compatibility

The specific open-source license governing this project is not detailed in the provided README. Potential adopters should investigate and confirm licensing terms, particularly concerning commercial use or integration into proprietary systems.

Limitations & Caveats

The core summarization relies on an LLM that is prone to factual hallucinations, which it often presents in an authoritative tone. The text extraction module is basic and error-prone, especially with paywalled sites or non-standard HTML; when extraction fails, the model may produce a fanciful summary based on the title alone. Text extraction for Reddit, Twitter, and other commercial links is explicitly broken. Telegram's message length limit (4K characters) means longer output is truncated.
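The truncation caveat can be illustrated with a small sketch that trims only the summary so the title and URL always survive. It assumes a limit of 4096 characters for the "4K" figure above, and `build_message` is a hypothetical helper, not the project's function. Python's `len()` counts code points, which may not match exactly how Telegram counts characters.

```python
# Assumed value for the "4K characters" limit mentioned above.
TELEGRAM_MAX_CHARS = 4096


def build_message(title, summary, url, limit=TELEGRAM_MAX_CHARS):
    """Join title, summary, and URL, shortening the summary to fit the limit."""
    header = f"{title}\n\n"
    footer = f"\n\n{url}"
    budget = limit - len(header) - len(footer)
    if budget < 1:
        # No room for any summary text: fall back to a hard cut.
        return (header + footer)[:limit]
    if len(summary) > budget:
        # Reserve one character for the ellipsis that marks the cut.
        summary = summary[: budget - 1].rstrip() + "…"
    return header + summary + footer
```

Trimming the summary rather than the whole message keeps the link intact, so a reader can still reach the original story when the summary is cut short.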

Explore Similar Projects

parse_hub_bot by z-mio

insights-bot by nekomeowww

wexin-read-mcp by Bwkyd

sum4all by fatwang2

classifai by 10up

hacker-news-digest by polyrabbit

auto-news by finaldie

feiyangdigital-bot by youshandefeiyang

tap4-ai-crawler by 6677-ai

ai-trend-publish by OpenAISpace

social-media-agent by langchain-ai

firecrawl by firecrawl