teng-lin/agent-fetch: Full-content web fetcher for AI agents and content workflows
Top 95.4% on SourcePulse
Summary
agent-fetch retrieves complete, clean web content for AI agents and content workflows. Standard HTTP tools often fail at this because servers fingerprint non-browser clients and return truncated responses. agent-fetch runs locally, needs no API key, and combines browser impersonation with multiple extraction strategies to return the full article text with its structure and links preserved. This makes it useful for AI agents, RAG pipelines, and LLM applications that need the full page content rather than a summary.
How It Works
The core approach is browser impersonation: agent-fetch presents a customizable TLS fingerprint so that servers treat the request as coming from a real browser. Once the page is retrieved, it runs several extraction strategies in parallel: Readability, text density, JSON-LD, framework-specific extractors (Next.js, RSC, WP API), and CSS selectors. Because each strategy targets a different kind of page architecture, running them together recovers content that a single method would miss. The most complete result is kept, and its metadata is composed from the best available source, which makes the tool a more robust alternative to simpler fetchers or cloud APIs.
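The parallel-extraction and selection step described above can be pictured with a small TypeScript sketch. It is not agent-fetch's implementation: `Extraction`, `Extractor`, `runAll`, and `pickBest` are hypothetical names, each strategy is assumed to be an async function over the fetched HTML, and "most complete" is approximated as "longest body text", with missing title and author fields filled from the next-best result.

```typescript
// Hypothetical shape of one strategy's output (not agent-fetch's real types).
interface Extraction {
  strategy: string;
  text: string;
  title?: string;
  author?: string;
}

// A strategy takes raw HTML and resolves to an extraction, or null if it found nothing.
type Extractor = (html: string) => Promise<Extraction | null>;

// Run all strategies concurrently; rejected, null, or empty results are dropped.
async function runAll(html: string, extractors: Extractor[]): Promise<Extraction[]> {
  const settled = await Promise.allSettled(extractors.map((extract) => extract(html)));
  const results: Extraction[] = [];
  for (const outcome of settled) {
    if (outcome.status === "fulfilled" && outcome.value && outcome.value.text.trim() !== "") {
      results.push(outcome.value);
    }
  }
  return results;
}

// Assumed scoring rule: the longest body text is treated as "most complete".
// Missing metadata fields are filled from the next-best result that has them.
function pickBest(results: Extraction[]): Extraction | null {
  if (results.length === 0) return null;
  const ranked = [...results].sort((a, b) => b.text.length - a.text.length);
  const best: Extraction = { ...ranked[0] };
  for (const field of ["title", "author"] as const) {
    if (best[field] === undefined) {
      best[field] = ranked.find((r) => r[field] !== undefined)?.[field];
    }
  }
  return best;
}

// Toy strategies standing in for text density, JSON-LD, and a failing extractor.
const textDensity: Extractor = async () => ({
  strategy: "text-density",
  text: "Full article body, several paragraphs long.",
});
const jsonLd: Extractor = async () => ({
  strategy: "json-ld",
  text: "Short teaser.",
  title: "Example Title",
  author: "A. Writer",
});
const failing: Extractor = async () => {
  throw new Error("no match");
};

runAll("<html>...</html>", [textDensity, jsonLd, failing]).then((results) => {
  const best = pickBest(results);
  console.log(best?.strategy, best?.title); // text-density Example Title
});
```

`Promise.allSettled` is used rather than `Promise.all` so that one failing strategy does not discard the results of the others.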
Quick Start & Requirements
Install with npm install @teng-lin/agent-fetch, or run it directly with npx agent-fetch <url>. The tool runs locally and requires only Node.js and npm; it needs no special hardware, GPU, or API keys. To integrate it with an AI agent, run npx skills add teng-lin/agent-fetch.
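An agent or script written in Node.js can also shell out to the npx agent-fetch <url> command. The TypeScript sketch below assumes the CLI prints the extracted content to stdout (check the README for the actual output format and flags); `buildCommand` and `fetchArticle` are hypothetical helpers, not part of agent-fetch.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: validate the URL and build the argv for `npx agent-fetch <url>`.
function buildCommand(url: string): [string, string[]] {
  const parsed = new URL(url); // throws a TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return ["npx", ["agent-fetch", parsed.href]];
}

// Hypothetical helper: run the CLI without a shell and return its stdout.
// Assumption: the extracted content is printed to stdout.
async function fetchArticle(url: string): Promise<string> {
  const [command, args] = buildCommand(url);
  const { stdout } = await execFileAsync(command, args, {
    encoding: "utf8",
    maxBuffer: 32 * 1024 * 1024, // allow long articles
  });
  return stdout;
}
```

For example, `await fetchArticle("https://example.com/post")` would return the CLI's output as a string. Passing the arguments as an array to `execFile`, instead of building a shell command string, keeps a hostile URL from being interpreted by a shell. On Windows, npx is a .cmd shim, so this direct invocation may need adjusting.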
Highlighted Details
Selectable browser fingerprints (e.g., chrome-143, ios-safari-18) for browser impersonation.
Include/exclude filters (--include, --exclude), concurrency limits, and rate limiting (--delay).
Cookie options (--cookie, --cookie-file).
Maintenance & Community
The provided README lacks specific details on maintainers, community channels (Discord/Slack), sponsorships, or a public roadmap.
Licensing & Compatibility
Released under the permissive MIT license, which allows use in commercial and closed-source applications provided the copyright and license notice are retained.
Limitations & Caveats
Users are responsible for complying with each website's Terms of Service and robots.txt; the tool grants no permission to access content and does not bypass access controls. Legal responsibility for copyright and data-protection compliance rests with the user. Extraction may still fail on sites that use anti-scraping techniques more sophisticated than TLS fingerprinting.
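Since robots.txt compliance is left to the user, one option is a pre-flight check before calling agent-fetch. The TypeScript sketch below is not part of agent-fetch: `isAllowed` is a hypothetical helper that takes the robots.txt body as a string (fetching it is left to the caller) and implements only a simplified subset of the Robots Exclusion Protocol (RFC 9309).

```typescript
interface Rule {
  allow: boolean;
  prefix: string;
}

// Simplified robots.txt check (a subset of RFC 9309):
// - only the group(s) addressed to "User-agent: *" are read;
// - Allow/Disallow values are matched as plain path prefixes (no "*" or "$"
//   wildcards, no percent-encoding normalization);
// - the longest matching rule wins, and Allow wins a tie.
function isAllowed(robotsTxt: string, path: string): boolean {
  const rules: Rule[] = [];
  let inStarGroup = false;
  let previousWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      inStarGroup = (previousWasAgent && inStarGroup) || value === "*";
      previousWasAgent = true;
      continue;
    }
    previousWasAgent = false;
    // An empty Disallow value means "nothing disallowed", so it adds no rule.
    if (inStarGroup && (key === "allow" || key === "disallow") && value !== "") {
      rules.push({ allow: key === "allow", prefix: value });
    }
  }
  let best: Rule | null = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    const longer = best === null || rule.prefix.length > best.prefix.length;
    const tieWonByAllow = best !== null && rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieWonByAllow) best = rule;
  }
  return best === null || best.allow; // no matching rule means allowed
}

const robots = "User-agent: *\nDisallow: /private/\nAllow: /private/press/\n";
console.log(isAllowed(robots, "/private/notes")); // false
console.log(isAllowed(robots, "/private/press/release")); // true
```

A production crawler would also prefer a group naming its own user agent over the `*` group and support wildcards; the sketch leaves both out to stay short.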
1 month ago
Inactive