geekjourneyx
Web content parser for AI Agents
Summary
This project provides jina-cli, a lightweight command-line interface tool designed to wrap the Jina AI Reader API. It simplifies fetching and parsing content from any URL into formats readily usable by AI agents and LLMs, such as Markdown, plain text, or HTML. The tool is particularly beneficial for processing complex web pages like X (Twitter) posts, blogs, and news sites, enabling efficient data ingestion for AI workflows.
How It Works
The jina-cli tool acts as an intermediary, abstracting the complexities of web scraping and content parsing by leveraging the Jina AI Reader API. It offers two primary functionalities: read for extracting and converting URL content into structured formats, and search for performing AI-powered web searches. This approach allows users to quickly obtain clean, LLM-friendly data from diverse web sources without deep technical knowledge of web scraping intricacies.
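The wrapping described above can be sketched in a few lines. This is a minimal illustration, not jina-cli's actual implementation: it assumes the public Reader endpoints `https://r.jina.ai/` (read) and `https://s.jina.ai/` (search) and the `X-Return-Format` request header, and the function names `build_read_request`, `build_search_request`, and `fetch` are hypothetical.

```python
import urllib.request
from urllib.parse import quote

READER_BASE = "https://r.jina.ai/"  # Jina AI Reader endpoint (read)
SEARCH_BASE = "https://s.jina.ai/"  # Jina AI Reader endpoint (search)


def build_read_request(url, return_format="markdown"):
    """Return (request_url, headers) for fetching `url` through the Reader API.

    The target URL is appended to the Reader base URL; the desired output
    format (e.g. "markdown", "text", "html") travels in a request header.
    """
    headers = {"X-Return-Format": return_format}
    return READER_BASE + url, headers


def build_search_request(query):
    """Return (request_url, headers) for a web search (assumed path-style query)."""
    return SEARCH_BASE + quote(query), {}


def fetch(request_url, headers, timeout=30):
    """Send the prepared request and return the response body as text."""
    req = urllib.request.Request(request_url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network call; requires internet access.
    request_url, headers = build_read_request("https://example.com")
    print(fetch(request_url, headers)[:500])
```

Keeping request construction separate from the network call makes the URL and header logic easy to inspect without contacting the service.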
Quick Start & Requirements
SKILL.md) into the OpenClaw workspace. No separate CLI binary installation is needed.
npx skills add https://github.com/geekjourneyx/jina-cli --skill jina-cli.
curl ... | bash) is provided for Linux/macOS, automatically downloading and installing the binary. Manual installation from pre-compiled binaries or building from source (go build) is also supported.
jina --version to confirm installation and jina read --url "https://example.com" for basic functionality testing.
Highlighted Details
jina read command supports outputting content in JSON (default), Markdown, or plain text formats, with options to save directly to files or process multiple URLs from a specified file.
jina search command enables AI-powered web searches, allowing users to specify search queries, restrict results to particular websites (--site), and limit the number of returned results (--limit).
~/.jina-reader/config.yaml) allows customization of API endpoints, request timeouts, and default output formats. Settings can be managed via jina config commands.
--no-cache), using proxy servers (--proxy), extracting specific HTML elements via CSS selectors (--target-selector), waiting for elements to load (--wait-for-selector), handling Single Page Applications (SPAs) with POST requests (--post), and passing custom cookies (--cookie).
Maintenance & Community
The project is primarily maintained by a single author, geekjourneyx, with links provided to their X (Twitter) and WeChat official accounts for engagement. No other core contributors, sponsorships, or dedicated community channels (e.g., Discord, Slack) are explicitly detailed in the README.
Licensing & Compatibility
The project is licensed under the MIT License. This permissive license allows use, modification, distribution, and commercial application, provided the copyright and license notice are retained, which makes it compatible with most integration scenarios.
Limitations & Caveats
The core functionality relies on the external Jina AI Reader API, so the tool depends on that service's availability and performance. The Skill-based installation methods are tied to specific AI assistant environments (OpenClaw, Claude Code), while the CLI binary offers broader system access. Users requiring higher request rates will need to obtain and configure a Jina AI API key.
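Both caveats, the dependence on an external service and the API key needed for higher request rates, can be handled in calling code. The sketch below is an assumption-laden illustration rather than anything jina-cli is documented to do: the environment variable name `JINA_API_KEY` and the helpers `auth_headers` and `with_retries` are hypothetical, while the `Authorization: Bearer <key>` header is the form the Jina AI API accepts.

```python
import os
import time


def auth_headers(env=None):
    """Return an Authorization header if an API key is configured, else {}.

    JINA_API_KEY is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    return {"Authorization": f"Bearer {key}"} if key else {}


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a network-level error, retry with exponential backoff.

    OSError covers urllib's URLError and socket timeouts. The last failure
    is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; in normal use the default `time.sleep` applies.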