jina-cli  by geekjourneyx

Web content parser for AI Agents

Created 3 weeks ago

273 stars

Top 94.6% on SourcePulse

Summary

jina-cli is a lightweight command-line tool that wraps the Jina AI Reader API. It fetches and parses content from any URL into formats that AI agents and LLMs can consume directly, such as Markdown, plain text, or HTML. It is particularly useful for complex web pages like X (Twitter) posts, blogs, and news sites, where it enables efficient data ingestion for AI workflows.

How It Works

jina-cli sits between the user and the Jina AI Reader API, hiding the details of web scraping and content parsing. It offers two primary commands: read, which extracts URL content and converts it into structured formats, and search, which performs AI-powered web searches. Users get clean, LLM-friendly data from diverse web sources without writing their own scrapers.
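To make the wrapping idea concrete, here is a minimal sketch of the kind of request a Reader wrapper might issue. It assumes the public Reader endpoint at https://r.jina.ai/, which takes the target URL appended to its path, and an optional bearer token for the API key. The function names are hypothetical and are not taken from the jina-cli source, which may build its requests differently.

```python
import urllib.request
from typing import Optional

# Assumption: the public Jina AI Reader endpoint; the target URL is appended to it.
READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a Reader request for target_url."""
    headers = {}
    if api_key:
        # An API key, when available, is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(READER_PREFIX + target_url, headers=headers)


def read(target_url: str, api_key: Optional[str] = None, timeout: float = 30.0) -> str:
    """Send the request and return the response body decoded as UTF-8 text."""
    req = build_reader_request(target_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling read("https://example.com") performs a live network request, so building the request is kept separate from sending it; that makes the URL and header logic easy to inspect offline.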

Quick Start & Requirements

  • Installation: Three methods are available:
    • OpenClaw Skill: For local AI assistants, install by copying a skill file (SKILL.md) into the OpenClaw workspace. No separate CLI binary installation is needed.
    • Claude Code Skill: For AI-assisted development within Claude Code. Requires Node.js (v18.0.0+) and installation via npx skills add https://github.com/geekjourneyx/jina-cli --skill jina-cli.
    • CLI Binary: For terminal and scripting use. A one-line installer script (curl ... | bash) is provided for Linux/macOS, automatically downloading and installing the binary. Manual installation from pre-compiled binaries or building from source (go build) is also supported.
  • Prerequisites: Node.js (for Claude Code Skill), Go (for building from source).
  • Verification: Use jina --version to confirm installation and jina read --url "https://example.com" for basic functionality testing.
  • Documentation: Installation scripts and detailed command references are included in the repository.
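The verification step can be scripted. The sketch below runs the two commands listed above and reports whether both succeed. It assumes the binary is on PATH under the name jina and that it exits with a non-zero status on failure; neither assumption is confirmed by the summary.

```python
import shutil
import subprocess
from typing import List, Optional

# The two verification commands quoted in the list above.
VERIFY_COMMANDS: List[List[str]] = [
    ["jina", "--version"],
    ["jina", "read", "--url", "https://example.com"],
]


def find_cli(name: str = "jina") -> Optional[str]:
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(name)


def verify_install(timeout: float = 60.0) -> bool:
    """Run each verification command; True only if every one exits with status 0."""
    if find_cli() is None:
        return False
    for argv in VERIFY_COMMANDS:
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except (subprocess.TimeoutExpired, OSError):
            return False
        if result.returncode != 0:
            return False
    return True
```

The second command contacts the Reader API, so verify_install() needs network access to return True.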

Highlighted Details

  • Content Extraction: The jina read command supports outputting content in JSON (default), Markdown, or plain text formats, with options to save directly to files or process multiple URLs from a specified file.
  • Web Search: The jina search command enables AI-powered web searches, allowing users to specify search queries, restrict results to particular websites (--site), and limit the number of returned results (--limit).
  • Configuration Management: A configuration file (~/.jina-reader/config.yaml) allows customization of API endpoints, request timeouts, and default output formats. Settings can be managed via jina config commands.
  • Advanced Parsing & SPA Handling: Advanced options include bypassing cache (--no-cache), using proxy servers (--proxy), extracting specific HTML elements via CSS selectors (--target-selector), waiting for elements to load (--wait-for-selector), handling Single Page Applications (SPAs) with POST requests (--post), and passing custom cookies (--cookie).
  • API Key Support: Users can provide an API key via configuration, environment variables, or command-line arguments to access higher rate limits from the Jina AI Reader API.
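For scripting, the advanced read options above can be assembled programmatically. The following sketch builds an argument list from the flags named in this section. It assumes that --no-cache and --post are boolean switches and that --proxy, --target-selector, --wait-for-selector, and --cookie each take a single value; the helper name build_read_argv is hypothetical.

```python
from typing import List, Optional


def build_read_argv(
    url: str,
    *,
    no_cache: bool = False,
    proxy: Optional[str] = None,
    target_selector: Optional[str] = None,
    wait_for_selector: Optional[str] = None,
    post: bool = False,
    cookie: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `jina read` with the requested options."""
    argv = ["jina", "read", "--url", url]
    # Assumed boolean switches: present or absent, no value.
    if no_cache:
        argv.append("--no-cache")
    if post:
        argv.append("--post")
    # Assumed value-taking flags: emitted only when a value is given.
    for flag, value in (
        ("--proxy", proxy),
        ("--target-selector", target_selector),
        ("--wait-for-selector", wait_for_selector),
        ("--cookie", cookie),
    ):
        if value is not None:
            argv += [flag, value]
    return argv
```

Passing the result to subprocess.run as a list avoids shell quoting problems with CSS selectors and cookie strings.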

Maintenance & Community

The project is primarily maintained by a single author, geekjourneyx, with links provided to their X (Twitter) and WeChat official accounts for engagement. No other core contributors, sponsorships, or dedicated community channels (e.g., Discord, Slack) are explicitly detailed in the README.

Licensing & Compatibility

The project is licensed under the MIT License. This permissive license allows modification, distribution, and commercial use, provided the copyright and license notice are retained, making it compatible with most integration scenarios.

Limitations & Caveats

The core functionality relies on the external Jina AI Reader API, making the tool dependent on its availability and performance. The "Skill" based installation methods are tied to specific AI assistant environments (OpenClaw, Claude Code), while the CLI binary offers broader system access. Users requiring higher request rates will need to obtain and configure a Jina AI API key.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 277 stars in the last 26 days