openclaw-search-skills  by blessonism

Deep search and content extraction toolkit

Created 1 month ago
363 stars

Top 77.5% on SourcePulse

Project Summary

This repository provides OpenClaw skills for advanced, multi-source deep search and content extraction. It automates structured research by aggregating results from multiple search engines, intelligently assessing query intent, and enabling deep dives into citation chains. The skills benefit users requiring comprehensive information retrieval and contextual research beyond standard search capabilities.

How It Works

The search-layer skill runs parallel searches across Brave, Exa, Tavily, and Grok, then applies intent-aware scoring and deduplication to the merged results. It supports a "Retrieval path" for general queries and a "Research lane" for deeper analysis. A key innovation is the "Thread-pulling path," which follows citation links from initial results (e.g., GitHub issues, forum posts, articles) to extract deeper context. The content-extract and mineru-extract skills convert the pages behind the resulting URLs into clean Markdown, with MinerU handling sites that block scraping.
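The fan-out, deduplication, and intent-aware scoring described above can be sketched roughly as follows. This is an illustration only, not the repository's implementation: the `engine_a` and `engine_b` functions are hypothetical stand-ins for real search clients, and the weights in `INTENT_WEIGHTS` and the URL normalization rule are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

# Hypothetical stand-ins for real engine clients; each returns (url, relevance) pairs.
def engine_a(query):
    return [("https://example.com/a/", 0.9), ("https://example.org/b", 0.4)]

def engine_b(query):
    return [("https://EXAMPLE.com/a", 0.7), ("https://example.net/c", 0.6)]

# Assumed per-intent weights: how much each engine's score counts for a given intent.
INTENT_WEIGHTS = {
    "factual": {"engine_a": 1.0, "engine_b": 0.5},
    "status": {"engine_a": 0.5, "engine_b": 1.0},
}

def normalize(url):
    """Build a deduplication key: lowercase host, no trailing slash, no fragment."""
    parts = urlsplit(url)
    key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    return f"{key}?{parts.query}" if parts.query else key

def search(query, intent, engines):
    """Query all engines in parallel, then merge, deduplicate, and rank."""
    weights = INTENT_WEIGHTS[intent]
    merged = {}
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        for name, future in futures.items():
            for url, score in future.result():
                key = normalize(url)
                weighted = score * weights.get(name, 1.0)
                # Keep the best weighted score seen for each unique page.
                merged[key] = max(merged.get(key, 0.0), weighted)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

ENGINES = {"engine_a": engine_a, "engine_b": engine_b}
results = search("openclaw skills", "factual", ENGINES)
```

Keeping the maximum weighted score per normalized URL is one simple merge rule; summing scores or counting how many engines agree would be equally plausible choices.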

Quick Start & Requirements

The recommended installation is to instruct the OpenClaw agent to install the skill from the repository's GitHub URL. For manual installation, clone the repository and symlink the skills into your OpenClaw skills directory. Requirements include Python 3.10+, the OpenClaw runtime, and API keys for Exa and Tavily. A Grok API key is optional but improves search coverage; a MinerU token is needed only for sites that block scraping. The base dependency is requests; the v3.0 thread-pulling features additionally require trafilatura, beautifulsoup4, and lxml.
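A preflight check along these lines can confirm the requirements before running the skills. It is a sketch under stated assumptions: the environment variable names (`EXA_API_KEY`, `TAVILY_API_KEY`, `GROK_API_KEY`, `MINERU_TOKEN`) are guesses rather than names confirmed by the project, so check the repository README for the real ones.

```python
import importlib.util
import sys

# Assumed environment variable names; the project may use different ones.
REQUIRED_KEYS = ("EXA_API_KEY", "TAVILY_API_KEY")
OPTIONAL_KEYS = ("GROK_API_KEY", "MINERU_TOKEN")
MIN_PYTHON = (3, 10)

# Import names of the listed dependencies (beautifulsoup4 is imported as bs4).
BASE_MODULES = ("requests",)
THREAD_PULLING_MODULES = ("trafilatura", "bs4", "lxml")

def check_requirements(env, version=sys.version_info):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10+ is required")
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"missing required key: {key}")
    return problems

def missing_optional(env):
    """List optional credentials that are not set."""
    return [key for key in OPTIONAL_KEYS if not env.get(key)]

def missing_modules(names):
    """List top-level modules that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

In practice one would call `check_requirements(os.environ)` and `missing_modules(BASE_MODULES + THREAD_PULLING_MODULES)` and print whatever comes back.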

Highlighted Details

  • Multi-Source Search: Integrates Brave, Exa, Tavily, and Grok for broad search coverage.
  • Intent-Aware Search: Dynamically adjusts strategy and scoring based on query intent (e.g., factual, status, comparison).
  • Thread-Pulling Path: Enables deep contextual research by following citation chains across platforms like GitHub, Hacker News, and Reddit.
  • Content Extraction: Converts web pages and documents (PDF, Office) into clean Markdown.
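The thread-pulling idea in the list above, following citation links outward from an initial result, can be illustrated with a depth-limited breadth-first traversal. This is a minimal sketch, not the project's code: `PAGES` is a hypothetical in-memory stand-in for fetched pages, and the regex link extraction and `max_depth` parameter are assumptions made for the example.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> page text. A real run would fetch over HTTP.
PAGES = {
    "https://example.com/issue/1": "See https://example.com/pr/2 and https://example.com/post/3.",
    "https://example.com/pr/2": "Fixes the bug described in https://example.com/issue/1",
    "https://example.com/post/3": "Background: https://example.com/paper/4",
    "https://example.com/paper/4": "No further links.",
}

URL_RE = re.compile(r"https?://[^\s)>\]]+")

def extract_links(text):
    """Find URLs in plain text and trim trailing sentence punctuation."""
    return [match.rstrip(".,;") for match in URL_RE.findall(text)]

def pull_thread(start, fetch, max_depth=2):
    """Follow links breadth-first from start, visiting each URL at most once."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        text = fetch(url)
        if text is None:  # Page could not be retrieved; skip it.
            continue
        order.append((url, depth))
        if depth == max_depth:  # Do not expand links beyond the depth limit.
            continue
        for link in extract_links(text):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

chain = pull_thread("https://example.com/issue/1", PAGES.get, max_depth=2)
```

The `seen` set prevents loops when pages cite each other, as the issue and pull request do here, and the depth limit keeps the traversal bounded.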

Maintenance & Community

No specific details regarding maintainers, community channels, or project roadmap were found in the provided README content.

Licensing & Compatibility

The project is released under the MIT license, which is permissive and generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Core functionality relies on obtaining and configuring API keys for external search services (Exa, Tavily). Advanced thread-pulling features require additional Python dependencies. Extraction from certain anti-scraping websites necessitates a MinerU token.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 2
Star History: 310 stars in the last 30 days
