Marco-DeepResearch by AIDC-AI

Frameworks and benchmarks for challenging agentic search

Created 5 months ago
286 stars

Top 91.6% on SourcePulse

Project Summary

This project addresses critical limitations in AI agents for real-world applications, focusing on domain-specific reasoning, hierarchical rule application, and large-scale information seeking. It introduces benchmarks and frameworks like HSCodeComp and DeepWideSearch to evaluate and advance agent capabilities, benefiting researchers and developers aiming for more robust AI agents.

How It Works

The initiative comprises four components: HSCodeComp, a benchmark for hierarchical rule application; DeepWideSearch, a benchmark for deep-and-wide information seeking; Table-as-Search, a hierarchical multi-agent framework that formalizes search as table completion; and UMEM, a self-evolving memory system that jointly optimizes memory extraction and management. Together they target complex decision-making, broad exploration combined with deep reasoning, structured information synthesis, and generalizable long-term memory.
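To make "search as table completion" concrete, here is a minimal sketch of the idea, not the project's implementation. It assumes a table is a list of rows keyed by an entity column, that unknown cells are marked with None, and that a lookup callable stands in for whatever search sub-agent would fill a cell. The names missing_cells, complete_table, and toy_facts are hypothetical, and the data is invented for illustration.

```python
from typing import Callable, Optional

# Hypothetical representation: each row maps a column name to a value,
# with None marking a cell that still has to be searched for.
Row = dict[str, Optional[str]]


def missing_cells(table: list[Row], key: str) -> list[tuple[str, str]]:
    """Return (entity, column) pairs whose value is still unknown."""
    cells = []
    for row in table:
        for column, value in row.items():
            if value is None:
                cells.append((row[key], column))
    return cells


def complete_table(
    table: list[Row],
    key: str,
    lookup: Callable[[str, str], Optional[str]],
) -> list[Row]:
    """Fill every empty cell that the lookup function can answer."""
    for entity, column in missing_cells(table, key):
        value = lookup(entity, column)  # stand-in for a search sub-agent
        if value is not None:
            for row in table:
                if row[key] == entity:
                    row[column] = value
    return table


# Invented toy data; a real system would query the web or a knowledge base.
toy_facts = {("item-a", "category"): "tools", ("item-b", "price"): "12"}
table = [
    {"name": "item-a", "category": None, "price": "7"},
    {"name": "item-b", "category": "toys", "price": None},
    {"name": "item-c", "category": None, "price": "3"},
]
filled = complete_table(table, "name", lambda e, c: toy_facts.get((e, c)))
remaining = missing_cells(filled, "name")  # cells the lookup could not answer
```

Framing the task this way gives the search an explicit stopping condition: the work is done when no cell is left empty, and any cell that stays empty names exactly what is still missing.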

Quick Start & Requirements

  • Python: 3.10+
  • Installation: Each sub-project installs separately. Use pip install -r requirements.txt for HSCodeComp, DeepWideSearch, and Table-as-Search, and pip install -e . for UMEM.
  • Running Evaluations: Specific Python scripts and bash commands are provided for each sub-project.
  • Links:
    • HSCodeComp README: Marco-DeepResearch-Family/HSCodeComp/README.md
    • DeepWideSearch README: Marco-DeepResearch-Family/DeepWideSearch/README.md
    • Table-as-Search README: Marco-DeepResearch-Family/Table-as-Search/README.md
    • UMEM README: Marco-DeepResearch-Family/UMEM/README.md
    • Family Overview: Marco-DeepResearch-Family/README.md

Highlighted Details

  • HSCodeComp benchmark: 95.0% human performance vs. 46.8% best AI.
  • DeepWideSearch benchmark: 414 avg. information units, 4.21 avg. reasoning depth.
  • Table-as-Search: Achieves a gain of more than 40 percentage points on hard BD tasks (Success Rate 15.2% → 55.8%).
  • UMEM: Compresses tuning from 3-5 days to ~10 minutes and outperforms human-optimized baselines by 11% on image auditing.
  • DeepWideSearch evaluation: A-MapReduce achieves 79.09% Core Entity Accuracy, 51.78% Column F1.
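The Table-as-Search gain quoted above can be read as either an absolute or a relative improvement. The short calculation below uses only the two success rates given in the list and shows both readings; the function names are illustrative.

```python
def point_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Improvement relative to the starting value, in percent."""
    return (after - before) / before * 100


before, after = 15.2, 55.8  # success rates quoted for hard BD tasks

points = round(point_gain(before, after), 1)
relative = round(relative_gain(before, after), 1)

print(points)    # 40.6
print(relative)  # 267.1
```

The "40%+" figure therefore matches the absolute difference in percentage points. Measured relative to the 15.2% starting point, the improvement is far larger.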

Maintenance & Community

  • Contributors: AI Business, Alibaba International Digital Commerce. Key contacts: Tian Lan, Longyue Wang.
  • Community Links: None explicitly provided in the README.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatibility: The README mentions no explicit restrictions on commercial use or closed-source linking, which is consistent with Apache-2.0.

Limitations & Caveats

On HSCodeComp, a significant gap to human experts remains (95.0% for humans vs. 65.0% for Marco Agent), indicating room for improvement. The datasets are constructed from publicly accessible data, and the project includes a disclaimer about potential copyright issues or improper content.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 33 stars in the last 30 days
