Marco-DeepResearch by ATH-MaaS

Frameworks and benchmarks for challenging agentic search

Created 8 months ago

316 stars

Top 85.3% on SourcePulse

Project Summary

This project addresses critical limitations in AI agents for real-world applications, focusing on domain-specific reasoning, hierarchical rule application, and large-scale information seeking. It introduces benchmarks and frameworks like HSCodeComp and DeepWideSearch to evaluate and advance agent capabilities, benefiting researchers and developers aiming for more robust AI agents.

How It Works

The project comprises four components. HSCodeComp is a benchmark for hierarchical rule application. DeepWideSearch is a benchmark for deep-and-wide information seeking. Table-as-Search is a hierarchical multi-agent framework that formalizes search as table completion. UMEM is a self-evolving memory system that jointly optimizes memory extraction and management. Respectively, they target complex decision-making, broad exploration combined with deep reasoning, structured information synthesis, and generalizable long-term memory.
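To make the "search as table completion" idea concrete, here is a minimal sketch, not the Table-as-Search implementation. It assumes rows are entities, columns are attributes, and an empty cell is an open sub-query. The names `SearchTable`, `missing_cells`, `complete`, and `demo_lookup` are hypothetical, and the lookup function stands in for whatever search agent fills a cell.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical types for illustration only; not the Table-as-Search API.
Cell = Optional[str]
Lookup = Callable[[str, str], Cell]  # (entity, column) -> value, or None if not found


@dataclass
class SearchTable:
    columns: list[str]
    rows: dict[str, dict[str, Cell]] = field(default_factory=dict)

    def add_entity(self, entity: str) -> None:
        # Wide step: each discovered entity becomes a row of empty cells.
        self.rows.setdefault(entity, {col: None for col in self.columns})

    def missing_cells(self) -> list[tuple[str, str]]:
        # Every empty cell is an open sub-query for some agent to resolve.
        return [
            (entity, col)
            for entity, cells in self.rows.items()
            for col, value in cells.items()
            if value is None
        ]

    def complete(self, lookup: Lookup) -> int:
        # Deep step: try to resolve each empty cell; return how many were filled.
        filled = 0
        for entity, col in self.missing_cells():
            value = lookup(entity, col)
            if value is not None:
                self.rows[entity][col] = value
                filled += 1
        return filled


def demo_lookup(entity: str, column: str) -> Cell:
    # Stand-in for a search agent; the facts below are made-up toy data.
    facts = {
        ("Widget A", "material"): "steel",
        ("Widget A", "origin"): "DE",
        ("Widget B", "material"): "plastic",
    }
    return facts.get((entity, column))


table = SearchTable(columns=["material", "origin"])
table.add_entity("Widget A")
table.add_entity("Widget B")
filled = table.complete(demo_lookup)
remaining = table.missing_cells()
print(filled, remaining)
```

Under these assumptions, the cells still returned by `missing_cells` after a pass are exactly the sub-queries that remain open, which is what makes progress on a search task easy to track.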

Quick Start & Requirements

Python: 3.10+
Installation: Each sub-project is installed separately. Run pip install -r requirements.txt for HSCodeComp, DeepWideSearch, and Table-as-Search, and pip install -e . for UMEM.
Running Evaluations: Specific Python scripts and bash commands are provided for each sub-project.
Links:
- HSCodeComp README: Marco-DeepResearch-Family/HSCodeComp/README.md
- DeepWideSearch README: Marco-DeepResearch-Family/DeepWideSearch/README.md
- Table-as-Search README: Marco-DeepResearch-Family/Table-as-Search/README.md
- UMEM README: Marco-DeepResearch-Family/UMEM/README.md
- Family Overview: Marco-DeepResearch-Family/README.md
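The requirements above can be sketched as a dry-run helper that checks the Python version and lists the install command for each sub-project. This is an illustration only: it assumes each sub-project directory sits directly under Marco-DeepResearch-Family (as the README paths suggest) and that requirements.txt lives at the sub-project root. The names `PIP_ARGS`, `python_ok`, and `install_plan` are hypothetical.

```python
import sys
from pathlib import Path

# Directory name taken from the README links above.
FAMILY_DIR = Path("Marco-DeepResearch-Family")

# pip arguments per sub-project, as described in the installation note.
PIP_ARGS = {
    "HSCodeComp": ["install", "-r", "requirements.txt"],
    "DeepWideSearch": ["install", "-r", "requirements.txt"],
    "Table-as-Search": ["install", "-r", "requirements.txt"],
    "UMEM": ["install", "-e", "."],
}


def python_ok(version=sys.version_info) -> bool:
    # The summary lists Python 3.10+ as the requirement.
    return tuple(version[:2]) >= (3, 10)


def install_plan(family_dir: Path = FAMILY_DIR) -> list[tuple[Path, list[str]]]:
    # Build (working directory, argv) pairs without executing anything.
    return [
        (family_dir / name, [sys.executable, "-m", "pip", *args])
        for name, args in PIP_ARGS.items()
    ]


plan = install_plan()
print("Python OK:", python_ok())
for cwd, argv in plan:
    print(f"(cd {cwd} && {' '.join(argv)})")
```

To actually run a step, each pair could be passed to `subprocess.run(argv, cwd=cwd, check=True)`. The sub-project READMEs remain the authoritative source for setup and evaluation commands.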

Highlighted Details

HSCodeComp benchmark: 95.0% human performance vs. 46.8% best AI.
DeepWideSearch benchmark: 414 avg. information units, 4.21 avg. reasoning depth.
Table-as-Search: Achieves a gain of more than 40 percentage points on hard BD tasks (Success Rate 15.2% → 55.8%).
UMEM: Compresses tuning from 3-5 days to ~10 minutes and outperforms human-optimized baselines by 11% on image auditing.
DeepWideSearch evaluation: A-MapReduce achieves 79.09% Core Entity Accuracy, 51.78% Column F1.
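The headline figures above mix absolute and relative comparisons, so a short arithmetic check may help when reading them. The sketch below uses only numbers quoted in this summary. The helper names are hypothetical, and the UMEM speedup assumes "days" means 24-hour wall-clock days, which the summary does not state.

```python
def point_gain(before: float, after: float) -> float:
    # Absolute change in percentage points.
    return after - before


def relative_gain(before: float, after: float) -> float:
    # Change relative to the starting value, in percent.
    return (after - before) / before * 100


# Table-as-Search success rate on hard BD tasks: 15.2% -> 55.8%.
tas_points = round(point_gain(15.2, 55.8), 1)
tas_relative = round(relative_gain(15.2, 55.8), 1)

# HSCodeComp: human performance vs. best AI system (95.0% vs. 46.8%).
hs_gap = round(point_gain(46.8, 95.0), 1)

# UMEM tuning time: 3-5 days down to about 10 minutes.
# Assumption: a "day" is 24 hours of wall-clock time.
MINUTES_PER_DAY = 24 * 60
speedup_low = 3 * MINUTES_PER_DAY / 10
speedup_high = 5 * MINUTES_PER_DAY / 10

print(tas_points, tas_relative, hs_gap, speedup_low, speedup_high)
```

Read this way, the Table-as-Search result is a 40.6-point absolute gain, or roughly a 267% relative gain, which is why the unit matters when quoting it.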

Maintenance & Community

Contributors: AI Business, Alibaba International Digital Commerce. Key contacts: Tian Lan, Longyue Wang.
Community Links: None explicitly provided in the README.

Licensing & Compatibility

License: Apache-2.0.
Compatibility: The README mentions no restrictions on commercial use or closed-source linking beyond the standard Apache-2.0 terms.

Limitations & Caveats

HSCodeComp results show that a significant gap to human experts remains (95.0% for humans vs. 65.0% for Marco Agent). The datasets are constructed from publicly accessible data, and the project includes a disclaimer that they may contain copyrighted or improper content.

Health Check

Last Commit: 6 days ago

Responsiveness: Inactive

Star History: 5 stars in the last 30 days