awesome-data-agents  by HKUSTDial

Data agents taxonomy and curated research list

Created 3 months ago
295 stars

Top 89.8% on SourcePulse

Project Summary

This project addresses the terminological ambiguity and inconsistent use of the term "data agents" by introducing a systematic, hierarchical taxonomy (L0-L5) based on autonomy. The taxonomy aims to clarify capability boundaries and responsibility allocation for systems that orchestrate Data + AI ecosystems. It gives researchers and practitioners a structured overview of advancements and a framework for evaluating agent capabilities.

How It Works

The project defines a six-level taxonomy (L0-L5) for data agents, classifying them by degree of autonomy and by the division of roles between human and agent, an approach inspired by driving automation standards. Large Language Models (LLMs) are treated as the enabler of the progression from manual operations to fully autonomous, generative agents. Existing research is reviewed and organized by autonomy level and by data-task domain (management, preparation, analysis), with emphasis on critical evolutionary leaps and technical gaps, particularly the transition from procedural execution (L2) to autonomous orchestration (L3).
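The level structure can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the names `AutonomyLevel`, `LEVEL_NOTES`, and `crosses_l2_l3_gap` are hypothetical, and the notes cover only the levels this summary characterizes (reading "manual operations" and "fully autonomous, generative agents" as the two endpoints is an assumption).

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Six autonomy levels, L0 through L5 (hypothetical encoding)."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4
    L5 = 5


# Notes only for levels the summary describes; L1 and L4 are left
# undescribed on purpose rather than guessed.
LEVEL_NOTES = {
    AutonomyLevel.L0: "manual operations (assumed endpoint)",
    AutonomyLevel.L2: "procedural execution",
    AutonomyLevel.L3: "autonomous orchestration",
    AutonomyLevel.L5: "fully autonomous, generative agents (assumed endpoint)",
}


def crosses_l2_l3_gap(current: AutonomyLevel, target: AutonomyLevel) -> bool:
    """Return True when moving from `current` to `target` crosses the
    L2-to-L3 boundary that the survey singles out as a critical leap."""
    return current <= AutonomyLevel.L2 and target >= AutonomyLevel.L3


if __name__ == "__main__":
    for level in AutonomyLevel:
        print(level.name, "-", LEVEL_NOTES.get(level, "(not described in this summary)"))
    print(crosses_l2_l3_gap(AutonomyLevel.L1, AutonomyLevel.L3))
```

Using an ordered enum keeps level comparisons explicit, so a check such as "does this upgrade cross the L2-to-L3 gap" reduces to two comparisons.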

Quick Start & Requirements

The provided README does not contain specific installation instructions, primary run commands, or detailed prerequisites beyond the general context of Large Language Models (LLMs) and data science. Links to the survey paper (arXiv:2510.23587) and slides are available.

Highlighted Details

  • Novel 6-level taxonomy (L0-L5) for data agents, based on autonomy and human/agent roles.
  • Curated, continuously updated paper list categorized by autonomy level and data tasks.
  • Focus on LLMs as enablers for increasingly autonomous data agents.
  • Analysis of critical evolutionary leaps and technical gaps, especially the L2-to-L3 transition.
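
As a rough illustration of how a paper list categorized by autonomy level and data task could be indexed, here is a minimal sketch. The `Entry` type, the `index_entries` helper, and the sample titles are all hypothetical placeholders; they do not reflect the repository's actual file layout or any real paper.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three data-task domains named in the summary.
TASKS = ("management", "preparation", "analysis")


@dataclass(frozen=True)
class Entry:
    title: str  # placeholder title, not a real paper
    level: int  # autonomy level, 0 through 5
    task: str   # one of TASKS


def index_entries(entries):
    """Group entry titles by (autonomy level, data task)."""
    index = defaultdict(list)
    for entry in entries:
        if not 0 <= entry.level <= 5:
            raise ValueError(f"level out of range: {entry.level}")
        if entry.task not in TASKS:
            raise ValueError(f"unknown task: {entry.task}")
        index[(entry.level, entry.task)].append(entry.title)
    return dict(index)


if __name__ == "__main__":
    sample = [
        Entry("Placeholder paper A", 2, "analysis"),
        Entry("Placeholder paper B", 2, "analysis"),
        Entry("Placeholder paper C", 1, "preparation"),
    ]
    for key, titles in sorted(index_entries(sample).items()):
        print(key, titles)
```

Keying on the (level, task) pair mirrors the two axes the list is organized by and makes sparsely populated cells easy to spot.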

Maintenance & Community

The README does not provide information regarding project maintainers, community channels (e.g., Discord, Slack), or a roadmap.

Licensing & Compatibility

The README does not specify a software license or provide compatibility notes for commercial use.

Limitations & Caveats

The project acknowledges that the higher levels of autonomy (L4-L5) are largely aspirational and that no fully realized L3 data agent exists yet; current efforts are categorized as "Proto-L3." The project surveys existing research and does not provide a runnable software agent.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 119 stars in the last 30 days
