DCI-Agent-Lite  by DCI-Agent

Rethinking retrieval for agentic search via direct corpus interaction

Created 1 month ago
290 stars

Top 90.7% on SourcePulse

Summary

DCI-Agent-Lite introduces a Direct Corpus Interaction (DCI) paradigm for agentic search and deep research on personal knowledge bases. It targets users who need to query private data without external retrieval services or pre-built indices. By treating the corpus as an open research environment, it aims for a simpler, higher-resolution retrieval system with improved accuracy.

How It Works

DCI agents search raw files directly with terminal tools (rg, find, sed), bypassing semantic retrievers and index builds. This "zero-index retrieval" lets the agent start immediately, control each search at a fine grain, and compose search primitives freely. Built on Pi with bash tools and lightweight context management, it keeps complexity minimal while still supporting long-horizon research.
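The zero-index idea can be illustrated with a minimal sketch. This is not the project's code: the function name `search_corpus`, the `context` parameter, and the demo files are hypothetical, and pure-Python scanning stands in for the rg/find/sed calls the agent would actually issue. It only shows that raw files can be queried immediately, with no embedding or index build step.

```python
import re
import tempfile
from pathlib import Path


def search_corpus(root, pattern, context=1):
    """Scan every readable text file under root for a regex (no index needed).

    Returns one dict per matching line, with a small window of
    surrounding lines, roughly what `rg -i -C 1 PATTERN` would show.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for i, line in enumerate(lines):
            if regex.search(line):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append({"file": path.name, "line": i + 1, "window": lines[lo:hi]})
    return hits


def demo():
    """Build a tiny throwaway corpus and search it (hypothetical data)."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "notes.txt").write_text(
            "alpha\nDirect corpus interaction\nomega\n", encoding="utf-8"
        )
        Path(root, "todo.md").write_text("buy milk\n", encoding="utf-8")
        return search_corpus(root, r"corpus interaction")
```

Because each call is just a scan over files, an agent can chain such searches freely: narrow by filename first, then by pattern, then read a window around each hit.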

Quick Start & Requirements

  • Primary install: bash setup.sh (Unix/macOS). Manual setup means installing uv, ripgrep, npm, and the Python dependencies, putting API keys in .env, and cloning pi-mono (https://github.com/jdf-prog/pi-mono.git).
  • Prerequisites: Unix/macOS, uv, ripgrep, npm, Python, and LLM API keys (OpenAI/Anthropic). The reported benchmark results were obtained with specific models (e.g., gpt-5.4-nano).
  • Links: assets/docs/setup.md, Hugging Face datasets.
  • TUI Command: uv run dci-agent-lite --terminal --provider openai --model gpt-5.4-nano --cwd "corpus/wiki_corpus" --extra-arg="--thinking high"
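Before running setup, it can help to confirm the command-line prerequisites listed above are on PATH. The sketch below is an assumption-laden helper, not part of the project: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the list covers only uv, ripgrep (binary name rg), and npm.

```python
import shutil

# Executable names for the prerequisites named in the list above
# (ripgrep installs its binary as "rg").
REQUIRED_TOOLS = ["uv", "rg", "npm"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All command-line prerequisites found.")
```

This checks presence only; it does not verify versions, Python dependencies, or that API keys are set in .env.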

Highlighted Details

  • Privacy: Retrieval runs locally on private corpora with no external search or indexing service; LLM calls still go to the configured provider (OpenAI/Anthropic).
  • Zero-index retrieval: Uses terminal commands directly on raw files, eliminating embeddings/index builds for immediate use.
  • Performance: Reports 62.9% accuracy on BrowseComp-Plus with gpt-5.4-nano, a result the project describes as outperforming other LLMs.
  • Broad benchmarks: Reports superior results across 13 agentic search, QA, and IR-ranking tasks.

Maintenance & Community

  • Core Contributors: Zhuofeng Li, Dongfu Jiang, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie.
  • Advisors: Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Yu Zhang.
  • No community channels or roadmaps linked in the README.

Licensing & Compatibility

  • License: Not specified in the README. Citation is to an arXiv preprint.
  • Compatibility: Designed for local Unix/macOS execution. Commercial use compatibility is undetermined due to the missing license.

Limitations & Caveats

  • Performance claims depend on specific, potentially experimental LLMs (e.g., gpt-5.4-nano).
  • Setup involves external dependencies and API key management.
  • The absence of a clear license is a significant adoption barrier.
  • The project's research focus suggests potential instability or rapid evolution.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 290 stars in the last 30 days
