DCI-Agent-Lite by DCI-Agent

Rethinking retrieval for agentic search via direct corpus interaction

Created 2 months ago

367 stars

Top 76.6% on SourcePulse

Summary

DCI-Agent-Lite introduces a Direct Corpus Interaction (DCI) paradigm for agentic search and deep research on personal knowledge bases. It targets users who need to query private data without an external retrieval service or a pre-built index. By treating the corpus as an open research environment, it aims for a simpler, high-resolution retrieval system with improved accuracy.

How It Works

DCI agents search raw files directly with terminal tools (rg, find, sed), bypassing semantic retrievers and index builds. This "zero-index retrieval" lets the agent start working immediately, gives it fine-grained control over what it reads, and lets it compose search primitives freely. The system is built on Pi with bash tools and lightweight context management, keeping complexity minimal while still supporting long-horizon research.
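
To make the zero-index idea concrete, here is a minimal sketch of the same pattern in plain Python: walk the raw files, match a regex line by line, then read a small window around a hit. It is not the project's implementation (the agent itself shells out to rg, find, and sed); the names Hit, grep_corpus, and read_window, and the "*.txt" default, are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    """One matching line (hypothetical type, for illustration only)."""
    path: Path
    line_no: int  # 1-based, like the line numbers printed by `rg -n`
    text: str


def grep_corpus(root: Path, pattern: str, glob: str = "*.txt") -> Iterator[Hit]:
    """Scan raw files under root for a regex, roughly like `rg -n -i pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Enumerating files recursively plays the role of the `find` step.
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield Hit(path, line_no, line.rstrip("\n"))


def read_window(path: Path, line_no: int, radius: int = 2) -> list[str]:
    """Return the lines around a hit, roughly like `sed -n 'A,Bp' file`."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(line_no - 1 - radius, 0)
    return lines[start : line_no + radius]
```

Nothing is built ahead of time here: every query rescans the files, which is the trade-off that zero-index retrieval accepts in exchange for immediate use and exact, line-level control over what gets read.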

Quick Start & Requirements

Primary install: bash setup.sh (Unix/macOS). Manual setup requires uv, ripgrep, npm, Python dependencies, API keys in a .env file, and a clone of pi-mono (https://github.com/jdf-prog/pi-mono.git).
Prerequisites: Unix/macOS, uv, ripgrep, npm, Python, and LLM API keys (OpenAI/Anthropic). Reproducing the reported benchmark performance requires specific models (e.g., gpt-5.4-nano).
Links: assets/docs/setup.md, Hugging Face datasets.
TUI Command: uv run dci-agent-lite --terminal --provider openai --model gpt-5.4-nano --cwd "corpus/wiki_corpus" --extra-arg="--thinking high"
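
Before a manual setup, it can help to confirm that the command-line prerequisites listed above are on PATH. The following is a small sketch of such a check, not part of the project; the helper name missing_tools is hypothetical, and it assumes ripgrep is installed under its usual binary name rg.

```python
import shutil

# Command-line prerequisites named above; `rg` is assumed to be the ripgrep binary.
REQUIRED_TOOLS = ("uv", "rg", "npm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All command-line prerequisites found.")
```

The check covers only the executables; Python dependencies and the API keys in .env still need to be set up as described in assets/docs/setup.md.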

Highlighted Details

Privacy: Searches private corpora locally without relying on an external retrieval service; LLM calls still go through the configured provider (OpenAI/Anthropic).
Zero-index retrieval: Uses terminal commands directly on raw files, eliminating embeddings/index builds for immediate use.
Performance: Reports 62.9% accuracy on BrowseComp-Plus with gpt-5.4-nano, which is stated to outperform other LLMs.
Broad benchmarks: Reports superior results across 13 tasks spanning agentic search, QA, and IR ranking.

Maintenance & Community

Core Contributors: Zhuofeng Li, Dongfu Jiang, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie.
Advisors: Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Yu Zhang.
No community channels or roadmaps linked in the README.

Licensing & Compatibility

License: Not specified in the README. Citation is to an arXiv preprint.
Compatibility: Designed for local Unix/macOS execution. Commercial use compatibility is undetermined due to the missing license.

Limitations & Caveats

Performance claims depend on specific, potentially experimental LLMs (e.g., gpt-5.4-nano).
Setup involves external dependencies and API key management.
The absence of a clear license is a significant adoption barrier.
The project's research focus suggests potential instability or rapid evolution.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 29 stars in the last 30 days
