deep-seek by dzhng

LLM retrieval engine for comprehensive entity collection from many sources

Created 1 year ago

505 stars

Top 61.7% on SourcePulse

2 Experts Love This Project

ckathleen

Managing Partner of Topology Ventures

transitive-bullshit

Founder of Agentic

Project Summary

DeepSeek is an experimental, LLM-powered retrieval engine that collects and enriches entities from hundreds of internet sources. Unlike typical "answer engines," which aim for a single correct response, it acts as a "retrieval engine": it outputs a detailed table of entities and their associated data, complete with confidence scores. This suits users who need exhaustive data aggregation rather than concise summaries.
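To make the "table of entities with confidence scores" concrete, here is a minimal sketch of what such a result could look like as a data structure. The type names, field names, the 0 to 1 confidence scale, and the sample rows are all assumptions for illustration; they are not taken from the project's code.

```typescript
// Hypothetical shapes for a retrieval result; names are illustrative only.
interface Cell {
  value: string;
  confidence: number; // assumed scale: 0 (a guess) to 1 (well supported)
  sources: string[]; // URLs the value was drawn from
}

interface EntityRow {
  name: string;
  cells: Record<string, Cell>; // keyed by column name
}

interface ResultTable {
  columns: string[];
  rows: EntityRow[];
}

// List "entity / column" labels for cells whose confidence is below a threshold,
// so a reader can review possible guesses or conflicts.
function lowConfidenceCells(table: ResultTable, threshold: number): string[] {
  const flagged: string[] = [];
  for (const row of table.rows) {
    for (const column of table.columns) {
      const cell = row.cells[column];
      if (cell !== undefined && cell.confidence < threshold) {
        flagged.push(`${row.name} / ${column}`);
      }
    }
  }
  return flagged;
}

// Made-up sample data, not real output from the tool.
const exampleTable: ResultTable = {
  columns: ["price"],
  rows: [
    {
      name: "Laptop A",
      cells: { price: { value: "$999", confidence: 0.9, sources: ["https://example.com/a"] } },
    },
    {
      name: "Laptop B",
      cells: { price: { value: "$1,200", confidence: 0.3, sources: [] } },
    },
  ],
};

console.log(lowConfidenceCells(exampleTable, 0.5));
```

Keeping the confidence and sources on each cell, rather than on the row, lets a consumer filter or highlight individual values without discarding the whole entity.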

How It Works

DeepSeek uses a multi-step "flow engineering" architecture with four phases. In the "Plan" phase, the LLM defines the entities to extract and the relevant data columns based on the user query. The "Search" phase uses both keyword and neural search via Exa to find relevant content. In the "Extract" phase, special delimiter tokens are inserted into the content so that the LLM can identify and extract specific entities and their associated data in a token-efficient way. Finally, the "Enrich" phase uses a smaller LLM to populate the defined columns for each entity and assigns confidence scores to the extracted data.
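The four phases can be sketched as a small pipeline. This is only an outline under stated assumptions: the interface and function names are hypothetical, each phase is passed in as a plain function so that no real Anthropic or Exa call is implied, and duplicates are merged by exact name, which is far simpler than real entity resolution.

```typescript
// All names here are hypothetical; the phases are injected rather than tied to any real API.
interface Plan {
  entityType: string;
  columns: string[];
}

interface Page {
  url: string;
  text: string;
}

interface EnrichedEntity {
  name: string;
  fields: Record<string, { value: string; confidence: number }>;
}

interface Phases {
  plan(query: string): Promise<Plan>;
  search(query: string): Promise<Page[]>;
  extract(plan: Plan, page: Page): Promise<string[]>;
  enrich(plan: Plan, entityName: string): Promise<EnrichedEntity>;
}

// Plan -> Search -> Extract -> Enrich, merging duplicate entity names by exact match.
async function runPipeline(query: string, phases: Phases): Promise<EnrichedEntity[]> {
  const plan = await phases.plan(query);
  const pages = await phases.search(query);
  const names = new Set<string>();
  for (const page of pages) {
    const found = await phases.extract(plan, page);
    for (const name of found) {
      names.add(name.trim());
    }
  }
  return Promise.all(Array.from(names).map((name) => phases.enrich(plan, name)));
}

// Stub phases with made-up data, standing in for the LLM and search calls.
const stubPhases: Phases = {
  plan: async () => ({ entityType: "laptop", columns: ["price"] }),
  search: async () => [
    { url: "https://example.com/a", text: "Laptop A and Laptop B" },
    { url: "https://example.com/b", text: "Laptop B again" },
  ],
  extract: async (_plan, page) =>
    ["Laptop A", "Laptop B"].filter((name) => page.text.includes(name)),
  enrich: async (_plan, entityName) => ({
    name: entityName,
    fields: { price: { value: "unknown", confidence: 0 } },
  }),
};

runPipeline("thin and light laptops", stubPhases).then((rows) => {
  console.log(rows.map((row) => row.name));
});
```

Injecting the phases keeps the control flow testable without API keys; a real run would replace the stubs with calls to the LLM and search providers.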

Quick Start & Requirements

Install dependencies with npm, yarn, pnpm, or bun.
Run npm run dev (or the equivalent for your package manager) to start the dev server.
Requires API keys for Anthropic and Exa, configured in a .env file.
Running the agent can take about 5 minutes and cost $0.10 to $3 in API credits.
Demo: https://deep-seek.vercel.app/
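Because a run costs real API credits, it can help to check that both keys are present before starting. The sketch below assumes the variable names ANTHROPIC_API_KEY and EXA_API_KEY; the summary above does not state the actual names, so confirm them against the project's README or example env file.

```typescript
// Assumed variable names; the project's real .env keys may differ.
const REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "EXA_API_KEY"];

// Return the names of required keys that are unset or blank in the given environment.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example with a made-up environment object; in Node you would pass process.env.
const exampleEnv = { ANTHROPIC_API_KEY: "placeholder-value", EXA_API_KEY: "" };
console.log(missingKeys(exampleEnv));
```

Taking the environment as a parameter keeps the check easy to test and avoids reading global state inside the function.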

Highlighted Details

Processes hundreds of sources to retrieve and enrich dozens of entities.
Generates confidence scores for extracted data, highlighting potential conflicts or guesses.
Utilizes Exa for both keyword and neural search capabilities.
Employs a token-efficient LLM extraction technique using special sentence delimiters.
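One plausible reading of the sentence-delimiter technique is sketched here: each sentence is prefixed with a short numbered marker, the LLM is asked to answer with marker ranges instead of quoting text, and the ranges are resolved back to the original sentences. The marker format `<sN>`, the function names, and the punctuation-based sentence splitter are assumptions, not the project's actual implementation.

```typescript
// Split text into sentences with a simple punctuation heuristic (an assumption;
// it will mis-split abbreviations and decimals).
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

// Prefix every sentence with a numbered marker such as <s0>, <s1>, ...
function tagSentences(sentences: string[]): string {
  return sentences.map((sentence, index) => `<s${index}>${sentence}`).join(" ");
}

// A range of sentence indices, inclusive on both ends, as the LLM might return it.
interface SentenceRange {
  start: number;
  end: number;
}

// Map a returned range back to the original text without the LLM re-emitting it.
function resolveRange(sentences: string[], range: SentenceRange): string {
  return sentences.slice(range.start, range.end + 1).join(" ");
}

// Made-up sample content.
const sampleSentences = splitSentences(
  "Laptop A is a thin notebook. It ships with 16 GB of memory! Laptop B is a workstation."
);
console.log(tagSentences(sampleSentences));
console.log(resolveRange(sampleSentences, { start: 0, end: 1 }));
```

The saving comes from the output side: a range like `{ start: 0, end: 1 }` costs a handful of tokens, while quoting the two sentences back would cost as many tokens as the passage itself.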

Maintenance & Community

Project lead can be contacted via email (david@aomni.com) or Twitter for collaboration and discussion.
Future work includes sorting/ranking, improved entity resolution, source verification, deep browsing, and streaming data.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

The project is experimental, and the provided demo does not run the agent because of the API cost.
Entity resolution for similar items (e.g., M2 vs. M3 MacBooks) needs improvement.
Source verification during enrichment is an area for enhancement.
Real-time streaming of results is not yet implemented.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

0 stars in the last 30 days

Explore Similar Projects

SAG by Zleap-AI

SQL-driven RAG engine for dynamic knowledge graph construction

Created 2 months ago

Updated 1 month ago

A-Guide-to-Retrieval-Augmented-LLM by Wang-Shuo

Intro to retrieval augmented LLMs

Created 2 years ago

Updated 2 years ago

ChatKBQA by LHRLAB

Research paper resources for knowledge base question answering

Created 2 years ago

Updated 3 months ago

RAGOnMedicalKG by liuhuanyong

RAG pipeline for medical Q&A, combining LLMs with a knowledge graph

Created 1 year ago

Updated 1 year ago

LLM4IR-Survey by RUC-NLPIR

Survey of LLMs for Information Retrieval

Created 2 years ago

Updated 1 month ago

Starred by Travis Addair (Cofounder of Predibase), Travis Fischer (Founder of Agentic), and 3 more.

evaporate by HazyResearch

Code and data for a research paper on using LLMs to generate structured views of data lakes

Created 2 years ago

Updated 1 year ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera).

stark by snap-stanford

LLM retrieval benchmark on textual/relational knowledge bases (NeurIPS 2024)

Created 1 year ago

Updated 1 week ago

Starred by Andrew Kane (Author of pgvector), Robert Stojnic (Cocreator of Papers with Code), and 2 more.

BLINK by facebookresearch

Entity Linker library using Wikipedia as the knowledge base

Created 6 years ago

Updated 2 years ago

graph-rag-agent by 1517005260

GraphRAG + DeepSearch for interpretable Q&A agents

Created 11 months ago

Updated 2 months ago

Starred by Elvis Saravia (Founder of DAIR.AI), Rotem Weiss (Cofounder of Tavily), and 1 more.

local-deep-researcher by langchain-ai

Local web research assistant using local LLMs

Created 1 year ago

Updated 5 months ago

Starred by Elvis Saravia (Founder of DAIR.AI).

rag-from-scratch by langchain-ai

RAG tutorial for expanding LLM knowledge via external data

Created 1 year ago

Updated 6 months ago

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Yiran Wu (Coauthor of AutoGen), and 2 more.

RAG_Techniques by NirDiamant

RAG techniques showcase for enhanced generation systems

Created 1 year ago

Updated 1 month ago
