LLM retrieval engine for comprehensive entity collection from many sources
DeepSeek is an experimental, LLM-powered retrieval engine designed to comprehensively collect and enrich entities from a vast number of internet sources. Unlike typical "answer engines" that aim for a single correct response, DeepSeek functions as a "retrieval engine," outputting a detailed table of entities and their associated data, complete with confidence scores. This makes it suitable for users needing exhaustive data aggregation rather than concise summaries.
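To make the "table of entities with confidence scores" concrete, here is a minimal TypeScript sketch of what such an output could look like. The type names (`Cell`, `EntityRow`), the 0 to 1 confidence range, the `filterByConfidence` helper, and the sample data are all assumptions for illustration; the project's actual schema may differ.

```typescript
// Hypothetical shape for one table cell: a value plus a confidence score.
// The 0..1 range is an assumption; the source only says scores exist.
interface Cell {
  value: string | null;
  confidence: number;
}

// One row per collected entity; columns are keyed by column name.
interface EntityRow {
  name: string;
  columns: Record<string, Cell>;
}

// Blank out any cell whose confidence falls below `min`,
// keeping the score so the reader can still see why it was dropped.
function filterByConfidence(rows: EntityRow[], min: number): EntityRow[] {
  return rows.map((row) => {
    const columns: Record<string, Cell> = {};
    for (const col of Object.keys(row.columns)) {
      const cell = row.columns[col];
      columns[col] =
        cell.confidence >= min
          ? cell
          : { value: null, confidence: cell.confidence };
    }
    return { name: row.name, columns };
  });
}

// Made-up sample data, purely illustrative.
const table: EntityRow[] = [
  {
    name: "Acme Robotics",
    columns: {
      founded: { value: "2015", confidence: 0.9 },
      headquarters: { value: "Berlin", confidence: 0.3 },
    },
  },
];

const trusted = filterByConfidence(table, 0.5);
console.log(JSON.stringify(trusted, null, 2));
```

Keeping the score on every cell, rather than on the row as a whole, lets a consumer decide per column how much uncertainty to tolerate.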
How It Works
DeepSeek employs a multi-step "flow engineering" architecture. It begins with a "Plan" phase, where the LLM defines the entities to extract and the relevant data columns based on the user query. The "Search" phase utilizes both keyword and neural search via Exa to find relevant content. In the "Extract" phase, a novel technique inserts special tokens into content, allowing the LLM to efficiently identify and extract specific entities and their associated data. Finally, the "Enrich" phase uses a smaller LLM to populate the defined columns for each entity, assigning confidence scores to the extracted data.
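The four phases can be sketched as a small TypeScript pipeline. This is a sketch under stated assumptions, not the project's implementation: the `<tN>` marker format, the idea that the model answers with marker ranges instead of copying text, and every name below (`insertMarkers`, `resolveRange`, `Phases`, `runPipeline`) are hypothetical. The phases are passed in as plain functions so the example runs without any LLM or Exa call.

```typescript
// Assumed marker scheme: prefix each sentence with <tN> so a model could
// point at spans by index rather than reproduce the text.
function insertMarkers(content: string): { marked: string; segments: string[] } {
  const segments = (content.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const marked = segments.map((s, i) => `<t${i}>${s}`).join(" ");
  return { marked, segments };
}

// Map an inclusive marker range back to the original text.
function resolveRange(segments: string[], start: number, end: number): string {
  return segments.slice(start, end + 1).join(" ");
}

interface Plan { entityType: string; columns: string[]; }
interface Hit { name: string; start: number; end: number; }
interface ScoredValue { value: string; confidence: number; }
interface ResultRow { name: string; columns: Record<string, ScoredValue>; }

// Hypothetical phase interface; real phases would call an LLM or search API.
interface Phases {
  plan(query: string): Plan;
  search(query: string): string[];
  extract(marked: string, plan: Plan): Hit[];
  enrich(name: string, evidence: string, plan: Plan): Record<string, ScoredValue>;
}

function runPipeline(query: string, phases: Phases): ResultRow[] {
  const plan = phases.plan(query); // Plan: entities and columns
  const rows: ResultRow[] = [];
  for (const page of phases.search(query)) { // Search: fetch content
    const { marked, segments } = insertMarkers(page);
    for (const hit of phases.extract(marked, plan)) { // Extract: marker ranges
      const evidence = resolveRange(segments, hit.start, hit.end);
      rows.push({ name: hit.name, columns: phases.enrich(hit.name, evidence, plan) }); // Enrich
    }
  }
  return rows;
}

// Deterministic stubs standing in for the LLM and search calls.
const stubPhases: Phases = {
  plan: () => ({ entityType: "company", columns: ["product"] }),
  search: () => ["Acme makes tools. Beta ships apps. Gamma builds robots."],
  extract: () => [{ name: "Beta", start: 1, end: 1 }],
  enrich: (_name, evidence) => ({ product: { value: evidence, confidence: 0.8 } }),
};

const demo = insertMarkers("Acme makes tools. Beta ships apps. Gamma builds robots.");
const results = runPipeline("companies and their products", stubPhases);
console.log(demo.marked);
console.log(JSON.stringify(results));
```

One plausible reason for a marker scheme like this is cost: the model emits a few short indices per entity instead of re-generating long passages, and the original text is recovered locally.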
Quick Start & Requirements
Run npm run dev (or equivalent) to start the dev server.
Requires a .env file.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats