introspect by defog-ai

Service for deep research on internal data

created 1 year ago
336 stars

Project Summary

Defog Introspect is an AI-powered research service designed for structured and unstructured data analysis, enabling users to query databases, CSVs, Excel files, and PDFs, augmented by web search. It targets data analysts and researchers seeking to derive insights from diverse data sources through natural language interaction.

How It Works

The system is built around an LLM agent with tool use. The orchestrating LLM routes work across three primary tools: text_to_sql for structured data, web_search for external context, and pdf_with_citations for document analysis. The agent keeps calling these tools, using each result to decide the next call, until it has gathered enough information to answer the user's question. Default models are o4-mini for text-to-SQL, gemini-2.0-flash for web search, and claude-3-7-sonnet for PDF analysis and overall orchestration.
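The tool-use loop described above can be sketched in Python. This is a minimal sketch, not Introspect's code: the three tool functions are hypothetical stubs that return placeholder strings, and `scripted_planner` is a hypothetical stand-in for the orchestrating LLM that requests each tool once and then answers. A real planner would pick the next tool based on the question and the observations gathered so far.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stubs for the three tools named in the README; real versions
# would call a text-to-SQL model, a search API, and a PDF reader.
def text_to_sql(question: str) -> str:
    return f"[sql result for: {question}]"


def web_search(question: str) -> str:
    return f"[search result for: {question}]"


def pdf_with_citations(question: str) -> str:
    return f"[pdf excerpt with citation for: {question}]"


TOOLS: dict[str, Callable[[str], str]] = {
    "text_to_sql": text_to_sql,
    "web_search": web_search,
    "pdf_with_citations": pdf_with_citations,
}


@dataclass
class Step:
    tool: Optional[str]  # None means "stop and answer now"
    argument: str        # tool input, or the final answer when tool is None


def scripted_planner(question: str, observations: list[str]) -> Step:
    """Hypothetical stand-in for the orchestrating LLM.

    Requests each tool once in a fixed order, then answers from what it saw.
    """
    order = ["text_to_sql", "web_search", "pdf_with_citations"]
    if len(observations) < len(order):
        return Step(order[len(observations)], question)
    return Step(None, "Answer based on: " + "; ".join(observations))


def run_agent(question: str, planner=scripted_planner, max_steps: int = 10) -> str:
    """Call tools until the planner decides to answer or the step cap is hit."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = planner(question, observations)
        if step.tool is None:
            return step.argument
        observations.append(TOOLS[step.tool](step.argument))
    return "Stopped: step limit reached without a final answer"
```

`max_steps` is an assumed safeguard: an agent that loops until it judges its information sufficient needs some cap in case the model never decides to answer. With the scripted planner, `run_agent("What were Q1 sales?")` returns a string starting with `Answer based on:` that contains one placeholder result from each tool.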

Quick Start & Requirements

  • Install via Docker Compose: docker compose up --build
  • Requires API keys for OpenAI, Anthropic, and Gemini, configured in a .env file.
  • Access the application at http://localhost:80.
  • Demo available at: https://demo.defog.ai/reports (user: admin, pass: admin).
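Before running `docker compose up --build`, it can help to confirm that the .env file actually contains all three API keys. The sketch below assumes the conventional names `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `GEMINI_API_KEY`; these names are hypothetical, so check the project's README or configuration for the exact names it expects. The parser only handles plain `KEY=VALUE` lines and `#` comments.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical key names; the project may expect different ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_keys(env)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

An empty value such as `GEMINI_API_KEY=` counts as missing, since a blank key would fail at request time anyway.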

Highlighted Details

  • Supports a wide range of databases (PostgreSQL, MySQL, BigQuery, Snowflake, etc.) and file formats (CSV, Excel).
  • Integrates PDF analysis with citation support.
  • Web search capability provides external context for queries.
  • Modular design with separate backend (Python) and frontend (JavaScript/TypeScript) components.
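The README lists CSV and Excel files alongside databases but does not say how files are queried. One plausible approach, and it is only an assumption here, is to load a file into a SQL engine so the same text-to-SQL path can serve both. The sketch uses Python's built-in `sqlite3` and `csv` modules; `load_csv_into_sqlite` is a hypothetical helper, and the SQL string stands in for what a text-to-SQL model might generate for "total sales by region".

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create `table` from CSV text, storing every column as TEXT for simplicity."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Quote identifiers; assumes header names contain no double quotes.
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "sales", "region,amount\nnorth,10\nsouth,5\nnorth,7\n")

# A query a text-to-SQL model might plausibly produce for "total sales by region".
rows = conn.execute(
    "SELECT region, SUM(CAST(amount AS REAL)) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 17.0), ('south', 5.0)]
```

Storing everything as TEXT keeps the loader simple but pushes type handling into the generated SQL (hence the `CAST`); a fuller loader would infer column types before creating the table.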

Maintenance & Community

  • Maintained by Defog.ai.
  • Future plans include user-selectable models and documentation for custom tools and data source integrations.

Licensing & Compatibility

  • License details are not explicitly stated in the README.

Limitations & Caveats

  • Documentation is marked as "Coming soon".
  • Users cannot currently select specific LLM models for different tasks via configuration.
  • Integration with cloud storage services like Google Drive and OneDrive for unstructured data is not yet implemented.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 23 stars in the last 90 days