flock by dais-polymtl

DuckDB extension for multimodal querying

Created 1 year ago
267 stars

Top 95.9% on SourcePulse

View on GitHub
Project Summary

FlockMTL is a DuckDB extension that enables multimodal querying by integrating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) directly into OLAP systems. It lets users perform semantic analysis tasks, such as text generation, classification, summarization, and embedding generation, through declarative SQL queries. It targets data analysts and researchers who need to combine structured data analytics with AI capabilities.

How It Works

FlockMTL extends DuckDB with custom SQL functions that interface with various LLM providers (OpenAI, Azure, Ollama). It supports end-to-end RAG pipelines and provides Map- and Reduce-style functions, so complex semantic tasks can be orchestrated and executed directly within the database while leveraging DuckDB's efficient OLAP engine for combined data and semantic analysis.
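
As a rough sketch of what this looks like in practice, a Map-style function applies the model once per row and composes with ordinary SQL. The function name llm_complete and the struct-style arguments below are assumptions based on the pattern in FlockMTL's documentation; exact signatures vary between releases, so treat this as a sketch rather than the definitive API:

    -- Map-style: one LLM call per row, composed with ordinary SQL.
    -- llm_complete and its argument shapes are assumptions; check the current docs.
    SELECT
        product_id,
        llm_complete(
            {'model_name': 'gpt-4o-mini'},   -- assumed provider model
            {'prompt': 'Classify the sentiment of this review as positive, negative, or neutral.'},
            {'review': review_text}          -- column bound into the prompt
        ) AS sentiment
    FROM reviews;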

Quick Start & Requirements

  • Install DuckDB version 1.1.1 or later.
  • Install FlockMTL via DuckDB's community catalog: INSTALL flockmtl FROM community;
  • Load the extension: LOAD flockmtl;
  • Requires credentials/API keys for OpenAI, Azure, or Ollama (a combined setup sketch follows this list).
  • Supported OS: Linux, macOS, Windows.
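
Put together, the steps above amount to the following session. The INSTALL and LOAD statements are as listed above; the credential step is provider-specific and left to the FlockMTL documentation:

    -- Run inside a DuckDB 1.1.1+ session.
    INSTALL flockmtl FROM community;   -- fetch the extension from the community catalog
    LOAD flockmtl;                     -- register FlockMTL's SQL functions
    -- Credentials for OpenAI, Azure, or Ollama must be configured before any
    -- LLM-backed function is called; the exact mechanism is provider-specific.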

Highlighted Details

  • Declarative SQL interface for LLM and RAG tasks.
  • Supports OpenAI, Azure, and Ollama providers.
  • Enables end-to-end RAG pipelines within DuckDB.
  • Includes Map and Reduce functions for combining semantic tasks and analytics (see the aggregate sketch after this list).
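
While the Map-style sketch above works row by row, a Reduce-style aggregate folds a whole group of rows into a single model call. As before, llm_reduce and its argument shapes are assumptions based on the project's documented style and may differ by version:

    -- Reduce-style: summarize all reviews of each product in one LLM call.
    -- llm_reduce and its argument shapes are assumptions; verify against the docs.
    SELECT
        product_id,
        llm_reduce(
            {'model_name': 'gpt-4o-mini'},
            {'prompt': 'Summarize these reviews in one sentence.'},
            {'review': review_text}
        ) AS review_summary
    FROM reviews
    GROUP BY product_id;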

Maintenance & Community

The project is under active development by the Data & AI Systems Laboratory (DAIS Lab) at Polytechnique Montréal. Bugs and feature requests can be reported through the repository's issue tracker, and contribution guidelines are available for code contributions.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The project is presented as a research artifact; while it supports multiple LLM providers, performance and specific capabilities depend on the chosen provider and model. Details on RAG pipeline configuration and advanced tuning are left to the project's documentation.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 28
  • Issues (30d): 5
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Nir Gazit (cofounder of Traceloop), and 4 more.

llmware by llmware-ai

0.6%
14k stars
Framework for enterprise RAG pipelines using small, specialized models
Created 2 years ago
Updated 1 month ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anton Troynikov (cofounder of Chroma), and 44 more.

llama_index by run-llama

0.3%
44k stars
Data framework for building LLM-powered agents
Created 2 years ago
Updated 19 hours ago