flock by dais-polymtl

DuckDB extension for multimodal querying

Created 1 year ago
267 stars

Top 95.9% on SourcePulse

View on GitHub
Project Summary

FlockMTL is a DuckDB extension that enables multimodal querying by integrating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) directly into OLAP systems. It lets users perform semantic analysis tasks, such as text generation, classification, summarization, and embedding generation, through declarative SQL queries. It targets data analysts and researchers who need to combine structured data analytics with AI capabilities.

How It Works

FlockMTL extends DuckDB with custom SQL functions that interface with various LLM providers (OpenAI, Azure, Ollama). It supports end-to-end RAG pipelines and provides Map- and Reduce-style functions, so complex semantic tasks can be orchestrated and executed directly within the database while leveraging DuckDB's efficient OLAP engine for combined data and semantic analysis.
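
As a rough sketch of what this looks like in practice, a Map-style function applies the model once per row and composes with ordinary SQL. The function name llm_complete and the struct-style arguments below are assumptions based on the pattern in FlockMTL's documentation; exact signatures vary between releases, so treat this as a sketch rather than the definitive API:

    -- Map-style: one LLM call per row, composed with ordinary SQL.
    -- llm_complete and its argument shapes are assumptions; check the current docs.
    SELECT
        product_id,
        llm_complete(
            {'model_name': 'gpt-4o-mini'},   -- assumed provider model
            {'prompt': 'Classify the sentiment of this review as positive, negative, or neutral.'},
            {'review': review_text}          -- column bound into the prompt
        ) AS sentiment
    FROM reviews;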

Quick Start & Requirements

  • Install DuckDB version 1.1.1 or later.
  • Install FlockMTL via DuckDB's community catalog: INSTALL flockmtl FROM community;
  • Load the extension: LOAD flockmtl;
  • Requires credentials/API keys for OpenAI, Azure, or Ollama (a combined setup sketch follows this list).
  • Supported OS: Linux, macOS, Windows.
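
Put together, the steps above amount to the following session. The INSTALL and LOAD statements are as listed above; the credential step is provider-specific and left to the FlockMTL documentation:

    -- Run inside a DuckDB 1.1.1+ session.
    INSTALL flockmtl FROM community;   -- fetch the extension from the community catalog
    LOAD flockmtl;                     -- register FlockMTL's SQL functions
    -- Credentials for OpenAI, Azure, or Ollama must be configured before any
    -- LLM-backed function is called; the exact mechanism is provider-specific.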

Highlighted Details

  • Declarative SQL interface for LLM and RAG tasks.
  • Supports OpenAI, Azure, and Ollama providers.
  • Enables end-to-end RAG pipelines within DuckDB.
  • Includes Map and Reduce functions for combining semantic tasks and analytics (see the aggregate sketch after this list).
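
While the Map-style sketch above works row by row, a Reduce-style aggregate folds a whole group of rows into a single model call. As before, llm_reduce and its argument shapes are assumptions based on the project's documented style and may differ by version:

    -- Reduce-style: summarize all reviews of each product in one LLM call.
    -- llm_reduce and its argument shapes are assumptions; verify against the docs.
    SELECT
        product_id,
        llm_reduce(
            {'model_name': 'gpt-4o-mini'},
            {'prompt': 'Summarize these reviews in one sentence.'},
            {'review': review_text}
        ) AS review_summary
    FROM reviews
    GROUP BY product_id;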

Maintenance & Community

The project is under active development by the Data & AI Systems Laboratory (DAIS Lab) at Polytechnique Montréal. Bugs and feature requests can be reported through the repository's issue tracker, and contribution guidelines are available for code contributions.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The project is presented as a research artifact; while it supports multiple LLM providers, performance and specific capabilities depend on the chosen provider and model. Details on RAG pipeline configuration and advanced tuning are left to the project's documentation.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 28
  • Issues (30d): 5
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Nir Gazit (cofounder of Traceloop), and 4 more.

llmware by llmware-ai

0.6%
14k stars
Framework for enterprise RAG pipelines using small, specialized models
Created 2 years ago
Updated 1 month ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anton Troynikov (cofounder of Chroma), and 44 more.

llama_index by run-llama

0.3%
44k stars
Data framework for building LLM-powered agents
Created 2 years ago
Updated 19 hours ago