duckle by SouravRoy-ETL

Visual ETL/ELT studio with local AI pipeline generation

Created 3 weeks ago

447 stars

Top 66.4% on SourcePulse

Project Summary

Duckle is a local-first ETL/ELT studio for individual engineers and small teams. Its visual drag-and-drop pipeline designer compiles pipelines to SQL and executes them on DuckDB. It ships as a small desktop application that needs no server, keeps its workspace in a git-friendly folder, and includes an on-device AI assistant, so users can build and manage data pipelines without cloud dependencies.

How It Works

Duckle is a visual pipeline designer in which users drag, wire, and configure data sources, transformations, and sinks. The core engine compiles these visual workflows into SQL queries that run natively on DuckDB. A key feature is "Duckie," an on-device AI assistant that generates pipeline JSON from natural language prompts and runs entirely offline via llama.cpp. Because every pipeline reduces to SQL executed on the local machine, the result is compact, auditable, and local-first.
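
The compile step can be pictured with a minimal Python sketch. Duckle's actual pipeline schema and compiler are not shown here, so the node types (`source`, `filter`, `aggregate`), the field names, and the one-CTE-per-node strategy are all assumptions made for illustration; only `read_csv_auto` is a real DuckDB function. The sketch also assumes nodes are listed in dependency order and only builds the SQL string, without executing it.

```python
import json

# Hypothetical pipeline description. Duckle's real JSON schema may differ.
PIPELINE_JSON = """
{
  "nodes": [
    {"id": "orders", "type": "source", "path": "orders.csv"},
    {"id": "paid", "type": "filter", "input": "orders",
     "where": "status = 'paid'"},
    {"id": "by_country", "type": "aggregate", "input": "paid",
     "group_by": ["country"], "metrics": {"revenue": "SUM(amount)"}}
  ]
}
"""


def compile_node(node: dict) -> str:
    """Translate one (hypothetical) node into a SELECT statement."""
    kind = node["type"]
    if kind == "source":
        # read_csv_auto is DuckDB's CSV reader with schema inference.
        return f"SELECT * FROM read_csv_auto('{node['path']}')"
    if kind == "filter":
        return f"SELECT * FROM {node['input']} WHERE {node['where']}"
    if kind == "aggregate":
        keys = ", ".join(node["group_by"])
        metrics = ", ".join(
            f"{expr} AS {name}" for name, expr in node["metrics"].items()
        )
        return f"SELECT {keys}, {metrics} FROM {node['input']} GROUP BY {keys}"
    raise ValueError(f"unknown node type: {kind}")


def compile_pipeline(pipeline: dict) -> str:
    """Emit one CTE per node, then select from the last node."""
    nodes = pipeline["nodes"]  # assumed to be in dependency order
    ctes = [f"{n['id']} AS ({compile_node(n)})" for n in nodes]
    return "WITH " + ",\n     ".join(ctes) + f"\nSELECT * FROM {nodes[-1]['id']}"


sql = compile_pipeline(json.loads(PIPELINE_JSON))
print(sql)
```

Giving each node its own named CTE keeps the generated SQL readable node by node, which matches the idea of showing the SQL behind each step of a visual pipeline.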

Quick Start & Requirements

Download the pre-compiled binary for Windows, macOS, or Linux from the latest releases. On first launch, the application guides users through installing the DuckDB engine (~30 MB plus extensions) and, optionally, the Duckie AI Assistant (~1.1 GB). Linux users need WebKitGTK 4.1. DuckDB setup takes approximately 30 seconds, while the AI assistant download typically takes 5 to 10 minutes. Further details are available in the Download / Install and Quickstart (60 s) sections.

Highlighted Details

  • Extensive Connectors: Ships with over 290 connectors for files (CSV, Parquet, JSON), databases (PostgreSQL, MySQL, Snowflake, BigQuery), cloud storage (S3, GCS, Azure Blob), streaming platforms (Kafka, NATS), SaaS APIs, NoSQL, and vector databases.
  • Local AI Assistant: "Duckie" uses Qwen 2.5 Coder 1.5B running via llama.cpp for offline, on-CPU pipeline generation from natural language.
  • Git-Friendly Workspaces: Pipelines, connections, and configurations are stored as plain JSON/Markdown files in a user-defined folder, facilitating standard Git version control workflows.
  • DuckDB Execution: Pipelines are compiled to SQL and executed by DuckDB, providing native performance, live previews, and generated SQL visibility for each node.
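
To illustrate why plain JSON files suit Git workflows, here is a small Python sketch of deterministic serialization. The `pipelines/` subfolder, the `save_pipeline` helper, and the pipeline fields are hypothetical and not taken from Duckle; the point is only that sorted keys, fixed indentation, and a trailing newline produce stable, line-oriented diffs.

```python
import json
import tempfile
from pathlib import Path


def save_pipeline(workspace: Path, name: str, pipeline: dict) -> Path:
    """Write a pipeline as stable, diff-friendly JSON (hypothetical layout)."""
    path = workspace / "pipelines" / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and fixed indentation make the output independent of
    # the order in which the dictionary was built.
    text = json.dumps(pipeline, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    # The same pipeline, built with two different key orders.
    first = save_pipeline(
        workspace, "daily_revenue", {"name": "daily_revenue", "nodes": []}
    ).read_text(encoding="utf-8")
    second = save_pipeline(
        workspace, "daily_revenue", {"nodes": [], "name": "daily_revenue"}
    ).read_text(encoding="utf-8")

print(first == second)
```

With this kind of serialization, saving an unchanged pipeline leaves the file byte-identical, so `git diff` shows only real edits.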

Maintenance & Community

The project is in public beta and under active development. Its public roadmap (docs/roadmap.md) lists planned features such as multi-shard streaming and embedded Python/Rust stages. Contributions are welcome, with guidelines in CONTRIBUTING.md. Releases are published through GitHub releases.

Licensing & Compatibility

Duckle is dual-licensed under MIT OR Apache-2.0, which permits commercial use, forking, and integration into closed-source projects. The application has no usage limits and no telemetry.

Limitations & Caveats

Duckle is designed as a single-machine, embedded studio and is not intended to replace distributed data warehouses. Because the project is in public beta, its APIs may change before version 1.0. A CLI run mode for headless execution is planned for version 1.0.

Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 32
  • Star History: 447 stars in the last 25 days
