chdb-io
In-process OLAP SQL engine for Python data analysis
chDB provides an in-process OLAP SQL engine that uses ClickHouse to run high-performance analytical queries directly within Python applications. It targets data scientists, engineers, and researchers who need to query diverse data formats without the overhead of setting up and managing a separate ClickHouse instance. The main benefits are simpler data analysis workflows, less data copying, and direct integration with the Python data ecosystem.
How It Works
chDB embeds ClickHouse's core OLAP engine, so SQL queries run directly inside the Python process. It minimizes data transfer overhead between C++ and Python by using Python's memoryview. The engine supports many input and output formats, including Parquet, CSV, JSON, Arrow, and ORC, and complies with the Python DB API 2.0. Advanced features include User-Defined Functions (UDFs) and efficient streaming query processing for large datasets. It also offers AI-assisted SQL generation, which translates natural language prompts into executable SQL queries.
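To make the memoryview point concrete, here is a minimal sketch using only the Python standard library. It is not chDB's internal code: the bytearray simply stands in for a result buffer owned by native code, and all variable names are illustrative. It shows that a memoryview shares memory with its buffer, while bytes() makes an independent copy.

```python
# Illustration only: plain stdlib memoryview, not chDB internals.
# `buf` stands in for a result buffer that native (C++) code would own.
buf = bytearray(b"id,name\n1,alice\n2,bob\n")

view = memoryview(buf)   # wraps the buffer; no bytes are copied
header = view[:7]        # slicing a memoryview is also copy-free

# A same-length write to the buffer is visible through the view,
# because both refer to the same memory.
buf[0:2] = b"ID"
shared = bytes(header)   # snapshot of the shared bytes at this moment

# bytes(buf) creates an independent copy that later writes do not affect.
copied = bytes(buf)
buf[0:2] = b"id"

print(shared)            # bytes seen through the view after the first write
print(copied[:7])        # the copy keeps the old contents
print(bytes(header))     # the view follows the second write
```

The same idea is why handing a query result to Python as a view can be cheaper than serializing and copying it, though the exact mechanism inside chDB is not described here.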
Quick Start & Requirements
Install: pip install chdb
Usage: run from the command line (python3 -m chdb "SQL" [OutputFormat]) or programmatically using the chdb Python API (chdb.connect(), chdb.query()).
Resources: the repository includes example and test code (examples/, tests/), and project documentation is linked via the README.
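The two usage paths can be sketched as follows. This is a minimal sketch, not official documentation: cli_command and run_query are hypothetical helper names introduced here, and the sketch assumes chdb.query() accepts a SQL string plus an output format name and that its result can be converted with str(). The import is deferred so the snippet still runs where chdb is not installed.

```python
import importlib.util


def cli_command(sql, output_format="CSV"):
    """Build the argv for the command-line form: python3 -m chdb "SQL" [OutputFormat].

    Hypothetical helper for illustration; pass the list to subprocess.run().
    """
    return ["python3", "-m", "chdb", sql, output_format]


def run_query(sql, output_format="CSV"):
    """Run a query in-process through the chdb Python API.

    Assumes chdb.query(sql, output_format) returns a printable result object.
    """
    import chdb  # imported lazily so this module loads without chdb installed

    return str(chdb.query(sql, output_format))


if importlib.util.find_spec("chdb") is not None:
    # In-process query: no server to start, the engine runs inside Python.
    print(run_query("SELECT 1 AS x, 'hello' AS greeting"), end="")
else:
    print("chdb is not installed; run `pip install chdb` first")
```

Building the command as an argument list rather than a shell string avoids quoting problems when the SQL itself contains quotes.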
Maintenance & Community
The project maintains an active community presence via Discord (https://discord.gg/D2Daa2fM5K) and Twitter (@chdb). Contributions are welcomed for testing, documentation, and code improvements. Bindings for other languages are also encouraged.
Licensing & Compatibility
chDB is released under the Apache 2.0 license. This permissive license allows for commercial use, modification, and distribution, including integration within closed-source applications.
Limitations & Caveats
Currently, chDB officially supports Python 3.8+ on macOS and Linux; Windows support is not explicitly mentioned. User-Defined Functions (UDFs) must be stateless. Streaming queries require careful resource management (explicit close() or with statement) to prevent blocking subsequent operations. AI-assisted SQL generation requires proper configuration of AI providers and API keys.
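The resource-management caveat for streaming queries can be sketched with a stand-in object. FakeStream below is hypothetical and is not part of chDB; it only mimics a result that is iterated in chunks and must be closed. The sketch assumes the real streaming result exposes close(), as the caveat above states, and uses contextlib.closing to guarantee the call even when processing fails partway through.

```python
from contextlib import closing


class FakeStream:
    """Hypothetical stand-in for a streaming query result (NOT a chDB class)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self._chunks

    def close(self):
        # A real stream would release the engine resources here.
        self.closed = True


def total_rows(stream):
    """Consume a stream chunk by chunk and always close it afterwards."""
    with closing(stream) as s:  # close() runs on normal exit and on exceptions
        return sum(len(chunk) for chunk in s)


stream = FakeStream([[1, 2, 3], [4, 5]])
count = total_rows(stream)

# Even if processing raises midway, the stream is still closed.
failing = FakeStream([[1], [2]])
try:
    with closing(failing) as s:
        for chunk in s:
            raise RuntimeError("processing failed")
except RuntimeError:
    pass

print(count, stream.closed, failing.closed)
```

If the streaming result object already supports the with statement directly, closing() is unnecessary; the point is only that the release step should not depend on the happy path.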