slothdb by SouravRoy-ETL

Embedded SQL database for direct file querying

Created 1 month ago
805 stars

Top 43.4% on SourcePulse


Summary

SlothDB is an embedded, file-first analytical SQL database built for high performance in environments ranging from local development to server deployments and web browsers. It queries data directly in CSV, Parquet, JSON, Avro, Excel, Arrow, and SQLite files, with no separate import step or server process, which the project presents as a significant speed advantage for analytical workloads on a single machine.
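
To illustrate what "file-first" means, here is a minimal sketch in plain Python. It does not use SlothDB, and the function name scan_csv and the sample columns are hypothetical. The query is answered by streaming the CSV in place and applying the filter and projection while reading, rather than first loading rows into a table.

```python
from __future__ import annotations

import csv
import io
from typing import Callable, Iterator, TextIO


def scan_csv(
    source: TextIO,
    columns: list[str],
    predicate: Callable[[dict[str, str]], bool],
) -> Iterator[tuple[str, ...]]:
    # Hypothetical file-first scan: rows are streamed straight from the
    # file, filtered (WHERE) and projected (SELECT list) as they are read,
    # so no intermediate table is ever materialized.
    for row in csv.DictReader(source):
        if predicate(row):
            yield tuple(row[c] for c in columns)


# In-memory stand-in for a CSV file on disk (illustrative data only).
weather = io.StringIO("city,temp\nOslo,4\nCairo,31\nLima,19\n")

# Roughly equivalent to: SELECT city FROM weather WHERE temp > 15
result = list(scan_csv(weather, ["city"], lambda r: int(r["temp"]) > 15))
print(result)  # [('Cairo',), ('Lima',)]
```

A real engine would also parse types, push predicates into the decoder, and parallelize the scan; the sketch only shows that no import step is needed.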

How It Works

Built from scratch in C++20, SlothDB is a vectorized, columnar engine optimized for analytics. Its file-first architecture lets SQL queries run directly against local or remote files. Key features include "live views" that stay current as files grow, and an .ask sub-REPL that translates natural language to SQL using a fast local rules parser or an optional Qwen2.5-Coder LLM, an approach intended to preserve data privacy. A compact WebAssembly (WASM) build (about 1.3 MB, under 1 MB for the edge variant) targets edge computing and browser environments.
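
The vectorized, columnar execution model can be sketched in plain Python. This is not SlothDB's C++ code, and the function and column names are hypothetical. Each column is stored as a contiguous array, and the operator works on one fixed-size batch of values at a time: it first evaluates the filter to build a selection vector, then aggregates only the selected positions.

```python
from __future__ import annotations

from array import array

# Kept tiny for illustration; vectorized engines use much larger batches.
BATCH_SIZE = 4


def filtered_sum(price: array, qty: array, min_qty: int) -> int:
    # Computes SUM(price * qty) WHERE qty >= min_qty over columnar data,
    # processing one fixed-size batch ("vector") of each column at a time.
    total = 0
    for start in range(0, len(price), BATCH_SIZE):
        end = start + BATCH_SIZE
        p_batch = price[start:end]
        q_batch = qty[start:end]
        # Step 1: evaluate the filter for the whole batch -> selection vector.
        selection = [i for i, q in enumerate(q_batch) if q >= min_qty]
        # Step 2: compute the expression only at the selected positions.
        total += sum(p_batch[i] * q_batch[i] for i in selection)
    return total


# Each column is stored contiguously, as in a columnar layout.
price = array("q", [10, 20, 30, 40, 50, 60])
qty = array("q", [1, 5, 2, 8, 0, 3])

print(filtered_sum(price, qty, min_qty=2))  # 660
```

Working a batch at a time amortizes per-row interpretation overhead. In a compiled engine it also keeps the data cache-resident and amenable to SIMD, which pure Python cannot show.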

Quick Start & Requirements

Install with pip install slothdb (Python) or npm install @slothdb/wasm (Node.js), or download the CLI binary. Python 3.8+ is recommended. A live playground is available at https://slothdb.org/playground/, and documentation is in docs/DOCUMENTATION.md. To run the demo: python -c "import slothdb; slothdb.demo()".

Highlighted Details

  • Up to 5x faster than DuckDB on reported benchmarks (JOINs, Avro/CSV decoding).
  • Integrated natural-language querying (.ask) with a local rules parser and optional LLM fallback (29 languages).
  • "Live views" offer incremental updates for growing files.
  • Small WASM bundle (~1.3 MB; under 1 MB for edge builds) for resource-constrained environments.
  • Built-in support for CSV, Parquet, JSON, Avro, Excel, Arrow, SQLite.
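
The "live views" idea can be illustrated with a minimal incremental-refresh sketch in plain Python. It assumes an append-only, single-column CSV and remembers the byte offset it has already consumed, so each refresh reads only newly appended rows. The class name LiveSum and the file layout are hypothetical, and SlothDB's actual mechanism may differ.

```python
from __future__ import annotations

import os
import tempfile


class LiveSum:
    # Hypothetical "live view": keeps SUM(amount) over an append-only CSV
    # current by reading only bytes appended since the last refresh.

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0          # byte position already consumed
        self.header_seen = False
        self.total = 0

    def refresh(self) -> int:
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Consume only complete lines; a partial trailing line is left
        # for the next refresh because the writer may still be appending.
        end = chunk.rfind(b"\n") + 1
        for line in chunk[:end].splitlines():
            if not self.header_seen:
                self.header_seen = True  # first line is the CSV header
                continue
            if line:
                self.total += int(line)
        self.offset += end
        return self.total


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sales.csv")
    with open(path, "w") as f:
        f.write("amount\n10\n20\n")
    view = LiveSum(path)
    first = view.refresh()    # full initial scan
    with open(path, "a") as f:
        f.write("5\n7")       # "7" has no trailing newline yet
    second = view.refresh()   # reads only the appended bytes
    with open(path, "a") as f:
        f.write("\n")
    third = view.refresh()

print(first, second, third)  # 30 35 42
```

Refresh cost is proportional to the appended data rather than the whole file. This works for aggregates like SUM that can be updated incrementally.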

Maintenance & Community

Active Discord community (discord.gg/XJWyGmX5G). Robust CI, comprehensive tests, and active maintainer involvement via GitHub issues and Discord.

Licensing & Compatibility

MIT license permits unrestricted use, modification, and distribution, including commercial, closed-source applications.

Limitations & Caveats

Single-node embedded engine with no distributed execution. It lacks multi-writer transactions (MVCC) and is not optimized for OLTP. There are no secondary indexes, so execution is scan-based. Window function coverage is partial, and S3 access is limited to anonymous access to public buckets. As a young codebase, it may still have SQL edge cases.

Health Check

Last Commit: 5 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 3
Star History: 853 stars in the last 30 days
