sqlite-lembed by asg017: SQLite extension for GGUF text embeddings
Top 99.6% on SourcePulse
Summary
sqlite-lembed is an SQLite extension for generating text embeddings using GGUF models via llama.cpp. It targets developers and researchers aiming to integrate AI-powered semantic capabilities directly into SQLite applications, offering efficient, local embedding generation and semantic search without external services.
How It Works
This extension integrates llama.cpp into SQLite, enabling it to load and execute text embedding models in the GGUF format. Users register GGUF models by inserting their path into the temp.lembed_models virtual table. The lembed() SQL function then computes embeddings for input text using these registered models. This approach facilitates local AI-driven text analysis, with embeddings generated in a BLOB format compatible with the sqlite-vec extension for subsequent vector search.
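To make the BLOB output more concrete, here is a minimal Python sketch of decoding an embedding and comparing two vectors. It assumes the BLOB is a packed array of little-endian 32-bit floats, which matches sqlite-vec's compact float32 layout on common platforms; that layout is an assumption here, not something the summary above states. The helper names `blob_to_floats` and `cosine_similarity` are hypothetical and not part of sqlite-lembed.

```python
import math
import struct


def blob_to_floats(blob: bytes) -> list:
    """Decode a packed float32 BLOB (assumed layout) into Python floats."""
    if len(blob) % 4 != 0:
        raise ValueError("BLOB length is not a multiple of 4 bytes")
    count = len(blob) // 4
    # "<" = little-endian, "f" = 32-bit float; the byte order is an assumption.
    return list(struct.unpack(f"<{count}f", blob))


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

In practice sqlite-vec would perform the similarity search inside SQLite; decoding in Python like this is mainly useful for inspecting or debugging individual embeddings.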
Quick Start & Requirements
Load the extension using .load ./lembed0. GGUF format embedding models are required; a sample all-MiniLM-L6-v2.e4ce9877.q8_0.gguf is available for download. Register models via SQL: INSERT INTO temp.lembed_models(name, model) select 'model-name', lembed_model_from_file('path/to/model.gguf');. Use lembed('model-name', 'your text') to generate embeddings. Pre-converted GGUF models for nomic-embed-text-v1.5 and mxbai-embed-large-v1 are also provided.
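The same quick-start steps can be sketched from Python's standard `sqlite3` module. This is an illustration under stated assumptions rather than the project's documented Python usage: the helper names (`load_lembed`, `register_model`, `embed`, `demo`) are hypothetical, the extension path and model file are taken from the text above and must exist locally, and the Python build must allow loading SQLite extensions.

```python
import sqlite3

EXTENSION_PATH = "./lembed0"  # path from the quick start; adjust as needed
MODEL_NAME = "all-MiniLM-L6-v2"  # arbitrary label chosen for this sketch
MODEL_PATH = "all-MiniLM-L6-v2.e4ce9877.q8_0.gguf"  # sample model named above


def load_lembed(conn, extension_path=EXTENSION_PATH):
    """Load the extension; requires a Python build with extension loading."""
    conn.enable_load_extension(True)
    conn.load_extension(extension_path)
    conn.enable_load_extension(False)


def register_model(conn, name, model_path):
    """Register a GGUF model under `name` for the current connection."""
    conn.execute(
        "INSERT INTO temp.lembed_models(name, model) "
        "SELECT ?, lembed_model_from_file(?)",
        (name, model_path),
    )


def embed(conn, name, text):
    """Return the embedding BLOB produced by lembed() for `text`."""
    row = conn.execute("SELECT lembed(?, ?)", (name, text)).fetchone()
    return row[0]


def demo():
    """End-to-end run; needs the extension binary and the model file."""
    conn = sqlite3.connect(":memory:")
    load_lembed(conn)
    register_model(conn, MODEL_NAME, MODEL_PATH)
    blob = embed(conn, MODEL_NAME, "your text")
    print(len(blob), "bytes")
```

Calling `demo()` runs the full sequence once the extension and model file are in place. Because `temp.lembed_models` lives in the temp schema, the registration is expected to last only for the current connection, so it would need to be repeated after reconnecting.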
Highlighted Details
Integrates with sqlite-vec for vector storage and similarity search.
Embeddings returned by lembed() use a BLOB format natively understood by sqlite-vec, simplifying data management.
Maintenance & Community
The README marks the project as "A work-in-progress!". Issue #2 requests batch processing support. The README provides no specific community channels or detailed contributor information.
Licensing & Compatibility
The README does not specify a software license, so suitability for commercial use or closed-source linking cannot be determined without further clarification.
Limitations & Caveats
The extension does not support batch embedding generation, so each text input is processed individually. Pre-compiled versions lack GPU acceleration, which can make embedding generation slow. Users who need faster performance should compile the extension from source.
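Since every input is embedded on its own, filling in embeddings for many rows amounts to one lembed() call per row. The sketch below shows one way to do that from Python with periodic commits, so progress survives an interruption. The `documents(id, body, embedding)` table and the `embed_pending_rows` helper are hypothetical, and the connection is assumed to already have the extension loaded and a model registered.

```python
import sqlite3


def embed_pending_rows(conn, model_name, commit_every=100):
    """Embed rows with a NULL embedding, one lembed() call per row.

    Assumes a hypothetical table documents(id, body, embedding).
    Returns the number of rows processed.
    """
    pending = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM documents WHERE embedding IS NULL ORDER BY id"
        )
    ]
    for done, doc_id in enumerate(pending, start=1):
        conn.execute(
            "UPDATE documents SET embedding = lembed(?, body) WHERE id = ?",
            (model_name, doc_id),
        )
        # Commit periodically so a long CPU-only run can be resumed.
        if done % commit_every == 0:
            conn.commit()
    conn.commit()
    return len(pending)
```

Re-running the function skips rows that already have an embedding, which suits slow runs on builds without GPU acceleration.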
1 year ago
Inactive