Universal vector dataset tooling
This library provides a universal interface for vector datasets, supporting export, import, and re-embedding across vector databases and RAG platforms. It targets developers and researchers working with large-scale vector data, offering a standardized format (VDF) that abstracts away database-specific details and simplifies data migration and model experimentation.
How It Works
The core of vector-io is the Universal Vector Dataset Format (VDF), a standardized structure comprising a VDF_META.json file and associated Parquet files. This format decouples data from specific vector databases, allowing for agnostic operations. The library provides CLI tools (export_vdf, import_vdf, reembed_vdf) that leverage this format to translate data between different vector stores and to re-generate embeddings using specified models.
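To make the layout described above concrete, here is a minimal sketch that inspects a VDF-style directory using only the Python standard library. It is not part of vector-io's API. The helper names (describe_vdf_dir, make_demo_dir), the directory layout, and the metadata values are assumptions made for illustration; only the VDF_META.json filename and the model_name, dimensions, and metric fields come from the description on this page. A real export may nest files differently and carry more metadata.

```python
import json
import tempfile
from pathlib import Path


def describe_vdf_dir(vdf_dir: Path) -> dict:
    """Load VDF_META.json and list the Parquet files found under vdf_dir."""
    meta_path = vdf_dir / "VDF_META.json"
    with meta_path.open(encoding="utf-8") as fh:
        meta = json.load(fh)
    # Collect Parquet files recursively, as paths relative to the dataset root.
    parquet_files = sorted(
        p.relative_to(vdf_dir).as_posix() for p in vdf_dir.rglob("*.parquet")
    )
    return {"meta": meta, "parquet_files": parquet_files}


def make_demo_dir(root: Path) -> Path:
    """Create a tiny stand-in dataset directory (hypothetical layout and values)."""
    vdf_dir = root / "demo_vdf"
    (vdf_dir / "my_index").mkdir(parents=True)
    # Assumed metadata shape: a flat object with the three fields named on this page.
    meta = {"model_name": "example-model", "dimensions": 384, "metric": "cosine"}
    (vdf_dir / "VDF_META.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    # Empty placeholder file; it is not valid Parquet and is never parsed here.
    (vdf_dir / "my_index" / "1.parquet").write_bytes(b"")
    return vdf_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        summary = describe_vdf_dir(make_demo_dir(Path(tmp)))
        print(summary["meta"]["model_name"], summary["parquet_files"])
```

Because the metadata sits in a plain JSON file next to the Parquet data, a dataset can be inspected like this without connecting to any vector database. Reading the vectors themselves would require a Parquet reader such as pyarrow or pandas.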
Quick Start & Requirements
pip install vdf-io
Highlighted Details
reembed_vdf utility to change embedding models without altering the vector store.
model_name, dimensions, and metric fields for comprehensive dataset description.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats