Lightweight library for unstructured data ETL into embeddings
Radient is a Python library designed for efficient, multi-modal data vectorization and the construction of vector-centric ETL pipelines. It targets developers and researchers working with unstructured data like audio, images, graphs, and text, enabling similarity search, RAG, and regression analysis by converting diverse data types into embeddings.
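To make the similarity-search use case concrete, here is a minimal sketch of ranking items by cosine similarity once they have been converted to embeddings. The vectors are hand-written toy values rather than output from Radient or any model, and `cosine_similarity` and `nearest` are hypothetical helper names introduced only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus):
    """Return the key of the corpus vector most similar to the query."""
    return max(corpus, key=lambda key: cosine_similarity(query, corpus[key]))

# Toy "embeddings" (assumed values, not produced by a real vectorizer).
corpus = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "bass line audio": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

best = nearest(query, corpus)
print(best)
```

In practice the vectors would come from a vectorizer, and a vector database would typically replace the linear scan.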
How It Works
Radient abstracts the complexity of various vectorization libraries (e.g., Sentence Transformers, TorchAudio, RDKit) into a unified interface. It supports multiple data modalities through dedicated vectorizer classes (e.g., text_vectorizer, image_vectorizer). For performance, it offers an accelerate function that optimizes vectorizers on the fly, reportedly reducing vectorization time by over 40% on CPU. The library also provides a Workflow object to build data pipelines, chaining data sources, transformations, vectorizers, and sinks into directed graphs.
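The pipeline idea can be sketched in plain Python as follows. This is not Radient's API: `ToyWorkflow`, `toy_text_vectorizer`, and the four step functions are hypothetical names, the vectorizer is a simple token-hashing stand-in for a real model, and the graph is restricted to a linear chain (the simplest directed graph) for brevity.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, List

def toy_text_vectorizer(text: str, dim: int = 8) -> List[float]:
    """Hash each token into one of `dim` buckets (stand-in for a real model)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec

@dataclass
class ToyWorkflow:
    """Hypothetical linear pipeline: each step consumes the previous step's output."""
    steps: List[Callable[[Any], Any]] = field(default_factory=list)

    def add(self, step: Callable[[Any], Any]) -> "ToyWorkflow":
        self.steps.append(step)
        return self  # returning self allows chained .add() calls

    def run(self, data: Any = None) -> Any:
        for step in self.steps:
            data = step(data)
        return data

store = []  # in-memory stand-in for a vector database sink

def source(_):
    # Data source: emits raw documents.
    return ["  Hello world ", "Vector search"]

def transform(docs):
    # Transformation: trims surrounding whitespace.
    return [doc.strip() for doc in docs]

def vectorize(docs):
    # Vectorizer: pairs each document with its embedding.
    return [(doc, toy_text_vectorizer(doc)) for doc in docs]

def sink(records):
    # Sink: persists records and reports how many were written.
    store.extend(records)
    return len(records)

workflow = ToyWorkflow().add(source).add(transform).add(vectorize).add(sink)
written = workflow.run()
print(written, store[0][0])
```

A real workflow would swap the toy vectorizer for a model-backed one and the list for a database client; the chaining structure is the part this sketch is meant to show.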
Quick Start & Requirements
pip install radient
sentence-transformers upon first use.
Highlighted Details
accelerate function for on-the-fly vectorizer optimization.
Workflow object.
Maintenance & Community
The project is maintained by fzliu. No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. This requires further investigation for commercial use or closed-source linking.
Limitations & Caveats
Radient explicitly states it will not provide LLM connectors or focus on building context-aware systems for RAG, recommending projects like Haystack and LlamaIndex for those use cases. Future features such as sparse, binary, and multi-vector support, as well as broader Hugging Face model integration, are listed as "Coming soon™".