Radient by fzliu

Lightweight library for unstructured data ETL into embeddings

Created 1 year ago

281 stars

Top 92.8% on SourcePulse

Project Summary

Radient is a Python library for efficient, multi-modal data vectorization and for building vector-centric ETL pipelines. It targets developers and researchers who work with unstructured data such as audio, images, graphs, and text, converting these diverse data types into embeddings for similarity search, retrieval-augmented generation (RAG), and regression analysis.
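To make the similarity-search use case concrete, here is a minimal sketch of top-k retrieval by cosine similarity over precomputed embeddings. It uses only the Python standard library; the toy three-dimensional vectors and the helper names cosine_similarity and top_k are illustrative assumptions, not part of Radient, which would supply real embeddings from its vectorizers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-d "embeddings" (hypothetical); real ones would come from a vectorizer.
corpus = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "stock chart": [0.0, 0.2, 0.9],
}

# Ranks "cat photo" first, then "dog photo".
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

A brute-force scan like this is fine for small collections; larger ones usually move the vectors into a dedicated vector index.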

How It Works

Radient abstracts the complexity of various vectorization libraries (e.g., Sentence Transformers, TorchAudio, RDKit) into a unified interface. It supports multiple data modalities through dedicated vectorizer classes (e.g., text_vectorizer, image_vectorizer). For performance, it offers an accelerate function that optimizes vectorizers on the fly, reportedly reducing vectorization time by over 40% on CPU. The library also provides a Workflow object for building data pipelines that chain data sources, transformations, vectorizers, and sinks into directed graphs.
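The idea of chaining sources, transformations, vectorizers, and sinks into a directed graph can be illustrated with a small, hypothetical stand-in. MiniWorkflow below is not Radient's Workflow class and does not reproduce its API; it only shows the general pattern of running named steps in dependency (topological) order and passing each step's output downstream. The toy vectorizer is a placeholder for a real embedding model.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List, Optional


class MiniWorkflow:
    """Hypothetical toy pipeline, NOT Radient's Workflow class.

    Each node is a callable; it receives the outputs of its upstream
    nodes (in the order they were declared) as positional arguments.
    """

    def __init__(self) -> None:
        self._funcs: Dict[str, Callable[..., Any]] = {}
        self._deps: Dict[str, List[str]] = {}

    def add(self, name: str, func: Callable[..., Any],
            after: Optional[List[str]] = None) -> "MiniWorkflow":
        if name in self._funcs:
            raise ValueError(f"duplicate node: {name}")
        self._funcs[name] = func
        self._deps[name] = list(after or [])
        return self  # returning self allows fluent chaining

    def run(self) -> Dict[str, Any]:
        """Execute every node once, in dependency order, and return all outputs."""
        for name, deps in self._deps.items():
            missing = [d for d in deps if d not in self._funcs]
            if missing:
                raise ValueError(f"{name} depends on unknown node(s): {missing}")
        results: Dict[str, Any] = {}
        # static_order() yields each node after all of its predecessors
        # (and raises graphlib.CycleError if the graph contains a cycle).
        for name in TopologicalSorter(self._deps).static_order():
            inputs = [results[dep] for dep in self._deps[name]]
            results[name] = self._funcs[name](*inputs)
        return results


def source() -> List[str]:
    return ["Hello world", "Radient makes vectors"]


def lowercase(docs: List[str]) -> List[str]:
    return [d.lower() for d in docs]


def toy_vectorize(docs: List[str]) -> List[List[float]]:
    # Placeholder "embedding": [character count, vowel count].
    return [[float(len(d)), float(sum(c in "aeiou" for c in d))] for d in docs]


store: List[List[float]] = []


def sink(vectors: List[List[float]]) -> int:
    store.extend(vectors)  # stand-in for writing to a vector database
    return len(vectors)


wf = (MiniWorkflow()
      .add("source", source)
      .add("transform", lowercase, after=["source"])
      .add("vectorize", toy_vectorize, after=["transform"])
      .add("sink", sink, after=["vectorize"]))
results = wf.run()
print(results["sink"])  # 2
```

Running nodes in topological order guarantees a step never runs before its inputs exist; a production pipeline engine would typically add batching, streaming, and error handling on top of this.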

Quick Start & Requirements

Primary install: pip install radient
Prerequisites: a Python environment. Radient can automatically install missing dependencies, such as sentence-transformers, on first use.
Links: Official Documentation (implied by repo structure)
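Radient's README says missing dependencies can be installed automatically on first use, but the mechanism is not described here. The sketch below shows one common way to implement such lazy loading with importlib and pip; the function name lazy_import and its auto_install flag are hypothetical, and installation is off by default so nothing is installed without an explicit opt-in.

```python
import importlib
import importlib.util
import subprocess
import sys
from types import ModuleType
from typing import Optional


def lazy_import(module_name: str,
                pip_name: Optional[str] = None,
                auto_install: bool = False) -> ModuleType:
    """Import a top-level module, optionally pip-installing it first if absent.

    Hypothetical helper: `pip_name` covers packages whose install name differs
    from the import name (e.g. pip "sentence-transformers" vs
    import "sentence_transformers").
    """
    if importlib.util.find_spec(module_name) is None:
        package = pip_name or module_name
        if not auto_install:
            raise ImportError(
                f"{module_name} is missing; install it with: pip install {package}"
            )
        # Install into the interpreter that is currently running.
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        importlib.invalidate_caches()  # let the import system see the new package
    return importlib.import_module(module_name)


# Standard-library modules are always found, so no install is attempted.
json_module = lazy_import("json")
print(json_module.dumps({"ok": True}))
```

Keeping auto_install opt-in is a deliberate design choice in this sketch: silently installing packages can surprise users in locked-down or reproducible environments.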

Highlighted Details

Supports vectorization for text, audio, graphs, images, and molecules.
Offers an accelerate function for on-the-fly vectorizer optimization.
Enables building complex ETL pipelines using a Workflow object.
Integrates with various underlying libraries including Sentence Transformers, TorchAudio, and RDKit.

Maintenance & Community

The project is maintained by fzliu. No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license, so the licensing terms should be verified before commercial use or closed-source linking.

Limitations & Caveats

Radient explicitly states that it will not provide LLM connectors or focus on building context-aware systems for RAG, recommending projects such as Haystack and LlamaIndex for those use cases. Future features, including sparse, binary, and multi-vector support and broader Hugging Face model integration, are listed as "Coming soon™".

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

0 stars in the last 30 days