radient by fzliu

Lightweight library for unstructured data ETL into embeddings

created 1 year ago
278 stars

Top 94.3% on sourcepulse

Project Summary

Radient is a Python library designed for efficient, multi-modal data vectorization and the construction of vector-centric ETL pipelines. It targets developers and researchers working with unstructured data like audio, images, graphs, and text, enabling similarity search, RAG, and regression analysis by converting diverse data types into embeddings.

How It Works

Radient abstracts the complexity of various vectorization libraries (e.g., Sentence Transformers, TorchAudio, RDKit) into a unified interface. It supports multiple data modalities through dedicated vectorizer classes (e.g., text_vectorizer, image_vectorizer). For performance, it offers an accelerate function that optimizes vectorizers on-the-fly, reportedly reducing vectorization time by over 40% on CPU. The library also provides a Workflow object to build data pipelines, chaining data sources, transformations, vectorizers, and sinks into directed graphs.
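The two ideas in this section, a uniform vectorize call and a pipeline that chains sources, transformations, vectorizers, and sinks, can be illustrated with a small stand-alone sketch. Nothing below is Radient's API: ToyTextVectorizer, ToyWorkflow, and the add and run methods are hypothetical names invented for illustration, the "embedding" is only a hash-derived list of floats, and the pipeline is a simple linear chain, which is the simplest case of the directed graphs described above.

```python
import hashlib
from typing import Any, Callable, List, Tuple


class ToyTextVectorizer:
    """Hypothetical stand-in for a modality-specific vectorizer (not Radient's class)."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self.dim = dim

    def vectorize(self, text: str) -> List[float]:
        # Deterministic placeholder: the first `dim` digest bytes scaled to [0, 1].
        # A real vectorizer would call an embedding model here.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dim]]


class ToyWorkflow:
    """Hypothetical linear pipeline: each step receives the previous step's output."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def add(self, step: Callable[[Any], Any], name: str) -> "ToyWorkflow":
        self._steps.append((name, step))
        return self  # returning self allows chained .add(...) calls

    def run(self, data: Any = None) -> Any:
        for _name, step in self._steps:
            data = step(data)
        return data


vectorizer = ToyTextVectorizer(dim=8)
sink: List[List[float]] = []  # stands in for a vector database


def store(vectors: List[List[float]]) -> int:
    """Sink step: keep the vectors and report how many were stored."""
    sink.extend(vectors)
    return len(vectors)


workflow = (
    ToyWorkflow()
    .add(lambda _: ["  Hello, world! ", "Vector search"], name="source")
    .add(lambda docs: [d.strip().lower() for d in docs], name="transform")
    .add(lambda docs: [vectorizer.vectorize(d) for d in docs], name="vectorize")
    .add(store, name="sink")
)
stored_count = workflow.run()
print(stored_count, len(sink[0]))
```

Keeping every step a plain callable means a vectorizer for another modality could be swapped in without touching the source or sink steps, which is the kind of decoupling a unified interface is meant to provide.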

Quick Start & Requirements

  • Primary install: pip install radient
  • Prerequisites: a working Python environment. The library can automatically install missing dependencies, such as sentence-transformers, on first use.
  • Links: Official Documentation (implied by repo structure)
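The summary does not describe how Radient installs missing dependencies on first use. As a generic illustration of the idea only, the sketch below checks whether a module can be imported and, if not, builds the pip command that would install it. The function name missing_install_command and the package names in the usage lines are hypothetical, and the command is deliberately not executed.

```python
import importlib.util
import sys
from typing import List, Optional


def missing_install_command(
    module_name: str, pip_name: Optional[str] = None
) -> Optional[List[str]]:
    """Return a pip command for a missing top-level module, or None if it is importable.

    Hypothetical helper for illustration; it only builds the command and never runs it.
    """
    if importlib.util.find_spec(module_name) is not None:
        return None  # already installed, nothing to do
    # Use the current interpreter so the package would land in the active environment.
    return [sys.executable, "-m", "pip", "install", pip_name or module_name]


# The standard-library json module is always present, so no command is needed.
print(missing_install_command("json"))

# A made-up module name that should not exist, with a made-up distribution name.
print(missing_install_command("hypothetical_missing_pkg_xyz", "hypothetical-missing-pkg"))
```

A caller that wanted the install-on-first-use behaviour could pass the returned list to subprocess.check_call and then retry the import.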

Highlighted Details

  • Supports vectorization for text, audio, graphs, images, and molecules.
  • Offers an accelerate function for on-the-fly vectorizer optimization.
  • Enables building complex ETL pipelines using a Workflow object.
  • Integrates with various underlying libraries including Sentence Transformers, TorchAudio, and RDKit.

Maintenance & Community

The project is maintained by fzliu. No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Confirm the licensing terms before commercial use or closed-source linking.

Limitations & Caveats

Radient explicitly states that it will not provide LLM connectors or focus on building context-aware systems for RAG, and it recommends projects such as Haystack and LlamaIndex for those use cases. Planned features, including sparse, binary, and multi-vector support and broader Hugging Face model integration, are listed as "Coming soon™".

Health Check

  • Last commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

towhee by towhee-io

0.2%
3k
Framework for neural data processing pipelines
created 4 years ago
updated 9 months ago