lotus by lotus-data

Query engine for LLM-powered data processing using semantic operators

created 1 year ago
1,251 stars

Top 32.2% on sourcepulse

Project Summary

LOTUS is a semantic query engine designed for efficient LLM-powered data processing, targeting data scientists and engineers who need to build complex reasoning pipelines over structured and unstructured data. It offers a declarative, Pandas-like API with semantic operators that leverage natural language expressions for data transformations, simplifying the creation of AI-driven analytics.

How It Works

LOTUS implements a semantic operator model, extending traditional relational operators with natural language predicates. This approach allows users to define data operations (like joins, filters, and aggregations) using high-level, human-readable expressions. The engine then optimizes and executes these operations using various AI-based algorithms, abstracting away the underlying LLM complexities and enabling flexible, composable AI pipelines.
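
To make the operator model concrete, here is a toy sketch in plain Python of what a semantic filter does conceptually: a natural-language template with {column} placeholders is filled in for each row and handed to a model that answers yes or no. This is not the LOTUS implementation or API. The names toy_sem_filter and keyword_model are hypothetical, and the keyword-matching stub stands in for a real LLM call so that the example runs offline.

```python
from typing import Callable

Row = dict[str, str]


def toy_sem_filter(
    rows: list[Row], langex: str, model: Callable[[str], bool]
) -> list[Row]:
    """Keep the rows for which the model answers True to the filled-in template."""
    kept = []
    for row in rows:
        # Substitute the {column} placeholders with this row's values.
        prompt = langex.format(**row)
        if model(prompt):
            kept.append(row)
    return kept


def keyword_model(prompt: str) -> bool:
    # Hypothetical stand-in for an LLM: answers "yes" when the prompt
    # mentions one of a few hard-coded keywords.
    return any(word in prompt.lower() for word in ("probability", "algebra"))


courses = [
    {"course": "Probability and Random Processes"},
    {"course": "Cooking"},
    {"course": "Linear Algebra"},
]

math_courses = toy_sem_filter(
    courses, "{course} requires a lot of math", keyword_model
)
print([row["course"] for row in math_courses])
```

In a real engine the model call is the expensive step, which is why the predicate is kept declarative: the engine, not the user, decides how and when to invoke the model.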

Quick Start & Requirements

  • Installation: pip install lotus-ai (stable) or pip install git+https://github.com/lotus-data/lotus.git@main (latest).
  • Prerequisites: Python 3.10; Conda is recommended. Mac users need a specific Faiss installation (CPU or GPU). An LLM backend (e.g., OpenAI, Ollama, vLLM) must be configured via lotus.settings.configure(lm=lm), with API keys where the provider requires them.
  • Resources: LLM API usage costs apply.
  • Docs/Demo: Colab tutorial, Documentation, Examples.

Highlighted Details

  • Supports a wide range of LLMs via LiteLLM, and uses SentenceTransformers models for retrieval and reranking.
  • Offers semantic operators like sem_join, sem_filter, sem_map, sem_extract, sem_agg, sem_topk, sem_sim_join, and sem_search.
  • Integrates seamlessly with Pandas DataFrames.
  • Leverages natural language expressions for defining complex data operations.
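
Join-style operators take a template that references columns from both tables. The toy sketch below (plain Python, not the LOTUS API) evaluates such a template over every row pair and counts the model calls, illustrating why a naive semantic join costs one call per pair and why an engine has room to optimize. The names toy_sem_join and CountingModel are hypothetical, and the lookup-table "model" is a stand-in for an LLM.

```python
from itertools import product


class CountingModel:
    """Hypothetical stand-in for an LLM: answers from a fixed set of true statements."""

    def __init__(self, true_statements):
        self.true_statements = set(true_statements)
        self.calls = 0

    def __call__(self, prompt: str) -> bool:
        self.calls += 1  # track how many "LLM calls" the join makes
        return prompt in self.true_statements


def toy_sem_join(left, right, langex, model):
    """Naive nested-loop join: one model call per (left, right) row pair."""
    joined = []
    for l_row, r_row in product(left, right):
        merged = {**l_row, **r_row}
        if model(langex.format(**merged)):
            joined.append(merged)
    return joined


courses = [{"course": "Linear Algebra"}, {"course": "Cooking"}]
skills = [{"skill": "Math"}, {"skill": "Baking"}]

model = CountingModel(
    {
        "Taking Linear Algebra will help me learn Math",
        "Taking Cooking will help me learn Baking",
    }
)

pairs = toy_sem_join(
    courses, skills, "Taking {course} will help me learn {skill}", model
)
print(pairs)
print("model calls:", model.calls)
```

With two rows on each side the naive join already makes four model calls; the count grows with the product of the table sizes.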

Maintenance & Community

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 6
  • Issues (30d): 7
  • Star history: 89 stars in the last 90 days
