lotus by lotus-data

Query engine for LLM-powered data processing using semantic operators

Created 1 year ago
1,295 stars

Top 30.8% on SourcePulse

Project Summary

LOTUS is a semantic query engine designed for efficient LLM-powered data processing, targeting data scientists and engineers who need to build complex reasoning pipelines over structured and unstructured data. It offers a declarative, Pandas-like API with semantic operators that leverage natural language expressions for data transformations, simplifying the creation of AI-driven analytics.

How It Works

LOTUS implements a semantic operator model that extends traditional relational operators with natural language predicates. Users define data operations such as joins, filters, and aggregations with high-level, human-readable expressions. The engine then optimizes and executes these operations using a range of AI-based algorithms, hiding the underlying LLM calls and enabling flexible, composable AI pipelines.
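To make the operator model concrete, here is a minimal, stdlib-only sketch of a semantic filter and a naive nested-loop semantic join. It is not LOTUS's implementation. `render`, `sem_filter`, `sem_join`, `CountingOracle`, `toy_oracle`, and `KNOWLEDGE` are hypothetical names, and the keyword-matching `toy_oracle` stands in for a real LLM call so the example runs offline. LOTUS's own operators are DataFrame methods that take a similar `{Column}`-templated natural language expression; the sketch uses plain lists of dicts instead.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]
# An "oracle" stands in for the LLM: it takes a fully rendered
# natural-language claim and returns True or False.
Oracle = Callable[[str], bool]


def render(langex: str, row: Row) -> str:
    """Replace each {Column} placeholder with that column's value in the row."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(row[m.group(1)]), langex)


def sem_filter(rows: List[Row], langex: str, oracle: Oracle) -> List[Row]:
    """Keep the rows whose rendered claim the oracle judges true (one call per row)."""
    return [row for row in rows if oracle(render(langex, row))]


def sem_join(left: List[Row], right: List[Row], langex: str, oracle: Oracle) -> List[Row]:
    """Naive nested-loop semantic join: one oracle call per (left, right) pair."""
    out: List[Row] = []
    for lrow in left:
        for rrow in right:
            merged = {**lrow, **rrow}
            if oracle(render(langex, merged)):
                out.append(merged)
    return out


class CountingOracle:
    """Wraps an oracle and counts its calls, a rough proxy for LLM cost."""

    def __init__(self, fn: Oracle) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, claim: str) -> bool:
        self.calls += 1
        return self.fn(claim)


# Hypothetical toy "model knowledge" so the example runs without an LLM.
KNOWLEDGE = {
    "Math": ("Riemannian Geometry", "Linear Algebra"),
    "Computer Science": ("Operating Systems", "Compilers"),
}


def toy_oracle(claim: str) -> bool:
    """True if the claim names a skill together with a course known to teach it."""
    return any(
        skill in claim and any(course in claim for course in courses)
        for skill, courses in KNOWLEDGE.items()
    )


courses = [{"Course Name": n} for n in (
    "History of the Atlantic World", "Riemannian Geometry",
    "Operating Systems", "Food Science", "Compilers",
)]
skills = [{"Skill": s} for s in ("Math", "Computer Science")]

oracle = CountingOracle(toy_oracle)
math_heavy = sem_filter(courses, "{Course Name} is useful for learning Math", oracle)
pairs = sem_join(courses, skills, "Taking {Course Name} will help me learn {Skill}", oracle)
print([r["Course Name"] for r in math_heavy])  # ['Riemannian Geometry']
print(len(pairs), oracle.calls)  # 3 15
```

Five filter calls plus ten join calls show why a naive semantic join scales with the product of the table sizes, which is the kind of cost an optimizing engine aims to reduce.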

Quick Start & Requirements

  • Installation: pip install lotus-ai (stable) or pip install git+https://github.com/lotus-data/lotus.git@main (latest).
  • Prerequisites: Python 3.10; Conda recommended. Mac users need a platform-specific Faiss installation (CPU or GPU build). Requires access to an LLM, either through a provider API key (e.g., OpenAI) or a locally served model (e.g., Ollama, vLLM), configured via lotus.settings.configure(lm=lm).
  • Resources: LLM API usage costs apply.
  • Docs/Demo: Colab tutorial, Documentation, Examples.

Highlighted Details

  • Supports a wide range of LLMs via LiteLLM, plus SentenceTransformers models for retrieval and reranking.
  • Offers semantic operators like sem_join, sem_filter, sem_map, sem_extract, sem_agg, sem_topk, sem_sim_join, and sem_search.
  • Operates directly on Pandas DataFrames, exposing the semantic operators as DataFrame methods.
  • Leverages natural language expressions for defining complex data operations.
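Retrieval-style operators such as sem_sim_join and sem_search rank rows by embedding similarity rather than asking an LLM about every pair, which is where the SentenceTransformers support comes in. The stdlib-only sketch below illustrates that idea under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for an embedding model, the search is a linear scan rather than a vector index such as Faiss, and none of these function names or signatures are LOTUS's API.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: lowercase bag-of-words counts.
    A real system would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sem_search(docs: List[str], query: str, k: int) -> List[Tuple[str, float]]:
    """Rank docs by similarity to the query and keep the top k (no LLM calls)."""
    q = embed(query)
    scored = [(doc, cosine(q, embed(doc))) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def sem_sim_join(left: List[str], right: List[str], k: int) -> List[Tuple[str, str, float]]:
    """Pair each left item with its k most similar right items."""
    return [(item, match, score) for item in left for match, score in sem_search(right, item, k)]


docs = [
    "intro to operating systems",
    "compilers and operating systems design",
    "food science basics",
    "history of the atlantic world",
]
for doc, score in sem_search(docs, "operating systems", k=2):
    print(f"{score:.3f}  {doc}")
# 0.707  intro to operating systems
# 0.632  compilers and operating systems design
```

Similarity ranking costs no LLM calls, but it only measures closeness in the embedding space, so predicates that need actual reasoning are better served by LLM-backed operators such as sem_join.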

Maintenance & Community

Health Check
Last Commit: 1 day ago
Responsiveness: 1 day
Pull Requests (30d): 8
Issues (30d): 6
Star History: 39 stars in the last 30 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Michael Chiang (Cofounder of Ollama), and 2 more.

enrichmcp by featureform

0.3%
611 stars
ORM for AI agents
Created 5 months ago
Updated 1 week ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Joe Walnes (Head of Experimental Projects at Stripe), and 1 more.

KAG by OpenSPG

0.4%
8k stars
Logical reasoning framework for domain knowledge bases
Created 1 year ago
Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Nir Gazit (Cofounder of Traceloop), and 4 more.

llmware by llmware-ai

0.6%
14k stars
Framework for enterprise RAG pipelines using small, specialized models
Created 2 years ago
Updated 1 month ago