Easy-RAG by yuntianhe2014

RAG system for learning, usage, and extension

Created 1 year ago

523 stars

Top 60.2% on SourcePulse

Project Summary

Easy-RAG is an open-source Retrieval-Augmented Generation (RAG) system designed for learning, everyday use, and extension. It targets users who want to build and customize RAG pipelines, and provides knowledge base management, multi-turn chat, and AI search over the internet.

How It Works

The system creates, updates, and deletes knowledge bases built from several file formats (txt, csv, pdf, docx, mp3, mp4, wav, Excel). Documents are vectorized and stored in Chroma, FAISS, or Elasticsearch. Chat supports multi-turn conversations with LLMs and RAG-based Q&A with several retrieval methods, including reranking with the BGE-reranker-large model. AI web search is integrated through SearxNG, which lets the LLM draw on internet search results. Audio and video files are transcribed to text with funasr.
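The retrieve-then-rerank flow described above can be sketched in a few lines. This is a minimal illustration, not Easy-RAG's code: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store such as Chroma or FAISS, and `overlap_score` is a hypothetical placeholder for a cross-encoder such as BGE-reranker-large.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a term-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=3):
    """Stage 1: return the k documents closest to the query in embedding space."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query, candidates, score_fn, top_n=1):
    """Stage 2: rescore each (query, candidate) pair and keep the top_n."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]


def overlap_score(query, doc):
    """Hypothetical reranker stand-in: fraction of query terms that appear in doc."""
    q_terms = set(embed(query))
    return len(q_terms & set(embed(doc))) / len(q_terms) if q_terms else 0.0


docs = [
    "Chroma is a vector database for embeddings.",
    "FAISS is a library for similarity search over dense vectors.",
    "SearxNG is a metasearch engine.",
]
query = "vector database for embeddings"
candidates = retrieve(query, docs, k=2)          # cheap, broad first pass
best = rerank(query, candidates, overlap_score)  # slower, more precise second pass
print(best)
```

The two-stage split is the point of the design: the first pass narrows a large corpus cheaply, and the reranker only has to score the handful of candidates that survive.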

Quick Start & Requirements

Install: pip3 install -r requirements.txt
Prerequisites: Ollama (for LLMs and embeddings), SearxNG (for web search), Python 3.8+ (tested with 3.10.9), BGE-reranker-large model download.
Setup: Pull the Ollama models (ollama run qwen2:7b, ollama run mofanke/acge_text_embedding:latest), set up SearxNG (Docker recommended), and configure the vector database and reranker paths in Config/config.py.
Run: python webui.py
Docs: https://github.com/yuntianhe2014/Easy-RAG
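Before starting the web UI, it can be useful to confirm that Ollama is serving the chat model. The sketch below is not part of Easy-RAG; it builds a request for Ollama's `/api/generate` endpoint using only the standard library. The host `http://localhost:11434` is Ollama's default address and is an assumption about the local setup, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_HOST = "http://localhost:11434"


def build_generate_request(prompt, model="qwen2:7b", host=OLLAMA_HOST):
    """Build (but do not send) a non-streaming POST request to /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text. Needs a running Ollama server."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs), timeout=60) as resp:
        return json.loads(resp.read())["response"]


req = build_generate_request("Say hello in one word.")
print(req.full_url)
```

With Ollama running and the model pulled, `generate("Say hello in one word.")` should return a short reply; a connection error usually means the server is not up.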

Highlighted Details

Supports multiple vector databases: Chroma, FAISS, Elasticsearch (with plans for Milvus, MongoDB).
Integrates AI web search via SearxNG for internet-connected AI queries.
Includes speech-to-text for audio/video files using funasr.
Implements reranking with BGE-reranker-large for improved retrieval quality.
Offers a separate tool for real-time knowledge graph extraction.
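As an illustration of the SearxNG-backed web search listed above, the following sketch queries a SearxNG instance and turns the results into context for an LLM prompt. It is a hedged example rather than Easy-RAG's implementation: the base URL `http://localhost:8080` is an assumed local deployment, the JSON output format must be enabled in the SearxNG settings, and all function names and the prompt wording are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(query, base_url="http://localhost:8080"):
    """Build a SearxNG search URL that requests JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}/search?{params}"


def web_search(query, **kwargs):
    """Run the search and return the decoded JSON. Needs a reachable SearxNG instance."""
    with urllib.request.urlopen(build_search_url(query, **kwargs), timeout=30) as resp:
        return json.loads(resp.read())


def results_to_context(payload, max_results=3):
    """Format the top results as numbered snippets for the LLM."""
    lines = []
    for i, item in enumerate(payload.get("results", [])[:max_results], start=1):
        lines.append(f"[{i}] {item.get('title', '')} ({item.get('url', '')})\n{item.get('content', '')}")
    return "\n\n".join(lines)


def build_prompt(question, context):
    """Combine the search snippets and the user question into one prompt."""
    return f"Answer the question using the search results below.\n\n{context}\n\nQuestion: {question}"


# Offline demonstration with a hand-written payload shaped like a SearxNG response.
sample = {"results": [{"title": "Example", "url": "https://example.com", "content": "An example snippet."}]}
context = results_to_context(sample)
prompt = build_prompt("What is this?", context)
print(prompt)
```

In a live setup, `results_to_context(web_search("your question"))` would replace the hand-written sample, and the resulting prompt would be sent to the LLM.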

Maintenance & Community

The project is actively updated, with recent additions including AI web search, Elasticsearch support, FAISS support, and reranking. Future plans include more vector database integrations and voice output.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Only Chroma, FAISS, and Elasticsearch are supported so far; the planned Milvus and MongoDB integrations are not yet available. The README notes that funasr model downloads may be slow on first startup. Because no license is stated, commercial adoption requires clarification from the author.
