Chinese-LangChain by yanqiangmiffy

Gradio app for local knowledge base QA using ChatGLM-6B + LangChain

Created 2 years ago
2,801 stars

Top 17.0% on SourcePulse

Project Summary

Chinese-LangChain is an open-source project that enables localized knowledge base retrieval and intelligent answer generation using the ChatGLM-6B model and the LangChain framework. It is designed for users who want to build conversational AI applications with custom data in Chinese, offering features like document parsing, incremental knowledge updates, and web search integration.

How It Works

The project leverages LangChain to orchestrate interactions between the ChatGLM-6B large language model and a user-defined knowledge base. It utilizes FAISS for efficient vector indexing and retrieval of relevant information from documents (PDF, DOCX, PPT) and other text sources. The system supports both direct model question-answering and retrieval-augmented generation, allowing it to provide answers grounded in the provided knowledge base.
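The retrieval-augmented flow described above can be sketched in a few lines. This is a minimal, standard-library illustration and not the project's code: the bag-of-words `embed` function stands in for a real sentence-embedding model, the linear scan in `retrieve` stands in for a FAISS index, and all function names and the prompt wording are hypothetical. The word-level tokenizer also assumes space-separated text, so Chinese input would need a proper embedding model or segmenter.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector.

    A real pipeline would call a sentence-embedding model here.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query.

    Linear scan for clarity; a vector index such as FAISS replaces
    this step when the knowledge base is large.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Combine retrieved chunks and the question into one prompt (hypothetical wording)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The resulting prompt would then be passed to the language model; only the retrieval and prompt-assembly steps are shown here.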

Quick Start & Requirements

  • Run: python main.py (after installing the prerequisites below)
  • Prerequisites: Python 3.x, Gradio, LangChain, Transformers, Sentence-Transformers, FAISS-CPU, Unstructured, DuckDuckGo-Search, MDTex2HTML, Chardet, Cchardet.
  • Hardware: Minimum 9GB VRAM (12GB recommended), 32GB RAM.
  • Demo: hosted on Hugging Face (🤗).

Highlighted Details

  • Supports multi-GPU inference for the model.
  • Integrates web search capabilities for real-time information retrieval.
  • Offers a choice between model-only Q&A and retrieval-augmented Q&A.
  • Provides pre-processed Wikipedia-zh corpus and FAISS indexes.
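The choice between model-only and retrieval-augmented Q&A amounts to deciding whether retrieved passages are prepended to the question before it reaches the model. The sketch below illustrates that switch under stated assumptions: `answer`, `llm`, `retriever`, and `use_knowledge_base` are hypothetical names, and the model is represented by any callable that maps a prompt string to a reply string rather than by ChatGLM-6B itself.

```python
def answer(question, llm, retriever=None, use_knowledge_base=True):
    """Answer a question in model-only or retrieval-augmented mode.

    llm:       callable taking a prompt string and returning a reply (assumption).
    retriever: callable taking a question and returning a list of passages,
               or None when no knowledge base is configured (assumption).
    """
    if use_knowledge_base and retriever is not None:
        passages = retriever(question)
        if passages:
            # Retrieval-augmented mode: ground the prompt in retrieved text.
            context = "\n".join(passages)
            return llm(f"Context:\n{context}\n\nQuestion: {question}")
    # Model-only mode, also used as a fallback when nothing is retrieved.
    return llm(question)
```

Falling back to the bare question when retrieval returns nothing is one possible design choice; a stricter application might instead decline to answer when no supporting passage is found.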

Maintenance & Community

The README lists updates including Streamlit support and multi-machine, multi-GPU inference, but the last commit was 2 years ago and the repository is currently inactive. Community interaction is encouraged via a QQ group (details not provided in the README).

Licensing & Compatibility

The project is licensed under "OpenRAIL". Specific terms and restrictions should be reviewed for commercial use or integration into closed-source projects.

Limitations & Caveats

The project is described as "not yet perfect" and welcomes suggestions and PRs. Some planned features, such as comparing retrieval/LLM generation results and filtering/sorting retrieval results, are still under development.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 30 days
