Chinese-LangChain by yanqiangmiffy

Gradio app for local knowledge base QA using ChatGLM-6B + LangChain

created 2 years ago
2,789 stars

Top 17.5% on sourcepulse

Project Summary

Chinese-LangChain is an open-source project for retrieval over a local knowledge base and answer generation, built on the ChatGLM-6B model and the LangChain framework. It targets users who want to build conversational AI applications over their own Chinese-language data, and offers document parsing, incremental knowledge base updates, and web search integration.

How It Works

The project leverages LangChain to orchestrate interactions between the ChatGLM-6B large language model and a user-defined knowledge base. It utilizes FAISS for efficient vector indexing and retrieval of relevant information from documents (PDF, DOCX, PPT) and other text sources. The system supports both direct model question-answering and retrieval-augmented generation, allowing it to provide answers grounded in the provided knowledge base.
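The retrieval-augmented flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: `embed`, `cosine`, `ToyIndex`, and `build_prompt` are hypothetical names, a bag-of-words counter stands in for a sentence-embedding model, and a brute-force cosine scan stands in for the FAISS index.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a sentence-embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyIndex:
    """Brute-force vector index (assumption: stands in for FAISS)."""

    def __init__(self):
        self.entries = []  # list of (passage, vector) pairs

    def add(self, passages):
        # Appending to the existing entries mimics an incremental update.
        for passage in passages:
            self.entries.append((passage, embed(passage)))

    def search(self, query, k=2):
        # Rank every stored passage by similarity to the query, keep the top k.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


index = ToyIndex()
index.add([
    "FAISS stores dense vectors for similarity search",
    "Gradio builds web demos for models",
    "ChatGLM-6B is a bilingual language model",
])
top = index.search("what does FAISS store", k=1)
prompt = build_prompt("what does FAISS store", top)
```

In a real pipeline the prompt would be passed to the language model; the point here is only the order of steps: embed, index, retrieve, then generate.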

Quick Start & Requirements

  • Run: python main.py (after installing the prerequisites below)
  • Prerequisites: Python 3.x, Gradio, LangChain, Transformers, Sentence-Transformers, FAISS-CPU, Unstructured, DuckDuckGo-Search, MDTex2HTML, Chardet, Cchardet.
  • Hardware: Minimum 9GB VRAM (12GB recommended), 32GB RAM.
  • Demo: 🤗 DEMO

Highlighted Details

  • Supports multi-GPU inference for the model.
  • Integrates web search capabilities for real-time information retrieval.
  • Offers a choice between model-only Q&A and retrieval-augmented Q&A.
  • Provides pre-processed Wikipedia-zh corpus and FAISS indexes.
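The choice between model-only and retrieval-augmented Q&A amounts to deciding whether retrieved passages are placed in front of the question. A minimal sketch, assuming a hypothetical `answer` helper in which `llm` and `retrieve` are caller-supplied callables (neither name comes from the project):

```python
from typing import Callable, List, Optional


def answer(
    question: str,
    llm: Callable[[str], str],
    retrieve: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Answer a question, optionally grounding it in retrieved passages.

    llm: hypothetical callable mapping a prompt to generated text.
    retrieve: hypothetical callable mapping a question to passages;
              None selects model-only mode.
    """
    if retrieve is None:
        # Model-only mode: the question goes to the model unchanged.
        return llm(question)

    passages = retrieve(question)
    if not passages:
        # Nothing relevant was found, so fall back to model-only mode.
        return llm(question)

    # Retrieval-augmented mode: prepend the passages as context.
    context = "\n".join(passages)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def echo_llm(prompt: str) -> str:
    """Stub model that returns its prompt, to make the two modes visible."""
    return prompt


model_only = answer("What is FAISS?", echo_llm)
augmented = answer("What is FAISS?", echo_llm, lambda q: ["FAISS is a vector index."])
```

The same `retrieve` slot could be filled by a local index or by a web search wrapper, which is one way to read the web search feature listed above.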

Maintenance & Community

The README lists Streamlit support and multi-machine, multi-GPU inference among its most recent updates, although the last commit is about 2 years old. Community interaction is encouraged via a QQ group (details not provided in README).

Licensing & Compatibility

The project is licensed under "OpenRAIL". Specific terms and restrictions should be reviewed for commercial use or integration into closed-source projects.

Limitations & Caveats

The project is described as "not yet perfect" and welcomes suggestions and PRs. Some planned features, such as comparing retrieval/LLM generation results and filtering/sorting retrieval results, are still under development.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 39 stars in the last 90 days
