knowledge-gpt  by geeks-of-data

Knowledge extraction tool using GPT models

created 2 years ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This project provides a framework for extracting knowledge from diverse sources like websites, PDFs, PPTX, DOCX, and YouTube content, enabling Q&A sessions powered by large language models like GPT. It's designed for developers and researchers looking to build applications that leverage contextual information retrieval and generation.

How It Works

The core mechanism transforms text extracted from each source into fixed-size vector embeddings, using either OpenAI or open-source models. An incoming query is embedded the same way and compared against the stored embeddings to find the most relevant passages. Those passages are then inserted as context into a prompt for a language model, which generates the answer. The approach supports multiple data types and extraction methods, including speech-to-text for YouTube audio.
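The embed, compare, and prompt steps described above can be illustrated with a minimal sketch. This is not knowledgegpt's API: the function names (`embed`, `top_k`, `build_prompt`), the vector size, and the prompt wording are all hypothetical, and a hashed bag-of-words vector stands in for a real OpenAI or Hugging Face embedding model so the example runs with the standard library alone.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # fixed embedding size; an arbitrary choice for this sketch


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for token, count in Counter(tokens).items():
        # crc32 is deterministic across runs, unlike the built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that places the retrieved chunks ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "YouTube audio is transcribed with speech-to-text before indexing.",
        "PDF, DOCX and PPTX files are parsed into plain text.",
        "Embeddings can come from OpenAI or from open-source models.",
    ]
    question = "Which models can produce the embeddings?"
    print(build_prompt(question, top_k(question, chunks, k=1)))
```

In a real pipeline the resulting prompt would be sent to a language model, and the chunk embeddings would be computed once and stored rather than recomputed for every query as this sketch does.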

Quick Start & Requirements

  • Install via pip: pip install knowledgegpt
  • Requires an OpenAI API key (set in example_config.py).
  • Download spaCy model: python3 -m spacy download en_core_web_sm
  • For API server: uvicorn server:app --reload
  • Docker: docker build -t knowledgegptimage . followed by docker run -p 8888:8888 knowledgegptimage
  • Official PyPI: https://pypi.org/project/knowledgegpt/

Highlighted Details

  • Supports extraction from websites, PDFs, DOCX, PPTX, and YouTube (audio/transcripts).
  • Integrates with OpenAI's GPT models for answer generation.
  • Offers flexibility in choosing embedding models (OpenAI or Hugging Face).
  • Includes a RESTful API for server deployment.

Maintenance & Community

The project is open-source, encouraging contributions via pull requests. Further community engagement details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not specify a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README describes the project as under active development and lists several "TODO" items, including integration with vector databases (Pinecone, Milvus, Qdrant) and a web interface. Support for audio files larger than 25MB and more advanced web scraping are also listed as future work.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
