knowledge-gpt by geeks-of-data

Knowledge extraction tool using GPT models

Created 2 years ago
288 stars

Top 91.2% on SourcePulse

Project Summary

This project provides a framework for extracting knowledge from diverse sources like websites, PDFs, PPTX, DOCX, and YouTube content, enabling Q&A sessions powered by large language models like GPT. It's designed for developers and researchers looking to build applications that leverage contextual information retrieval and generation.

How It Works

The core mechanism transforms text from the various sources into fixed-size vector embeddings, using either OpenAI or open-source embedding models. When a query arrives, it is vectorized the same way and compared against the stored embeddings to find the most relevant passages. That context is then inserted into a prompt for a language model, which generates the answer. The approach supports multiple data types and extraction methods, including speech-to-text for YouTube audio.
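The retrieval step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not knowledgegpt's actual API: the function names cosine_similarity, rank_chunks, and build_prompt, the prompt wording, and the toy three-dimensional vectors are all hypothetical, and cosine similarity is assumed as the comparison metric because the summary does not name one. In a real pipeline the vectors would come from an OpenAI or Hugging Face embedding model, and the resulting prompt would be sent to a GPT model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[str]:
    """Hypothetical helper: return the texts of the top_k chunks most similar to the query."""
    scored = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:top_k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Hypothetical helper: ask the model to answer only from the retrieved context."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Toy "embeddings"; a real system would obtain these from an embedding model.
    chunks = [
        ("The invoice is due on March 1.", [0.9, 0.1, 0.0]),
        ("Our office is in Berlin.", [0.0, 0.2, 0.9]),
        ("Late payments incur a 5% fee.", [0.7, 0.3, 0.1]),
    ]
    query = [0.8, 0.2, 0.0]  # pretend embedding of "When is payment due?"
    top = rank_chunks(query, chunks, top_k=2)
    print(build_prompt("When is payment due?", top))
```

With these toy vectors the two payment-related chunks rank above the unrelated one and end up in the prompt. The linear scan in rank_chunks is fine for small corpora; the vector databases listed as TODO items would typically replace it with an indexed nearest-neighbor search.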

Quick Start & Requirements

  • Install via pip: pip install knowledgegpt
  • Requires an OpenAI API key (set in example_config.py).
  • Download spaCy model: python3 -m spacy download en_core_web_sm
  • To run the API server: uvicorn server:app --reload
  • Docker: docker build -t knowledgegptimage . and docker run -p 8888:8888 knowledgegptimage
  • Official PyPI: https://pypi.org/project/knowledgegpt/

Highlighted Details

  • Supports extraction from websites, PDFs, DOCX, PPTX, and YouTube (audio/transcripts).
  • Integrates with OpenAI's GPT models for answer generation.
  • Offers flexibility in choosing embedding models (OpenAI or Hugging Face).
  • Includes a RESTful API for server deployment.

Maintenance & Community

The project is open source and accepts contributions via pull requests. The README does not mention other community channels (e.g., Discord or Slack).

Licensing & Compatibility

The README does not specify a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README lists several unfinished "TODO" items, including integration with vector databases (Pinecone, Milvus, Qdrant) and a web interface. Support for audio files larger than 25 MB and more advanced web scraping are also listed as future work. Given that the last commit was two years ago, the project does not appear to be under active development.

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 2 stars in the last 30 days
