Knowledge extraction tool using GPT models
This project provides a framework for extracting knowledge from diverse sources like websites, PDFs, PPTX, DOCX, and YouTube content, enabling Q&A sessions powered by large language models like GPT. It's designed for developers and researchers looking to build applications that leverage contextual information retrieval and generation.
How It Works
The core mechanism converts text from each source into fixed-size vector embeddings, using either OpenAI or open-source models. An incoming query is embedded the same way and compared against the stored embeddings to find the most relevant passages. Those passages are then placed into the prompt sent to a language model, so the answer is grounded in the retrieved context. The approach supports multiple data types and extraction methods, including speech-to-text for YouTube audio.
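The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, and `DIM`, `similarity`, `retrieve`, `build_prompt`, and the prompt wording are all hypothetical names chosen for this example.

```python
import hashlib
import math

DIM = 256  # fixed embedding size; an arbitrary choice for this sketch


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if token:
            # Hash each token into one of DIM buckets and count occurrences.
            digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; inputs are unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k stored chunks whose embeddings are closest to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the given context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "Embeddings map text to fixed-size vectors.",
        "Docker images package an application with its dependencies.",
        "YouTube audio is transcribed with speech-to-text.",
    ]
    question = "How is YouTube audio handled?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a real pipeline, `embed` would call an OpenAI or open-source embedding model, the chunk embeddings would be computed once and stored rather than recomputed per query, and the resulting prompt would be sent to the language model.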
Quick Start & Requirements
Install: pip install knowledgegpt
Configuration: see example_config.py
spaCy model: python3 -m spacy download en_core_web_sm
API server: uvicorn server:app --reload
Docker: docker build -t knowledgegptimage . and docker run -p 8888:8888 knowledgegptimage
Highlighted Details
Maintenance & Community
The project is open-source, encouraging contributions via pull requests. Further community engagement details (e.g., Discord/Slack) are not explicitly mentioned in the README.
Licensing & Compatibility
The README does not specify a license. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
The project is under active development with several "TODO" items, including integration with vector databases (Pinecone, Milvus, Qdrant) and a web interface. Support for audio files larger than 25 MB and for advanced web scraping is also listed as future work.