web-crawl-q-and-a-example by openai

Q&A bot using OpenAI API

Created 2 years ago

322 stars

Top 84.5% on SourcePulse

1 Expert Loves This Project

Logan Kilpatrick

Product Lead on Google AI Studio

Project Summary

This repository is a worked example of building a question-answering system over your own website content using the OpenAI API and its embedding models. It targets developers who are comfortable with Python and the OpenAI API and want to answer questions from their own data.

How It Works

The project demonstrates a common Retrieval-Augmented Generation (RAG) pattern. Website content is crawled, split into chunks, embedded with OpenAI's text-embedding-ada-002 model, and stored in a vector store (the README does not name one). When a user asks a question, the question is embedded with the same model, the most similar chunks are retrieved, and those chunks are passed as context to an OpenAI model (e.g., gpt-3.5-turbo), which generates the answer.
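The retrieval step can be illustrated with a minimal sketch. It is not the repository's code: the toy 3-dimensional vectors stand in for real embeddings, the in-memory list stands in for whatever store the project uses, and the names cosine_similarity, top_k_chunks, build_prompt, and INDEX are hypothetical. The prompt wording is also an assumption.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    indexed_chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n\n###\n\n".join(context_chunks)
    return (
        "Answer the question based on the context below. "
        "If the answer is not in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Assumption: toy vectors in place of real embedding-model output.
INDEX = [
    ("Embeddings map text to vectors.", [0.9, 0.1, 0.0]),
    ("Billing is handled monthly.", [0.0, 0.2, 0.9]),
    ("Similar texts have nearby vectors.", [0.8, 0.3, 0.1]),
]

query_vec = [1.0, 0.2, 0.0]  # would come from embedding the user's question
best = top_k_chunks(query_vec, INDEX, k=2)
prompt = build_prompt("What are embeddings?", best)
print(prompt)
```

In a real pipeline the vectors would come from the embedding model and the assembled prompt would be sent to the generation model. A linear scan like this is only reasonable for small collections.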

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.7+, OpenAI API key.
Demo: OpenAI Documentation Tutorial
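
As a rough preflight check for the prerequisites above, the sketch below verifies the Python version and looks for an API key. The summary does not say how the key is supplied. Reading it from the OPENAI_API_KEY environment variable is an assumption, and check_prerequisites is a hypothetical helper, not part of the repository.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < (3, 7):
        problems.append(f"Python 3.7+ required, found {version[0]}.{version[1]}")
    # Assumption: the key is provided through this environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```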

Highlighted Details

Demonstrates an end-to-end RAG pipeline for custom data.
Utilizes OpenAI's embedding and completion models.
Focuses on practical implementation for website Q&A.

Maintenance & Community

This is an example repository from OpenAI, likely maintained as part of their documentation and examples. No specific community channels or roadmap are indicated.

Licensing & Compatibility

The README snippet reviewed here does not state a license. OpenAI's example code is often permissively licensed, but check the repository for its actual license before commercial or closed-source use.

Limitations & Caveats

The example is a basic demonstration and may require significant adaptation for production use, including robust error handling, scalable data storage, and more sophisticated chunking/retrieval strategies. The specific vector database used is not detailed in the README.
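
One small step toward a more careful chunking strategy is to let consecutive chunks overlap, so that a sentence cut at a boundary still appears whole in one of them. The sketch below is illustrative only: chunk_words is a hypothetical helper, and it counts whitespace-separated words, whereas a real pipeline would more likely budget by model tokens.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of up to max_words words, sharing `overlap` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# prints ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Larger overlaps make it more likely that a retrieved chunk contains the full context, at the cost of storing and embedding more text.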

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Star History

1 star in the last 30 days