OpenGPT  by CogStack

Framework for grounded instruction datasets and domain-expert LLMs

created 2 years ago
358 stars

Top 79.2% on sourcepulse

GitHubView on GitHub
Project Summary

OpenGPT is a framework for building instruction-following datasets and training domain-specific conversational LLMs, particularly for healthcare. It empowers users to create high-quality, grounded datasets and train custom models, exemplified by the NHS-LLM, a healthcare-focused conversational AI.

How It Works

The framework facilitates a data-centric approach to LLM training. It involves collecting domain-specific text data, using a prompt database or custom prompts to generate question-answer pairs or conversational data, and then training an LLM on this curated dataset. This method ensures the LLM is "grounded" in the specific domain knowledge.

Quick Start & Requirements

  • Install via pip: pip install opengpt
  • For LLaMA models, install additional requirements: pip install -r ./llama_train_requirements.txt
  • Tutorials and example datasets are available, including a Google Colab notebook for creating a mini healthcare LLM.

Highlighted Details

  • Provides pre-generated datasets for healthcare, including NHS UK Q/A and conversations.
  • Includes a prompt database and notebooks for prompt creation and dataset generation.
  • Supports training custom conversational LLMs on user-generated datasets.

Maintenance & Community

  • Project is associated with CogStack.
  • Further questions can be directed to their discourse forum.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

The framework's primary focus is on dataset generation and LLM training, with limited information provided on pre-trained models or advanced deployment features. The absence of a stated license may pose compatibility concerns for commercial use.

Health Check
Last commit

2 years ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
10 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.