OpenGPT by CogStack

Framework for grounded instruction datasets and domain-expert LLMs

Created 2 years ago

360 stars

Top 78.3% on SourcePulse

View on GitHub

3 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Teknium

Cofounder of Nous Research

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

OpenGPT is a framework for building instruction-following datasets and training domain-specific conversational LLMs, particularly for healthcare. It empowers users to create high-quality, grounded datasets and train custom models, exemplified by the NHS-LLM, a healthcare-focused conversational AI.

How It Works

The framework facilitates a data-centric approach to LLM training. It involves collecting domain-specific text data, using a prompt database or custom prompts to generate question-answer pairs or conversational data, and then training an LLM on this curated dataset. This method ensures the LLM is "grounded" in the specific domain knowledge.

Quick Start & Requirements

Install via pip: pip install opengpt
For LLaMA models, install additional requirements: pip install -r ./llama_train_requirements.txt
Tutorials and example datasets are available, including a Google Colab notebook for creating a mini healthcare LLM.

Highlighted Details

Provides pre-generated datasets for healthcare, including NHS UK Q/A and conversations.
Includes a prompt database and notebooks for prompt creation and dataset generation.
Supports training custom conversational LLMs on user-generated datasets.

Maintenance & Community

Project is associated with CogStack.
Further questions can be directed to their discourse forum.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

The framework's primary focus is on dataset generation and LLM training, with limited information provided on pre-trained models or advanced deployment features. The absence of a stated license may pose compatibility concerns for commercial use.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

0 stars in the last 30 days