Framework for grounded instruction datasets and domain-expert LLMs
Top 79.2% on sourcepulse
OpenGPT is a framework for building instruction-following datasets and training domain-specific conversational LLMs, particularly for healthcare. It empowers users to create high-quality, grounded datasets and train custom models, exemplified by the NHS-LLM, a healthcare-focused conversational AI.
How It Works
The framework facilitates a data-centric approach to LLM training. It involves collecting domain-specific text data, using a prompt database or custom prompts to generate question-answer pairs or conversational data, and then training an LLM on this curated dataset. This method ensures the LLM is "grounded" in the specific domain knowledge.
Quick Start & Requirements
pip install opengpt
pip install -r ./llama_train_requirements.txt
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The framework's primary focus is on dataset generation and LLM training, with limited information provided on pre-trained models or advanced deployment features. The absence of a stated license may pose compatibility concerns for commercial use.
2 years ago
1 day