Taxonomy for LLM alignment tuning via synthetic data generation
This repository provides a structured taxonomy for contributing "skills" and "knowledge" used to train Large Language Models (LLMs) with InstructLab's Large-Scale Alignment for Chatbots (LAB) method. It targets researchers, developers, and power users who want to extend LLM capabilities with synthetic data generated from curated community contributions.
How It Works
The core of the project is a hierarchical directory structure representing domains and subdomains, inspired by the Dewey Decimal Classification system. Contributions are made via qna.yaml files at the leaf nodes, containing question-answer pairs and optional context. Skills are performative or instructional, while knowledge contributions are fact-based and reference external documents stored in a separate Git repository. This structured approach enables the generation of targeted synthetic data for LLM alignment tuning.
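The directory layout described above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the project's own tooling: the domain paths (skills/writing/poetry, knowledge/science/physics) and the helper names find_qna_leaves and build_demo_tree are hypothetical, and the placeholder file contents do not reflect the real qna.yaml schema. The only detail taken from the text is that contributions live in qna.yaml files at the leaf directories of the hierarchy.

```python
import tempfile
from pathlib import Path


def find_qna_leaves(root: Path) -> dict[str, Path]:
    """Map each taxonomy path (relative to root) to the qna.yaml found there."""
    leaves = {}
    for qna in sorted(root.rglob("qna.yaml")):
        # The directory holding qna.yaml is the leaf node of the taxonomy.
        rel = qna.parent.relative_to(root)
        leaves[rel.as_posix()] = qna
    return leaves


def build_demo_tree(root: Path) -> None:
    """Create a tiny, hypothetical taxonomy tree for demonstration only."""
    for rel in ("skills/writing/poetry", "knowledge/science/physics"):
        leaf = root / rel
        leaf.mkdir(parents=True)
        # Placeholder content; the real schema is defined by the project.
        (leaf / "qna.yaml").write_text("# question-answer pairs go here\n")


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    build_demo_tree(demo_root)
    # Keep only the relative taxonomy paths; the files vanish with the temp dir.
    LEAVES = sorted(find_qna_leaves(demo_root))

for path in LEAVES:
    # The first path component distinguishes skills from knowledge in this sketch.
    print(path.split("/")[0], "->", path)
```

In this sketch the first path component separates skill contributions from knowledge contributions, which mirrors the distinction drawn in the text; how the actual repository names its top-level directories should be checked against the repository itself.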
Quick Start & Requirements
CONTRIBUTING.md file.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats