Research code for self-teaching language models
This repository provides the code for Quiet-STaR, a method enabling language models to self-teach thinking processes before generating responses. It targets researchers and practitioners in LLM development seeking to improve reasoning capabilities. The primary benefit is more coherent, better-reasoned output produced through an internal "thought" generation process.
How It Works
Quiet-STaR modifies the Mistral architecture by introducing a "thought" generation phase. This involves patching Hugging Face's transformers library (specifically version 4.37.0.dev0) with custom modeling_mistral.py and configuration_mistral.py files. The model learns to generate intermediate thought tokens alongside its final output; these thought tokens are then masked during inference to produce cleaner results.
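To make the masking step concrete, here is a minimal, self-contained sketch (not code from this repository) that removes token spans between assumed start-of-thought and end-of-thought markers from a generated sequence; the marker ids are placeholders.

```python
# Minimal sketch (not repository code): strip "thought" spans from generated
# token ids before decoding. The marker ids below are placeholders; the real
# values come from the patched tokenizer/model configuration.
START_THOUGHT_ID = 32001  # placeholder id for the start-of-thought token
END_THOUGHT_ID = 32002    # placeholder id for the end-of-thought token

def mask_thoughts(token_ids):
    """Return token ids with every start..end thought span removed."""
    visible, in_thought = [], False
    for tok in token_ids:
        if tok == START_THOUGHT_ID:
            in_thought = True
        elif tok == END_THOUGHT_ID:
            in_thought = False
        elif not in_thought:
            visible.append(tok)
    return visible

# Example: tokens 11 and 12 form a hidden thought and are dropped.
print(mask_thoughts([1, 5, 32001, 11, 12, 32002, 7, 2]))  # -> [1, 5, 7, 2]
```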
Quick Start & Requirements
Requires Hugging Face transformers version 4.37.0.dev0, with the repository's patched modeling_mistral.py and configuration_mistral.py files applied.
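As a rough quick-start sketch (the file locations and base checkpoint are assumptions, not stated in the README), one way to apply the patch is to copy the repository's custom files over the installed transformers Mistral implementation before loading a model:

```python
# Sketch only: overwrite the installed transformers Mistral files with the
# repository's patched versions. Assumes transformers==4.37.0.dev0 is
# installed and that this script runs from the repository root, where the
# patched files are assumed to live.
import shutil
from pathlib import Path

import transformers

mistral_dir = Path(transformers.__file__).parent / "models" / "mistral"
for name in ("modeling_mistral.py", "configuration_mistral.py"):
    shutil.copy(Path(name), mistral_dir / name)  # repo file -> installed package

# After patching, a model can be loaded the usual way. The checkpoint name is
# a placeholder base model, not one confirmed by the README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
```

The copy must happen before the Mistral modeling module is first imported in the process, otherwise the unpatched version stays loaded.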
Highlighted Details
Training integrates with the Hugging Face Trainer for ease of use.
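Since the README only notes that the Hugging Face Trainer is used, the following is a generic, hedged sketch of what a training entry point might look like; the base model, corpus, and arguments are all placeholders and the repository's actual training script may differ substantially.

```python
# Generic Trainer sketch (everything here is a placeholder; not the
# repository's actual training setup).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny in-memory corpus standing in for the real training data.
texts = ["The sum of 17 and 25 is 42.", "If x + 3 = 10, then x = 7."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="quiet-star-out",
                           per_device_train_batch_size=1,
                           max_steps=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```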
Maintenance & Community
No specific community channels or maintenance details are provided in the README. The repository was last updated roughly 11 months ago and appears inactive.
Licensing & Compatibility
The repository's license is not specified in the README. Compatibility with commercial or closed-source projects is not detailed.
Limitations & Caveats
The model is not inherently trained to avoid generating start/end thought tokens, necessitating manual masking during inference. The implementation is also pinned to a specific development version of Hugging Face transformers (4.37.0.dev0), which raises concerns about future compatibility and reproducibility.
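One hedged way to keep those markers out of user-facing output (the marker strings and base model here are assumptions, not taken from the repository) is to forbid them at decode time via generate()'s bad_words_ids; alternatively, they can be stripped after generation as in the earlier masking sketch.

```python
# Sketch only: prevent assumed start/end-of-thought marker tokens from being
# generated. The marker strings and checkpoint are placeholders; the patched
# tokenizer defines the real markers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical marker strings, encoded to the id sequences generate() should avoid.
thought_markers = ["<|startofthought|>", "<|endofthought|>"]
bad_words_ids = [tokenizer(m, add_special_tokens=False).input_ids
                 for m in thought_markers]

inputs = tokenizer("2 + 2 =", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```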