quiet-star by ezelikman

Research code for self-teaching language models

Created 1 year ago
741 stars

Top 46.8% on SourcePulse

View on GitHub
Project Summary

This repository provides the code for Quiet-STaR, a method that teaches language models to generate internal "thoughts" before producing a response. It targets researchers and practitioners in LLM development seeking to improve reasoning capabilities. The primary benefit is more coherent, better-reasoned output driven by this internal thought-generation process.

How It Works

Quiet-STaR modifies the Mistral architecture by introducing a "thought" generation phase. This involves patching Hugging Face's transformers library (specifically version 4.37.0.dev0) with custom modeling_mistral.py and configuration_mistral.py files. The model learns to generate intermediate thought tokens interleaved with its output; the start and end thought markers must then be masked during inference so they do not appear in the final text.
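
A minimal sketch of that patch step, assuming the two files sit in the repository root and that transformers is installed in its standard package layout (both assumptions; the README's exact procedure may differ):

    # Sketch only: overlay the repository's patched Mistral files onto the
    # installed transformers package (assumed to be the pinned 4.37.0.dev0).
    import shutil
    from pathlib import Path

    import transformers

    # Standard location of the stock Mistral files inside transformers.
    mistral_dir = Path(transformers.__file__).parent / "models" / "mistral"

    for fname in ("modeling_mistral.py", "configuration_mistral.py"):
        shutil.copy(fname, mistral_dir / fname)  # overwrite the stock files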

Quick Start & Requirements

  • Install: Requires Hugging Face transformers version 4.37.0.dev0, patched with the repository's custom Mistral files.
  • Prerequisites: Python, PyTorch, and the Hugging Face libraries.
  • Inference: Requires masking the start and end thought tokens during generation (see the sketch after this list). An 8-token-ahead model is available on Hugging Face.
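
A hedged inference sketch, assuming the 8-token-ahead checkpoint is published as ezelikman/quietstar-8-ahead and that the thought delimiters are the special tokens <|startofthought|> and <|endofthought|> (all three are assumptions to verify against the actual checkpoint and its tokenizer):

    # Sketch only: suppress the thought-delimiter tokens at every decoding
    # step so they never surface in the generated text.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ezelikman/quietstar-8-ahead"  # assumed Hugging Face model ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Look up the start/end thought token IDs (assumed token strings).
    thought_ids = tokenizer.convert_tokens_to_ids(
        ["<|startofthought|>", "<|endofthought|>"]
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=32,
        suppress_tokens=thought_ids,  # mask thought delimiters while decoding
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))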

Highlighted Details

  • Implements the Quiet-STaR method for self-taught reasoning in LLMs.
  • Leverages the standard Hugging Face Trainer for ease of use (a generic training sketch follows this list).
  • Requires careful masking of thought tokens during inference.
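
Because training reportedly goes through the stock Hugging Face Trainer, a run plausibly follows the usual causal-LM fine-tuning pattern. The sketch below is generic: the base model, dataset, and hyperparameters are placeholders, not the repository's actual script.

    # Generic Trainer sketch; assumes the Mistral patch above has been applied.
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    base_model = "mistralai/Mistral-7B-v0.1"  # placeholder base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Placeholder corpus; substitute whatever text data the run actually uses.
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True,
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="quiet-star-out",
            per_device_train_batch_size=1,
            max_steps=100,
        ),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()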

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility with commercial or closed-source projects is not detailed.

Limitations & Caveats

The model is not inherently trained to avoid generating the start/end thought tokens, so they must be masked manually during inference. The implementation is also tied to a specific development version of Hugging Face transformers (4.37.0.dev0), which raises concerns about future compatibility and reproducibility.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 3 stars in the last 30 days

Explore Similar Projects

TransformerLens by TransformerLensOrg

  Library for mechanistic interpretability research on GPT-style language models
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Neel Nanda (Research Scientist at Google DeepMind), and 1 more.
  • Top 1.0% on SourcePulse · 3k stars · Created 3 years ago · Updated 1 day ago

Hands-On-Large-Language-Models by HandsOnLLM

  Code examples for the "Hands-On Large Language Models" book
  • Starred by Peter Norvig (author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 2 more.
  • Top 1.4% on SourcePulse · 16k stars · Created 1 year ago · Updated 1 month ago