coconut by facebookresearch

Research paper implementation for LLM reasoning in latent space

created 6 months ago
1,218 stars

Top 32.9% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides the official implementation for training Large Language Models (LLMs) to reason in a continuous latent space (the Coconut method, from the paper "Training Large Language Models to Reason in a Continuous Latent Space"). It addresses the challenge of structured, multi-step reasoning in LLMs and is intended for researchers and practitioners working on advancing LLM reasoning capabilities.

How It Works

The project introduces a training methodology that lets an LLM carry out its reasoning in a continuous latent space instead of in language. Rather than decoding each intermediate reasoning step as text, the model feeds its last hidden state back in as the next input embedding, forming a sequence of "continuous thoughts". Training proceeds in stages that progressively replace explicit chain-of-thought steps with these latent thoughts, so the model learns to hold intermediate reasoning in its hidden representations while still producing the final answer in text.
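As a rough illustration of the mechanism, here is a minimal sketch (not the repository's actual code) of producing continuous thoughts with a Hugging Face-style causal LM: the last hidden state at the current position is appended to the input embeddings instead of decoding a token. The model choice, the number of latent thoughts, and the loop itself are illustrative assumptions.

```python
# Minimal sketch of the continuous-thought idea (illustrative, not the repo's code):
# at a "latent" position, the last hidden state is fed back as the next input
# embedding instead of decoding a token and re-embedding it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # any causal LM whose hidden
tokenizer = AutoTokenizer.from_pretrained("gpt2")      # size matches its embedding size

prompt = "Question: 2 + 3 * 4 = ?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)        # (1, seq_len, hidden)

num_latent_thoughts = 2  # illustrative; the repo configures thoughts per stage
with torch.no_grad():
    for _ in range(num_latent_thoughts):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]    # final position's state
        embeds = torch.cat([embeds, last_hidden], dim=1)  # feed it back as an embedding

    # After the latent thoughts, decoding continues in language space as usual.
    next_token = model(inputs_embeds=embeds).logits[:, -1, :].argmax(dim=-1)
print(tokenizer.decode(next_token))
```

During training the same feedback pathway is differentiable, which is what allows the model to learn to use the latent thoughts for problem-solving.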

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -r requirements.txt within a conda environment with Python 3.12.
  • Prerequisites: Requires wandb for logging. Data must be supplied in the JSON format expected by the training scripts (a hedged example follows this list).
  • Resources: Example commands suggest using torchrun with multiple GPUs (e.g., 4x A100 80GB).
  • Links:
    • Data preprocessing for GSM8K: preprocessing/gsm_icot.bash
    • Configuration examples: args/ directory
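As a hedged sketch of the expected data layout, the training data appears to be a JSON list of examples, each carrying a question, its intermediate reasoning steps, and the final answer. The field names and output path below are assumptions for illustration; verify against the repo's preprocessing output.

```python
# Hedged sketch of the assumed data layout: a JSON list of examples with a
# question, intermediate reasoning steps, and a final answer. Field names and
# the file path are illustrative; check the repo's preprocessing output.
import json

examples = [
    {
        "question": "Natalia sold clips to 48 friends in April, and half as many in May. How many clips did she sell in total?",
        "steps": ["48 / 2 = 24", "48 + 24 = 72"],
        "answer": "72",
    }
]

with open("data/gsm_train.json", "w") as f:   # hypothetical path
    json.dump(examples, f, indent=2)
```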

Highlighted Details

  • Supports training for Coconut, CoT, Coconut (w/o thoughts), and no-CoT models.
  • Configurable training parameters include epochs per stage, latent space padding, and learning rate (a hedged sketch of the staging schedule follows this list).
  • Enables loading checkpoints for evaluation or initializing Coconut from CoT-tuned models.
  • Includes instructions and configurations for reproducing experiments on GSM8K, ProntoQA, and ProsQA datasets.
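The staged curriculum can be sketched as follows, reconstructed from the paper rather than copied from the repo: at stage k, the first k language reasoning steps are replaced by latent-thought placeholders, with a configurable number of thoughts per removed step. The helper function and the special-token strings below are illustrative assumptions, not the repository's API.

```python
# Hedged sketch of the multi-stage curriculum: at stage k, the first k reasoning
# steps are swapped for latent-thought placeholder tokens (c_thought per step).
# The helper and the special-token strings are illustrative, not the repo's API.
def build_stage_example(question, steps, answer, stage, c_thought=1,
                        bot="<|start-latent|>", eot="<|end-latent|>",
                        latent="<|latent|>"):
    n_replaced = min(stage, len(steps))
    latent_span = "" if n_replaced == 0 else bot + latent * (n_replaced * c_thought) + eot
    remaining_steps = " ".join(steps[n_replaced:])
    parts = [question, latent_span, remaining_steps, "###", answer]
    return " ".join(p for p in parts if p)

# Stage 0 keeps the full chain of thought; later stages remove more text steps.
for k in range(3):
    print(build_stage_example(
        "Natalia sold clips to 48 friends in April, and half as many in May. How many in total?",
        ["48 / 2 = 24", "48 + 24 = 72"],
        "72",
        stage=k,
    ))
```

In the repository this schedule is driven by options such as epochs per stage; the sketch is only meant to make the idea concrete.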

Maintenance & Community

The project is from Meta AI (facebookresearch). No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

Reproducing the experiments requires significant computational resources; the README's example commands assume multiple high-end GPUs (e.g., 4x A100 80GB). The debug mode disables logging and model saving, which can hinder analysis of debug runs.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 138 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% · 25k stars

SDK for reproducing DeepSeek-R1

created 6 months ago · updated 3 days ago