calm by shaochenze

Continuous Autoregressive Language Models for efficient text generation

Created 2 months ago · 686 stars · Top 49.7% on SourcePulse
View on GitHub: https://github.com/shaochenze/calm

Project Summary

CALM (Continuous Autoregressive Language Models) introduces a paradigm shift to overcome the token-by-token generation bottleneck in Large Language Models (LLMs). Instead of predicting one discrete token per step, the model predicts a single continuous vector representing an entire chunk of K tokens, which substantially improves training and inference efficiency. The approach also opens a new scaling dimension for LLMs, termed "semantic bandwidth," making it relevant to researchers and practitioners seeking more efficient and scalable language models.

How It Works

CALM employs a two-stage process. First, a high-fidelity autoencoder compresses K tokens into a continuous vector and reconstructs them with near-perfect accuracy. Second, a continuous-domain language model performs autoregressive prediction in this vector space. This method reduces the number of autoregressive steps by a factor of K, leading to substantial efficiency gains and enabling scaling based on semantic bandwidth.
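
In code, the idea looks roughly like the sketch below. This is a conceptual illustration only, with made-up module names and sizes and a plain linear layer standing in for the generative head; the actual repository uses its own architectures and energy-based, diffusion, or flow-matching heads.

```python
# Conceptual sketch of CALM's two-stage setup -- not the repository's actual API.
# (1) An autoencoder maps each chunk of K tokens to one continuous vector.
# (2) An autoregressive model predicts the next chunk vector from previous ones.
# All names, sizes, and the linear "head" below are illustrative assumptions.
import torch
import torch.nn as nn

K, VOCAB, D = 4, 32000, 512  # chunk size, vocabulary size, latent width (made up)

class ChunkAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        self.enc = nn.Linear(K * D, D)         # K token embeddings -> 1 vector
        self.dec = nn.Linear(D, K * VOCAB)     # 1 vector -> K sets of token logits

    def encode(self, tokens):                  # tokens: (batch, K)
        return self.enc(self.embed(tokens).flatten(1))

    def decode(self, z):                       # z: (batch, D)
        return self.dec(z).view(-1, K, VOCAB)  # (batch, K, VOCAB) logits

class ContinuousAR(nn.Module):
    """Autoregressive model over chunk vectors: each step emits one vector,
    so generating N tokens takes only N / K autoregressive steps."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, D)            # stand-in for a generative head

    def forward(self, z_prev):                 # z_prev: (batch, steps, D)
        mask = nn.Transformer.generate_square_subsequent_mask(z_prev.size(1))
        return self.head(self.backbone(z_prev, mask=mask))

tokens = torch.randint(0, VOCAB, (2, K))       # two chunks of K tokens each
ae, lm = ChunkAutoencoder(), ContinuousAR()
z = ae.encode(tokens)                          # (2, D): one vector per chunk
next_z = lm(z.unsqueeze(1))[:, -1]             # predict the following chunk vector
next_tokens = ae.decode(next_z).argmax(-1)     # map it back to K discrete tokens
```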

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/shaochenze/calm.git) and install dependencies (pip install -r requirements.txt).
  • Data Preparation: Download and process "the pile-uncopyrighted" dataset using bash data/get_data.sh. Requires at least 2.5TB of free disk space.
  • Training: Two main stages (chained end-to-end in the sketch after this list):
    1. Train the autoencoder (bash train/train_autoencoder.sh).
    2. Train the CALM language model using energy-based training (bash train/train_energy.sh).
  • Alternative Training: Scripts for Diffusion and Flow Matching generative heads are available (train/train_diffusion.sh, train/train_flow.sh).
  • Baseline: A standard autoregressive Transformer baseline can be trained (train/train_ar.sh).
  • Evaluation: Use bash train/eval_energy.sh to evaluate checkpoints.
  • Prerequisites: Python, PyTorch (implied by torchrun), sufficient disk space (2.5TB+), and multi-GPU setup (e.g., 8 GPUs per node). Scripts utilize bf16 for mixed-precision training.
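
All of the steps above are shell scripts. A minimal Python driver that chains them in the documented order might look like the following; the script paths come from the list above, while the single-run orchestration and error handling are assumptions, and actually running it presumes the repository is cloned, dependencies are installed, and the disk and GPU requirements are met.

```python
# Hypothetical end-to-end driver for the documented workflow. The script paths
# are the ones listed above; combining them into one run is an assumption.
import subprocess

STEPS = [
    "bash data/get_data.sh",            # download/process the dataset (~2.5TB of disk)
    "bash train/train_autoencoder.sh",  # stage 1: train the K-token autoencoder
    "bash train/train_energy.sh",       # stage 2: energy-based CALM language model
    "bash train/eval_energy.sh",        # evaluate the resulting checkpoints
]

for cmd in STEPS:
    print(f"Running: {cmd}")
    subprocess.run(cmd, shell=True, check=True)  # abort immediately if a step fails
```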

Highlighted Details

  • Ultra-Efficient: Dramatically improves training and inference efficiency by reducing autoregressive steps by a factor of K.
  • New Scaling Axis: Introduces "semantic bandwidth" (K) as a dimension for LLM scaling, beyond parameters and data.
  • Likelihood-Free Toolkit: Provides algorithms for continuous-domain modeling, including a robust autoencoder, Energy-Based Training, the BrierLM evaluation metric (illustrated after this list), and Temperature Sampling.
  • Performance Claims: Pre-trained CALM models reach BrierLM scores of 5.72 (CALM-M, 371M), 6.58 (CALM-L, 735M), and 8.53 (CALM-XL, 1.82B); the standard autoregressive baseline is expected to reach roughly 6.05.
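
For intuition on the likelihood-free evaluation idea: the classical Brier score of a categorical prediction can be estimated without access to the model's probabilities, using only two independent samples. The sketch below shows that construction; BrierLM's exact definition (its aggregation and scaling) is given in the paper and repository and may differ.

```python
# Illustration of likelihood-free Brier estimation -- not BrierLM itself.
import torch

def brier_score(probs: torch.Tensor, target: int) -> float:
    """Exact Brier score of one categorical prediction (lower is better)."""
    onehot = torch.zeros_like(probs)
    onehot[target] = 1.0
    return torch.sum((probs - onehot) ** 2).item()

def brier_estimate(sample_a: int, sample_b: int, target: int) -> float:
    """Unbiased estimate of the same quantity from two independent model
    samples -- no probabilities required, only the ability to sample."""
    return (float(sample_a == sample_b)
            - float(sample_a == target)
            - float(sample_b == target) + 1.0)

probs = torch.tensor([0.7, 0.2, 0.1])
print(brier_score(probs, target=0))                 # exact value: ~0.14
draws = torch.multinomial(probs, 2, replacement=True)
print(brier_estimate(draws[0].item(), draws[1].item(), target=0))  # noisy but unbiased
```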

Maintenance & Community

  • Contact: For questions, submit an issue or contact chenzeshao@tencent.com.
  • No specific community links (Discord/Slack) or roadmap are mentioned.

Licensing & Compatibility

  • The README does not explicitly state the license type or provide compatibility notes for commercial use.

Limitations & Caveats

  • Requires a substantial 2.5TB+ disk space for the dataset.
  • Alternative generative heads (Diffusion, Flow Matching) showed slightly lower performance compared to the Energy-based head in experiments.
  • The README does not specify hardware requirements beyond what's implied by the training scripts (e.g., multi-GPU setup).

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 22 stars in the last 30 days
