Mwie1024/Extra-CoT: Extreme-ratio Chain-of-Thought compression for efficient LLM reasoning
Top 53.9% on SourcePulse
Extra-CoT introduces a novel three-stage framework for compressing Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) to extreme token budgets, targeting up to 80% reduction while preserving reasoning fidelity and achieving significant wall-clock speedups. It is designed for researchers and engineers seeking to deploy efficient LLM reasoning capabilities without sacrificing accuracy, enabling faster and more cost-effective inference.
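The 80% reduction target corresponds to a compression ratio of roughly 0.2. As a toy illustration of how a realized compression ratio can be measured (the whitespace tokenizer and example strings below are stand-ins, not the project's actual tokenizer or data):

```python
# Sketch: a "realized compression ratio" is the length of the generated
# compressed rationale relative to an uncompressed baseline CoT.
# Whitespace splitting stands in for the model's real tokenizer.

def realized_ratio(compressed_cot: str, full_cot: str) -> float:
    compressed = len(compressed_cot.split())
    full = len(full_cot.split())
    return compressed / full

full = "First, 48 friends bought clips in April. In May she sold half as many, 24. In total 48 + 24 = 72."
short = "48 + 48/2 = 72"
print(round(realized_ratio(short, full), 2))  # 0.23
```

A ratio of 0.2 means the compressed rationale spends only a fifth of the tokens of the full chain of thought.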
How It Works
Extra-CoT employs a three-stage approach: Stage 1 (Compressor) generates high-fidelity compressed rationales by preserving critical information like formulas and anchors. Stage 2 (Mixed-ratio SFT) trains a single model to reliably follow multiple compression ratios, preventing "control collapse" at low budgets. Stage 3 (CHRPO) utilizes a hierarchical reinforcement learning algorithm to learn an adaptive policy, enabling the model to dynamically allocate tokens for ultra-low budgets. This method tackles the common failure mode of extreme CoT compression where symbolic consistency and controllability degrade.
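As a rough sketch of what Stage 2's mixed-ratio SFT data could look like: each question is paired with a ratio-control token and a Stage 1 rationale compressed to that ratio, so one model learns to follow multiple budgets. The <COMP_XX> token naming follows the repo's control tokens, but the helper and data layout below are assumptions:

```python
# Hypothetical Stage 2 data construction: pair each target compression
# ratio with a control token and a matching compressed rationale.

def build_sft_examples(question: str, rationales: dict[float, str]) -> list[dict]:
    """`rationales` maps a target compression ratio (e.g. 0.2) to a
    Stage 1 compressed rationale for `question`."""
    examples = []
    for ratio, rationale in rationales.items():
        control = f"<COMP_{int(ratio * 100):02d}>"  # e.g. 0.2 -> <COMP_20>
        examples.append({
            "prompt": f"{control} {question}",
            "response": rationale,
        })
    return examples

examples = build_sft_examples(
    "What is 12 * 7?",
    {0.2: "12*7=84", 0.5: "12 * 7 = 84, so the answer is 84."},
)
print(examples[0]["prompt"])  # <COMP_20> What is 12 * 7?
```

Mixing ratios in one training set is what prevents the "control collapse" the authors describe, since the model sees every budget during SFT rather than a single fixed one.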
Quick Start & Requirements
This repository provides code for SFT (Supervised Fine-Tuning) and vLLM-based evaluation, along with a ratio-controlled inference interface.
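The quick-start commands below launch a vLLM server that exposes an OpenAI-compatible API; once it is running, the ratio-controlled interface can be exercised from Python. A minimal client sketch, assuming the control token is simply prefixed to the user message (the prompt format is an assumption, not a documented API):

```python
# Minimal client for a vLLM OpenAI-compatible chat endpoint.
# The <COMP_POLICY> prefix convention is an assumption based on the
# repo's ratio-control tokens, not a documented interface.
import json
import urllib.request

def build_request(question: str, control_token: str = "<COMP_POLICY>") -> dict:
    """Build a chat-completions payload that asks the served model to
    reason under the given compression-control token."""
    return {
        "model": "local_core_model",  # --served-model-name from the serve command
        "messages": [
            {"role": "user", "content": f"{control_token} {question}"},
        ],
        "temperature": 0.0,
    }

def query(host: str, port: int, payload: dict) -> dict:
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Natalia sold 48 clips in April and half as many in May. How many in total?")
# query("127.0.0.1", 8000, payload)  # requires the vLLM server to be running
```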
Navigate to the LLaMA-Factory directory and run SFT:

    FORCE_TORCHRUN=1 NNODES=1 NODE_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=12345 llamafactory-cli train examples/train_full/qwen3-1.7b_full_sft.yaml

Serve the fine-tuned model with vLLM:

    vllm serve your_model_path --served-model-name local_core_model --host 0.0.0.0 --port 8000 --max-model-len 20000

Run the evaluation across compression ratios:

    python eval_all_ratios_vllm.py --host 127.0.0.1 --port 8000 --model local_core_model --output_dir outputs/qwen3-1.7b

Highlighted Details
<COMP_POLICY> mode enables dynamic token allocation, achieving 85.8% accuracy on GSM8K at a realized compression ratio of only 0.24. The release includes a ratio-controlled inference interface (<COMP_XX>, <COMP_POLICY> control tokens), a vLLM-based evaluation script, and LLaMA-Factory integration for SFT.

Maintenance & Community
No specific community channels (e.g., Discord, Slack), roadmap, or maintenance details are provided in the README. The project appears to be research-driven, with contributions from the authors of the associated paper.
Licensing & Compatibility
The project's license is not specified in the README, which is a significant blocker for commercial use or closed-source integration.
Limitations & Caveats
The repository primarily provides code for inference, evaluation, and SFT fine-tuning. It does not explicitly include the training code for the Stage 1 Compressor or the Stage 3 CHRPO policy. The absence of a specified license is a critical limitation for adoption. Setup requires familiarity with LLaMA-Factory and vLLM.
Last updated 4 days ago · Inactive