LongWriter by THUDM

Long-context LLM for 10,000+ word generation

created 11 months ago
1,705 stars

Top 25.5% on sourcepulse

View on GitHub
Project Summary

LongWriter is an open-source project that enables large language models (LLMs) to generate exceptionally long texts of 10,000 words or more. It targets researchers and developers working with long-context LLMs, providing fine-tuned models, evaluation benchmarks, and tooling for extended generation while significantly reducing generation time for lengthy content.

How It Works

LongWriter fine-tunes existing long-context LLMs, such as GLM-4-9B and Llama-3.1-8B, on a purpose-built ultra-long-output dataset. The core contribution is maintaining coherence and quality over extended outputs, a common limitation of LLM generation. This is achieved primarily through specialized training data, allowing the models to produce much longer outputs than standard fine-tuning typically yields.

Quick Start & Requirements

  • Install: pip install "transformers>=4.43.0" (the quotes stop the shell from treating >= as a redirect)
  • Prerequisites: Python 3.x, PyTorch, Hugging Face transformers library. GPU with sufficient VRAM is highly recommended for efficient inference. CUDA 12+ is beneficial for vLLM integration.
  • Usage: Load models via Hugging Face transformers or vllm for accelerated inference.
  • Resources: Training requires significant computational resources. Inference can be resource-intensive depending on the model size and generation length.
  • Links: HF Repo, Paper, HF Space
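The loading step above can be sketched as follows. This is a minimal sketch, not the project's official example: the model ID, the generation parameters, and the word-count instruction in the prompt are assumptions (verify the exact IDs on the Hugging Face Hub), and long-output models are commonly steered by stating the target length in the prompt.

```python
def build_prompt(instruction: str) -> str:
    """Append an explicit length target; long-output models are usually
    steered by stating the desired word count in the prompt."""
    return f"{instruction} The piece should be at least 10,000 words long."


if __name__ == "__main__":
    # Heavy imports stay inside the guard so the helper above is stdlib-only.
    # Requires: pip install "transformers>=4.43.0" accelerate
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "THUDM/LongWriter-glm4-9b"  # or "THUDM/LongWriter-llama3.1-8b"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # needs the accelerate package installed
    )
    prompt = build_prompt("Write a travel guide to the cities of Japan.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32768, do_sample=True)
    # Strip the prompt tokens before decoding so only the generation prints.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```

A GPU with substantial VRAM is needed for the 9B-parameter weights; on smaller cards, quantized loading or the vLLM path is the more practical route.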

Highlighted Details

  • Generates over 10,000 words in approximately one minute using vLLM.
  • Offers two fine-tuned models: LongWriter-glm4-9b and LongWriter-llama3.1-8b.
  • Introduces evaluation benchmarks: LongBench-Write and LongWrite-Ruler.
  • Open-sources AgentWrite, an automated ultra-long output data construction pipeline.
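The one-minute figure above relies on vLLM's accelerated inference. A minimal sketch of that path, assuming the THUDM/LongWriter-llama3.1-8b weights and a CUDA 12+ GPU; the two-tokens-per-word budget is a rough heuristic of ours, not a figure from the README.

```python
def token_budget(target_words: int, tokens_per_word: float = 2.0) -> int:
    """Rough generation budget: English averages well under two tokens
    per word, so this leaves headroom for a 10,000-word draft."""
    return int(target_words * tokens_per_word)


if __name__ == "__main__":
    # Requires: pip install vllm (CUDA 12+ recommended, per the prerequisites)
    from vllm import LLM, SamplingParams

    llm = LLM(model="THUDM/LongWriter-llama3.1-8b", trust_remote_code=True)
    params = SamplingParams(temperature=0.5, max_tokens=token_budget(10_000))
    outputs = llm.generate(
        ["Write a 10,000-word history of movable-type printing."], params
    )
    print(outputs[0].outputs[0].text)
```

vLLM batches and schedules token generation far more efficiently than a plain transformers generate loop, which is what makes 10,000-plus words in roughly a minute plausible on a single modern GPU.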

Maintenance & Community

The project is associated with THUDM (Tsinghua University) and has contributions from multiple authors. The primary development appears active, with recent updates including vLLM integration.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. The models are hosted on Hugging Face; users should check each model card for its specific license rather than assume a permissive default. Suitability for commercial use depends on the licenses of the underlying base models (GLM-4 and Llama 3.1 each carry their own terms) and on any license attached to the LongWriter fine-tunes.

Limitations & Caveats

The README does not detail specific limitations or known issues. The effectiveness of the "ultra-long" generation may vary depending on the prompt and the specific model used. The AgentWrite pipeline requires API keys, implying potential costs for data generation.

Health Check
Last commit: 1 month ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 1
Star History: 69 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO and General Partner at Paradigm).

LongLoRA by dvlab-research

Top 0.1% on sourcepulse
3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago