LongWriter by THUDM

Long-context LLM for 10,000+ word generation

created 11 months ago
1,705 stars

Top 25.5% on sourcepulse

View on GitHub
Project Summary

LongWriter is an open-source project that enables large language models (LLMs) to generate exceptionally long texts of 10,000 words or more. It targets researchers and developers working with long-context LLMs, providing fine-tuned models, evaluation benchmarks, and tooling for extended generation while significantly reducing generation time for lengthy content.

How It Works

LongWriter fine-tunes existing long-context LLMs, such as GLM-4-9B and Llama-3.1-8B, on a purpose-built ultra-long-output dataset. The core contribution is maintaining coherence and quality over extended outputs, a common limitation of LLM generation. This is achieved primarily through specialized training data, allowing the models to produce much longer outputs than standard fine-tuning typically yields.

Quick Start & Requirements

  • Install: pip install "transformers>=4.43.0" (the quotes stop the shell from treating >= as a redirect)
  • Prerequisites: Python 3.x, PyTorch, Hugging Face transformers library. GPU with sufficient VRAM is highly recommended for efficient inference. CUDA 12+ is beneficial for vLLM integration.
  • Usage: Load models via Hugging Face transformers or vllm for accelerated inference.
  • Resources: Training requires significant computational resources. Inference can be resource-intensive depending on the model size and generation length.
  • Links: HF Repo, Paper, HF Space
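The loading step above can be sketched as follows. This is a minimal sketch, not the project's official example: the model ID, the generation parameters, and the word-count instruction in the prompt are assumptions (verify the exact IDs on the Hugging Face Hub), and long-output models are commonly steered by stating the target length in the prompt.

```python
def build_prompt(instruction: str) -> str:
    """Append an explicit length target; long-output models are usually
    steered by stating the desired word count in the prompt."""
    return f"{instruction} The piece should be at least 10,000 words long."


if __name__ == "__main__":
    # Heavy imports stay inside the guard so the helper above is stdlib-only.
    # Requires: pip install "transformers>=4.43.0" accelerate
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "THUDM/LongWriter-glm4-9b"  # or "THUDM/LongWriter-llama3.1-8b"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # needs the accelerate package installed
    )
    prompt = build_prompt("Write a travel guide to the cities of Japan.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32768, do_sample=True)
    # Strip the prompt tokens before decoding so only the generation prints.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```

A GPU with substantial VRAM is needed for the 9B-parameter weights; on smaller cards, quantized loading or the vLLM path is the more practical route.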

Highlighted Details

  • Generates over 10,000 words in approximately one minute using vLLM.
  • Offers two fine-tuned models: LongWriter-glm4-9b and LongWriter-llama3.1-8b.
  • Introduces evaluation benchmarks: LongBench-Write and LongWrite-Ruler.
  • Open-sources AgentWrite, an automated ultra-long output data construction pipeline.
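The one-minute figure above relies on vLLM's accelerated inference. A minimal sketch of that path, assuming the THUDM/LongWriter-llama3.1-8b weights and a CUDA 12+ GPU; the two-tokens-per-word budget is a rough heuristic of ours, not a figure from the README.

```python
def token_budget(target_words: int, tokens_per_word: float = 2.0) -> int:
    """Rough generation budget: English averages well under two tokens
    per word, so this leaves headroom for a 10,000-word draft."""
    return int(target_words * tokens_per_word)


if __name__ == "__main__":
    # Requires: pip install vllm (CUDA 12+ recommended, per the prerequisites)
    from vllm import LLM, SamplingParams

    llm = LLM(model="THUDM/LongWriter-llama3.1-8b", trust_remote_code=True)
    params = SamplingParams(temperature=0.5, max_tokens=token_budget(10_000))
    outputs = llm.generate(
        ["Write a 10,000-word history of movable-type printing."], params
    )
    print(outputs[0].outputs[0].text)
```

vLLM batches and schedules token generation far more efficiently than a plain transformers generate loop, which is what makes 10,000-plus words in roughly a minute plausible on a single modern GPU.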

Maintenance & Community

The project is associated with THUDM (Tsinghua University) and has contributions from multiple authors. The primary development appears active, with recent updates including vLLM integration.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. The models are hosted on Hugging Face; users should check each model card for its specific license rather than assume a permissive default. Suitability for commercial use depends on the licenses of the underlying base models (GLM-4 and Llama 3.1 each carry their own terms) and on any license attached to the LongWriter fine-tunes.

Limitations & Caveats

The README does not detail specific limitations or known issues. The effectiveness of the "ultra-long" generation may vary depending on the prompt and the specific model used. The AgentWrite pipeline requires API keys, implying potential costs for data generation.

Health Check
Last commit: 1 month ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 1
Star History: 69 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO and General Partner at Paradigm).

LongLoRA by dvlab-research

Top 0.1% on sourcepulse
3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago