LongChat by DachengLi1

Long-context LLM chatbot training and evaluation framework

created 2 years ago
524 stars

Top 61.1% on sourcepulse

View on GitHub
Project Summary

LongChat provides an open-source framework for training and evaluating long-context Large Language Models (LLMs) for chatbot applications. It addresses the challenge of extending LLM context windows, enabling chatbots to process and generate responses based on significantly larger amounts of text. The project is suitable for researchers and developers working on advanced NLP tasks requiring extended context understanding.

How It Works

LongChat extends the context length of existing LLMs, notably Llama 2, to 32K tokens. Training fine-tunes with scripts such as train_condense_16K.py, which condenses rotary position embeddings so that long position ids map back into the range the base model was pretrained on, and uses FlashAttention for efficient processing of long sequences. This approach aims to maintain performance and coherence over extended contexts, a common challenge in LLM development.
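The condensing trick can be sketched in a few lines. Below is a minimal illustration of condensed rotary position embeddings; the function name, condense ratio, and dimensions are illustrative assumptions, not code from the repo.

```python
import torch

def condensed_rope_angles(dim: int, max_positions: int,
                          condense_ratio: float = 8.0, base: float = 10000.0):
    """Rotary embedding angles with position ids compressed by a ratio.

    Dividing each position index by condense_ratio squeezes, e.g., 16K
    positions into the 2K range the base model was pretrained on, so the
    model sees familiar rotation angles even at long positions.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(max_positions).float() / condense_ratio  # condensed ids
    angles = torch.outer(positions, inv_freq)  # shape: (max_positions, dim // 2)
    return torch.cos(angles), torch.sin(angles)

cos, sin = condensed_rope_angles(dim=128, max_positions=16384)
```

Fine-tuning on long conversations then teaches the model to make use of the compressed positions.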

Quick Start & Requirements

  • Installation: pip install longchat, or pip install -e . after cloning the repository.
  • Prerequisites: Python 3.10 (a Conda environment is recommended). FlashAttention is recommended for very long sequence lengths; the example training recipe assumes 8x A100 GPUs.
  • Resources: Training requires significant GPU resources. Evaluation can be performed on released models like lmsys/longchat-13b-16k (a loading sketch follows this list).
  • Links: HuggingFace Models, Blog Post
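For evaluation, a released checkpoint loads through the standard Hugging Face transformers API. A minimal sketch, assuming transformers is installed (plus accelerate for device_map="auto") and a GPU with enough memory for the 13B model; the prompt is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/longchat-13b-16k"  # released checkpoint named in this summary
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the following document:\n..."  # placeholder long-context prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```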

Highlighted Details

  • Supports up to 32K context lengths with LongChat v1.5 based on Llama 2.
  • Includes evaluation tools (longeval) for tasks like topic and line recall.
  • Provides scripts for fine-tuning and for generating custom test cases (a test-case sketch follows this list).
  • Offers pre-trained models like LongChat-13b-16k and LongChat-7b-16k on HuggingFace.
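To give a flavor of the recall-style evaluations, here is a minimal sketch of a line-recall test case in the spirit of longeval; the prompt format and helper name are assumptions, not the repo's actual generator.

```python
import random

def make_line_recall_case(num_lines: int, seed: int = 0):
    """Bury a numbered value among distractor lines and ask for it back."""
    rng = random.Random(seed)
    values = [rng.randint(10000, 99999) for _ in range(num_lines)]
    target = rng.randrange(num_lines)  # which line the model must recall
    lines = [f"line {i + 1}: REGISTER_CONTENT is <{v}>" for i, v in enumerate(values)]
    prompt = (
        "Below is a list of lines. Memorize them; I will ask about one.\n"
        + "\n".join(lines)
        + f"\nWhat is the REGISTER_CONTENT in line {target + 1}?"
    )
    return prompt, values[target]

# Longer lists stress longer contexts; returning the expected value enables exact scoring.
prompt, expected = make_line_recall_case(num_lines=200)
```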

Maintenance & Community

The project received updates through LongChat v1.5, though the last commit was about a year ago (see Health Check below). Community channels (e.g., Discord/Slack) are not mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. However, it is based on Llama 2, which has its own usage policies. Compatibility for commercial use or closed-source linking would depend on the underlying Llama 2 license and any specific terms set by the LongChat project.

Limitations & Caveats

The provided training script example assumes specific hardware (8x A100 GPUs) and uses dummy data, so it requires adaptation for real-world use. The "topics" evaluation task's output requires manual inspection or automated parsing (e.g., via GPT-3.5-turbo), which may introduce variability; a simple string-match alternative is sketched below.
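Where exact-match scoring suffices, a plain string check avoids the GPT-3.5-turbo dependency. A minimal sketch; the normalization rule is an assumption, not the repo's evaluation logic:

```python
def topic_recalled(model_output: str, expected_topic: str) -> bool:
    """True if the expected topic appears (case-insensitively) in the output."""
    return expected_topic.strip().lower() in model_output.lower()

assert topic_recalled("The first topic was Climate Change.", "climate change")
```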

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

Context window extension method for LLMs (research paper, models)
2k stars · 1.0%
created 2 years ago · updated 1 year ago
Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, general partner at Paradigm).

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs
3k stars · 0.1%
created 1 year ago · updated 11 months ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model
9k stars · 0.3%
created 1 year ago · updated 1 year ago