LongChat by DachengLi1

Long-context LLM chatbot training and evaluation framework

created 2 years ago
524 stars

Top 61.1% on sourcepulse

View on GitHub
Project Summary

LongChat provides an open-source framework for training and evaluating long-context Large Language Models (LLMs) for chatbot applications. It addresses the challenge of extending LLM context windows, enabling chatbots to process and generate responses based on significantly larger amounts of text. The project is suitable for researchers and developers working on advanced NLP tasks requiring extended context understanding.

How It Works

LongChat extends the context length of existing LLMs, notably Llama 2, to 32K tokens. Training fine-tunes with scripts such as train_condense_16K.py, which condenses rotary position embeddings so that long position ids map back into the range the base model was pretrained on, and uses FlashAttention for efficient processing of long sequences. This approach aims to maintain performance and coherence over extended contexts, a common challenge in LLM development.
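The condensing trick can be sketched in a few lines. Below is a minimal illustration of condensed rotary position embeddings; the function name, condense ratio, and dimensions are illustrative assumptions, not code from the repo.

```python
import torch

def condensed_rope_angles(dim: int, max_positions: int,
                          condense_ratio: float = 8.0, base: float = 10000.0):
    """Rotary embedding angles with position ids compressed by a ratio.

    Dividing each position index by condense_ratio squeezes, e.g., 16K
    positions into the 2K range the base model was pretrained on, so the
    model sees familiar rotation angles even at long positions.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(max_positions).float() / condense_ratio  # condensed ids
    angles = torch.outer(positions, inv_freq)  # shape: (max_positions, dim // 2)
    return torch.cos(angles), torch.sin(angles)

cos, sin = condensed_rope_angles(dim=128, max_positions=16384)
```

Fine-tuning on long conversations then teaches the model to make use of the compressed positions.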

Quick Start & Requirements

  • Installation: pip install longchat, or pip install -e . after cloning the repository.
  • Prerequisites: Python 3.10 (a Conda environment is recommended). FlashAttention is recommended for very long sequence lengths; the example training recipe assumes 8x A100 GPUs.
  • Resources: Training requires significant GPU resources. Evaluation can be performed on released models like lmsys/longchat-13b-16k (a loading sketch follows this list).
  • Links: HuggingFace Models, Blog Post
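For evaluation, a released checkpoint loads through the standard Hugging Face transformers API. A minimal sketch, assuming transformers is installed (plus accelerate for device_map="auto") and a GPU with enough memory for the 13B model; the prompt is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/longchat-13b-16k"  # released checkpoint named in this summary
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the following document:\n..."  # placeholder long-context prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```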

Highlighted Details

  • Supports up to 32K context lengths with LongChat v1.5 based on Llama 2.
  • Includes evaluation tools (longeval) for tasks like topic and line recall.
  • Provides scripts for fine-tuning and for generating custom test cases (a test-case sketch follows this list).
  • Offers pre-trained models like LongChat-13b-16k and LongChat-7b-16k on HuggingFace.
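To give a flavor of the recall-style evaluations, here is a minimal sketch of a line-recall test case in the spirit of longeval; the prompt format and helper name are assumptions, not the repo's actual generator.

```python
import random

def make_line_recall_case(num_lines: int, seed: int = 0):
    """Bury a numbered value among distractor lines and ask for it back."""
    rng = random.Random(seed)
    values = [rng.randint(10000, 99999) for _ in range(num_lines)]
    target = rng.randrange(num_lines)  # which line the model must recall
    lines = [f"line {i + 1}: REGISTER_CONTENT is <{v}>" for i, v in enumerate(values)]
    prompt = (
        "Below is a list of lines. Memorize them; I will ask about one.\n"
        + "\n".join(lines)
        + f"\nWhat is the REGISTER_CONTENT in line {target + 1}?"
    )
    return prompt, values[target]

# Longer lists stress longer contexts; returning the expected value enables exact scoring.
prompt, expected = make_line_recall_case(num_lines=200)
```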

Maintenance & Community

The project received updates through LongChat v1.5, though the last commit was about a year ago (see Health Check below). Community channels (e.g., Discord/Slack) are not mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. However, it is based on Llama 2, which has its own usage policies. Compatibility for commercial use or closed-source linking would depend on the underlying Llama 2 license and any specific terms set by the LongChat project.

Limitations & Caveats

The provided training script example assumes specific hardware (8x A100 GPUs) and uses dummy data, so it requires adaptation for real-world use. The "topics" evaluation task's output requires manual inspection or automated parsing (e.g., via GPT-3.5-turbo), which may introduce variability; a simple string-match alternative is sketched below.
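Where exact-match scoring suffices, a plain string check avoids the GPT-3.5-turbo dependency. A minimal sketch; the normalization rule is an assumption, not the repo's evaluation logic:

```python
def topic_recalled(model_output: str, expected_topic: str) -> bool:
    """True if the expected topic appears (case-insensitively) in the output."""
    return expected_topic.strip().lower() in model_output.lower()

assert topic_recalled("The first topic was Climate Change.", "climate change")
```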

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

Context window extension method for LLMs (research paper, models)
2k stars · 1.0%
created 2 years ago · updated 1 year ago
Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, general partner at Paradigm).

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs
3k stars · 0.1%
created 1 year ago · updated 11 months ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model
9k stars · 0.3%
created 1 year ago · updated 1 year ago