XGen by Salesforce

LLM research release with 8K sequence length

created 2 years ago
720 stars

Top 48.8% on sourcepulse

Project Summary

XGen is a family of 7-billion-parameter Large Language Models (LLMs) developed by Salesforce AI Research, designed to handle long input sequences of up to 8,000 tokens. This research release targets developers and researchers working with extended contexts, offering improved performance on tasks that require comprehension of lengthy documents or conversations.

How It Works

XGen models are trained with an 8K input sequence length, a significant increase over many contemporary LLMs. This extended context window is achieved through architectural choices and training methodologies detailed in the associated research paper, enabling the model to maintain coherence and understanding over much longer text spans. The models use the OpenAI Tiktoken package for tokenization.

Quick Start & Requirements

  • Install: pip install tiktoken
  • Dependencies: PyTorch, Transformers library.
  • Usage: Models are available on HuggingFace Hub (e.g., Salesforce/xgen-7b-8k-base). The provided Python snippet demonstrates loading and generating text using the transformers library.
  • Resources: Requires sufficient GPU memory for a 7B-parameter model (roughly 14 GB of weights in bfloat16), with torch_dtype=torch.bfloat16 recommended for efficiency.
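The loading-and-generation flow described in the bullets above can be sketched as follows (the `generate` helper name and the example prompt are illustrative; `trust_remote_code=True` is needed because XGen ships its tiktoken-based tokenizer as custom code on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str,
             model_name: str = "Salesforce/xgen-7b-8k-base",
             max_new_tokens: int = 64) -> str:
    """Load an XGen checkpoint and return a completion for the prompt."""
    # trust_remote_code=True is required: the XGen tokenizer is
    # tiktoken-based and distributed as remote code on the Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    # bfloat16 roughly halves memory use versus float32 for the 7B weights.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Downloads ~14 GB of weights on first use:
# print(generate("Salesforce AI Research develops"))
```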

Highlighted Details

  • Supports up to 8K sequence length for extended context understanding.
  • Available in base and instruction-tuned variants.
  • Utilizes OpenAI's Tiktoken for tokenization.
  • Research release tied to academic publication.

Maintenance & Community

This is a research release by Salesforce AI Research. Further community engagement details are not provided in the README.

Licensing & Compatibility

The models are released for research purposes only. Specific licensing terms beyond this research focus are not detailed in the README.

Limitations & Caveats

This release is for research purposes only and has not been evaluated for all downstream applications. Users are strongly advised to assess and address potential concerns regarding accuracy, safety, and fairness before deployment, especially in high-risk scenarios.

Health Check

Last commit: 6 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History: 3 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

Top 0.1% · 3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago, updated 11 months ago