XGen by Salesforce

LLM research release with 8K sequence length

created 2 years ago
720 stars

Top 48.8% on sourcepulse

Project Summary

XGen is a family of 7-billion-parameter Large Language Models (LLMs) developed by Salesforce AI Research, designed to handle long input sequences of up to 8,000 tokens. This research release targets developers and researchers working with extended contexts, offering improved performance on tasks that require comprehension of lengthy documents or conversations.

How It Works

XGen models are trained with an 8K input sequence length, a significant increase over many contemporary LLMs. This extended context window is achieved through architectural choices and training methodologies detailed in the associated research paper, enabling the model to maintain coherence and understanding over much longer text spans. The models use the OpenAI Tiktoken package for tokenization.

Quick Start & Requirements

  • Install: pip install tiktoken
  • Dependencies: PyTorch, Transformers library.
  • Usage: Models are available on HuggingFace Hub (e.g., Salesforce/xgen-7b-8k-base). The provided Python snippet demonstrates loading and generating text using the transformers library.
  • Resources: Requires sufficient GPU memory for a 7B-parameter model (roughly 14 GB of weights in bfloat16), with torch_dtype=torch.bfloat16 recommended for efficiency.
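The loading-and-generation flow described in the bullets above can be sketched as follows (the `generate` helper name and the example prompt are illustrative; `trust_remote_code=True` is needed because XGen ships its tiktoken-based tokenizer as custom code on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str,
             model_name: str = "Salesforce/xgen-7b-8k-base",
             max_new_tokens: int = 64) -> str:
    """Load an XGen checkpoint and return a completion for the prompt."""
    # trust_remote_code=True is required: the XGen tokenizer is
    # tiktoken-based and distributed as remote code on the Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    # bfloat16 roughly halves memory use versus float32 for the 7B weights.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Downloads ~14 GB of weights on first use:
# print(generate("Salesforce AI Research develops"))
```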

Highlighted Details

  • Supports up to 8K sequence length for extended context understanding.
  • Available in base and instruction-tuned variants.
  • Utilizes OpenAI's Tiktoken for tokenization.
  • Research release tied to academic publication.

Maintenance & Community

This is a research release by Salesforce AI Research. Further community engagement details are not provided in the README.

Licensing & Compatibility

The models are released for research purposes only. Specific licensing terms beyond this research focus are not detailed in the README.

Limitations & Caveats

This release is for research purposes only and has not been evaluated for all downstream applications. Users are strongly advised to assess and address potential concerns regarding accuracy, safety, and fairness before deployment, especially in high-risk scenarios.

Health Check

Last commit: 6 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History: 3 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

Top 0.1% · 3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago, updated 11 months ago