XVERSE-13B by xverse-ai

Multilingual LLM for chat, knowledge QA, and code generation

created 2 years ago
645 stars

Top 52.6% on sourcepulse

Project Summary

XVERSE-13B is a multilingual large language model developed by XVERSE Technology Inc., designed for tasks that demand long-context understanding and generation. It targets researchers and developers who need a powerful, open-source LLM with strong multilingual capabilities and a long context window, offering clear advantages for complex queries and extended dialogues.

How It Works

XVERSE-13B uses a standard decoder-only Transformer architecture. Its main distinction is an 8K context length, the longest among models of its size at release, enabling longer multi-turn conversations and more detailed analysis. The model is trained on 3.2 trillion tokens spanning over 40 languages, with an emphasis on strong Chinese and English performance. A custom BPE tokenizer with a 100,534-token vocabulary handles multilingual text efficiently.
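
As a quick sanity check on those numbers, here is a minimal sketch that inspects the published checkpoint's tokenizer and context window. It assumes the Hugging Face repo id xverse/XVERSE-13B; the config key name follows LLaMA-style configs and is an assumption for this repo.

```python
from transformers import AutoConfig, AutoTokenizer

REPO_ID = "xverse/XVERSE-13B"  # assumed Hugging Face repo id

# The checkpoint ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
config = AutoConfig.from_pretrained(REPO_ID, trust_remote_code=True)

print("vocab size:", len(tokenizer))  # expected: 100,534 BPE tokens
# Context window; key name per LLaMA-style configs, may differ for this model.
print("max positions:", getattr(config, "max_position_embeddings", None))  # expected: 8192
```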

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Dependencies: Transformers library, PyTorch.
  • Usage: Load via the Hugging Face transformers library; the repository provides example code for loading and inference (see the sketch after this list).
  • Demo: A chat_demo.py script launches a local web-based chat demo.
  • Links: Hugging Face, ModelScope, OpenXLab
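
For reference, a minimal loading-and-generation sketch along the lines of the repository's example. The repo id xverse/XVERSE-13B and the generation settings are assumptions; a GPU with sufficient VRAM (or CPU offload via device_map) is needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "xverse/XVERSE-13B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory
    device_map="auto",           # spread layers across available devices
).eval()

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```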

Highlighted Details

  • Supports INT8 and INT4 quantization, significantly reducing VRAM requirements (INT4: 10.9 GB VRAM at 55.0 MMLU accuracy); see the sketch after this list.
  • Offers a 256K context window version (XVERSE-13B-256K) for extremely long sequence tasks.
  • Achieves competitive benchmark scores, outperforming models like Llama2-13B in Chinese benchmarks (e.g., C-Eval 63.5 vs 35.6).
  • Supports full fine-tuning using frameworks like LLaMA Efficient Tuning with DeepSpeed.
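
As one illustration of the quantization point above, here is a sketch of 8-bit loading through bitsandbytes and transformers. This route is an assumption for illustration only; the repository itself publishes pre-quantized GPTQ and GGUF weights, which are the officially supported options.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

REPO_ID = "xverse/XVERSE-13B"  # assumed Hugging Face repo id

# 8-bit weights via bitsandbytes; roughly halves VRAM vs fp16/bf16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map="auto",
)
print(f"{model.get_memory_footprint() / 2**30:.1f} GiB")  # rough footprint after quantization
```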

Maintenance & Community

  • Development through late 2023 added GGUF/GPTQ quantization support and the 256K-context model; the last commit is about a year old.
  • Community support via WeChat (Chinese).

Licensing & Compatibility

  • Source code licensed under Apache-2.0.
  • Model weights require adherence to a specific Model License Agreement.
  • Weights are fully open for academic research; commercial use is free of charge after completing a commercial license application.

Limitations & Caveats

Like all LLMs, XVERSE-13B may produce inaccurate, biased, or offensive content. Developers must conduct safety testing and tuning for specific applications. The model's knowledge cutoff is July 2023. The repository warns against using the model for harmful purposes and disclaims liability for misuse.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 6 more.

codellama by meta-llama
Inference code for CodeLlama models
16k stars · top 0.1% · created 1 year ago · updated 11 months ago