XVERSE-13B by xverse-ai

Multilingual LLM for chat, knowledge QA, and code generation

created 2 years ago
645 stars

Top 52.6% on sourcepulse

Project Summary

XVERSE-13B is a multilingual large language model developed by XVERSE Technology Inc., designed for tasks that demand long-context understanding and generation. It targets researchers and developers who need a powerful, open-source LLM with strong multilingual capabilities and a long context window, offering clear advantages for complex queries and extended dialogues.

How It Works

XVERSE-13B uses a standard decoder-only Transformer architecture. Its main distinction is an 8K context length, the longest among models of its size at release, enabling longer multi-turn conversations and more detailed analysis. The model is trained on 3.2 trillion tokens spanning over 40 languages, with an emphasis on strong Chinese and English performance. A custom BPE tokenizer with a 100,534-token vocabulary handles multilingual text efficiently.
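
As a quick sanity check on those numbers, here is a minimal sketch that inspects the published checkpoint's tokenizer and context window. It assumes the Hugging Face repo id xverse/XVERSE-13B; the config key name follows LLaMA-style configs and is an assumption for this repo.

```python
from transformers import AutoConfig, AutoTokenizer

REPO_ID = "xverse/XVERSE-13B"  # assumed Hugging Face repo id

# The checkpoint ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
config = AutoConfig.from_pretrained(REPO_ID, trust_remote_code=True)

print("vocab size:", len(tokenizer))  # expected: 100,534 BPE tokens
# Context window; key name per LLaMA-style configs, may differ for this model.
print("max positions:", getattr(config, "max_position_embeddings", None))  # expected: 8192
```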

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Dependencies: Transformers library, PyTorch.
  • Usage: Load via the Hugging Face transformers library; the repository provides example code for loading and inference (see the sketch after this list).
  • Demo: A chat_demo.py script launches a local web-based chat demo.
  • Links: Hugging Face, ModelScope, OpenXLab
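
For reference, a minimal loading-and-generation sketch along the lines of the repository's example. The repo id xverse/XVERSE-13B and the generation settings are assumptions; a GPU with sufficient VRAM (or CPU offload via device_map) is needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "xverse/XVERSE-13B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory
    device_map="auto",           # spread layers across available devices
).eval()

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```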

Highlighted Details

  • Supports INT8 and INT4 quantization, significantly reducing VRAM requirements (INT4: 10.9 GB VRAM at 55.0 MMLU accuracy); see the sketch after this list.
  • Offers a 256K context window version (XVERSE-13B-256K) for extremely long sequence tasks.
  • Achieves competitive benchmark scores, outperforming models like Llama2-13B in Chinese benchmarks (e.g., C-Eval 63.5 vs 35.6).
  • Supports full fine-tuning using frameworks like LLaMA Efficient Tuning with DeepSpeed.
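
As one illustration of the quantization point above, here is a sketch of 8-bit loading through bitsandbytes and transformers. This route is an assumption for illustration only; the repository itself publishes pre-quantized GPTQ and GGUF weights, which are the officially supported options.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

REPO_ID = "xverse/XVERSE-13B"  # assumed Hugging Face repo id

# 8-bit weights via bitsandbytes; roughly halves VRAM vs fp16/bf16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map="auto",
)
print(f"{model.get_memory_footprint() / 2**30:.1f} GiB")  # rough footprint after quantization
```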

Maintenance & Community

  • Development through late 2023 added GGUF/GPTQ quantization support and the 256K-context model; the last commit is about a year old.
  • Community support via WeChat (Chinese).

Licensing & Compatibility

  • Source code licensed under Apache-2.0.
  • Model weights require adherence to a specific Model License Agreement.
  • Weights are fully open for academic research; commercial use is free of charge after completing a commercial license application.

Limitations & Caveats

Like all LLMs, XVERSE-13B may produce inaccurate, biased, or offensive content. Developers must conduct safety testing and tuning for specific applications. The model's knowledge cutoff is July 2023. The repository warns against using the model for harmful purposes and disclaims liability for misuse.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 6 more.

codellama by meta-llama
Inference code for CodeLlama models
16k stars · top 0.1% · created 1 year ago · updated 11 months ago