XVERSE-13B by xverse-ai

Multilingual LLM for chat, knowledge QA, and code generation

Created 2 years ago
645 stars

Top 51.6% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

XVERSE-13B is a multilingual large language model developed by XVERSE Technology Inc., designed for tasks requiring extensive context understanding and generation. It targets researchers and developers needing a powerful, open-source LLM with strong multilingual capabilities and a long context window, offering significant advantages in handling complex queries and extended dialogues.

How It Works

XVERSE-13B uses a standard decoder-only Transformer architecture. Its key differentiator is an 8K context length, which at release was the longest among open-source models of comparable size, enabling more comprehensive multi-turn conversations and long-document analysis. The model is trained on 3.2 trillion tokens spanning over 40 languages, with a focus on strong Chinese and English performance. A custom BPE tokenizer with a 100,534-token vocabulary handles multilingual text efficiently.
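The custom tokenizer is the easiest part of this stack to inspect directly. A minimal sketch, assuming the public xverse/XVERSE-13B checkpoint on Hugging Face (the model ID and the quoted vocab size come from the model card and should be verified locally):

```python
# Minimal sketch: inspecting the XVERSE-13B BPE tokenizer through
# Hugging Face transformers. Model ID and vocab size are taken from
# the public model card, not verified here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "xverse/XVERSE-13B", trust_remote_code=True
)
print(tokenizer.vocab_size)  # model card reports 100,534

# One shared vocabulary covers 40+ languages, so mixed-language input
# needs no language tags or script-specific preprocessing.
for text in ["Hello, world!", "你好，世界！"]:
    ids = tokenizer.encode(text)
    print(f"{text!r} -> {len(ids)} tokens")
```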

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Dependencies: Transformers library, PyTorch.
  • Usage: Load via the Hugging Face transformers library; the repo provides example code for loading and inference (see the sketch after this list).
  • Demo: A chat_demo.py script is available for running a web server.
  • Links: Hugging Face, ModelScope, OpenXLab
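A minimal load-and-generate sketch approximating the repo's quick-start example; the model ID xverse/XVERSE-13B and the generation settings are assumptions, so check them against the repo's own snippet:

```python
# Hedged sketch of loading XVERSE-13B and generating text, modeled on
# the repo's quick-start example; exact arguments may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xverse/XVERSE-13B"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,    # the repo ships custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```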

Highlighted Details

  • Supports INT8 and INT4 quantization, significantly reducing VRAM requirements (INT4: 10.9 GB VRAM, 55.0 MMLU accuracy); see the sketch after this list.
  • Offers a 256K context window version (XVERSE-13B-256K) for extremely long sequence tasks.
  • Achieves competitive benchmark scores, outperforming models like Llama2-13B in Chinese benchmarks (e.g., C-Eval 63.5 vs 35.6).
  • Supports full fine-tuning using frameworks like LLaMA Efficient Tuning with DeepSpeed.
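On the quantization point above: the repo ships prebuilt GPTQ/GGUF weights, but a generic on-the-fly 4-bit load through bitsandbytes is one way to approximate the quoted VRAM savings. A sketch under that assumption; this is not the repo's own quantization path, and its memory/accuracy may differ from the numbers quoted above:

```python
# Hedged sketch: on-the-fly 4-bit loading via bitsandbytes. This is a
# generic transformers pattern, not XVERSE's shipped GPTQ/GGUF route.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "xverse/XVERSE-13B",          # assumed model ID
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```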

Maintenance & Community

  • Active development with recent updates including GGUF/GPTQ quantization and the 256K context model.
  • Community support via WeChat (Chinese).

Licensing & Compatibility

  • Source code licensed under Apache-2.0.
  • Model weights require adherence to a specific Model License Agreement.
  • Weights are fully open for academic research; free commercial use is permitted after applying for and obtaining a commercial license.

Limitations & Caveats

Like all LLMs, XVERSE-13B may produce inaccurate, biased, or offensive content. Developers must conduct safety testing and tuning for specific applications. The model's knowledge cutoff is July 2023. The repository warns against using the model for harmful purposes and disclaims liability for misuse.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Phil Wang (Prolific Research Paper Implementer), Lianmin Zheng (Coauthor of SGLang, vLLM), and 6 more.

Kimi-K2 by MoonshotAI
State-of-the-art MoE language model
8k stars · 1.7% · Created 2 months ago · Updated 1 week ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

MOSS by OpenMOSS
Open-source tool-augmented conversational language model
12k stars · 0.0% · Created 2 years ago · Updated 1 year ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

ChatGLM-6B by zai-org
Bilingual dialogue language model for research
41k stars · 0.0% · Created 2 years ago · Updated 1 year ago