TeleChat2 by Tele-AI

Chinese large language model series trained on domestic hardware

Created 1 year ago
254 stars

Top 99.1% on SourcePulse

View on GitHub
Project Summary

TeleChat2 is a family of large language models developed by China Telecom Artificial Intelligence Research Institute, offering a range of parameter sizes from 3B to 115B. These models are notable for being trained entirely on domestic computing power and are designed for improved performance in general Q&A, coding, and mathematical reasoning, with specific versions supporting Function Call capabilities.

How It Works

TeleChat2 models utilize a standard Decoder-only architecture, incorporating Rotary Embedding for positional encoding, SwiGLU activation functions, and RMSNorm for pre-normalization. They employ Grouped Query Attention (GQA) to optimize parameter and computation efficiency. The training methodology involves curriculum learning, starting with high-quality educational and multilingual data, then incorporating complex reasoning and coding data, and finally fine-tuning with high-quality data for performance enhancement. For Mixture-of-Experts (MoE) models, specific optimizations are applied to communication efficiency and load balancing within the expert parallel domain.
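
For intuition, here is a minimal PyTorch sketch of one such decoder block, combining RMSNorm pre-normalization, rotary position embeddings, grouped-query attention, and a SwiGLU feed-forward layer. All dimensions and hyperparameters are illustrative assumptions, not TeleChat2's actual configuration.

```python
# Illustrative sketch only: one decoder block with RMSNorm pre-norm,
# rotary position embeddings, grouped-query attention (GQA), and a
# SwiGLU MLP. Sizes are made-up examples, not TeleChat2's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by root-mean-square instead of mean/variance.
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def rotary(x, base: float = 10000.0):
    # Apply rotary position embeddings to (batch, heads, seq, head_dim).
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device, dtype=torch.float32) / d))
    angles = torch.arange(t, device=x.device, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class GQABlock(nn.Module):
    def __init__(self, dim=512, n_heads=8, n_kv_heads=2, hidden=1376):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv, self.hd = n_heads, n_kv_heads, dim // n_heads
        self.attn_norm = RMSNorm(dim)
        self.q = nn.Linear(dim, dim, bias=False)
        # K/V use fewer heads than Q: the defining trait of GQA.
        self.k = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.v = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.o = nn.Linear(dim, dim, bias=False)
        self.mlp_norm = RMSNorm(dim)
        # SwiGLU: gate and up projections, SiLU applied to the gate.
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        h = self.attn_norm(x)  # pre-normalization
        q = self.q(h).view(b, t, self.n_heads, self.hd).transpose(1, 2)
        k = self.k(h).view(b, t, self.n_kv, self.hd).transpose(1, 2)
        v = self.v(h).view(b, t, self.n_kv, self.hd).transpose(1, 2)
        q, k = rotary(q), rotary(k)
        # Each group of query heads shares one K/V head.
        rep = self.n_heads // self.n_kv
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        a = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.o(a.transpose(1, 2).reshape(b, t, -1))
        h = self.mlp_norm(x)
        return x + self.down(F.silu(self.gate(h)) * self.up(h))
```

Running `GQABlock()(torch.randn(1, 16, 512))` returns a tensor of the same shape; a full model stacks many such blocks between token embeddings and an output head.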

Quick Start & Requirements

  • Installation: Models are available via Hugging Face, ModelScope, and Gitee. Inference can be performed with Hugging Face Transformers (a minimal loading sketch follows this list).
  • Prerequisites: PyTorch, Transformers library. Specific hardware requirements depend on model size (e.g., 115B models require significant GPU resources).
  • Resources: Detailed documentation for inference, fine-tuning (DeepSpeed, LLaMA-Factory), and quantization (AWQ, GPTQ) is available.
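
As referenced in the Installation bullet, below is a minimal loading sketch with Hugging Face Transformers. The repo id "Tele-AI/TeleChat2-7B" and the need for `trust_remote_code` are assumptions; check the model card for the exact identifiers and recommended settings.

```python
# Hedged sketch of loading a TeleChat2 checkpoint with Transformers.
# The repo id below is an assumed example, not a confirmed identifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tele-AI/TeleChat2-7B"  # assumed repo id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halve memory; bfloat16 on newer GPUs
    device_map="auto",          # spread layers across available GPUs
    trust_remote_code=True,     # assumed: TeleChat ships custom modeling code
)

inputs = tokenizer("What is grouped query attention?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```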

Highlighted Details

  • The 115B model is claimed to be the first model at the 100-billion-parameter scale to be trained and open-sourced entirely on domestic computing power.
  • Models demonstrate competitive performance across various benchmarks like C-Eval, MMLU, GSM8K, and HumanEval compared to other leading models.
  • Function Call capabilities are optimized, with strong performance reported on relevant benchmarks (see the hedged sketch after this list).
  • Support for various integration frameworks including LLaMA-Factory, AWQ, GPTQ, Ollama, text-generation-webui, LangChain, and LlamaIndex.
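
The sketch below shows one generic way to render a tool schema into a prompt via Transformers' chat-template `tools` support. Whether TeleChat2's bundled chat template accepts a `tools` argument, and the exact schema it expects, are assumptions; the project's Function Call documentation is authoritative. The `get_weather` tool is hypothetical.

```python
# Hedged sketch of constructing a function-call prompt. Assumes the
# model's chat template supports the `tools` argument of
# apply_chat_template; verify against TeleChat2's official docs.
from transformers import AutoTokenizer

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained("Tele-AI/TeleChat2-7B", trust_remote_code=True)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # inspect how the tool schema is rendered into the prompt
```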

Maintenance & Community

The project is actively updated with new model releases, including MoE variants. Links to Hugging Face, ModelScope, and Gitee are provided for access.

Licensing & Compatibility

The models are released under the "TeleChat Model Community License Agreement." Commercial use requires submitting an application to tele_ai@chinatelecom.cn to obtain a specific license grant. The agreement also prohibits using the models for illegal activities or for internet services that have not undergone security review.

Limitations & Caveats

The license agreement explicitly disclaims responsibility for any issues arising from use of the models, including data security incidents, public-opinion risks, and misuse. Users are cautioned against deploying the models in internet services without completing the required security review and regulatory record-filing (备案).

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

  • 0.2% · 462 stars
  • MoE model for research
  • Created 4 months ago · Updated 4 weeks ago