TeleChat2 by Tele-AI

Chinese large language model series trained on domestic hardware

Created 1 year ago
254 stars

Top 99.1% on SourcePulse

View on GitHub
Project Summary

TeleChat2 is a family of large language models developed by China Telecom Artificial Intelligence Research Institute, offering a range of parameter sizes from 3B to 115B. These models are notable for being trained entirely on domestic computing power and are designed for improved performance in general Q&A, coding, and mathematical reasoning, with specific versions supporting Function Call capabilities.

How It Works

TeleChat2 models utilize a standard Decoder-only architecture, incorporating Rotary Embedding for positional encoding, SwiGLU activation functions, and RMSNorm for pre-normalization. They employ Grouped Query Attention (GQA) to optimize parameter and computation efficiency. The training methodology involves curriculum learning, starting with high-quality educational and multilingual data, then incorporating complex reasoning and coding data, and finally fine-tuning with high-quality data for performance enhancement. For Mixture-of-Experts (MoE) models, specific optimizations are applied to communication efficiency and load balancing within the expert parallel domain.
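
For intuition, here is a minimal PyTorch sketch of one such decoder block, combining RMSNorm pre-normalization, rotary position embeddings, grouped-query attention, and a SwiGLU feed-forward layer. All dimensions and hyperparameters are illustrative assumptions, not TeleChat2's actual configuration.

```python
# Illustrative sketch only: one decoder block with RMSNorm pre-norm,
# rotary position embeddings, grouped-query attention (GQA), and a
# SwiGLU MLP. Sizes are made-up examples, not TeleChat2's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by root-mean-square instead of mean/variance.
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def rotary(x, base: float = 10000.0):
    # Apply rotary position embeddings to (batch, heads, seq, head_dim).
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device, dtype=torch.float32) / d))
    angles = torch.arange(t, device=x.device, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class GQABlock(nn.Module):
    def __init__(self, dim=512, n_heads=8, n_kv_heads=2, hidden=1376):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv, self.hd = n_heads, n_kv_heads, dim // n_heads
        self.attn_norm = RMSNorm(dim)
        self.q = nn.Linear(dim, dim, bias=False)
        # K/V use fewer heads than Q: the defining trait of GQA.
        self.k = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.v = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.o = nn.Linear(dim, dim, bias=False)
        self.mlp_norm = RMSNorm(dim)
        # SwiGLU: gate and up projections, SiLU applied to the gate.
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        h = self.attn_norm(x)  # pre-normalization
        q = self.q(h).view(b, t, self.n_heads, self.hd).transpose(1, 2)
        k = self.k(h).view(b, t, self.n_kv, self.hd).transpose(1, 2)
        v = self.v(h).view(b, t, self.n_kv, self.hd).transpose(1, 2)
        q, k = rotary(q), rotary(k)
        # Each group of query heads shares one K/V head.
        rep = self.n_heads // self.n_kv
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        a = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.o(a.transpose(1, 2).reshape(b, t, -1))
        h = self.mlp_norm(x)
        return x + self.down(F.silu(self.gate(h)) * self.up(h))
```

Running `GQABlock()(torch.randn(1, 16, 512))` returns a tensor of the same shape; a full model stacks many such blocks between token embeddings and an output head.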

Quick Start & Requirements

  • Installation: Models are available via Hugging Face, ModelScope, and Gitee. Inference can be performed with Hugging Face Transformers (a minimal loading sketch follows this list).
  • Prerequisites: PyTorch, Transformers library. Specific hardware requirements depend on model size (e.g., 115B models require significant GPU resources).
  • Resources: Detailed documentation for inference, fine-tuning (DeepSpeed, LLaMA-Factory), and quantization (AWQ, GPTQ) is available.
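
As referenced in the Installation bullet, below is a minimal loading sketch with Hugging Face Transformers. The repo id "Tele-AI/TeleChat2-7B" and the need for `trust_remote_code` are assumptions; check the model card for the exact identifiers and recommended settings.

```python
# Hedged sketch of loading a TeleChat2 checkpoint with Transformers.
# The repo id below is an assumed example, not a confirmed identifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tele-AI/TeleChat2-7B"  # assumed repo id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halve memory; bfloat16 on newer GPUs
    device_map="auto",          # spread layers across available GPUs
    trust_remote_code=True,     # assumed: TeleChat ships custom modeling code
)

inputs = tokenizer("What is grouped query attention?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```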

Highlighted Details

  • The 115B model is claimed to be the first model at the 100-billion-parameter scale to be trained and open-sourced entirely on domestic computing power.
  • Models demonstrate competitive performance across various benchmarks like C-Eval, MMLU, GSM8K, and HumanEval compared to other leading models.
  • Function Call capabilities are optimized, with strong performance reported on relevant benchmarks (see the hedged sketch after this list).
  • Support for various integration frameworks including LLaMA-Factory, AWQ, GPTQ, Ollama, text-generation-webui, LangChain, and LlamaIndex.
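
The sketch below shows one generic way to render a tool schema into a prompt via Transformers' chat-template `tools` support. Whether TeleChat2's bundled chat template accepts a `tools` argument, and the exact schema it expects, are assumptions; the project's Function Call documentation is authoritative. The `get_weather` tool is hypothetical.

```python
# Hedged sketch of constructing a function-call prompt. Assumes the
# model's chat template supports the `tools` argument of
# apply_chat_template; verify against TeleChat2's official docs.
from transformers import AutoTokenizer

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained("Tele-AI/TeleChat2-7B", trust_remote_code=True)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # inspect how the tool schema is rendered into the prompt
```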

Maintenance & Community

The project is actively updated with new model releases, including MoE variants. Links to Hugging Face, ModelScope, and Gitee are provided for access.

Licensing & Compatibility

The models are released under the "TeleChat Model Community License Agreement." Commercial use requires submitting an application to tele_ai@chinatelecom.cn to obtain a specific license grant. The agreement also prohibits using the models for illegal activities or for internet services that have not undergone security review.

Limitations & Caveats

The license agreement explicitly disclaims responsibility for any issues arising from use of the models, including data security incidents, public-opinion risks, and misuse. Users are cautioned against deploying the models in internet services without completing the required security review and regulatory record-filing (备案).

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

  • 0.2% · 462 stars
  • MoE model for research
  • Created 4 months ago · Updated 4 weeks ago