Telechat by Tele-AI

Chinese LLM for dialogue, long-form generation, and code assistance

Created 2 years ago

1,854 stars

Top 23.2% on SourcePulse

Project Summary

TeleChat is a suite of large language models developed by China Telecom AI, offering 1B, 7B, and 12B parameter versions trained on extensive Chinese and English datasets. It aims to provide robust conversational AI capabilities for general Q&A, coding, and mathematical tasks, with specialized versions for long-form content generation and improved multi-turn dialogue.

How It Works

TeleChat models use a standard decoder-only architecture with rotary position embeddings (RoPE), SwiGLU activation functions, and RMSNorm for layer normalization. The 12B model decouples the word embedding layer from the output layer to improve training stability. Training combines data-ratio tuning with curriculum learning: dataset weights are adjusted dynamically based on model performance so that the model learns evenly across diverse data types.
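To make two of the components named above concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward block. It illustrates the general techniques only; the function names, the `eps` default, and the absence of bias terms are assumptions for illustration and are not taken from the TeleChat code.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a learned per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) ), without biases (assumed)."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

if __name__ == "__main__":
    x = [3.0, 4.0]
    normed = rms_norm(x, [1.0, 1.0])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(normed)
    print(swiglu_ffn(normed, identity, identity, identity))
```

Unlike standard layer normalization, RMSNorm skips mean subtraction, which is why only the mean of squares appears in `rms_norm`.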

Quick Start & Requirements

Installation: Primarily through Hugging Face transformers library.
Dependencies: Python, PyTorch, transformers, deepspeed, auto-gptq. A GPU (NVIDIA recommended) is needed for practical inference and training speeds; the summary does not state a CUDA version, but CUDA 11.x or later is a reasonable assumption.
Resources: Requires significant GPU memory, especially for the 12B model and longer sequence lengths (e.g., 4096 tokens). Fine-tuning with DeepSpeed ZeRO-3 is supported to reduce per-GPU memory use.
Links: Hugging Face, ModelScope, Tutorials
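As a rough guide to the memory requirements listed above, the following sketch estimates the memory needed for model weights alone at different precisions. The bytes-per-parameter values are standard for FP16, INT8, and INT4 storage; treating the models as exactly 7 and 12 billion parameters is an approximation, and the helper name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params_billion, precision):
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    total_bytes = num_params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9

if __name__ == "__main__":
    # Nominal parameter counts (assumed exact for the estimate).
    for size in (7, 12):
        for precision in ("fp16", "int8", "int4"):
            gb = weight_memory_gb(size, precision)
            print(f"{size}B {precision}: ~{gb:.1f} GB for weights")
```

These figures are lower bounds: activations, the KV cache (which grows with sequence length), quantization metadata, and optimizer state during fine-tuning all add to the total.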

Highlighted Details

Offers 7B and 12B models with INT8 and INT4 quantization for reduced memory footprint and faster inference.
Supports fine-tuning with DeepSpeed, including ZeRO-3 for memory optimization, and integrates FlashAttention2.
Provides an 8K-context version that uses NTK-aware extrapolation and attention scaling to extend the usable context to 96K tokens.
Demonstrates strong performance in long-form writing tasks like work summaries, plans, and proposals.
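The NTK-aware extrapolation mentioned in the list above can be sketched as a change to the RoPE frequency base. This is the commonly used formulation, in which the base is multiplied by `scale ** (d / (d - 2))` for head dimension `d`; whether TeleChat uses exactly this formula, a base of 10000, or a head dimension of 128 is an assumption here, and the accompanying attention scaling is not shown. The scale of 12 corresponds to the 8K to 96K extension.

```python
def rope_inv_freq(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def ntk_scaled_base(head_dim, scale, base=10000.0):
    """NTK-aware base adjustment: base * scale^(d / (d - 2))."""
    return base * scale ** (head_dim / (head_dim - 2))

def ntk_rope_inv_freq(head_dim, scale, base=10000.0):
    """RoPE inverse frequencies computed with the NTK-scaled base."""
    return rope_inv_freq(head_dim, ntk_scaled_base(head_dim, scale, base))

if __name__ == "__main__":
    head_dim = 128          # assumed head dimension
    scale = 96 / 8          # 8K training context extended to 96K
    original = rope_inv_freq(head_dim)
    scaled = ntk_rope_inv_freq(head_dim, scale)
    print("highest frequency:", original[0], "->", scaled[0])
    print("lowest frequency: ", original[-1], "->", scaled[-1])
```

With this choice of exponent, the highest frequency is left unchanged while the lowest frequency is divided by the full scale factor, so local position information is preserved and only the long-range components are stretched.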

Maintenance & Community

Active development with regular updates (e.g., 12B-V2 release).
Community engagement via WeChat.
Technical reports and model details are available.

Licensing & Compatibility

License: "TeleChat Model Community License Agreement".
Commercial Use: Permitted upon application and approval via tele_ai@chinatelecom.cn.
Restrictions: Prohibits use for activities that harm national security or for illegal purposes. Internet-facing services require a safety review and filing.

Limitations & Caveats

The project disclaims responsibility for any issues arising from model use, including data security, public opinion risks, or misuse. Users must adhere to the stated usage restrictions.

Explore Similar Projects

mint by dpressel

r1-ktransformers-guide by ubergarm

Building-a-Small-LLM-from-Scratch by KaihuaTang

TeleChat2 by Tele-AI

Chinese-Mixtral-8x7B by HIT-SCIR

XVERSE-13B by xverse-ai

tiny-llm-zh by wdndev

starcoder2 by bigcode-project

LLMZoo by FreedomIntelligence

ppl.nn by OpenPPL

EAGLE by SafeAILab

LongLoRA by JIA-Lab-research