GLM-4.5 by zai-org

Foundation models for intelligent agents

Created 2 months ago
2,649 stars

Top 17.8% on SourcePulse

View on GitHub
Project Summary

GLM-4.5 is an open-source series of large language models designed for intelligent agents, offering both a 355B parameter (GLM-4.5) and a more efficient 106B parameter (GLM-4.5-Air) variant. These models unify reasoning, coding, and agent capabilities, featuring a hybrid reasoning approach with distinct "thinking" and "non-thinking" modes for complex tasks and immediate responses, respectively.

How It Works

GLM-4.5 models are hybrid reasoning systems that combine a large base model with specialized reasoning and tool-use capabilities. They employ a dual-mode architecture: "thinking mode" for intricate problem-solving and tool integration, and "non-thinking mode" for faster, direct responses. This approach aims to balance computational depth with response latency, making them suitable for diverse agentic applications.

Quick Start & Requirements

  • Installation: pip install -r requirements.txt (pulls in transformers and the other dependencies)
  • Inference Frameworks: Supports transformers, vLLM, and SGLang.
  • Hardware:
    • BF16: GLM-4.5 requires 16x H100 or 8x H200 GPUs; GLM-4.5-Air requires 4x H100 or 2x H200 GPUs.
    • FP8: GLM-4.5 requires 8x H100 or 4x H200 GPUs; GLM-4.5-Air requires 2x H100 or 1x H200 GPUs.
    • Context Length: Full 128K context requires double the GPU counts listed above.
    • Memory: Server memory must exceed 1TB for normal operation.
  • Fine-tuning: Supports LoRA and SFT/RL via Llama Factory and Swift.
  • Links: Hugging Face, ModelScope, Technical Blog
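The GPU counts above line up with a quick weight-memory estimate. The sketch below counts weights only, assuming 80 GiB per H100; it ignores KV cache, activations, and framework overhead, which is also why a full 128K context doubles the listed counts.

```python
# Back-of-envelope check of the stated GPU requirements
# (weights only; real deployments need headroom for KV cache,
# activations, and framework overhead).

GiB = 1024**3

def weight_gib(params_b: float, bytes_per_param: int) -> float:
    """Approximate weight footprint in GiB for a model of
    `params_b` billion parameters at the given precision."""
    return params_b * 1e9 * bytes_per_param / GiB

# GLM-4.5 (355B) in BF16 (2 bytes/param): ~661 GiB of weights,
# versus 16x H100 = 1280 GiB of aggregate HBM.
glm45_bf16 = weight_gib(355, 2)
print(f"GLM-4.5 BF16 weights: {glm45_bf16:.0f} GiB vs 16x80 = {16*80} GiB")

# GLM-4.5-Air (106B) in FP8 (1 byte/param): ~99 GiB,
# hence 2x H100 (160 GiB) suffice.
air_fp8 = weight_gib(106, 1)
print(f"GLM-4.5-Air FP8 weights: {air_fp8:.0f} GiB vs 2x80 = {2*80} GiB")
```

The spare capacity in each configuration (roughly half the aggregate HBM) is what absorbs the KV cache at long context lengths.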

Highlighted Details

  • Achieves an average score of 63.2 across 12 industry benchmarks, ranking 3rd among all evaluated models.
  • GLM-4.5-Air scores a competitive 59.8 on the same benchmarks with markedly better efficiency.
  • Open-sourced base, hybrid reasoning, and FP8 versions.
  • Supports 128K context length.

Maintenance & Community

  • Community channels: WeChat, Discord.
  • API services available on Z.ai API Platform and Zhipu AI Open Platform.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and secondary development.

Limitations & Caveats

  • Inference requires substantial high-end GPU resources (e.g., 8x H100 for FP8 GLM-4.5).
  • FP8 inference requires hardware natively supporting FP8.
  • Known flashinfer issues may require specific environment-variable workarounds.
Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 21
  • Star History: 517 stars in the last 30 days

Explore Similar Projects

Starred by Georgi Gerganov (author of llama.cpp, whisper.cpp), Alex Yu (research scientist at OpenAI; former cofounder of Luma AI), and 13 more.

  • Qwen3 by QwenLM (0.4%, 25k stars) — large language model series by the Qwen team, Alibaba Cloud. Created 1 year ago; updated 2 weeks ago.