GLM-4.5 by zai-org

Foundation models for intelligent agents

Created 5 months ago
3,738 stars

Top 12.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

GLM-4.5 is an open-source series of large language models designed for intelligent agents, offering both a 355B parameter (GLM-4.5) and a more efficient 106B parameter (GLM-4.5-Air) variant. These models unify reasoning, coding, and agent capabilities, featuring a hybrid reasoning approach with distinct "thinking" and "non-thinking" modes for complex tasks and immediate responses, respectively.

How It Works

GLM-4.5 models are hybrid reasoning systems that combine a large base model with specialized reasoning and tool-use capabilities. They employ a dual-mode architecture: "thinking mode" for intricate problem-solving and tool integration, and "non-thinking mode" for faster, direct responses. This approach aims to balance computational depth with response latency, making them suitable for diverse agentic applications.
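The dual-mode dispatch described above can be sketched as a simple routing layer. This is an illustrative heuristic only: in real deployments the mode is an inference-time flag chosen by the caller, and the `needs_tools` parameter and keyword markers below are our assumptions, not part of the GLM-4.5 API.

```python
def select_mode(prompt: str, needs_tools: bool = False) -> str:
    """Route a request to "thinking" or "non-thinking" mode.

    Hypothetical client-side heuristic: thinking mode for multi-step
    reasoning and tool use, non-thinking mode for low-latency answers.
    """
    complex_markers = ("prove", "step by step", "debug", "plan")
    if needs_tools or any(m in prompt.lower() for m in complex_markers):
        return "thinking"      # deeper reasoning, tool integration
    return "non-thinking"      # fast, direct response

print(select_mode("What is the capital of France?"))   # non-thinking
print(select_mode("Debug this stack trace"))           # thinking
```

In practice the trade-off is the one the summary names: thinking mode buys computational depth at the cost of latency, so routing by task complexity keeps simple queries cheap.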

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Inference Frameworks: Supports transformers, vLLM, and SGLang.
  • Hardware:
    • BF16: GLM-4.5 requires 16x H100 or 8x H200 GPUs; GLM-4.5-Air requires 4x H100 or 2x H200 GPUs.
    • FP8: GLM-4.5 requires 8x H100 or 4x H200 GPUs; GLM-4.5-Air requires 2x H100 or 1x H200 GPUs.
    • Context Length: Full 128K context requires double the GPU counts listed above.
    • Memory: Server memory must exceed 1TB for normal operation.
  • Fine-tuning: Supports LoRA and SFT/RL via Llama Factory and Swift.
  • Links: Hugging Face, ModelScope, Technical Blog
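The hardware matrix above can be encoded as a small lookup helper. A minimal sketch of the stated requirements only; the table, function, and names are ours, not part of the repo, and the 128K-context rule is the doubling noted above.

```python
# (model, precision) -> minimum H100 count; H200 needs half as many GPUs.
GPU_TABLE = {
    ("GLM-4.5", "BF16"): 16,
    ("GLM-4.5", "FP8"): 8,
    ("GLM-4.5-Air", "BF16"): 4,
    ("GLM-4.5-Air", "FP8"): 2,
}

def h100_count(model: str, precision: str, full_128k_context: bool = False) -> int:
    """Minimum H100 GPUs per the project's published requirements."""
    n = GPU_TABLE[(model, precision)]
    # Serving the full 128K context doubles the listed GPU counts.
    return n * 2 if full_128k_context else n

print(h100_count("GLM-4.5-Air", "FP8"))                        # 2
print(h100_count("GLM-4.5", "BF16", full_128k_context=True))   # 32
```

Note the server-memory requirement (over 1TB) applies on top of these GPU counts regardless of precision.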

Highlighted Details

  • Achieves an average score of 63.2 across 12 industry benchmarks, ranking 3rd among all evaluated models.
  • GLM-4.5-Air scores a competitive 59.8 average with substantially lower hardware requirements.
  • Open-sourced base, hybrid reasoning, and FP8 versions.
  • Supports 128K context length.

Maintenance & Community

  • Community channels: WeChat, Discord.
  • API services available on Z.ai API Platform and Zhipu AI Open Platform.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and secondary development.

Limitations & Caveats

  • Inference requires substantial high-end GPU resources (e.g., 8x H100 for FP8 GLM-4.5, or 2x H100 for FP8 GLM-4.5-Air).
  • FP8 inference requires hardware natively supporting FP8.
  • Known FlashInfer issues may require specific environment-variable workarounds.
Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 10
  • Star History: 477 stars in the last 30 days

Explore Similar Projects

Starred by Tony Lee (Author of HELM; Research Engineer at Meta), Vincent Weisser (Cofounder of Prime Intellect), and 16 more.

  • Qwen3 by QwenLM — Top 0.4%, 26k stars. Large language model series by the Qwen team, Alibaba Cloud. Created 1 year ago; updated 2 days ago.