
    z-ai

    Browse models from z-ai

    5 models

    Tokens processed on OpenRouter

  1. Z.AI: GLM 4.6
      1.06B tokens

      Compared with GLM-4.5, this generation brings several key improvements:
      • Longer context window: expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
      • Superior coding performance: higher scores on code benchmarks and better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
      • Advanced reasoning: a clear improvement in reasoning performance, with support for tool use during inference, leading to stronger overall capability.
      • More capable agents: stronger performance in tool-using and search-based agents, and more effective integration within agent frameworks.
      • Refined writing: better alignment with human preferences in style and readability, and more natural performance in role-playing scenarios.

    by z-ai
    200K context
    $0.50/M input tokens · $1.75/M output tokens
  2. Z.AI: GLM 4.5V
    39.2M tokens

    GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the reasoning enabled boolean (see the example request after this list). Learn more in our docs.

    by z-ai
    66K context
    $0.60/M input tokens · $1.80/M output tokens
  3. Z.AI: GLM 4.5
    462M tokens

    GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.

    by z-ai
    131K context
    $0.35/M input tokens · $1.55/M output tokens
  4. Z.AI: GLM 4.5 Air
    79.6M tokens

    GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.

    by z-ai
    131K context
    $0.13/M input tokens · $0.85/M output tokens
  5. Z.AI: GLM 4 32B
    14.1M tokens

    GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models.

    by z-ai
    128K context
    $0.10/M input tokens · $0.10/M output tokens
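
Several of the GLM 4.5 and GLM 4.6 entries above mention a hybrid inference mode controlled by a "reasoning enabled" boolean. The sketch below shows one way that toggle could be sent through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug and the exact shape of the reasoning field are assumptions based on the descriptions above, so check the OpenRouter docs for the authoritative request schema.

```python
# Minimal sketch: toggling the hybrid "thinking" / "non-thinking" modes on a
# GLM 4.5-family model via OpenRouter's OpenAI-compatible chat completions API.
# The model slug and the shape of the "reasoning" field are assumptions drawn
# from the model descriptions above; consult the OpenRouter docs for the exact schema.
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumed to be set in the environment


def ask(prompt: str, thinking: bool) -> str:
    """Send one chat request, enabling or disabling the model's reasoning mode."""
    response = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "z-ai/glm-4.5",             # assumed slug for Z.AI: GLM 4.5
            "messages": [{"role": "user", "content": prompt}],
            "reasoning": {"enabled": thinking},  # the "reasoning enabled boolean"
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # "Thinking mode" for a multi-step task, "non-thinking mode" for a quick reply.
    print(ask("Plan the steps to refactor a legacy build script.", thinking=True))
    print(ask("What is the capital of France?", thinking=False))
```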