GLM-5.2 Released: New Long-Context AI Model for Agents and Coding
GLM-5.2 is a 753-billion-parameter Mixture-of-Experts long-context AI model from Z.ai (formerly Zhipu AI) that runs a usable 1-million-token context window and posts the highest open-weights score on the Artificial Analysis Intelligence Index v4.1 (51 points) as of June 17, 2026 (z.ai/blog/glm-5.2, artificialanalysis.ai). It landed on the GLM Coding Plan on June 13, 2026, with full open weights under an MIT license dropping on Hugging Face and ModelScope three days later (simonwillison.net, June 17, 2026). Within a week it had been crowned the top open-weights model on Artificial Analysis, tied Claude Opus 4.8 on Terminal-Bench 2.1 to within a few points, and pushed Z.ai’s parent stock up 48% in a single Hong Kong session (SCMP, June 15, 2026).
If you build with AI agents, code in long-running sessions, or watch the open-weights frontier, GLM-5.2 is the release you should not sleep on. Here’s what changed, what’s actually shipping, and where it lands in the 2026 leaderboards.
What Is GLM-5.2 and Who Built It?
GLM-5.2 is a long-context AI model and the third flagship in Z.ai’s GLM-5 line, following GLM-5 (Feb 12, 2026) and GLM-5.1 (April 7, 2026) (z.ai docs). Z.ai is the international brand of Zhipu AI, a Beijing-based AI lab spun out of Tsinghua University in 2019 (Wikipedia, June 2026). Zhipu listed on the Hong Kong Stock Exchange on January 8, 2026 as Knowledge Atlas Technology (HKEX: 2513), making it the first of China’s “AI Tigers” to IPO (Reuters, Jan 7, 2026).
The GLM family uses an “autoregressive blank infilling” pre-training objective first published in ACL 2022 and built on Huawei Ascend hardware under the MindSpore framework (Z.ai GLM-5 paper, arXiv 2602.15763). Z.ai has been on the U.S. Commerce Department’s Entity List since January 2025 over national security concerns, which means the entire GLM-5 lineage was trained without access to NVIDIA’s top-tier chips (CNBC, June 26, 2025).
That context matters when you evaluate GLM-5.2. It’s not just “another big Chinese model.” It’s the first frontier-tier open-weights model trained entirely outside the NVIDIA stack.
Architecture and What’s Actually New
GLM-5.2 keeps the GLM-5 Mixture-of-Experts backbone and adds two engineering changes that make a 1-million-token context affordable to run: IndexShare and a beefed-up Multi-Token Prediction (MTP) layer (Hugging Face blog, June 17, 2026).
Parameter counts vary slightly by source. Z.ai’s docs and several reports cite 744B total / 40B active parameters, the same backbone as GLM-5.1 (docs.z.ai/guides/llm/glm-5.2, Together AI model card). Simon Willison, downloading and inspecting the actual weights, called it “a 753B parameter, 1.51TB monster — with 40 active parameters” (simonwillison.net). The difference comes down to whether you count embedding parameters in the total. Either way, ~40B activate per token.
The three big upgrades:
- IndexShare: Every four sparse-attention layers now share one indexer. At a 1M-token context, this trims per-token FLOPs by 2.9×. Z.ai published the technique in arXiv 2603.12201.
- MTP improvements: The multi-token prediction layer was tuned for speculative decoding, pushing acceptance length up by 20% in the ablation table.
- DeepSeek Sparse Attention (DSA) carried over from GLM-5: Selectively attends to a small set of top-k tokens instead of all of them, which is what makes 1M-context inference economically sane (GLM-5 technical report).
Z.ai also exposed two thinking-effort levels — High (default, faster) and Max (recommended for complex coding work), which consumes 3× more quota during peak hours (z.ai blog).
The Headline Number: A Solid 1M-Token Context
GLM-5.2’s context window is 1,048,576 tokens — about 5× larger than GLM-5.1’s ~200K limit — with up to 131,072 tokens of output per response (docs.z.ai, Together AI). In Claude Code, you opt into the full window by appending [1m] to the model name (GLM-5.2[1m]) (MarkTechPost, June 14, 2026).
Z.ai is the first to admit that a big context window is easy to claim and hard to keep honest. They spent months specifically training the model on long-horizon coding-agent scenarios — large-scale implementation, automated research, performance optimization, and complex debugging — so the 1M context actually feels lossless under real engineering pressure (z.ai/blog/glm-5.2). A coding agent can now hold an entire mid-sized repository in working memory, plus the conversation history, plus its own prior tool calls, without constantly summarizing and re-feeding context.
Benchmarks: How GLM-5.2 Actually Performs
Z.ai published official scores on June 16, 2026 (Hugging Face blog). On the long-horizon suite, GLM-5.2 trails Claude Opus 4.8 by roughly 1% while edging out GPT-5.5 by 1% and Opus 4.7 by 11%.
Here’s the picture in a single table.
| Benchmark | GLM-5.2 | GLM-5.1 | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|
| Terminal-Bench 2.1 | 81.0 | 63.5 | 85.0 | 84.0 | 74.0 |
| SWE-bench Pro | 62.1 | 58.4 | 69.2 | 58.6 | 54.2 |
| FrontierSWE (Dominance) | 74.4 | 30.5 | 75.1 | 72.6 | 39.6 |
| PostTrainBench | 34.3 | 20.1 | 37.2 | 28.4 | 21.6 |
| SWE-Marathon | 13.0 | 1.0 | 26.0 | 12.0 | 4.0 |
| AIME 2026 | 99.2 | 95.3 | 95.7 | 98.3 | 98.2 |
| HLE (text) | 40.5 | 31.0 | 49.8* | 41.4* | 45.0 |
| MCP-Atlas (Public) | 76.8 | 71.8 | 77.8 | 75.3 | 69.2 |
Sources: Z.ai / Hugging Face blog, VentureBeat, June 16, 2026. * = full-set HLE.
Three things stand out:
- Terminal-Bench jumped from 62.0 to 81.0 in a single generation — that’s a ~19-point leap on a hard agentic-coding benchmark.
- On FrontierSWE, GLM-5.2 is the highest-ranked open-source model, finishing within 0.7 points of Claude Opus 4.8 and ahead of GPT-5.5.
- SWE-Marathon (an ultra-long-horizon benchmark) still has Opus 4.8 ahead by 13 points — that’s where GLM-5.2 has room left to grow.
Independent evaluators agreed. Artificial Analysis gave GLM-5.2 a 51 on the Intelligence Index v4.1, ahead of MiniMax-M3 (44), DeepSeek V4 Pro max (44), and Kimi K2.6 (43) (artificialanalysis.ai, June 17, 2026). On GDPval-AA v2 — their real-world agentic work benchmark — GLM-5.2 scored 1524, ahead of every other open model and effectively level with GPT-5.5 xhigh (1514). Design Arena put it at #1 globally with an ELO of 1360, edging out Claude Fable 5 even after Fable was pulled from sampling (Linas Substack, June 19, 2026).
The honest trade-off: GLM-5.2 uses about 43k output tokens per Intelligence Index task — up from GLM-5.1’s 26k and above peers like MiniMax-M3 (24k) and Kimi K2.6 (35k) (Artificial Analysis, June 17, 2026). It’s not the most token-efficient open model at its intelligence level, but it’s still cheaper per task than frontier closed models.
Pricing and Access
The standalone API launched on June 16, 2026 at $1.40 per million input tokens, $4.40 per million output, with cached input at $0.26/M and a limited-time free cached-input storage tier (docs.z.ai/guides/overview/pricing, OpenRouter).
Here is how that stacks up against the closed-source frontier, per VentureBeat’s June 16, 2026 snapshot (VentureBeat):
| Model | Input $/1M | Output $/1M | Combined |
|---|---|---|---|
| DeepSeek V4 Pro | $0.435 | $0.87 | $1.305 |
| MiniMax-M3 | $0.30 | $1.20 | $1.50 |
| GLM-5.2 | $1.40 | $4.40 | $5.80 |
| Kimi K2.6 | $0.95 | $4.00 | $4.95 |
| Gemini 3.1 Pro (≤200K) | $2.00 | $12.00 | $14.00 |
| GPT-5.5 | $5.00 | $30.00 | $35.00 |
| Claude Opus 4.8 | $5.00 | $25.00 | $30.00 |
| Claude Fable 5 | $10.00 | $50.00 | $60.00 |
Per output token, GLM-5.2 is roughly 5.7× cheaper than Opus 4.8 and 6.8× cheaper than GPT-5.5 (codingfleet.com). On Artificial Analysis’s per-task cost metric, GLM-5.2 runs about $0.46 per task vs. GLM-5.1’s $0.25, Kimi K2.6’s $0.31, MiniMax-M3’s $0.18, and DeepSeek V4 Pro max’s $0.05 — so it sits on the Pareto frontier of intelligence vs. cost per task without being the cheapest option on the board.
If you’d rather subscribe than meter tokens, Z.ai runs the GLM Coding Plan (z.ai/subscribe) at $12.60/month (Lite), $50.40/month (Pro), or $112/month (Max) on annual billing — and includes out-of-the-box drop-in support for Claude Code, Cline, OpenClaw, Kilo Code, Roo Code, Droid, Crush, and Factory (VentureBeat).
Third-party providers hosting GLM-5.2 on day zero include DeepInfra, Novita, Nebius, Parasail, Siliconflow, GMI Cloud, Baseten, Fireworks, Together AI, FriendliAI, Featherless, and OpenRouter (Artificial Analysis providers list, datacamp.com).
Drop-In Compatibility with Claude Code and Friends
GLM-5.2 ships with an Anthropic-compatible endpoint and OpenAI-compatible endpoint, so you can swap it into Claude Code, Cline, OpenCode, OpenClaw, Kilo Code, or Goose with a base-URL and model-name change — no framework rewrite (docs.z.ai/devpack/tool/claude, datacamp.com).
The Coding Plan endpoint is https://api.z.ai/api/coding/paas/v4 and the standard endpoint is https://api.z.ai/api/paas/v4/. A minimal Claude Code setup looks like this:
// ~/.claude/settings.json
{
"env": {
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "GLM-5.2[1m]",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "GLM-5.2[1m]"
}
}
Then /effort max and you’re running a frontier-tier model inside the Claude Code harness for a fraction of the API cost. Cline, Kilo Code, and the rest picked up day-one support — Kilo Code confirmed it on X within hours of launch (VentureBeat).
This is why GLM-5.2 is landing so hard with developers who were already paying $200/month for Opus. The switch is essentially free.
Self-Hosting GLM-5.2
The MIT-licensed weights are live on Hugging Face at zai-org/GLM-5.2 and on ModelScope at ZhipuAI/GLM-5.2, with an FP8 quant at zai-org/GLM-5.2-FP8 (huggingface.co/zai-org/GLM-5.2). For deployment, Z.ai lists official support for vLLM (≥0.23.0), SGLang (≥0.5.13.post1), xLLM, ktransformers, and plain transformers (zai-org/GLM-5 GitHub).
Realistic hardware tiers (Spheron Network guide, Unsloth docs):
- 8× H200 with vLLM FP8 — full-fat production deploy.
- 4× H100 with Q4 GGUF — practical production at lower cost.
- 2× Mac Studio with 256GB unified memory each — Unsloth’s 2-bit dynamic GGUF shoves GLM-5.2 into 239GB while keeping ~82% accuracy (Unsloth docs).
- MLX builds are available for Apple Silicon Macs (ivanfioravanti on X, June 17, 2026).
If you want to skip the devops, Hugging Face ran free inference on every hosted provider of GLM-5.2 for a six-hour window after launch (Z.ai on X, June 18, 2026).
Why the Timing Matters
GLM-5.2 landed on June 13, 2026 — about 24 hours after the U.S. Commerce Department ordered Anthropic to disable Claude Fable 5 and Claude Mythos 5 for all foreign nationals over a reported jailbreak concern (Reuters, June 13, 2026).
VentureBeat’s coverage frames GLM-5.2’s release as a direct response (VentureBeat, June 16, 2026). The market noticed: Knowledge Atlas Technology (Zhipu’s listed entity) jumped as much as 48% in a single Hong Kong session on June 13, 2026, ending the day up 32.8% — putting the stock up roughly 820% since the January IPO (SCMP, June 15, 2026).
Vercel CEO Guillermo Rauch said he was “genuinely impressed, almost shocked” at how good GLM-5.2 is at coding (Reddit r/LocalLLaMA, June 21, 2026). Simon Willison, who has tracked open-weights models since Llama, called GLM-5.2 “probably the most powerful text-only open weights LLM” available as of mid-June 2026 (simonwillison.net).
Honest Limitations to Know About
GLM-5.2 is text-only — no image or video input. For multimodal work in the GLM family, you’d use GLM-4.6V (MIT-licensed, open) or GLM-5V-Turbo (proprietary) (docs.z.ai/guides/vlm/glm-5v-turbo). It’s also more token-hungry than peers — a Max-effort run can spend 85k output tokens on one task, and several developers reported wall-clock runtimes roughly 3× longer than Claude Fable 5 (Reddit r/ClaudeCode).
On hallucination it’s actually strong: GLM-5.2 scored a 28% hallucination rate on AA-Omniscience vs. 36% for Opus 4.8, 48% for Fable 5, and 86% for GPT-5.5 (benchlm.ai). And because Z.ai is on the U.S. Entity List, hosting GLM-5.2 yourself via the MIT weights sidesteps any data flowing back to Z.ai — using Z.ai’s hosted API carries the same regulatory caveats that apply to any Chinese-hosted AI service (InfoWorld, June 17, 2026).
Should You Switch?
If you code in long-running agent loops, want Claude Code or Cline performance without the $200/month price tag, or self-host open-weights models on Apple Silicon or your own GPU rack — yes, GLM-5.2 is worth a serious test drive today. The Anthropic-compatible endpoint makes it a five-minute swap.
If you depend on vision input, run tight latency budgets, or care most about token efficiency at the margin, the calculus shifts. Kimi K2.6, DeepSeek V4 Pro, and GLM-5.1 are cheaper per token, and MiniMax-M3 is faster on the Artificial Analysis speed test (Artificial Analysis providers).
But on raw intelligence per dollar for long-horizon coding and agent work, GLM-5.2 just set a new open-weights bar. The closed-source labs should be paying attention.