China’s Z.AI Releases GLM-5.2: A Model That Rivals Claude Opus—Using Zero Nvidia Chips

Briefly

GLM-5.2 trails Claude Opus 4.8 by simply 1% on FrontierSWE—a benchmark measuring multi-hour autonomous engineering tasks—whereas beating GPT-5.5 on the identical take a look at. It ships underneath an MIT license with zero regional restrictions.
The mannequin was constructed totally on Huawei Ascend chips with no NVIDIA {hardware} concerned.
Unsloth AI already launched 2-bit GGUF quantizations that shrink the mannequin from 1.51TB to 238GB. You may nonetheless want 256GB of RAM or VRAM—however at that time, you’ll be able to run it.

Z.ai dropped GLM-5.2 on June 16, promising high stage performances, beating its already superior GLM 5.1.

The Beijing-based lab, which has been on the U.S. Entity Record since January 2025, seems to be benefiting from rising considerations over America’s strategy to AI. Over the previous week, the ban on Anthropic Fable and the discharge of this new mannequin have helped drive zAI’s fill up 90%, sending it to a brand new all-time excessive.

GLM 5.2 has the numbers to again up the hype.

On FrontierSWE—a benchmark that evaluates whether or not an AI agent can full open-ended technical tasks measured in hours, masking techniques optimization, large-scale code development, and utilized ML analysis, scored by dominance charge—GLM-5.2 hit 74.4 towards Claude Opus 4.8’s 75.1. It edged out GPT-5.5 at 72.6. On SWE-bench Professional, which exams autonomous decision of real-world GitHub points scored as a cross charge, GLM-5.2 scored 62.1 to GPT-5.5’s 58.6—and cleared its predecessor GLM-5.1’s 58.4 by a large margin.

The standard bounce makes it the very best open-source mannequin thus far within the Synthetic Evaluation Intelligence Index, which aggregates the outcomes of 9 totally different scores to evaluate the overall high quality of an AI mannequin. OpenRouter’s benchmarks put it in the identical class because the now banned Claude Fable 5.

The {hardware} used to attain this feat is one other fascinating a part of the story. GLM-5.2 was skilled on Huawei Ascend chips—no Nvidia anyplace within the pipeline. Emad Mostaque, founding father of Stability AI, estimated complete coaching prices at round $25 million, 80% of that in post-training, which might make it extraordinarily low cost in comparison towards its friends.

As Decrypt reported earlier this 12 months, Z.ai was already coaching picture fashions on Huawei’s Ascend Atlas servers with out a single American chip. GLM-5.2 takes that infrastructure additional—a 744-billion-parameter mixture-of-experts mannequin with a real 1 million-token context window, 5 instances the 200K restrict on GLM-5.1, and an MIT license meaning no authorities directive can flip the entry change.

Tokens are the chunks of tet a mannequin can learn and generate whereas Parameters are the variety of inside settings and values that decide how a mannequin processes data and generates responses

Who it is for and what it prices

For builders, the context window is the operational shift. Entire-repo navigation, multi-file refactors, and lengthy agentic pipelines that beforehand required chunking grow to be single-call workflows. API pricing runs $1.40 per million enter tokens and $4.40 per million output—towards Claude Opus 4.8’s $5 enter and $25 output. The Coding Plan begins at round $18 a month and works instantly inside Claude Code, Cline, Kilo Code, and hottest agentic environments.

Native deployment can also be technically doable. Unsloth AI pushed 2-bit GGUF quantizations that compress the mannequin from 1.51TB right down to 238GB whereas retaining ~82% accuracy.

Don’t get too excited, although. That also means it calls for 256GB of unified reminiscence or an identical RAM/VRAM combo—a maxed M4 Extremely Mac Studio or a workstation with a mid-range GPU and 256GB of system RAM with mixture-of-experts offloading. It’s nonetheless some huge cash, however no less than one thing which you can purchase and run on your home should you actually wish to.

We ran a fast take a look at, asking GLM-5.2 to construct our normal recreation mixing typing mechanics with a shooter. The UI wasn’t the prettiest—different fashions generated extra polished-looking interfaces, however the expertise was essentially the most assorted: totally different situations throughout waves, enemy varieties that shifted, bosses showing later within the run.

It generated extra numerous recreation states than the rest we examined for a similar job in a zero shot setup.

If you wish to play it, it’s reside in our Itch.io profile.

That variance factors towards the place GLM-5.2 makes essentially the most financial sense. For multi-shot era workflows and agentic pipelines the place output range issues greater than polish, the mathematics at open-source pricing ranges is difficult to argue with. For the toughest sustained duties—SWE-Marathon, the place it scores 13.0 towards Opus 4.8’s 26.0—the hole to the closed frontier remains to be actual, and 13 factors vast.

Open-source weights are reside on HuggingFace underneath the MIT license. The quantized weights are additionally obtainable on HuggingFace. GLM Coding Plan subscribers can change now with the mannequin string GLM-5.2, and it’s additionally obtainable at no cost testing on z.AI with some utilization constraints.