Free GLM-4.7 Flash API on Cloudflare

Free GLM-4.7 Flash API via Cloudflare Workers AI: fast bilingual chat and agents within the 10,000 free Neurons per day.

About the Model

Why use GLM-4.7 Flash?

GLM-4.7 Flash is Z.ai's speed-optimized model, built for low-latency chat, agents, and tool calling while maintaining high answer quality. It is a good fit when responsiveness matters most.

Languages & capabilities

The model offers excellent bilingual Chinese/English performance, strong instruction following, and reliable function calling for agentic apps.

Why it's free

Usage is covered by the 10,000 free Neurons per day that Cloudflare Workers AI includes, so you can prototype and ship through the /ai/run endpoint at no cost as long as you stay within that daily allowance.

How to Access for Free (via Cloudflare Workers AI)

Free access on Cloudflare Workers AI

Call GLM-4.7 Flash through Cloudflare's global edge using the unified /ai/run REST endpoint. Every Cloudflare account includes 10,000 Neurons per day for free, so you can prototype low-latency chat and agents at no cost.
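
To get a feel for what the daily allowance covers, the arithmetic can be sketched as below. This is only a rough estimate: the function name `requests_per_day` is hypothetical, and the per-request Neuron cost is a placeholder you would replace with the figure observed for your own prompts, not a published price for GLM-4.7 Flash.

```python
FREE_NEURONS_PER_DAY = 10_000  # daily free allowance stated above


def requests_per_day(neurons_per_request: float, budget: int = FREE_NEURONS_PER_DAY) -> int:
    """Return how many whole requests fit into the daily Neuron budget."""
    if neurons_per_request <= 0:
        raise ValueError("neurons_per_request must be positive")
    return int(budget // neurons_per_request)


# 25.0 is a placeholder, NOT a measured or published cost for this model.
print(requests_per_day(25.0))
```

Swapping in your own measured cost per request gives a quick check on whether a workload is likely to stay inside the free tier.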

Authentication (BYOK)

Bring your own credentials: put your Cloudflare Account ID in the request URL and send an API token with Workers AI access in the Authorization: Bearer <token> header.
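
As a minimal sketch of how those two credentials fit together, the helper below builds the /ai/run URL and the auth header without sending anything. The name `build_request` and the sample credential values are hypothetical; the URL shape and model ID follow the examples on this page.

```python
API_BASE = "https://api.cloudflare.com/client/v4"
MODEL = "@cf/zai-org/glm-4.7-flash"


def build_request(account_id: str, token: str, model: str = MODEL) -> tuple[str, dict[str, str]]:
    """Return the /ai/run URL and the auth headers for one Workers AI call."""
    if not account_id or not token:
        raise ValueError("both account_id and token are required")
    # The account ID goes in the URL path; the token goes in the header.
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


# Placeholder values for illustration only.
url, headers = build_request("0123456789abcdef", "example-token")
print(url)
```

Keeping URL and header construction in one place makes it harder to mix up which credential goes where.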

Code Examples

curl
curl https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/ai/run/@cf/zai-org/glm-4.7-flash \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a haiku about the sea."}]}'
python
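import os

import requests

# Credentials are read from the environment so they stay out of source code.
account = os.environ["CF_ACCOUNT_ID"]
token = os.environ["CF_API_TOKEN"]

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account}/ai/run/@cf/zai-org/glm-4.7-flash",
    headers={"Authorization": f"Bearer {token}"},
    json={"messages": [{"role": "user", "content": "Write a haiku about the sea."}]},
    timeout=60,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # fail loudly on HTTP errors such as a rejected token
print(resp.json()["result"]["response"])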
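
The Python example reads result.response directly. A slightly more defensive sketch is shown below, assuming the JSON body uses the usual Cloudflare API envelope with `success`, `errors`, and `result` fields. The helper name `extract_reply` is hypothetical, and the `response` key is taken from the example above rather than verified for every model.

```python
from typing import Any


def extract_reply(body: dict[str, Any]) -> str:
    """Return the model's text from a Workers AI JSON body, or raise on failure."""
    # Assumption: a missing "success" flag is treated as a failed call.
    if not body.get("success", False):
        errors = body.get("errors") or []
        details = "; ".join(
            f"{e.get('code')}: {e.get('message')}" for e in errors
        ) or "unknown error"
        raise RuntimeError(f"Workers AI call failed: {details}")
    result = body.get("result") or {}
    if "response" not in result:
        raise KeyError("no 'response' field in result")
    return result["response"]


# Hand-written sample body for illustration, not real API output.
sample = {"success": True, "errors": [], "result": {"response": "Waves fold into foam"}}
print(extract_reply(sample))
```

In the request example, this would replace the final print with `print(extract_reply(resp.json()))`.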