# Z.AI GLM API Rules for AI Agents

**Last Updated:** 2025-12-08
**Status:** MANDATORY for all LLM API calls in scripts
## CRITICAL: Use Z.AI Coding Plan, NOT BigModel API

This project uses the Z.AI Coding Plan endpoint, which is the same endpoint that OpenCode uses internally. The regular BigModel API (`open.bigmodel.cn`) will NOT work with the tokens stored in this project. You MUST use the Z.AI Coding Plan endpoint.
## API Configuration

### Correct Endpoint

| Property | Value |
|---|---|
| API URL | `https://api.z.ai/api/coding/paas/v4/chat/completions` |
| Auth Header | `Authorization: Bearer {ZAI_API_TOKEN}` |
| Content-Type | `application/json` |
### Available Models

| Model | Description | Cost |
|---|---|---|
| `glm-4.5` | Standard GLM-4.5 | Free (0 per token) |
| `glm-4.5-air` | GLM-4.5 Air variant | Free |
| `glm-4.5-flash` | Fast GLM-4.5 | Free |
| `glm-4.5v` | Vision-capable GLM-4.5 | Free |
| `glm-4.6` | Latest GLM-4.6 (recommended) | Free |

**Recommended model:** `glm-4.6` for best quality.
## Authentication

### Token Location

The Z.AI API token can be obtained from two locations:

1. **Environment variable** (preferred for scripts):

   ```bash
   # In .env file at project root
   ZAI_API_TOKEN=your_token_here
   ```

2. **OpenCode auth file** (reference only): `~/.local/share/opencode/auth.json`. The token is stored under the key `zai-coding-plan`.
### Getting the Token

If you need to set up the token:

- The token is shared with OpenCode's Z.AI Coding Plan
- Check `~/.local/share/opencode/auth.json` for an existing token
- Add it to the `.env` file as `ZAI_API_TOKEN`
## Python Implementation

### Correct Implementation

```python
import os

import httpx


class GLMClient:
    """Client for Z.AI GLM API (Coding Plan endpoint)."""

    # CORRECT endpoint - Z.AI Coding Plan
    API_URL = "https://api.z.ai/api/coding/paas/v4/chat/completions"

    def __init__(self, model: str = "glm-4.6"):
        self.api_key = os.environ.get("ZAI_API_TOKEN")
        if not self.api_key:
            raise ValueError("ZAI_API_TOKEN not found in environment")
        self.model = model
        self.client = httpx.AsyncClient(
            timeout=60.0,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )

    async def chat(self, messages: list) -> dict:
        """Send a chat completion request and return the parsed JSON response."""
        response = await self.client.post(
            self.API_URL,
            json={
                "model": self.model,
                "messages": messages,
                "temperature": 0.3,
            },
        )
        response.raise_for_status()
        return response.json()
```
### WRONG Implementation (DO NOT USE)

```python
# WRONG - This endpoint will fail with quota errors
API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

# WRONG - This is for the regular BigModel API, not Z.AI Coding Plan
api_key = os.environ.get("ZHIPU_API_KEY")
```
## Request Format

### Chat Completion Request

```json
{
  "model": "glm-4.6",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Your prompt here"
    }
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```
### Response Format

```json
{
  "id": "request-id",
  "created": 1733651234,
  "model": "glm-4.6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```
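A small helper can pull the assistant text and token usage out of this structure, raising a clear error when the shape is unexpected. A minimal sketch against the response format shown above:

```python
def extract_reply(response: dict) -> tuple[str, int]:
    """Return (assistant_text, total_tokens) from a chat completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in response: {response!r}")
    content = choices[0].get("message", {}).get("content", "")
    total_tokens = response.get("usage", {}).get("total_tokens", 0)
    return content, total_tokens
```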
## Error Handling

### Common Errors

| Error | Cause | Solution |
|---|---|---|
| `401 Unauthorized` | Invalid or missing token | Check `ZAI_API_TOKEN` in `.env` |
| `403 Quota exceeded` | Wrong endpoint (BigModel) | Use the Z.AI Coding Plan endpoint |
| `429 Rate limited` | Too many requests | Add a delay between requests |
| `500 Server error` | API issue | Retry with exponential backoff |
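The table above can be encoded as a status-to-action lookup so callers decide uniformly between retrying and failing fast. A sketch; the action strings are chosen here for illustration:

```python
def action_for_status(status: int) -> str:
    """Map an HTTP status code to a handling action per the common-errors table."""
    if status == 401:
        return "fail: check ZAI_API_TOKEN in .env"
    if status == 403:
        return "fail: wrong endpoint (BigModel), use the Z.AI Coding Plan URL"
    if status == 429:
        return "retry: add a delay between requests"
    if status >= 500:
        return "retry: exponential backoff"
    return "fail: unexpected status"
```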
### Retry Strategy

```python
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
)
async def call_api_with_retry(client, messages):
    return await client.chat(messages)
```
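If `tenacity` is not available in a script's environment, the same policy can be hand-rolled with `asyncio.sleep`. A sketch roughly equivalent to the decorator above (3 attempts, capped exponential wait):

```python
import asyncio


async def call_with_retry(coro_fn, *args, attempts: int = 3, base: float = 2.0, cap: float = 10.0):
    """Await coro_fn(*args), retrying on any exception with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return await coro_fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(min(base * 2 ** attempt, cap))
```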
## Integration with CH-Annotator

When using GLM for entity recognition or verification, always reference CH-Annotator v1.7.0:

```python
PROMPT = """You are a heritage institution classifier following CH-Annotator v1.7.0 convention.

## CH-Annotator GRP.HER Definition
Heritage institutions are organizations that:
- Collect, preserve, and provide access to cultural heritage materials
- Include: museums (GRP.HER.MUS), libraries (GRP.HER.LIB), archives (GRP.HER.ARC), galleries (GRP.HER.GAL)

## Entity to Analyze
...
"""
```

See `.opencode/CH_ANNOTATOR_CONVENTION.md` for full convention details.
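Scripts typically fill the `## Entity to Analyze` section at call time. A hypothetical helper; `build_prompt` and its `Name`/`Description` fields are illustrative, not part of the convention:

```python
PROMPT_TEMPLATE = """You are a heritage institution classifier following CH-Annotator v1.7.0 convention.

## CH-Annotator GRP.HER Definition
Heritage institutions are organizations that:
- Collect, preserve, and provide access to cultural heritage materials
- Include: museums (GRP.HER.MUS), libraries (GRP.HER.LIB), archives (GRP.HER.ARC), galleries (GRP.HER.GAL)

## Entity to Analyze
Name: {name}
Description: {description}
"""


def build_prompt(name: str, description: str) -> str:
    """Fill the entity section of the classification prompt."""
    return PROMPT_TEMPLATE.format(name=name, description=description)
```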
## Scripts Using GLM API

The following scripts use the Z.AI GLM API:

| Script | Purpose |
|---|---|
| `scripts/reenrich_wikidata_with_verification.py` | Wikidata entity verification using GLM-4.6 |

When creating new scripts that need LLM capabilities, follow this pattern.
## Environment Setup Checklist

When setting up a new environment:

- Check `~/.local/share/opencode/auth.json` for an existing Z.AI token
- Add `ZAI_API_TOKEN` to the `.env` file
- Verify the endpoint is `https://api.z.ai/api/coding/paas/v4/chat/completions`
- Test with the `glm-4.6` model
- Reference CH-Annotator v1.7.0 for entity recognition tasks
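Parts of this checklist can be automated. A sketch that reports missing setup steps given an environment mapping (pure function so it is easy to test; the `ZAI_API_URL` override variable is hypothetical, not used by existing scripts):

```python
ZAI_ENDPOINT = "https://api.z.ai/api/coding/paas/v4/chat/completions"


def check_environment(env: dict) -> list[str]:
    """Return human-readable setup problems; an empty list means the checks passed."""
    problems = []
    if not env.get("ZAI_API_TOKEN"):
        problems.append("ZAI_API_TOKEN is not set; add it to the .env file")
    url = env.get("ZAI_API_URL", ZAI_ENDPOINT)
    if "open.bigmodel.cn" in url:
        problems.append("endpoint points at BigModel; use the Z.AI Coding Plan URL")
    return problems
```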
## AI Agent Rules

### DO

- Use the `https://api.z.ai/api/coding/paas/v4/chat/completions` endpoint
- Get the token from the `ZAI_API_TOKEN` environment variable
- Use `glm-4.6` as the default model
- Reference CH-Annotator v1.7.0 for entity tasks
- Add retry logic with exponential backoff
- Handle JSON parsing errors gracefully
### DO NOT

- Use the `open.bigmodel.cn` endpoint (wrong API)
- Use the `ZHIPU_API_KEY` environment variable (wrong key)
- Hard-code API tokens in scripts
- Skip error handling for API calls
- Forget to load the `.env` file before accessing the environment
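"Handle JSON parsing errors gracefully" matters because models often wrap JSON answers in Markdown fences or surrounding prose. A minimal sketch of a tolerant parser (an assumption about typical reply shapes, not a documented GLM behavior):

```python
import json
import re


def parse_model_json(text: str):
    """Parse JSON from a model reply, tolerating ```json fences and surrounding prose."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the first fenced block, then to the outermost brace-delimited span.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            candidate = text[start:end + 1]
    if candidate is None:
        raise ValueError(f"No JSON found in model reply: {text[:80]!r}")
    return json.loads(candidate)
```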
## Related Documentation

- CH-Annotator Convention: `.opencode/CH_ANNOTATOR_CONVENTION.md`
- Entity Annotation: `data/entity_annotation/ch_annotator-v1_7_0.yaml`
- Wikidata Enrichment Script: `scripts/reenrich_wikidata_with_verification.py`
## Version History

| Date | Change |
|---|---|
| 2025-12-08 | Initial documentation - Fixed API endpoint discovery |