# Z.AI GLM API Rules for AI Agents

**Last Updated**: 2025-12-08
**Status**: MANDATORY for all LLM API calls in scripts

---

## CRITICAL: Use Z.AI Coding Plan, NOT BigModel API

**This project uses the Z.AI Coding Plan endpoint, which is the SAME endpoint that OpenCode uses internally.**

The regular BigModel API (`open.bigmodel.cn`) will NOT work with the tokens stored in this project. You MUST use the Z.AI Coding Plan endpoint.

---

## API Configuration

### Correct Endpoint

| Property | Value |
|----------|-------|
| **API URL** | `https://api.z.ai/api/coding/paas/v4/chat/completions` |
| **Auth Header** | `Authorization: Bearer {ZAI_API_TOKEN}` |
| **Content-Type** | `application/json` |

### Available Models

| Model | Description | Cost |
|-------|-------------|------|
| `glm-4.5` | Standard GLM-4.5 | Free (0 per token) |
| `glm-4.5-air` | GLM-4.5 Air variant | Free |
| `glm-4.5-flash` | Fast GLM-4.5 | Free |
| `glm-4.5v` | Vision-capable GLM-4.5 | Free |
| `glm-4.6` | Latest GLM-4.6 (recommended) | Free |

**Recommended Model**: `glm-4.6` for best quality

---

## Authentication

### Token Location

The Z.AI API token can be obtained from two locations:

1. **Environment Variable** (preferred for scripts):

   ```bash
   # In .env file at project root
   ZAI_API_TOKEN=your_token_here
   ```

2. **OpenCode Auth File** (reference only):

   ```
   ~/.local/share/opencode/auth.json
   ```

   The token is stored under key `zai-coding-plan`.

### Getting the Token

If you need to set up the token:

1. The token is shared with OpenCode's Z.AI Coding Plan
2. Check `~/.local/share/opencode/auth.json` for existing token
3. Add to `.env` file as `ZAI_API_TOKEN`

---

## Python Implementation

### Correct Implementation

```python
import os

import httpx


class GLMClient:
    """Client for Z.AI GLM API (Coding Plan endpoint)."""

    # CORRECT endpoint - Z.AI Coding Plan
    API_URL = "https://api.z.ai/api/coding/paas/v4/chat/completions"

    def __init__(self, model: str = "glm-4.6"):
        self.api_key = os.environ.get("ZAI_API_TOKEN")
        if not self.api_key:
            raise ValueError("ZAI_API_TOKEN not found in environment")
        self.model = model
        self.client = httpx.AsyncClient(
            timeout=60.0,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )

    async def chat(self, messages: list) -> dict:
        """Send chat completion request."""
        response = await self.client.post(
            self.API_URL,
            json={
                "model": self.model,
                "messages": messages,
                "temperature": 0.3,
            },
        )
        response.raise_for_status()
        return response.json()
```

### WRONG Implementation (DO NOT USE)

```python
# WRONG - This endpoint will fail with quota errors
API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

# WRONG - This is for regular BigModel API, not Z.AI Coding Plan
api_key = os.environ.get("ZHIPU_API_KEY")
```

---

## Request Format

### Chat Completion Request

```json
{
  "model": "glm-4.6",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Your prompt here"
    }
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```

### Response Format

```json
{
  "id": "request-id",
  "created": 1733651234,
  "model": "glm-4.6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```

---

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `401 Unauthorized` | Invalid or missing token | Check `ZAI_API_TOKEN` in `.env` |
| `403 Quota exceeded` | Wrong endpoint (BigModel) | Use Z.AI Coding Plan endpoint |
| `429 Rate limited` | Too many requests | Add delay between requests |
| `500 Server error` | API issue | Retry with exponential backoff |

### Retry Strategy

```python
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
)
async def call_api_with_retry(client, messages):
    return await client.chat(messages)
```

---

## Integration with CH-Annotator

When using GLM for entity recognition or verification, always reference CH-Annotator v1.7.0:

```python
PROMPT = """You are a heritage institution classifier following CH-Annotator v1.7.0 convention.

## CH-Annotator GRP.HER Definition

Heritage institutions are organizations that:
- Collect, preserve, and provide access to cultural heritage materials
- Include: museums (GRP.HER.MUS), libraries (GRP.HER.LIB), archives (GRP.HER.ARC), galleries (GRP.HER.GAL)

## Entity to Analyze
...
"""
```

See `.opencode/CH_ANNOTATOR_CONVENTION.md` for full convention details.

---

## Scripts Using GLM API

The following scripts use the Z.AI GLM API:

| Script | Purpose |
|--------|---------|
| `scripts/reenrich_wikidata_with_verification.py` | Wikidata entity verification using GLM-4.6 |

When creating new scripts that need LLM capabilities, follow this pattern.
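New scripts should also unpack the response format defensively rather than indexing into it inline. A small helper along these lines (illustrative; `extract_content` is not part of any existing script here) converts a malformed response into one catchable error:

```python
def extract_content(response: dict) -> str:
    """Return the assistant text from a chat completion response.

    Follows the response shape documented above:
    choices[0].message.content. Raises ValueError instead of a bare
    KeyError/IndexError so callers can handle schema drift gracefully.
    """
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {response!r}") from exc
```

Combined with the client above, `extract_content(await client.chat(messages))` yields the plain reply text, and any unexpected payload surfaces as a single `ValueError`.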
---

## Environment Setup Checklist

When setting up a new environment:

- [ ] Check `~/.local/share/opencode/auth.json` for existing Z.AI token
- [ ] Add `ZAI_API_TOKEN` to `.env` file
- [ ] Verify endpoint is `https://api.z.ai/api/coding/paas/v4/chat/completions`
- [ ] Test with `glm-4.6` model
- [ ] Reference CH-Annotator v1.7.0 for entity recognition tasks

---

## AI Agent Rules

### DO

- Use `https://api.z.ai/api/coding/paas/v4/chat/completions` endpoint
- Get token from `ZAI_API_TOKEN` environment variable
- Use `glm-4.6` as the default model
- Reference CH-Annotator v1.7.0 for entity tasks
- Add retry logic with exponential backoff
- Handle JSON parsing errors gracefully

### DO NOT

- Use `open.bigmodel.cn` endpoint (wrong API)
- Use `ZHIPU_API_KEY` environment variable (wrong key)
- Hard-code API tokens in scripts
- Skip error handling for API calls
- Forget to load `.env` file before accessing environment

---

## Related Documentation

- **CH-Annotator Convention**: `.opencode/CH_ANNOTATOR_CONVENTION.md`
- **Entity Annotation**: `data/entity_annotation/ch_annotator-v1_7_0.yaml`
- **Wikidata Enrichment Script**: `scripts/reenrich_wikidata_with_verification.py`

---

## Version History

| Date | Change |
|------|--------|
| 2025-12-08 | Initial documentation - Fixed API endpoint discovery |
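---

## Appendix: Loading `.env` Before Accessing the Environment

The DO NOT list warns against reading `ZAI_API_TOKEN` before the `.env` file is loaded. Scripts typically call python-dotenv's `load_dotenv()` at startup; where that dependency is unavailable, a minimal stdlib fallback might look like this (a sketch, assuming plain `KEY=VALUE` lines with `#` comments and no quoting or interpolation):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments skipped.

    Existing environment variables are NOT overridden, matching the
    common default behaviour of .env loaders.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

Call `load_env()` before constructing `GLMClient`, so that `os.environ.get("ZAI_API_TOKEN")` sees the token from the project's `.env` file.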