glam/schemas/20251121/linkml/modules/classes/LLMResponse.yaml
kempersc 626bd3a095 refactor(schemas): apply naming conventions to 261 class files
- Apply Rule 39: RiC-O style hasOrHad*/isOrWas* for temporal slots
- Apply Rule 43: Singular noun convention (keywords → keyword)
- Update slot references to match renamed slot files
- Maintain schema integrity across all class definitions
2026-01-10 15:36:33 +01:00


id: https://nde.nl/ontology/hc/class/LLMResponse
name: llm_response_class
title: LLM Response Class
version: 1.0.0
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
prov: http://www.w3.org/ns/prov#
dct: http://purl.org/dc/terms/
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- ../metadata
- ./SpecificityAnnotation
- ./TemplateSpecificityScores
- ../enums/LLMProviderEnum
- ../enums/FinishReasonEnum
- ../enums/ThinkingModeEnum
- ../slots/content
- ../slots/reasoning_content
- ../slots/model
- ../slots/provider
- ../slots/prompt_token
- ../slots/completion_token
- ../slots/total_token
- ../slots/cached_token
- ../slots/finish_reason
- ../slots/latency_ms
- ../slots/thinking_mode
- ../slots/clear_thinking
- ../slots/created
- ../slots/cost_usd
- ../slots/request_id
- ../slots/specificity_annotation
- ../slots/template_specificity
default_range: string
classes:
LLMResponse:
class_uri: prov:Activity
description: |
Provenance metadata for LLM API responses, including GLM 4.7 Thinking Modes.
Captures complete response metadata from LLM providers (ZhipuAI GLM, Anthropic,
OpenAI, etc.) for traceability and analysis. The key innovation is capturing
`reasoning_content` - the chain-of-thought reasoning that GLM 4.7 exposes
through its three thinking modes.
**GLM 4.7 Thinking Modes** (https://docs.z.ai/guides/capabilities/thinking-mode):
1. **Interleaved Thinking** (default, since GLM-4.5):
- Model thinks between tool calls and after receiving tool results
- Enables complex, step-by-step reasoning with tool chaining
- Returns `reasoning_content` alongside `content` in every response
2. **Preserved Thinking** (new in GLM-4.7):
- Retains reasoning_content from previous assistant turns in context
- Preserves reasoning continuity across multi-turn conversations
- Improves model performance and increases cache hit rates
- **Enabled by default on Coding Plan endpoint**
- Requires returning EXACT, UNMODIFIED reasoning_content back to API
- Set via: `"clear_thinking": false` (do NOT clear previous reasoning)
3. **Turn-level Thinking** (new in GLM-4.7):
- Control reasoning computation on a per-turn basis
- Enable/disable thinking independently for each request in a session
- Useful for balancing speed (simple queries) vs accuracy (complex tasks)
- Set via: `"thinking": {"type": "enabled"}` or `"thinking": {"type": "disabled"}`
**Critical Implementation Note for Preserved Thinking**:
When using Preserved Thinking with tool calls, thinking blocks MUST be:
1. Explicitly preserved in the messages array
2. Returned together with tool results
3. Kept in EXACT original sequence (no reordering/editing)
**PROV-O Alignment**:
- LLMResponse IS a prov:Activity (the inference process)
- content IS a prov:Entity (the generated output)
- model/provider identify the prov:Agent (the AI system)
- reasoning_content documents the prov:Plan (how the agent reasoned)
- the prompt (input) is linked via prov:used (the entity the activity consumed)
**Use Cases**:
- DSPy RAG responses with reasoning traces
- Heritage institution extraction provenance
- LinkML schema conformity validation
- Ontology mapping decision logs
- Multi-turn agent conversations with preserved context
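The three thinking modes described above can be sketched as request payloads. This is a minimal Python sketch, assuming the `thinking` and `clear_thinking` field names from the Z.AI docs linked above; `build_request` is a hypothetical helper, not part of this schema.

```python
# Illustrative request payloads for the three GLM 4.7 thinking modes.
# Field names follow the Z.AI thinking-mode docs cited above; treat the
# helper itself as a sketch, not a client implementation.

def build_request(messages, mode="interleaved"):
    """Return a chat-completions payload for a given thinking mode."""
    payload = {"model": "glm-4.7", "messages": messages}
    if mode == "disabled":
        # Turn-level thinking: skip reasoning for this request (fast path)
        payload["thinking"] = {"type": "disabled"}
    elif mode == "preserved":
        # Preserved Thinking: keep prior reasoning_content in context
        payload["thinking"] = {"type": "enabled"}
        payload["clear_thinking"] = False
    else:
        # Interleaved Thinking (default since GLM-4.5)
        payload["thinking"] = {"type": "enabled"}
    return payload

msgs = [{"role": "user", "content": "Describe the Rijksmuseum."}]
preserved = build_request(msgs, mode="preserved")
```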
exact_mappings:
- prov:Activity
close_mappings:
- schema:Action
- schema:CreativeWork
slots:
- cached_token
- clear_thinking
- completion_token
- content
- cost_usd
- created
- finish_reason
- latency_ms
- model
- prompt_token
- provider
- reasoning_content
- request_id
- specificity_annotation
- template_specificity
- thinking_mode
- total_token
slot_usage:
content:
description: |
The final LLM response text (message.content from API response).
PROV-O: prov:generated - the entity produced by this activity.
This is the primary output shown to users and used for downstream processing.
slot_uri: prov:generated
range: string
required: true
examples:
- value: The Rijksmuseum is a national museum in Amsterdam dedicated to Dutch
arts and history.
description: Extracted heritage institution description
reasoning_content:
description: |
Interleaved Thinking - the model's chain-of-thought reasoning.
PROV-O: prov:hadPlan - documents HOW the agent reasoned.
**GLM 4.7 Interleaved Thinking**:
GLM 4.7 returns `reasoning_content` in every response, exposing the
model's step-by-step reasoning process. This enables:
1. **Schema Validation**: Model reasons about LinkML constraints before generating output
2. **Ontology Mapping**: Explicit reasoning about CIDOC-CRM, CPOV, TOOI class mappings
3. **RDF Quality**: Chain-of-thought validates triple construction
4. **Transparency**: Full audit trail of extraction decisions
May be null for providers that don't expose reasoning (Claude, GPT-4).
slot_uri: prov:hadPlan
range: string
required: false
examples:
- value: 'The user is asking about Dutch heritage institutions. I need to
identify: 1) Institution name: Rijksmuseum, 2) Type: Museum (maps to InstitutionTypeEnum.MUSEUM),
3) Location: Amsterdam (city in Noord-Holland province)...'
description: GLM 4.7 interleaved thinking showing explicit schema reasoning
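Separating `content` from `reasoning_content` when consuming a response can be sketched as below; the response shape follows the usual chat-completions layout, and `split_response` is a hypothetical helper. Providers that do not expose reasoning simply yield `None`.

```python
# Hypothetical response-parsing sketch: GLM 4.7 returns reasoning_content
# alongside content; other providers may omit it, so default to None.

def split_response(api_response: dict):
    """Return (content, reasoning_content) from a chat-completions response."""
    msg = api_response["choices"][0]["message"]
    return msg.get("content"), msg.get("reasoning_content")

resp = {"choices": [{"message": {
    "content": "The Rijksmuseum is a national museum in Amsterdam.",
    "reasoning_content": "Identify institution name, type, location...",
}}]}
content, reasoning = split_response(resp)
```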
model:
description: |
The LLM model identifier from the API response.
PROV-O: Part of prov:wasAssociatedWith - identifies the specific model version.
Common values:
- glm-4.7: ZhipuAI GLM 4.7 (with Interleaved Thinking)
- glm-4.6: ZhipuAI GLM 4.6
- claude-3-opus-20240229: Anthropic Claude Opus
- gpt-4-turbo: OpenAI GPT-4 Turbo
slot_uri: schema:softwareVersion
range: string
required: true
examples:
- value: glm-4.7
description: ZhipuAI GLM 4.7 with Interleaved Thinking
provider:
description: |
The LLM provider/platform.
PROV-O: prov:wasAssociatedWith - the agent (organization) providing the model.
Used by DSPy to route requests and track provider-specific behavior.
slot_uri: prov:wasAssociatedWith
range: LLMProviderEnum
required: true
examples:
- value: zai
description: ZhipuAI (Z.AI) - GLM models
request_id:
description: |
Unique request ID from the LLM provider API (for tracing/debugging).
Enables correlation with provider logs for troubleshooting.
slot_uri: dct:identifier
range: string
required: false
examples:
- value: req_8f3a2b1c4d5e6f7g
description: Provider-assigned request identifier
created:
description: |
Timestamp when the LLM response was generated (from API response).
PROV-O: prov:endedAtTime - when the inference activity completed.
slot_uri: prov:endedAtTime
range: datetime
required: true
examples:
- value: '2025-12-23T10:30:00Z'
description: UTC timestamp of response generation
prompt_token:
description: |
Number of tokens in the input prompt.
From API response: usage.prompt_tokens
slot_uri: schema:value
range: integer
minimum_value: 0
examples:
- value: 150
description: 150 tokens in the input prompt
completion_token:
description: |
Number of tokens in the model's response (content + reasoning_content).
From API response: usage.completion_tokens
Note: For GLM 4.7, this includes tokens from both content and reasoning_content.
slot_uri: schema:value
range: integer
minimum_value: 0
examples:
- value: 450
description: 450 tokens in the completion (content + reasoning)
total_token:
description: |
Total tokens used (prompt + completion).
From API response: usage.total_tokens
slot_uri: schema:value
range: integer
minimum_value: 0
examples:
- value: 600
description: 600 total tokens (150 prompt + 450 completion)
cached_token:
description: |
Number of prompt tokens served from cache (if provider supports caching).
From API response: usage.prompt_tokens_details.cached_tokens
Cached tokens typically have reduced cost and latency.
slot_uri: schema:value
range: integer
minimum_value: 0
required: false
examples:
- value: 50
description: 50 tokens served from provider's prompt cache
finish_reason:
description: |
Why the model stopped generating (from API response).
Common values:
- stop: Natural completion (hit stop token)
- length: Hit max_tokens limit
- tool_calls: Model invoked a tool (function calling)
- content_filter: Response filtered for safety
slot_uri: schema:status
range: FinishReasonEnum
required: false
examples:
- value: stop
description: Model completed naturally
latency_ms:
description: |
Response latency in milliseconds (time from request to response).
Measured client-side (includes network time).
slot_uri: schema:duration
range: integer
minimum_value: 0
required: false
examples:
- value: 1250
description: 1.25 seconds total response time
cost_usd:
description: |
Estimated cost in USD for this LLM call.
For Z.AI Coding Plan: $0.00 (free tier for GLM models)
For other providers: calculated from token counts and pricing
slot_uri: schema:price
range: float
minimum_value: 0.0
required: false
examples:
- value: 0.0
description: Free (Z.AI Coding Plan)
- value: 0.015
description: OpenAI GPT-4 Turbo cost estimate
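The token-based calculation mentioned above can be sketched as follows; the per-million-token prices are placeholders for illustration, not quoted rates, and `estimate_cost_usd` is a hypothetical helper (Z.AI Coding Plan calls are simply recorded as 0.0).

```python
# Hypothetical cost estimator from token counts. The $10/$30 per-million
# prices below are placeholders, not actual provider pricing.

def estimate_cost_usd(prompt_tokens, completion_tokens,
                      in_per_million, out_per_million):
    """Estimate call cost from token counts and per-million-token prices."""
    return round(
        prompt_tokens / 1_000_000 * in_per_million
        + completion_tokens / 1_000_000 * out_per_million,
        6,
    )

# 150 prompt + 450 completion tokens at placeholder rates:
cost = estimate_cost_usd(150, 450, 10.0, 30.0)
```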
thinking_mode:
description: |
The GLM 4.7 thinking mode used for this request.
**Available Modes**:
- **enabled**: Thinking enabled (default) - model reasons before responding
- **disabled**: Thinking disabled - faster responses, no reasoning_content
- **interleaved**: Interleaved thinking - think between tool calls (default behavior)
- **preserved**: Preserved thinking - retain reasoning across turns (Coding Plan default)
slot_uri: schema:actionOption
range: ThinkingModeEnum
required: false
examples:
- value: preserved
description: Preserved thinking for multi-turn agent conversations
- value: interleaved
description: Default interleaved thinking between tool calls
- value: disabled
description: Disabled for fast, simple queries
clear_thinking:
description: |
Whether to clear previous reasoning_content from context.
**Preserved Thinking Control**:
- **false**: Preserved Thinking enabled (keep reasoning, better cache hits)
- **true**: Clear previous reasoning (default for standard API)
**Z.AI Coding Plan**: Default is `false` (Preserved Thinking enabled)
**Critical Implementation Note**:
When clear_thinking is false, you MUST return the EXACT, UNMODIFIED
reasoning_content back to the API in subsequent turns.
slot_uri: schema:Boolean
range: boolean
required: false
examples:
- value: false
description: Keep reasoning for Preserved Thinking (recommended)
- value: true
description: Clear previous reasoning (fresh context each turn)
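The contract described in the implementation note above can be sketched in Python; the message shape (a `reasoning_content` key on assistant turns) is illustrative, assumed from the thinking-mode docs, and `append_assistant_turn` is a hypothetical helper.

```python
# Sketch of the Preserved Thinking contract: when clear_thinking is false,
# prior assistant turns must carry their reasoning_content back verbatim.

def append_assistant_turn(messages, content, reasoning_content):
    """Append an assistant reply, preserving its reasoning unmodified."""
    turn = {"role": "assistant", "content": content}
    if reasoning_content is not None:
        # EXACT, UNMODIFIED reasoning - no reordering or editing
        turn["reasoning_content"] = reasoning_content
    messages.append(turn)
    return messages

history = [{"role": "user", "content": "List Dutch museums."}]
history = append_assistant_turn(history, "The Rijksmuseum...", "Step 1: ...")
```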
specificity_annotation:
range: SpecificityAnnotation
inlined: true
template_specificity:
range: TemplateSpecificityScores
inlined: true
comments:
- reasoning_content is the key field for Interleaved Thinking (GLM 4.7)
- Store reasoning_content for debugging, auditing, and DSPy optimization
- 'Z.AI Coding Plan endpoint: https://api.z.ai/api/coding/paas/v4/chat/completions'
- 'For DSPy: use LLMResponse to track all LLM calls in the pipeline'
- See AGENTS.md Rule 11 for Z.AI API configuration
see_also:
- https://www.w3.org/TR/prov-o/
- https://api.z.ai/docs
- https://dspy-docs.vercel.app/
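As a capstone, mapping a raw chat-completions response onto the LLMResponse slots defined above might look like the sketch below (singular token slot names per Rule 43). The response layout and the `to_llm_response` helper are assumptions for illustration, not part of the schema.

```python
# Hypothetical mapping from a raw chat-completions response dict to a
# record keyed by the LLMResponse slots defined in this schema.

def to_llm_response(api_response: dict, latency_ms: int, provider="zai"):
    """Flatten a provider response into LLMResponse slot names."""
    choice = api_response["choices"][0]
    usage = api_response.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "reasoning_content": choice["message"].get("reasoning_content"),
        "model": api_response["model"],
        "provider": provider,
        "prompt_token": usage.get("prompt_tokens"),
        "completion_token": usage.get("completion_tokens"),
        "total_token": usage.get("total_tokens"),
        "finish_reason": choice.get("finish_reason"),
        "latency_ms": latency_ms,
        "request_id": api_response.get("id"),
    }

sample = {
    "id": "req_123", "model": "glm-4.7",
    "usage": {"prompt_tokens": 150, "completion_tokens": 450,
              "total_tokens": 600},
    "choices": [{"finish_reason": "stop",
                 "message": {"content": "ok", "reasoning_content": "why"}}],
}
record = to_llm_response(sample, latency_ms=1250)
```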