id: https://nde.nl/ontology/hc/class/LLMResponse
name: llm_response_class
title: LLM Response Class
version: 1.0.0

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
  xsd: http://www.w3.org/2001/XMLSchema#

imports:
  - linkml:types
  - ../metadata
  - ./SpecificityAnnotation
  - ./TemplateSpecificityScores
  - ../enums/LLMProviderEnum
  - ../enums/FinishReasonEnum
  - ../enums/ThinkingModeEnum
  - ../slots/content
  - ../slots/reasoning_content
  - ../slots/model
  - ../slots/provider
  - ../slots/prompt_token
  - ../slots/completion_token
  - ../slots/total_token
  - ../slots/cached_token
  - ../slots/finish_reason
  - ../slots/latency_ms
  - ../slots/thinking_mode
  - ../slots/clear_thinking
  - ../slots/created
  - ../slots/cost_usd
  - ../slots/request_id
  - ../slots/specificity_annotation
  - ../slots/template_specificity

default_range: string
classes:
  LLMResponse:
    class_uri: prov:Activity
    description: |
      Provenance metadata for LLM API responses, including GLM 4.7 Thinking Modes.

      Captures complete response metadata from LLM providers (ZhipuAI GLM, Anthropic,
      OpenAI, etc.) for traceability and analysis. The key innovation is capturing
      `reasoning_content` - the chain-of-thought reasoning that GLM 4.7 exposes
      through its three thinking modes.

      **GLM 4.7 Thinking Modes** (https://docs.z.ai/guides/capabilities/thinking-mode):

      1. **Interleaved Thinking** (default, since GLM-4.5):
         - Model thinks between tool calls and after receiving tool results
         - Enables complex, step-by-step reasoning with tool chaining
         - Returns `reasoning_content` alongside `content` in every response

      2. **Preserved Thinking** (new in GLM-4.7):
         - Retains reasoning_content from previous assistant turns in context
         - Preserves reasoning continuity across multi-turn conversations
         - Improves model performance and increases cache hit rates
         - **Enabled by default on the Coding Plan endpoint**
         - Requires returning EXACT, UNMODIFIED reasoning_content back to the API
         - Set via: `"clear_thinking": false` (do NOT clear previous reasoning)

      3. **Turn-level Thinking** (new in GLM-4.7):
         - Control reasoning computation on a per-turn basis
         - Enable/disable thinking independently for each request in a session
         - Useful for balancing speed (simple queries) vs accuracy (complex tasks)
         - Set via: `"thinking": {"type": "enabled"}` or `"thinking": {"type": "disabled"}`
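
      The modes map onto request fields roughly as follows. This is an
      illustrative sketch only: exact field placement may vary per SDK, and only
      `thinking` and `clear_thinking` are taken from the mode descriptions above.

      ```yaml
      # Turn-level Thinking: disable reasoning for a fast, simple query
      thinking:
        type: disabled
      ---
      # Preserved Thinking: keep prior reasoning_content in context
      thinking:
        type: enabled
      clear_thinking: false
      ```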

      **Critical Implementation Note for Preserved Thinking**:
      When using Preserved Thinking with tool calls, thinking blocks MUST be:
      1. Explicitly preserved in the messages array
      2. Returned together with tool results
      3. Kept in EXACT original sequence (no reordering/editing)

      **PROV-O Alignment**:
      - LLMResponse IS a prov:Activity (the inference process)
      - content IS a prov:Entity (the generated output)
      - model/provider IS a prov:Agent (the AI system)
      - reasoning_content documents the prov:Plan (how the agent reasoned)
      - the prompt (input) is linked via prov:used (input to the activity)

      **Use Cases**:
      - DSPy RAG responses with reasoning traces
      - Heritage institution extraction provenance
      - LinkML schema conformity validation
      - Ontology mapping decision logs
      - Multi-turn agent conversations with preserved context
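
      The use cases above all consume instances shaped like the following
      hypothetical example (values taken from the slot examples in this schema):

      ```yaml
      content: The Rijksmuseum is a national museum in Amsterdam dedicated to Dutch arts and history.
      reasoning_content: 'The user is asking about Dutch heritage institutions. I need to identify: ...'
      model: glm-4.7
      provider: zai
      created: '2025-12-23T10:30:00Z'
      prompt_token: 150
      completion_token: 450
      total_token: 600
      finish_reason: stop
      thinking_mode: preserved
      clear_thinking: false
      cost_usd: 0.0
      ```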
    exact_mappings:
      - prov:Activity
    close_mappings:
      - schema:Action
      - schema:CreativeWork
    slots:
      - cached_token
      - clear_thinking
      - completion_token
      - content
      - cost_usd
      - created
      - finish_reason
      - latency_ms
      - model
      - prompt_token
      - provider
      - reasoning_content
      - request_id
      - specificity_annotation
      - template_specificity
      - thinking_mode
      - total_token
    slot_usage:
      content:
        description: |
          The final LLM response text (message.content from the API response).
          PROV-O: prov:generated - the entity produced by this activity.

          This is the primary output shown to users and used for downstream processing.
        slot_uri: prov:generated
        range: string
        required: true
        examples:
          - value: The Rijksmuseum is a national museum in Amsterdam dedicated to Dutch
              arts and history.
            description: Extracted heritage institution description
      reasoning_content:
        description: |
          Interleaved Thinking - the model's chain-of-thought reasoning.
          PROV-O: prov:hadPlan - documents HOW the agent reasoned.

          **GLM 4.7 Interleaved Thinking**:
          GLM 4.7 returns `reasoning_content` in every response, exposing the
          model's step-by-step reasoning process. This enables:

          1. **Schema Validation**: Model reasons about LinkML constraints before generating output
          2. **Ontology Mapping**: Explicit reasoning about CIDOC-CRM, CPOV, TOOI class mappings
          3. **RDF Quality**: Chain-of-thought validates triple construction
          4. **Transparency**: Full audit trail of extraction decisions

          May be null for providers that don't expose reasoning (Claude, GPT-4).
        slot_uri: prov:hadPlan
        range: string
        required: false
        examples:
          - value: 'The user is asking about Dutch heritage institutions. I need to
              identify: 1) Institution name: Rijksmuseum, 2) Type: Museum (maps to InstitutionTypeEnum.MUSEUM),
              3) Location: Amsterdam (city in Noord-Holland province)...'
            description: GLM 4.7 interleaved thinking showing explicit schema reasoning
      model:
        description: |
          The LLM model identifier from the API response.
          PROV-O: Part of prov:wasAssociatedWith - identifies the specific model version.

          Common values:
          - glm-4.7: ZhipuAI GLM 4.7 (with Interleaved Thinking)
          - glm-4.6: ZhipuAI GLM 4.6
          - claude-3-opus-20240229: Anthropic Claude 3 Opus
          - gpt-4-turbo: OpenAI GPT-4 Turbo
        slot_uri: schema:softwareVersion
        range: string
        required: true
        examples:
          - value: glm-4.7
            description: ZhipuAI GLM 4.7 with Interleaved Thinking
      provider:
        description: |
          The LLM provider/platform.
          PROV-O: prov:wasAssociatedWith - the agent (organization) providing the model.

          Used by DSPy to route requests and track provider-specific behavior.
        slot_uri: prov:wasAssociatedWith
        range: LLMProviderEnum
        required: true
        examples:
          - value: zai
            description: ZhipuAI (Z.AI) - GLM models
      request_id:
        description: |
          Unique request ID from the LLM provider API (for tracing/debugging).
          Enables correlation with provider logs for troubleshooting.
        slot_uri: dct:identifier
        range: string
        required: false
        examples:
          - value: req_8f3a2b1c4d5e6f7g
            description: Provider-assigned request identifier
      created:
        description: |
          Timestamp when the LLM response was generated (from the API response).
          PROV-O: prov:endedAtTime - when the inference activity completed.
        slot_uri: prov:endedAtTime
        range: datetime
        required: true
        examples:
          - value: '2025-12-23T10:30:00Z'
            description: UTC timestamp of response generation
      prompt_token:
        description: |
          Number of tokens in the input prompt.
          From API response: usage.prompt_tokens
        slot_uri: schema:value
        range: integer
        minimum_value: 0
        examples:
          - value: 150
            description: 150 tokens in the input prompt
      completion_token:
        description: |
          Number of tokens in the model's response (content + reasoning_content).
          From API response: usage.completion_tokens

          Note: For GLM 4.7, this includes tokens from both content and reasoning_content.
        slot_uri: schema:value
        range: integer
        minimum_value: 0
        examples:
          - value: 450
            description: 450 tokens in the completion (content + reasoning)
      total_token:
        description: |
          Total tokens used (prompt + completion).
          From API response: usage.total_tokens
        slot_uri: schema:value
        range: integer
        minimum_value: 0
        examples:
          - value: 600
            description: 600 total tokens (150 prompt + 450 completion)
      cached_token:
        description: |
          Number of prompt tokens served from cache (if the provider supports caching).
          From API response: usage.prompt_tokens_details.cached_tokens

          Cached tokens typically have reduced cost and latency.
        slot_uri: schema:value
        range: integer
        minimum_value: 0
        required: false
        examples:
          - value: 50
            description: 50 tokens served from the provider's prompt cache
      finish_reason:
        description: |
          Why the model stopped generating (from the API response).

          Common values:
          - stop: Natural completion (hit stop token)
          - length: Hit max_tokens limit
          - tool_calls: Model invoked a tool (function calling)
          - content_filter: Response filtered for safety
        slot_uri: schema:status
        range: FinishReasonEnum
        required: false
        examples:
          - value: stop
            description: Model completed naturally
      latency_ms:
        description: |
          Response latency in milliseconds (time from request to response).
          Measured client-side (includes network time).
        slot_uri: schema:duration
        range: integer
        minimum_value: 0
        required: false
        examples:
          - value: 1250
            description: 1.25 seconds total response time
      cost_usd:
        description: |
          Estimated cost in USD for this LLM call.

          For Z.AI Coding Plan: $0.00 (free tier for GLM models)
          For other providers: calculated from token counts and pricing
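
          As an illustrative sketch (the per-1K-token prices here are
          hypothetical, not any provider's actual rates):

          ```yaml
          # cost_usd = (prompt_token / 1000) * input_price_per_1k
          #          + (completion_token / 1000) * output_price_per_1k
          # e.g. (150 / 1000) * 0.01 + (450 / 1000) * 0.03 = 0.015
          ```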
        slot_uri: schema:price
        range: float
        minimum_value: 0.0
        required: false
        examples:
          - value: 0.0
            description: Free (Z.AI Coding Plan)
          - value: 0.015
            description: OpenAI GPT-4 Turbo cost estimate
      thinking_mode:
        description: |
          The GLM 4.7 thinking mode used for this request.

          **Available Modes**:
          - **enabled**: Thinking enabled (default) - model reasons before responding
          - **disabled**: Thinking disabled - faster responses, no reasoning_content
          - **interleaved**: Interleaved thinking - think between tool calls (default behavior)
          - **preserved**: Preserved thinking - retain reasoning across turns (Coding Plan default)
        slot_uri: schema:actionOption
        range: ThinkingModeEnum
        required: false
        examples:
          - value: preserved
            description: Preserved thinking for multi-turn agent conversations
          - value: interleaved
            description: Default interleaved thinking between tool calls
          - value: disabled
            description: Disabled for fast, simple queries
      clear_thinking:
        description: |
          Whether to clear previous reasoning_content from the context.

          **Preserved Thinking Control**:
          - **false**: Preserved Thinking enabled (keep reasoning, better cache hits)
          - **true**: Clear previous reasoning (default for the standard API)

          **Z.AI Coding Plan**: Default is `false` (Preserved Thinking enabled)

          **Critical Implementation Note**:
          When clear_thinking is false, you MUST return the EXACT, UNMODIFIED
          reasoning_content back to the API in subsequent turns.
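
          A follow-up turn under Preserved Thinking might look like this sketch
          (the message-array shape is assumed from typical chat-completions
          APIs; only `clear_thinking` and `reasoning_content` come from this
          schema):

          ```yaml
          clear_thinking: false
          messages:
            - role: user
              content: Describe the Rijksmuseum.
            - role: assistant
              content: The Rijksmuseum is a national museum in Amsterdam...
              reasoning_content: 'The user is asking about Dutch heritage institutions...'  # EXACT, unmodified copy
            - role: user
              content: In which province is it located?
          ```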
        slot_uri: schema:Boolean
        range: boolean
        required: false
        examples:
          - value: false
            description: Keep reasoning for Preserved Thinking (recommended)
          - value: true
            description: Clear previous reasoning (fresh context each turn)
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
    comments:
      - reasoning_content is the key field for Interleaved Thinking (GLM 4.7)
      - Store reasoning_content for debugging, auditing, and DSPy optimization
      - 'Z.AI Coding Plan endpoint: https://api.z.ai/api/coding/paas/v4/chat/completions'
      - 'For DSPy: use LLMResponse to track all LLM calls in the pipeline'
      - See AGENTS.md Rule 11 for Z.AI API configuration
    see_also:
      - https://www.w3.org/TR/prov-o/
      - https://api.z.ai/docs
      - https://dspy-docs.vercel.app/