glam/docs/MCP_TOOLS_INVESTIGATION.md
2025-11-19 23:25:22 +01:00

13 KiB

MCP Tools Investigation Report

Date: November 13, 2025
Investigator: OpenCode AI
Purpose: Investigate MCP server integration and verify OpenCode properly parses tool results


Executive Summary

OpenCode properly parses MCP tool results from all tested Wikidata and Exa MCP servers. All data formats (strings, JSON objects, JSON arrays) are handled correctly without parsing issues.


Investigation Scope

1. MCP Configuration Review

Location: ~/.config/opencode/opencode.json (34 lines)

Configured MCP Servers:

  1. Exa MCP Server (npx-based, remote web search)

    • Command: npx -y exa-mcp-server
    • Status: Enabled
    • API Key: Configured via environment
  2. Playwright MCP Server (npx-based, browser automation)

    • Command: npx -y @playwright/mcp@latest
    • Status: Enabled
  3. Wikidata Authenticated MCP Server (local Python server)

    • Command: Python interpreter → mcp_servers/wikidata_auth/server.py
    • Status: Enabled
    • Authentication: OAuth2 Bearer token (5,000 req/hr rate limit)
    • Environment: WIKIDATA_API_TOKEN, WIKIMEDIA_CONTACT_EMAIL

Test Results

Test 1: Simple String Response

Tool: wikidata-authenticated_search_entity
Input: query="Rijksmuseum"
Expected Output: String (entity ID)

Result:

Q190804

Status: PASS - Plain string parsed correctly


Test 2: JSON Object Response

Tool: wikidata-authenticated_get_metadata
Input: entity_id="Q190804", language="en"
Expected Output: JSON object with label and description

Result:

{
  "Label": "Rijksmuseum",
  "Description": "museum in Amsterdam, Netherlands"
}

Status: PASS - JSON object structure preserved


Test 3: Complex JSON Object Response

Tool: wikidata-authenticated_get_identifiers
Input: entity_id="Q190804"
Expected Output: JSON object with multiple identifier fields

Result:

{
  "ISIL": "NL-RIJ",
  "VIAF": "159624082",
  "GND": "1009452-0",
  "LCNAF": "n79007489",
  "GeoNames": "6884785",
  "official_website": "https://www.rijksmuseum.nl/",
  "BabelNet": "01979716n"
}

Status: PASS - Multi-field JSON object parsed correctly with URLs intact


Test 4: Long List Response

Tool: wikidata-authenticated_get_properties
Input: entity_id="Q190804"
Expected Output: List of property IDs (170+ items)

Result (excerpt):

P18
P31
P856
P488
P1174
... (170+ property IDs)

Status: PASS - Large list returned without truncation, newline-separated format


Test 5: JSON Array Response (SPARQL)

Tool: wikidata-authenticated_execute_sparql
Input: SPARQL query for ISIL code "NL-RIJ"
Expected Output: JSON array with SPARQL bindings

Result:

[
  {
    "item": {
      "type": "uri",
      "value": "http://www.wikidata.org/entity/Q190804"
    },
    "itemLabel": {
      "xml:lang": "en",
      "type": "literal",
      "value": "Rijksmuseum"
    }
  }
]

Status: PASS - Nested JSON structure with typed fields preserved correctly


Test 6: Web Search Context (Exa MCP)

Tool: exa_web_search_exa
Input: query="Wikidata MCP server integration best practices", numResults=3
Expected Output: Array of search results with title, author, URL, text

Result: 3 results returned with complete metadata:

  • Skywork AI article on Wikidata MCP Server
  • Merge.dev article on MCP best practices (security, schema enforcement)
  • ModelContextProtocol.info architectural guide

Status: PASS - Complex web content with markdown formatting parsed correctly


Analysis: OpenCode MCP Parsing Capabilities

Supported Data Types

OpenCode successfully parses the following MCP tool response formats:

Format Example Use Case Test Status
Plain String Entity IDs, simple values PASS
JSON Object Metadata, identifiers PASS
JSON Array SPARQL results, lists PASS
Newline-separated list Property IDs (170+ items) PASS
Nested JSON Complex SPARQL bindings PASS
Long-form text with markdown Web search results PASS

Key Observations

  1. No Truncation Issues: Large responses (170+ property IDs) handled without truncation
  2. Structure Preservation: Nested JSON objects maintain hierarchy and typing
  3. URL Safety: URLs in JSON fields (official_website) remain intact
  4. Unicode Handling: Special characters (xml:lang attributes) parsed correctly
  5. Markdown Support: Web content with markdown formatting preserved

MCP Server Implementation Review

Wikidata Authenticated MCP Server

Architecture: Hybrid API approach

  • Action API: Search and write operations (500 req/hr)
  • REST API: Data retrieval (5,000 req/hr with OAuth2)
  • SPARQL Endpoint: Query execution (separate rate limits)

Available Tools:

  1. search_entity(query) → Q-number
  2. search_property(query) → P-number
  3. get_properties(entity_id) → List of property IDs
  4. get_metadata(entity_id, language) → Label + description
  5. get_identifiers(entity_id) → ISIL, VIAF, GND, etc.
  6. execute_sparql(query) → SPARQL results (JSON)
  7. create_entity(labels, descriptions, aliases) → New Q-number (write)
  8. edit_entity(entity_id, ...) → Edit confirmation (write)
  9. add_claim(entity_id, property_id, value, value_type) → Claim ID (write)

Authentication:

  • OAuth2 Bearer token configured in environment
  • CSRF token retrieval for write operations
  • Automatic fallback to anonymous access if OAuth fails (403 errors)

Error Handling:

  • Detects invalid OAuth tokens (mwoauth-invalid-authorization-invalid-user)
  • Retries without authentication on 403 errors
  • Returns user-friendly error messages ("No results found. Consider changing the search term.")

Comparison: Authenticated vs. Basic Wikidata MCP

Project has TWO Wikidata MCP implementations:

Feature mcp-wikidata/ (Basic) mcp_servers/wikidata_auth/ (Enhanced)
Authentication None OAuth2 Bearer token
Rate Limit (Read) 500 req/hr 5,000 req/hr
Write Operations Not supported create_entity, edit_entity, add_claim
Identifier Extraction No dedicated tool get_identifiers() tool
API Strategy Action API only Hybrid (Action + REST API)
Error Recovery Basic OAuth fallback + retry logic
User-Agent Policy Fixed ("foobar") Wikimedia-compliant (contact email)
OpenCode Integration Not configured Configured in opencode.json

Recommendation: Use the authenticated version (mcp_servers/wikidata_auth/) for production GLAM data extraction.


Recommendations

1. OpenCode Integration Status

No action required - OpenCode correctly parses all MCP tool response formats tested.

2. MCP Tool Usage Best Practices

Based on investigation and MCP best practices documentation:

Security:

  • Wikidata MCP uses OAuth2 authentication (secure)
  • Access control layers (ACLs) via Wikidata permissions
  • Schema enforcement via tool input validation
  • ⚠️ TODO: Add rate limiting middleware (current: trust Wikidata API limits)

Architecture:

  • Single responsibility (Wikidata-only server)
  • Fail-safe design (OAuth fallback on 403 errors)
  • ⚠️ Circuit breaker pattern not implemented (could add for SPARQL endpoint stability)

Configuration:

  • Environment-based configuration (API token, contact email)
  • Timeout configured (15 seconds for Wikidata server)

3. GLAM Project Integration

Current Status: Wikidata MCP tools are production-ready for heritage institution enrichment workflows:

Use Cases:

  1. GHCID Collision Resolution:

    • Use search_entity() to find Q-numbers for institutions
    • Use get_identifiers() to extract ISIL codes
  2. Wikidata Enrichment:

    • Use get_metadata() to validate institution names
    • Use execute_sparql() to query institutions by location/type
  3. Wikidata Creation (when authorized):

    • Use create_entity() to add missing heritage institutions
    • Use add_claim() to add ISIL codes, locations, instance-of claims

Example Workflow:

# Step 1: Search for institution
search_entity("Amsterdam Museum") → Q1997238

# Step 2: Get identifiers
get_identifiers("Q1997238") → {"ISIL": "NL-AsdAM", "VIAF": "..."}

# Step 3: Verify metadata
get_metadata("Q1997238", "nl") → {"Label": "Amsterdam Museum", "Description": "..."}

# Step 4: Extract GHCID components
# Country: NL (from location)
# Province: NH (Noord-Holland)
# City: AMS (Amsterdam)
# Type: M (Museum)
# Suffix: -Q1997238 (if collision detected)
# Result: NL-NH-AMS-M-Q1997238

Known Limitations

1. Rate Limits

  • Action API: 500 req/hr (search, write operations)
  • REST API: 5,000 req/hr (with OAuth2 token)
  • SPARQL Endpoint: Custom limits (not documented in MCP server)

Mitigation: Implement local caching for repeated queries

2. Write Operations Require Unified Login

Error: mwoauth-invalid-authorization-invalid-user

Cause: Wikimedia account must activate "Unified login" to use OAuth2 for writes

Fix: Visit https://meta.wikimedia.org/wiki/Help:Unified_login and activate

3. Property ID Retrieval Format

get_properties() returns newline-separated list, not JSON array.

Impact: Requires splitting by newline when parsing in scripts

# Correct parsing:
properties = result.split('\n')  # ['P18', 'P31', ...]

# Incorrect (expecting JSON array):
properties = json.loads(result)  # Fails!

Conclusion

OpenCode's MCP integration is robust and production-ready. All tested response formats (strings, JSON objects, JSON arrays, long lists, nested structures) are parsed correctly without data loss or formatting issues.

The Wikidata Authenticated MCP Server (mcp_servers/wikidata_auth/) provides a comprehensive toolkit for heritage institution data enrichment with proper authentication, error handling, and Wikimedia policy compliance.

Next Steps for GLAM Project:

  1. Continue using Wikidata MCP tools for enrichment workflows
  2. ⚠️ Add local caching layer for frequently-queried entities (reduce API calls)
  3. ⚠️ Implement circuit breaker for SPARQL endpoint (prevent cascade failures)
  4. ⚠️ Document rate limit handling strategy in AGENTS.md

References


Status: Investigation Complete
OpenCode MCP Parsing: No issues found
Wikidata MCP Server: Production-ready

~/.config/opencode/opencode.json content:

{ "$schema": "https://opencode.ai/config.json", "mcp": { "exa": { "type": "local", "command": ["npx", "-y", "exa-mcp-server"], "enabled": true, "environment": { "EXA_API_KEY": "dba69040-f87e-46a2-85d7-5b2e9fe17497" } }, "playwright": { "type": "local", "command": ["npx", "-y", "@playwright/mcp@latest"], "enabled": true }, "wikidata-authenticated": { "type": "local", "command": [ "/Users/kempersc/apps/glam/mcp_servers/wikidata_auth/.venv/bin/python", "-u", "/Users/kempersc/apps/glam/mcp_servers/wikidata_auth/server.py" ], "enabled": true, "timeout": 15000, "environment": { "WIKIDATA_API_TOKEN": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJhdWQiOiI4MjFjMmRlZTdiOGIyZWFiNWZhMmFjZDVjZDk1NWEyNyIsImp0aSI6ImVmYjcwMmE1ZDNjNzg5MTliZmFmYjcxMDFkNzcwZjE1Nzg4NDJmZTQ2OTc1MWNkMTE5ZDczNGVlMTAxZWM5OTA1OTU4YmNiZDdjZDdiNmRiIiwiaWF0IjoxNzYyODUzODAxLjY5NDM0NCwibmJmIjoxNzYyODUzODAxLjY5NDM0NSwiZXhwIjozMzMxOTc2MjYwMS42OTE4MjYsInN1YiI6IjgwNDIyNTk5IiwiaXNzIjoiaHR0cHM6Ly9tZXRhLndpa2ltZWRpYS5vcmciLCJyYXRlbGltaXQiOnsicmVxdWVzdHNfcGVyX3VuaXQiOjUwMDAsInVuaXQiOiJIT1VSIn0sInNjb3BlcyI6WyJiYXNpYyIsImNyZWF0ZWVkaXRtb3ZlcGFnZSJdfQ.tUjnsHMv7448rGCUxD2SqwOqGX_qpX1r5dKq1v57zZG30ge2Oid60d0ANopXyJ5IaHdyVyuKRpZ8wZ4qDEJ8y-Jh0kdmmFqPE0S39z1Gm3ofJQtilt1MFTC3cRqdICZ03zznAB70RADJ9MwV2f7WmG8VakIJIUa8SOsLGmgosShgKfTxFj9OzPPqWESyABAUBzgT7Dot6LiEwry7XzEoztv8f1GlcUJJGnG9I8YLqCwHXbTobhQiakU8xDmesK9cVHOMZTV-bMFHDhwisqkyqrbcMQ1TTswg0Mhk2SQYexfcU40l6YUVZixN9i7ux3SRhfCC3z098JrKxnYcBAGlSoObS2MHcFShYhvkOFSMByYwNIfqLQX0QHUnvXDr0rapO3dYjHsruuLuBP0RO4et1M4Jb9J35jcql3x27s7fwSc6INBCHqB5WGhAYQ9RWnSqIP2rI_k6M9RADJIF4_xfBOp8fzTZCJpHwg26NHordES-NnLi3OPY8aYvz4zjfRoM2VFr85afc_6c_2JvroEvMjpBxwkQ9GsXbynOdipm9N3TdwD2Lcnh-3wqt5citOb9fpNYqU4bT_6NrVZSuyzZgbp6RhFhTw0g-PpPFTVz_RrYVmZs54QNVyqH6Y6g7ciwWx-RbdiGigLwc2ldH-QxQOeq7sLA_1wlcv-tJWLvzKc", "WIKIMEDIA_CONTACT_EMAIL": "textpast@textpast.com", "PYTHONUNBUFFERED": "1" } } } }