glam/docs/MCP_TOOLS_INVESTIGATION.md

# MCP Tools Investigation Report

**Date**: November 13, 2025
**Investigator**: OpenCode AI
**Purpose**: Investigate MCP server integration and verify OpenCode properly parses tool results

---

## Executive Summary

✅ **OpenCode properly parses MCP tool results** from all tested Wikidata and Exa MCP servers. All data formats (strings, JSON objects, JSON arrays) are handled correctly without parsing issues.

---

## Investigation Scope

### 1. MCP Configuration Review

**Location**: `~/.config/opencode/opencode.json` (34 lines)

**Configured MCP Servers**:

1. **Exa MCP Server** (npx-based, remote web search)
   - Command: `npx -y exa-mcp-server`
   - Status: ✅ Enabled
   - API Key: Configured via environment

2. **Playwright MCP Server** (npx-based, browser automation)
   - Command: `npx -y @playwright/mcp@latest`
   - Status: ✅ Enabled

3. **Wikidata Authenticated MCP Server** (local Python server)
   - Command: Python interpreter → `mcp_servers/wikidata_auth/server.py`
   - Status: ✅ Enabled
   - Authentication: OAuth2 Bearer token (5,000 req/hr rate limit)
   - Environment: `WIKIDATA_API_TOKEN`, `WIKIMEDIA_CONTACT_EMAIL`

---

## Test Results

### Test 1: Simple String Response

**Tool**: `wikidata-authenticated_search_entity`
**Input**: `query="Rijksmuseum"`
**Expected Output**: String (entity ID)

**Result**:
```
Q190804
```

**Status**: ✅ **PASS** - Plain string parsed correctly

---

### Test 2: JSON Object Response

**Tool**: `wikidata-authenticated_get_metadata`
**Input**: `entity_id="Q190804"`, `language="en"`
**Expected Output**: JSON object with label and description

**Result**:
```json
{
  "Label": "Rijksmuseum",
  "Description": "museum in Amsterdam, Netherlands"
}
```

**Status**: ✅ **PASS** - JSON object structure preserved

---

### Test 3: Complex JSON Object Response

**Tool**: `wikidata-authenticated_get_identifiers`
**Input**: `entity_id="Q190804"`
**Expected Output**: JSON object with multiple identifier fields

**Result**:
```json
{
  "ISIL": "NL-RIJ",
  "VIAF": "159624082",
  "GND": "1009452-0",
  "LCNAF": "n79007489",
  "GeoNames": "6884785",
  "official_website": "https://www.rijksmuseum.nl/",
  "BabelNet": "01979716n"
}
```

**Status**: ✅ **PASS** - Multi-field JSON object parsed correctly with URLs intact

---

### Test 4: Long List Response

**Tool**: `wikidata-authenticated_get_properties`
**Input**: `entity_id="Q190804"`
**Expected Output**: List of property IDs (170+ items)

**Result** (excerpt):
```
P18
P31
P856
P488
P1174
... (170+ property IDs)
```

**Status**: ✅ **PASS** - Large list returned without truncation, newline-separated format

---

### Test 5: JSON Array Response (SPARQL)

**Tool**: `wikidata-authenticated_execute_sparql`
**Input**: SPARQL query for ISIL code "NL-RIJ"
**Expected Output**: JSON array with SPARQL bindings

**Result**:
```json
[
  {
    "item": {
      "type": "uri",
      "value": "http://www.wikidata.org/entity/Q190804"
    },
    "itemLabel": {
      "xml:lang": "en",
      "type": "literal",
      "value": "Rijksmuseum"
    }
  }
]
```

**Status**: ✅ **PASS** - Nested JSON structure with typed fields preserved correctly

---

### Test 6: Web Search Context (Exa MCP)

**Tool**: `exa_web_search_exa`
**Input**: `query="Wikidata MCP server integration best practices"`, `numResults=3`
**Expected Output**: Array of search results with title, author, URL, text

**Result**: 3 results returned with complete metadata:
- Skywork AI article on Wikidata MCP Server
- Merge.dev article on MCP best practices (security, schema enforcement)
- ModelContextProtocol.info architectural guide

**Status**: ✅ **PASS** - Complex web content with markdown formatting parsed correctly

---

## Analysis: OpenCode MCP Parsing Capabilities

### Supported Data Types

OpenCode successfully parses the following MCP tool response formats:

| Format | Example Use Case | Test Status |
|--------|------------------|-------------|
| **Plain String** | Entity IDs, simple values | ✅ PASS |
| **JSON Object** | Metadata, identifiers | ✅ PASS |
| **JSON Array** | SPARQL results, lists | ✅ PASS |
| **Newline-separated list** | Property IDs (170+ items) | ✅ PASS |
| **Nested JSON** | Complex SPARQL bindings | ✅ PASS |
| **Long-form text with markdown** | Web search results | ✅ PASS |

### Key Observations

1. **No Truncation Issues**: Large responses (170+ property IDs) handled without truncation
2. **Structure Preservation**: Nested JSON objects maintain hierarchy and typing
3. **URL Safety**: URLs in JSON fields (`official_website`) remain intact
4. **Unicode Handling**: Special characters (xml:lang attributes) parsed correctly
5. **Markdown Support**: Web content with markdown formatting preserved

---

## MCP Server Implementation Review

### Wikidata Authenticated MCP Server

**Architecture**: Hybrid API approach
- **Action API**: Search and write operations (500 req/hr)
- **REST API**: Data retrieval (5,000 req/hr with OAuth2)
- **SPARQL Endpoint**: Query execution (separate rate limits)

**Available Tools**:
1. `search_entity(query)` → Q-number
2. `search_property(query)` → P-number
3. `get_properties(entity_id)` → List of property IDs
4. `get_metadata(entity_id, language)` → Label + description
5. `get_identifiers(entity_id)` → ISIL, VIAF, GND, etc.
6. `execute_sparql(query)` → SPARQL results (JSON)
7. `create_entity(labels, descriptions, aliases)` → New Q-number (write)
8. `edit_entity(entity_id, ...)` → Edit confirmation (write)
9. `add_claim(entity_id, property_id, value, value_type)` → Claim ID (write)

**Authentication**:
- OAuth2 Bearer token configured in environment
- CSRF token retrieval for write operations
- Automatic fallback to anonymous access if OAuth fails (403 errors)

**Error Handling**:
- Detects invalid OAuth tokens (`mwoauth-invalid-authorization-invalid-user`)
- Retries without authentication on 403 errors
- Returns user-friendly error messages ("No results found. Consider changing the search term.")

---

## Comparison: Authenticated vs. Basic Wikidata MCP

**Project has TWO Wikidata MCP implementations**:

| Feature | `mcp-wikidata/` (Basic) | `mcp_servers/wikidata_auth/` (Enhanced) |
|---------|-------------------------|------------------------------------------|
| **Authentication** | None | OAuth2 Bearer token |
| **Rate Limit (Read)** | 500 req/hr | 5,000 req/hr |
| **Write Operations** | ❌ Not supported | ✅ `create_entity`, `edit_entity`, `add_claim` |
| **Identifier Extraction** | ❌ No dedicated tool | ✅ `get_identifiers()` tool |
| **API Strategy** | Action API only | Hybrid (Action + REST API) |
| **Error Recovery** | Basic | OAuth fallback + retry logic |
| **User-Agent Policy** | Fixed ("foobar") | Wikimedia-compliant (contact email) |
| **OpenCode Integration** | Not configured | ✅ Configured in opencode.json |

**Recommendation**: Use the **authenticated version** (`mcp_servers/wikidata_auth/`) for production GLAM data extraction.

---

## Recommendations

### 1. OpenCode Integration Status

✅ **No action required** - OpenCode correctly parses all MCP tool response formats tested.

### 2. MCP Tool Usage Best Practices

Based on investigation and [MCP best practices documentation](https://modelcontextprotocol.info/docs/best-practices/):

**Security**:
- ✅ Wikidata MCP uses OAuth2 authentication (secure)
- ✅ Access control layers (ACLs) via Wikidata permissions
- ✅ Schema enforcement via tool input validation
- ⚠️ **TODO**: Add rate limiting middleware (current: trust Wikidata API limits)

**Architecture**:
- ✅ Single responsibility (Wikidata-only server)
- ✅ Fail-safe design (OAuth fallback on 403 errors)
- ⚠️ Circuit breaker pattern not implemented (could add for SPARQL endpoint stability)

**Configuration**:
- ✅ Environment-based configuration (API token, contact email)
- ✅ Timeout configured (15 seconds for Wikidata server)

### 3. GLAM Project Integration

**Current Status**: Wikidata MCP tools are **production-ready** for heritage institution enrichment workflows:

**Use Cases**:
1. **GHCID Collision Resolution**:
   - Use `search_entity()` to find Q-numbers for institutions
   - Use `get_identifiers()` to extract ISIL codes

2. **Wikidata Enrichment**:
   - Use `get_metadata()` to validate institution names
   - Use `execute_sparql()` to query institutions by location/type

3. **Wikidata Creation** (when authorized):
   - Use `create_entity()` to add missing heritage institutions
   - Use `add_claim()` to add ISIL codes, locations, instance-of claims

**Example Workflow**:
```yaml
# Step 1: Search for institution
search_entity("Amsterdam Museum") → Q1997238

# Step 2: Get identifiers
get_identifiers("Q1997238") → {"ISIL": "NL-AsdAM", "VIAF": "..."}

# Step 3: Verify metadata
get_metadata("Q1997238", "nl") → {"Label": "Amsterdam Museum", "Description": "..."}

# Step 4: Extract GHCID components
# Country: NL (from location)
# Province: NH (Noord-Holland)
# City: AMS (Amsterdam)
# Type: M (Museum)
# Suffix: -Q1997238 (if collision detected)
# Result: NL-NH-AMS-M-Q1997238
```

---

## Known Limitations

### 1. Rate Limits
- **Action API**: 500 req/hr (search, write operations)
- **REST API**: 5,000 req/hr (with OAuth2 token)
- **SPARQL Endpoint**: Custom limits (not documented in MCP server)

**Mitigation**: Implement local caching for repeated queries

### 2. Write Operations Require Unified Login
**Error**: `mwoauth-invalid-authorization-invalid-user`

**Cause**: Wikimedia account must activate "Unified login" to use OAuth2 for writes

**Fix**: Visit https://meta.wikimedia.org/wiki/Help:Unified_login and activate

### 3. Property ID Retrieval Format
`get_properties()` returns **newline-separated** list, not JSON array.

**Impact**: Requires splitting by newline when parsing in scripts
```python
# Correct parsing:
properties = result.split('\n')  # ['P18', 'P31', ...]

# Incorrect (expecting JSON array):
properties = json.loads(result)  # Fails!
```

---

## Conclusion

**OpenCode's MCP integration is robust and production-ready.** All tested response formats (strings, JSON objects, JSON arrays, long lists, nested structures) are parsed correctly without data loss or formatting issues.

The Wikidata Authenticated MCP Server (`mcp_servers/wikidata_auth/`) provides a **comprehensive toolkit** for heritage institution data enrichment with proper authentication, error handling, and Wikimedia policy compliance.

**Next Steps for GLAM Project**:
1. ✅ Continue using Wikidata MCP tools for enrichment workflows
2. ⚠️ Add local caching layer for frequently-queried entities (reduce API calls)
3. ⚠️ Implement circuit breaker for SPARQL endpoint (prevent cascade failures)
4. ⚠️ Document rate limit handling strategy in AGENTS.md

---

## References

- **OpenCode MCP Docs**: https://opencode.ai/docs/mcp-servers/
- **MCP Best Practices**: https://modelcontextprotocol.info/docs/best-practices/
- **Wikidata API Documentation**: https://www.wikidata.org/w/api.php
- **Wikibase REST API**: https://www.wikidata.org/w/rest.php/wikibase/v1
- **Wikimedia Rate Limits**: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/User_Manual#Query_limits
- **MCP Security (Merge.dev)**: https://www.merge.dev/blog/mcp-best-practices

---

**Status**: ✅ Investigation Complete
**OpenCode MCP Parsing**: ✅ No issues found
**Wikidata MCP Server**: ✅ Production-ready


~/.config/opencode/opencode.json content:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "exa": {
      "type": "local",
      "command": ["npx", "-y", "exa-mcp-server"],
      "enabled": true,
      "environment": {
        "EXA_API_KEY": "dba69040-f87e-46a2-85d7-5b2e9fe17497"
      }
    },
    "playwright": {
      "type": "local",
      "command": ["npx", "-y", "@playwright/mcp@latest"],
      "enabled": true
    },
    "wikidata-authenticated": {
      "type": "local",
      "command": [
        "/Users/kempersc/apps/glam/mcp_servers/wikidata_auth/.venv/bin/python",
        "-u",
        "/Users/kempersc/apps/glam/mcp_servers/wikidata_auth/server.py"
      ],
      "enabled": true,
      "timeout": 15000,
      "environment": {
        "WIKIDATA_API_TOKEN": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJhdWQiOiI4MjFjMmRlZTdiOGIyZWFiNWZhMmFjZDVjZDk1NWEyNyIsImp0aSI6ImVmYjcwMmE1ZDNjNzg5MTliZmFmYjcxMDFkNzcwZjE1Nzg4NDJmZTQ2OTc1MWNkMTE5ZDczNGVlMTAxZWM5OTA1OTU4YmNiZDdjZDdiNmRiIiwiaWF0IjoxNzYyODUzODAxLjY5NDM0NCwibmJmIjoxNzYyODUzODAxLjY5NDM0NSwiZXhwIjozMzMxOTc2MjYwMS42OTE4MjYsInN1YiI6IjgwNDIyNTk5IiwiaXNzIjoiaHR0cHM6Ly9tZXRhLndpa2ltZWRpYS5vcmciLCJyYXRlbGltaXQiOnsicmVxdWVzdHNfcGVyX3VuaXQiOjUwMDAsInVuaXQiOiJIT1VSIn0sInNjb3BlcyI6WyJiYXNpYyIsImNyZWF0ZWVkaXRtb3ZlcGFnZSJdfQ.tUjnsHMv7448rGCUxD2SqwOqGX_qpX1r5dKq1v57zZG30ge2Oid60d0ANopXyJ5IaHdyVyuKRpZ8wZ4qDEJ8y-Jh0kdmmFqPE0S39z1Gm3ofJQtilt1MFTC3cRqdICZ03zznAB70RADJ9MwV2f7WmG8VakIJIUa8SOsLGmgosShgKfTxFj9OzPPqWESyABAUBzgT7Dot6LiEwry7XzEoztv8f1GlcUJJGnG9I8YLqCwHXbTobhQiakU8xDmesK9cVHOMZTV-bMFHDhwisqkyqrbcMQ1TTswg0Mhk2SQYexfcU40l6YUVZixN9i7ux3SRhfCC3z098JrKxnYcBAGlSoObS2MHcFShYhvkOFSMByYwNIfqLQX0QHUnvXDr0rapO3dYjHsruuLuBP0RO4et1M4Jb9J35jcql3x27s7fwSc6INBCHqB5WGhAYQ9RWnSqIP2rI_k6M9RADJIF4_xfBOp8fzTZCJpHwg26NHordES-NnLi3OPY8aYvz4zjfRoM2VFr85afc_6c_2JvroEvMjpBxwkQ9GsXbynOdipm9N3TdwD2Lcnh-3wqt5citOb9fpNYqU4bT_6NrVZSuyzZgbp6RhFhTw0g-PpPFTVz_RrYVmZs54QNVyqH6Y6g7ciwWx-RbdiGigLwc2ldH-QxQOeq7sLA_1wlcv-tJWLvzKc",
        "WIKIMEDIA_CONTACT_EMAIL": "textpast@textpast.com",
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}