# Digital Platform Discovery Guide

A comprehensive guide for discovering and documenting digital platforms used by heritage custodians.

## Overview

Digital platform discovery is the process of identifying and cataloging the online systems, discovery portals, and digital integrations that heritage institutions use. The resulting record shows how each institution makes its collections accessible online.

## Prerequisites

- Access to FireCrawl scraping tools
- Basic understanding of YAML structure
- Knowledge of common heritage digital platforms

## Discovery Workflow

### Step 1: Initial Website Mapping

Use FireCrawl to map the URLs on the custodian's website, using search terms to surface research and collection pages:

```
Tool: firecrawl_firecrawl_map
Parameters:
  url: https://www.example-archive.nl/
  search: "onderzoeken collecties zoeken about over"
  limit: 50
```

Look for these common page patterns:
- `/onderzoeken` or `/research` - Main research/discovery page
- `/collecties` or `/collections` - Collection information
- `/zoeken` or `/search` - Search interfaces
- `/over-ons` or `/about` - Organization information
- `/contact` - Contact and system information
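
The pattern list above can be applied to the mapped URLs automatically. The following is a minimal sketch, assuming the map step yields a plain list of URL strings; `PAGE_PATTERNS` and `classify_urls` are hypothetical names introduced here for illustration, not part of FireCrawl.

```python
from urllib.parse import urlparse

# Path prefixes from the list above, each mapped to an illustrative label.
PAGE_PATTERNS = {
    "/onderzoeken": "research",
    "/research": "research",
    "/collecties": "collections",
    "/collections": "collections",
    "/zoeken": "search",
    "/search": "search",
    "/over-ons": "about",
    "/about": "about",
    "/contact": "contact",
}


def classify_urls(urls):
    """Group URLs by the first matching path prefix; unmatched URLs are skipped."""
    grouped = {}
    for url in urls:
        path = urlparse(url).path.rstrip("/").lower()
        for prefix, label in PAGE_PATTERNS.items():
            # Match the prefix itself or any page below it, but not "/aboutus".
            if path == prefix or path.startswith(prefix + "/"):
                grouped.setdefault(label, []).append(url)
                break
    return grouped
```

The grouped result gives a short list of candidate pages to scrape in Step 2. Sites that nest these pages under a language prefix such as `/nl/` would need the patterns adjusted.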

### Step 2: Scrape Key Pages

Scrape the pages from Step 1 that are most likely to mention platforms:

```
Tool: firecrawl_firecrawl_scrape
Parameters:
  url: https://www.example-archive.nl/onderzoeken
  formats: ["markdown", "html"]
  onlyMainContent: true
```

### Step 3: Identify Digital Platforms

Look for these platform categories in the scraped content:

| Category | Examples | What to Look For |
|----------|----------|------------------|
| **Collection Management System** | MAIS-Flexis, Adlib, CollectiveAccess | "Powered by", system names in footer, API references |
| **Discovery Portals** | Beeldbank, Archiefstukken, Genealogie | Links to search interfaces, item counts |
| **External Integrations** | Archieven.nl, Europeana, Memorix | Partner logos, integration links, federation mentions |
| **APIs & Data Services** | OAI-PMH, SPARQL, REST APIs | Developer documentation, endpoint URLs |
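
A first pass over the scraped markdown can flag candidate platforms by name. The sketch below assumes the scraped content is available as a single string; `KNOWN_PLATFORMS` and `find_platform_mentions` are hypothetical names, and the name list is only the examples from the table above.

```python
import re

# Example platform names from the table above; extend as needed.
KNOWN_PLATFORMS = {
    "collection_management_system": ["MAIS-Flexis", "Adlib", "CollectiveAccess"],
    "discovery_portal": ["Beeldbank", "Archiefstukken", "Genealogie"],
    "external_integration": ["Archieven.nl", "Europeana", "Memorix"],
    "api_or_data_service": ["OAI-PMH", "SPARQL"],
}


def find_platform_mentions(text):
    """Return {category: [names]} for every known platform name found in text."""
    found = {}
    for category, names in KNOWN_PLATFORMS.items():
        for name in names:
            # Whole-word, case-insensitive match on the literal name.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.setdefault(category, []).append(name)
    return found
```

A name match is only a candidate. Words such as "Genealogie" also appear in ordinary prose, so each hit should be confirmed against the page before it is documented.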

### Step 4: Document Platform Details

For each platform discovered, document:

```yaml
collection_management_system:
  system_name: MAIS-Flexis
  vendor: DE REE Archiefsystemen
  vendor_url: https://www.de-ree.nl/
  version: null  # If unknown
  primary_use: archival_description
  provenance:
    source_url: https://www.example-archive.nl/over-ons
    xpath: /html/body/main/div[2]/p[3]  # Where info was found
    retrieved_on: "2025-01-15T10:30:00Z"
    retrieval_agent: firecrawl
```
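
The `provenance` block can be generated at scrape time so the timestamp format stays consistent. This is a minimal sketch; `make_provenance` is a hypothetical helper, and the optional `now` parameter exists only to make the function testable.

```python
from datetime import datetime, timezone


def make_provenance(source_url, retrieval_agent, xpath=None, now=None):
    """Return a provenance dict with a UTC ISO 8601 timestamp."""
    moment = now or datetime.now(timezone.utc)
    record = {
        "source_url": source_url,
        # Normalize to UTC and use the trailing "Z" form shown in the YAML example.
        "retrieved_on": moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retrieval_agent": retrieval_agent,
    }
    if xpath:
        record["xpath"] = xpath
    return record
```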

### Step 5: Record Item Counts

When a platform displays an item count, record it as an integer in `items_indexed`:

```yaml
auxiliary_digital_platforms:
  - platform_name: Beeldbank
    platform_url: https://beeldbank.example-archive.nl/
    platform_type: DISCOVERY_PORTAL
    content_type: images
    items_indexed: 125000
    description: "Digitized photographs, maps, and visual materials"
```
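
Dutch sites commonly print counts with a period as the thousands separator (for example `252.183`), so the displayed text needs normalizing before it goes into `items_indexed`. The sketch below is a heuristic under that assumption; `parse_item_count` is a hypothetical helper.

```python
import re

# A number with "." or "," thousands separators, or a plain run of digits.
COUNT_RE = re.compile(r"\d{1,3}(?:[.,]\d{3})+|\d+")


def parse_item_count(text):
    """Return the first count in text as an int, or None if no digits are present."""
    match = COUNT_RE.search(text)
    if match is None:
        return None
    # Drop the separators and keep only the digits.
    return int(re.sub(r"\D", "", match.group()))
```

The function takes the first number it sees, so pass it only the text fragment that holds the count. It does not handle decimals or spelled-out quantities such as "1,5 miljoen".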

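### Step 6: Document External Integrations

For each external aggregator or service that the custodian contributes to, record the integration type and, where the source states them, the contributed item count and sync frequency:
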
```yaml
external_platform_integrations:
  - platform_name: Archieven.nl
    integration_type: discovery_aggregator
    integration_url: https://www.archieven.nl/nl/zoeken?mivast=123
    items_contributed: 450000
    sync_frequency: daily
```

## Common Dutch Heritage Platforms

### Collection Management Systems

| System | Vendor | Common Users |
|--------|--------|--------------|
| MAIS-Flexis | DE REE Archiefsystemen | Regional archives |
| Adlib | Axiell | Museums |
| CollectiveAccess | Whirl-i-Gig | Various |
| ArchivesSpace | Lyrasis | Archives |
| Memorix Maior | Picturae | Archives, libraries |

### Discovery Aggregators

| Platform | URL | Description |
|----------|-----|-------------|
| Archieven.nl | archieven.nl | Dutch archival finding aids |
| Delpher | delpher.nl | Digitized newspapers, books |
| Europeana | europeana.eu | European cultural heritage |
| Collectie Nederland | collectienederland.nl | Dutch museum collections |
| Geheugen van Nederland | geheugenvannederland.nl | Digital heritage portal |

### Regional Platforms

| Platform | Region | Type |
|----------|--------|------|
| Geheugen van Drenthe | Drenthe | Regional memory |
| Brabants Historisch Informatie Centrum | Noord-Brabant | Regional archives |
| Beeldbank Amsterdam | Amsterdam | Image archives |

## Provenance Requirements

Every discovery claim MUST include the fields marked YES below; `xpath` is recommended but optional:

| Field | Required | Description |
|-------|----------|-------------|
| `source_url` | YES | URL where platform was discovered |
| `xpath` | RECOMMENDED | XPath to element mentioning platform |
| `retrieved_on` | YES | ISO 8601 timestamp of discovery |
| `retrieval_agent` | YES | Tool used (firecrawl, playwright, manual) |
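
These requirements can be checked mechanically before a record is submitted. The sketch below assumes each provenance block has been loaded into a Python dict; `validate_provenance` is a hypothetical helper that returns errors for missing required fields and warnings for the recommended one.

```python
from datetime import datetime

REQUIRED_FIELDS = ("source_url", "retrieved_on", "retrieval_agent")
KNOWN_AGENTS = {"firecrawl", "playwright", "manual"}


def validate_provenance(prov):
    """Return (errors, warnings) for one provenance dict."""
    errors, warnings = [], []
    for field in REQUIRED_FIELDS:
        if not prov.get(field):
            errors.append(f"missing required field: {field}")
    timestamp = prov.get("retrieved_on")
    if timestamp:
        try:
            # Older Python versions do not parse a trailing "Z", so swap it first.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"retrieved_on is not ISO 8601: {timestamp}")
    agent = prov.get("retrieval_agent")
    if agent and agent not in KNOWN_AGENTS:
        errors.append(f"unknown retrieval_agent: {agent}")
    if not prov.get("xpath"):
        warnings.append("recommended field absent: xpath")
    return errors, warnings
```

Note that `datetime.fromisoformat` accepts only a subset of ISO 8601 before Python 3.11, which is sufficient for the timestamp form used in this guide.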

## Output Structure

The final output should include a `digital_platform_discovery_summary` block followed by the per-platform sections:

```yaml
digital_platform_discovery_summary:
  discovery_metadata:
    retrieval_agent: firecrawl
    retrieval_timestamp: "2025-01-15T10:30:00Z"
    source_url: https://www.example-archive.nl/onderzoeken
    xpath_base: /html/body/main/section[2]
    html_file: web/GHCID/example-archive.nl/rendered.html
  platforms_discovered: 7
  total_items_indexed: 545393
  
collection_management_system:
  system_name: MAIS-Flexis
  vendor: DE REE Archiefsystemen
  # ... full details
  
auxiliary_digital_platforms:
  - platform_name: Beeldbank
    # ... full details
  - platform_name: Archiefstukken
    # ... full details
    
external_platform_integrations:
  - platform_name: Archieven.nl
    # ... full details
```
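
The two summary figures can be derived from the documented platforms rather than typed by hand. This guide does not state how `platforms_discovered` and `total_items_indexed` are defined, so the sketch below makes an explicit assumption: the collection management system counts as one platform alongside each auxiliary platform, and the total sums every non-null `items_indexed`. `summarize` is a hypothetical helper.

```python
def summarize(cms, auxiliary_platforms):
    """Compute summary figures from a CMS dict (or None) and a list of platform dicts."""
    # Platforms without a published count (for example a name search) are left out of the sum.
    counts = [
        p["items_indexed"]
        for p in auxiliary_platforms
        if p.get("items_indexed") is not None
    ]
    return {
        "platforms_discovered": (1 if cms else 0) + len(auxiliary_platforms),
        "total_items_indexed": sum(counts),
    }
```

If the project counts platforms or items differently, for instance by excluding finding aids from the total, adjust the filter accordingly.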

## Example: Drents Archief Discovery

The Drents Archief (NL-DR-ASS-A-DA) provides a comprehensive example:

### Discovered Platforms

1. **Collection Management**: MAIS-Flexis by DE REE Archiefsystemen
2. **Beeldbank**: 252,183 images
3. **Archiefstukken**: 1,155 finding aids
4. **Genealogie**: Person name search
5. **Kaarten**: 7,700 maps
6. **Kranten**: 276,852 newspaper pages
7. **Film en Geluid**: 8,658 audiovisual items

### External Integrations
- Archieven.nl (discovery aggregator)
- Memorix (digital asset management)
- Archives Portal Europe

## Quality Checklist

Before submitting a digital platform discovery record, confirm that:

- [ ] All platforms have source URLs
- [ ] Item counts are documented where available
- [ ] Collection management system identified (if known)
- [ ] External integrations listed
- [ ] Provenance timestamps included
- [ ] XPath references provided for key claims

## Tools Reference

| Tool | MCP Name | Best For |
|------|----------|----------|
| FireCrawl Map | `firecrawl_firecrawl_map` | Discovering all URLs |
| FireCrawl Scrape | `firecrawl_firecrawl_scrape` | Extracting page content |
| FireCrawl Search | `firecrawl_firecrawl_search` | Finding specific platforms |
| Playwright Snapshot | `playwright_browser_snapshot` | JavaScript-heavy pages |

## Related Documentation

- [Rule 25: Digital Platform Discovery](.opencode/DIGITAL_PLATFORM_DISCOVERY_RULE.md) - Agent reference
- [Rule 26: Person Data Provenance](.opencode/PERSON_DATA_PROVENANCE_RULE.md) - Staff discovery
- [Drents Archief Example](data/custodian/NL-DR-ASS-A-DA.yaml) - Reference implementation

---

**Version**: 1.0  
**Last Updated**: 2025-01-15  
**Author**: GLAM Data Extraction Project