# Digital Platform Discovery Guide

A comprehensive guide for discovering and documenting digital platforms used by heritage custodians.

## Overview

Digital platform discovery is the process of identifying and cataloging the online systems, discovery portals, and digital integrations used by heritage institutions. This documentation enables understanding of how heritage collections are made accessible online.

## Prerequisites

- Access to FireCrawl scraping tools
- Basic understanding of YAML structure
- Knowledge of common heritage digital platforms

## Discovery Workflow

### Step 1: Initial Website Mapping

Use FireCrawl to discover all URLs on the custodian's website:

```
Tool: firecrawl_firecrawl_map
Parameters:
  url: https://www.example-archive.nl/
  search: "onderzoeken collecties zoeken about over"
  limit: 50
```

Look for these common page patterns:

- `/onderzoeken` or `/research` - Main research/discovery page
- `/collecties` or `/collections` - Collection information
- `/zoeken` or `/search` - Search interfaces
- `/over-ons` or `/about` - Organization information
- `/contact` - Contact and system information

### Step 2: Scrape Key Pages

Scrape the main pages with relevant content:

```
Tool: firecrawl_firecrawl_scrape
Parameters:
  url: https://www.example-archive.nl/onderzoeken
  formats: ["markdown", "html"]
  onlyMainContent: true
```

### Step 3: Identify Digital Platforms

Look for these platform categories in the scraped content:

| Category | Examples | What to Look For |
|----------|----------|------------------|
| **Collection Management System** | MAIS-Flexis, Adlib, CollectiveAccess | "Powered by", system names in footer, API references |
| **Discovery Portals** | Beeldbank, Archiefstukken, Genealogie | Links to search interfaces, item counts |
| **External Integrations** | Archieven.nl, Europeana, Memorix | Partner logos, integration links, federation mentions |
| **APIs & Data Services** | OAI-PMH, SPARQL, REST APIs | Developer documentation, endpoint URLs |
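The category scan in Step 3 can be partly automated with a simple keyword pass over the scraped text. The following is a minimal sketch, not part of FireCrawl or of this project's tooling: the `KNOWN_PLATFORMS` mapping and the `find_platform_mentions` helper are hypothetical names, and the platform list is copied from the table above. Treat any hit as a lead to verify manually, since a name can appear on a page without the institution actually using that platform.

```python
import re

# Hypothetical lookup table built from the Step 3 categories; extend as needed.
# Generic terms such as "REST APIs" are left out because they need looser matching.
KNOWN_PLATFORMS = {
    "Collection Management System": ["MAIS-Flexis", "Adlib", "CollectiveAccess"],
    "Discovery Portals": ["Beeldbank", "Archiefstukken", "Genealogie"],
    "External Integrations": ["Archieven.nl", "Europeana", "Memorix"],
    "APIs & Data Services": ["OAI-PMH", "SPARQL"],
}


def find_platform_mentions(text: str) -> dict[str, list[str]]:
    """Return {category: [platform names found]} for a block of scraped text."""
    found: dict[str, list[str]] = {}
    for category, names in KNOWN_PLATFORMS.items():
        for name in names:
            # Case-insensitive match that is not embedded in a longer word.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.setdefault(category, []).append(name)
    return found


# Made-up sample text standing in for scraped page content.
sample = (
    "Zoek in onze Beeldbank of bekijk de archiefstukken. "
    "Onze collectie is ook te vinden op Archieven.nl. Powered by MAIS-Flexis."
)
mentions = find_platform_mentions(sample)
print(mentions)
```

Categories with no hits are omitted from the result, which makes it easy to see which parts of the Step 3 table still need a manual look.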

### Step 4: Document Platform Details

For each platform discovered, document:

```yaml
collection_management_system:
  system_name: MAIS-Flexis
  vendor: DE REE Archiefsystemen
  vendor_url: https://www.de-ree.nl/
  version: null  # If unknown
  primary_use: archival_description
  provenance:
    source_url: https://www.example-archive.nl/over-ons
    xpath: /html/body/main/div[2]/p[3]  # Where info was found
    retrieved_on: "2025-01-15T10:30:00Z"
    retrieval_agent: firecrawl
```

### Step 5: Calculate Item Counts

When platforms display item counts, document them:

```yaml
auxiliary_digital_platforms:
  - platform_name: Beeldbank
    platform_url: https://beeldbank.example-archive.nl/
    platform_type: DISCOVERY_PORTAL
    content_type: images
    items_indexed: 125000
    description: "Digitized photographs, maps, and visual materials"
```

### Step 6: Document External Integrations

```yaml
external_platform_integrations:
  - platform_name: Archieven.nl
    integration_type: discovery_aggregator
    integration_url: https://www.archieven.nl/nl/zoeken?mivast=123
    items_contributed: 450000
    sync_frequency: daily
```

## Common Dutch Heritage Platforms

### Collection Management Systems

| System | Vendor | Common Users |
|--------|--------|--------------|
| MAIS-Flexis | DE REE Archiefsystemen | Regional archives |
| Adlib | Axiell | Museums |
| CollectiveAccess | Whirl-i-Gig | Various |
| ArchivesSpace | Lyrasis | Archives |
| Memorix Maior | Picturae | Archives, libraries |

### Discovery Aggregators

| Platform | URL | Description |
|----------|-----|-------------|
| Archieven.nl | archieven.nl | Dutch archival finding aids |
| Delpher | delpher.nl | Digitized newspapers, books |
| Europeana | europeana.eu | European cultural heritage |
| Collectie Nederland | collectienederland.nl | Dutch museum collections |
| Geheugen van Nederland | geheugenvannederland.nl | Digital heritage portal |

### Regional Platforms

| Platform | Region | Type |
|----------|--------|------|
| Geheugen van Drenthe | Drenthe | Regional memory |
| Brabants Historisch Informatie Centrum | Noord-Brabant | Regional archives |
| Beeldbank Amsterdam | Amsterdam | Image archives |

## Provenance Requirements

Every discovery claim MUST include:

| Field | Required | Description |
|-------|----------|-------------|
| `source_url` | YES | URL where platform was discovered |
| `xpath` | RECOMMENDED | XPath to element mentioning platform |
| `retrieved_on` | YES | ISO 8601 timestamp of discovery |
| `retrieval_agent` | YES | Tool used (firecrawl, playwright, manual) |

## Output Structure

The final `digital_platform_discovery_summary` should include:

```yaml
digital_platform_discovery_summary:
  discovery_metadata:
    retrieval_agent: firecrawl
    retrieval_timestamp: "2025-01-15T10:30:00Z"
    source_url: https://www.example-archive.nl/onderzoeken
    xpath_base: /html/body/main/section[2]
    html_file: web/GHCID/example-archive.nl/rendered.html

  platforms_discovered: 7
  total_items_indexed: 545393

  collection_management_system:
    system_name: MAIS-Flexis
    vendor: DE REE Archiefsystemen
    # ... full details

  auxiliary_digital_platforms:
    - platform_name: Beeldbank
      # ... full details
    - platform_name: Archiefstukken
      # ... full details

  external_platform_integrations:
    - platform_name: Archieven.nl
      # ... full details
```

## Example: Drents Archief Discovery

The Drents Archief (NL-DR-ASS-A-DA) provides a comprehensive example:

### Discovered Platforms

1. **Collection Management**: MAIS-Flexis by DE REE Archiefsystemen
2. **Beeldbank**: 252,183 images
3. **Archiefstukken**: 1,155 finding aids
4. **Genealogie**: Person name search
5. **Kaarten**: 7,700 maps
6. **Kranten**: 276,852 newspaper pages
7. **Film en Geluid**: 8,658 audiovisual items

### External Integrations

- Archieven.nl (discovery aggregator)
- Memorix (digital asset management)
- Archives Portal Europe

## Quality Checklist

Before submitting digital platform discovery:

- [ ] All platforms have source URLs
- [ ] Item counts are documented where available
- [ ] Collection management system identified (if known)
- [ ] External integrations listed
- [ ] Provenance timestamps included
- [ ] XPath references provided for key claims

## Tools Reference

| Tool | MCP Name | Best For |
|------|----------|----------|
| FireCrawl Map | `firecrawl_firecrawl_map` | Discovering all URLs |
| FireCrawl Scrape | `firecrawl_firecrawl_scrape` | Extracting page content |
| FireCrawl Search | `firecrawl_firecrawl_search` | Finding specific platforms |
| Playwright Snapshot | `playwright_browser_snapshot` | JavaScript-heavy pages |

## Related Documentation

- [Rule 25: Digital Platform Discovery](.opencode/DIGITAL_PLATFORM_DISCOVERY_RULE.md) - Agent reference
- [Rule 26: Person Data Provenance](.opencode/PERSON_DATA_PROVENANCE_RULE.md) - Staff discovery
- [Drents Archief Example](data/custodian/NL-DR-ASS-A-DA.yaml) - Reference implementation

---

**Version**: 1.0
**Last Updated**: 2025-01-15
**Author**: GLAM Data Extraction Project