glam/docs/DIGITAL_PLATFORM_DISCOVERY_GUIDE.md
2025-12-14 17:09:55 +01:00

# Digital Platform Discovery Guide
A comprehensive guide for discovering and documenting digital platforms used by heritage custodians.
## Overview
Digital platform discovery is the process of identifying and cataloging the online systems, discovery portals, and digital integrations used by heritage institutions. Documenting these platforms shows how an institution makes its collections accessible online.
## Prerequisites
- Access to FireCrawl scraping tools
- Basic understanding of YAML structure
- Knowledge of common heritage digital platforms
## Discovery Workflow
### Step 1: Initial Website Mapping
Use FireCrawl to discover all URLs on the custodian's website:
```
Tool: firecrawl_firecrawl_map
Parameters:
  url: https://www.example-archive.nl/
  search: "onderzoeken collecties zoeken about over"
  limit: 50
```
Look for these common page patterns:
- `/onderzoeken` or `/research` - Main research/discovery page
- `/collecties` or `/collections` - Collection information
- `/zoeken` or `/search` - Search interfaces
- `/over-ons` or `/about` - Organization information
- `/contact` - Contact and system information
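
The page patterns above can be applied to the mapped URLs with a small filter. The following is a minimal sketch, assuming the map step returns a plain list of URL strings; the function names and category labels are hypothetical and not part of the GLAM tooling.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Path prefixes taken from the list above; the labels are illustrative only.
PAGE_PATTERNS = {
    "research": ("/onderzoeken", "/research"),
    "collections": ("/collecties", "/collections"),
    "search": ("/zoeken", "/search"),
    "about": ("/over-ons", "/about"),
    "contact": ("/contact",),
}


def classify_url(url: str) -> str | None:
    """Return the label of the first pattern matching the URL path, else None."""
    path = urlparse(url).path.lower().rstrip("/")
    for label, prefixes in PAGE_PATTERNS.items():
        # Match the prefix itself or any page nested below it.
        if any(path == p or path.startswith(p + "/") for p in prefixes):
            return label
    return None


def select_key_pages(urls: list[str]) -> dict[str, list[str]]:
    """Group mapped URLs by label, dropping URLs that match no pattern."""
    grouped: dict[str, list[str]] = {}
    for url in urls:
        label = classify_url(url)
        if label is not None:
            grouped.setdefault(label, []).append(url)
    return grouped
```

Matching on whole path segments keeps unrelated pages such as `/searching` out of the `search` group. Sites that nest these pages under a language prefix (for example `/nl/zoeken`) would need extra patterns.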
### Step 2: Scrape Key Pages
Scrape the main pages with relevant content:
```
Tool: firecrawl_firecrawl_scrape
Parameters:
  url: https://www.example-archive.nl/onderzoeken
  formats: ["markdown", "html"]
  onlyMainContent: true
```
### Step 3: Identify Digital Platforms
Look for these platform categories in the scraped content:
| Category | Examples | What to Look For |
|----------|----------|------------------|
| **Collection Management System** | MAIS-Flexis, Adlib, CollectiveAccess | "Powered by", system names in footer, API references |
| **Discovery Portals** | Beeldbank, Archiefstukken, Genealogie | Links to search interfaces, item counts |
| **External Integrations** | Archieven.nl, Europeana, Memorix | Partner logos, integration links, federation mentions |
| **APIs & Data Services** | OAI-PMH, SPARQL, REST APIs | Developer documentation, endpoint URLs |
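
A first pass over the scraped text can be automated with keyword matching. This sketch uses only the example names from the table above; the category keys and the function name are assumptions made for illustration, and a hit is a candidate to verify by hand rather than a confirmed platform.

```python
import re

# Example names copied from the table above; extend per institution.
PLATFORM_KEYWORDS = {
    "collection_management_system": ["MAIS-Flexis", "Adlib", "CollectiveAccess"],
    "discovery_portal": ["Beeldbank", "Archiefstukken", "Genealogie"],
    "external_integration": ["Archieven.nl", "Europeana", "Memorix"],
    "api_or_data_service": ["OAI-PMH", "SPARQL", "REST API"],
}


def find_platform_mentions(text: str) -> dict:
    """Return {category: [names]} for every keyword that occurs in the text."""
    found = {}
    for category, names in PLATFORM_KEYWORDS.items():
        hits = []
        for name in names:
            # Case-insensitive whole-word match, tolerating a plural "s".
            pattern = r"(?<!\w)" + re.escape(name) + r"s?(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                hits.append(name)
        if hits:
            found[category] = hits
    return found
```

Generic words such as "Beeldbank" or "Genealogie" also appear in ordinary page copy, so each hit still needs the surrounding link or footer checked before it is documented.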
### Step 4: Document Platform Details
For each platform discovered, document:
```yaml
collection_management_system:
  system_name: MAIS-Flexis
  vendor: DE REE Archiefsystemen
  vendor_url: https://www.de-ree.nl/
  version: null  # If unknown
  primary_use: archival_description
  provenance:
    source_url: https://www.example-archive.nl/over-ons
    xpath: /html/body/main/div[2]/p[3]  # Where info was found
    retrieved_on: "2025-01-15T10:30:00Z"
    retrieval_agent: firecrawl
```
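
When records are generated by a script, the provenance block can be stamped at retrieval time instead of typed by hand. This is a small sketch, assuming the timestamp should follow the UTC format used in the example above; `make_provenance` is a hypothetical helper name.

```python
from __future__ import annotations

from datetime import datetime, timezone


def make_provenance(source_url: str, xpath: str | None = None,
                    retrieval_agent: str = "firecrawl") -> dict:
    """Build a provenance mapping stamped with the current UTC time."""
    return {
        "source_url": source_url,
        "xpath": xpath,
        # Same shape as the example timestamp: seconds precision, "Z" suffix.
        "retrieved_on": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retrieval_agent": retrieval_agent,
    }
```

Calling the helper when the page is fetched, rather than when the record is written, keeps `retrieved_on` tied to the moment the evidence was seen.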
### Step 5: Calculate Item Counts
When platforms display item counts, document them:
```yaml
auxiliary_digital_platforms:
  - platform_name: Beeldbank
    platform_url: https://beeldbank.example-archive.nl/
    platform_type: DISCOVERY_PORTAL
    content_type: images
    items_indexed: 125000
    description: "Digitized photographs, maps, and visual materials"
```
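
Counts on Dutch sites are often written with a period or a space as the thousands separator (for example `252.183`), which has to be normalized before it fits an integer field such as `items_indexed`. The sketch below is a naive parser under the assumption that counts are always whole numbers; `parse_item_count` is a hypothetical helper.

```python
from __future__ import annotations

import re


def parse_item_count(text: str) -> int | None:
    """Extract the first whole number, ignoring '.', ',' or space separators."""
    # Try a grouped number (252.183 / 252,183 / 252 183) before plain digits.
    match = re.search(r"\d{1,3}(?:[.,\s]\d{3})+|\d+", text)
    if match is None:
        return None
    return int(re.sub(r"[.,\s]", "", match.group()))
```

The function returns the first number it finds, so text that mentions a year before the count will yield the year; pass it only the fragment that holds the count.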
### Step 6: Document External Integrations
```yaml
external_platform_integrations:
  - platform_name: Archieven.nl
    integration_type: discovery_aggregator
    integration_url: https://www.archieven.nl/nl/zoeken?mivast=123
    items_contributed: 450000
    sync_frequency: daily
```
## Common Dutch Heritage Platforms
### Collection Management Systems
| System | Vendor | Common Users |
|--------|--------|--------------|
| MAIS-Flexis | DE REE Archiefsystemen | Regional archives |
| Adlib | Axiell | Museums |
| CollectiveAccess | Whirl-i-Gig | Various |
| ArchivesSpace | Lyrasis | Archives |
| Memorix Maior | Picturae | Archives, libraries |
### Discovery Aggregators
| Platform | URL | Description |
|----------|-----|-------------|
| Archieven.nl | archieven.nl | Dutch archival finding aids |
| Delpher | delpher.nl | Digitized newspapers, books |
| Europeana | europeana.eu | European cultural heritage |
| Collectie Nederland | collectienederland.nl | Dutch museum collections |
| Geheugen van Nederland | geheugenvannederland.nl | Digital heritage portal |
### Regional Platforms
| Platform | Region | Type |
|----------|--------|------|
| Geheugen van Drenthe | Drenthe | Regional memory |
| Brabants Historisch Informatie Centrum | Noord-Brabant | Regional archives |
| Beeldbank Amsterdam | Amsterdam | Image archives |
## Provenance Requirements
Every discovery claim MUST include the fields marked YES; fields marked RECOMMENDED should be included whenever possible:
| Field | Required | Description |
|-------|----------|-------------|
| `source_url` | YES | URL where platform was discovered |
| `xpath` | RECOMMENDED | XPath to element mentioning platform |
| `retrieved_on` | YES | ISO 8601 timestamp of discovery |
| `retrieval_agent` | YES | Tool used (firecrawl, playwright, manual) |
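
The table above translates directly into a validation step. The following sketch treats missing required fields as errors and a missing `xpath` as a warning. The function name is hypothetical, and treating `firecrawl`, `playwright`, and `manual` as the only known agents is an assumption based on the examples in the table.

```python
from __future__ import annotations

from datetime import datetime

REQUIRED_FIELDS = ("source_url", "retrieved_on", "retrieval_agent")
RECOMMENDED_FIELDS = ("xpath",)
KNOWN_AGENTS = {"firecrawl", "playwright", "manual"}  # assumed, not exhaustive


def check_provenance(prov: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one provenance mapping."""
    errors: list[str] = []
    warnings: list[str] = []
    for field in REQUIRED_FIELDS:
        if not prov.get(field):
            errors.append(f"missing required field: {field}")
    for field in RECOMMENDED_FIELDS:
        if not prov.get(field):
            warnings.append(f"missing recommended field: {field}")
    timestamp = prov.get("retrieved_on")
    if timestamp:
        try:
            # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
            # so normalize it to an explicit UTC offset first.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"retrieved_on is not ISO 8601: {timestamp!r}")
    agent = prov.get("retrieval_agent")
    if agent and agent not in KNOWN_AGENTS:
        warnings.append(f"unrecognised retrieval_agent: {agent}")
    return errors, warnings
```

A record with an empty error list satisfies the required fields; warnings flag items to review without blocking submission.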
## Output Structure
The final `digital_platform_discovery_summary` should include:
```yaml
digital_platform_discovery_summary:
  discovery_metadata:
    retrieval_agent: firecrawl
    retrieval_timestamp: "2025-01-15T10:30:00Z"
    source_url: https://www.example-archive.nl/onderzoeken
    xpath_base: /html/body/main/section[2]
    html_file: web/GHCID/example-archive.nl/rendered.html
    platforms_discovered: 7
    total_items_indexed: 545393
  collection_management_system:
    system_name: MAIS-Flexis
    vendor: DE REE Archiefsystemen
    # ... full details
  auxiliary_digital_platforms:
    - platform_name: Beeldbank
      # ... full details
    - platform_name: Archiefstukken
      # ... full details
  external_platform_integrations:
    - platform_name: Archieven.nl
      # ... full details
```
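
The two aggregate fields can be derived from the platform lists instead of being entered by hand. This sketch makes three assumptions the guide does not state: `platforms_discovered` counts the collection management system plus each auxiliary platform, `total_items_indexed` sums `items_indexed` over the auxiliary platforms, and both fields sit under `discovery_metadata`. `build_summary` is a hypothetical helper.

```python
def build_summary(metadata: dict, cms: dict, auxiliary: list, integrations: list) -> dict:
    """Assemble the summary mapping and fill in the derived totals."""
    discovery_metadata = dict(metadata)  # copy so the caller's dict is untouched
    # Assumed counting rule: the CMS (if any) plus every auxiliary platform.
    discovery_metadata["platforms_discovered"] = (1 if cms else 0) + len(auxiliary)
    # Platforms without a published count contribute nothing to the total.
    discovery_metadata["total_items_indexed"] = sum(
        platform.get("items_indexed") or 0 for platform in auxiliary
    )
    return {
        "digital_platform_discovery_summary": {
            "discovery_metadata": discovery_metadata,
            "collection_management_system": cms,
            "auxiliary_digital_platforms": auxiliary,
            "external_platform_integrations": integrations,
        }
    }
```

If a project counts external integrations as platforms, or excludes certain content types from the total, the two derived lines are the only ones that need to change.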
## Example: Drents Archief Discovery
The Drents Archief (NL-DR-ASS-A-DA) provides a comprehensive example:
### Discovered Platforms
1. **Collection Management**: MAIS-Flexis by DE REE Archiefsystemen
2. **Beeldbank**: 252,183 images
3. **Archiefstukken**: 1,155 finding aids
4. **Genealogie**: Person name search
5. **Kaarten**: 7,700 maps
6. **Kranten**: 276,852 newspaper pages
7. **Film en Geluid**: 8,658 audiovisual items
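
As a worked check on the counts listed above, the snippet below sums them two ways. The sum without the 1,155 finding aids equals the `total_items_indexed` value of 545393 shown in the Output Structure sample; the guide does not say whether finding aids are excluded on purpose, so treat that reading as an assumption.

```python
# Item counts as listed for the Drents Archief above.
counts = {
    "Beeldbank": 252_183,
    "Archiefstukken": 1_155,
    "Kaarten": 7_700,
    "Kranten": 276_852,
    "Film en Geluid": 8_658,
}

total_all = sum(counts.values())
total_without_finding_aids = total_all - counts["Archiefstukken"]

print(total_all)                   # 546548
print(total_without_finding_aids)  # 545393
```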
### External Integrations
- Archieven.nl (discovery aggregator)
- Memorix (digital asset management)
- Archives Portal Europe
## Quality Checklist
Before submitting a digital platform discovery:
- [ ] All platforms have source URLs
- [ ] Item counts are documented where available
- [ ] Collection management system identified (if known)
- [ ] External integrations listed
- [ ] Provenance timestamps included
- [ ] XPath references provided for key claims
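
Most of the checklist can be pre-checked mechanically before a human review. The sketch below assumes the summary is loaded as a nested mapping and that every platform entry carries its own `provenance` block, as the collection management system does in Step 4; `run_checklist` and its result keys are hypothetical. Whether item counts are documented "where available" cannot be verified from the record alone, so that item is left to the reviewer.

```python
def run_checklist(summary: dict) -> dict:
    """Return a pass/fail flag for each checklist item that can be automated."""
    body = summary["digital_platform_discovery_summary"]
    cms = body.get("collection_management_system") or {}
    platforms = ([cms] if cms else []) + list(body.get("auxiliary_digital_platforms") or [])
    # Assumed layout: each platform entry has a nested "provenance" mapping.
    provenances = [platform.get("provenance") or {} for platform in platforms]
    has_platforms = bool(provenances)
    return {
        "all_platforms_have_source_urls":
            has_platforms and all(p.get("source_url") for p in provenances),
        "cms_identified": bool(cms.get("system_name")),
        "external_integrations_listed": bool(body.get("external_platform_integrations")),
        "provenance_timestamps_included":
            has_platforms and all(p.get("retrieved_on") for p in provenances),
        "xpath_references_provided":
            has_platforms and all(p.get("xpath") for p in provenances),
    }
```

A `False` flag marks an item for review rather than an automatic rejection. For instance, the checklist accepts a missing collection management system when it is genuinely unknown.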
## Tools Reference
| Tool | MCP Name | Best For |
|------|----------|----------|
| FireCrawl Map | `firecrawl_firecrawl_map` | Discovering all URLs |
| FireCrawl Scrape | `firecrawl_firecrawl_scrape` | Extracting page content |
| FireCrawl Search | `firecrawl_firecrawl_search` | Finding specific platforms |
| Playwright Snapshot | `playwright_browser_snapshot` | JavaScript-heavy pages |
## Related Documentation
- [Rule 25: Digital Platform Discovery](.opencode/DIGITAL_PLATFORM_DISCOVERY_RULE.md) - Agent reference
- [Rule 26: Person Data Provenance](.opencode/PERSON_DATA_PROVENANCE_RULE.md) - Staff discovery
- [Drents Archief Example](data/custodian/NL-DR-ASS-A-DA.yaml) - Reference implementation
---
**Version**: 1.0
**Last Updated**: 2025-01-15
**Author**: GLAM Data Extraction Project