# Digital Platform Discovery Guide

A guide to discovering and documenting the digital platforms used by heritage custodians.

## Overview

Digital platform discovery is the process of identifying and cataloging the online systems, discovery portals, and digital integrations used by heritage institutions. The resulting documentation shows how each institution makes its collections accessible online.

## Prerequisites

- Access to FireCrawl scraping tools
- Basic understanding of YAML structure
- Knowledge of common heritage digital platforms

## Discovery Workflow

### Step 1: Initial Website Mapping

Use FireCrawl to discover all URLs on the custodian's website:

```
Tool: firecrawl_firecrawl_map
Parameters:
  url: https://www.example-archive.nl/
  search: "onderzoeken collecties zoeken about over"
  limit: 50
```

Look for these common page patterns:
- `/onderzoeken` or `/research` - Main research/discovery page
- `/collecties` or `/collections` - Collection information
- `/zoeken` or `/search` - Search interfaces
- `/over-ons` or `/about` - Organization information
- `/contact` - Contact and system information
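
The page patterns above can be applied to the URL list returned by the map step. The following is a minimal sketch, assuming the mapped URLs are available as a plain list of strings; the function names and the grouping labels (`research`, `collections`, and so on) are hypothetical and not part of FireCrawl. It compares the first path segment exactly, because a substring test would wrongly match `/onderzoeken` against `zoeken`.

```python
from typing import Optional
from urllib.parse import urlparse

# Path keywords taken from the list above; the labels are illustrative only.
PAGE_PATTERNS = {
    "research": ("onderzoeken", "research"),
    "collections": ("collecties", "collections"),
    "search": ("zoeken", "search"),
    "about": ("over-ons", "about"),
    "contact": ("contact",),
}


def classify_url(url: str) -> Optional[str]:
    """Return the label whose keyword equals the first path segment, if any."""
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    if not segments:
        return None
    for label, keywords in PAGE_PATTERNS.items():
        if segments[0] in keywords:
            return label
    return None


def select_key_pages(urls: list[str]) -> dict[str, list[str]]:
    """Group mapped URLs by page pattern, dropping URLs that match none."""
    grouped: dict[str, list[str]] = {}
    for url in urls:
        label = classify_url(url)
        if label is not None:
            grouped.setdefault(label, []).append(url)
    return grouped
```

Pages nested deeper in the site (for example under a language prefix such as `/nl/`) would not match this sketch and still need a manual look.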
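
### Step 2: Scrape Key Pages

Scrape the pages from Step 1 that are most likely to mention platforms:

```
Tool: firecrawl_firecrawl_scrape
Parameters:
  url: https://www.example-archive.nl/onderzoeken
  formats: ["markdown", "html"]
  onlyMainContent: true
```

`onlyMainContent: true` strips headers, navigation, and footers. Because vendor credits such as "Powered by" often sit in the footer, repeat the scrape with `onlyMainContent: false` if the main content names no system.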

### Step 3: Identify Digital Platforms

Look for these platform categories in the scraped content:

| Category | Examples | What to Look For |
|----------|----------|------------------|
| **Collection Management System** | MAIS-Flexis, Adlib, CollectiveAccess | "Powered by", system names in footer, API references |
| **Discovery Portals** | Beeldbank, Archiefstukken, Genealogie | Links to search interfaces, item counts |
| **External Integrations** | Archieven.nl, Europeana, Memorix | Partner logos, integration links, federation mentions |
| **APIs & Data Services** | OAI-PMH, SPARQL, REST APIs | Developer documentation, endpoint URLs |
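
A first pass over the scraped text can be automated with a simple name lookup. The sketch below assumes the scraped page is available as a string; the platform names come from the table above, while the category keys and the function name are hypothetical. A hit is only a candidate: a name appearing on a page does not prove the institution uses that system, so each match still needs to be confirmed and given provenance.

```python
import re

# Platform names copied from the table above; category keys are illustrative.
KNOWN_PLATFORMS = {
    "MAIS-Flexis": "collection_management_system",
    "Adlib": "collection_management_system",
    "CollectiveAccess": "collection_management_system",
    "Beeldbank": "discovery_portal",
    "Archiefstukken": "discovery_portal",
    "Genealogie": "discovery_portal",
    "Archieven.nl": "external_integration",
    "Europeana": "external_integration",
    "Memorix": "external_integration",
    "OAI-PMH": "api_or_data_service",
    "SPARQL": "api_or_data_service",
}


def find_platform_mentions(text: str) -> dict[str, list[str]]:
    """Map each category to the known platform names mentioned in the text."""
    found: dict[str, list[str]] = {}
    for name, category in KNOWN_PLATFORMS.items():
        # Case-insensitive match that must not be embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.setdefault(category, []).append(name)
    return found
```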

### Step 4: Document Platform Details

For each platform discovered, document:

```yaml
collection_management_system:
  system_name: MAIS-Flexis
  vendor: DE REE Archiefsystemen
  vendor_url: https://www.de-ree.nl/
  version: null  # If unknown
  primary_use: archival_description
  provenance:
    source_url: https://www.example-archive.nl/over-ons
    xpath: /html/body/main/div[2]/p[3]  # Where info was found
    retrieved_on: "2025-01-15T10:30:00Z"
    retrieval_agent: firecrawl
```

### Step 5: Calculate Item Counts

When a platform displays item counts, record them as integers without thousands separators:

```yaml
auxiliary_digital_platforms:
  - platform_name: Beeldbank
    platform_url: https://beeldbank.example-archive.nl/
    platform_type: DISCOVERY_PORTAL
    content_type: images
    items_indexed: 125000
    description: "Digitized photographs, maps, and visual materials"
```
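
Counts shown on a page usually arrive as text, and Dutch sites often write thousands with a period (`252.183`). The sketch below is one way to turn such text into the integer stored in `items_indexed` and to add up a total; both function names are hypothetical. It assumes every `.` or `,` followed by three digits is a thousands separator, so it does not handle decimal phrases such as "1.5 million".

```python
import re
from typing import Optional

# Digits grouped in threes by "." or ",", or a plain run of digits.
COUNT_PATTERN = re.compile(r"\d{1,3}(?:[.,]\d{3})+|\d+")


def parse_item_count(text: str) -> Optional[int]:
    """Extract the first item count from display text, or None if absent."""
    match = COUNT_PATTERN.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(re.sub(r"[.,]", "", match.group()))


def total_items_indexed(platforms: list[dict]) -> int:
    """Sum items_indexed over platforms, skipping missing or null counts."""
    return sum(
        p["items_indexed"]
        for p in platforms
        if p.get("items_indexed") is not None
    )


print(parse_item_count("252.183 afbeeldingen"))  # 252183
print(parse_item_count("Person name search"))    # None
```

Platforms without a published count are skipped rather than counted as zero, so the total is a lower bound.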

### Step 6: Document External Integrations

```yaml
external_platform_integrations:
  - platform_name: Archieven.nl
    integration_type: discovery_aggregator
    integration_url: https://www.archieven.nl/nl/zoeken?mivast=123
    items_contributed: 450000
    sync_frequency: daily
```

## Common Dutch Heritage Platforms

### Collection Management Systems

| System | Vendor | Common Users |
|--------|--------|--------------|
| MAIS-Flexis | DE REE Archiefsystemen | Regional archives |
| Adlib | Axiell | Museums |
| CollectiveAccess | Whirl-i-Gig | Various |
| ArchivesSpace | Lyrasis | Archives |
| Memorix Maior | Picturae | Archives, libraries |

### Discovery Aggregators

| Platform | URL | Description |
|----------|-----|-------------|
| Archieven.nl | archieven.nl | Dutch archival finding aids |
| Delpher | delpher.nl | Digitized newspapers, books |
| Europeana | europeana.eu | European cultural heritage |
| Collectie Nederland | collectienederland.nl | Dutch museum collections |
| Geheugen van Nederland | geheugenvannederland.nl | Digital heritage portal |

### Regional Platforms

| Platform | Region | Type |
|----------|--------|------|
| Geheugen van Drenthe | Drenthe | Regional memory |
| Brabants Historisch Informatie Centrum | Noord-Brabant | Regional archives |
| Beeldbank Amsterdam | Amsterdam | Image archives |

## Provenance Requirements

Every discovery claim MUST include provenance. Fields marked YES are mandatory; `xpath` is recommended:

| Field | Required | Description |
|-------|----------|-------------|
| `source_url` | YES | URL where platform was discovered |
| `xpath` | RECOMMENDED | XPath to element mentioning platform |
| `retrieved_on` | YES | ISO 8601 timestamp of discovery |
| `retrieval_agent` | YES | Tool used (firecrawl, playwright, manual) |
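
These requirements are easy to check mechanically before a record is saved. The following sketch assumes a provenance record has already been loaded into a Python dict with the field names from the table; `check_provenance` is a hypothetical helper, not an existing project function. The timestamp test relies on `datetime.fromisoformat`, which accepts the common ISO 8601 forms but not every variant of the standard.

```python
from datetime import datetime

REQUIRED_FIELDS = ("source_url", "retrieved_on", "retrieval_agent")
RECOMMENDED_FIELDS = ("xpath",)


def check_provenance(record: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one provenance record."""
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not record.get(name)
    ]
    warnings = [
        f"missing recommended field: {name}"
        for name in RECOMMENDED_FIELDS
        if not record.get(name)
    ]

    timestamp = record.get("retrieved_on")
    if timestamp:
        try:
            # A trailing "Z" is only accepted from Python 3.11 on,
            # so rewrite it as an explicit UTC offset first.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"retrieved_on is not ISO 8601: {timestamp}")
    return errors, warnings
```

A record with an empty error list satisfies the mandatory fields; warnings flag a missing `xpath` without blocking the record.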

## Output Structure

The final `digital_platform_discovery_summary` should include:

```yaml
digital_platform_discovery_summary:
  discovery_metadata:
    retrieval_agent: firecrawl
    retrieval_timestamp: "2025-01-15T10:30:00Z"
    source_url: https://www.example-archive.nl/onderzoeken
    xpath_base: /html/body/main/section[2]
    html_file: web/GHCID/example-archive.nl/rendered.html
    platforms_discovered: 7
    total_items_indexed: 545393

  collection_management_system:
    system_name: MAIS-Flexis
    vendor: DE REE Archiefsystemen
    # ... full details

  auxiliary_digital_platforms:
    - platform_name: Beeldbank
      # ... full details
    - platform_name: Archiefstukken
      # ... full details

  external_platform_integrations:
    - platform_name: Archieven.nl
      # ... full details
```

## Example: Drents Archief Discovery

The Drents Archief (NL-DR-ASS-A-DA) provides a comprehensive example:

### Discovered Platforms

1. **Collection Management**: MAIS-Flexis by DE REE Archiefsystemen
2. **Beeldbank**: 252,183 images
3. **Archiefstukken**: 1,155 finding aids
4. **Genealogie**: Person name search
5. **Kaarten**: 7,700 maps
6. **Kranten**: 276,852 newspaper pages
7. **Film en Geluid**: 8,658 audiovisual items

### External Integrations
- Archieven.nl (discovery aggregator)
- Memorix (digital asset management)
- Archives Portal Europe

## Quality Checklist

Before submitting digital platform discovery:

- [ ] All platforms have source URLs
- [ ] Item counts are documented where available
- [ ] Collection management system identified (if known)
- [ ] External integrations listed
- [ ] Provenance timestamps included
- [ ] XPath references provided for key claims
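
Most of the checklist can be pre-screened in code before a human review. The sketch below assumes the contents of `digital_platform_discovery_summary` have been loaded into a dict using the key names from the Output Structure section; the function name and the result keys are hypothetical. The item-count check is left out because only a reviewer can tell whether a count was available on the site, and an unidentified collection management system is a prompt to look again rather than a hard failure.

```python
def run_quality_checklist(summary: dict) -> dict[str, bool]:
    """Return a pass/fail flag per checklist item that can be automated."""
    metadata = summary.get("discovery_metadata") or {}
    cms = summary.get("collection_management_system") or {}
    auxiliary = summary.get("auxiliary_digital_platforms") or []
    integrations = summary.get("external_platform_integrations") or []

    return {
        "source_url_present": bool(metadata.get("source_url")),
        # Vacuously true when no auxiliary platforms are listed.
        "platform_urls_present": all(p.get("platform_url") for p in auxiliary),
        "cms_identified": bool(cms.get("system_name")),
        "external_integrations_listed": len(integrations) > 0,
        "timestamp_present": bool(metadata.get("retrieval_timestamp")),
        "xpath_present": bool(metadata.get("xpath_base")),
    }


def failed_checks(summary: dict) -> list[str]:
    """List the names of the checks that did not pass."""
    return [name for name, ok in run_quality_checklist(summary).items() if not ok]
```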

## Tools Reference

| Tool | MCP Name | Best For |
|------|----------|----------|
| FireCrawl Map | `firecrawl_firecrawl_map` | Discovering all URLs |
| FireCrawl Scrape | `firecrawl_firecrawl_scrape` | Extracting page content |
| FireCrawl Search | `firecrawl_firecrawl_search` | Finding specific platforms |
| Playwright Snapshot | `playwright_browser_snapshot` | JavaScript-heavy pages |

## Related Documentation

- [Rule 25: Digital Platform Discovery](.opencode/DIGITAL_PLATFORM_DISCOVERY_RULE.md) - Agent reference
- [Rule 26: Person Data Provenance](.opencode/PERSON_DATA_PROVENANCE_RULE.md) - Staff discovery
- [Drents Archief Example](data/custodian/NL-DR-ASS-A-DA.yaml) - Reference implementation

---

**Version**: 1.0
**Last Updated**: 2025-01-15
**Author**: GLAM Data Extraction Project