# NDE to Heritage Custodian RDF Field Mapping
This document details which fields from the enriched NDE YAML entries are mapped to RDF and which remain unmapped.
## Summary
| Category | Mapped | Unmapped | Coverage |
|----------|--------|----------|----------|
| Core Identifiers | 10 | 0 | 100% |
| Labels & Names | 3 | 1 | 75% |
| Location | 4 | 2 | 67% |
| Timestamps | 2 | 0 | 100% |
| Social Media | 5 | 0 | 100% |
| External IDs | 6 | 10+ | ~40% |
| Google Maps | 3 | 15+ | ~15% |
| Wikidata Claims | 2 | 30+ | ~5% |
| Provenance | 0 | 15+ | 0% |
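
The Coverage column appears to follow mapped / (mapped + unmapped), rounded to a whole percentage; rows with open-ended counts such as `10+` can only be approximated. A minimal sketch of that arithmetic, where the helper name `coverage_pct` is hypothetical and not part of the conversion script:

```python
def coverage_pct(mapped: int, unmapped: int) -> int:
    """Return mapped / (mapped + unmapped) as a rounded whole percentage."""
    total = mapped + unmapped
    if total == 0:
        return 0  # nothing to map, so report zero rather than divide by zero
    return round(100 * mapped / total)


# Rows with exact counts in the summary table
print(coverage_pct(3, 1))  # Labels & Names → 75
print(coverage_pct(4, 2))  # Location → 67
```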
## Mapped Fields
### Core Identifiers (✅ Fully Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `ghcid.ghcid_current` | `skos:notation` on `crm:E42_Identifier` | GHCID scheme |
| `ghcid.ghcid_numeric` | `dcterms:identifier`, `skos:notation` | Primary identifier |
| `ghcid.ghcid_uuid` | `skos:notation`, `schema:url` | UUID v5 |
| `ghcid.ghcid_uuid_sha256` | `skos:notation`, `schema:url` | UUID v8 |
| `ghcid.record_id` | `skos:notation`, `schema:url` | Database record ID |
| `identifiers[].identifier_scheme` | `skos:inScheme` | Identifier scheme |
| `identifiers[].identifier_value` | `skos:notation` | Identifier value |
| `wikidata_enrichment.wikidata_entity_id` | `owl:sameAs`, `skos:notation` | Wikidata linking |
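
As an illustration of the `identifiers[]` rows above, the sketch below renders one identifier as a `crm:E42_Identifier` blank node in Turtle. It is not taken from `scripts/nde_to_hc_rdf.py`: the scheme base URI, the helper names, and the sample values are assumptions, and the real script may build its graph differently.

```python
from urllib.parse import quote

# Hypothetical base URI for identifier schemes; the real namespace is not documented here.
SCHEME_BASE = "https://example.org/scheme/"


def turtle_literal(value: str) -> str:
    """Quote a string as a short Turtle literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def identifier_turtle(identifier: dict) -> str:
    """Render one identifiers[] item as a crm:E42_Identifier blank node."""
    scheme_uri = SCHEME_BASE + quote(str(identifier["identifier_scheme"]), safe="")
    notation = turtle_literal(str(identifier["identifier_value"]))
    return (
        "[ a crm:E42_Identifier ;\n"
        f"  skos:inScheme <{scheme_uri}> ;\n"
        f"  skos:notation {notation} ]"
    )


# Made-up sample values, for illustration only
sample = {"identifier_scheme": "ISIL", "identifier_value": "NL-Example"}
print(identifier_turtle(sample))
```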
### Labels & Names (✅ Mostly Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `custodian_name.claim_value` | `skos:prefLabel@nl` | Primary label |
| `wikidata_enrichment.wikidata_label_nl` | `skos:prefLabel@nl` | Fallback label |
| `wikidata_enrichment.wikidata_label_en` | `skos:altLabel@en` | English alt label |
**Unmapped:**
- `wikidata_enrichment.wikidata_aliases` - multilingual aliases
- `wikidata_enrichment.wikidata_description_*` - descriptions
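
The primary/fallback relationship between the two Dutch label sources can be sketched as follows. The function name `pick_labels` and the returned dictionary shape are assumptions made for illustration; only the field paths and target properties come from the table above.

```python
def pick_labels(entry: dict) -> dict:
    """Choose the Dutch preferred label and the English alternative label."""
    name = (entry.get("custodian_name") or {}).get("claim_value")
    wikidata = entry.get("wikidata_enrichment") or {}

    labels = {}
    # Primary label first, Wikidata Dutch label only as a fallback
    pref_nl = name or wikidata.get("wikidata_label_nl")
    if pref_nl:
        labels["skos:prefLabel@nl"] = pref_nl
    if wikidata.get("wikidata_label_en"):
        labels["skos:altLabel@en"] = wikidata["wikidata_label_en"]
    return labels


# Made-up entry without a custodian_name claim: the Wikidata label is used
entry = {
    "wikidata_enrichment": {
        "wikidata_label_nl": "Voorbeeldmuseum",
        "wikidata_label_en": "Example Museum",
    }
}
print(pick_labels(entry))
```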
### Custodian Type (✅ Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `original_entry.type[]` | `hc:custodian_type` | Type code → enum |
### Location & Place (✅ Partially Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `google_maps_enrichment.coordinates.latitude` | `schema:latitude` | Coordinates |
| `google_maps_enrichment.coordinates.longitude` | `schema:longitude` | Coordinates |
| `google_maps_enrichment.formatted_address` | `schema:address` | Full address |
| `ghcid.location_resolution.geonames_id` | `schema:containedInPlace` | GeoNames URI |
**Unmapped:**
- `google_maps_enrichment.address_components[]` - structured address parts
- `google_maps_enrichment.utc_offset_minutes` - timezone
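
A sketch of how the mapped location fields could become predicate/object pairs. The `https://sws.geonames.org/{id}/` pattern is the usual GeoNames resource URI form; the `xsd:double` datatype, the function name, and the sample values are assumptions rather than documented behavior of the script.

```python
def location_pairs(entry: dict) -> list:
    """Return (predicate, Turtle object) pairs for the mapped location fields."""
    pairs = []
    gmaps = entry.get("google_maps_enrichment") or {}
    coords = gmaps.get("coordinates") or {}
    lat, lon = coords.get("latitude"), coords.get("longitude")

    # Emit coordinates only when both halves are present
    if lat is not None and lon is not None:
        pairs.append(("schema:latitude", f'"{float(lat)}"^^xsd:double'))
        pairs.append(("schema:longitude", f'"{float(lon)}"^^xsd:double'))

    resolution = (entry.get("ghcid") or {}).get("location_resolution") or {}
    geonames_id = resolution.get("geonames_id")
    if geonames_id:
        pairs.append(
            ("schema:containedInPlace", f"<https://sws.geonames.org/{geonames_id}/>")
        )
    return pairs


# Made-up sample values, for illustration only
entry = {
    "google_maps_enrichment": {"coordinates": {"latitude": 52.36, "longitude": 4.885}},
    "ghcid": {"location_resolution": {"geonames_id": 2759794}},
}
for predicate, obj in location_pairs(entry):
    print(predicate, obj)
```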
### Timestamps (✅ Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `processing_timestamp` | `schema:dateCreated` | Record creation |
| `provenance.generated_at` | `schema:dateModified` | Last modification |
### Digital Platform (✅ Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `wikidata_enrichment.wikidata_official_website` | `foaf:homepage`, `schema:url` | Primary website |
| `google_maps_enrichment.website` | `foaf:homepage` | Fallback website |
| `wikidata_claims.P8768_online_catalog_url.value` | `hc:collection_url` | Catalog URL(s) |
### Social Media Profiles (✅ Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `web_claims.claims[].claim_type=social_*` | `hc:platform_type` | Platform type |
| `web_claims.claims[].claim_value` | `foaf:accountServiceHomepage` | Profile URL |
| Extracted from URL | `foaf:accountName` | Username |
| `web_claims.claims[].source_url` | `prov:wasDerivedFrom` | Source provenance |
| `web_claims.claims[].retrieved_on` | `prov:generatedAtTime` | Timestamp |
| `wikidata_claims.P2002_x__twitter__username.value` | `foaf:accountName` | Twitter from Wikidata |
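
The table does not say how the username is extracted from a profile URL. One simple heuristic, shown here purely as an assumption, is to take the last non-empty path segment and strip a leading `@`; the function name and the sample URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def account_name_from_url(profile_url: str) -> Optional[str]:
    """Guess an account name from the last path segment of a profile URL."""
    path = urlparse(profile_url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return None  # bare domain, no account in the path
    return segments[-1].lstrip("@")


print(account_name_from_url("https://twitter.com/example_museum"))
print(account_name_from_url("https://www.youtube.com/@example_museum/"))
```

A heuristic like this will misfire on URLs whose last segment is not the account (for example a `/posts` suffix), so platform-specific rules may be needed in practice.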
### External Identifiers (✅ Partially Mapped)
| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `wikidata_enrichment.wikidata_identifiers.viaf` | `skos:notation` | VIAF ID |
| `wikidata_enrichment.wikidata_identifiers.gnd` | `skos:notation` | GND ID |
| `wikidata_enrichment.wikidata_identifiers.isni` | `skos:notation` | ISNI |
| `wikidata_enrichment.wikidata_identifiers.lcnaf` | `skos:notation` | Library of Congress |
| `wikidata_enrichment.wikidata_identifiers.ringgold` | `skos:notation` | Ringgold ID |
---
## Unmapped Fields
### Google Maps Enrichment (❌ Not Mapped)
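Apart from the coordinates, formatted address, and fallback website listed under Mapped Fields, the following Google Maps fields contain valuable data but are not yet mapped to RDF: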
| Field | Type | Potential Use |
|-------|------|---------------|
| `opening_hours.weekday_text[]` | `Array<string>` | Operating hours display |
| `opening_hours.periods[]` | `Array<object>` | Structured hours |
| `rating` | `Float` | User rating (1-5) |
| `total_ratings` | `Integer` | Number of reviews |
| `reviews[]` | `Array<object>` | User reviews with text, rating, author |
| `photo_urls[]` | `Array<string>` | Photo URLs |
| `photos_metadata[]` | `Array<object>` | Photo details, attributions |
| `phone_international` | `String` | Phone number |
| `phone_local` | `String` | Local phone format |
| `editorial_summary` | `String` | Google's description |
| `business_status` | `String` | OPERATIONAL, CLOSED, etc. |
| `google_maps_url` | `String` | Link to Google Maps |
| `street_view_url` | `String` | Street View URL |
| `google_place_types[]` | `Array<string>` | Google's type classification |
| `place_id` | `String` | Google Places ID |
**Rationale for not mapping:**
- Opening hours: Requires `schema:OpeningHoursSpecification` modeling
- Reviews: Privacy considerations, volatile data
- Photos: External dependencies, storage concerns
- Phone: Could be added with `schema:telephone`
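
To give a sense of what the `schema:OpeningHoursSpecification` modeling would involve, here is a sketch that converts one `opening_hours.periods[]` item into a JSON-LD-style dictionary. It assumes the periods keep the legacy Google Places shape (`day` as 0-6 starting on Sunday, `time` as an `hhmm` string); the actual layout in the NDE YAML is not documented here, and the function names are hypothetical.

```python
# Google Places numbers days from 0 = Sunday (assumed to carry over to the YAML)
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def hhmm_to_time(hhmm: str) -> str:
    """Convert '0930' to '09:30:00'."""
    return f"{hhmm[:2]}:{hhmm[2:4]}:00"


def period_to_spec(period: dict) -> dict:
    """Map one opening period to a schema.org OpeningHoursSpecification dict."""
    opening = period["open"]
    spec = {
        "@type": "OpeningHoursSpecification",
        "dayOfWeek": f"https://schema.org/{DAYS[opening['day']]}",
        "opens": hhmm_to_time(opening["time"]),
    }
    closing = period.get("close")
    if closing:  # always-open places may have no close entry
        spec["closes"] = hhmm_to_time(closing["time"])
    return spec


sample = {"open": {"day": 1, "time": "1000"}, "close": {"day": 1, "time": "1700"}}
print(period_to_spec(sample))
```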
### Wikidata Claims (❌ Mostly Not Mapped)
Many Wikidata properties are retrieved but not converted to RDF:
| Wikidata Property | Label | Notes |
|-------------------|-------|-------|
| P131 | Located in admin entity | Administrative hierarchy |
| P276 | Location | Building/structure |
| P17 | Country | Country entity |
| P571 | Inception | Founding date |
| P576 | Dissolved | Closure date |
| P84 | Architect | Building architect |
| P669 | Located on street | Street name |
| P1619 | Date of opening | Opening date |
| P166 | Award received | Awards |
| P2652 | Partnership with | Partnerships |
| P1343 | Described by source | Sources |
| P2851 | Payment types accepted | Payment methods |
| P3273 | Actorenregister ID | Dutch actors register |
| P646 | Freebase ID | Legacy identifier |
| P402 | OSM relation ID | OpenStreetMap |
**Rationale:**
- Many require complex modeling (for example, dates with qualifiers or precision)
- Some are volatile (awards and partnerships change over time)
- Some are domain-specific extensions
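
Dates are a good example of the modeling effort. Wikidata time values carry a precision (9 = year, 10 = month, 11 = day), so a founding date cannot always be emitted as a full `xsd:date`. The sketch below picks a datatype per precision. Whether the enriched YAML preserves the raw time string and precision is an assumption, the function name is hypothetical, and BCE or coarser-than-year dates are deliberately left out.

```python
import re
from typing import Optional

# Matches the leading "+YYYY-MM-DD" of a Wikidata time string (CE years only)
_TIME_RE = re.compile(r"^\+(\d{4,})-(\d{2})-(\d{2})T")


def wikidata_time_to_literal(time: str, precision: int) -> Optional[str]:
    """Render a Wikidata time value as a typed Turtle literal, or None."""
    match = _TIME_RE.match(time)
    if not match:
        return None  # BCE or malformed values are out of scope for this sketch
    year = f"{int(match.group(1)):04d}"
    month, day = match.group(2), match.group(3)
    if precision >= 11:
        return f'"{year}-{month}-{day}"^^xsd:date'
    if precision == 10:
        return f'"{year}-{month}"^^xsd:gYearMonth'
    if precision == 9:
        return f'"{year}"^^xsd:gYear'
    return None  # decade, century, and coarser precisions need separate modeling


print(wikidata_time_to_literal("+1885-00-00T00:00:00Z", 9))
print(wikidata_time_to_literal("+1885-07-13T00:00:00Z", 11))
```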
### Provenance Metadata (❌ Not Mapped)
| Field | Notes |
|-------|-------|
| `provenance.sources.*` | Detailed source tracking |
| `provenance.data_tier_summary` | Data quality tiers |
| `provenance.notes` | Human notes |
| `wikidata_enrichment.api_metadata.*` | API call details |
| `web_enrichment.web_archives[]` | WARC archive info |
**Rationale:**
- Detailed provenance could be modeled with the PROV-O ontology
- Provenance is currently reduced to timestamps, plus `prov:wasDerivedFrom` and `prov:generatedAtTime` on social media profiles
### Museum Register Enrichment (❌ Not Mapped)
| Field | Notes |
|-------|-------|
| `museum_register_enrichment.registered_since` | Registration date |
| `museum_register_enrichment.province` | Province |
| `museum_register_enrichment.source_provenance` | Source details |
---
## Future Enhancements
### Priority 1: High Value, Easy to Add
- `google_maps_enrichment.phone_international` → `schema:telephone`
- `google_maps_enrichment.editorial_summary` → `schema:description`
- `wikidata_claims.P571_inception.value` → `schema:foundingDate`
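
These three candidates are simple one-to-one mappings, which is why they are cheap to add. A minimal sketch of a table-driven approach, assuming entries are plain nested dictionaries after YAML loading; `get_path` and `priority_1_pairs` are hypothetical helpers, not functions from the existing script.

```python
# Source path → target property, as listed under Priority 1
PRIORITY_1 = {
    "google_maps_enrichment.phone_international": "schema:telephone",
    "google_maps_enrichment.editorial_summary": "schema:description",
    "wikidata_claims.P571_inception.value": "schema:foundingDate",
}


def get_path(entry: dict, dotted: str):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node = entry
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def priority_1_pairs(entry: dict) -> list:
    """Collect (predicate, value) pairs for the Priority 1 fields that are present."""
    pairs = []
    for path, predicate in PRIORITY_1.items():
        value = get_path(entry, path)
        if value is not None and value != "":
            pairs.append((predicate, value))
    return pairs


# Made-up sample values, for illustration only
entry = {
    "google_maps_enrichment": {"phone_international": "+31 20 000 0000"},
    "wikidata_claims": {"P571_inception": {"value": "1885"}},
}
print(priority_1_pairs(entry))
```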
### Priority 2: Moderate Complexity
- Opening hours → `schema:OpeningHoursSpecification`
- `google_maps_enrichment.rating` → `schema:aggregateRating`
- Wikidata relationships (P131, P276) → location hierarchy
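
For the rating item, `schema:aggregateRating` expects an `AggregateRating` node rather than a bare number, so `rating` and `total_ratings` would be combined. A sketch under the assumption that ratings use the 1-5 scale noted in the Google Maps table; the function name and output shape are illustrative only.

```python
from typing import Optional


def aggregate_rating(gmaps: dict) -> Optional[dict]:
    """Build a schema.org AggregateRating dict from rating and total_ratings."""
    rating = gmaps.get("rating")
    total = gmaps.get("total_ratings")
    if rating is None or not total:
        return None  # a rating without a review count is not meaningful
    return {
        "@type": "AggregateRating",
        "ratingValue": float(rating),
        "ratingCount": int(total),
        "bestRating": 5,
        "worstRating": 1,
    }


print(aggregate_rating({"rating": 4.5, "total_ratings": 120}))
```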
### Priority 3: Complex Modeling Required
- Full provenance chain → PROV-O
- Organizational history → change events
- Collection metadata → separate entities
---
## Script Location
`scripts/nde_to_hc_rdf.py`
## Output Location
`data/nde/rdf/{ghcid_numeric}.ttl`
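
The output pattern above yields one Turtle file per entry. A small sketch of resolving that path, where the helper name `output_path` and the sample identifier are hypothetical:

```python
from pathlib import Path


def output_path(ghcid_numeric, root: str = "data/nde/rdf") -> Path:
    """Return the Turtle file path for one entry, keyed by its numeric GHCID."""
    return Path(root) / f"{ghcid_numeric}.ttl"


# Made-up identifier, for illustration only
print(output_path(1234567890).as_posix())
```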
---
*Generated: 2025-12-02*
*Total entries converted: 1,619*
*Total triples: 114,705*