NDE to Heritage Custodian RDF Field Mapping

This document details which fields from the enriched NDE YAML entries are mapped to RDF and which remain unmapped.

Summary

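| Category | Mapped | Unmapped | Coverage |
|---|---|---|---|
| Core Identifiers | 10 | 0 | 100% |
| Labels & Names | 3 | 1 | 75% |
| Location | 4 | 2 | 67% |
| Timestamps | 2 | 0 | 100% |
| Social Media | 5 | 0 | 100% |
| External IDs | 6 | 10+ | ~40% |
| Google Maps | 3 | 15+ | ~15% |
| Wikidata Claims | 2 | 30+ | ~5% |
| Provenance | 0 | 15+ | 0% |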
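
The coverage column appears to be mapped / (mapped + unmapped), rounded to a whole percentage; for open-ended counts such as "10+" the figure can only be approximate. A small sketch of that arithmetic follows (the function name is illustrative and not taken from the conversion script):

```python
def coverage_percent(mapped: int, unmapped: int) -> int:
    """Return mapped fields as a whole-number percentage of all fields."""
    total = mapped + unmapped
    if total == 0:
        return 0  # avoid dividing by zero for an empty category
    return round(100 * mapped / total)


print(coverage_percent(3, 1))  # Labels & Names -> 75
print(coverage_percent(4, 2))  # Location -> 67
```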

Mapped Fields

Core Identifiers (Fully Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `ghcid.ghcid_current` | `skos:notation` on `crm:E42_Identifier` | GHCID scheme |
| `ghcid.ghcid_numeric` | `dcterms:identifier`, `skos:notation` | Primary identifier |
| `ghcid.ghcid_uuid` | `skos:notation`, `schema:url` | UUID v5 |
| `ghcid.ghcid_uuid_sha256` | `skos:notation`, `schema:url` | UUID v8 |
| `ghcid.record_id` | `skos:notation`, `schema:url` | Database record ID |
| `identifiers[].identifier_scheme` | `skos:inScheme` | Identifier scheme |
| `identifiers[].identifier_value` | `skos:notation` | Identifier value |
| `wikidata_enrichment.wikidata_entity_id` | `owl:sameAs`, `skos:notation` | Wikidata linking |
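
As a rough illustration of the `identifiers[]` rows, the sketch below turns each identifier record into an identifier node carrying `skos:inScheme` and `skos:notation`. It is not the logic of the conversion script: the function name, the blank-node labels, the `crm:P1_is_identified_by` link, and the `https://example.org/scheme/` IRIs are all assumptions made for the example, and string values are not escaped.

```python
def identifier_triples(subject, identifiers):
    """Build (subject, predicate, object) triples for identifiers[] records."""
    triples = []
    for index, ident in enumerate(identifiers):
        scheme = ident.get("identifier_scheme")
        value = ident.get("identifier_value")
        if not scheme or value in (None, ""):
            continue  # skip incomplete identifier records
        node = f"_:identifier{index}"  # hypothetical blank-node naming
        scheme_iri = f"<https://example.org/scheme/{scheme}>"  # placeholder IRI
        triples.extend([
            (subject, "crm:P1_is_identified_by", node),  # assumed linking property
            (node, "rdf:type", "crm:E42_Identifier"),
            (node, "skos:inScheme", scheme_iri),
            (node, "skos:notation", f'"{value}"'),
        ])
    return triples


example = [{"identifier_scheme": "ISIL", "identifier_value": "NL-EXAMPLE"}]
for triple in identifier_triples("hc:example", example):
    print(" ".join(triple), ".")
```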

Labels & Names (Mostly Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `custodian_name.claim_value` | `skos:prefLabel@nl` | Primary label |
| `wikidata_enrichment.wikidata_label_nl` | `skos:prefLabel@nl` | Fallback label |
| `wikidata_enrichment.wikidata_label_en` | `skos:altLabel@en` | English alt label |

Unmapped:

  • wikidata_enrichment.wikidata_aliases - multilingual aliases
  • wikidata_enrichment.wikidata_description_* - descriptions
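
The primary/fallback relationship between the two Dutch label sources can be sketched as follows. This is a minimal reading of the table, assuming the Wikidata label is used only when `custodian_name.claim_value` is missing or empty; the function name is hypothetical.

```python
def pick_labels(entry):
    """Return (property, text, language) tuples for the label fields."""
    name = (entry.get("custodian_name") or {}).get("claim_value")
    wikidata = entry.get("wikidata_enrichment") or {}

    labels = []
    # Primary label first, Wikidata Dutch label only as a fallback.
    preferred = name or wikidata.get("wikidata_label_nl")
    if preferred:
        labels.append(("skos:prefLabel", preferred, "nl"))

    english = wikidata.get("wikidata_label_en")
    if english:
        labels.append(("skos:altLabel", english, "en"))
    return labels


entry = {"wikidata_enrichment": {"wikidata_label_nl": "Voorbeeldmuseum",
                                 "wikidata_label_en": "Example Museum"}}
print(pick_labels(entry))
```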

Custodian Type (Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `original_entry.type[]` | `hc:custodian_type` | Type code → enum |

Location & Place (Partially Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `google_maps_enrichment.coordinates.latitude` | `schema:latitude` | Coordinates |
| `google_maps_enrichment.coordinates.longitude` | `schema:longitude` | Coordinates |
| `google_maps_enrichment.formatted_address` | `schema:address` | Full address |
| `ghcid.location_resolution.geonames_id` | `schema:containedInPlace` | GeoNames URI |

Unmapped:

  • google_maps_enrichment.address_components[] - structured address parts
  • google_maps_enrichment.utc_offset_minutes - timezone
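
To make the four mapped location fields concrete, here is a hedged sketch that serialises them as a Turtle fragment. It assumes the GeoNames URI follows the usual `https://sws.geonames.org/<id>/` pattern and attaches every property directly to the subject; the actual script may model a separate place node. `json.dumps` is used as a shortcut for string escaping, and the function name is illustrative.

```python
import json


def location_turtle(subject_iri, entry):
    """Serialise the mapped location fields of one entry as a Turtle fragment."""
    gm = entry.get("google_maps_enrichment") or {}
    coords = gm.get("coordinates") or {}
    resolution = (entry.get("ghcid") or {}).get("location_resolution") or {}

    parts = []
    lat, lon = coords.get("latitude"), coords.get("longitude")
    if lat is not None and lon is not None:
        parts.append(f'schema:latitude "{lat}"^^xsd:decimal')
        parts.append(f'schema:longitude "{lon}"^^xsd:decimal')
    address = gm.get("formatted_address")
    if address:
        # JSON string escaping is compatible with Turtle string literals.
        parts.append("schema:address " + json.dumps(address, ensure_ascii=False))
    geonames_id = resolution.get("geonames_id")
    if geonames_id:
        parts.append(f"schema:containedInPlace <https://sws.geonames.org/{geonames_id}/>")

    if not parts:
        return ""
    return f"<{subject_iri}> " + " ;\n    ".join(parts) + " ."


sample = {
    "google_maps_enrichment": {
        "coordinates": {"latitude": 52.0, "longitude": 5.0},
        "formatted_address": "Voorbeeldstraat 1, Utrecht",
    },
    "ghcid": {"location_resolution": {"geonames_id": 123}},
}
print(location_turtle("https://example.org/custodian/1", sample))
```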

Timestamps (Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `processing_timestamp` | `schema:dateCreated` | Record creation |
| `provenance.generated_at` | `schema:dateModified` | Last modification |

Digital Platform (Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `wikidata_enrichment.wikidata_official_website` | `foaf:homepage`, `schema:url` | Primary website |
| `google_maps_enrichment.website` | `foaf:homepage` | Fallback website |
| `wikidata_claims.P8768_online_catalog_url.value` | `hc:collection_url` | Catalog URL(s) |

Social Media Profiles (Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `web_claims.claims[].claim_type=social_*` | `hc:platform_type` | Platform type |
| `web_claims.claims[].claim_value` | `foaf:accountServiceHomepage` | Profile URL |
| Extracted from URL | `foaf:accountName` | Username |
| `web_claims.claims[].source_url` | `prov:wasDerivedFrom` | Source provenance |
| `web_claims.claims[].retrieved_on` | `prov:generatedAtTime` | Timestamp |
| `wikidata_claims.P2002_x__twitter__username.value` | `foaf:accountName` | Twitter from Wikidata |
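
The "Extracted from URL" row leaves the extraction rule implicit. One plausible approach, shown here only as a sketch, is to take the last non-empty path segment of the profile URL and strip a leading `@`; the real script may treat individual platforms differently. The function name and the example handle are hypothetical.

```python
from urllib.parse import urlparse


def social_account(claim):
    """Map one web claim to the social media properties, or None if not social."""
    claim_type = claim.get("claim_type", "")
    if not claim_type.startswith("social_"):
        return None
    url = claim.get("claim_value", "")
    # Last non-empty path segment, e.g. /company/name/ -> "name".
    segments = [part for part in urlparse(url).path.split("/") if part]
    username = segments[-1].lstrip("@") if segments else None
    return {
        "hc:platform_type": claim_type[len("social_"):],
        "foaf:accountServiceHomepage": url,
        "foaf:accountName": username,
        "prov:wasDerivedFrom": claim.get("source_url"),
        "prov:generatedAtTime": claim.get("retrieved_on"),
    }


claim = {
    "claim_type": "social_instagram",
    "claim_value": "https://www.instagram.com/examplemuseum/",
    "source_url": "https://example.org/contact",
    "retrieved_on": "2025-12-01T10:00:00Z",
}
print(social_account(claim))
```

URLs that carry an ID or query string instead of a handle would need per-platform handling that this sketch does not attempt.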

External Identifiers (Partially Mapped)

| Source Field | RDF Property | Notes |
|---|---|---|
| `wikidata_enrichment.wikidata_identifiers.viaf` | `skos:notation` | VIAF ID |
| `wikidata_enrichment.wikidata_identifiers.gnd` | `skos:notation` | GND ID |
| `wikidata_enrichment.wikidata_identifiers.isni` | `skos:notation` | ISNI |
| `wikidata_enrichment.wikidata_identifiers.lcnaf` | `skos:notation` | Library of Congress |
| `wikidata_enrichment.wikidata_identifiers.ringgold` | `skos:notation` | Ringgold ID |

Unmapped Fields

Google Maps Enrichment (Not Mapped)

These fields contain valuable data but are not yet mapped to RDF:

| Field | Type | Potential Use |
|---|---|---|
| `opening_hours.weekday_text[]` | Array | Operating hours display |
| `opening_hours.periods[]` | Array | Structured hours |
| `rating` | Float | User rating (1-5) |
| `total_ratings` | Integer | Number of reviews |
| `reviews[]` | Array | User reviews with text, rating, author |
| `photo_urls[]` | Array | Photo URLs |
| `photos_metadata[]` | Array | Photo details, attributions |
| `phone_international` | String | Phone number |
| `phone_local` | String | Local phone format |
| `editorial_summary` | String | Google's description |
| `business_status` | String | OPERATIONAL, CLOSED, etc. |
| `google_maps_url` | String | Link to Google Maps |
| `street_view_url` | String | Street View URL |
| `google_place_types[]` | Array | Google's type classification |
| `place_id` | String | Google Places ID |

Rationale for not mapping:

  • Opening hours: Requires schema:OpeningHoursSpecification modeling
  • Reviews: Privacy considerations, volatile data
  • Photos: External dependencies, storage concerns
  • Phone: Could be added with schema:telephone
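
To give a sense of what the opening-hours modeling would involve, the sketch below converts one `opening_hours.periods[]` item into a JSON-LD style `schema:OpeningHoursSpecification`. It assumes the YAML keeps the Google Places period shape (`open`/`close` objects with `day` 0-6 starting on Sunday and `time` as `"HHMM"`); that shape and the function names are assumptions, not something this mapping confirms.

```python
# Google Places numbers days from 0 (Sunday) to 6 (Saturday).
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def hhmm_to_time(value):
    """Convert "0930" to an xsd:time style string "09:30:00"."""
    return f"{value[:2]}:{value[2:]}:00"


def period_to_spec(period):
    """Build an OpeningHoursSpecification dict from one period."""
    opening = period["open"]
    spec = {
        "@type": "OpeningHoursSpecification",
        "dayOfWeek": f"https://schema.org/{DAYS[opening['day']]}",
        "opens": hhmm_to_time(opening["time"]),
    }
    closing = period.get("close")  # absent for places that never close
    if closing:
        spec["closes"] = hhmm_to_time(closing["time"])
    return spec


period = {"open": {"day": 2, "time": "1000"},
          "close": {"day": 2, "time": "1700"}}
print(period_to_spec(period))
```

Periods that close on a later day than they open (for example past midnight) are not handled here and would need an explicit decision.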

Wikidata Claims (Mostly Not Mapped)

Many Wikidata properties are retrieved but not converted to RDF:

| Wikidata Property | Label | Notes |
|---|---|---|
| P131 | Located in admin entity | Administrative hierarchy |
| P276 | Location | Building/structure |
| P17 | Country | Country entity |
| P571 | Inception | Founding date |
| P576 | Dissolved | Closure date |
| P84 | Architect | Building architect |
| P669 | Located on street | Street name |
| P1619 | Date of opening | Opening date |
| P166 | Award received | Awards |
| P2652 | Partnership with | Partnerships |
| P1343 | Described by source | Sources |
| P2851 | Payment types accepted | Payment methods |
| P3273 | Actorenregister ID | Dutch actors register |
| P646 | Freebase ID | Legacy identifier |
| P402 | OSM relation ID | OpenStreetMap |

Rationale:

  • Many require complex modeling (dates with qualifiers)
  • Some are volatile (awards, partnerships change)
  • Some are domain-specific extensions
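
Part of the date complexity is precision: Wikidata time values carry a precision code (9 = year, 10 = month, 11 = day), and a year-only value such as `+1885-00-00T00:00:00Z` is not a valid `xsd:date`. The sketch below picks a datatype from the precision. It assumes the enriched YAML preserves both the raw time string and its precision, which this document does not confirm, and it ignores calendar-model and BCE year-numbering issues.

```python
def wikidata_time_to_literal(time, precision):
    """Return (lexical value, XSD datatype), or None for coarser precisions."""
    sign = "-" if time.startswith("-") else ""
    date_part = time.lstrip("+-").split("T", 1)[0]
    year, month, day = date_part.split("-")

    if precision == 11:  # day
        return f"{sign}{year}-{month}-{day}", "xsd:date"
    if precision == 10:  # month
        return f"{sign}{year}-{month}", "xsd:gYearMonth"
    if precision == 9:   # year
        return f"{sign}{year}", "xsd:gYear"
    return None  # decade, century, etc. need a modeling decision


print(wikidata_time_to_literal("+1885-00-00T00:00:00Z", 9))
print(wikidata_time_to_literal("+1800-11-19T00:00:00Z", 11))
```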

Provenance Metadata (Not Mapped)

| Field | Notes |
|---|---|
| `provenance.sources.*` | Detailed source tracking |
| `provenance.data_tier_summary` | Data quality tiers |
| `provenance.notes` | Human notes |
| `wikidata_enrichment.api_metadata.*` | API call details |
| `web_enrichment.web_archives[]` | WARC archive info |

Rationale:

  • Could use PROV-O ontology for detailed provenance
  • Currently simplified to timestamps only

Museum Register Enrichment (Not Mapped)

| Field | Notes |
|---|---|
| `museum_register_enrichment.registered_since` | Registration date |
| `museum_register_enrichment.province` | Province |
| `museum_register_enrichment.source_provenance` | Source details |

Future Enhancements

Priority 1: High Value, Easy to Add

  • google_maps_enrichment.phone_international → schema:telephone
  • google_maps_enrichment.editorial_summary → schema:description
  • wikidata_claims.P571_inception.value → schema:foundingDate
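
Because these three are direct field-to-property mappings, they could be driven by a small lookup table. The following is a minimal sketch under that assumption; the table name, the helper names, and the sample values are hypothetical and not taken from `scripts/nde_to_hc_rdf.py`.

```python
# Dotted source path -> target RDF property, as listed above.
PRIORITY_1 = {
    "google_maps_enrichment.phone_international": "schema:telephone",
    "google_maps_enrichment.editorial_summary": "schema:description",
    "wikidata_claims.P571_inception.value": "schema:foundingDate",
}


def get_path(entry, dotted):
    """Follow a dotted key path through nested dicts; None if any key is missing."""
    node = entry
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def priority_1_pairs(entry):
    """Return (property, value) pairs for the fields present in the entry."""
    pairs = []
    for path, prop in PRIORITY_1.items():
        value = get_path(entry, path)
        if value is not None:
            pairs.append((prop, value))
    return pairs


entry = {"google_maps_enrichment": {"phone_international": "+31 00 000 0000"}}
print(priority_1_pairs(entry))
```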

Priority 2: Moderate Complexity

  • Opening hours → schema:OpeningHoursSpecification
  • google_maps_enrichment.rating → schema:aggregateRating
  • Wikidata relationships (P131, P276) → location hierarchy

Priority 3: Complex Modeling Required

  • Full provenance chain → PROV-O
  • Organizational history → change events
  • Collection metadata → separate entities

Script Location

scripts/nde_to_hc_rdf.py

Output Location

data/nde/rdf/{ghcid_numeric}.ttl


Generated: 2025-12-02

Total entries converted: 1,619

Total triples: 114,705