# NDE to Heritage Custodian RDF Field Mapping

This document details which fields from the enriched NDE YAML entries are mapped to RDF and which remain unmapped.

## Summary

| Category | Mapped | Unmapped | Coverage |
|----------|--------|----------|----------|
| Core Identifiers | 10 | 0 | 100% |
| Labels & Names | 3 | 1 | 75% |
| Location | 4 | 2 | 67% |
| Timestamps | 2 | 0 | 100% |
| Social Media | 5 | 0 | 100% |
| External IDs | 6 | 10+ | ~40% |
| Google Maps | 3 | 15+ | ~15% |
| Wikidata Claims | 2 | 30+ | ~5% |
| Provenance | 0 | 15+ | 0% |

## Mapped Fields

### Core Identifiers (✅ Fully Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `ghcid.ghcid_current` | `skos:notation` on `crm:E42_Identifier` | GHCID scheme |
| `ghcid.ghcid_numeric` | `dcterms:identifier`, `skos:notation` | Primary identifier |
| `ghcid.ghcid_uuid` | `skos:notation`, `schema:url` | UUID v5 |
| `ghcid.ghcid_uuid_sha256` | `skos:notation`, `schema:url` | UUID v8 |
| `ghcid.record_id` | `skos:notation`, `schema:url` | Database record ID |
| `identifiers[].identifier_scheme` | `skos:inScheme` | Identifier scheme |
| `identifiers[].identifier_value` | `skos:notation` | Identifier value |
| `wikidata_enrichment.wikidata_entity_id` | `owl:sameAs`, `skos:notation` | Wikidata linking |

### Labels & Names (✅ Mostly Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `custodian_name.claim_value` | `skos:prefLabel@nl` | Primary label |
| `wikidata_enrichment.wikidata_label_nl` | `skos:prefLabel@nl` | Fallback label |
| `wikidata_enrichment.wikidata_label_en` | `skos:altLabel@en` | English alt label |

**Unmapped:**
- `wikidata_enrichment.wikidata_aliases` - multilingual aliases
- `wikidata_enrichment.wikidata_description_*` - descriptions

### Custodian Type (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `original_entry.type[]` | `hc:custodian_type` | Type code → enum |

### Location & Place (✅ Partially Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `google_maps_enrichment.coordinates.latitude` | `schema:latitude` | Coordinates |
| `google_maps_enrichment.coordinates.longitude` | `schema:longitude` | Coordinates |
| `google_maps_enrichment.formatted_address` | `schema:address` | Full address |
| `ghcid.location_resolution.geonames_id` | `schema:containedInPlace` | GeoNames URI |

**Unmapped:**
- `google_maps_enrichment.address_components[]` - structured address parts
- `google_maps_enrichment.utc_offset_minutes` - timezone

### Timestamps (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `processing_timestamp` | `schema:dateCreated` | Record creation |
| `provenance.generated_at` | `schema:dateModified` | Last modification |

### Digital Platform (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `wikidata_enrichment.wikidata_official_website` | `foaf:homepage`, `schema:url` | Primary website |
| `google_maps_enrichment.website` | `foaf:homepage` | Fallback website |
| `wikidata_claims.P8768_online_catalog_url.value` | `hc:collection_url` | Catalog URL(s) |

### Social Media Profiles (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `web_claims.claims[].claim_type=social_*` | `hc:platform_type` | Platform type |
| `web_claims.claims[].claim_value` | `foaf:accountServiceHomepage` | Profile URL |
| Extracted from URL | `foaf:accountName` | Username |
| `web_claims.claims[].source_url` | `prov:wasDerivedFrom` | Source provenance |
| `web_claims.claims[].retrieved_on` | `prov:generatedAtTime` | Timestamp |
| `wikidata_claims.P2002_x__twitter__username.value` | `foaf:accountName` | Twitter from Wikidata |

### External Identifiers (✅ Partially Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `wikidata_enrichment.wikidata_identifiers.viaf` | `skos:notation` | VIAF ID |
| `wikidata_enrichment.wikidata_identifiers.gnd` | `skos:notation` | GND ID |
| `wikidata_enrichment.wikidata_identifiers.isni` | `skos:notation` | ISNI |
| `wikidata_enrichment.wikidata_identifiers.lcnaf` | `skos:notation` | Library of Congress |
| `wikidata_enrichment.wikidata_identifiers.ringgold` | `skos:notation` | Ringgold ID |

---

## Unmapped Fields

### Google Maps Enrichment (❌ Not Mapped)

These fields contain valuable data but are not yet mapped to RDF:

| Field | Type | Potential Use |
|-------|------|---------------|
| `opening_hours.weekday_text[]` | Array | Operating hours display |
| `opening_hours.periods[]` | Array | Structured hours |
| `rating` | Float | User rating (1-5) |
| `total_ratings` | Integer | Number of reviews |
| `reviews[]` | Array | User reviews with text, rating, author |
| `photo_urls[]` | Array | Photo URLs |
| `photos_metadata[]` | Array | Photo details, attributions |
| `phone_international` | String | Phone number |
| `phone_local` | String | Local phone format |
| `editorial_summary` | String | Google's description |
| `business_status` | String | OPERATIONAL, CLOSED, etc. |
| `google_maps_url` | String | Link to Google Maps |
| `street_view_url` | String | Street View URL |
| `google_place_types[]` | Array | Google's type classification |
| `place_id` | String | Google Places ID |

**Rationale for not mapping:**
- Opening hours: Requires `schema:OpeningHoursSpecification` modeling
- Reviews: Privacy considerations, volatile data
- Photos: External dependencies, storage concerns
- Phone: Could be added with `schema:telephone`

### Wikidata Claims (❌ Mostly Not Mapped)

Many Wikidata properties are retrieved but not converted to RDF:

| Wikidata Property | Label | Notes |
|-------------------|-------|-------|
| P131 | Located in admin entity | Administrative hierarchy |
| P276 | Location | Building/structure |
| P17 | Country | Country entity |
| P571 | Inception | Founding date |
| P576 | Dissolved | Closure date |
| P84 | Architect | Building architect |
| P669 | Located on street | Street name |
| P1619 | Date of opening | Opening date |
| P166 | Award received | Awards |
| P2652 | Partnership with | Partnerships |
| P1343 | Described by source | Sources |
| P2851 | Payment types accepted | Payment methods |
| P3273 | Actorenregister ID | Dutch actors register |
| P646 | Freebase ID | Legacy identifier |
| P402 | OSM relation ID | OpenStreetMap |

**Rationale:**
- Many require complex modeling (dates with qualifiers)
- Some are volatile (awards, partnerships change)
- Some are domain-specific extensions

### Provenance Metadata (❌ Not Mapped)

| Field | Notes |
|-------|-------|
| `provenance.sources.*` | Detailed source tracking |
| `provenance.data_tier_summary` | Data quality tiers |
| `provenance.notes` | Human notes |
| `wikidata_enrichment.api_metadata.*` | API call details |
| `web_enrichment.web_archives[]` | WARC archive info |

**Rationale:**
- Could use PROV-O ontology for detailed provenance
- Currently simplified to timestamps only

### Museum Register Enrichment (❌ Not Mapped)

| Field | Notes |
|-------|-------|
| `museum_register_enrichment.registered_since` | Registration date |
| `museum_register_enrichment.province` | Province |
| `museum_register_enrichment.source_provenance` | Source details |

---

## Future Enhancements

### Priority 1: High Value, Easy to Add
- `google_maps_enrichment.phone_international` → `schema:telephone`
- `google_maps_enrichment.editorial_summary` → `schema:description`
- `wikidata_claims.P571_inception.value` → `schema:foundingDate`

### Priority 2: Moderate Complexity
- Opening hours → `schema:OpeningHoursSpecification`
- `google_maps_enrichment.rating` → `schema:aggregateRating`
- Wikidata relationships (P131, P276) → location hierarchy

### Priority 3: Complex Modeling Required
- Full provenance chain → PROV-O
- Organizational history → change events
- Collection metadata → separate entities

---

## Script Location

`scripts/nde_to_hc_rdf.py`

## Output Location

`data/nde/rdf/{ghcid_numeric}.ttl`

---

*Generated: 2025-12-02*
*Total entries converted: 1,619*
*Total triples: 114,705*
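
---

## Example: Priority 1 Mappings

The three Priority 1 mappings under Future Enhancements are simple path-to-property lookups, so they can be illustrated with a short sketch. The code below is not taken from `scripts/nde_to_hc_rdf.py`. It assumes each enriched entry is loaded as nested Python dicts keyed by the dotted paths shown in this document. The helper names (`get_path`, `turtle_literal`, `priority1_triples`), the subject IRI, and the sample values are all hypothetical. It emits plain string literals and leaves out prefix declarations and date typing, which a real converter would likely need.

```python
from typing import Any, Optional

# Dotted source paths -> RDF properties, copied from the Priority 1 list.
PRIORITY_1_MAPPINGS = {
    "google_maps_enrichment.phone_international": "schema:telephone",
    "google_maps_enrichment.editorial_summary": "schema:description",
    "wikidata_claims.P571_inception.value": "schema:foundingDate",
}


def get_path(entry: dict, dotted_path: str) -> Optional[Any]:
    """Walk nested dicts along a dotted path; return None if a key is missing."""
    node: Any = entry
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def turtle_literal(value: Any) -> str:
    """Serialize a value as a plain Turtle string literal with basic escaping."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def priority1_triples(entry: dict, subject: str) -> list[str]:
    """Return one Turtle statement per Priority 1 field present in the entry."""
    triples = []
    for path, prop in PRIORITY_1_MAPPINGS.items():
        value = get_path(entry, path)
        if value is None or value == "":
            continue  # skip absent or empty fields instead of emitting empty literals
        triples.append(f"{subject} {prop} {turtle_literal(value)} .")
    return triples


if __name__ == "__main__":
    # Hypothetical entry; the values are invented for illustration only.
    sample = {
        "google_maps_enrichment": {"phone_international": "+31 20 000 0000"},
        "wikidata_claims": {"P571_inception": {"value": "1885"}},
    }
    # The subject IRI is a placeholder, not the project's real URI pattern.
    for line in priority1_triples(sample, "<https://example.org/custodian/1>"):
        print(line)
```

Keeping the mappings in a single dict means adding another simple field is a one-line change. The Priority 2 and 3 items would not fit this shape, because they need structured nodes (for example `schema:OpeningHoursSpecification`), not a single literal.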