# NDE to Heritage Custodian RDF Field Mapping

This document details which fields from the enriched NDE YAML entries are mapped to RDF and which remain unmapped.

## Summary

| Category | Mapped | Unmapped | Coverage |
|----------|--------|----------|----------|
| Core Identifiers | 10 | 0 | 100% |
| Labels & Names | 3 | 1 | 75% |
| Location | 4 | 2 | 67% |
| Timestamps | 2 | 0 | 100% |
| Social Media | 5 | 0 | 100% |
| External IDs | 6 | 10+ | ~40% |
| Google Maps | 3 | 15+ | ~15% |
| Wikidata Claims | 2 | 30+ | ~5% |
| Provenance | 0 | 15+ | 0% |

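The Coverage column reads as mapped / (mapped + unmapped). A minimal sketch of that arithmetic, assuming that reading; for open-ended counts such as `10+` the lower bound is plugged in, so the result is an upper bound on coverage:

```python
def coverage(mapped: int, unmapped: int) -> float:
    """Percentage of fields mapped, rounded to one decimal place.

    Assumes coverage = mapped / (mapped + unmapped). When `unmapped` is a
    lower bound (e.g. "10+"), the result is an upper bound on coverage.
    """
    total = mapped + unmapped
    if total == 0:
        raise ValueError("no fields counted")
    return round(100 * mapped / total, 1)


# Rows from the summary table; "10+"-style counts use their lower bound.
rows = {
    "Labels & Names": (3, 1),
    "Location": (4, 2),
    "External IDs": (6, 10),
    "Google Maps": (3, 15),
    "Wikidata Claims": (2, 30),
}
for category, (mapped, unmapped) in rows.items():
    print(f"{category}: {coverage(mapped, unmapped)}%")
```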
## Mapped Fields

### Core Identifiers (✅ Fully Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `ghcid.ghcid_current` | `skos:notation` on `crm:E42_Identifier` | GHCID scheme |
| `ghcid.ghcid_numeric` | `dcterms:identifier`, `skos:notation` | Primary identifier |
| `ghcid.ghcid_uuid` | `skos:notation`, `schema:url` | UUID v5 |
| `ghcid.ghcid_uuid_sha256` | `skos:notation`, `schema:url` | UUID v8 |
| `ghcid.record_id` | `skos:notation`, `schema:url` | Database record ID |
| `identifiers[].identifier_scheme` | `skos:inScheme` | Identifier scheme |
| `identifiers[].identifier_value` | `skos:notation` | Identifier value |
| `wikidata_enrichment.wikidata_entity_id` | `owl:sameAs`, `skos:notation` | Wikidata linking |

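A minimal sketch of the core identifier mapping, assuming an entry already loaded from YAML into a dict. The `crm:P1_is_identified_by` link, the `_:` blank-node labels and the helper names are assumptions for illustration, not necessarily what `scripts/nde_to_hc_rdf.py` does. The `schema:url` variants and `identifiers[]` are left out because their URL pattern and scheme IRIs are not documented here.

```python
from typing import Any

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"
GHCID_KEYS = ("ghcid_current", "ghcid_numeric", "ghcid_uuid", "ghcid_uuid_sha256", "record_id")

Triple = tuple[str, str, str]


def lit(value: Any) -> str:
    """Render a value as a Turtle string literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def core_identifier_triples(entry: dict[str, Any], subject: str) -> list[Triple]:
    """Map the GHCID and Wikidata identifiers of one NDE entry to triples.

    Terms are in Turtle notation (prefixed names, <IRIs>, "literals").
    The crm:P1_is_identified_by link and the _:key blank nodes are
    assumptions made for this sketch.
    """
    ghcid = entry.get("ghcid", {})
    triples: list[Triple] = []

    numeric = ghcid.get("ghcid_numeric")
    if numeric is not None:
        # The numeric GHCID doubles as the primary dcterms:identifier.
        triples.append((subject, "dcterms:identifier", lit(numeric)))

    for key in GHCID_KEYS:
        value = ghcid.get(key)
        if value is None:
            continue
        # One crm:E42_Identifier node per GHCID variant, value in skos:notation.
        node = f"_:{key}"
        triples.append((subject, "crm:P1_is_identified_by", node))
        triples.append((node, "rdf:type", "crm:E42_Identifier"))
        triples.append((node, "skos:notation", lit(value)))

    qid = entry.get("wikidata_enrichment", {}).get("wikidata_entity_id")
    if qid:
        # Link to the Wikidata entity IRI, not the wiki page URL.
        triples.append((subject, "owl:sameAs", f"<{WIKIDATA_ENTITY}{qid}>"))
    return triples
```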
### Labels & Names (✅ Mostly Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `custodian_name.claim_value` | `skos:prefLabel@nl` | Primary label |
| `wikidata_enrichment.wikidata_label_nl` | `skos:prefLabel@nl` | Fallback label |
| `wikidata_enrichment.wikidata_label_en` | `skos:altLabel@en` | English alt label |

**Unmapped:**

- `wikidata_enrichment.wikidata_aliases` - multilingual aliases
- `wikidata_enrichment.wikidata_description_*` - descriptions

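The label precedence in the table can be sketched as follows, assuming the same dict-based entry; `label_triples` and `lit` are hypothetical helper names:

```python
from typing import Any


def lit(text: str, lang: str) -> str:
    """Language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def label_triples(entry: dict[str, Any], subject: str) -> list[tuple[str, str, str]]:
    """skos:prefLabel@nl with a Wikidata fallback, plus an English skos:altLabel."""
    wikidata = entry.get("wikidata_enrichment", {})
    triples = []

    # Prefer the NDE custodian name; fall back to the Dutch Wikidata label.
    pref_nl = entry.get("custodian_name", {}).get("claim_value") or wikidata.get("wikidata_label_nl")
    if pref_nl:
        triples.append((subject, "skos:prefLabel", lit(pref_nl, "nl")))

    label_en = wikidata.get("wikidata_label_en")
    if label_en:
        triples.append((subject, "skos:altLabel", lit(label_en, "en")))
    return triples
```

SKOS allows at most one `skos:prefLabel` per language tag, which is why the Dutch Wikidata label acts as a fallback rather than a second preferred label.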
### Custodian Type (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `original_entry.type[]` | `hc:custodian_type` | Type code → enum |

### Location & Place (✅ Partially Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `google_maps_enrichment.coordinates.latitude` | `schema:latitude` | Coordinates |
| `google_maps_enrichment.coordinates.longitude` | `schema:longitude` | Coordinates |
| `google_maps_enrichment.formatted_address` | `schema:address` | Full address |
| `ghcid.location_resolution.geonames_id` | `schema:containedInPlace` | GeoNames URI |

**Unmapped:**

- `google_maps_enrichment.address_components[]` - structured address parts
- `google_maps_enrichment.utc_offset_minutes` - timezone

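A sketch of the location mapping. Typing the coordinates as `xsd:decimal` and the `https://sws.geonames.org/{id}/` URI form are assumptions; the table only says "GeoNames URI".

```python
from typing import Any

# Assumed GeoNames URI form; the mapping table only says "GeoNames URI".
GEONAMES_BASE = "https://sws.geonames.org/"


def lit(text: str) -> str:
    """Plain Turtle string literal with backslashes, quotes and newlines escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def location_triples(entry: dict[str, Any], subject: str) -> list[tuple[str, str, str]]:
    """Coordinates, formatted address and GeoNames place for one NDE entry."""
    gmaps = entry.get("google_maps_enrichment", {})
    coords = gmaps.get("coordinates", {})
    triples = []

    # Fixed-point formatting keeps the lexical form valid for xsd:decimal
    # (str(float) could produce exponent notation such as 1e-05).
    for key in ("latitude", "longitude"):
        if coords.get(key) is not None:
            triples.append((subject, f"schema:{key}", f'"{float(coords[key]):.6f}"^^xsd:decimal'))

    address = gmaps.get("formatted_address")
    if address:
        triples.append((subject, "schema:address", lit(address)))

    geonames_id = entry.get("ghcid", {}).get("location_resolution", {}).get("geonames_id")
    if geonames_id:
        triples.append((subject, "schema:containedInPlace", f"<{GEONAMES_BASE}{geonames_id}/>"))
    return triples
```

Six decimal places is roughly 0.1 m of latitude, well beyond the precision of a building-level geocode.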
### Timestamps (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `processing_timestamp` | `schema:dateCreated` | Record creation |
| `provenance.generated_at` | `schema:dateModified` | Last modification |

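Both timestamps are naturally expressed as `xsd:dateTime` literals. A minimal normalisation sketch, assuming the values are ISO 8601 strings or already-parsed `datetime` objects (YAML loaders may produce either), and treating timezone-less values as UTC, which is an assumption:

```python
from datetime import datetime, timezone
from typing import Any, Union


def xsd_datetime(value: Union[str, datetime]) -> str:
    """Render a timestamp as a typed xsd:dateTime literal.

    A trailing "Z" is rewritten to "+00:00" because datetime.fromisoformat
    only accepts "Z" from Python 3.11 on. Naive values are assumed to be UTC.
    """
    if isinstance(value, datetime):
        parsed = value
    else:
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return f'"{parsed.isoformat()}"^^xsd:dateTime'


def timestamp_triples(entry: dict[str, Any], subject: str) -> list[tuple[str, str, str]]:
    """schema:dateCreated / schema:dateModified from the two source fields."""
    triples = []
    created = entry.get("processing_timestamp")
    if created:
        triples.append((subject, "schema:dateCreated", xsd_datetime(created)))
    modified = entry.get("provenance", {}).get("generated_at")
    if modified:
        triples.append((subject, "schema:dateModified", xsd_datetime(modified)))
    return triples
```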
### Digital Platform (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `wikidata_enrichment.wikidata_official_website` | `foaf:homepage`, `schema:url` | Primary website |
| `google_maps_enrichment.website` | `foaf:homepage` | Fallback website |
| `wikidata_claims.P8768_online_catalog_url.value` | `hc:collection_url` | Catalog URL(s) |

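The website precedence and the multi-valued catalog URL could look like this. `iri` and `website_triples` are hypothetical helpers, and the assumption that the P8768 value is either a single string or a list is read from the "Catalog URL(s)" note:

```python
from typing import Any

# Characters Turtle does not allow inside <...> IRI references.
_IRI_FORBIDDEN = set('<>"{}|^`\\')


def iri(url: str) -> str:
    """Wrap a URL as a Turtle IRI, rejecting characters IRIREF forbids."""
    if any(ord(ch) <= 0x20 or ch in _IRI_FORBIDDEN for ch in url):
        raise ValueError(f"cannot use {url!r} as an IRI")
    return f"<{url}>"


def website_triples(entry: dict[str, Any], subject: str) -> list[tuple[str, str, str]]:
    wikidata = entry.get("wikidata_enrichment", {})
    gmaps = entry.get("google_maps_enrichment", {})
    triples = []

    official = wikidata.get("wikidata_official_website")
    if official:
        triples.append((subject, "foaf:homepage", iri(official)))
        triples.append((subject, "schema:url", iri(official)))
    elif gmaps.get("website"):
        # Google Maps is consulted only when Wikidata has no official website.
        triples.append((subject, "foaf:homepage", iri(gmaps["website"])))

    # The catalog claim may hold a single URL or a list of URLs.
    catalog = entry.get("wikidata_claims", {}).get("P8768_online_catalog_url", {}).get("value")
    if isinstance(catalog, str):
        catalog = [catalog]
    for url in catalog or []:
        triples.append((subject, "hc:collection_url", iri(url)))
    return triples
```

Rejecting malformed URLs up front surfaces bad source data instead of silently writing an unparsable `.ttl` file.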
### Social Media Profiles (✅ Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `web_claims.claims[].claim_type=social_*` | `hc:platform_type` | Platform type |
| `web_claims.claims[].claim_value` | `foaf:accountServiceHomepage` | Profile URL |
| Extracted from URL | `foaf:accountName` | Username |
| `web_claims.claims[].source_url` | `prov:wasDerivedFrom` | Source provenance |
| `web_claims.claims[].retrieved_on` | `prov:generatedAtTime` | Timestamp |
| `wikidata_claims.P2002_x__twitter__username.value` | `foaf:accountName` | Twitter from Wikidata |

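The "Extracted from URL" step is necessarily a heuristic. One possible version takes the last non-empty path segment and strips a leading `@`; deriving the platform by removing the `social_` prefix (e.g. `social_instagram`) is an assumption about the claim type values:

```python
from typing import Any, Optional
from urllib.parse import urlparse


def account_name_from_url(url: str) -> Optional[str]:
    """Heuristic: the last non-empty path segment, minus a leading '@'.

    Works for shapes like twitter.com/name, youtube.com/@name and
    linkedin.com/company/name; other shapes may yield an ID instead.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1].lstrip("@") if segments else None


def social_account(claim: dict[str, Any]) -> Optional[dict[str, str]]:
    """Map one web_claims.claims[] entry to account properties, or None."""
    claim_type = claim.get("claim_type", "")
    if not claim_type.startswith("social_"):
        return None
    url = claim["claim_value"]
    account = {
        "hc:platform_type": claim_type[len("social_"):],
        "foaf:accountServiceHomepage": url,
    }
    name = account_name_from_url(url)
    if name:
        account["foaf:accountName"] = name
    # Provenance is copied only when the claim records it.
    if claim.get("source_url"):
        account["prov:wasDerivedFrom"] = claim["source_url"]
    if claim.get("retrieved_on"):
        account["prov:generatedAtTime"] = str(claim["retrieved_on"])
    return account
```

FOAF defines `foaf:accountServiceHomepage` as the homepage of the service provider (e.g. `https://www.instagram.com/`), so putting the profile URL there stretches its intended meaning; the sketch follows the table as written.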
### External Identifiers (✅ Partially Mapped)

| Source Field | RDF Property | Notes |
|--------------|--------------|-------|
| `wikidata_enrichment.wikidata_identifiers.viaf` | `skos:notation` | VIAF ID |
| `wikidata_enrichment.wikidata_identifiers.gnd` | `skos:notation` | GND ID |
| `wikidata_enrichment.wikidata_identifiers.isni` | `skos:notation` | ISNI |
| `wikidata_enrichment.wikidata_identifiers.lcnaf` | `skos:notation` | Library of Congress Name Authority File (LCNAF) ID |
| `wikidata_enrichment.wikidata_identifiers.ringgold` | `skos:notation` | Ringgold ID |

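Because only some identifier keys are mapped, it helps to report the rest. A sketch that splits `wikidata_enrichment.wikidata_identifiers` into mapped (scheme, value) pairs, each destined for `skos:notation`, and keys with no mapping yet; the display names in `MAPPED_SCHEMES` are illustrative:

```python
from typing import Any

# Keys the table above maps to skos:notation, with illustrative scheme names.
MAPPED_SCHEMES = {
    "viaf": "VIAF",
    "gnd": "GND",
    "isni": "ISNI",
    "lcnaf": "LCNAF",
    "ringgold": "Ringgold",
}


def split_identifiers(identifiers: dict[str, Any]) -> tuple[list[tuple[str, str]], list[str]]:
    """Separate mapped (scheme, value) pairs from keys with no RDF mapping yet."""
    mapped: list[tuple[str, str]] = []
    unmapped: list[str] = []
    for key, value in identifiers.items():
        if key in MAPPED_SCHEMES:
            mapped.append((MAPPED_SCHEMES[key], str(value)))
        else:
            unmapped.append(key)
    return mapped, unmapped
```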
---

## Unmapped Fields

### Google Maps Enrichment (❌ Not Mapped)

These fields contain valuable data but are not yet mapped to RDF:

| Field | Type | Potential Use |
|-------|------|---------------|
| `opening_hours.weekday_text[]` | `Array<string>` | Operating hours display |
| `opening_hours.periods[]` | `Array<object>` | Structured hours |
| `rating` | `Float` | User rating (1-5) |
| `total_ratings` | `Integer` | Number of reviews |
| `reviews[]` | `Array<object>` | User reviews with text, rating, author |
| `photo_urls[]` | `Array<string>` | Photo URLs |
| `photos_metadata[]` | `Array<object>` | Photo details, attributions |
| `phone_international` | `String` | Phone number (international format) |
| `phone_local` | `String` | Phone number (local format) |
| `editorial_summary` | `String` | Google's description |
| `business_status` | `String` | `OPERATIONAL`, `CLOSED`, etc. |
| `google_maps_url` | `String` | Link to Google Maps |
| `street_view_url` | `String` | Street View URL |
| `google_place_types[]` | `Array<string>` | Google's type classification |
| `place_id` | `String` | Google Places ID |

**Rationale for not mapping:**

- Opening hours: Requires `schema:OpeningHoursSpecification` modeling
- Reviews: Privacy considerations, volatile data
- Photos: External dependencies, storage concerns
- Phone: No modeling obstacle; can be added with `schema:telephone` (listed under Priority 1)

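A sketch of the `schema:OpeningHoursSpecification` modeling that opening hours would need. It assumes `opening_hours.periods[]` follows the Google Places API shape (`open`/`close` objects with `day`, where 0 is Sunday, and `time` as `HHMM`). Periods without a close time (open 24 hours) or crossing midnight are skipped, and plain dicts stand in for blank nodes:

```python
from typing import Any

# Google Places numbers days from Sunday = 0.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def hhmm_to_time(hhmm: str) -> str:
    """Convert '0930' to the xsd:time lexical form '09:30:00'."""
    return f"{hhmm[:2]}:{hhmm[2:]}:00"


def opening_hours_specs(periods: list[dict[str, Any]]) -> list[dict[str, str]]:
    """One OpeningHoursSpecification (as a dict) per same-day period."""
    specs = []
    for period in periods:
        open_, close = period.get("open"), period.get("close")
        # Skip 24-hour entries (no close) and periods that end on another day.
        if not open_ or not close or open_["day"] != close["day"]:
            continue
        specs.append({
            "@type": "schema:OpeningHoursSpecification",
            "schema:dayOfWeek": f"schema:{DAYS[open_['day']]}",
            "schema:opens": hhmm_to_time(open_["time"]),
            "schema:closes": hhmm_to_time(close["time"]),
        })
    return specs
```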
### Wikidata Claims (❌ Mostly Not Mapped)

Many Wikidata properties are retrieved but not converted to RDF:

| Wikidata Property | Label | Notes |
|-------------------|-------|-------|
| P131 | Located in the administrative territorial entity | Administrative hierarchy |
| P276 | Location | Building/structure |
| P17 | Country | Country entity |
| P571 | Inception | Founding date |
| P576 | Dissolved, abolished or demolished date | Closure date |
| P84 | Architect | Building architect |
| P669 | Located on street | Street name |
| P1619 | Date of official opening | Opening date |
| P166 | Award received | Awards |
| P2652 | Partnership with | Partnerships |
| P1343 | Described by source | Sources |
| P2851 | Payment types accepted | Payment methods |
| P3273 | Actorenregister ID | Dutch actors register |
| P646 | Freebase ID | Legacy identifier |
| P402 | OpenStreetMap relation ID | OpenStreetMap |

**Rationale:**

- Many require complex modeling (e.g. dates with precision and qualifiers)
- Some are volatile (awards, partnerships change)
- Some are domain-specific extensions

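To illustrate why dates need care: Wikidata time values carry a precision (9 = year, 10 = month, 11 = day), and a year-precision date still has placeholder month and day parts. A sketch, assuming the retrieved claim keeps Wikidata's `+YYYY-MM-DDT00:00:00Z` string and numeric precision, that picks the matching XSD type so a year-only inception (P571) or dissolution date (P576) is not read as 1 January; BCE dates and coarser precisions are not handled:

```python
from typing import Optional

XSD_BY_PRECISION = {9: "xsd:gYear", 10: "xsd:gYearMonth", 11: "xsd:date"}


def wikidata_time_literal(time: str, precision: int) -> Optional[str]:
    """Convert a Wikidata time value to a typed literal at its stated precision.

    Returns None for BCE dates (leading "-") and for precisions coarser
    than a year (decade, century, ...), which this sketch does not model.
    """
    if precision not in XSD_BY_PRECISION or not time.startswith("+"):
        return None
    date_part = time[1:].split("T", 1)[0]  # e.g. "1800-05-01"
    year, month, day = date_part.split("-")
    lexical = {9: year, 10: f"{year}-{month}", 11: f"{year}-{month}-{day}"}[precision]
    return f'"{lexical}"^^{XSD_BY_PRECISION[precision]}'
```

A literal produced this way could back the `schema:foundingDate` mapping for P571 listed under Priority 1.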
### Provenance Metadata (❌ Not Mapped)

| Field | Notes |
|-------|-------|
| `provenance.sources.*` | Detailed source tracking |
| `provenance.data_tier_summary` | Data quality tiers |
| `provenance.notes` | Human notes |
| `wikidata_enrichment.api_metadata.*` | API call details |
| `web_enrichment.web_archives[]` | WARC archive info |

**Rationale:**

- Could use the PROV-O ontology for detailed provenance
- Entry-level provenance is currently reduced to the two timestamps (`schema:dateCreated`, `schema:dateModified`); only social media claims carry `prov:` links

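A rough sketch of the PROV-O direction. This document does not describe the structure of `provenance.sources.*`, so the shape assumed here (`{source_name: [{"source_url": ..., "fetch_timestamp": ...}]}`) and those key names are hypothetical. Each record becomes a `prov:Entity` derived from its URL and generated at fetch time, mirroring the social media claim mapping:

```python
from typing import Any


def lit(text: str) -> str:
    """Plain Turtle string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def provenance_triples(sources: dict[str, list[dict[str, Any]]], subject: str) -> list[tuple[str, str, str]]:
    """One prov:Entity per retrieved source record (hypothetical input shape)."""
    triples = []
    count = 0
    for name, records in sources.items():
        for record in records:
            count += 1
            node = f"_:src{count}"
            # The custodian description derives from this source snapshot.
            triples.append((subject, "prov:wasDerivedFrom", node))
            triples.append((node, "rdf:type", "prov:Entity"))
            triples.append((node, "rdfs:label", lit(name)))
            if record.get("source_url"):
                triples.append((node, "prov:wasDerivedFrom", f"<{record['source_url']}>"))
            if record.get("fetch_timestamp"):
                triples.append((node, "prov:generatedAtTime", f'"{record["fetch_timestamp"]}"^^xsd:dateTime'))
    return triples
```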
### Museum Register Enrichment (❌ Not Mapped)

| Field | Notes |
|-------|-------|
| `museum_register_enrichment.registered_since` | Registration date |
| `museum_register_enrichment.province` | Province |
| `museum_register_enrichment.source_provenance` | Source details |

---

## Future Enhancements

### Priority 1: High Value, Easy to Add

- `google_maps_enrichment.phone_international` → `schema:telephone`
- `google_maps_enrichment.editorial_summary` → `schema:description`
- `wikidata_claims.P571_inception.value` → `schema:foundingDate`

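The first two Priority 1 items are direct literal mappings. A sketch with hypothetical helper names; P571 is left out because, as the Wikidata Claims rationale notes, dates need extra modeling:

```python
from typing import Any


def lit(text: str) -> str:
    """Plain Turtle string literal; escapes backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def priority1_triples(entry: dict[str, Any], subject: str) -> list[tuple[str, str, str]]:
    """schema:telephone and schema:description from the Google Maps enrichment."""
    gmaps = entry.get("google_maps_enrichment", {})
    triples = []
    phone = gmaps.get("phone_international")
    if phone:
        triples.append((subject, "schema:telephone", lit(phone)))
    summary = gmaps.get("editorial_summary")
    if summary:
        triples.append((subject, "schema:description", lit(summary)))
    return triples
```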
### Priority 2: Moderate Complexity

- Opening hours → `schema:OpeningHoursSpecification`
- `google_maps_enrichment.rating` → `schema:aggregateRating` (a `schema:AggregateRating` node with `schema:ratingValue`, plus `schema:ratingCount` from `total_ratings`)
- Wikidata relationships (P131, P276) → location hierarchy

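For the rating item, schema.org models the value as a `schema:AggregateRating` node. A sketch, assuming `rating` and `total_ratings` as listed in the unmapped Google Maps table and the 1-5 scale noted there; returning a dict in place of a blank node is a simplification:

```python
from typing import Any, Optional


def aggregate_rating(gmaps: dict[str, Any]) -> Optional[dict[str, Any]]:
    """schema:AggregateRating node for google_maps_enrichment.rating, or None."""
    rating = gmaps.get("rating")
    if rating is None:
        return None
    node: dict[str, Any] = {
        "@type": "schema:AggregateRating",
        "schema:ratingValue": float(rating),
        # Google ratings run from 1 to 5, per the unmapped-field table.
        "schema:bestRating": 5,
        "schema:worstRating": 1,
    }
    if gmaps.get("total_ratings") is not None:
        node["schema:ratingCount"] = int(gmaps["total_ratings"])
    return node
```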
### Priority 3: Complex Modeling Required

- Full provenance chain → PROV-O
- Organizational history → change events
- Collection metadata → separate entities

---

## Script Location

`scripts/nde_to_hc_rdf.py`

## Output Location

`data/nde/rdf/{ghcid_numeric}.ttl`

---

*Generated: 2025-12-02*

*Total entries converted: 1,619*

*Total triples: 114,705*