# Rule 40: KIEN Registry is Authoritative for Intangible Heritage Custodians

## Summary

For Intangible Heritage Custodians (Type I), the KIEN registry at `https://www.immaterieelerfgoed.nl/` is the **TIER_1_AUTHORITATIVE** source for contact data and addresses. Google Maps enrichment is **TIER_3_CROWD_SOURCED** and should NEVER override KIEN data.

## Empirical Validation (January 2025)

A comprehensive audit of 188 Type I custodian files revealed:

| Category | Count | Percentage |
|----------|-------|------------|
| ✅ Google Maps matches OK | 101 | 53.7% |
| 🔧 **FALSE_MATCH detected** | **62** | **33.0%** |
| ⚠️ No official website (valid) | 20 | 10.6% |
| 📭 No Google Maps data | 5 | 2.7% |

**Key Finding: 33% of Google Maps enrichment data for Type I custodians was incorrect.**

### False Match Categories Identified

1. **Domain mismatches** (39 files): Google Maps website ≠ KIEN official website
2. **Name mismatches** (8 files): Completely different organizations (e.g., "Ria Bos" heritage practitioner → "Ria Money Transfer Agent")
3. **Wrong location** (6 files): Similar name but a different city or even country (Amsterdam → Den Haag, Netherlands → Suriname)
4. **Wrong organization type** (5 files): Federation vs. specific member, heritage organization vs. webshop
5. **Different entity type** (3 files): Organization vs. location or street name
6. **Different event** (3 files): Horse racing vs. festival, or a different village's event

### Why Google Maps Fails for Type I

Google Maps is optimized for commercial businesses with physical storefronts. Type I intangible heritage custodians are fundamentally different:

- **Virtual organizations** without a commercial presence
- **Person-based heritage** (individual practitioners preserving traditional crafts)
- **Volunteer networks** meeting in private residences
- **Event-based organizations** that exist only during festivals
- **Federations** that coordinate member organizations without premises of their own

## Rationale

Google Maps frequently returns **false matches** for intangible heritage organizations because:

1. **Virtual Organizations**: Many intangible heritage custodians operate as networks/platforms without commercial storefronts
2. **Name Collisions**: Common words in organization names (e.g., "Platform") match unrelated businesses
3. **No Physical Presence**: Organizations focused on intangible heritage (handwriting, oral traditions, crafts) often have no Google Maps listing
4. **Volunteer-Run**: Contact addresses are often private residences, not businesses

KIEN (Kenniscentrum Immaterieel Erfgoed Nederland) is the official Dutch registry for intangible cultural heritage and maintains verified contact information directly from the organizations.

## Data Tier Hierarchy for Type I Custodians

| Priority | Source | Data Tier | Trust Level |
|----------|--------|-----------|-------------|
| 1st | KIEN Registry (`immaterieelerfgoed.nl`) | TIER_1_AUTHORITATIVE | Highest |
| 2nd | Organization's Official Website | TIER_2_VERIFIED | High |
| 3rd | Wikidata | TIER_3_CROWD_SOURCED | Medium |
| 4th | Google Maps | TIER_3_CROWD_SOURCED | Low (verify!) |

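The priority order in the table can be expressed as a simple lookup. The following is a minimal sketch, not the project's actual resolver: the source labels (`kien`, `official_website`, `wikidata`, `google_maps`) and the function name `pick_value` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional, Tuple

# Source priority for Type I custodians, highest trust first.
# The labels are hypothetical; only the ordering comes from the table.
SOURCE_PRIORITY = ["kien", "official_website", "wikidata", "google_maps"]


def pick_value(field: str, candidates: Dict[str, Dict[str, Any]]) -> Optional[Tuple[str, Any]]:
    """Return (source, value) from the highest-priority source that has the field."""
    for source in SOURCE_PRIORITY:
        value = candidates.get(source, {}).get(field)
        if value:
            return source, value
    return None


candidates = {
    "google_maps": {"city": "Nieuwleusen", "website": "http://www.platform9.nl/"},
    "kien": {"city": "Zevenaar"},
}

# KIEN wins for the city even though Google Maps also offers a value.
city_choice = pick_value("city", candidates)
```

A value that only a lower tier can supply (here, `website`) would still be returned by this sketch, so anything coming from `google_maps` should pass the validation step before it is trusted.
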
## Required Workflow for Type I Enrichment

### Step 1: Scrape KIEN Page First

For every intangible heritage custodian, the KIEN profile page MUST be scraped to extract:

```yaml
kien_enrichment:
  kien_name: "Platform Handschriftontwikkeling"
  kien_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  heritage_page_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
  heritage_forms:
    - "Ambachten, handwerk en techniek"
    - "Sociale praktijken"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
    country: "NL"
  registered_since: "2019-11"
  enrichment_timestamp: "2025-01-08T00:00:00Z"
  source: "https://www.immaterieelerfgoed.nl"
```

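### Step 2: Validate Google Maps Match (If Any)

If Google Maps enrichment exists, compare against KIEN data:

```python
from urllib.parse import urlparse

# Assumption: the original snippet does not name the library behind `fuzz`.
# rapidfuzz is used here; thefuzz exposes the same `fuzz.ratio` call.
from rapidfuzz import fuzz


def extract_domain(url):
    """Return the lowercased host without a leading 'www.', or None.

    Assumed implementation: the helper is referenced but not defined
    in the original rule.
    """
    if not url:
        return None
    host = urlparse(url if '://' in url else f'//{url}').netloc.lower()
    host = host[4:] if host.startswith('www.') else host
    return host or None


def validate_google_maps_match(kien_data, gmaps_data):
    """Check if Google Maps data matches KIEN authoritative source."""

    # Check website domain match (skipped when either side has no website)
    kien_domain = extract_domain(kien_data.get('website'))
    gmaps_domain = extract_domain(gmaps_data.get('website'))

    if kien_domain and gmaps_domain and kien_domain != gmaps_domain:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Website mismatch: KIEN={kien_domain}, GMaps={gmaps_domain}'
        }

    # Check name similarity (`or ''` guards against explicit null values)
    kien_name = (kien_data.get('kien_name') or '').lower()
    gmaps_name = (gmaps_data.get('name') or '').lower()

    if fuzz.ratio(kien_name, gmaps_name) < 70:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Name mismatch: KIEN="{kien_name}", GMaps="{gmaps_name}"'
        }

    return {'status': 'VERIFIED'}
```
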
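Where no fuzzy-matching package is installed, the name check can be approximated with the standard library. This is a sketch under that assumption: `difflib.SequenceMatcher` is swapped in for `fuzz.ratio`, so scores may differ somewhat, and the helper names `name_similarity` and `is_name_mismatch` are hypothetical. The threshold of 70 is the one used in the validation function.

```python
from difflib import SequenceMatcher

NAME_THRESHOLD = 70  # same cut-off as the fuzz.ratio check


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_name_mismatch(kien_name: str, gmaps_name: str,
                     threshold: float = NAME_THRESHOLD) -> bool:
    """True when the two names are too dissimilar to be the same organization."""
    return name_similarity(kien_name, gmaps_name) < threshold


# Two false matches taken from this document.
platform_mismatch = is_name_mismatch("Platform Handschriftontwikkeling", "Platform 9 BV")
ria_mismatch = is_name_mismatch("Ria Bos", "Ria Money Transfer Agent")
```

Both pairs share a leading word, which is exactly the kind of name collision a plain keyword search would accept and a whole-string similarity score rejects.
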
### Step 3: Mark False Matches

When Google Maps returns a different organization:

```yaml
google_maps_enrichment:
  status: FALSE_MATCH
  false_match_reason: >-
    Google Maps returned "Platform 9 BV" (a health/coaching business at
    Nieuwleusen) instead of "Platform Handschriftontwikkeling" (a virtual
    handwriting development platform). These are completely different
    organizations. KIEN registry is authoritative for this Type I custodian.
  original_false_match:
    place_id: ChIJNZ6o7H_fx0cR-TURAN3Bj54
    name: Platform 9 BV
    formatted_address: Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen
    website: http://www.platform9.nl/
  correction_timestamp: "2025-01-08T00:00:00Z"
  correction_agent: opencode-claude-sonnet-4
```

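The transformation into that shape might look like the sketch below. It assumes the custodian record is already loaded as a plain dictionary; the function name `mark_false_match` and its parameters are hypothetical. The rejected Google Maps fields are moved under `original_false_match` rather than deleted, in line with Rule 5.

```python
from copy import deepcopy
from datetime import datetime, timezone


def mark_false_match(record: dict, reason: str, agent: str) -> dict:
    """Return a copy of the record with its Google Maps block flagged.

    The input record is left untouched; the rejected data is preserved
    under `original_false_match`.
    """
    updated = deepcopy(record)
    gmaps = updated.get("google_maps_enrichment")
    if not gmaps or gmaps.get("status") == "FALSE_MATCH":
        return updated  # nothing to flag, or already flagged
    updated["google_maps_enrichment"] = {
        "status": "FALSE_MATCH",
        "false_match_reason": reason,
        "original_false_match": gmaps,
        "correction_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "correction_agent": agent,
    }
    return updated


record = {
    "google_maps_enrichment": {
        "place_id": "ChIJNZ6o7H_fx0cR-TURAN3Bj54",
        "name": "Platform 9 BV",
        "website": "http://www.platform9.nl/",
    }
}
flagged = mark_false_match(record, "Different organization", "example-agent")
```

Returning a copy keeps the function safe to call in a dry run, and the early return makes it idempotent when a file has already been corrected.
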
## KIEN Contact Data Extraction

The KIEN heritage pages follow a consistent structure. Extract from the "Contact" section:

```
## Contact
[Organization Name](link-to-profile-page)
Street Address
Postal Code
City
Province
[Website](url)
Bijgeschreven in inventaris vanaf: [date]
```

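Given that layout, a parser can rely on line order. The sketch below assumes the page has already been converted to the markdown-like text shown in the template; the function name `parse_contact_block` and the output keys are illustrative choices, not an existing API.

```python
import re

LINK = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<url>[^)]+)\)")
SINCE_PREFIX = "Bijgeschreven in inventaris vanaf:"


def parse_contact_block(text: str) -> dict:
    """Parse a '## Contact' block in the fixed line order of the template."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines and lines[0].lower() == "## contact":
        lines = lines[1:]
    if len(lines) < 5:
        raise ValueError("contact block is shorter than expected")
    org = LINK.fullmatch(lines[0])
    if org is None:
        raise ValueError("first line is not a profile link")
    contact = {
        "organization": org["text"],
        "profile_url": org["url"],
        "address": {
            "street": lines[1],
            "postal_code": lines[2],
            "city": lines[3],
            "province": lines[4],
        },
    }
    # Website and registration date are optional trailing lines.
    for line in lines[5:]:
        link = LINK.fullmatch(line)
        if link:
            contact["website"] = link["url"]
        elif line.startswith(SINCE_PREFIX):
            contact["registered_since"] = line[len(SINCE_PREFIX):].strip()
    return contact


sample = """## Contact
[Platform Handschriftontwikkeling](https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling)
De Hazelaar 41
6903 BB
Zevenaar
Gelderland
[Website](http://www.handschriftontwikkeling.nl/)
Bijgeschreven in inventaris vanaf: november 2019
"""
contact = parse_contact_block(sample)
```

Raising on a malformed block, instead of guessing, keeps a layout change on the KIEN site from silently writing shifted address fields into custodian files.
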
### Example Extraction (from immaterieelerfgoed.nl/nl/handschrift):

```yaml
contact:
  organization: "Platform Handschriftontwikkeling"
  profile_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
  website: "http://www.handschriftontwikkeling.nl/"
  registered_since: "november 2019"
```

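The raw extraction keeps the Dutch date text (`"november 2019"`), while the `kien_enrichment` block in Step 1 stores it as `"2019-11"`. A conversion along these lines would bridge the two; the function name `normalize_registered_since` is hypothetical, and the sketch assumes the date always appears as a Dutch month name followed by a year.

```python
DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}


def normalize_registered_since(raw: str) -> str:
    """Convert 'november 2019' to '2019-11'; raise ValueError on other input."""
    parts = raw.strip().lower().split()
    if len(parts) != 2 or parts[0] not in DUTCH_MONTHS or not parts[1].isdigit():
        raise ValueError(f"unrecognized date: {raw!r}")
    month, year = DUTCH_MONTHS[parts[0]], int(parts[1])
    return f"{year:04d}-{month:02d}"


normalized = normalize_registered_since("november 2019")
```
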
## Location Resolution for Type I

When KIEN provides an address:

1. **Use KIEN address** for `location.formatted_address`
2. **Geocode KIEN address** to get coordinates (NOT Google Maps coordinates)
3. **Update location_resolution** with method `KIEN_ADDRESS_GEOCODE`

```yaml
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  region_code: GE
  country: NL
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE
    source_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
    geocoding_service: nominatim
    geocoding_timestamp: "2025-01-08T00:00:00Z"
```

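One way to assemble such a block is sketched here. The geocoder is passed in as a callable so the example makes no network request; the function name `build_location`, the `latitude`/`longitude` keys, and the stub geocoder are assumptions for illustration, and `region_code` is left out because the province-to-code mapping is not given in this rule.

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def build_location(kien_address: dict, source_url: str,
                   geocode: Geocoder, service: str = "nominatim") -> dict:
    """Build a `location` block from a KIEN address and geocode that address."""
    query = ", ".join(
        kien_address[key]
        for key in ("street", "postal_code", "city", "country")
        if kien_address.get(key)
    )
    location = {
        "street_address": kien_address.get("street"),
        "postal_code": kien_address.get("postal_code"),
        "city": kien_address.get("city"),
        "country": kien_address.get("country", "NL"),
    }
    coords = geocode(query)
    if coords is not None:
        # Hypothetical key names for the coordinates.
        location["latitude"], location["longitude"] = coords
        location["coordinate_provenance"] = {
            "source_type": "KIEN_ADDRESS_GEOCODE",
            "source_url": source_url,
            "geocoding_service": service,
            "geocoding_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    return location


def stub_geocode(query: str) -> Optional[Tuple[float, float]]:
    # Placeholder coordinates for illustration only, not the real location.
    return (0.0, 0.0)


location = build_location(
    {"street": "De Hazelaar 41", "postal_code": "6903 BB", "city": "Zevenaar", "country": "NL"},
    "https://www.immaterieelerfgoed.nl/nl/handschrift",
    stub_geocode,
)
```

Because the query string is built only from KIEN fields, Google Maps coordinates cannot leak into the result even when a `google_maps_enrichment` block exists on the same record.
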
## Batch Re-Enrichment Script

To fix all Type I custodians with potentially incorrect Google Maps data:

```bash
# Find all Type I custodians
python scripts/rescrape_kien_contacts.py --type I --output data/custodian/

# This script should:
# 1. Read all NL-*-I-*.yaml files
# 2. Fetch KIEN page for each (from kien_enrichment.kien_url)
# 3. Extract contact/address from KIEN
# 4. Compare with google_maps_enrichment
# 5. Mark mismatches as FALSE_MATCH
# 6. Update location with KIEN address
```

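The first step of that script, selecting the Type I files, can be sketched with `pathlib`. Only the glob pattern comes from this rule; the function name `iter_type_i_files` and the file names in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_type_i_files(custodian_dir: str) -> Iterator[Path]:
    """Yield Type I custodian files (NL-*-I-*.yaml) in sorted order."""
    yield from sorted(Path(custodian_dir).glob("NL-*-I-*.yaml"))


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("NL-GE-ZEV-I-PH.yaml", "NL-GE-ZEV-M-XX.yaml"):
        Path(tmp, name).touch()
    found = [path.name for path in iter_type_i_files(tmp)]
```

Sorting the matches gives the batch run a stable order, which makes interrupted runs easier to resume and logs easier to compare.
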
## Anti-Patterns

### WRONG - Using Google Maps as primary source for Type I:

```yaml
# WRONG - Google Maps overriding KIEN data
location:
  formatted_address: "Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen"
  coordinate_provenance:
    source_type: GOOGLE_MAPS  # WRONG for Type I!
```

### CORRECT - KIEN as primary source:

```yaml
# CORRECT - KIEN is authoritative
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE  # Correct!
```

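A small check can catch the anti-pattern automatically. This is a sketch that assumes a custodian record loaded as a dictionary; the function name `find_type_i_violations` and the wording of the messages are hypothetical.

```python
from typing import List


def find_type_i_violations(record: dict) -> List[str]:
    """Return human-readable problems found in a Type I custodian record."""
    problems = []
    provenance = (record.get("location") or {}).get("coordinate_provenance") or {}
    if provenance.get("source_type") == "GOOGLE_MAPS":
        problems.append("location coordinates come from Google Maps")
    if "kien_enrichment" not in record:
        problems.append("kien_enrichment block is missing")
    return problems


wrong = {"location": {"coordinate_provenance": {"source_type": "GOOGLE_MAPS"}}}
correct = {
    "kien_enrichment": {"kien_name": "Platform Handschriftontwikkeling"},
    "location": {"coordinate_provenance": {"source_type": "KIEN_ADDRESS_GEOCODE"}},
}
wrong_problems = find_type_i_violations(wrong)
correct_problems = find_type_i_violations(correct)
```
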
## Affected Files

This rule affects all Type I custodian files (188 at the time of the January 2025 audit):
- `data/custodian/NL-*-I-*.yaml`

All should be reviewed to ensure:
1. `kien_enrichment` contains the address from the KIEN page
2. `google_maps_enrichment` is validated against KIEN
3. `location` uses the KIEN address (not Google Maps)
4. False matches are properly documented

## Related Rules

- **Rule 5**: NEVER Delete Enriched Data - Keep false match data in `original_false_match`
- **Rule 6**: WebObservation Claims - KIEN data should have provenance
- **Rule 22**: Custodian YAML Files Are Single Source of Truth
- **Rule 35**: Provenance Timestamps - Include KIEN fetch timestamps

## See Also

- KIEN Registry: https://www.immaterieelerfgoed.nl/
- UNESCO Intangible Cultural Heritage: https://ich.unesco.org/
- Dutch Intangible Heritage Network documentation