glam/.opencode/rules/entity_resolution/kien-authoritative-source-rule.md
kempersc 554fe520ea Add comprehensive rules for LinkML schema management and ontology mapping
2026-02-15 19:20:09 +01:00


Rule 40: KIEN Registry is Authoritative for Intangible Heritage Custodians

Summary

For Intangible Heritage Custodians (Type I), the KIEN registry at https://www.immaterieelerfgoed.nl/ is the TIER_1_AUTHORITATIVE source for contact data and addresses. Google Maps enrichment is TIER_3_CROWD_SOURCED and should NEVER override KIEN data.

Empirical Validation (January 2025)

A comprehensive audit of 188 Type I custodian files revealed:

| Category | Count | Percentage |
|---|---|---|
| Google Maps matches OK | 101 | 53.7% |
| 🔧 FALSE_MATCH detected | 62 | 33.0% |
| ⚠️ No official website (valid) | 20 | 10.6% |
| 📭 No Google Maps data | 5 | 2.7% |

Key Finding: 33% of Type I custodian files (62 of 188) had incorrect Google Maps enrichment data.
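You can recheck the percentages from the counts. The short Python sketch below recomputes them. It also computes the false-match rate among only the 183 files that had any Google Maps data. That rate (about 33.9%) is slightly higher than the 33.0% share of all files.

```python
# Counts from the January 2025 audit of Type I custodian files.
AUDIT_COUNTS = {
    "Google Maps matches OK": 101,
    "FALSE_MATCH detected": 62,
    "No official website (valid)": 20,
    "No Google Maps data": 5,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(AUDIT_COUNTS.values())
shares = percentages(AUDIT_COUNTS)

# Restrict the denominator to files that actually had Google Maps data.
with_gmaps = total - AUDIT_COUNTS["No Google Maps data"]
false_rate_with_gmaps = round(100 * AUDIT_COUNTS["FALSE_MATCH detected"] / with_gmaps, 1)

print(total)                  # 188
print(shares)
print(false_rate_with_gmaps)  # 33.9
```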

False Match Categories Identified

  1. Domain mismatches (39 files): Google Maps website ≠ KIEN official website
  2. Name mismatches (8 files): Completely different organizations (e.g., "Ria Bos" heritage practitioner → "Ria Money Transfer Agent")
  3. Wrong location (6 files): Similar name but different city (Amsterdam→Den Haag, Netherlands→Suriname!)
  4. Wrong organization type (5 files): Federation vs specific member, heritage org vs webshop
  5. Different entity type (3 files): Organization vs location/street name
  6. Different event (3 files): Horse racing vs festival, different village's event

Why Google Maps Fails for Type I

Google Maps is optimized for commercial businesses with physical storefronts. Type I intangible heritage custodians are fundamentally different:

  • Virtual organizations without commercial presence
  • Person-based heritage (individual practitioners preserving traditional crafts)
  • Volunteer networks meeting in private residences
  • Event-based organizations that exist only during festivals
  • Federations that coordinate member organizations without own premises

Rationale

Google Maps frequently returns false matches for intangible heritage organizations because:

  1. Virtual Organizations: Many intangible heritage custodians operate as networks/platforms without commercial storefronts
  2. Name Collisions: Common words in organization names (e.g., "Platform") match unrelated businesses
  3. No Physical Presence: Organizations focused on intangible heritage (handwriting, oral traditions, crafts) often have no Google Maps listing. A lookup then returns whichever listed business has the closest name instead.
  4. Volunteer-Run: Contact addresses are often private residences, not businesses

KIEN (Kenniscentrum Immaterieel Erfgoed Nederland) is the official Dutch registry for intangible cultural heritage and maintains verified contact information directly from the organizations.

Data Tier Hierarchy for Type I Custodians

| Priority | Source | Data Tier | Trust Level |
|---|---|---|---|
| 1st | KIEN Registry (immaterieelerfgoed.nl) | TIER_1_AUTHORITATIVE | Highest |
| 2nd | Organization's Official Website | TIER_2_VERIFIED | High |
| 3rd | Wikidata | TIER_3_CROWD_SOURCED | Medium |
| 4th | Google Maps | TIER_3_CROWD_SOURCED | Low (verify!) |
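One way to apply this hierarchy field by field is to take each value from the highest-priority source that provides it. This is a minimal sketch. It assumes the per-source candidate values have already been collected into a dict. The source keys (kien, official_website, wikidata, google_maps) are hypothetical labels, not existing schema fields.

```python
from typing import Any, Optional

# Highest trust first, following the Type I hierarchy table.
# The keys are hypothetical labels chosen for this sketch.
SOURCE_PRIORITY = ["kien", "official_website", "wikidata", "google_maps"]

SOURCE_TIER = {
    "kien": "TIER_1_AUTHORITATIVE",
    "official_website": "TIER_2_VERIFIED",
    "wikidata": "TIER_3_CROWD_SOURCED",
    "google_maps": "TIER_3_CROWD_SOURCED",
}


def resolve_field(candidates: dict[str, Any]) -> Optional[tuple[str, str, Any]]:
    """Pick the value from the highest-priority source that supplied one.

    Returns (source, tier, value), or None when no source has a value.
    A lower-priority source never overrides a higher one.
    """
    for source in SOURCE_PRIORITY:
        value = candidates.get(source)
        if value not in (None, ""):
            return source, SOURCE_TIER[source], value
    return None
```

For example, resolve_field({"google_maps": "Nieuwleusen", "kien": "Zevenaar"}) returns ("kien", "TIER_1_AUTHORITATIVE", "Zevenaar"). The Google Maps city is ignored whenever KIEN supplies one.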

Required Workflow for Type I Enrichment

Step 1: Scrape KIEN Page First

For every intangible heritage custodian, the KIEN profile page MUST be scraped to extract:

kien_enrichment:
  kien_name: "Platform Handschriftontwikkeling"
  kien_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  heritage_page_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
  heritage_forms:
    - "Ambachten, handwerk en techniek"
    - "Sociale praktijken"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
    country: "NL"
  website: "http://www.handschriftontwikkeling.nl/"
  registered_since: "2019-11"
  enrichment_timestamp: "2025-01-08T00:00:00Z"
  source: "https://www.immaterieelerfgoed.nl"

Step 2: Validate Google Maps Match (If Any)

If Google Maps enrichment exists, compare against KIEN data:

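from urllib.parse import urlparse

from rapidfuzz import fuzz  # thefuzz provides the same fuzz.ratio


def extract_domain(url):
    """Return the lowercased host of a URL without a leading 'www.', or None."""
    if not url:
        return None
    # urlparse only fills netloc when the URL contains '//' (e.g. a scheme)
    host = urlparse(url if '//' in url else '//' + url).netloc.lower()
    return host.removeprefix('www.') or None


def validate_google_maps_match(kien_data, gmaps_data):
    """Check if Google Maps data matches the KIEN authoritative source."""

    # Check website domain match (only when both sides list a website)
    kien_domain = extract_domain(kien_data.get('website'))
    gmaps_domain = extract_domain(gmaps_data.get('website'))

    if kien_domain and gmaps_domain and kien_domain != gmaps_domain:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Website mismatch: KIEN={kien_domain}, GMaps={gmaps_domain}'
        }

    # Check name similarity; fuzz.ratio returns a score from 0 to 100
    kien_name = (kien_data.get('kien_name') or '').lower()
    gmaps_name = (gmaps_data.get('name') or '').lower()

    if fuzz.ratio(kien_name, gmaps_name) < 70:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Name mismatch: KIEN="{kien_name}", GMaps="{gmaps_name}"'
        }

    return {'status': 'VERIFIED'}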
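The Step 2 validator compares only website domains and names. The "wrong location" false matches (e.g. Amsterdam vs Den Haag) can therefore slip through when names are similar. A possible complementary check is sketched below under these assumptions:

- The KIEN address uses the keys from Step 1.
- Google Maps supplies a formatted_address string, as in Step 3.
- A plain substring test on the city is good enough.

The function names are hypothetical.

```python
import re
from typing import Optional

# Dutch postal codes: four digits, optional space, two capital letters ("6903 BB").
NL_POSTCODE = re.compile(r"\b(\d{4})\s?([A-Z]{2})\b")


def find_postcode(text: str) -> Optional[str]:
    """Return the first Dutch postal code in text, normalized to '1234 AB'."""
    match = NL_POSTCODE.search(text or "")
    return f"{match.group(1)} {match.group(2)}" if match else None


def check_location(kien_address: dict, gmaps_formatted_address: str) -> dict:
    """Flag a 'wrong location' false match by comparing postal code, then city."""
    gmaps_text = gmaps_formatted_address or ""

    kien_pc = find_postcode(kien_address.get("postal_code") or "")
    gmaps_pc = find_postcode(gmaps_text)
    if kien_pc and gmaps_pc and kien_pc != gmaps_pc:
        return {"status": "FALSE_MATCH",
                "reason": f"Postal code mismatch: KIEN={kien_pc}, GMaps={gmaps_pc}"}

    # Heuristic: the KIEN city name should appear somewhere in Google's address.
    kien_city = (kien_address.get("city") or "").strip().lower()
    if kien_city and kien_city not in gmaps_text.lower():
        return {"status": "FALSE_MATCH",
                "reason": f"City mismatch: KIEN city '{kien_address['city']}' not in GMaps address"}

    return {"status": "VERIFIED"}
```

Consider the Step 3 example: KIEN gives 6903 BB Zevenaar, and Google gives 7711 AD Nieuwleusen. For that pair, the check returns FALSE_MATCH with a postal code mismatch.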

Step 3: Mark False Matches

When Google Maps returns a different organization:

google_maps_enrichment:
  status: FALSE_MATCH
  false_match_reason: >-
    Google Maps returned "Platform 9 BV" (a health/coaching business at 
    Nieuwleusen) instead of "Platform Handschriftontwikkeling" (a virtual 
    handwriting development platform). These are completely different 
    organizations. KIEN registry is authoritative for this Type I custodian.    
  original_false_match:
    place_id: ChIJNZ6o7H_fx0cR-TURAN3Bj54
    name: Platform 9 BV
    formatted_address: Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen
    website: http://www.platform9.nl/
  correction_timestamp: "2025-01-08T00:00:00Z"
  correction_agent: opencode-claude-sonnet-4
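Marking a false match should move the Google Maps fields aside, not delete them (see Rule 5). Below is a minimal sketch over an already-loaded custodian dict. The function name and agent string are hypothetical. The sketch keeps the whole original google_maps_enrichment block under original_false_match, not only the four fields shown in the example.

```python
import copy
from datetime import datetime, timezone


def mark_false_match(custodian: dict, reason: str, agent: str) -> dict:
    """Return a copy of custodian with google_maps_enrichment marked FALSE_MATCH.

    The original Google Maps block is preserved under original_false_match
    instead of being deleted.
    """
    updated = copy.deepcopy(custodian)
    gmaps = updated.get("google_maps_enrichment")

    # Nothing to mark, or already marked: leave the record unchanged so a
    # re-run cannot overwrite the preserved original data.
    if not gmaps or gmaps.get("status") == "FALSE_MATCH":
        return updated

    updated["google_maps_enrichment"] = {
        "status": "FALSE_MATCH",
        "false_match_reason": reason,
        "original_false_match": gmaps,
        "correction_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "correction_agent": agent,
    }
    return updated
```

The function returns a copy, and it skips records that are already marked FALSE_MATCH. Together these make re-runs idempotent.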

KIEN Contact Data Extraction

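The KIEN heritage pages follow a consistent structure. Extract the fields from the "Contact" section. Its last line, "Bijgeschreven in inventaris vanaf" ("added to the inventory since"), gives the registration date: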

## Contact
[Organization Name](link-to-profile-page)
Street Address
Postal Code
City
Province
[Website](url)
Bijgeschreven in inventaris vanaf: [date]

Example Extraction (from immaterieelerfgoed.nl/nl/handschrift):

contact:
  organization: "Platform Handschriftontwikkeling"
  profile_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
  website: "http://www.handschriftontwikkeling.nl/"
  registered_since: "november 2019"
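Assume the Contact section has already been converted to the line-per-field text layout shown above. The live HTML may need a separate HTML-to-text step. Under that assumption, the section can be parsed positionally. The sketch below also normalizes the Dutch registration date ("november 2019") to the year-month form used in Step 1 ("2019-11"). The names are hypothetical. Positional parsing will mis-assign fields if KIEN omits a line, such as the street.

```python
import re
from typing import Optional

DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4, "mei": 5, "juni": 6,
    "juli": 7, "augustus": 8, "september": 9, "oktober": 10,
    "november": 11, "december": 12,
}
LINK = re.compile(r"^\[(?P<text>[^\]]+)\]\((?P<url>[^)]+)\)$")
REGISTERED = "Bijgeschreven in inventaris vanaf:"


def to_year_month(dutch_date: str) -> Optional[str]:
    """Convert e.g. 'november 2019' to '2019-11'; None if not recognized."""
    parts = dutch_date.strip().lower().split()
    if len(parts) == 2 and parts[0] in DUTCH_MONTHS and parts[1].isdigit():
        return f"{parts[1]}-{DUTCH_MONTHS[parts[0]]:02d}"
    return None


def parse_contact(section: str) -> dict:
    """Parse a KIEN Contact section laid out one field per line."""
    lines = [ln.strip() for ln in section.splitlines() if ln.strip()]
    if lines and lines[0].startswith("## Contact"):
        lines = lines[1:]

    contact: dict = {}
    plain: list[str] = []
    for line in lines:
        link = LINK.match(line)
        if line.startswith(REGISTERED):
            contact["registered_since"] = line[len(REGISTERED):].strip()
        elif link and "organization" not in contact:
            # The first link is the organization's KIEN profile page.
            contact["organization"] = link.group("text")
            contact["profile_url"] = link.group("url")
        elif link:
            # Any later link is treated as the organization's website.
            contact["website"] = link.group("url")
        else:
            plain.append(line)

    # Remaining lines are positional: street, postal code, city, province.
    contact["address"] = dict(zip(["street", "postal_code", "city", "province"], plain))
    return contact
```

Parsing the Platform Handschriftontwikkeling block yields the same contact mapping as the example above.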

Location Resolution for Type I

When KIEN provides an address:

  1. Use the KIEN address for the location address fields (street_address, postal_code, city)
  2. Geocode the KIEN address to get coordinates (NOT Google Maps coordinates)
  3. Set location.coordinate_provenance.source_type to KIEN_ADDRESS_GEOCODE and record the geocoding service and timestamp

location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  region_code: GE
  country: NL
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE
    source_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
    geocoding_service: nominatim
    geocoding_timestamp: "2025-01-08T00:00:00Z"
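The three steps above can be sketched as a function that builds the location block from the KIEN address alone. The sketch rests on these assumptions:

- The geocoder is passed in as a plain callable returning (lat, lon) or None, so the sketch runs offline.
- latitude and longitude are placeholder key names, because the example does not show the coordinate fields.
- region_code is derived from KIEN's province using ISO 3166-2:NL codes (Gelderland → GE, as in the example).

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

# ISO 3166-2:NL province codes, used to derive region_code from KIEN's province.
PROVINCE_CODES = {
    "Drenthe": "DR", "Flevoland": "FL", "Friesland": "FR", "Fryslân": "FR",
    "Gelderland": "GE", "Groningen": "GR", "Limburg": "LI", "Noord-Brabant": "NB",
    "Noord-Holland": "NH", "Overijssel": "OV", "Utrecht": "UT", "Zeeland": "ZE",
    "Zuid-Holland": "ZH",
}

# Any callable mapping a query string to (lat, lon), or None if nothing is found.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def build_location(kien_address: dict, source_url: str, geocode: Geocoder,
                   service: str = "nominatim") -> dict:
    """Build a location block from the KIEN address only (never Google Maps)."""
    location = {
        "street_address": kien_address.get("street"),
        "postal_code": kien_address.get("postal_code"),
        "city": kien_address.get("city"),
        "region_code": PROVINCE_CODES.get(kien_address.get("province") or ""),
        "country": kien_address.get("country", "NL"),
    }
    # Geocode the KIEN address itself; KIEN registers Dutch heritage, so the
    # country is assumed to be the Netherlands.
    query = ", ".join(p for p in (location["street_address"], location["postal_code"],
                                  location["city"], "Netherlands") if p)
    coords = geocode(query)
    if coords is not None:
        # Placeholder key names: the rule's YAML does not show the coordinate fields.
        location["latitude"], location["longitude"] = coords
        location["coordinate_provenance"] = {
            "source_type": "KIEN_ADDRESS_GEOCODE",
            "source_url": source_url,
            "geocoding_service": service,
            "geocoding_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    return location
```

In production, the callable could wrap geopy's Nominatim geocoder. Calling Nominatim(user_agent=...).geocode(query) returns a location with .latitude and .longitude. Nominatim's usage policy requires an identifying user agent.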

Batch Re-Enrichment Script

To fix all Type I custodians with potentially incorrect Google Maps data:

# Re-scrape KIEN contact data for all Type I custodians
python scripts/rescrape_kien_contacts.py --type I --output data/custodian/

# This script should:
# 1. Read all NL-*-I-*.yaml files
# 2. Fetch KIEN page for each (from kien_enrichment.kien_url)
# 3. Extract contact/address from KIEN
# 4. Compare with google_maps_enrichment
# 5. Mark mismatches as FALSE_MATCH
# 6. Update location with KIEN address
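The script is described only by its steps. Below is a minimal sketch of the batch driver. It assumes the per-file work (steps 2 to 6) lives in a hypothetical process function. It selects files with the same NL-*-I-*.yaml pattern. It records per-file failures, so one unreachable KIEN page does not abort the whole run.

```python
from pathlib import Path
from typing import Callable


def find_type_i_files(custodian_dir: str) -> list[Path]:
    """Step 1: list Type I custodian files (NL-*-I-*.yaml), sorted for stable runs."""
    return sorted(Path(custodian_dir).glob("NL-*-I-*.yaml"))


def run_batch(custodian_dir: str, process: Callable[[Path], str]) -> dict[str, str]:
    """Apply process (hypothetical steps 2-6 for one file) to every Type I file.

    process should return a short status such as "VERIFIED" or "FALSE_MATCH".
    Exceptions are recorded per file instead of stopping the batch.
    """
    results: dict[str, str] = {}
    for path in find_type_i_files(custodian_dir):
        try:
            results[path.name] = process(path)
        except Exception as exc:  # keep going; report the failure for this file
            results[path.name] = f"ERROR: {exc}"
    return results
```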

Anti-Patterns

WRONG - Using Google Maps as primary source for Type I:

# WRONG - Google Maps overriding KIEN data
location:
  formatted_address: "Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen"
  coordinate_provenance:
    source_type: GOOGLE_MAPS  # WRONG for Type I!

CORRECT - KIEN as primary source:

# CORRECT - KIEN is authoritative
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE  # Correct!

Affected Files

This rule affects every Type I custodian file (188 at the time of the January 2025 audit):

  • data/custodian/NL-*-I-*.yaml

All should be reviewed to ensure:

  1. kien_enrichment contains address from KIEN page
  2. google_maps_enrichment is validated against KIEN
  3. location uses KIEN address (not Google Maps)
  4. False matches are properly documented
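The four review items can be checked mechanically on a loaded custodian record. This is a minimal sketch. It assumes validated Google Maps blocks carry the status values from Steps 2 and 3 (VERIFIED or FALSE_MATCH). It also assumes compliant locations use source_type KIEN_ADDRESS_GEOCODE. The function name is hypothetical.

```python
def review_type_i(record: dict) -> list[str]:
    """Return the review items (1-4) that a Type I custodian record fails."""
    problems: list[str] = []
    kien = record.get("kien_enrichment") or {}
    gmaps = record.get("google_maps_enrichment")
    provenance = (record.get("location") or {}).get("coordinate_provenance") or {}

    # 1. kien_enrichment contains the address scraped from the KIEN page
    if not kien.get("address"):
        problems.append("kien_enrichment has no KIEN address")
    # 2. any Google Maps enrichment carries a validation status
    if gmaps and gmaps.get("status") not in ("VERIFIED", "FALSE_MATCH"):
        problems.append("google_maps_enrichment not validated against KIEN")
    # 3. coordinates come from geocoding the KIEN address, not from Google Maps
    if provenance and provenance.get("source_type") != "KIEN_ADDRESS_GEOCODE":
        problems.append("location not geocoded from the KIEN address")
    # 4. false matches keep a reason and the original Google Maps data
    if gmaps and gmaps.get("status") == "FALSE_MATCH" and not (
        gmaps.get("false_match_reason") and gmaps.get("original_false_match")
    ):
        problems.append("false match not fully documented")
    return problems
```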

Related Rules

  • Rule 5: NEVER Delete Enriched Data - Keep false match data in original_false_match
  • Rule 6: WebObservation Claims - KIEN data should have provenance
  • Rule 22: Custodian YAML Files Are Single Source of Truth
  • Rule 35: Provenance Timestamps - Include KIEN fetch timestamps

See Also