GLAM heritage institution data extraction and management
kempersc 5e9f54bd91 Deduplicate Brazilian institutions (212→121)
- Merged 91 duplicate Brazilian institution records
- Improved Wikidata coverage from 26.4% to 38.8% (+12.4pp)
- Created intelligent merge strategy:
  - Prefer records with higher confidence scores
  - Merge locations (prefer most complete)
  - Combine all unique identifiers
  - Combine all unique digital platforms
  - Combine all unique collections
- Added provenance notes documenting merges
- Created a backup before deduplication
- Generated a comprehensive deduplication report
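
The merge strategy listed above can be sketched roughly as follows. This is a minimal illustration, not the code in deduplicate_brazilian_institutions.py: the field names (`confidence`, `locations`, `identifiers`, `digital_platforms`, `collections`, `provenance_notes`) and the helper names are assumptions, and "prefer most complete" is read here as keeping the location with the most non-empty fields.

```python
from copy import deepcopy


def _unique(items):
    """Drop duplicates while preserving first-seen order (handles dicts)."""
    seen, out = set(), []
    for item in items:
        key = repr(sorted(item.items())) if isinstance(item, dict) else repr(item)
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out


def _completeness(location):
    """Count the non-empty fields of a location dict."""
    return sum(1 for value in location.values() if value not in (None, "", []))


def merge_records(records):
    """Merge duplicate records of one institution (hypothetical schema)."""
    # Prefer the record with the highest confidence score as the base.
    ranked = sorted(records, key=lambda r: r.get("confidence", 0.0), reverse=True)
    merged = deepcopy(ranked[0])

    # Keep the single most complete location across all duplicates.
    locations = [loc for r in ranked for loc in r.get("locations", [])]
    merged["locations"] = [max(locations, key=_completeness)] if locations else []

    # Combine all unique identifiers, digital platforms, and collections.
    for field in ("identifiers", "digital_platforms", "collections"):
        merged[field] = _unique([x for r in ranked for x in r.get(field, [])])

    # Record provenance so the merge stays traceable.
    merged["provenance_notes"] = f"Merged {len(records)} duplicate records"
    return merged
```

Called with a list of duplicate records for one institution, `merge_records` returns a single record and leaves the inputs unmodified, which keeps the pre-merge backup meaningful.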

Dataset changes:
- Total institutions: 13,502 → 13,411
- Brazilian institutions: 212 → 121
- Coverage: 47/121 institutions with Q-numbers (38.8%)
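
As a quick consistency check on the figures above, the arithmetic can be reproduced in a few lines. The function name `coverage_pct` is hypothetical; the numbers are taken directly from this commit message.

```python
def coverage_pct(with_qid, total):
    """Percentage of institutions that carry a Wikidata Q-number."""
    return round(100.0 * with_qid / total, 1)


removed = 212 - 121                      # duplicate records merged away
after = coverage_pct(47, 121)            # coverage after deduplication
gain_pp = round(after - 26.4, 1)         # change versus the reported 26.4%

assert removed == 91
assert 13_502 - removed == 13_411        # total institutions after the merge
print(after, gain_pp)                    # prints: 38.8 12.4
```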
2025-11-11 22:08:34 +01:00
data/instances/brazil Deduplicate Brazilian institutions (212→121) 2025-11-11 22:08:34 +01:00
deduplicate_brazilian_institutions.py Deduplicate Brazilian institutions (212→121) 2025-11-11 22:08:34 +01:00