glam/data
kempersc 5eaab2bd30 data(person): enrich heritage professional profiles with web claims
Batch enrichment of 3,728 person profiles with additional data:
- Birth decade inference from education/career history
- Location resolution for inferred birth settlements
- Web claims with full provenance (source_url, retrieved_on)
- Organizational subdivision extraction
- Heritage relevance scoring
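
For illustration, a web claim carrying the two provenance fields named above might be modelled as below. This is a minimal sketch, not the repository's actual schema: only source_url and retrieved_on come from the commit message, while WebClaim, claim_type, claim_value, to_record and the example URL are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class WebClaim:
    claim_type: str     # hypothetical field name, not from the commit message
    claim_value: str    # hypothetical field name, not from the commit message
    source_url: str     # provenance field named in the commit message
    retrieved_on: date  # provenance field named in the commit message

    def to_record(self) -> dict:
        """Return a JSON-friendly dict with the retrieval date as ISO 8601 text."""
        record = asdict(self)
        record["retrieved_on"] = self.retrieved_on.isoformat()
        return record


# Example values are made up for the sketch.
claim = WebClaim(
    claim_type="job_title",
    claim_value="Collections manager",
    source_url="https://example.org/staff",
    retrieved_on=date(2026, 1, 9),
)
record = claim.to_record()
```

Keeping the provenance fields mandatory on the type means a claim cannot be constructed without saying where and when it was retrieved.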

Also includes:
- 14 profile renames for PPID format corrections
- Updated _manifest.json with extraction statistics
- New _extraction_log.txt and _extraction_summary.json

Enrichment follows AGENTS.md rules:
- Rule 44: EDTF unknown date notation (XXXX, 196X, etc.)
- Rule 45: Inferred data with explicit provenance
- Rule 30: Confidence scoring (0.50-0.95)
- Rule 31: Organizational subdivision extraction
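
As a rough sketch of how the notation and scoring rules above could combine, the snippet below turns an estimated birth year into an EDTF decade string (196X, or XXXX when nothing is known) and clamps a confidence score to the 0.50-0.95 range. The function names, the record keys, and the graduation-at-22 heuristic are assumptions made for this example; AGENTS.md itself is not quoted here.

```python
def year_to_edtf_decade(year=None):
    """Format a year as an EDTF decade with an unspecified last digit."""
    if year is None:
        return "XXXX"  # fully unknown year
    return f"{year // 10:03d}X"  # e.g. 1964 -> "196X"


def clamp_confidence(score, low=0.50, high=0.95):
    """Keep a score inside the 0.50-0.95 range given in the commit message."""
    return max(low, min(high, score))


def infer_birth_decade(graduation_year, assumed_age=22, raw_confidence=0.70):
    """Hypothetical heuristic: assume graduation at `assumed_age` years old."""
    birth_year = graduation_year - assumed_age
    return {
        "birth_date": year_to_edtf_decade(birth_year),
        "inferred": True,  # marks the value as inferred, not observed
        "inferred_from": f"graduation_year={graduation_year}",
        "confidence": clamp_confidence(raw_confidence),
    }


result = infer_birth_decade(1986)
```

Reporting only the decade keeps the stored value no more precise than the evidence behind it, and the explicit inferred_from key records which input the estimate came from.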

35,052 files changed, +4,507,411 insertions, -63,118 deletions
2026-01-10 10:35:20 +01:00
custodian data(custodian): remove 380 PENDING files after collision merge 2026-01-09 21:06:22 +01:00
custodian.backup.20251230
custodian_sample
entity_annotation
examples
extracted
google_maps_enrichment
instances
intangible_heritage
isil
json
jsonld
manual_enrichment
museum_register_nl
nde
ontology
person data(person): enrich heritage professional profiles with web claims 2026-01-10 10:35:20 +01:00
rag_eval chore: minor updates and evaluation results 2026-01-09 21:10:55 +01:00
raw
rdf
reference
reports
review
test
training
unified
validation
web/lap_gaza_report_2024
wikidata
wikpedia/Destruction_of_cultural_heritage_during_the_Israeli_invasion_of_the_Gaza_Strip
collision_edge_case_analysis.md
deduplication_improvement_summary.md
dutch_collision_report.txt
dutch_collision_stats.json
dutch_deduplication_report.txt
dutch_institutions_with_ghcids.yaml
extraction_checkpoint.json
failed_crawl_urls.txt
failed_crawl_urls_round1_backup.txt
failed_crawl_urls_round3_backup.txt
failed_crawl_urls_round4.txt
ISIL-codes_2025-08-01.csv
linkedin_locations.json enrich profiles 2026-01-09 20:35:19 +01:00
mexican_geography_analysis.yaml
missing_annotations_checkpoint.json
NDE-logo-RGB-basis-nl-blauw.png
reenrich_queue.json
sparql_templates.yaml feat(rag): add database routing to 8 more factual query templates 2026-01-09 12:33:41 +01:00
temp_conv1_artifact2.md
temp_conv2_artifact1.md
temp_mexican_conv1.json
temp_mexican_conv2.json
unenriched_urls_round2.txt
xxx_matches.json