- Updated `entity_review.py` to map email semantic fields from JSON.
- Expanded `email_semantics.py` with additional museum mappings.
- Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings.
- Added a backup JSON file for entity resolution candidates.
- Created `enrich_email_semantics.py` to enrich candidates with email semantic signals.
- Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.
- Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType.
- Each new type includes appropriate metadata, slots, and relationships to existing classes.
- Implemented a script to detect and fix Type class violations in LinkML files.
- Set up GitHub integration to be disabled.
- Configure Git settings including path and autofetch options.
- Add Gitea instance URL and repository details.
- Enable YAML support for LinkML schemas with validation.
- Define file associations for YAML files.
- Recommend essential extensions for development and exclude unwanted ones.
Add companion_query support to fetch full entity records alongside
aggregate count queries. Enables displaying results on map/list when
asking 'how many museums in Amsterdam?'
Backend changes:
- Add companion_query, companion_query_region, companion_query_country
fields to TemplateDefinition and TemplateMatchResult
- Add render_template_string() for raw companion query rendering
Template changes:
- Add companion queries to count_institutions_by_type_and_location
for settlement, region, and country level queries
- Returns institution URI, name, coordinates, city for visualization
Update heritage professional profiles with:
- Separate role entries for different positions at same institution
- Employment date ranges (start_date, end_date)
- Updated observed_on timestamps
- Direct LinkedIn profile URLs as source
Profiles updated:
- Antoinet Nijssen (Noord-Hollands Archief)
- Anna Lakmaker
- Annelies Reus
- Marianne Hamersma
- Marcel Auwers
- Hans Felius
- Nico Vriend
- Add GHCID references to custodian affiliations
- Add start dates for employment periods
- Expand heritage type classifications (A→[A,F])
- Add detailed rationales based on career history
- Add full_initials from archival publications
Data Quality Corrections:
- TIRANA-ADISUNA: Fix erroneous death_year claim (was education end date 2016,
not death). Set is_living=true. Reassess heritage_relevance=false (tourism
ministry is not a GLAM institution)
- ALEX-ALSEMGEEST: Rename from NL-ZH-TH (The Hague) to NL-ZH-ROT (Rotterdam)
based on verified birth location. Update birth year to 1980
Profile Enrichments (5 profiles with XX-XX-XXX placeholders):
- Add web claims with proper provenance timestamps
- Add LinkedIn-verified education and position claims
- Document correction rationale in modification_reason
Heritage Relevance Reassessments:
- Government ministries (Tourism, etc.) marked as non-heritage
- Only GLAM institutions (Galleries, Libraries, Archives, Museums) qualify
- auth.setup.ts: require env vars for test credentials (no hardcoded defaults)
- manifest.json: update schema manifest
- full_evaluation_results.json: add RAG evaluation results
- petra-links.json: update birth date from web claim
- Rename 512 person files from XX-XX-XXX placeholders to proper GeoNames locations
- Update 2,463 profiles with enriched data
- Add 512 new person profiles (AU, international heritage professionals)
- PPID format: ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME}
- Add display_name and name_romanized fields to all 7948 person profiles
- Resolve UNKNOWN-UNKNOWN collision group (Hebrew/Arabic names now properly romanize)
- Hebrew names like אבישי דנינו now generate PPID AVISHI-DANINO instead of UNKNOWN-UNKNOWN
- Collision count reduced from 82 to 81 groups
Regenerated using generate_ppids.py with unidecode support (commit abe30cb)
Merge data from PENDING files (with XX-XXX placeholders) into their
corresponding enriched custodian records with proper GHCIDs.
Countries affected:
- DE: 4 institutions (Deutsche Stiftung, Jewish Museum Berlin, etc.)
- ES: 1 institution (Biblioteca Nacional de España)
- FR: 1 institution (NMO)
- ID: 18 Indonesian museums and archives
- NL: 111 Dutch institutions across all provinces
- US: 1 institution (ARCA)
The PENDING files are deleted after merge; originals archived in
data/custodian/archive/pending_merged_20250109/
Identified 125 institutions from LinkedIn staff extraction that are NOT Dutch:
- FR: 45 (French museums, archives, libraries)
- ID: 14 (Indonesian institutions)
- GB: 14 (British institutions)
- DE: 13 (German museums, foundations)
- BE: 11 (Belgian museums)
- IT: 6 (Italian institutions)
- AU: 6 (Australian archives, museums)
- Plus smaller counts from IN, US, ES, CH, DK, AT, SA, NO, IL
These files have staff data from LinkedIn company pages but need
GHCID resolution (currently XX-XXX placeholders for region/city).
Dutch PENDING files remain: 1,283
Merged LinkedIn-extracted staff sections from PENDING files into their
corresponding proper GHCID custodian files. This consolidates data from
two extraction sources:
- Existing enriched files: Google Maps, Museum Register, YouTube, etc.
- PENDING files: LinkedIn staff data extraction
Files modified:
- 28 custodian files enriched with staff data
- 35 PENDING files deleted (merged into proper locations)
- Originals archived to archive/pending_duplicates_20250109/
Key institutions enriched:
- Rijksmuseum (NL-NH-AMS-M-RM)
- Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA)
- Amsterdam Museum (NL-NH-AMS-M-AM)
- Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA)
- Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR)
- And 23 more museums/archives across NL
New scripts:
- scripts/merge_staff_data.py: Automated staff data merger
- scripts/categorize_pending_files.py: PENDING file analysis utility
Add databases: ["oxigraph"] to 5 more templates that don't benefit from vector search:
- count_institutions_by_type_location
- compare_locations
- find_by_founding
- find_custodians_by_budget_threshold
- find_institutions_by_founding_date
Total templates with Oxigraph-only routing: 10
- Rename NL-FR-HOO-M-MT.yaml → NL-FR-TER-M-MT.yaml
- HOO (Hooghalen) → TER (Terschelling) - correct island location
- Institution is on Terschelling island, not in Drenthe