- Add digital platform discovery data with provenance
- Cleanup duplicate/incorrect custodian entries
- Add GHCID collision resolution suffixes where needed
- Update person entity profiles with career history
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
- Improved city name cleaning:
- Roman numeral district suffixes (Kolín V. -> Kolín)
- City + country suffixes (Genève 4 - Suisse -> Genève)
- Czech postal notation (p. Luka nad Jihlavou -> Luka nad Jihlavou)
- Historical city names (Gottwaldov -> Zlín, renamed 1990)
- Manual mappings for Swiss districts (Lugano Massagno -> Lugano)