- Introduced total expense, total frames analyzed, total investment, total liability, total net asset, and traditional product slots to enhance financial reporting capabilities.
- Added transition types detected, treatment description, type hypothesis, typical condition, typical HTTP methods, typical response formats, and typical scope slots for improved heritage documentation.
- Implemented user community, verified, web observation, WhatsApp business likelihood, wikidata equivalent, and wikidata mapping slots to enrich institutional data representation.
- Established has_or_had_asset, has_or_had_budget, has_or_had_expense, and is_or_was_threatened_by slots to capture asset, budget, expense relationships, and threats to heritage forms.
- Removed deprecated slots: storage_security_level, version_number, video_comment, visiting_hour, was_asserted_by, was_revision_of, writing_system.
- Archived corresponding YAML files for deprecated slots with detailed migration notes.
- Updated slot definitions for has_collection and encompassing_body to reflect new naming conventions and temporal patterns.
- Enhanced metadata extraction in index_persons_qdrant.py to include WCMS registration and data sources.
- Modified hybrid_retriever and multi_embedding_retriever to support filtering by WCMS registration status.
- Migrated `archived_at` to `is_or_was_archived_at` in AuxiliaryDigitalPlatform, WebObservation, and other relevant classes to better reflect historical archival status.
- Removed `bold_id` slot and replaced it with `has_or_had_identifier` linked to the new `BOLDIdentifier` class in BiologicalObject.
- Introduced `Bookplate` and `Approver` classes to enhance provenance tracking and ownership documentation.
- Updated `InformationCarrier` to replace `bookplate` with `includes_or_included` for better representation of ownership marks.
- Added new slots `is_or_was_approved_by` and `is_or_was_archived_at` to capture historical approval and archival locations.
- Archived old slot definitions for `archived_at` and `bold_id` to maintain schema integrity.
- Enhanced LinkedIn profile extraction functionality by integrating Linkup API alongside Exa API.
- Introduced `is_or_was_created_through` slot to indicate content creation methods, replacing previous boolean flags.
- Added `is_or_was_required` slot for generic temporal boolean requirements, aligning with Schema.org.
- Created `AutoGeneration` class to represent automatic content generation, capturing methods and provenance.
- Established `AvailabilityStatus` class to model resource availability with temporal validity.
- Developed `Documentation` class for structured documentation resources, replacing domain-specific slots.
- Implemented `Taxon` class for biological classification in natural history collections.
- Archived previous slots related to API availability and documentation, ensuring a clean schema.
- Enhanced existing slots with detailed descriptions and examples for clarity and usability.
- Removed deprecated slots: appraisal_notes, branch_id, is_or_was_real.
- Introduced new slots: has_or_had_notes, has_or_had_provenance.
- Created Notes class to encapsulate note-related metadata.
- Archived removed slots and classes in accordance with the new archive folder convention.
- Updated slot_fixes.yaml to reflect migration status and details.
- Enhanced documentation for new slots and classes, ensuring compliance with ontology alignment.
- Added new slots for note content, date, and type to support the Notes class.
- Skip YYYYMMDD and YYMMDD date patterns at end of email
- Skip digit sequences longer than 4 characters
- Require non-digit before 4-digit years at end
- Add knid.nl/kabelnoord.nl to consumer domains (Friesland ISP)
- Add 11 missing regional archive domains to HERITAGE_DOMAIN_MAP
- Update recalculation script to re-extract email semantics
Results:
- 3,151 false birth years removed
- 'Likely wrong person' reduced from 533 to 325 (-39%)
- 2,944 candidates' scores boosted
- Updated `entity_review.py` to map email semantic fields from JSON.
- Expanded `email_semantics.py` with additional museum mappings.
- Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings.
- Added a backup JSON file for entity resolution candidates.
- Created `enrich_email_semantics.py` to enrich candidates with email semantic signals.
- Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.
- Use crm:E39_Actor instead of glam:HeritageCustodian
- Use hc:institutionType with single-letter codes (M, L, A, etc.)
- Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.)
- Use skos:prefLabel for institution names
- Update ONTOLOGY_CONTEXT with correct examples
- Fixed _vector_search() to check uses_named_vectors() before adding 'using' parameter
- Fixed _person_vector_search() to detect person collection vector size and use appropriate model
- Resolves 'Not existing vector name error: openai_1536' for single-vector collections
- Resolves embedding dimension mismatch between heritage_custodians (1536-dim) and heritage_persons (384-dim)
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.