334 commits

Author SHA1 Message Date
kempersc
e64b5653bf chore: Update manifest.json timestamp and migrate beneficiary_group slot to has_or_had_beneficiary 2026-01-14 17:04:58 +01:00
kempersc
bdf3ceafb8 feat: Migrate and standardize measurement units; introduce Area and MeasureUnit classes 2026-01-14 17:04:33 +01:00
kempersc
913a1a41a7 chore: Update generated timestamp in manifest.json and add new slot revisions in slot_fixes.yaml 2026-01-14 16:59:47 +01:00
kempersc
ad281c507a chore: Remove arrangement_notes slot; migrated to has_arrangement_note 2026-01-14 16:58:50 +01:00
kempersc
8902f0e082 feat: Migrate and enhance currency and area slots; introduce Currency class 2026-01-14 16:58:25 +01:00
kempersc
6da794ee38 feat: Introduce new slots and classes for enhanced heritage data modeling
- Added `has_or_had_place_of_birth` slot to capture structured birth place information with historical context.
- Introduced `has_or_had_quantity` slot for capturing quantified values with units and provenance.
- Created `has_or_had_service_area` slot to define geographic service areas for heritage custodians.
- Implemented `is_or_was_approximate` slot to indicate uncertainty in values (dates, quantities).
- Added `is_or_was_asserted_by` slot to track the agent responsible for assertions.
- Introduced `Asserter` class to model agents making assertions, including types like human, automated, and AI.
- Created `Quantity` class to represent quantified values with optional units and types.
- Added enums for `AsserterTypeEnum` and `QuantityTypeEnum` to standardize types of asserters and quantities.
- Archived outdated slots and replaced them with new structured alternatives following RiC-O conventions.
2026-01-14 16:54:10 +01:00
kempersc
4338d0a081 feat: Add structured representation for BirthDate and BirthPlace classes
- Introduced BirthDate class with support for EDTF notation, provenance tracking, and confidence scoring.
- Added BirthPlace class to preserve historical names, link modern equivalents, and integrate geographic identifiers.
- Created Approximation Level slot to express uncertainty levels for various values.
- Migrated existing slots to structured classes for better data modeling, including has_or_had_date_of_birth and has_or_had_place_of_birth.
- Enhanced service area representation with has_or_had_service_area slot, linking to ServiceArea class.
- Updated is_or_was_approximate slot to model uncertainty levels using ApproximationStatus class.
- Archived previous versions of slots for historical reference.
2026-01-14 16:04:09 +01:00
kempersc
5ddb7e818a Refactor schema: Migrate slots to new patterns and create new classes
- Migrated `audio_event_segments` to `has_or_had_segment` with range `AudioEventSegment` in VideoAudioAnnotation.yaml.
- Removed deprecated slots: `approved_by`, `audio_event_segments`, `bay_number`, `box_number`, and `budget_status`.
- Created new classes: `AudioEventSegment`, `BayNumber`, `BoxNumber`, and `BudgetStatus` to encapsulate previously slot-based data.
- Introduced `has_or_had_auxiliary_entities` slot to replace `auxiliary_places` and `auxiliary_platforms`.
- Archived removed slots to maintain historical context.
- Updated LinkMLViewerPage to utilize new schema element popup for better navigation.
2026-01-14 15:20:53 +01:00
kempersc
7691a11e79 chore: Update generated timestamp in manifest.json and archive budget_status slot 2026-01-14 15:14:23 +01:00
kempersc
7c7d8c0270 feat: Add SchemaElementPopup component for displaying LinkML schema element previews
- Implemented a draggable, resizable, and minimizable popup component for displaying previews of LinkML schema elements (classes, slots, enums).
- Integrated loading states and error handling for fetching element information.
- Added navigation functionality to go to full element view.
- Enhanced user experience with type badges and detailed descriptions for each element type.

chore: Migrate AudioEventSegment, BayNumber, BoxNumber, and BudgetStatus classes to new YAML schema format

- Created new YAML definitions for AudioEventSegment, BayNumber, BoxNumber, and BudgetStatus classes with detailed descriptions and attributes.
- Migrated from deprecated slots to new class structures as part of Rule 53.
- Updated imports and prefixes for consistency across schemas.

chore: Archive deprecated slots for audio_event_segments, bay_number, and box_number

- Archived previous slot definitions for audio_event_segments, bay_number, and box_number to maintain historical records.
- Updated slot descriptions and ensured proper URI mappings for future reference.
2026-01-14 15:13:06 +01:00
kempersc
b927bc4b43 Update manifest.json and migrate approved_by slot to is_or_was_approved_by; add includes_or_included slot to InformationCarrier; remove bookplate slot and archive it 2026-01-14 15:05:37 +01:00
kempersc
21c207c9da Refactor schema slots and classes for improved clarity and structure
- Migrated `archived_at` to `is_or_was_archived_at` in AuxiliaryDigitalPlatform, WebObservation, and other relevant classes to better reflect historical archival status.
- Removed `bold_id` slot and replaced it with `has_or_had_identifier` linked to the new `BOLDIdentifier` class in BiologicalObject.
- Introduced `Bookplate` and `Approver` classes to enhance provenance tracking and ownership documentation.
- Updated `InformationCarrier` to replace `bookplate` with `includes_or_included` for better representation of ownership marks.
- Added new slots `is_or_was_approved_by` and `is_or_was_archived_at` to capture historical approval and archival locations.
- Archived old slot definitions for `archived_at` and `bold_id` to maintain schema integrity.
- Enhanced LinkedIn profile extraction functionality by integrating Linkup API alongside Exa API.
2026-01-14 13:28:33 +01:00
kempersc
60e66d60f9 Add new slots and classes for enhanced documentation and availability tracking
- Introduced `is_or_was_created_through` slot to indicate content creation methods, replacing previous boolean flags.
- Added `is_or_was_required` slot for generic temporal boolean requirements, aligning with Schema.org.
- Created `AutoGeneration` class to represent automatic content generation, capturing methods and provenance.
- Established `AvailabilityStatus` class to model resource availability with temporal validity.
- Developed `Documentation` class for structured documentation resources, replacing domain-specific slots.
- Implemented `Taxon` class for biological classification in natural history collections.
- Archived previous slots related to API availability and documentation, ensuring a clean schema.
- Enhanced existing slots with detailed descriptions and examples for clarity and usability.
2026-01-14 13:09:31 +01:00
kempersc
b13674400f Refactor schema slots and classes for improved organization and clarity
- Removed deprecated slots: appraisal_notes, branch_id, is_or_was_real.
- Introduced new slots: has_or_had_notes, has_or_had_provenance.
- Created Notes class to encapsulate note-related metadata.
- Archived removed slots and classes in accordance with the new archive folder convention.
- Updated slot_fixes.yaml to reflect migration status and details.
- Enhanced documentation for new slots and classes, ensuring compliance with ontology alignment.
- Added new slots for note content, date, and type to support the Notes class.
2026-01-14 12:14:07 +01:00
kempersc
b8914761b8 standardise slots 2026-01-14 09:51:14 +01:00
kempersc
e3adb4ed60 feat: Introduce Overview, RealnessStatus, and WebLink classes with comprehensive documentation and migration notes
- Added Overview class to represent structured collections of web links, including detailed descriptions, examples, and ontology alignments.
- Introduced RealnessStatus class to classify data as real or synthetic, with rich provenance and temporal semantics.
- Created WebLink class for representing hyperlinks with associated metadata, enhancing structured link representation.
- Established new slots: has_or_had_comprehensive_overview, is_or_was_real, and includes_or_included to support the new classes and improve data modeling.
- Migrated existing slots to new structures, ensuring compliance with RiC-O naming conventions and enhancing specificity.
- Updated annotations and examples across all new classes and slots for clarity and usability.
2026-01-14 09:32:14 +01:00
kempersc
c807487a51 Refactor expense slots: remove program_expense slot, migrate administrative_expenses, and archive related slots
- Deleted the program_expense slot from the schema.
- Updated slot_fixes.yaml to reflect the migration of administrative_expenses, marking it as fully migrated and archiving related bespoke slots.
- Created archived YAML files for administrative_expenses, fundraising_expense, has_or_had_administrative_expense, innovation_expense, and program_expense, documenting their structure and descriptions.
- All expense types now utilize the Expenses class with ExpenseTypeEnum classification for better organization and clarity.
2026-01-14 09:15:17 +01:00
kempersc
1133749de8 fix: update manifest.json timestamp and consolidate expense slots in FinancialStatement.yaml 2026-01-14 09:07:11 +01:00
kempersc
554c5721ea fix: update generated timestamp in manifest.json and add has_or_had_expenses slot definition 2026-01-14 09:06:36 +01:00
kempersc
b30711fcfb update slots 2026-01-14 09:05:54 +01:00
kempersc
74ca873585 fix: update generated timestamp in manifest.json
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m58s
2026-01-13 21:54:50 +01:00
kempersc
6a3616beac feat(entity-resolution): expand Dutch heritage domain mappings
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
Add domain mappings for better email-based entity matching:
- Government: noord-holland.nl, amsterdam.nl, rotterdam.nl, denhaag.nl,
  hoorn.nl, hhnk.nl, rijksoverheid.nl, politie.nl, kadaster.nl, rvo.nl,
  rivm.nl, staatsbosbeheer.nl, vng.nl
- Museums: maritiemmuseum.nl, paleishetloo.nl, slotloevestein.nl
- Universities: student.vu.nl, cdh.leidenuniv.nl, jur.ru.nl, student.ru.nl,
  student.tudelft.nl, eshcc.eur.nl, wur.nl, ou.nl
- Hogescholen: hva.nl, student.hu.nl, student.fontys.nl

Also remove deprecated activity_id.yaml slot file
2026-01-13 20:53:49 +01:00
kempersc
408813280a refactor: simplify slot descriptions to be more concise
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
- accepts_or_accepted_external_work: Remove verbose examples list
- accepts_or_accepted_payment_method: Condense to single sentence
- accepts_or_accepted_visiting_scholar: Minor rewording for consistency
2026-01-13 20:52:05 +01:00
kempersc
92b490d690 edit slots 2026-01-13 20:35:11 +01:00
kempersc
21ed120ac2 fix: correct hallucinated RiC-O terms and add locn ontology
RiC-O hallucinated terms fixed:
- FindingAidType.yaml: rico:FindingAidType → rico:DocumentaryFormType
- has_acquisition_method.yaml: rico:hasOrHadActivityType → prov:wasGeneratedBy
- has_activity_type.yaml: rico:hasOrHadActivityType → dcterms:type
- has_arrangement.yaml: rico:hasOrHadArrangement → dcterms:description
- has_or_had_finding_aid.yaml: rico:isDescribedBy → rico:isOrWasDescribedBy

The following terms do NOT exist in RiC-O 1.1:
- rico:FindingAidType (use rico:DocumentaryFormType)
- rico:hasOrHadActivityType (no equivalent)
- rico:hasOrHadArrangement (no equivalent)
- rico:isDescribedBy (correct form: rico:isOrWasDescribedBy)

Added LOCN ontology support:
- Copied locn.ttl to frontend/public/ontology/
- Added LOCN to ONTOLOGY_FILES in ontology-loader.ts
- Added locn prefix to OntologyTermPopup.tsx
- LOCN (http://www.w3.org/ns/locn#) is W3C Location Core Vocabulary
  for addresses and geometry (used by locn:Address)
2026-01-13 16:42:32 +01:00
kempersc
73b3b21017 docs: add Rule 52 prohibiting duplicate ontology mappings
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m53s
- Create .opencode/rules/no-duplicate-ontology-mappings.md with detection script
- Add Rule 52 to AGENTS.md (after Rule 51)
- Fix 29 duplicate mappings: same URI in multiple mapping categories
  - 26 slot files: remove duplicates keeping most precise mapping
  - 3 class files: ExhibitionSpace, Custodian, DigitalPlatform
- Mapping precedence: exact > close > narrow/broad > related

Each ontology URI must appear in only ONE mapping category per schema
element, following SKOS semantics where mapping properties are mutually
exclusive.
2026-01-13 15:57:26 +01:00
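The Rule 52 check in commit 73b3b21017 can be illustrated with a minimal sketch. This is not the detection script shipped in .opencode/rules/; the function names are hypothetical. It assumes a schema element has already been loaded as a plain dict keyed by the LinkML mapping slots, and that ties between narrow and broad resolve in favour of narrow.

```python
from collections import defaultdict

# Precedence from the commit message: exact > close > narrow/broad > related.
# Lower number = more precise. Narrow and broad share a rank (assumption:
# ties are resolved by dict order, so narrow wins over broad).
PRECEDENCE = {
    "exact_mappings": 0,
    "close_mappings": 1,
    "narrow_mappings": 2,
    "broad_mappings": 2,
    "related_mappings": 3,
}


def find_duplicate_mappings(element):
    """Return {uri: [categories]} for URIs listed in more than one category."""
    seen = defaultdict(list)
    for category in PRECEDENCE:
        for uri in element.get(category) or []:
            if category not in seen[uri]:
                seen[uri].append(category)
    return {uri: cats for uri, cats in seen.items() if len(cats) > 1}


def dedupe_mappings(element):
    """Return a copy of the element keeping each URI only in its most precise category."""
    result = dict(element)
    for uri, cats in find_duplicate_mappings(element).items():
        keep = min(cats, key=PRECEDENCE.get)
        for cat in cats:
            if cat != keep:
                result[cat] = [u for u in result[cat] if u != uri]
    return result


# Hypothetical element, for illustration only.
element = {
    "exact_mappings": ["schema:Place"],
    "close_mappings": ["schema:Place", "crm:E53_Place"],
    "related_mappings": ["crm:E53_Place"],
}
cleaned = dedupe_mappings(element)
```

Here `schema:Place` would stay only under `exact_mappings` and `crm:E53_Place` only under `close_mappings`; the input dict is left unmodified.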
kempersc
1fb924c412 feat: add ontology mappings to LinkML schema and enhance entity resolution
Schema enhancements (443 files):
- Add class_uri with proper ontology references (schema:, prov:, skos:, rico:)
- Add close_mappings, related_mappings per Rule 50 convention
- Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel)
- Improve descriptions with ontology mapping rationale
- Add prefixes blocks to all schema modules

Entity Resolution improvements:
- Add entity_resolution module with email semantics parsing
- Enhance build_entity_resolution.py with email-based matching signals
- Extend Entity Review API with filtering by signal types and count
- Add candidates caching and indexing for performance
- Add ReviewLoginPage component

New rules and documentation:
- Add Rule 51: No Hallucinated Ontology References
- Add .opencode/rules/no-hallucinated-ontology-references.md
- Add .opencode/rules/slot-ontology-mapping-reference.md
- Add adms.ttl and dqv.ttl ontology files

Frontend ontology support:
- Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology
2026-01-13 13:51:02 +01:00
kempersc
c5fb9ec88e feat: add route for Entity Review page with lazy loading 2026-01-13 01:49:43 +01:00
kempersc
3b35f4aea5 Refactor code structure for improved readability and maintainability 2026-01-12 18:31:31 +01:00
kempersc
846a6cdcec Add new Record Set Types for various archival collections
- Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType.
- Each new type includes appropriate metadata, slots, and relationships to existing classes.
- Implemented a script to detect and fix Type class violations in LinkML files.
2026-01-12 15:20:29 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
0d5d48568d refactor(schema): centralize slot definitions per Rule 38
- Remove slot_uri, description, mappings from slot_usage sections
- Move these properties to centralized slot files in modules/slots/
- Keep only class-specific overrides in slot_usage (required, inlined, examples)
- Update 1,499 centralized slot files with enriched definitions
- Clean 188 class files

Violations fixed:
- slot_uri in slot_usage: 1,676 → 0
- description in slot_usage: 2,287 → 0 (moved to centralized)

Schema still validates: 816 classes, 2028 slots, 127 enums
2026-01-11 23:27:17 +01:00
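A rough sketch of how the Rule 38 violations counted in commit 0d5d48568d might be detected. It assumes a class definition loaded as a dict; the exact set of forbidden keys (in particular which mapping keys count as "mappings") and the function name are assumptions, not taken from the repository.

```python
# Keys that, per the commit message, belong in centralized slot files rather
# than in slot_usage. The specific mapping keys listed are an assumption.
FORBIDDEN_KEYS = {
    "slot_uri",
    "description",
    "exact_mappings",
    "close_mappings",
    "narrow_mappings",
    "broad_mappings",
    "related_mappings",
}


def slot_usage_violations(class_def):
    """Return (slot_name, key) pairs where slot_usage carries a centralized property."""
    violations = []
    for slot_name, overrides in (class_def.get("slot_usage") or {}).items():
        for key in sorted(FORBIDDEN_KEYS & set(overrides or {})):
            violations.append((slot_name, key))
    return violations


# Hypothetical class definition, for illustration only.
class_def = {
    "slot_usage": {
        "has_or_had_identifier": {
            "required": True,
            "slot_uri": "dcterms:identifier",
            "description": "Identifier of the custodian.",
        },
        "has_or_had_label": {"inlined": True},
    }
}
violations = slot_usage_violations(class_def)
```

Class-specific overrides such as `required` and `inlined` pass through untouched, matching the commit's description of what may remain in `slot_usage`.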
kempersc
5d3d8530b0 chore: trigger DSPy eval workflow
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m13s
2026-01-11 22:40:23 +01:00
kempersc
56c373bba8 Implement fast WCMS migration script with state file checkpointing and batch processing 2026-01-11 22:26:37 +01:00
kempersc
174a420c08 refactor(schema): centralize 1515 inline slot definitions per Rule 48
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m57s
- Remove inline slot definitions from 144 class files
- Create 7 new centralized slot files in modules/slots/:
  - custodian_type_broader.yaml
  - custodian_type_narrower.yaml
  - custodian_type_related.yaml
  - definition.yaml
  - finding_aid_access_restriction.yaml
  - finding_aid_description.yaml
  - finding_aid_temporal_coverage.yaml
- Add centralize_inline_slots.py automation script
- Update manifest with new timestamp

Rule 48: Class files must NOT define inline slots - all slots
must be imported from modules/slots/ directory.

Note: Pre-existing IdentifierFormat duplicate class definition
(in Standard.yaml and IdentifierFormat.yaml) not addressed in
this commit - requires separate schema refactor.
2026-01-11 22:02:14 +01:00
kempersc
10bb5b69c5 Add Environmental Zone Type Enumeration and related slots
- Introduced EnvironmentalZoneTypeEnum.yaml to classify climate-controlled storage zones with detailed descriptions and recommended conditions for various materials.
- Created slots for environmental zone type code, description, ID, label, and HC preset URI to facilitate structured data representation.
- Implemented boolean slots for specific environmental requirements including dark storage, dust-free environment, ESD protection, and UV filtering, referencing relevant ISO standards.
- Enhanced documentation for each slot to clarify usage and preservation context.
2026-01-11 21:14:59 +01:00
kempersc
66ab2908d0 fix: remove deprecated AnnotationMotivationEnum, add European surname data
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 3m21s
- Move deprecated AnnotationMotivationEnum to archive-deprecated/ (outside served paths)
- Add French, Italian, Polish, Spanish surname datasets for entity resolution
- Update name_commonality.py with expanded European surname detection
- Triggers GitOps workflow to test Forgejo Actions runner
2026-01-11 16:03:18 +01:00
kempersc
055fd890ff test: verify pre-commit hook regenerates manifest
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-11 15:21:58 +01:00
kempersc
3a661d6013 fix(schema): regenerate manifest to remove stale AnnotationMotivationEnum reference
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
The old enum was properly archived to modules/enums/archive/ with .deprecated
suffix per Rule 9, but the manifest wasn't regenerated. Now correctly shows
only AnnotationMotivationType.yaml and AnnotationMotivationTypes.yaml.
2026-01-11 15:16:50 +01:00
kempersc
b6e069a1d5 chore(schema): bump version to 0.9.12 - test webhook deployment
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-11 15:08:35 +01:00
kempersc
be8b14f6ac refactor: Convert AnnotationMotivationEnum to Type/Types class hierarchy
- Create AnnotationMotivationType abstract base class (oa:Motivation)
- Create 10 concrete motivation subclasses in AnnotationMotivationTypes.yaml:
  - 6 W3C Web Annotation standard: classifying, describing, identifying,
    tagging, linking, commenting
  - 4 heritage-specific: accessibility, discovery, preservation, research
- Update has_annotation_motivation slot to use AnnotationMotivationType range
- Update VideoAnnotation.yaml imports and remove inline enum
- Archive deprecated AnnotationMotivationEnum.yaml
- Add motivation_type_id, motivation_type_name, motivation_type_description slots

Follows Rule 0b (Type/Types naming convention) and Rule 9 (enum-to-class promotion)
2026-01-11 13:48:28 +01:00
kempersc
3c3be47e32 feat(infra): add fast push-based schema sync to production
- Replace slow Forgejo→Server git pull with direct local rsync
- Add git-push-schemas.sh wrapper script for manual pushes
- Add post-commit hook for automatic schema sync
- Fix YAML syntax errors in slot comment blocks
- Update deploy-webhook.py to use master branch
2026-01-11 01:22:47 +01:00
kempersc
f02cffe1e8 refactor(schema): migrate 5 deprecated slots to temporal naming convention
Migrate slots to follow RiC-O-style temporal naming (Rule 39):
- accepts_external_work → accepts_or_accepted_external_work
- accepts_visiting_scholars → accepts_or_accepted_visiting_scholar
- accepts_payment_methods → accepts_or_accepted_payment_method
- access → has_or_had_access_condition
- access_policy_ref → has_or_had_access_policy_reference

Updated classes to use new slot names:
- ConservationLab.yaml
- ResearchCenter.yaml
- GiftShop.yaml
- ArchiveReference.yaml
- FindingAid.yaml
- Collection.yaml

Archived deprecated slots to schemas/20251121/linkml/archive/slots/
with _archived_20260110 suffix per Rule 9 (enum-to-class principle).
2026-01-10 21:09:29 +01:00
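The five renames in commit f02cffe1e8 can be expressed as a lookup table applied to a class definition. The sketch below is illustrative only: the rename pairs come from the commit message, while the function name and the dict layout of a class file are assumptions.

```python
# Old slot name -> Rule 39 temporal name, as listed in the commit message.
SLOT_RENAMES = {
    "accepts_external_work": "accepts_or_accepted_external_work",
    "accepts_visiting_scholars": "accepts_or_accepted_visiting_scholar",
    "accepts_payment_methods": "accepts_or_accepted_payment_method",
    "access": "has_or_had_access_condition",
    "access_policy_ref": "has_or_had_access_policy_reference",
}


def migrate_class_slots(class_def):
    """Return a copy of a class definition with deprecated slot names replaced."""
    migrated = dict(class_def)
    if "slots" in class_def:
        migrated["slots"] = [SLOT_RENAMES.get(s, s) for s in class_def["slots"]]
    if "slot_usage" in class_def:
        migrated["slot_usage"] = {
            SLOT_RENAMES.get(name, name): overrides
            for name, overrides in (class_def["slot_usage"] or {}).items()
        }
    return migrated


# Hypothetical class definition, for illustration only.
before = {
    "slots": ["access", "title"],
    "slot_usage": {"access": {"required": True}},
}
after = migrate_class_slots(before)
```

Slot names not in the table (such as `title` here) are left as they are, so the function can be run safely over class files that were never affected.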
kempersc
6c19ef8661 feat(rag): add Rule 46 epistemic provenance tracking
Track full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, HOW it was processed (SPARQL/vector/LLM).

Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse

Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)

Schema fixes:
- Fix truncated example in has_observation.yaml slot definition

References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
2026-01-10 18:42:43 +01:00
kempersc
28c3aaf33f enrich profiles 2026-01-10 17:31:02 +01:00
kempersc
626bd3a095 refactor(schemas): apply naming conventions to 261 class files
- Apply Rule 39: RiC-O style hasOrHad*/isOrWas* for temporal slots
- Apply Rule 43: Singular noun convention (keywords → keyword)
- Update slot references to match renamed slot files
- Maintain schema integrity across all class definitions
2026-01-10 15:36:33 +01:00
kempersc
94bfc9061e refactor(schemas): consolidate slot definitions and remove 305 redundant files
- Apply Rule 39: RiC-O style temporal naming (hasOrHad*, isOrWas*)
- Apply Rule 43: Singular noun convention for slot names
- Remove duplicate slot definitions consolidated into centralized files
- Net reduction: 6,162 lines across 305 deleted files
2026-01-10 15:36:13 +01:00
kempersc
e5a08a353d enrich person profiles 2026-01-10 14:14:04 +01:00
kempersc
2808dad6cd fix(linkml): correct invalid ontology property references in slot definitions
- confidence_score: prov:confidence doesn't exist → hc:confidenceScore
- deliverables: schema:result doesn't exist → hc:deliverables
- circumstances_of_death: wikidata:P1196 is identifier, not predicate → hc:circumstancesOfDeath
- deceased: schema:deathDate wrong semantics for boolean → hc:deceased
- death_place: fix sdo prefix to schema, remove wd:P20 as exact mapping
- date_of_death: wikidata:P570 is identifier, not predicate
- martyred: correct prefix inconsistencies
- given_name/literal_name: fix sdo→schema prefix
- occupation/religion/status: standardize prefix declarations

Add comments documenting why Wikidata properties (P-numbers) cannot be
used as slot_uri (they are entity identifiers, not RDF predicates).
2026-01-10 13:29:55 +01:00
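Two of the fixes in commit 2808dad6cd lend themselves to a small lint check: Wikidata entity identifiers used as `slot_uri`, and the `sdo:` prefix where `schema:` is expected. The sketch below is a hypothetical helper, not code from the repository. It assumes the `wd:` and `wikidata:` prefixes both resolve to the Wikidata entity namespace.

```python
import re

# CURIEs such as wikidata:P570 or wd:P20 name Wikidata entities, which the
# commit message says cannot serve as RDF predicates in slot_uri.
WIKIDATA_ENTITY_CURIE = re.compile(r"^(wd|wikidata):[PQ]\d+$")


def lint_slot_uri(slot_uri):
    """Return a list of problem codes for a slot_uri value (empty if none found)."""
    problems = []
    if WIKIDATA_ENTITY_CURIE.match(slot_uri):
        problems.append("wikidata-entity-as-predicate")
    if slot_uri.startswith("sdo:"):
        problems.append("use-schema-prefix")
    return problems


results = {uri: lint_slot_uri(uri) for uri in ("wikidata:P570", "sdo:givenName", "hc:deceased")}
```

A check like this only catches the prefix-level mistakes; whether a predicate such as `prov:confidence` actually exists still has to be verified against the ontology file itself.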
kempersc
095a3f949c refactor(linkml): apply RiC-O slot naming conventions to /schemas/ (Rule 39)
Apply same RiC-O-style slot naming refactor to /schemas/20251121/linkml/
that was previously applied to frontend/public/schemas/:

- Add 'has_' prefix for possession predicates
- Add 'is_or_was_' prefix for temporal inverse relationships
- Add 'has_or_had_' for bidirectional temporal relations
- Add new slots: is_or_was_aggregated_by, is_or_was_allocated_by, etc.
- Update count slots with proper descriptions

This ensures consistency between the source schema directory and the
frontend-served schemas.

514 files changed, +6,325 insertions, -4,255 deletions
2026-01-10 12:55:45 +01:00
kempersc
519b0b47a8 Add Playwright test results JSON file with initial test suite and failure details 2026-01-09 21:33:31 +01:00
kempersc
0393b321c9 refactor(schema): unify custodian_type slots into has_or_had_custodian_type (Rule 39, 43)
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml

Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
2026-01-09 10:55:21 +01:00
kempersc
6608a207d4 update frontend 2026-01-08 15:56:28 +01:00
kempersc
53ffed3531 Add TemplateSpecificityScores import to SpecificityAnnotation 2026-01-07 22:05:42 +01:00
kempersc
20374b9032 Archive monolithic class_metadata_slots before modular refactoring 2026-01-07 22:05:21 +01:00
kempersc
9b769f1ca2 Update manifest timestamp and minor class fixes 2026-01-07 22:04:29 +01:00
kempersc
181991940f Add new LinkML schema modules for specificity and Wikidata alignment
New classes:
- SpecificityAnnotation: Track class relevance scores per template
- TemplateSpecificityScores: 10 conversation template scores
- WikidataAlignment: Link classes to Wikidata entities
- DualClassLink: Model dual-aspect class relationships

New enums:
- WikidataMappingTypeEnum: exact/close/narrow/broad/related mappings
- DualClassPatternEnum: place-custodian, collection-custodian patterns

New slots (44 files):
- Specificity slots: score, rationale, agent, timestamp
- Template scores: archive_search, museum_search, library_search, etc.
- Wikidata slots: entity_id, label, mapping_type, rationale
- Multilingual labels: label_nl, label_de, label_fr, label_es, etc.
- Custodian type annotations: custodian_types, rationale, primary
- SKOS hierarchy: skos_broader, skos_narrower, skos_related
2026-01-07 22:03:58 +01:00
kempersc
7f792d0250 Remove outdated RDF schema files
Clean up old generated RDF/OWL files that have been superseded by
the clean schema deployed to Oxigraph (commit f8421a2903).
Only the current deployed schema should be regenerated as needed.
2026-01-07 22:03:26 +01:00
kempersc
f8421a2903 Deploy clean RDF schema to Oxigraph - 288,857 triples with 0 malformed URIs
- Archive 48 old timestamped RDF files
- Fix relative IRIs by adding hc: namespace prefix
- Fix file path references in seeAlso predicates
- Deployed to sparql.glam-ontology.org triplestore
- 11,250 distinct OWL classes
- Schema includes all base ontologies (CIDOC-CRM, RiC-O, CPOV, etc.)
2026-01-07 16:53:01 +01:00
kempersc
dfa667c90f Fix LinkML schema for valid RDF generation with proper slot_uri
Summary:
- Create 46 missing slot definition files with proper slot_uri values
- Add slot imports to main schema (01_custodian_name_modular.yaml)
- Fix YAML examples sections in 116+ class and slot files
- Fix PersonObservation.yaml examples section (nested objects → string literals)

Technical changes:
- All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS)
- Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF
- gen-owl now produces valid Turtle with 153,166 triples

New slot files (46):
- RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc.
- Scope slots: scope_includes, scope_excludes, archive_scope
- Organization slots: organization_type, governance_authority, area_served
- Platform slots: platform_type_category, portal_type_category
- Social media slots: social_media_platform_category, post_type_*
- Type hierarchy slots: broader_type, narrower_types, custodian_type_broader
- Wikidata slots: wikidata_equivalent, wikidata_mapping

Generated output:
- schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB)
- Validated with rdflib: 153,166 triples, no malformed URIs
2026-01-07 13:48:03 +01:00
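Commit dfa667c90f reports validating the generated Turtle with rdflib. As a loose, standard-library illustration of what "malformed URI" means here, the sketch below flags relative IRIs and paths containing an empty-prefix segment such as `custodian/:slot_name`. The heuristics, the function names, and the `example.org` namespace are assumptions. It also treats scheme-only identifiers such as URNs as malformed, which a real validator would not.

```python
from urllib.parse import urlsplit


def is_malformed(uri):
    """Heuristic check: relative IRIs and '/:' path segments count as malformed."""
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        return True  # relative IRI such as 'custodian/:slot_name'
    return "/:" in parts.path


def find_malformed_uris(uris):
    """Return the URIs from an iterable that fail the heuristic check, in order."""
    return [uri for uri in uris if is_malformed(uri)]


# Hypothetical URIs, for illustration only.
sample = [
    "https://example.org/hc/custodian/slot_name",
    "custodian/:slot_name",
    "https://example.org/hc/custodian/:slot_name",
    "http://www.w3.org/2004/02/skos/core#broader",
]
bad = find_malformed_uris(sample)
```

In practice the URIs would come from the subjects, predicates and objects of the parsed graph rather than from a hand-written list.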
kempersc
98c42bf272 Fix LinkML URI conflicts and generate RDF outputs
- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/

Files: 1,151 changed (includes prior CustodianType migration)
2026-01-07 12:32:59 +01:00
kempersc
6c6810fa43 Replace CustodianTypeCodeEnum with CustodianType class references
- Remove deprecated CustodianTypeCodeEnum from class_metadata_slots.yaml
- Update custodian_types slot to use uriorcurie range (references CustodianType subclasses)
- Update custodian_types_primary slot similarly
- Add migration note for legacy string format ['A'] vs new URI format

Per Rule 9: Enum-to-Class Promotion - Single Source of Truth
2026-01-06 12:37:40 +01:00
kempersc
b34992b1d3 Migrate all 293 class files to ontology-aligned slots
Extends migration to all class types (museums, libraries, galleries, etc.)

New slots added to class_metadata_slots.yaml:
- RiC-O: rico_record_set_type, rico_organizational_principle,
  rico_has_or_had_holder, rico_note
- Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt
- Scope: scope_includes, scope_excludes, custodian_only,
  organizational_level, geographic_restriction
- Notes: privacy_note, preservation_note, legal_note

Migration script now handles 30+ annotation types.
All migrated schemas pass linkml-validate.

Total: 387 class files now use proper slots instead of annotations.
2026-01-06 12:24:54 +01:00
kempersc
aa763dab25 Migrate 94 archive class annotations to ontology-aligned slots
- Add migration script: scripts/migrate_annotations_to_slots.py
- Convert custodian_types, wikidata, skos_broader, specificity_* annotations
- Replace with proper slots mapped to SKOS, PROV-O, RiC-O predicates
- Add ../slots/class_metadata_slots import to all migrated files
- Remove AcademicArchive_refactored.yaml (main file now migrated)
- Sync changes to frontend/public/schemas/

Migration converts:
  - custodian_types → hc:custodianTypes slot
  - wikidata/wikidata_label → wikidata_alignment structured slot
  - skos_broader → skos:broader slot
  - specificity_* → specificity_annotation structured slot
  - dual_class_pattern → dual_class_link structured slot
  - template_specificity → template_specificity slot

All 94 migrated schemas pass linkml-validate.
2026-01-06 11:25:37 +01:00
kempersc
bc562bd68d Add class metadata slots to replace annotations with ontology-aligned predicates
- Add class_metadata_slots.yaml with slots for:
  - GLAMORCUBESFIXPHDNT custodian type classification (hc:custodianTypes)
  - Wikidata alignment (wdt:P31, skos:mappingRelation)
  - SKOS hierarchical relationships (skos:broader, skos:narrower)
  - Dual-class pattern linking (rdfs:seeAlso)
  - Specificity scoring for RAG (prov:generatedAtTime, prov:wasAttributedTo)
  - Collection holdings (rico:isOrWasHolderOf)

- Add AcademicArchive_refactored.yaml demonstrating slot-based approach
- Add migration guide documenting annotation-to-slot mappings

Ontology sources: SKOS, PROV-O, Dublin Core, RiC-O, Wikidata
2026-01-06 11:16:49 +01:00
kempersc
11983014bb Enhance specificity scoring system integration with existing infrastructure
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
2026-01-05 17:37:49 +01:00
kempersc
242bc8bb35 Add new slots for heritage custodian entities
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
2026-01-05 00:49:05 +01:00
kempersc
2dca28d8c1 enrich CH entries with mission statements 2026-01-04 13:12:32 +01:00
kempersc
4f0cafe98a enrich HC profiles 2026-01-02 02:11:04 +01:00
kempersc
349f31ae6f enrich custodian profiles 2026-01-02 02:10:18 +01:00
kempersc
45e873ec0a enrich JP BE AR profiles 2025-12-30 23:07:03 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc
a1fb6344e7 enriching custodian data 2025-12-23 17:26:29 +01:00
kempersc
0c1d19e98b enrich entries 2025-12-23 13:27:35 +01:00
kempersc
7a056fa746 enrich entries 2025-12-21 22:12:34 +01:00
kempersc
aca68ea47f remove ambiguous web-claims 2025-12-21 00:01:54 +01:00
kempersc
99430c2a70 add new entries and semantic routing 2025-12-17 10:11:56 +01:00
kempersc
e0dd847491 extend ontology 2025-12-16 20:27:39 +01:00
kempersc
0cf93587fb fix(schema): Normalize custodian_types annotation YAML quoting
YAML arrays in LinkML annotations must be quoted strings to ensure
proper parsing. This change quotes all custodian_types annotations
from the raw array format to quoted string format.

Before: custodian_types: ["A", "G"]
After:  custodian_types: '["A", "G"]'

Affected: 50+ class files in modules/classes/
Also updates: manifest.json, 01_custodian_name_modular.yaml
2025-12-16 20:19:45 +01:00
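
The quoting change in 0cf93587fb means a consumer receives `custodian_types` as a string holding an array literal, not as a native list. The sketch below shows one way such a value could be decoded. It assumes the quoted string is valid JSON, as in the Before/After example of that commit. The helper name `parse_custodian_types` is hypothetical and not part of the repository.

```python
import json


def parse_custodian_types(raw):
    """Decode a custodian_types annotation value into a list of type codes.

    Hypothetical helper: accepts both the unquoted form (already a list)
    and the quoted form (a string holding a JSON-style array).
    """
    if isinstance(raw, list):
        # Unquoted form: the YAML loader already produced a list.
        return [str(code) for code in raw]
    # Quoted form: decode the string as JSON.
    decoded = json.loads(raw)
    if not isinstance(decoded, list):
        raise ValueError(f"expected an array, got {type(decoded).__name__}")
    return [str(code) for code in decoded]


if __name__ == "__main__":
    print(parse_custodian_types('["A", "G"]'))
    print(parse_custodian_types(["A", "G"]))
```

Accepting both forms keeps such a reader tolerant of files that were not yet migrated.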
kempersc
8a4727eb34 feat(schema): Add social media post and content modeling schema
Social Media Post Classes:
- SocialMediaPost: Base class for platform-agnostic post modeling
- SocialMediaPostType: Abstract base for post type taxonomy
- SocialMediaPostTypes: Concrete post types (TextPost, ImagePost,
  CarouselPost, StoryPost, ReelPost, ArticlePost, PollPost, EventPost)

Content Classes:
- SocialMediaContent: Rich content modeling with media attachments,
  hashtags, mentions, links, and engagement metrics

Features:
- Platform-specific post type mappings (Instagram, LinkedIn, Twitter, etc.)
- Engagement analytics (likes, comments, shares, saves)
- Heritage institution content categorization
- Media attachment handling (images, videos, documents)
- Hashtag and mention extraction for heritage topic tracking
2025-12-16 20:06:08 +01:00
kempersc
767fb8ca80 feat(schema): Add LinkedIn profile and person modeling schema
Person Identity Classes:
- PersonName: Full name modeling with components (given_name, surname_prefix,
  base_surname, patronym, initials) following Dutch naming conventions
- PersonConnection: Professional network connections with heritage relevance scoring
- ConnectionNetwork: Network-level analysis and statistics

LinkedIn Profile Schema:
- LinkedInProfile: Complete professional profile structure
- WorkExperience: Employment history with heritage institution detection
- EducationCredential: Academic background and qualifications
- LanguageProficiency: Language skills with ISO 639-1 codes

Supporting Classes:
- ExtractionMetadata: Provenance tracking for extracted profile data
- HeritageRelevance: GLAMORCUBESFIXPHDNT type scoring and classification

Slots (17 person-related slots):
- Name components: given_name, base_surname, surname_prefix, patronym, initials
- Identity: age, birth_date, birth_place, death_place, gender_identity, pronouns
- Professional: occupation, religion
- References: literal_name, name_specification, has_person_name, extraction_metadata

Enums:
- HeritageTypeEnum: GLAMORCUBESFIXPHDNT type codes for heritage relevance
2025-12-16 20:04:59 +01:00
kempersc
51554947a0 feat(schema): Add video content schema with comprehensive examples
Video Schema Classes (9 files):
- VideoPost, VideoComment: Social media video modeling
- VideoTextContent: Base class for text content extraction
- VideoTranscript, VideoSubtitle: Text with timing and formatting
- VideoTimeSegment: Time code handling with ISO 8601 duration
- VideoAnnotation: Base annotation with W3C Web Annotation alignment
- VideoAnnotationTypes: Scene, Object, OCR detection annotations
- VideoChapter, VideoChapterList: Navigation and chapter structure
- VideoAudioAnnotation: Speaker diarization, music, sound events

Enumerations (12 enums):
- VideoDefinitionEnum, LiveBroadcastStatusEnum
- TranscriptFormatEnum, SubtitleFormatEnum, SubtitlePositionEnum
- AnnotationTypeEnum, AnnotationMotivationEnum
- DetectionLevelEnum, SceneTypeEnum, TransitionTypeEnum, TextTypeEnum
- ChapterSourceEnum, AudioEventTypeEnum, SoundEventTypeEnum, MusicTypeEnum

Examples (904 lines, 10 comprehensive heritage-themed examples):
- Rijksmuseum virtual tour chapters (5 chapters with heritage entity refs)
- Operation Night Watch documentary chapters (5 chapters)
- VideoAudioAnnotation: curator interview, exhibition promo, museum lecture

All examples reference real heritage entities with Wikidata IDs:
Q5598 (Rembrandt), Q41264 (Vermeer), Q219831 (The Night Watch)
2025-12-16 20:03:17 +01:00
kempersc
cb56aa7e40 enrich all custodian timespan 2025-12-15 22:31:41 +01:00
kempersc
c50c35fd3a enrich person custodian 2025-12-14 17:09:55 +01:00
kempersc
505c12601a Add test script for PiCo extraction from Arabic waqf documents
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
2025-12-12 17:50:17 +01:00
kempersc
b1f93b6f22 enrich person profiles 2025-12-12 12:51:10 +01:00
kempersc
03263f67d6 moved web archives 2025-12-12 00:40:26 +01:00
kempersc
1b1cfbfca0 enrich custodians 2025-12-11 22:32:09 +01:00
kempersc
41959f0766 correct HCID! 2025-12-10 13:01:13 +01:00
kempersc
3a6ead8fde feat: Add legal form filtering rule for CustodianName
- Introduced LEGAL-FORM-FILTER rule to standardize CustodianName by removing legal form designations.
- Documented rationale, examples, and implementation guidelines for the filtering process.

docs: Create README for value standardization rules

- Established a comprehensive README outlining various value standardization rules applicable to Heritage Custodian classes.
- Categorized rules into Name Standardization, Geographic Standardization, Web Observation, and Schema Evolution.

feat: Implement transliteration standards for non-Latin scripts

- Added TRANSLIT-ISO rule to ensure GHCID abbreviations are generated from emic names using ISO standards for transliteration.
- Included detailed guidelines for various scripts and languages, along with implementation examples.

feat: Define XPath provenance rules for web observations

- Created XPATH-PROVENANCE rule mandating XPath pointers for claims extracted from web sources.
- Established a workflow for archiving websites and verifying claims against archived HTML.

chore: Update records lifecycle diagram

- Generated a new Mermaid diagram illustrating the records lifecycle for heritage custodians.
- Included phases for active records, inactive archives, and processed heritage collections with key relationships and classifications.
2025-12-09 16:58:41 +01:00
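
The LEGAL-FORM-FILTER rule introduced in 3a6ead8fde removes legal form designations from a CustodianName. A minimal sketch of that idea follows. The list of designations, the function name `strip_legal_form`, and the sample names are all assumptions made for illustration; the rule's actual list and implementation are not shown in this log.

```python
# Hypothetical sample of legal form designations; the real rule's list may differ.
LEGAL_FORMS = ("Stichting", "B.V.", "N.V.", "GmbH", "Ltd", "Inc.")


def strip_legal_form(name: str) -> str:
    """Return the name with any whole-token legal form designation removed."""
    tokens = name.split()
    # Compare each token without a trailing comma so "Archive," is still kept
    # and "Inc." after a comma is still dropped.
    kept = [t for t in tokens if t.rstrip(",") not in LEGAL_FORMS]
    # Trim separators left dangling where a designation was removed.
    return " ".join(kept).strip(" ,")


if __name__ == "__main__":
    for sample in ("Stichting Voorbeeld Museum", "Example Archive, Inc."):
        print(sample, "->", strip_legal_form(sample))
```

Matching whole tokens, not substrings, avoids trimming names that merely contain a designation as part of a longer word.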
kempersc
a7321b1bb9 reconstruct location blocks 2025-12-09 12:25:16 +01:00
kempersc
cab712659d recover location blocks 2025-12-09 11:34:56 +01:00
kempersc
62fdd35321 Refactor code structure for improved readability and maintainability 2025-12-09 11:15:51 +01:00
kempersc
b61271220b enrich entries 2025-12-09 10:46:43 +01:00
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
6a6557bbe8 feat(enrichment): add emic name enrichment and update CustodianName schema
- Add emic_name, name_language, standardized_name to CustodianName
- Add scripts for enriching custodian emic names from Wikidata
- Add YouTube and Google Maps enrichment scripts
- Update DuckLake loader for new schema fields
2025-12-08 14:58:50 +01:00
kempersc
7e3559f7e5 add new entries 2025-12-07 23:08:02 +01:00
kempersc
ee4e57bc75 add new entries 2025-12-07 00:26:01 +01:00
kempersc
1635625032 added web annotations 2025-12-06 19:50:04 +01:00
kempersc
3a242370fc annotation standards added 2025-12-05 15:30:23 +01:00
kempersc
d661947830 update enriched entries 2025-12-03 17:38:46 +01:00
kempersc
ef89b1213a validate enrichments 2025-12-02 14:36:01 +01:00
kempersc
8ebca2f845 add pid 2025-12-02 00:00:45 +01:00
kempersc
4b833d20b2 add pids 2025-12-01 23:55:55 +01:00
kempersc
7dce283c17 Add new enums for PersonalCollectionType, ResearchCenterType, and TasteScentHeritage classifications; implement validation script for custodian names against authoritative sources 2025-12-01 18:39:22 +01:00
kempersc
48a2b26f59 feat: Add script to generate Mermaid ER diagrams with instance data from LinkML schemas
- Implemented `generate_mermaid_with_instances.py` to create ER diagrams that include all classes, relationships, enum values, and instance data.
- Loaded instance data from YAML files and enriched enum definitions with meaningful annotations.
- Configured output paths for generated diagrams in both frontend and schema directories.
- Added support for excluding technical classes and limiting the number of displayed enum and instance values for readability.
2025-12-01 16:58:03 +01:00
kempersc
097d116b72 enrich entries 2025-12-01 16:06:34 +01:00
kempersc
2497e5913f enrich entries 2025-12-01 00:37:24 +01:00
kempersc
f3c149b1bb update entries 2025-11-30 23:30:29 +01:00
kempersc
d623f0af4a store archived websites 2025-11-29 20:40:46 +01:00
kempersc
572ccd5daf archive websites 2025-11-29 18:18:04 +01:00
kempersc
0ab8f24a6b archive websites 2025-11-29 18:05:16 +01:00
kempersc
da1eae6597 Refactor code structure for improved readability and maintainability 2025-11-29 12:27:39 +01:00
kempersc
30162e6526 Add script to validate KB library entries and generate enrichment report
- Implemented a Python script to validate KB library YAML files for required fields and data quality.
- Analyzed enrichment coverage from Wikidata and Google Maps, generating statistics.
- Created a comprehensive markdown report summarizing validation results and enrichment quality.
- Included error handling for file loading and validation processes.
- Generated JSON statistics for further analysis.
2025-11-28 14:48:33 +01:00
kempersc
5cdce584b2 Add complete schema for heritage custodian observation reconstruction
- Introduced a comprehensive class diagram for the heritage custodian observation reconstruction schema.
- Defined multiple classes including AllocationAgency, ArchiveOrganizationType, AuxiliaryDigitalPlatform, and others, with relevant attributes and relationships.
- Established inheritance and associations among classes to represent complex relationships within the schema.
- Generated on 2025-11-28, version 0.9.0, excluding the Container class.
2025-11-28 13:13:23 +01:00
kempersc
0d1741c55e Refactor code structure for improved readability and maintainability 2025-11-28 11:44:21 +01:00
kempersc
37886f0433 Refactor code structure for improved readability and maintainability 2025-11-27 17:43:14 +01:00
kempersc
5ef8ccac51 Add script to enrich NDE Register NL entries with Wikidata data
- Implemented a Python script that fetches and enriches entries from the NDE Register using data from Wikidata.
- Utilized the Wikibase REST API and SPARQL endpoints for data retrieval.
- Added logging for tracking progress and errors during the enrichment process.
- Configured rate limiting based on authentication status for API requests.
- Created a structured output in YAML format, including detailed enrichment data.
- Generated a log file summarizing the enrichment process and results.
2025-11-27 13:30:00 +01:00
kempersc
cd0ff5b9c7 wrap up example list 2025-11-27 10:58:53 +01:00
kempersc
e99b1e644e feat: Add platform_description slot for detailed auxiliary platform information 2025-11-26 10:18:16 +01:00
kempersc
e2eb7aa5cf feat: Add auxiliary slots and enums for places and digital platforms 2025-11-26 10:09:06 +01:00
kempersc
eff2f47f6f Add auxiliary enums and slots for digital platforms and physical locations
- Created AuxiliaryDigitalPlatformTypeEnum.yaml to classify types of secondary digital platforms.
- Created AuxiliaryPlaceTypeEnum.yaml to classify types of secondary physical locations.
- Added OrganizationBranchTypeEnum.yaml for formal organizational branches at auxiliary locations.
- Introduced auxiliary_places.yaml slot to link CustodianPlace to subordinate physical locations.
- Introduced auxiliary_platforms.yaml slot to link DigitalPlatform to subordinate digital properties.
- Added located_at.yaml slot to connect OrganizationalStructure to physical locations.
2025-11-25 15:06:43 +01:00
kempersc
a5a66eb547 add classes 2025-11-25 12:48:07 +01:00
kempersc
3ff0e33bf9 Add UML diagrams and scripts for custodian schema
- Created PlantUML diagrams for custodian types, full schema, legal status, and organizational structure.
- Implemented a script to generate GraphViz DOT diagrams from OWL/RDF ontology files.
- Developed a script to generate UML diagrams from modular LinkML schema, supporting both Mermaid and PlantUML formats.
- Enhanced class definitions and relationships in UML diagrams to reflect the latest schema updates.
2025-11-23 23:05:33 +01:00
kempersc
67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00
kempersc
6eb18700f0 Add SHACL validation shapes and validation script for Heritage Custodian Ontology
- Created SHACL shapes for validating temporal consistency and bidirectional relationships in custodial collections and staff observations.
- Implemented a Python script to validate RDF data against the defined SHACL shapes using the pyshacl library.
- Added command-line interface for validation with options for specifying data formats and output reports.
- Included detailed error handling and reporting for validation results.
2025-11-22 23:22:10 +01:00
kempersc
2761857b0d Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
2025-11-22 23:01:13 +01:00
kempersc
8907aa6213 feat: Refactor Heritage Custodian Ontology to Multi-Aspect Model
- Implemented three independent aspects for custodians: CustodianLegalStatus, CustodianName, and CustodianPlace.
- Renamed CustodianReconstruction to CustodianLegalStatus and updated all references.
- Created new components for CustodianPlace and PlaceSpecificityEnum.
- Removed direct links from CustodianObservation to Custodian, aligning with PROV-O standards.
- Generated comprehensive example instance demonstrating the new architecture.
- Updated documentation to reflect changes and provide guidance on multi-aspect modeling.
- Added React hook for managing IndexedDB operations, including storing and loading transformation results.
- Created complete YAML example for Rijksmuseum, illustrating the integration of all three aspects.
2025-11-22 15:40:17 +01:00
kempersc
94d1054f4a Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls. 2025-11-22 15:35:35 +01:00
kempersc
fa5680f0dd Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats
- Introduced custodian_hub_v3.mmd, custodian_hub_v4_final.mmd, and custodian_hub_v5_FINAL.mmd for Mermaid representation.
- Created custodian_hub_FINAL.puml and custodian_hub_v3.puml for PlantUML representation.
- Defined entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument.
- Established relationships and associations between entities, including temporal extents, observations, and reconstruction activities.
- Incorporated enumerations for various types, statuses, and classifications relevant to custodians and their activities.
2025-11-22 14:33:51 +01:00
kempersc
284b575e88 Add UML diagrams for Custodian Hub v2 in Mermaid and PlantUML formats
- Introduced a new Mermaid diagram for Custodian Hub v2, detailing entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument.
- Established relationships between entities, including temporal extents, derivations, and revisions.
- Added a comprehensive PlantUML diagram reflecting the same structure and relationships, including enumerations for various types and statuses relevant to custodians and observations.
- Enhanced documentation to clarify the hub architecture pattern and its implications for data integrity and source authority.
2025-11-21 22:30:07 +01:00
kempersc
edb1e07941 updated schemata 2025-11-21 22:12:33 +01:00
kempersc
3c80de87e0 add isil entries 2025-11-19 23:25:22 +01:00
kempersc
e5a532a8bc Add comprehensive tests for NLP institution extraction and RDF partnership integration
- Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive).
- Added tests for extracted entities and result handling to validate the extraction process.
- Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format.
- Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns.
- Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.
2025-11-19 23:20:47 +01:00