Commit graph

59 commits

Author SHA1 Message Date
kempersc
aca68ea47f remove a,bihguous web-claims 2025-12-21 00:01:54 +01:00
kempersc
99430c2a70 add new entries and semantic routing 2025-12-17 10:11:56 +01:00
kempersc
e0dd847491 extend ontology 2025-12-16 20:27:39 +01:00
kempersc
0cf93587fb fix(schema): Normalize custodian_types annotation YAML quoting
YAML arrays in LinkML annotations must be quoted strings to ensure
proper parsing. This change quotes all custodian_types annotations
from the raw array format to quoted string format.

Before: custodian_types: ["A", "G"]
After:  custodian_types: '["A", "G"]'

Affected: 50+ class files in modules/classes/
Also updates: manifest.json, 01_custodian_name_modular.yaml
2025-12-16 20:19:45 +01:00
kempersc
8a4727eb34 feat(schema): Add social media post and content modeling schema
Social Media Post Classes:
- SocialMediaPost: Base class for platform-agnostic post modeling
- SocialMediaPostType: Abstract base for post type taxonomy
- SocialMediaPostTypes: Concrete post types (TextPost, ImagePost,
  CarouselPost, StoryPost, ReelPost, ArticlePost, PollPost, EventPost)

Content Classes:
- SocialMediaContent: Rich content modeling with media attachments,
  hashtags, mentions, links, and engagement metrics

Features:
- Platform-specific post type mappings (Instagram, LinkedIn, Twitter, etc.)
- Engagement analytics (likes, comments, shares, saves)
- Heritage institution content categorization
- Media attachment handling (images, videos, documents)
- Hashtag and mention extraction for heritage topic tracking
2025-12-16 20:06:08 +01:00
kempersc
767fb8ca80 feat(schema): Add LinkedIn profile and person modeling schema
Person Identity Classes:
- PersonName: Full name modeling with components (given_name, surname_prefix,
  base_surname, patronym, initials) following Dutch naming conventions
- PersonConnection: Professional network connections with heritage relevance scoring
- ConnectionNetwork: Network-level analysis and statistics

LinkedIn Profile Schema:
- LinkedInProfile: Complete professional profile structure
- WorkExperience: Employment history with heritage institution detection
- EducationCredential: Academic background and qualifications
- LanguageProficiency: Language skills with ISO 639-1 codes

Supporting Classes:
- ExtractionMetadata: Provenance tracking for extracted profile data
- HeritageRelevance: GLAMORCUBESFIXPHDNT type scoring and classification

Slots (17 person-related slots):
- Name components: given_name, base_surname, surname_prefix, patronym, initials
- Identity: age, birth_date, birth_place, death_place, gender_identity, pronouns
- Professional: occupation, religion
- References: literal_name, name_specification, has_person_name, extraction_metadata

Enums:
- HeritageTypeEnum: GLAMORCUBESFIXPHDNT type codes for heritage relevance
2025-12-16 20:04:59 +01:00
kempersc
51554947a0 feat(schema): Add video content schema with comprehensive examples
Video Schema Classes (9 files):
- VideoPost, VideoComment: Social media video modeling
- VideoTextContent: Base class for text content extraction
- VideoTranscript, VideoSubtitle: Text with timing and formatting
- VideoTimeSegment: Time code handling with ISO 8601 duration
- VideoAnnotation: Base annotation with W3C Web Annotation alignment
- VideoAnnotationTypes: Scene, Object, OCR detection annotations
- VideoChapter, VideoChapterList: Navigation and chapter structure
- VideoAudioAnnotation: Speaker diarization, music, sound events

Enumerations (12 enums):
- VideoDefinitionEnum, LiveBroadcastStatusEnum
- TranscriptFormatEnum, SubtitleFormatEnum, SubtitlePositionEnum
- AnnotationTypeEnum, AnnotationMotivationEnum
- DetectionLevelEnum, SceneTypeEnum, TransitionTypeEnum, TextTypeEnum
- ChapterSourceEnum, AudioEventTypeEnum, SoundEventTypeEnum, MusicTypeEnum

Examples (904 lines, 10 comprehensive heritage-themed examples):
- Rijksmuseum virtual tour chapters (5 chapters with heritage entity refs)
- Operation Night Watch documentary chapters (5 chapters)
- VideoAudioAnnotation: curator interview, exhibition promo, museum lecture

All examples reference real heritage entities with Wikidata IDs:
Q5598 (Rembrandt), Q41264 (Vermeer), Q219831 (The Night Watch)
2025-12-16 20:03:17 +01:00
kempersc
cb56aa7e40 enrich all custodian timespan 2025-12-15 22:31:41 +01:00
kempersc
c50c35fd3a enrich person custodian 2025-12-14 17:09:55 +01:00
kempersc
505c12601a Add test script for PiCo extraction from Arabic waqf documents
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
2025-12-12 17:50:17 +01:00
kempersc
b1f93b6f22 enrich person profiles 2025-12-12 12:51:10 +01:00
kempersc
03263f67d6 moved web archives 2025-12-12 00:40:26 +01:00
kempersc
1b1cfbfca0 enrich custodians 2025-12-11 22:32:09 +01:00
kempersc
41959f0766 correct HCID! 2025-12-10 13:01:13 +01:00
kempersc
3a6ead8fde feat: Add legal form filtering rule for CustodianName
- Introduced LEGAL-FORM-FILTER rule to standardize CustodianName by removing legal form designations.
- Documented rationale, examples, and implementation guidelines for the filtering process.

docs: Create README for value standardization rules

- Established a comprehensive README outlining various value standardization rules applicable to Heritage Custodian classes.
- Categorized rules into Name Standardization, Geographic Standardization, Web Observation, and Schema Evolution.

feat: Implement transliteration standards for non-Latin scripts

- Added TRANSLIT-ISO rule to ensure GHCID abbreviations are generated from emic names using ISO standards for transliteration.
- Included detailed guidelines for various scripts and languages, along with implementation examples.

feat: Define XPath provenance rules for web observations

- Created XPATH-PROVENANCE rule mandating XPath pointers for claims extracted from web sources.
- Established a workflow for archiving websites and verifying claims against archived HTML.

chore: Update records lifecycle diagram

- Generated a new Mermaid diagram illustrating the records lifecycle for heritage custodians.
- Included phases for active records, inactive archives, and processed heritage collections with key relationships and classifications.
2025-12-09 16:58:41 +01:00
kempersc
a7321b1bb9 reconstruct location blocks 2025-12-09 12:25:16 +01:00
kempersc
cab712659d recover location blocks 2025-12-09 11:34:56 +01:00
kempersc
62fdd35321 Refactor code structure for improved readability and maintainability 2025-12-09 11:15:51 +01:00
kempersc
b61271220b enrich entries 2025-12-09 10:46:43 +01:00
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
6a6557bbe8 feat(enrichment): add emic name enrichment and update CustodianName schema
- Add emic_name, name_language, standardized_name to CustodianName
- Add scripts for enriching custodian emic names from Wikidata
- Add YouTube and Google Maps enrichment scripts
- Update DuckLake loader for new schema fields
2025-12-08 14:58:50 +01:00
kempersc
7e3559f7e5 add new entries 2025-12-07 23:08:02 +01:00
kempersc
ee4e57bc75 add new entries 2025-12-07 00:26:01 +01:00
kempersc
1635625032 added web annotations 2025-12-06 19:50:04 +01:00
kempersc
3a242370fc annotation standards added 2025-12-05 15:30:23 +01:00
kempersc
d661947830 update enriched entries 2025-12-03 17:38:46 +01:00
kempersc
ef89b1213a validate enrichments 2025-12-02 14:36:01 +01:00
kempersc
8ebca2f845 add pid 2025-12-02 00:00:45 +01:00
kempersc
4b833d20b2 add pids 2025-12-01 23:55:55 +01:00
kempersc
7dce283c17 Add new enums for PersonalCollectionType, ResearchCenterType, and TasteScentHeritage classifications; implement validation script for custodian names against authoritative sources 2025-12-01 18:39:22 +01:00
kempersc
48a2b26f59 feat: Add script to generate Mermaid ER diagrams with instance data from LinkML schemas
- Implemented `generate_mermaid_with_instances.py` to create ER diagrams that include all classes, relationships, enum values, and instance data.
- Loaded instance data from YAML files and enriched enum definitions with meaningful annotations.
- Configured output paths for generated diagrams in both frontend and schema directories.
- Added support for excluding technical classes and limiting the number of displayed enum and instance values for readability.
2025-12-01 16:58:03 +01:00
kempersc
097d116b72 enrich entries 2025-12-01 16:06:34 +01:00
kempersc
2497e5913f enrich entries 2025-12-01 00:37:24 +01:00
kempersc
f3c149b1bb update entries 2025-11-30 23:30:29 +01:00
kempersc
d623f0af4a store archived websites 2025-11-29 20:40:46 +01:00
kempersc
572ccd5daf archive websites 2025-11-29 18:18:04 +01:00
kempersc
0ab8f24a6b archive websites 2025-11-29 18:05:16 +01:00
kempersc
da1eae6597 Refactor code structure for improved readability and maintainability 2025-11-29 12:27:39 +01:00
kempersc
30162e6526 Add script to validate KB library entries and generate enrichment report
- Implemented a Python script to validate KB library YAML files for required fields and data quality.
- Analyzed enrichment coverage from Wikidata and Google Maps, generating statistics.
- Created a comprehensive markdown report summarizing validation results and enrichment quality.
- Included error handling for file loading and validation processes.
- Generated JSON statistics for further analysis.
2025-11-28 14:48:33 +01:00
kempersc
5cdce584b2 Add complete schema for heritage custodian observation reconstruction
- Introduced a comprehensive class diagram for the heritage custodian observation reconstruction schema.
- Defined multiple classes including AllocationAgency, ArchiveOrganizationType, AuxiliaryDigitalPlatform, and others, with relevant attributes and relationships.
- Established inheritance and associations among classes to represent complex relationships within the schema.
- Generated on 2025-11-28, version 0.9.0, excluding the Container class.
2025-11-28 13:13:23 +01:00
kempersc
0d1741c55e Refactor code structure for improved readability and maintainability 2025-11-28 11:44:21 +01:00
kempersc
37886f0433 Refactor code structure for improved readability and maintainability 2025-11-27 17:43:14 +01:00
kempersc
5ef8ccac51 Add script to enrich NDE Register NL entries with Wikidata data
- Implemented a Python script that fetches and enriches entries from the NDE Register using data from Wikidata.
- Utilized the Wikibase REST API and SPARQL endpoints for data retrieval.
- Added logging for tracking progress and errors during the enrichment process.
- Configured rate limiting based on authentication status for API requests.
- Created a structured output in YAML format, including detailed enrichment data.
- Generated a log file summarizing the enrichment process and results.
2025-11-27 13:30:00 +01:00
kempersc
cd0ff5b9c7 wrap up voorbeeld lijst 2025-11-27 10:58:53 +01:00
kempersc
e99b1e644e feat: Add platform_description slot for detailed auxiliary platform information 2025-11-26 10:18:16 +01:00
kempersc
e2eb7aa5cf feat: Add auxiliary slots and enums for places and digital platforms 2025-11-26 10:09:06 +01:00
kempersc
eff2f47f6f Add auxiliary enums and slots for digital platforms and physical locations
- Created AuxiliaryDigitalPlatformTypeEnum.yaml to classify types of secondary digital platforms.
- Created AuxiliaryPlaceTypeEnum.yaml to classify types of secondary physical locations.
- Added OrganizationBranchTypeEnum.yaml for formal organizational branches at auxiliary locations.
- Introduced auxiliary_places.yaml slot to link CustodianPlace to subordinate physical locations.
- Introduced auxiliary_platforms.yaml slot to link DigitalPlatform to subordinate digital properties.
- Added located_at.yaml slot to connect OrganizationalStructure to physical locations.
2025-11-25 15:06:43 +01:00
kempersc
a5a66eb547 add classes 2025-11-25 12:48:07 +01:00
kempersc
3ff0e33bf9 Add UML diagrams and scripts for custodian schema
- Created PlantUML diagrams for custodian types, full schema, legal status, and organizational structure.
- Implemented a script to generate GraphViz DOT diagrams from OWL/RDF ontology files.
- Developed a script to generate UML diagrams from modular LinkML schema, supporting both Mermaid and PlantUML formats.
- Enhanced class definitions and relationships in UML diagrams to reflect the latest schema updates.
2025-11-23 23:05:33 +01:00
kempersc
67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00