kempersc
7f792d0250
Remove outdated RDF schema files
...
Clean up old generated RDF/OWL files that have been superseded by
the clean schema deployed to Oxigraph (commit f8421a2903).
Only the current deployed schema should be regenerated as needed.
2026-01-07 22:03:26 +01:00
kempersc
81da4ede50
Add comprehensive slot visualization to LinkML viewer
...
- Add standalone Slots section in visual view alongside Classes and Enums
- Display slot_uri, range, identifier badge, description, pattern
- Show examples with value/description pairs
- Color-coded SKOS mapping tags (exact/close/narrow/broad/related)
- Yellow highlighted comments section
- Custodian type filtering works with slots
- Shared renderSlotDetails() function for consistency
2026-01-07 22:03:08 +01:00
kempersc
f8421a2903
Deploy clean RDF schema to Oxigraph - 288,857 triples with 0 malformed URIs
...
- Archive 48 old timestamped RDF files
- Fix relative IRIs by adding hc: namespace prefix
- Fix file path references in seeAlso predicates
- Deployed to sparql.glam-ontology.org triplestore
- 11,250 distinct OWL classes
- Schema includes all base ontologies (CIDOC-CRM, RiC-O, CPOV, etc.)
2026-01-07 16:53:01 +01:00
kempersc
d19822f958
Remove redundant sections from class descriptions
...
- Created cleanup_class_descriptions_v2.py script using text-based regex
- Removed redundant sections from 134 class files:
  - dual_class_pattern: 80 occurrences
  - ontological_alignment: 35 occurrences
  - ontology_alignment_upper: 33 occurrences
  - multilingual_labels: 26 occurrences
  - glamorcubes_category: 6 occurrences
  - example_structure: 6 occurrences
- Fixed ArchiveOrganizationType.yaml parse error after cleanup
- Added 49 new slot definition files
- All 395 class files validate as correct YAML
- Deployed to bronhouder.nl/linkml
2026-01-07 13:50:14 +01:00
kempersc
dfa667c90f
Fix LinkML schema for valid RDF generation with proper slot_uri
...
Summary:
- Create 46 missing slot definition files with proper slot_uri values
- Add slot imports to main schema (01_custodian_name_modular.yaml)
- Fix YAML examples sections in 116+ class and slot files
- Fix PersonObservation.yaml examples section (nested objects → string literals)
Technical changes:
- All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS)
- Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF
- gen-owl now produces valid Turtle with 153,166 triples
New slot files (46):
- RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc.
- Scope slots: scope_includes, scope_excludes, archive_scope
- Organization slots: organization_type, governance_authority, area_served
- Platform slots: platform_type_category, portal_type_category
- Social media slots: social_media_platform_category, post_type_*
- Type hierarchy slots: broader_type, narrower_types, custodian_type_broader
- Wikidata slots: wikidata_equivalent, wikidata_mapping
Generated output:
- schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB)
- Validated with rdflib: 153,166 triples, no malformed URIs
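A minimal sketch of the kind of malformed-URI check described above, using only the standard library rather than rdflib. The function name `find_malformed_iris`, the regexes, and the sample data are assumptions for illustration; the actual validation was done with rdflib and may differ.

```python
import re

# IRIs written in angle brackets, as in Turtle or N-Triples.
IRI_PATTERN = re.compile(r"<([^<>\s]*)>")
# An absolute IRI starts with a scheme such as "https:" or "urn:".
SCHEME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def find_malformed_iris(text: str) -> list[str]:
    """Return IRIs that are relative or contain the '/:' artifact."""
    malformed = []
    for iri in IRI_PATTERN.findall(text):
        if not SCHEME_PATTERN.match(iri) or "/:" in iri:
            malformed.append(iri)
    return malformed


# Hypothetical sample: one well-formed subject, one malformed subject.
sample = """
<https://example.org/hc/custodianTypes> <http://www.w3.org/2000/01/rdf-schema#label> "ok" .
<custodian/:slot_name> <http://www.w3.org/2000/01/rdf-schema#label> "bad" .
"""

print(find_malformed_iris(sample))
```

The regex scan does not understand literals or comments, so it is only a quick smoke test; a real check would parse the file first.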
2026-01-07 13:48:03 +01:00
kempersc
98c42bf272
Fix LinkML URI conflicts and generate RDF outputs
...
- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/
Files: 1,151 changed (includes prior CustodianType migration)
2026-01-07 12:32:59 +01:00
kempersc
6c6810fa43
Replace CustodianTypeCodeEnum with CustodianType class references
...
- Remove deprecated CustodianTypeCodeEnum from class_metadata_slots.yaml
- Update custodian_types slot to use uriorcurie range (references CustodianType subclasses)
- Update custodian_types_primary slot similarly
- Add migration note for legacy string format ['A'] vs new URI format
Per Rule 9: Enum-to-Class Promotion - Single Source of Truth
2026-01-06 12:37:40 +01:00
kempersc
b34992b1d3
Migrate all 293 class files to ontology-aligned slots
...
Extends migration to all class types (museums, libraries, galleries, etc.)
New slots added to class_metadata_slots.yaml:
- RiC-O: rico_record_set_type, rico_organizational_principle,
  rico_has_or_had_holder, rico_note
- Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt
- Scope: scope_includes, scope_excludes, custodian_only,
  organizational_level, geographic_restriction
- Notes: privacy_note, preservation_note, legal_note
Migration script now handles 30+ annotation types.
All migrated schemas pass linkml-validate.
Total: 387 class files now use proper slots instead of annotations.
2026-01-06 12:24:54 +01:00
kempersc
aa763dab25
Migrate 94 archive class annotations to ontology-aligned slots
...
- Add migration script: scripts/migrate_annotations_to_slots.py
- Convert custodian_types, wikidata, skos_broader, specificity_* annotations
- Replace with proper slots mapped to SKOS, PROV-O, RiC-O predicates
- Add ../slots/class_metadata_slots import to all migrated files
- Remove AcademicArchive_refactored.yaml (main file now migrated)
- Sync changes to frontend/public/schemas/
Migration converts:
- custodian_types → hc:custodianTypes slot
- wikidata/wikidata_label → wikidata_alignment structured slot
- skos_broader → skos:broader slot
- specificity_* → specificity_annotation structured slot
- dual_class_pattern → dual_class_link structured slot
- template_specificity → template_specificity slot
All 94 migrated schemas pass linkml-validate.
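The conversion table above could be sketched roughly as follows. This is not the code in scripts/migrate_annotations_to_slots.py; the function name, the dictionary keys used for the target slots, and the example values (including the Wikidata ID) are placeholders chosen for illustration.

```python
def migrate_annotations(annotations):
    """Split a class's annotations into slot values and untouched leftovers."""
    slots = {}
    leftovers = {}
    wikidata = {}
    specificity = {}
    for key, value in annotations.items():
        if key in ("custodian_types", "skos_broader", "template_specificity"):
            # Simple annotations become a slot of the same (assumed) name.
            slots[key] = value
        elif key in ("wikidata", "wikidata_label"):
            # Collected into one structured wikidata_alignment slot.
            wikidata[key] = value
        elif key.startswith("specificity_"):
            # specificity_* keys are grouped under specificity_annotation.
            specificity[key[len("specificity_"):]] = value
        elif key == "dual_class_pattern":
            slots["dual_class_link"] = value
        else:
            leftovers[key] = value
    if wikidata:
        slots["wikidata_alignment"] = wikidata
    if specificity:
        slots["specificity_annotation"] = specificity
    return slots, leftovers


# Hypothetical annotation block; all values are placeholders.
example = {
    "custodian_types": "['A']",
    "wikidata": "Q123",
    "wikidata_label": "placeholder label",
    "skos_broader": "Archive",
    "specificity_score": 0.7,
    "dual_class_pattern": "custodian_type",
    "note": "kept as-is",
}

slots, leftovers = migrate_annotations(example)
print(slots)
print(leftovers)
```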
2026-01-06 11:25:37 +01:00
kempersc
f37f5208ca
Copy class metadata slots to frontend public folder for deployment
2026-01-06 11:17:12 +01:00
kempersc
bc562bd68d
Add class metadata slots to replace annotations with ontology-aligned predicates
...
- Add class_metadata_slots.yaml with slots for:
  - GLAMORCUBESFIXPHDNT custodian type classification (hc:custodianTypes)
  - Wikidata alignment (wdt:P31, skos:mappingRelation)
  - SKOS hierarchical relationships (skos:broader, skos:narrower)
  - Dual-class pattern linking (rdfs:seeAlso)
  - Specificity scoring for RAG (prov:generatedAtTime, prov:wasAttributedTo)
  - Collection holdings (rico:isOrWasHolderOf)
- Add AcademicArchive_refactored.yaml demonstrating slot-based approach
- Add migration guide documenting annotation-to-slot mappings
Ontology sources: SKOS, PROV-O, Dublin Core, RiC-O, Wikidata
2026-01-06 11:16:49 +01:00
kempersc
11983014bb
Enhance specificity scoring system integration with existing infrastructure
...
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
2026-01-05 17:37:49 +01:00
kempersc
41d8905661
Fix Turtle parser multi-line string handling for PiCo ontology
...
- Fixed bug where closing triple-quotes (""") would incorrectly re-trigger
  multi-line string detection, causing subsequent class definitions to be skipped
- Added lineToProcess variable to track which portion of line to process after
  closing a multi-line string, preventing re-detection of opening quotes
- Moved UML large diagram confirmation logic from OntologyViewerPage to
  UMLVisualization component for better encapsulation
- PiCo ontology now correctly shows all 8 classes instead of 2
Deployed and verified on https://bronhouder.nl/ontology?ontology=PiCo
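The multi-line string fix can be illustrated with a small Python sketch. The real parser lives in the frontend and is not shown here; the function name and sample input are assumptions, and only the idea of a `line_to_process` remainder (mirroring the lineToProcess variable named above) is taken from this commit. Escaped quotes and single-quoted long strings are ignored for brevity.

```python
TRIPLE = '"""'


def lines_outside_long_strings(lines):
    """Yield the parts of each line that lie outside triple-quoted literals."""
    in_long_string = False
    for line in lines:
        # Remainder of the current line that still has to be examined.
        line_to_process = line
        kept = []
        while True:
            idx = line_to_process.find(TRIPLE)
            if idx == -1:
                if not in_long_string:
                    kept.append(line_to_process)
                break
            if not in_long_string:
                kept.append(line_to_process[:idx])
            # Toggle state, then continue only with the text AFTER the quotes,
            # so a closing delimiter is never re-read as an opening one.
            in_long_string = not in_long_string
            line_to_process = line_to_process[idx + len(TRIPLE):]
        yield "".join(kept)


# Hypothetical Turtle fragment with a two-line comment literal.
turtle = [
    "ex:A a owl:Class ;",
    '  rdfs:comment """first line',
    'second line""" .',
    "ex:B a owl:Class .",
]

visible = list(lines_outside_long_strings(turtle))
class_count = sum("a owl:Class" in part for part in visible)
print(class_count)
```

With the state toggled per delimiter and the remainder tracked explicitly, the class definition following the closing quotes is still seen by the parser.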
2026-01-05 11:25:43 +01:00
kempersc
242bc8bb35
Add new slots for heritage custodian entities
...
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
2026-01-05 00:49:05 +01:00
kempersc
89001fbc53
compact header controls on OntologyViewer and QueryBuilder pages
2026-01-04 17:29:34 +01:00
kempersc
eb61f45de2
compact UML controls toolbar to fit single line when sidebar collapsed
2026-01-04 17:21:53 +01:00
kempersc
2dca28d8c1
enrich CH entries with mission statements
2026-01-04 13:12:32 +01:00
kempersc
4f0cafe98a
enrich HC profiles
2026-01-02 02:11:04 +01:00
kempersc
349f31ae6f
enrich custodian profiles
2026-01-02 02:10:18 +01:00
kempersc
aee76fcc7f
backup html content
2025-12-31 02:36:38 +01:00
kempersc
b7701c8a8e
backup person profiles
2025-12-31 00:04:09 +01:00
kempersc
7108cb1483
backup person profiles
2025-12-31 00:00:25 +01:00
kempersc
38dcd2ce9c
Restore YAML files for Museum Dokkum and Gemeente Smallingerland with enriched data and provenance tracking
2025-12-30 23:58:21 +01:00
kempersc
1d8fd68e3a
backup custodian web profiles
2025-12-30 23:53:16 +01:00
kempersc
f6a5962c3b
backup person profiles
2025-12-30 23:48:50 +01:00
kempersc
cbf88d2a6d
backup person profiles
2025-12-30 23:44:57 +01:00
kempersc
30b701a5ec
backup HC data
2025-12-30 23:41:15 +01:00
kempersc
c417d0c758
Refactor code structure for improved readability and maintainability
2025-12-30 23:38:18 +01:00
kempersc
fb0daab718
backup JP profiles
2025-12-30 23:24:30 +01:00
kempersc
b42d6bf5d2
backup CZ and JP
2025-12-30 23:19:38 +01:00
kempersc
45e873ec0a
enrich JP BE AR profiles
2025-12-30 23:07:03 +01:00
kempersc
bc6ad46bfa
enrich CZ and JP profiles
2025-12-30 23:03:03 +01:00
kempersc
90b402dba6
enrich AR and Czech files
2025-12-30 23:01:01 +01:00
kempersc
f753d7277f
Add country code extraction for location validation in Google Places API
2025-12-30 03:45:29 +01:00
kempersc
cefc847056
Remove custodian entry for Leica AG from YAML file
2025-12-30 03:44:25 +01:00
kempersc
9159ff35db
Add custodian entry for Leica AG with data contamination fixes and location corrections
2025-12-30 03:43:47 +01:00
kempersc
d64f857aa9
add sparql validator and RAG injector
2025-12-30 03:43:31 +01:00
kempersc
84904e344b
Make AGENTS more succinct by referring to opencode rules & enrich custodians
2025-12-28 14:56:35 +01:00
kempersc
4cf3fe8a07
Logo enrichment batch: JP+170 (5,166/12,096 = 42.7%) - 14,503 total (45.6%)
2025-12-27 13:17:40 +01:00
kempersc
3447a9cc6c
Logo enrichment batch: JP+440 (4,996/12,096 = 41.3%) - 14,333 total (45.1%)
2025-12-27 12:20:53 +01:00
kempersc
cdb633b0c9
enrich custodian entries with logo
2025-12-27 02:15:17 +01:00
kempersc
fd91fec63f
Logo enrichment batch: JP+320, 13,603 total (42.8%)
...
- JP: 4,516/12,096 (37.4%) ✅ NEW COMMIT
- CZ: 3,820/8,432 (45.3%) - batches 7-16 running
- CH, NL, BE, AT, BR: 100% complete
- Total: 13,603/31,772 (42.8%)
- Using crawl4ai favicon extraction
2025-12-26 23:25:40 +01:00
kempersc
2104a90f22
Logo enrichment COMPLETE: CZ 3,820 (45.3%)
...
- CZ: 3,820/8,432 files processed (45.3%)
- 9 parallel batches completed (500 files each)
- NL person entities added (4 staff profiles)
- scripts/discover_websites_crawl4ai.py modified
- Using crawl4ai favicon extraction
2025-12-26 21:45:14 +01:00
kempersc
6af5009444
enrich entries
2025-12-26 21:41:18 +01:00
kempersc
ca219340f2
enrich entries
2025-12-26 14:30:31 +01:00
kempersc
59963c8d3f
Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%)
...
- JP: 4,496 processed (37.2% of 12,096) ✅ COMPLETE
- CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease
- CH, NL, BE, AT, BR: 100% complete
- Total: 12,833 of 31,772 files (40.4%)
- Using crawl4ai favicon extraction
2025-12-26 13:42:21 +01:00
kempersc
fb7993e3af
fix: filter DSPy field markers from streaming output
...
Implements a state machine to filter streaming tokens:
- Only stream tokens from the 'answer' field to the frontend
- Skip tokens from 'reasoning', 'citations', 'confidence', 'follow_up' fields
- Remove DSPy field markers like '[[ ## answer ## ]]' from streamed content
This fixes the issue where raw DSPy signature field markers were being
displayed in the chat interface instead of clean answer text.
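A rough sketch of such a state machine, assuming markers have the form '[[ ## field ## ]]' and may be split across streamed tokens. The class name `AnswerStreamFilter`, its methods, and the sample tokens are hypothetical and do not reproduce the actual backend code.

```python
import re

MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")


class AnswerStreamFilter:
    """Pass through only text that belongs to the 'answer' field."""

    def __init__(self):
        self.buffer = ""   # text not yet classified
        self.field = None  # field named by the most recent marker

    def feed(self, token):
        self.buffer += token
        out = []
        # Consume every complete marker currently in the buffer.
        while True:
            match = MARKER.search(self.buffer)
            if match is None:
                break
            if self.field == "answer":
                out.append(self.buffer[:match.start()])
            self.field = match.group(1)
            self.buffer = self.buffer[match.end():]
        # Hold back a tail that might be the start of a split marker.
        hold = self.buffer.rfind("[[")
        if hold == -1 and self.buffer.endswith("["):
            hold = len(self.buffer) - 1
        if hold == -1:
            hold = len(self.buffer)
        emit, self.buffer = self.buffer[:hold], self.buffer[hold:]
        if self.field == "answer":
            out.append(emit)
        return "".join(out)

    def flush(self):
        """Release held-back text at end of stream."""
        rest, self.buffer = self.buffer, ""
        return rest if self.field == "answer" else ""


# Hypothetical token stream with a marker split across two tokens.
tokens = [
    "[[ ## reas", "oning ## ]]\nthinking",
    "\n[[ ## answer ## ]]\n", "Hello", " world",
    "\n[[ ## citations ## ]]\n[1]",
]
stream_filter = AnswerStreamFilter()
streamed = "".join(stream_filter.feed(t) for t in tokens) + stream_filter.flush()
print(streamed.strip())
```

Holding back any tail that starts with "[[" trades a little latency for never leaking a partial marker to the frontend.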
2025-12-26 03:11:44 +01:00
kempersc
6b9fa33767
Logo enrichment batch: CZ+500, JP+170 - 12,513 files (39.4%)
...
- CZ: 2,820 processed (33.4% of 8,432)
- JP: 4,176 processed (34.5% of 12,096)
- Total: 12,513 of 31,772 (39.4%)
- CZ batch completed: 500 files, 52 logos found
- JP batch crashed during run (4,176 files before crash)
- Using crawl4ai favicon extraction
2025-12-26 02:03:48 +01:00
kempersc
63400392ff
Fix CZ-52-PAB-L-IPVVZOVI logo: use primary_logo.png instead of favicon.ico
...
- Primary logo (logo.png) identified via crawl4ai direct scraping
- Favicon (favicon.ico) retained as secondary asset
- Updated claims: primary_logo_url + favicon_url
- Summary shows: has_primary_logo: true, total_claims: 2
2025-12-25 21:01:05 +01:00
kempersc
6ab0b19ae2
Logo enrichment batch: CZ+260, JP+260 - 11,663 files (36.7%)
...
- CZ: 2,810 processed (33.3% of 8,432)
- JP: 3,336 processed (27.6% of 12,096)
- Total: 11,663 of 31,772 (36.7%)
- Using crawl4ai favicon extraction
2025-12-25 19:23:41 +01:00