Commit graph

225 commits

Author SHA1 Message Date
kempersc
0393b321c9 refactor(schema): unify custodian_type slots into has_or_had_custodian_type (Rule 39, 43)
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml

Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
2026-01-09 10:55:21 +01:00
kempersc
508b858e16 docs(Rule 40): Add empirical validation showing 33% Google Maps error rate for Type I
Audit of 188 Type I custodian files revealed:
- 62 false matches (33%) detected and corrected
- Categories: domain mismatch (39), name mismatch (8), wrong location (6),
  wrong org type (5), different entity (3), different event (3)
- Documents why Google Maps fails for intangible heritage:
  virtual orgs, person-based heritage, volunteer networks, event-based orgs

This validates KIEN as TIER_1_AUTHORITATIVE for Type I custodians.
2026-01-08 16:47:17 +01:00
kempersc
6608a207d4 update frontend 2026-01-08 15:56:28 +01:00
kempersc
9d68ed8c2e fix: mark 15 more Google Maps false matches via comprehensive review
Manual review of remaining Type I custodian files without official websites
identified additional false matches in these categories:

Wrong organization type:
- Bird catchers vs bird watchers association
- Heritage org vs webshop
- Regional org vs specific local entity
- Federation vs single member association
- Bell ringers org vs church building

Wrong location:
- Amsterdam org matched to Den Haag
- Haarlem org matched to Apeldoorn
- Rotterdam org matched to Amstelveen
- Dutch org matched to Suriname (!)
- Giethoorn event matched to Belt-Schutsloot
- Duindorp bonfire matched to Scheveningen

Different event/entity:
- Horse racing org vs summer festival
- Street name vs organization
- Heritage foundation vs specific local fair

Total Type I false matches fixed: 62 of 188 files (33%)
2026-01-08 15:21:31 +01:00
kempersc
0b0ea75070 feat(rag): add factual query fast path - skip LLM for count/list queries
- Add ontology cache warming at startup in lifespan() function
- Add is_factual_query() detection in template_sparql.py (12 templates)
- Add factual_result and sparql_query fields to DSPyQueryResponse
- Skip LLM generation for factual templates (count, list, compare)
- Execute SPARQL directly and return results as table (~15s → ~2s latency)
- Update ConversationPanel.tsx to render factual results table
- Add CSS styling for factual results with green theme

For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL
results ARE the answer - no need for expensive LLM prose generation.
2026-01-08 13:34:23 +01:00
kempersc
85d9cee82f fix: mark 8 more Google Maps false matches detected via name mismatch
Additional Type I custodian files with obvious name mismatches between
KIEN registry entries and Google Maps results. These couldn't be
auto-detected via domain mismatch because they lack official websites.

Fixes:
- Dick Timmerman (person) → carpentry business
- Ria Bos (cigar maker) → money transfer agent
- Stichting Kracom (Krampuslauf) → Happy Caps retail
- Fed. Nederlandse Vertelorganisaties → NET Foundation
- Stichting dodenherdenking Alphen → wrong memorial
- Sao Joao Rotterdam → Heemraadsplein (location not org)
- sport en spel (heritage) → equipment rental
- Eiertikken Ommen → restaurant

Also adds detection and fix scripts for Google Maps false matches.
2026-01-08 13:26:53 +01:00
kempersc
b2b21abe2b fix: mark 39 Google Maps false matches for Type I intangible heritage custodians
Per Rule 40 (KIEN authoritative source), Google Maps frequently returns
false matches for intangible heritage organizations. These are virtual
networks without commercial storefronts.

Changes:
- Mark google_maps_enrichment.status as FALSE_MATCH
- Preserve original data in original_false_match for audit trail
- Add correction_timestamp and correction_agent provenance
- Special handling for NL-GE-TIE-I-M (Stichting MOZA): also fixed
  YouTube false match (Mozart channel) and removed ~1750 lines of
  irrelevant video data

Detection method: Domain mismatch between Google Maps website field
and official KIEN registry website.
2026-01-08 12:16:39 +01:00
kempersc
53ffed3531 Add TemplateSpecificityScores import to SpecificityAnnotation 2026-01-07 22:05:42 +01:00
kempersc
20374b9032 Archive monolithic class_metadata_slots before modular refactoring 2026-01-07 22:05:21 +01:00
kempersc
30b9cb9d14 Add SOTA analysis and update design pattern documentation
- Add prompt-query_template_mapping/SOTA_analysis.md with Formica et al. research
- Update GraphRAG design patterns documentation
- Update temporal semantic hypergraph documentation
2026-01-07 22:05:01 +01:00
kempersc
99dc608826 Refactor RAG to template-based SPARQL generation
Major architectural changes based on Formica et al. (2023) research:
- Add TemplateClassifier for deterministic SPARQL template matching
- Add SlotExtractor with synonym resolution for slot values
- Add TemplateInstantiator using Jinja2 for query rendering
- Refactor dspy_heritage_rag.py to use template system
- Update main.py with streamlined pipeline
- Fix semantic_router.py ordering issues
- Add comprehensive metrics tracking

Template-based approach achieves 65% precision vs 10% LLM-only
per Formica et al. research on SPARQL generation.
2026-01-07 22:04:43 +01:00
kempersc
9b769f1ca2 Update manifest timestamp and minor class fixes 2026-01-07 22:04:29 +01:00
kempersc
181991940f Add new LinkML schema modules for specificity and Wikidata alignment
New classes:
- SpecificityAnnotation: Track class relevance scores per template
- TemplateSpecificityScores: 10 conversation template scores
- WikidataAlignment: Link classes to Wikidata entities
- DualClassLink: Model dual-aspect class relationships

New enums:
- WikidataMappingTypeEnum: exact/close/narrow/broad/related mappings
- DualClassPatternEnum: place-custodian, collection-custodian patterns

New slots (44 files):
- Specificity slots: score, rationale, agent, timestamp
- Template scores: archive_search, museum_search, library_search, etc.
- Wikidata slots: entity_id, label, mapping_type, rationale
- Multilingual labels: label_nl, label_de, label_fr, label_es, etc.
- Custodian type annotations: custodian_types, rationale, primary
- SKOS hierarchy: skos_broader, skos_narrower, skos_related
2026-01-07 22:03:58 +01:00
kempersc
7f792d0250 Remove outdated RDF schema files
Clean up old generated RDF/OWL files that have been superseded by
the clean schema deployed to Oxigraph (commit f8421a2903).
Only the current deployed schema should be regenerated as needed.
2026-01-07 22:03:26 +01:00
kempersc
81da4ede50 Add comprehensive slot visualization to LinkML viewer
- Add standalone Slots section in visual view alongside Classes and Enums
- Display slot_uri, range, identifier badge, description, pattern
- Show examples with value/description pairs
- Color-coded SKOS mapping tags (exact/close/narrow/broad/related)
- Yellow highlighted comments section
- Custodian type filtering works with slots
- Shared renderSlotDetails() function for consistency
2026-01-07 22:03:08 +01:00
kempersc
f8421a2903 Deploy clean RDF schema to Oxigraph - 288,857 triples with 0 malformed URIs
- Archive 48 old timestamped RDF files
- Fix relative IRIs by adding hc: namespace prefix
- Fix file path references in seeAlso predicates
- Deployed to sparql.glam-ontology.org triplestore
- 11,250 distinct OWL classes
- Schema includes all base ontologies (CIDOC-CRM, RiC-O, CPOV, etc.)
2026-01-07 16:53:01 +01:00
kempersc
d19822f958 Remove redundant sections from class descriptions
- Created cleanup_class_descriptions_v2.py script using text-based regex
- Removed 134 class files' redundant sections:
  - dual_class_pattern: 80 occurrences
  - ontological_alignment: 35 occurrences
  - ontology_alignment_upper: 33 occurrences
  - multilingual_labels: 26 occurrences
  - glamorcubes_category: 6 occurrences
  - example_structure: 6 occurrences
- Fixed ArchiveOrganizationType.yaml parse error after cleanup
- Added 49 new slot definition files
- All 395 class files validate as correct YAML
- Deployed to bronhouder.nl/linkml
2026-01-07 13:50:14 +01:00
kempersc
dfa667c90f Fix LinkML schema for valid RDF generation with proper slot_uri
Summary:
- Create 46 missing slot definition files with proper slot_uri values
- Add slot imports to main schema (01_custodian_name_modular.yaml)
- Fix YAML examples sections in 116+ class and slot files
- Fix PersonObservation.yaml examples section (nested objects → string literals)

Technical changes:
- All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS)
- Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF
- gen-owl now produces valid Turtle with 153,166 triples

New slot files (46):
- RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc.
- Scope slots: scope_includes, scope_excludes, archive_scope
- Organization slots: organization_type, governance_authority, area_served
- Platform slots: platform_type_category, portal_type_category
- Social media slots: social_media_platform_category, post_type_*
- Type hierarchy slots: broader_type, narrower_types, custodian_type_broader
- Wikidata slots: wikidata_equivalent, wikidata_mapping

Generated output:
- schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB)
- Validated with rdflib: 153,166 triples, no malformed URIs
2026-01-07 13:48:03 +01:00
kempersc
98c42bf272 Fix LinkML URI conflicts and generate RDF outputs
- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/

Files: 1,151 changed (includes prior CustodianType migration)
2026-01-07 12:32:59 +01:00
kempersc
6c6810fa43 Replace CustodianTypeCodeEnum with CustodianType class references
- Remove deprecated CustodianTypeCodeEnum from class_metadata_slots.yaml
- Update custodian_types slot to use uriorcurie range (references CustodianType subclasses)
- Update custodian_types_primary slot similarly
- Add migration note for legacy string format ['A'] vs new URI format

Per Rule 9: Enum-to-Class Promotion - Single Source of Truth
2026-01-06 12:37:40 +01:00
kempersc
b34992b1d3 Migrate all 293 class files to ontology-aligned slots
Extends migration to all class types (museums, libraries, galleries, etc.)

New slots added to class_metadata_slots.yaml:
- RiC-O: rico_record_set_type, rico_organizational_principle,
  rico_has_or_had_holder, rico_note
- Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt
- Scope: scope_includes, scope_excludes, custodian_only,
  organizational_level, geographic_restriction
- Notes: privacy_note, preservation_note, legal_note

Migration script now handles 30+ annotation types.
All migrated schemas pass linkml-validate.

Total: 387 class files now use proper slots instead of annotations.
2026-01-06 12:24:54 +01:00
kempersc
aa763dab25 Migrate 94 archive class annotations to ontology-aligned slots
- Add migration script: scripts/migrate_annotations_to_slots.py
- Convert custodian_types, wikidata, skos_broader, specificity_* annotations
- Replace with proper slots mapped to SKOS, PROV-O, RiC-O predicates
- Add ../slots/class_metadata_slots import to all migrated files
- Remove AcademicArchive_refactored.yaml (main file now migrated)
- Sync changes to frontend/public/schemas/

Migration converts:
  - custodian_types → hc:custodianTypes slot
  - wikidata/wikidata_label → wikidata_alignment structured slot
  - skos_broader → skos:broader slot
  - specificity_* → specificity_annotation structured slot
  - dual_class_pattern → dual_class_link structured slot
  - template_specificity → template_specificity slot

All 94 migrated schemas pass linkml-validate.
2026-01-06 11:25:37 +01:00
kempersc
f37f5208ca Copy class metadata slots to frontend public folder for deployment 2026-01-06 11:17:12 +01:00
kempersc
bc562bd68d Add class metadata slots to replace annotations with ontology-aligned predicates
- Add class_metadata_slots.yaml with slots for:
  - GLAMORCUBESFIXPHDNT custodian type classification (hc:custodianTypes)
  - Wikidata alignment (wdt:P31, skos:mappingRelation)
  - SKOS hierarchical relationships (skos:broader, skos:narrower)
  - Dual-class pattern linking (rdfs:seeAlso)
  - Specificity scoring for RAG (prov:generatedAtTime, prov:wasAttributedTo)
  - Collection holdings (rico:isOrWasHolderOf)

- Add AcademicArchive_refactored.yaml demonstrating slot-based approach
- Add migration guide documenting annotation-to-slot mappings

Ontology sources: SKOS, PROV-O, Dublin Core, RiC-O, Wikidata
2026-01-06 11:16:49 +01:00
kempersc
11983014bb Enhance specificity scoring system integration with existing infrastructure
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
2026-01-05 17:37:49 +01:00
kempersc
41d8905661 Fix Turtle parser multi-line string handling for PiCo ontology
- Fixed bug where closing triple-quotes (""") would incorrectly re-trigger
  multi-line string detection, causing subsequent class definitions to be skipped
- Added lineToProcess variable to track which portion of line to process after
  closing a multi-line string, preventing re-detection of opening quotes
- Moved UML large diagram confirmation logic from OntologyViewerPage to
  UMLVisualization component for better encapsulation
- PiCo ontology now correctly shows all 8 classes instead of 2

Deployed and verified on https://bronhouder.nl/ontology?ontology=PiCo
2026-01-05 11:25:43 +01:00
kempersc
242bc8bb35 Add new slots for heritage custodian entities
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
2026-01-05 00:49:05 +01:00
kempersc
89001fbc53 compact header controls on OntologyViewer and QueryBuilder pages 2026-01-04 17:29:34 +01:00
kempersc
eb61f45de2 compact UML controls toolbar to fit single line when sidebar collapsed 2026-01-04 17:21:53 +01:00
kempersc
2dca28d8c1 enrich CH entries with mission statements 2026-01-04 13:12:32 +01:00
kempersc
4f0cafe98a enrich HC profiles 2026-01-02 02:11:04 +01:00
kempersc
349f31ae6f enrich custodian profiles 2026-01-02 02:10:18 +01:00
kempersc
aee76fcc7f backup html content 2025-12-31 02:36:38 +01:00
kempersc
b7701c8a8e backup person profiles 2025-12-31 00:04:09 +01:00
kempersc
7108cb1483 backup person profiles 2025-12-31 00:00:25 +01:00
kempersc
38dcd2ce9c Restore YAML files for Museum Dokkum and Gemeente Smallingerland with enriched data and provenance tracking 2025-12-30 23:58:21 +01:00
kempersc
1d8fd68e3a backup custodian web profiles 2025-12-30 23:53:16 +01:00
kempersc
f6a5962c3b backup person profiles 2025-12-30 23:48:50 +01:00
kempersc
cbf88d2a6d backup person profiles 2025-12-30 23:44:57 +01:00
kempersc
30b701a5ec backup HC data 2025-12-30 23:41:15 +01:00
kempersc
c417d0c758 Refactor code structure for improved readability and maintainability 2025-12-30 23:38:18 +01:00
kempersc
fb0daab718 backup JP profiles 2025-12-30 23:24:30 +01:00
kempersc
b42d6bf5d2 backup CZ and JP 2025-12-30 23:19:38 +01:00
kempersc
45e873ec0a enrich JP BE AR profiles 2025-12-30 23:07:03 +01:00
kempersc
bc6ad46bfa enrich CZ and JP profiles 2025-12-30 23:03:03 +01:00
kempersc
90b402dba6 enrich AR en Czech files 2025-12-30 23:01:01 +01:00
kempersc
f753d7277f Add country code extraction for location validation in Google Places API 2025-12-30 03:45:29 +01:00
kempersc
cefc847056 Remove custodian entry for Leica AG from YAML file 2025-12-30 03:44:25 +01:00
kempersc
9159ff35db Add custodian entry for Leica AG with data contamination fixes and location corrections 2025-12-30 03:43:47 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00