# Session Summary: Strategic Pivot to Top-Down Ontology Design

**Date**: 2025-11-21
**Session Focus**: Name Entity as Central Hub - Foundation Complete
**Status**: ✅ COMPLETE

---

## 🎯 Major Strategic Pivot

### From: Bottom-Up Entity Enrichment (0.20% complete)

**Old Approach**:
- Enrich 2,453 Wikidata entities one-by-one
- Progress: 5/2,453 entries (0.20%)
- Estimated time: 2,400+ sessions at current pace

### To: Top-Down Ontology Design

**New Approach**:
1. Define abstract patterns ONCE (Name, Place, Organization, Collection)
2. Extract unique hypernyms from `hyponyms_curated.yaml` (~20 top-level categories)
3. Map hypernyms to ontology classes
4. Batch convert all 2,453 entities using patterns

**Result**: ~240x efficiency gain (2,400+ sessions → ~10 sessions)

---

## 🏗️ Core Design: Name as Central Hub

### The Insight

**Question**: Is "Mansion House" a place name or an organization name?

**Answer**: **BOTH**. It is a single **nominal reference** that refers to multiple aspects.

### The Solution

A **single Name entity** with **multi-aspect references**:

```
Name (nominal reference)
├─ refers_to_place → Place (spatial aspect)
├─ refers_to_organization → Organization (custodian aspect)
└─ refers_to_collection → Collection (heritage materials aspect)
```

**Each aspect has an independent temporal lifecycle**:
- Place: Construction (1753) → Present (272 years)
- Organization: Founding (1753) → Present (272 years)
- Name: "Mansion House" (1753) → Present (same name for 272 years)
- Alternative scenario: the name changes 5 times while Place/Organization persist

### Ontological Justification

1. **Wikidata Q82799**: "name" = nominal reference (linguistic identifier), NOT the entity itself
2. **SKOS**: Names are modeled as `skos:Concept` with hierarchical structure
3. **CIDOC-CRM E41**: Appellations are distinct from the entities they identify
4. **Temporal Flexibility**: Name changes don't require entity recreation
5. **Multi-Aspect**: A single name can reference multiple aspects simultaneously

---

## 📁 Deliverables - 5 Schema Formats

### 1. LinkML Schema (`01_name_entity.yaml`)

**Purpose**: Machine-readable foundation

**Content**:
- Class: `Name` (1 entity)
- Slots: 24 properties
- Enums: 1 (`NameTypeEnum`)
- **SKOS Alignment**: `skos:Concept`, `skos:prefLabel`, `skos:broader`
- **Multi-Aspect**: `refers_to_place`, `refers_to_organization`, `refers_to_collection`
- **Temporal**: `valid_from`, `valid_to`, `replaces`, `replaced_by`

**Validation**: ✅ PASSED (YAML syntax valid)

**Usage**:
```bash
# Schema artifacts come from the gen-* generators (linkml-convert converts data, not schemas)
# Generate JSON Schema (written to stdout)
gen-json-schema 01_name_entity.yaml

# Generate Python dataclasses
gen-python 01_name_entity.yaml

# Generate SHACL shapes
gen-shacl 01_name_entity.yaml
```

---

### 2. Mermaid Diagram (`01_name_entity_hub.mmd`)

**Purpose**: GitHub-friendly visual documentation

**Content**:
- Class diagram with relationships
- Forward references (Place, Organization, Collection)
- SKOS hierarchical relationships (broader/narrower)
- Temporal name chains (replaces/replaced_by)

**Features**:
- Auto-renders in GitHub
- Embeddable in Markdown docs
- Simple syntax for quick updates

**Rendering**: GitHub renders Mermaid from fenced `mermaid` code blocks; an image link to a `.mmd` file does not render. Embed the diagram source like this:

````markdown
```mermaid
%% paste the contents of uml/mermaid/01_name_entity_hub.mmd here
```
````

---

### 3. PlantUML Diagram (`01_name_entity_hub.puml`)

**Purpose**: Comprehensive UML modeling

**Content**:
- Full UML 2.5 class diagram
- Color-coded by ontology:
  - SKOS (#E1F5FE - light blue)
  - CIDOC-CRM (#FFF3E0 - light orange)
  - CPOV (#F3E5F5 - light purple)
  - Schema.org (#E8F5E9 - light green)
- Extensive notes (500+ words of rationale)
- Method signatures
- Cardinality constraints

**Rendering**:
```bash
# Local PlantUML CLI (writes a PNG next to the source file)
plantuml 01_name_entity_hub.puml

# PlantUML server (-o saves the returned PNG instead of printing binary to the terminal)
curl -X POST --data-binary @01_name_entity_hub.puml https://www.plantuml.com/plantuml/png -o 01_name_entity_hub.png
```

---

### 4. TypeQL Schema (`01_name_entity_hub.tql`)

**Purpose**: TypeDB knowledge graph database

**Content**:
- Entity: `name` (PERA model)
- Relations: 5 types
  - `broader-narrower` (SKOS hierarchy)
  - `name-reference` (multi-aspect connections)
  - `name-succession` (temporal chains)
  - `name-change-event` (provenance)
  - `hypernym-relationship` (taxonomy)
- Attributes: 20+ properties
- **Reasoning Rules**: 3 inference rules
  - Transitive broader/narrower
  - Current name detection
  - Organization inference from place

**Loading**:
```bash
# --script runs Console commands; the .tql may need wrapping in a schema-write transaction
typedb console --script 01_name_entity_hub.tql
```

---

### 5. RDF/OWL Ontology (`01_name_entity_hub.ttl`)

**Purpose**: Semantic Web / Linked Open Data

**Content**:
- OWL Class: `heritage:Name`
- OWL Properties: 5 multi-aspect properties
- **SKOS Integration**: Reuses SKOS vocabulary
- **SHACL Constraints**: Cardinality, datatypes, patterns
- **PROV-O**: `heritage:NameChange` activity
- **Forward References**: Place, Organization, Collection (minimally defined)

**Usage**:
```bash
# Load into GraphDB (RDF4J REST API, repository "heritage")
curl -X POST -H "Content-Type: text/turtle" --data-binary @01_name_entity_hub.ttl http://localhost:7200/repositories/heritage/statements

# Check Turtle syntax with RDFLib (parses the file and prints the triple count)
python -c "from rdflib import Graph; g = Graph(); g.parse('01_name_entity_hub.ttl', format='turtle'); print(len(g))"
```

---

## 🔍 Key Features

### Multi-Aspect Pattern

**Example: Mansion House (Q1786933)**

```yaml
# LinkML Instance
- id: https://w3id.org/heritage/name/Q1786933
  prefLabel: Mansion House
  wikidata_id: Q1786933
  refers_to_place:
    - https://w3id.org/heritage/place/mansion-house-london
  refers_to_organization:
    - https://w3id.org/heritage/org/lord-mayor-residence
  refers_to_collection:
    - https://w3id.org/heritage/collection/mansion-house-art
  broader:
    - https://w3id.org/heritage/name/Q1802963  # mansion concept
```

```turtle
# RDF/Turtle
<https://w3id.org/heritage/name/Q1786933> a heritage:Name , skos:Concept ;
    heritage:wikidataId "Q1786933" ;
    skos:prefLabel "Mansion House"@en ;
    heritage:refersToPlace <https://w3id.org/heritage/place/mansion-house-london> ;
    heritage:refersToOrganization <https://w3id.org/heritage/org/lord-mayor-residence> ;
    heritage:refersToCollection <https://w3id.org/heritage/collection/mansion-house-art> ;
    skos:broader <https://w3id.org/heritage/name/Q1802963> .
```

```typeql
# TypeQL ($place, $org and $coll are assumed to be bound by a preceding match clause)
insert
$mansion-house isa name,
    has name-id "https://w3id.org/heritage/name/Q1786933",
    has wikidata-id "Q1786933",
    has pref-label "Mansion House";
(referencing-name: $mansion-house, referenced-place: $place) isa name-reference;
(referencing-name: $mansion-house, referenced-organization: $org) isa name-reference;
(referencing-name: $mansion-house, referenced-collection: $coll) isa name-reference;
```

### Temporal Name Chains

**Example: Dutch Archive Merger (2001)**

```turtle
# Name 1: Gemeentearchief Haarlem (1910-2001)
a heritage:Name ;
    skos:prefLabel "Gemeentearchief Haarlem"@nl ;
    schema:validFrom "1910-01-01"^^xsd:date ;
    schema:validUntil "2001-01-01"^^xsd:date ;
    heritage:replacedBy .

# Name 2: Noord-Hollands Archief (2001-present)
a heritage:Name ;
    skos:prefLabel "Noord-Hollands Archief"@nl ;
    schema:validFrom "2001-01-01"^^xsd:date ;
    heritage:replaces .

# Change Event
a heritage:NameChange ;
    heritage:oldName ;
    heritage:newName ;
    heritage:changeDate "2001-01-01"^^xsd:date ;
    heritage:changeType "MERGER" .
```

---

## 📊 UML Format Selection

Based on Exa research and industry standards:

| Format | Best For | Pros | Cons | Selected? |
|--------|----------|------|------|-----------|
| **Mermaid** | GitHub docs, quick diagrams | Simple syntax, auto-renders in GitHub, Markdown integration | Limited UML features, basic styling | ✅ YES |
| **PlantUML** | Comprehensive UML, technical docs | Full UML 2.5 support, rich annotations, mature ecosystem | Requires rendering step, verbose syntax | ✅ YES |
| **C4 Model** | System architecture, context diagrams | Software architecture focus, hierarchical levels | Not for data modeling, no class diagrams | ❌ NO (not applicable) |
| **TypeDB TypeQL** | Knowledge graph database | Built-in reasoning, graph queries, ACID transactions | Specialized syntax, requires TypeDB | ✅ YES |
| **ArchiMate** | Enterprise architecture | Business/IT alignment, stakeholder views | Heavyweight, not for data modeling | ❌ NO |

**Decision**: Use **Mermaid** (quick docs) + **PlantUML** (detailed UML) + **TypeQL** (executable schema)

---

## 🔄 Workflow Comparison

### Old Workflow (Bottom-Up)

```
For each of 2,453 entities:
  1. Read Wikidata metadata
  2. Analyze hypernyms
  3. Search DBpedia mappings
  4. Design multi-aspect model
  5. Write YAML ontology mapping
  6. Validate

Estimated time: 2,400+ sessions (~1 entity per session; 20 min/entity × 2,453 entities ≈ 820 hours)
```

### New Workflow (Top-Down)

```
Phase 1: Design Core Patterns (1-2 sessions) ✅ COMPLETE
  - Define Name entity
  - Define multi-aspect pattern
  - Create 5 schema formats

Phase 2: Extract Hypernym Taxonomy (1 session) ⏳ NEXT
  - Parse hyponyms_curated.yaml
  - Extract unique hypernyms (~20 categories)
  - Create HypernymConcept entities

Phase 3: Map Hypernyms to Ontology (1-2 sessions)
  - building → crm:E27_Site
  - organisation → cpov:PublicOrganisation
  - museum → schema:Museum + dbo:Museum
  - etc.

Phase 4: Define Entity Modules (3-4 sessions)
  - Place entity module
  - Organization entity module
  - Collection entity module

Phase 5: Batch Convert (1 session)
  - Script: convert_wikidata_to_names.py
  - Process all 2,453 entities automatically
  - Output: LinkML instances

Total estimated time: 7-10 sessions (vs. 2,400+ sessions)
Efficiency gain: ~240x faster (2,400 / 10 sessions)
```

---

## 📚 Documentation Created

1. **README.md** (5,000+ words)
   - Design rationale
   - Ontological justification
   - Implementation patterns
   - Temporal modeling examples
   - Next steps roadmap
2. **LinkML Schema** (400 lines)
   - Class + 24 slots
   - SKOS alignment
   - Multi-aspect properties
   - Temporal validity
   - Provenance tracking
3. **Mermaid Diagram** (70 lines)
   - Class diagram
   - Relationships
   - Notes
4. **PlantUML Diagram** (250+ lines)
   - Detailed UML
   - Color-coded ontologies
   - Extensive annotations
   - Design rationale notes
5. **TypeQL Schema** (300+ lines)
   - PERA model entities
   - 5 relation types
   - 20+ attributes
   - 3 reasoning rules
6. **RDF/OWL Ontology** (400+ lines)
   - OWL classes
   - Object properties
   - Datatype properties
   - SHACL constraints
   - PROV-O integration

**Total Documentation**: ~1,500 lines of schema + 5,000 words of explanation

---

## 🎓 Key Design Decisions

### Decision 1: Single Name Entity (Not Split)

**Rejected Approach**: Separate `PlaceName` and `OrganizationName` classes

**Rationale**:
- Many names refer to BOTH a place AND an organization
- Splitting creates ambiguity and duplication
- Conflicts with Wikidata Q82799 (a name is a nominal reference, not typed)
- Harder to track name changes (which entity gets the new name?)
**Chosen Approach**: Single `Name` class with multi-aspect references

---

### Decision 2: SKOS as Primary Alignment

**Options Considered**:
- `crm:E41_Appellation` (CIDOC-CRM)
- `schema:name` (property, not class)
- `owl:Thing` (too generic)
- `skos:Concept` ← **CHOSEN**

**Rationale**:
- SKOS provides hierarchical structure (broader/narrower)
- Multilingual support (prefLabel, altLabel with language tags)
- Temporal validity can be added via Schema.org properties (SKOS itself has none)
- Cross-vocabulary mapping (exactMatch, closeMatch)
- Heritage domain standard (used in museum/library thesauri)

---

### Decision 3: Multi-Aspect via Properties (Not Inheritance)

**Rejected Approach**: Subclass Name into `PlaceName`, `OrganizationName`, etc.

**Rationale**:
- OOP inheritance forces single-type classification
- In the real world, names reference multiple aspects simultaneously
- Subclassing creates redundancy (the same name is duplicated in multiple classes)

**Chosen Approach**: Single `Name` class with aspect reference properties

```yaml
slots:  # LinkML slot definitions; each is optional and repeatable (0 or more)
  refers_to_place: {range: Place, multivalued: true}
  refers_to_organization: {range: Organization, multivalued: true}
  refers_to_collection: {range: Collection, multivalued: true}
```

---

### Decision 4: Temporal Independence

**Principle**: Name, Place, Organization, and Collection have **independent lifespans**

**Example** (illustrative name history):
- Place (building): 1753 → Present (272 years)
- Organization (custodian): 1753 → Present (272 years)
- Name #1: 1753 → 1850 (97 years) "Mansion House"
- Name #2: 1850 → 2001 (151 years) "The Mansion House"
- Name #3: 2001 → Present (24 years) "Lord Mayor's Official Residence"

**Implementation**:
- Each entity tracks its own `valid_from` / `valid_to`
- Name changes are linked via the `replaces` / `replaced_by` properties
- The Organization persists across name changes (same entity ID)

---

## 🚀 Impact & Benefits

### Immediate Benefits

1. **Clarity**: Clear separation between linguistic identifiers and entities
2. **Flexibility**: Multi-aspect modeling handles complex real-world cases
3. **Consistency**: Single pattern applied to all 2,453 entities
4. **Interoperability**: 5 schema formats ensure tool compatibility

### Medium-Term Benefits

5. **Efficiency**: Batch conversion is ~240x faster than one-by-one enrichment
6. **Scalability**: Pattern-based approach extends easily to new hypernyms
7. **Reasoning**: TypeDB rules infer relationships automatically
8. **Linked Data**: RDF export enables SPARQL queries and federated search

### Long-Term Benefits

9. **Maintenance**: Schema changes propagate to all instances via patterns
10. **Evolution**: Ontology can expand without breaking existing data
11. **Community**: Standard formats enable external contributions
12. **Research**: Knowledge graph enables novel heritage research queries

---

## 📋 Next Steps

### Immediate (Session 3) - **TOP PRIORITY**

**Task 1**: Extract Hypernym Taxonomy from `hyponyms_curated.yaml`

**Script**: `scripts/extract_hypernyms_taxonomy.py`

**Process**:
1. Parse `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml`
2. Extract unique values from the `hypernym:` field
3. Count the frequency of each hypernym
4. Create `data/ontology/hypernym_taxonomy.yaml` with:
   ```yaml
   - hypernym: building
     count: 417
     wikidata_id: Q41176
     dbpedia_class: dbo:Building
   - hypernym: organisation
     count: 193
     wikidata_id: Q43229
     dbpedia_class: dbo:Organisation
   ```

**Expected Output**:
- ~20-30 unique hypernyms
- Frequency distribution (most common: building, organisation, museum)
- Foundation for ontology class mapping

---

### Medium-Term (This Week)

**Task 2**: Map Hypernyms to Ontology Classes

**Module**: `schemas/20251121/linkml/02_hypernym_taxonomy.yaml`

**Content**:
- `HypernymConcept` class definitions
- Ontology mappings for each hypernym:
  - building → `crm:E27_Site` + `dbo:Building`
  - organisation → `cpov:PublicOrganisation` + `schema:Organization`
  - museum → `schema:Museum` + `dbo:Museum`
  - archive → `rico:CorporateBody` + `dbo:Archive`

**Task 3**: Create Place, Organization, Collection Entity Modules

**Modules**:
- `03_place_entity.yaml` (spatial aspect)
- `04_organization_entity.yaml` (custodian aspect)
- `05_collection_entity.yaml` (heritage materials aspect)

**Each module includes**:
- LinkML schema
- Mermaid diagram
- PlantUML diagram
- TypeQL schema
- RDF/OWL ontology

---

### Long-Term (Next Month)

**Task 4**: Batch Convert Wikidata Entities

**Script**: `scripts/convert_wikidata_to_names.py`

**Input**: `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml` (2,453 entities)

**Output**: `data/instances/names/*.yaml` (LinkML instances, 1 per entity)

**Process**:
- For each Wikidata entity:
  - Extract label → `prefLabel`
  - Extract aliases → `altLabel`
  - Extract hypernym → link to HypernymConcept
  - Generate ID → `https://w3id.org/heritage/name/Q[NUMBER]`
  - Add provenance → `source`, `created`, `wikidata_id`

**Task 5**: Load into TypeDB Knowledge Graph

**Commands**:
```bash
# Start TypeDB (runs in the foreground; use a separate terminal for the next steps)
typedb server

# Load schema (--script runs Console commands; the .tql may need wrapping in a schema-write transaction)
typedb console --script schemas/20251121/typeql/01_name_entity_hub.tql

# Load instances
python scripts/load_instances_to_typedb.py
```

**Task 6**: Export to RDF Triple Store

**Process**:
- Convert LinkML instances to RDF/Turtle
- Load into GraphDB / Virtuoso / Blazegraph
- Create SPARQL endpoint
- Publish as Linked Open Data

---

## ✅ Session Completion Checklist

- [x] Research UML formats (Mermaid, PlantUML, C4, TypeDB)
- [x] Design Name entity as central hub
- [x] Create LinkML schema (`01_name_entity.yaml`)
- [x] Create Mermaid diagram (`01_name_entity_hub.mmd`)
- [x] Create PlantUML diagram (`01_name_entity_hub.puml`)
- [x] Create TypeQL schema (`01_name_entity_hub.tql`)
- [x] Create RDF/OWL ontology (`01_name_entity_hub.ttl`)
- [x] Validate LinkML schema (YAML syntax)
- [x] Document design rationale (README.md, 5,000+ words)
- [x] Define multi-aspect pattern
- [x] Define temporal name chains
- [x] Document next steps (hypernym extraction)
- [ ] ⏳ Extract hypernym taxonomy (next session)
- [ ] ⏳ Map hypernyms to ontology classes

---

## 📊 Progress Metrics

### Overall Project Progress

| Metric | Count | Status |
|--------|-------|--------|
| **Wikidata Entities** | 2,453 | Pending batch conversion |
| **Name Entity Schema** | 1 module | ✅ COMPLETE |
| **Schema Formats** | 5 (LinkML, Mermaid, PlantUML, TypeQL, RDF) | ✅ COMPLETE |
| **Classes Defined** | 1 (Name) | ✅ COMPLETE |
| **Properties Defined** | 24 slots | ✅ COMPLETE |
| **Reasoning Rules** | 3 (TypeQL) | ✅ COMPLETE |
| **Documentation** | 6,500+ words | ✅ COMPLETE |

### Efficiency Gain

- **Old Approach**: 2,400+ sessions (5 entities done, 2,448 remaining)
- **New Approach**: ~10 sessions (foundation + hypernym mapping + entity modules + batch conversion)
- **Efficiency Gain**: **~240x faster** 🚀

---

## 📚 References

### Standards

- [SKOS Reference](https://www.w3.org/TR/skos-reference/)
- [CIDOC-CRM v7.1.3](http://www.cidoc-crm.org/)
- [CPOV](https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/core-public-organisation-vocabulary)
- [Schema.org](https://schema.org/)
- [PROV-O](https://www.w3.org/TR/prov-o/)
- [Wikidata Q82799](https://www.wikidata.org/wiki/Q82799)

### Tools

- [LinkML](https://linkml.io/linkml/)
- [Mermaid](https://mermaid.js.org/)
- [PlantUML](https://plantuml.com/)
- [TypeDB](https://typedb.com/)
- [GraphDB](https://graphdb.ontotext.com/)

### Project Files

- Schema Dir: `/schemas/20251121/`
- LinkML: `linkml/01_name_entity.yaml`
- Mermaid: `uml/mermaid/01_name_entity_hub.mmd`
- PlantUML: `uml/plantuml/01_name_entity_hub.puml`
- TypeQL: `typeql/01_name_entity_hub.tql`
- RDF/OWL: `rdf/01_name_entity_hub.ttl`
- README: `README.md`

---

**Session Status**: ✅ COMPLETE
**Next Session Focus**: Extract hypernym taxonomy + map to ontology classes
**Overall Strategy**: Top-down ontology design (~240x more efficient)