Session Summary: Strategic Pivot to Top-Down Ontology Design
Date: 2025-11-21
Session Focus: Name Entity as Central Hub - Foundation Complete
Status: ✅ COMPLETE
🎯 Major Strategic Pivot
From: Bottom-Up Entity Enrichment (0.20% complete)
Old Approach:
- Enrich 2,453 Wikidata entities one-by-one
- Progress: 5/2,453 entries (0.20%)
- Estimated time: 2,400+ sessions at current pace
To: Top-Down Ontology Design
New Approach:
- Define abstract patterns ONCE (Name, Place, Organization, Collection)
- Extract unique hypernyms from hyponyms_curated.yaml (~20 top-level categories)
- Map hypernyms to ontology classes
- Batch convert all 2,453 entities using patterns
Result: ~240x efficiency gain (2,400+ sessions reduced to roughly 10)
🏗️ Core Design: Name as Central Hub
The Insight
Question: Is "Mansion House" a place name or an organization name?
Answer: BOTH - it's a single nominal reference that refers to multiple aspects.
The Solution
Single Name Entity with multi-aspect references:
Name (nominal reference)
├─ refers_to_place → Place (spatial aspect)
├─ refers_to_organization → Organization (custodian aspect)
└─ refers_to_collection → Collection (heritage materials aspect)
Each aspect has independent temporal lifecycle:
- Place: Construction (1753) → Present (272 years)
- Organization: Founding (1753) → Present (272 years)
- Name: "Mansion House" (1753) → Present (same name for 272 years)
- Alternative scenario: Name changes 5 times while Place/Organization persist
Ontological Justification
- Wikidata Q82799: "name" = nominal reference (linguistic identifier), NOT the entity itself
- SKOS: Names are skos:Concept with hierarchical structure
- CIDOC-CRM E41: Appellations are distinct from entities they identify
- Temporal Flexibility: Name changes don't require entity recreation
- Multi-Aspect: Single name can reference multiple aspects simultaneously
📁 Deliverables - 5 Schema Formats
1. LinkML Schema (01_name_entity.yaml)
Purpose: Machine-readable foundation
Content:
- Class: Name (1 entity)
- Slots: 24 properties
- Enums: 1 (NameTypeEnum)
- SKOS Alignment: skos:Concept, skos:prefLabel, skos:broader
- Multi-Aspect: refers_to_place, refers_to_organization, refers_to_collection
- Temporal: valid_from, valid_to, replaces, replaced_by
Validation: ✅ PASSED (YAML syntax valid)
Usage:
# Generate JSON Schema (schema generators are the gen-* commands; linkml-convert is for instance data)
gen-json-schema 01_name_entity.yaml
# Generate Python dataclasses
gen-python 01_name_entity.yaml
# Generate SHACL shapes
gen-shacl 01_name_entity.yaml
2. Mermaid Diagram (01_name_entity_hub.mmd)
Purpose: GitHub-friendly visual documentation
Content:
- Class diagram with relationships
- Forward references (Place, Organization, Collection)
- SKOS hierarchical relationships (broader/narrower)
- Temporal name chains (replaces/replaced_by)
Features:
- Auto-renders in GitHub
- Embeddable in Markdown docs
- Simple syntax for quick updates
Rendering:

3. PlantUML Diagram (01_name_entity_hub.puml)
Purpose: Comprehensive UML modeling
Content:
- Full UML 2.5 class diagram
- Color-coded by ontology:
  - SKOS (#E1F5FE - light blue)
  - CIDOC-CRM (#FFF3E0 - light orange)
  - CPOV (#F3E5F5 - light purple)
  - Schema.org (#E8F5E9 - light green)
- Extensive notes (500+ words of rationale)
- Method signatures
- Cardinality constraints
Rendering:
# Local PlantUML CLI
plantuml 01_name_entity_hub.puml
# PlantUML server
curl -X POST --data-binary @01_name_entity_hub.puml https://www.plantuml.com/plantuml/png
4. TypeQL Schema (01_name_entity_hub.tql)
Purpose: TypeDB knowledge graph database
Content:
- Entity: name (PERA model)
- Relations: 5 types
  - broader-narrower (SKOS hierarchy)
  - name-reference (multi-aspect connections)
  - name-succession (temporal chains)
  - name-change-event (provenance)
  - hypernym-relationship (taxonomy)
- Attributes: 20+ properties
- Reasoning Rules: 3 inference rules
  - Transitive broader/narrower
  - Current name detection
  - Organization inference from place
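The first rule, transitive broader/narrower, amounts to a reachability computation over the SKOS hierarchy. The Python sketch below illustrates that idea only; it is not the TypeQL rule from the schema, and the `BROADER` mapping and `broader_closure` function are hypothetical names with made-up sample data.

```python
from typing import Dict, Set


def broader_closure(concept: str, broader: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor reachable through one or more broader links."""
    ancestors: Set[str] = set()
    stack = list(broader.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in ancestors:  # skip concepts already visited (guards against cycles)
            ancestors.add(parent)
            stack.extend(broader.get(parent, ()))
    return ancestors


# Hypothetical hierarchy for illustration only: concept -> directly broader concepts.
BROADER: Dict[str, Set[str]] = {
    "Mansion House": {"mansion"},
    "mansion": {"building"},
}

print(sorted(broader_closure("Mansion House", BROADER)))
```

A direct `broader` link only yields "mansion"; the closure also reaches "building", which is what a transitive inference rule would add.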
Loading:
typedb console --script 01_name_entity_hub.tql
5. RDF/OWL Ontology (01_name_entity_hub.ttl)
Purpose: Semantic Web / Linked Open Data
Content:
- OWL Class: heritage:Name
- OWL Properties: 5 multi-aspect properties
- SKOS Integration: Reuses SKOS vocabulary
- SHACL Constraints: Cardinality, datatypes, patterns
- PROV-O: heritage:NameChange activity
- Forward References: Place, Organization, Collection (minimally defined)
Usage:
# Load into GraphDB
curl -X POST -H "Content-Type: text/turtle" --data-binary @01_name_entity_hub.ttl http://localhost:7200/repositories/heritage/statements
# Validate with RDFLib
python -c "from rdflib import Graph; g = Graph(); g.parse('01_name_entity_hub.ttl'); print(len(g))"
🔍 Key Features
Multi-Aspect Pattern
Example: Mansion House (Q1786933)
# LinkML Instance
- id: https://w3id.org/heritage/name/Q1786933
  prefLabel: Mansion House
  wikidata_id: Q1786933
  refers_to_place:
    - https://w3id.org/heritage/place/mansion-house-london
  refers_to_organization:
    - https://w3id.org/heritage/org/lord-mayor-residence
  refers_to_collection:
    - https://w3id.org/heritage/collection/mansion-house-art
  broader:
    - https://w3id.org/heritage/name/Q1802963  # mansion concept
# RDF/Turtle
<https://w3id.org/heritage/name/Q1786933> a heritage:Name , skos:Concept ;
heritage:wikidataId "Q1786933" ;
skos:prefLabel "Mansion House"@en ;
heritage:refersToPlace <https://w3id.org/heritage/place/mansion-house-london> ;
heritage:refersToOrganization <https://w3id.org/heritage/org/lord-mayor-residence> ;
heritage:refersToCollection <https://w3id.org/heritage/collection/mansion-house-art> ;
skos:broader <https://w3id.org/heritage/name/Q1802963> .
# TypeQL
$mansion-house isa name,
has name-id "https://w3id.org/heritage/name/Q1786933",
has wikidata-id "Q1786933",
has pref-label "Mansion House";
(referencing-name: $mansion-house, referenced-place: $place) isa name-reference;
(referencing-name: $mansion-house, referenced-organization: $org) isa name-reference;
(referencing-name: $mansion-house, referenced-collection: $coll) isa name-reference;
Temporal Name Chains
Example: Dutch Archive Merger (2001)
# Name 1: Gemeentearchief Haarlem (1910-2001)
<https://w3id.org/heritage/name/gemeentearchief-haarlem> a heritage:Name ;
skos:prefLabel "Gemeentearchief Haarlem"@nl ;
schema:validFrom "1910-01-01"^^xsd:date ;
schema:validUntil "2001-01-01"^^xsd:date ;
heritage:replacedBy <https://w3id.org/heritage/name/noord-hollands-archief> .
# Name 2: Noord-Hollands Archief (2001-present)
<https://w3id.org/heritage/name/noord-hollands-archief> a heritage:Name ;
skos:prefLabel "Noord-Hollands Archief"@nl ;
schema:validFrom "2001-01-01"^^xsd:date ;
heritage:replaces <https://w3id.org/heritage/name/gemeentearchief-haarlem> .
# Change Event
<https://w3id.org/heritage/event/nha-merger-2001> a heritage:NameChange ;
heritage:oldName <https://w3id.org/heritage/name/gemeentearchief-haarlem> ;
heritage:newName <https://w3id.org/heritage/name/noord-hollands-archief> ;
heritage:changeDate "2001-01-01"^^xsd:date ;
heritage:changeType "MERGER" .
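The replaces/replaced_by links form a chain that can be walked to find the name currently in use. The following is a minimal Python sketch of that walk, assuming the links have already been loaded into a plain dictionary; `REPLACED_BY` and `current_name` are hypothetical names, not part of the schema files.

```python
from typing import Dict, Optional

# Hypothetical in-memory index: name IRI -> IRI of its successor (None if still current).
REPLACED_BY: Dict[str, Optional[str]] = {
    "https://w3id.org/heritage/name/gemeentearchief-haarlem":
        "https://w3id.org/heritage/name/noord-hollands-archief",
    "https://w3id.org/heritage/name/noord-hollands-archief": None,
}


def current_name(name_iri: str, replaced_by: Dict[str, Optional[str]]) -> str:
    """Follow replaced_by links until reaching a name that has no successor."""
    seen = set()
    while True:
        if name_iri in seen:
            raise ValueError(f"cycle in name chain at {name_iri}")
        seen.add(name_iri)
        successor = replaced_by.get(name_iri)
        if successor is None:
            return name_iri
        name_iri = successor


print(current_name("https://w3id.org/heritage/name/gemeentearchief-haarlem", REPLACED_BY))
```

The cycle check is a defensive assumption: a malformed chain would otherwise loop forever.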
📊 UML Format Selection
Based on Exa research and industry standards:
| Format | Best For | Pros | Cons | Selected? |
|---|---|---|---|---|
| Mermaid | GitHub docs, quick diagrams | Simple syntax, auto-renders in GitHub, Markdown integration | Limited UML features, basic styling | ✅ YES |
| PlantUML | Comprehensive UML, technical docs | Full UML 2.5 support, rich annotations, mature ecosystem | Requires rendering step, verbose syntax | ✅ YES |
| C4 Model | System architecture, context diagrams | Software architecture focus, hierarchical levels | Not for data modeling, no class diagrams | ❌ NO (not applicable) |
| TypeDB TypeQL | Knowledge graph database | Built-in reasoning, graph queries, ACID transactions | Specialized syntax, requires TypeDB | ✅ YES |
| Archimate | Enterprise architecture | Business/IT alignment, stakeholder views | Heavyweight, not for data modeling | ❌ NO |
Decision: Use Mermaid (quick docs) + PlantUML (detailed UML) + TypeQL (executable schema)
🔄 Workflow Comparison
Old Workflow (Bottom-Up)
For each of 2,453 entities:
1. Read Wikidata metadata
2. Analyze hypernyms
3. Search DBpedia mappings
4. Design multi-aspect model
5. Write YAML ontology mapping
6. Validate
Estimated time: 2,400+ sessions (20 min/entity × 2,453 entities)
New Workflow (Top-Down)
Phase 1: Design Core Patterns (1-2 sessions) ✅ COMPLETE
- Define Name entity
- Define multi-aspect pattern
- Create 5 schema formats
Phase 2: Extract Hypernym Taxonomy (1 session) ⏳ NEXT
- Parse hyponyms_curated.yaml
- Extract unique hypernyms (~20 categories)
- Create HypernymConcept entities
Phase 3: Map Hypernyms to Ontology (1-2 sessions)
- building → crm:E27_Site
- organisation → cpov:PublicOrganisation
- museum → schema:Museum + dbo:Museum
- etc.
Phase 4: Define Entity Modules (3-4 sessions)
- Place entity module
- Organization entity module
- Collection entity module
Phase 5: Batch Convert (1 session)
- Script: convert_wikidata_to_names.py
- Process all 2,453 entities automatically
- Output: LinkML instances
Total estimated time: 7-10 sessions (vs. 2,400+ sessions)
Efficiency gain: ~240x faster
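The ~240x figure follows from the session estimates above when the upper bound of 10 sessions is used. A quick check in Python, assuming a round 2,400 sessions for the old workflow (the variable names are illustrative):

```python
OLD_SESSIONS = 2400                            # "2,400+ sessions" for the bottom-up workflow
NEW_SESSIONS_LOW, NEW_SESSIONS_HIGH = 7, 10    # "7-10 sessions" for the top-down workflow

gain_conservative = OLD_SESSIONS / NEW_SESSIONS_HIGH  # uses the 10-session upper bound
gain_optimistic = OLD_SESSIONS / NEW_SESSIONS_LOW     # uses the 7-session lower bound

print(f"{gain_conservative:.0f}x to {gain_optimistic:.0f}x")
```

The quoted ~240x is therefore the conservative end of the range.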
📚 Documentation Created
- README.md (5,000+ words)
  - Design rationale
  - Ontological justification
  - Implementation patterns
  - Temporal modeling examples
  - Next steps roadmap
- LinkML Schema (400 lines)
  - Class + 24 slots
  - SKOS alignment
  - Multi-aspect properties
  - Temporal validity
  - Provenance tracking
- Mermaid Diagram (70 lines)
  - Class diagram
  - Relationships
  - Notes
- PlantUML Diagram (250+ lines)
  - Detailed UML
  - Color-coded ontologies
  - Extensive annotations
  - Design rationale notes
- TypeQL Schema (300+ lines)
  - PERA model entities
  - 5 relation types
  - 20+ attributes
  - 3 reasoning rules
- RDF/OWL Ontology (400+ lines)
  - OWL classes
  - Object properties
  - Datatype properties
  - SHACL constraints
  - PROV-O integration
Total Documentation: ~1,500 lines of schema + 5,000 words of explanation
🎓 Key Design Decisions
Decision 1: Single Name Entity (Not Split)
Rejected Approach: Separate PlaceName and OrganizationName classes
Rationale:
- Many names refer to BOTH place AND organization
- Splitting creates ambiguity and duplication
- Violates Wikidata Q82799 (name is a nominal reference, not typed)
- Harder to track name changes (which entity gets the new name?)
Chosen Approach: Single Name class with multi-aspect references
Decision 2: SKOS as Primary Alignment
Options Considered:
- crm:E41_Appellation (CIDOC-CRM)
- schema:name (property, not class)
- owl:Thing (too generic)
- skos:Concept ← CHOSEN
Rationale:
- SKOS provides hierarchical structure (broader/narrower)
- Multilingual support (prefLabel, altLabel with language tags)
- Temporal validity (via Schema.org properties)
- Cross-vocabulary mapping (exactMatch, closeMatch)
- Heritage domain standard (used in museum/library thesauri)
Decision 3: Multi-Aspect via Properties (Not Inheritance)
Rejected Approach: Subclass Name into PlaceName, OrganizationName, etc.
Rationale:
- OOP inheritance forces single-type classification
- Real-world: names simultaneously reference multiple aspects
- Subclassing creates redundancy (same name duplicated in multiple classes)
Chosen Approach: Single Name class with aspect reference properties
refers_to_place: Place[] # 0 or more places
refers_to_organization: Organization[] # 0 or more organizations
refers_to_collection: Collection[] # 0 or more collections
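The aspect-reference properties above could be modeled in Python roughly as follows. This is a minimal sketch using a hand-written dataclass as a stand-in, not the code that the LinkML generator would emit; the field names mirror the slots, and the `aspects()` helper is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Name:
    """Illustrative stand-in for the Name class; not the generated LinkML code."""
    id: str
    pref_label: str
    refers_to_place: List[str] = field(default_factory=list)         # 0 or more place IRIs
    refers_to_organization: List[str] = field(default_factory=list)  # 0 or more organization IRIs
    refers_to_collection: List[str] = field(default_factory=list)    # 0 or more collection IRIs

    def aspects(self) -> List[str]:
        """Hypothetical helper: list the aspects this name references."""
        found = []
        if self.refers_to_place:
            found.append("place")
        if self.refers_to_organization:
            found.append("organization")
        if self.refers_to_collection:
            found.append("collection")
        return found


mansion_house = Name(
    id="https://w3id.org/heritage/name/Q1786933",
    pref_label="Mansion House",
    refers_to_place=["https://w3id.org/heritage/place/mansion-house-london"],
    refers_to_organization=["https://w3id.org/heritage/org/lord-mayor-residence"],
)
print(mansion_house.aspects())
```

Because every aspect is an optional list on one class, a single Name instance can reference a place and an organization at once without being duplicated across subclasses.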
Decision 4: Temporal Independence
Principle: Name, Place, Organization, Collection have independent lifespans
Example:
- Place (building): 1753 → Present (272 years)
- Organization (custodian): 1753 → Present (272 years)
- Name #1: 1753 → 1850 (97 years) "Mansion House"
- Name #2: 1850 → 2001 (151 years) "The Mansion House"
- Name #3: 2001 → Present (24 years) "Lord Mayor's Official Residence"
Implementation:
- Each entity tracks its own valid_from/valid_to
- Name changes via replaces/replaced_by properties
- Organization persists across name changes (same entity ID)
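Given per-name validity periods, looking up which name applied in a given year is a simple interval check. The Python sketch below uses the three illustrative name periods from the example; it assumes half-open intervals (valid_from inclusive, valid_to exclusive), which the source does not specify, and `NAME_PERIODS` and `name_in_year` are hypothetical names.

```python
from typing import List, Optional, Tuple

# (label, valid_from, valid_to); valid_to=None means the name is still in use.
NAME_PERIODS: List[Tuple[str, int, Optional[int]]] = [
    ("Mansion House", 1753, 1850),
    ("The Mansion House", 1850, 2001),
    ("Lord Mayor's Official Residence", 2001, None),
]


def name_in_year(year: int, periods: List[Tuple[str, int, Optional[int]]]) -> Optional[str]:
    """Return the label whose interval [valid_from, valid_to) contains the year."""
    for label, start, end in periods:
        if start <= year and (end is None or year < end):
            return label
    return None  # no name recorded for that year


print(name_in_year(1900, NAME_PERIODS))
```

The place and organization records are untouched by this lookup, which reflects the principle that each aspect keeps its own lifespan.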
🚀 Impact & Benefits
Immediate Benefits
- Clarity: Clear separation between linguistic identifiers and entities
- Flexibility: Multi-aspect modeling handles complex real-world cases
- Consistency: Single pattern applied to all 2,453 entities
- Interoperability: 5 schema formats ensure tool compatibility
Medium-Term Benefits
- Efficiency: Batch conversion ~240x faster than one-by-one enrichment
- Scalability: Pattern-based approach extends to new hypernyms easily
- Reasoning: TypeDB rules infer relationships automatically
- Linked Data: RDF export enables SPARQL queries, federated search
Long-Term Benefits
- Maintenance: Schema changes propagate to all instances via patterns
- Evolution: Ontology can expand without breaking existing data
- Community: Standard formats enable external contributions
- Research: Knowledge graph enables novel heritage research queries
📋 Next Steps
Immediate (Session 3) - TOP PRIORITY
Task: Extract Hypernym Taxonomy from hyponyms_curated.yaml
Script: scripts/extract_hypernyms_taxonomy.py
Process:
- Parse data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml
- Extract unique values from the hypernym: field
- Count frequency of each hypernym
- Create data/ontology/hypernym_taxonomy.yaml with:
  - hypernym: building
    count: 417
    wikidata_id: Q41176
    dbpedia_class: dbo:Building
  - hypernym: organisation
    count: 193
    wikidata_id: Q43229
    dbpedia_class: dbo:Organisation
Expected Output:
- ~20-30 unique hypernyms
- Frequency distribution (most common: building, organisation, museum)
- Foundation for ontology class mapping
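The counting step of this process might look like the sketch below. It assumes the YAML has already been parsed into a list of dictionaries that each carry a `hypernym` key; the actual structure of hyponyms_curated.yaml is not shown in this summary, so `build_taxonomy` and the sample entries are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_taxonomy(entries: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Count hypernym values and return rows sorted by descending frequency."""
    counts = Counter(e["hypernym"] for e in entries if e.get("hypernym"))
    return [{"hypernym": h, "count": n} for h, n in counts.most_common()]


# Hypothetical sample standing in for the parsed YAML entries.
sample = [
    {"label": "Mansion House", "hypernym": "building"},
    {"label": "Noord-Hollands Archief", "hypernym": "organisation"},
    {"label": "Town Hall", "hypernym": "building"},
]

for row in build_taxonomy(sample):
    print(row)
```

The wikidata_id and dbpedia_class fields shown in the target file would need a separate lookup and are left out here.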
Medium-Term (This Week)
Task 2: Map Hypernyms to Ontology Classes
Module: schemas/20251121/linkml/02_hypernym_taxonomy.yaml
Content:
- HypernymConcept class definitions
- Ontology mappings for each hypernym:
  - building → crm:E27_Site + dbo:Building
  - organisation → cpov:PublicOrganisation + schema:Organization
  - museum → schema:Museum + dbo:Museum
  - archive → rico:CorporateBody + dbo:Archive
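For the batch conversion, these mappings could be held in a simple lookup table. The sketch below copies the four mappings listed above; the dictionary layout, the `classes_for` helper, and the choice to return an empty list for unmapped hypernyms are assumptions for illustration.

```python
from typing import Dict, List

# Mappings copied from the list above; the dict layout itself is an assumption.
HYPERNYM_TO_CLASSES: Dict[str, List[str]] = {
    "building": ["crm:E27_Site", "dbo:Building"],
    "organisation": ["cpov:PublicOrganisation", "schema:Organization"],
    "museum": ["schema:Museum", "dbo:Museum"],
    "archive": ["rico:CorporateBody", "dbo:Archive"],
}


def classes_for(hypernym: str) -> List[str]:
    """Return the mapped ontology classes, or an empty list for an unmapped hypernym."""
    return HYPERNYM_TO_CLASSES.get(hypernym.strip().lower(), [])


print(classes_for("museum"))
```

Returning an empty list rather than raising keeps a batch run going while unmapped hypernyms are collected for review.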
Task 3: Create Place, Organization, Collection Entity Modules
Modules:
- 03_place_entity.yaml (spatial aspect)
- 04_organization_entity.yaml (custodian aspect)
- 05_collection_entity.yaml (heritage materials aspect)
Each module includes:
- LinkML schema
- Mermaid diagram
- PlantUML diagram
- TypeQL schema
- RDF/OWL ontology
Long-Term (Next Month)
Task 4: Batch Convert Wikidata Entities
Script: scripts/convert_wikidata_to_names.py
Input: data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml (2,453 entities)
Output: data/instances/names/*.yaml (LinkML instances, 1 per entity)
Process:
- For each Wikidata entity:
  - Extract label → prefLabel
  - Extract aliases → altLabel
  - Extract hypernym → link to HypernymConcept
  - Generate ID → https://w3id.org/heritage/name/Q[NUMBER]
  - Add provenance → source, created, wikidata_id
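The per-entity steps above can be sketched as a single conversion function. This is an illustration, not the planned convert_wikidata_to_names.py: the input keys (`wikidata_id`, `label`, `aliases`, `hypernym`), the `"wikidata"` source value, and the function name are all assumptions, since the input file layout is not shown here.

```python
from typing import Any, Dict

NAME_BASE = "https://w3id.org/heritage/name/"


def to_name_instance(entity: Dict[str, Any], created: str) -> Dict[str, Any]:
    """Map one Wikidata entity record (assumed keys) to a Name instance dict."""
    qid = entity["wikidata_id"]
    return {
        "id": f"{NAME_BASE}{qid}",                    # Generate ID
        "prefLabel": entity["label"],                 # label -> prefLabel
        "altLabel": list(entity.get("aliases", [])),  # aliases -> altLabel
        "hypernym": entity.get("hypernym"),           # link target for HypernymConcept
        "wikidata_id": qid,                           # provenance
        "source": "wikidata",                         # provenance (assumed value)
        "created": created,                           # provenance
    }


example = to_name_instance(
    {"wikidata_id": "Q1786933", "label": "Mansion House", "hypernym": "building"},
    created="2025-11-21",
)
print(example["id"])
```

Passing `created` in explicitly keeps the function deterministic, which makes a batch run over all 2,453 entities easier to test.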
Task 5: Load into TypeDB Knowledge Graph
Commands:
# Start TypeDB
typedb server
# Load schema
typedb console --script schemas/20251121/typeql/01_name_entity_hub.tql
# Load instances
python scripts/load_instances_to_typedb.py
Task 6: Export to RDF Triple Store
Process:
- Convert LinkML instances to RDF/Turtle
- Load into GraphDB / Virtuoso / Blazegraph
- Create SPARQL endpoint
- Publish as Linked Open Data
✅ Session Completion Checklist
- Research UML formats (Mermaid, PlantUML, C4, TypeDB)
- Design Name entity as central hub
- Create LinkML schema (01_name_entity.yaml)
- Create Mermaid diagram (01_name_entity_hub.mmd)
- Create PlantUML diagram (01_name_entity_hub.puml)
- Create TypeQL schema (01_name_entity_hub.tql)
- Create RDF/OWL ontology (01_name_entity_hub.ttl)
- Validate LinkML schema (YAML syntax)
- Document design rationale (README.md, 5,000+ words)
- Define multi-aspect pattern
- Define temporal name chains
- Document next steps (hypernym extraction)
- ⏳ Extract hypernym taxonomy (next session)
- ⏳ Map hypernyms to ontology classes
📊 Progress Metrics
Overall Project Progress
| Metric | Count | Status |
|---|---|---|
| Wikidata Entities | 2,453 | Pending batch conversion |
| Name Entity Schema | 1 module | ✅ COMPLETE |
| Schema Formats | 5 (LinkML, Mermaid, PlantUML, TypeQL, RDF) | ✅ COMPLETE |
| Classes Defined | 1 (Name) | ✅ COMPLETE |
| Properties Defined | 24 slots | ✅ COMPLETE |
| Reasoning Rules | 3 (TypeQL) | ✅ COMPLETE |
| Documentation | 6,500+ words | ✅ COMPLETE |
Efficiency Gain
- Old Approach: 2,400+ sessions (5 entities done, 2,448 remaining)
- New Approach: ~10 sessions (foundation + hypernym mapping + entity modules + batch conversion)
- Efficiency Gain: 240x faster 🚀
📚 References
Project Files
- Schema Dir: /schemas/20251121/
- LinkML: linkml/01_name_entity.yaml
- Mermaid: uml/mermaid/01_name_entity_hub.mmd
- PlantUML: uml/plantuml/01_name_entity_hub.puml
- TypeQL: typeql/01_name_entity_hub.tql
- RDF/OWL: rdf/01_name_entity_hub.ttl
- README: README.md
Session Status: ✅ COMPLETE
Next Session Focus: Extract hypernym taxonomy + map to ontology classes
Overall Strategy: Top-down ontology design (240x more efficient)