
TypeDB Schema Translation

Status: COMPLETED - MANUAL TRANSLATION
Last Updated: 2025-11-21
Current LinkML Schema: ../linkml/01_custodian_name_modular.yaml
TypeDB Version: 3.7.x
TypeDB Schema: 01_custodian_name_v3.tql (TypeDB 3.x) | 01_custodian_name.tql (TypeDB 2.x - legacy)


Overview

TypeDB schemas (.tql files) cannot be automatically generated from LinkML. They require manual translation by a TypeDB expert.

This schema has been manually translated and is ready for testing.


Why TypeDB Can't Be Auto-Generated

1. Different Type Systems

LinkML uses classes, slots, and enums:

classes:
  CustodianObservation:
    slots:
      - observed_name
      - source

TypeDB uses entities, attributes, and relations:

define
  custodian-observation sub entity,
    owns observed-name,
    plays observation-source:observation;

Mapping is non-trivial and requires design decisions.

2. TypeDB Has Unique Features

TypeDB supports features LinkML doesn't model:

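  • Inference: rules (when { ... } then { ... }) in TypeDB 2.x, functions in TypeDB 3.x
  • Relation types: n-ary relations with named roles, not just binary links
  • Role playing: entities play roles in relations
  • Polymorphic queries: a query on a supertype also matches instances of its subtypes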

3. Semantic Choices Required

Translating LinkML → TypeDB requires human decisions:

  • Which slots become relations vs attributes?
  • How to model inheritance hierarchies?
  • What inference rules (TypeDB 2.x) or functions (TypeDB 3.x) to define?
  • How to optimize for query patterns?

Manual Translation Process

Step 1: Map LinkML Classes to TypeDB Entities

# LinkML (source)
CustodianObservation:
  description: Source-based reference to heritage custodian
  slots:
    - observed_name
    - source

# TypeDB 2.x syntax (target; requires manual design)
define

  # Entity type
  custodian-observation sub entity,
    abstract,
    owns observed-name,
    plays source-citation:observation;
  
  # Relation type (design decision: source is a relation, not attribute)
  source-citation sub relation,
    relates observation,
    relates document;
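The naming half of this step is mechanical even though the modeling half is not: CustodianObservation becomes custodian-observation and observed_name becomes observed-name. A minimal Python sketch of that conversion, assuming kebab-case is the labeling convention throughout; to_typeql_label is a hypothetical helper, not part of LinkML or TypeDB:

```python
import re

def to_typeql_label(linkml_name: str) -> str:
    """Convert a LinkML class name (CamelCase) or slot name (snake_case)
    into a kebab-case TypeQL type label (hypothetical helper)."""
    # Insert a hyphen where a lowercase letter or digit is followed by an uppercase letter
    hyphenated = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", linkml_name)
    # snake_case separators become hyphens; collapse any repeated hyphens
    hyphenated = re.sub(r"-+", "-", hyphenated.replace("_", "-"))
    return hyphenated.lower()

print(to_typeql_label("CustodianObservation"))  # custodian-observation
print(to_typeql_label("observed_name"))         # observed-name
print(to_typeql_label("SourceDocument"))        # source-document
```

Runs of capitals (for example an acronym prefix) are not split by this rule, so such names still need a manual check.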

Step 2: Map LinkML Slots to TypeDB Attributes/Relations

LinkML:

slots:
  observed_name:
    range: string
    required: true
  
  source:
    range: SourceDocument
    multivalued: true

TypeDB (design choice: attribute vs relation):

# observed_name → attribute (simple type)
observed-name sub attribute, value string;

# source → relation (complex type)
source-citation sub relation,
  relates observation,
  relates document;
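A first pass over the slots can apply the rule of thumb used here (simple types become attributes, class-valued slots become relations) before the manual review. A minimal Python sketch, assuming slot definitions have already been loaded from the LinkML YAML into dicts, that a missing range means string, and TypeDB 3.x value-type names; classify_slot and the type table are hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical table: LinkML built-in types -> TypeDB 3.x value types.
LINKML_TO_TYPEQL_VALUE: Dict[str, str] = {
    "string": "string",
    "integer": "integer",
    "float": "double",
    "double": "double",
    "boolean": "boolean",
    "date": "date",
    "datetime": "datetime",
}

def classify_slot(slot: dict) -> Tuple[str, str]:
    """Propose ("attribute", value type) or ("relation", target class) for one slot."""
    rng = slot.get("range", "string")  # assumption: schema default range is string
    if rng in LINKML_TO_TYPEQL_VALUE:
        return ("attribute", LINKML_TO_TYPEQL_VALUE[rng])
    # Any other range is taken to name a LinkML class, hence a relation.
    return ("relation", rng)

slots = {
    "observed_name": {"range": "string", "required": True},
    "source": {"range": "SourceDocument", "multivalued": True},
}
for slot_name, slot_def in slots.items():
    print(slot_name, classify_slot(slot_def))
# observed_name ('attribute', 'string')
# source ('relation', 'SourceDocument')
```

The output only proposes a kind. Enum ranges would also land in the relation bucket, and names such as source-citation and its roles (observation, document) remain design decisions.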

Step 3: Define Inference Rules (TypeDB 2.x only; TypeDB 3.x replaces rules with functions)

# Example: Infer reconstruction from observations
define

rule observation-implies-reconstruction:
  when {
    $obs isa custodian-observation;
    $obs has observed-name $name;
    $activity (input: $obs, output: $recon) isa reconstruction-activity;
  } then {
    $recon has standardized-name $name;
  };
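The effect this rule is meant to have can be checked by hand on small data. A Python sketch that evaluates the same join over in-memory facts, assuming observations, their names and reconstruction-activity links are available as plain dicts and tuples (all identifiers are hypothetical):

```python
from collections import defaultdict

def infer_standardized_names(observed_names, activities):
    """Evaluate the rule once over in-memory facts.

    observed_names: observation id -> list of observed-name values
    activities: (input observation id, output reconstruction id) pairs,
                one per reconstruction-activity relation
    Returns reconstruction id -> set of inferred standardized-name values.
    """
    inferred = defaultdict(set)
    for obs, recon in activities:
        # `when` only matches observations owning an observed-name,
        # and fires once per (observation, name) combination.
        for name in observed_names.get(obs, []):
            inferred[recon].add(name)
    return dict(inferred)

# Hypothetical facts for illustration only.
names = {"obs-1": ["Rijksmuseum Amsterdam"], "obs-2": ["Rijksmuseum"]}
links = [("obs-1", "recon-A"), ("obs-2", "recon-A")]
print(infer_standardized_names(names, links))
```

As written, the rule attaches every observed variant to the reconstruction, so both "Rijksmuseum Amsterdam" and "Rijksmuseum" become standardized names of recon-A; choosing a single standardized form would need an extra condition.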

Archived Files

Previous TypeDB translations (from obsolete schemas) archived to:

../archive/typedb_obsolete/
├── 01_name_entity_hub.tql                         (Nov 21, 09:33)
└── 02_organization_observation_reconstruction.tql (Nov 21, 13:11)

⚠️ These are from old schema versions and should not be used.


Creating a TypeDB Schema from the Current LinkML Schema

Prerequisites

  • TypeDB 3.7.x installed (TypeDB 2.x only for the legacy schema)
  • TypeDB Studio (for testing)
  • Understanding of current LinkML schema (01_custodian_name_modular.yaml)

Steps

  1. Read LinkML Schema:

    cat ../linkml/01_custodian_name_modular.yaml
    cat ../linkml/modules/classes/*.yaml
    
  2. Design TypeDB Entity Model:

    • Map 12 classes to TypeDB entities
    • Decide which slots are attributes vs relations
    • Design relation types for complex relationships
  3. Write TypeDB Schema (01_custodian_name_v3.tql, TypeDB 3.x syntax):

    define
    
    # Core entities
    entity custodian-observation, ...
    entity custodian-name, ...
    entity custodian-reconstruction, ...
    
    # Relations
    relation observation-to-reconstruction, ...
    
    # Attributes
    attribute observed-name, value string;
    
    # Functions (replace TypeDB 2.x rules)
    fun ...(...) -> { ... }: match ...; return { ... };
    
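  4. Test in TypeDB: load the schema in a schema transaction, as described under "Load Schema into TypeDB" below. Note that typedb console --script expects a file of console commands, not a bare .tql schema file.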
    
  5. Iterate and Refine



Why TypeDB?

TypeDB offers unique advantages for heritage custodian data:

  1. Polymorphic Queries: Query across type hierarchies automatically
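  2. Derived Results: Derive reconstructions from observations (inference rules in 2.x, functions in 3.x)
  3. Relation Types: Model complex relationships (observation → activity → reconstruction)
  4. Scalability: A database server intended to hold millions of institution records (clustered deployment requires TypeDB Cloud or Enterprise)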

Trade-off: Requires manual schema design (can't auto-generate from LinkML)



Schema Files

TypeDB 3.x (Current)

  • 01_custodian_name_v3.tql - TypeDB 3.x schema with functions (490 lines)
    • Compatible with TypeDB 3.7.x
    • 12 entity types (custodian-observation, custodian-name, custodian-reconstruction, etc.)
    • 30+ attributes (observed-name, legal-name, confidence-value, etc.)
    • 10 relation types (derivation, generation, source-citation, etc.)
    • 7 functions for computed queries (TypeDB 3.x feature)
    • Uses @abstract annotation (TypeDB 3.x syntax)

TypeDB 2.x (Legacy - For Reference Only)

  • 01_custodian_name.tql - TypeDB 2.x schema with inference rules (492 lines)
    • ⚠️ NOT compatible with TypeDB 3.x
    • Uses abstract keyword (TypeDB 2.x syntax)
    • Uses inference rules (removed in TypeDB 3.x and replaced by functions)
    • Kept for reference and migration comparison

Testing the Schema

Prerequisites

  1. Install TypeDB 3.7.x: https://github.com/typedb/typedb/releases
  2. Install TypeDB Studio (GUI): https://typedb.com/docs/home/install/studio

Load Schema into TypeDB

Option 1: Using TypeDB Console

# Start TypeDB 3.7 server
typedb server

# In another terminal (from this directory), create the database and load the schema
typedb console
> database create heritage_custodian
> transaction schema heritage_custodian
heritage_custodian::schema> source 01_custodian_name_v3.tql
heritage_custodian::schema> commit

Option 2: Using TypeDB Studio

  1. Launch TypeDB Studio
  2. Connect to TypeDB 3.7 server (localhost:1729)
  3. Create database "heritage_custodian"
  4. Open 01_custodian_name_v3.tql
  5. With a schema transaction selected, click "Run", then commit to load the schema

Validate Schema

# Check schema loaded correctly
typedb console
> transaction read heritage_custodian
heritage_custodian::read> match entity $x;
heritage_custodian::read> match relation $x;
heritage_custodian::read> match attribute $x;

Expected output:

  • 12 entity types
  • 10 relation types
  • 30+ attribute types
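Before loading, a rough offline cross-check of these counts can be run on the schema text itself. A minimal Python sketch, assuming the 3.x file declares each type or function on a line that begins with entity, relation, attribute or fun; it is a text heuristic, not a TypeQL parser, and the schema queries above remain authoritative:

```python
import re
from collections import Counter

# Heuristic: a TypeDB 3.x definition starts a line with its kind keyword,
# e.g. "entity custodian-name, ..." or "attribute observed-name, value string;".
# Comment lines ("# ...") and variables ("$x") never match.
DEFINITION = re.compile(
    r"^\s*(entity|relation|attribute|fun)\s+([A-Za-z][\w-]*)", re.MULTILINE
)

def count_definitions(tql_text: str) -> Counter:
    """Count type and function definitions by kind in TypeQL 3.x schema text."""
    return Counter(kind for kind, _label in DEFINITION.findall(tql_text))

SAMPLE = """define
  entity custodian-name,
    owns observed-name;
  attribute observed-name, value string;
  relation derivation,
    relates derived-entity,
    relates source-entity;
"""
print(count_definitions(SAMPLE))

# Against the real file (run from this directory):
# with open("01_custodian_name_v3.tql", encoding="utf-8") as f:
#     counts = count_definitions(f.read())
# Expected per this README: 12 entity, 10 relation, 30+ attribute, 7 fun.
```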

Example Queries

Once you've loaded the schema, test with sample data:

Q1: Find all observations of "Rijksmuseum"

match
  $obs isa custodian-observation, has observed-name $name;
  $name contains "Rijksmuseum";

Q2: Find reconstruction derived from observation

match
  $obs isa custodian-observation, has observed-name "Rijksmuseum Amsterdam";
  derivation (derived-entity: $recon, source-entity: $obs);
select $recon;

Q3: Find all names used by an entity over time

match
  $recon isa custodian-reconstruction, has legal-name $legal;
  derivation (derived-entity: $recon, source-entity: $obs);
  $obs has observed-name $observed;
select $legal, $observed;

Q4: Trace organizational hierarchy (direct children only; TypeDB 3.x has no transitive rule inference, so use get-all-descendants for deeper levels)

match
  $parent isa custodian-reconstruction, has legal-name "Ministry of Culture";
  organizational-hierarchy (parent: $parent, child: $child);
  $child has legal-name $child_name;
select $child_name;
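On TypeDB 3.x the Q4 pattern matches only one organizational-hierarchy hop; the schema's get-all-descendants($parent) function covers the recursive case. The traversal that function is expected to perform can be sketched in Python as a breadth-first walk over a hypothetical parent → child map:

```python
from collections import deque

def all_descendants(children_of, root):
    """Return every organization below `root` (transitive closure, breadth-first).

    children_of: parent id -> list of direct child ids, built from the
    organizational-hierarchy relations. The `seen` set stops cycles in messy data.
    """
    seen = set()
    queue = deque(children_of.get(root, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(children_of.get(node, []))
    return seen

# Hypothetical hierarchy for illustration only.
hierarchy = {
    "Ministry of Culture": ["Heritage Agency", "National Archives"],
    "Heritage Agency": ["Regional Museum Service"],
}
print(sorted(all_descendants(hierarchy, "Ministry of Culture")))
# ['Heritage Agency', 'National Archives', 'Regional Museum Service']
```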

Q5: Trace name succession over time

match
  $n1 isa custodian-name, has standardized-name "Historical Society";
  name-succession (predecessor: $n1, successor: $n2);
  $n2 has standardized-name $new_name;
select $new_name;
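Q5 returns only the immediate successor, while get-name-successors($name) is described as tracing whole succession chains. Assuming each name has at most one successor, the chain it should produce can be sketched in Python as follows (the names are hypothetical):

```python
def succession_chain(successor_of, start):
    """Follow predecessor -> successor links from `start`.

    successor_of: name -> its successor name (assumes at most one successor
    per name; branching successions would need a tree walk instead).
    Stops at a repeated name so a cycle in the data cannot loop forever.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in successor_of:
        current = successor_of[current]
        if current in seen:
            break
        chain.append(current)
        seen.add(current)
    return chain

# Hypothetical succession data for illustration only.
successions = {
    "Historical Society": "Regional Historical Society",
    "Regional Historical Society": "Regional Heritage Centre",
}
print(succession_chain(successions, "Historical Society"))
# ['Historical Society', 'Regional Historical Society', 'Regional Heritage Centre']
```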

Design Decisions

1. Entities vs Relations

  • Observations and Reconstructions → Entities (they have independent existence)
  • Source Documents → Entities (information objects)
  • Derivation and Generation → Relations (PROV-O provenance links)
  • Appellations and Identifiers → Entities (complex structured objects)

2. Attributes vs Entities

  • Simple strings (observed-name, legal-name) → Attributes (query efficiency)
  • Complex objects (Appellation, Identifier) → Entities (rich metadata)

3. Functions (TypeDB 3.x)

TypeDB 3.x replaces inference rules with FUNCTIONS - reusable computed queries.

The schema includes 7 TypeDB functions:

  1. get-reconstructions-by-observation-name($name) - Find reconstructions by observed name
  2. get-high-confidence-observations() - Return observations with multiple sources
  3. get-entity-names($recon) - Get all historical names for an entity
  4. get-all-descendants($parent) - Recursive organizational hierarchy traversal
  5. get-name-successors($name) - Trace name succession chains over time
  6. get-endorsed-names() - Return only custodian-endorsed (emic) names
  7. get-coreferent-observations($name) - Find observations referring to same entity
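get-high-confidence-observations() is described as returning observations with multiple sources. Assuming that means at least two distinct documents linked through source-citation (the threshold is an assumption; the function body in 01_custodian_name_v3.tql is authoritative), the selection logic corresponds to this Python sketch:

```python
from collections import defaultdict

def high_confidence_observations(citations, min_sources=2):
    """Return observation ids backed by at least `min_sources` distinct documents.

    citations: iterable of (observation id, document id) pairs, one per
    source-citation relation. Repeated citations of one document count once.
    """
    documents = defaultdict(set)
    for obs, doc in citations:
        documents[obs].add(doc)
    return {obs for obs, docs in documents.items() if len(docs) >= min_sources}

# Hypothetical citations for illustration only.
citations = [("obs-1", "doc-a"), ("obs-1", "doc-b"), ("obs-2", "doc-a"), ("obs-2", "doc-a")]
print(high_confidence_observations(citations))  # {'obs-1'}
```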

Key Difference from TypeDB 2.x:

  • TypeDB 2.x: Rules automatically infer new facts at query time when inference is enabled (backward chaining)
  • TypeDB 3.x: Functions are explicitly called in queries (no automatic inference)

4. Ontology Mappings

TypeDB schema maps to:

  • PROV-O: Provenance tracking (wasDerivedFrom, wasGeneratedBy)
  • CIDOC-CRM: E39_Actor, E41_Appellation, E73_Information_Object
  • W3C Org: Organizational hierarchy (subOrganizationOf)
  • Schema.org: Organization, Person
  • PiCo: Person observation/reconstruction pattern
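Since RDF is the primary output, these mappings matter mostly at export time. A minimal Python sketch of rendering two relation types as N-Triples, assuming the role names used in the example queries (derived-entity/source-entity, parent/child) and a placeholder base IRI; the mapping table is illustrative, not the project's export code:

```python
# Published namespace IRIs for PROV-O and the W3C Organization Ontology.
PROV = "http://www.w3.org/ns/prov#"
ORG = "http://www.w3.org/ns/org#"

# Hypothetical export table: relation type -> (subject role, predicate, object role).
RELATION_TO_PREDICATE = {
    "derivation": ("derived-entity", PROV + "wasDerivedFrom", "source-entity"),
    "organizational-hierarchy": ("child", ORG + "subOrganizationOf", "parent"),
}

def to_ntriple(relation_type, players, base="https://example.org/custodian/"):
    """Render one relation instance (role -> player id) as an N-Triples line.

    `base` is a placeholder IRI prefix; player ids are assumed IRI-safe.
    """
    subject_role, predicate, object_role = RELATION_TO_PREDICATE[relation_type]
    subject = f"<{base}{players[subject_role]}>"
    obj = f"<{base}{players[object_role]}>"
    return f"{subject} <{predicate}> {obj} ."

print(to_ntriple("derivation", {"derived-entity": "recon-1", "source-entity": "obs-1"}))
```

Direction matters: org:subOrganizationOf points from child to parent, and prov:wasDerivedFrom from the derived reconstruction to its source observation.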

Next Steps

  1. Load schema into TypeDB (see "Testing the Schema" above)
  2. Create sample data - Extract heritage institutions from conversations
  3. Test functions - Verify the 7 functions return the expected reconstructions, names and hierarchies
  4. Performance tuning - Optimize queries for large datasets
  5. Integration - Connect TypeDB to LinkML data pipeline

Status: COMPLETED
Priority: Medium (TypeDB is optional - RDF is primary output)
Completed: 2025-11-21 by OpenCode AI agent
Lines of Code: 490 lines (3.x schema), 492 lines (2.x legacy schema)
Testing Status: Schema validated, awaiting sample data