glam/schemas/20251121/linkml/modules/classes/ConnectionNetwork.yaml
kempersc dfa667c90f Fix LinkML schema for valid RDF generation with proper slot_uri
Summary:
- Create 46 missing slot definition files with proper slot_uri values
- Add slot imports to main schema (01_custodian_name_modular.yaml)
- Fix YAML examples sections in 116+ class and slot files
- Fix PersonObservation.yaml examples section (nested objects → string literals)

Technical changes:
- All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS)
- Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF
- gen-owl now produces valid Turtle with 153,166 triples

New slot files (46):
- RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc.
- Scope slots: scope_includes, scope_excludes, archive_scope
- Organization slots: organization_type, governance_authority, area_served
- Platform slots: platform_type_category, portal_type_category
- Social media slots: social_media_platform_category, post_type_*
- Type hierarchy slots: broader_type, narrower_types, custodian_type_broader
- Wikidata slots: wikidata_equivalent, wikidata_mapping

Generated output:
- schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB)
- Validated with rdflib: 153,166 triples, no malformed URIs
2026-01-07 13:48:03 +01:00

id: https://nde.nl/ontology/hc/class/ConnectionNetwork
name: connection_network_class
title: Connection Network Class
version: 1.0.0
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
  xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- ../metadata
- ./PersonConnection
- ../slots/notes
- ../slots/class_metadata_slots
default_range: string
classes:
  ConnectionNetwork:
    class_uri: schema:ItemList
    description: |
      Collection of LinkedIn network connections with source metadata.

      This is the root class for connection JSON files stored at:
      `data/custodian/person/connection/bu/{linkedin_slug}_connections_{timestamp}.json`

      Each file contains:
      - **source_metadata**: Provenance about the extraction (who, when, how)
      - **connections**: Array of PersonConnection entries (the actual network data)
      - **network_analysis**: Optional aggregated statistics

      **Use Cases**:
      - Heritage sector network analysis
      - Cross-custodian relationship discovery
      - Staff member connection patterns
      - Professional community mapping

      **File Naming Convention**:
      `{linkedin-slug}_connections_{ISO-timestamp}.json`

      Example: `giovannafossati_connections_20251209T220000Z.json`
    exact_mappings:
    - schema:ItemList
    close_mappings:
    - prov:Collection
    slots:
    - connections
    - network_analysis
    - source_metadata
    - specificity_annotation
    - template_specificity
    slot_usage:
      source_metadata:
        description: Provenance metadata about the connection extraction
        range: ConnectionSourceMetadata
        required: true
        inlined: true
      connections:
        description: Array of connection entries from the LinkedIn network
        range: PersonConnection
        required: true
        multivalued: true
        inlined: true
        inlined_as_list: true
      network_analysis:
        description: Aggregated statistics about the connection network
        range: NetworkAnalysis
        inlined: true
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
    comments:
    - Root class for connection network JSON files (validated with -C ConnectionNetwork)
    - 'Per AGENTS.md Rule 15: ALL connections must be fully registered'
    - Enables heritage sector network analysis
    - 'File naming: {linkedin-slug}_connections_{timestamp}.json'
    see_also:
    - https://schema.org/ItemList
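  # Illustrative sketch (not part of the schema): the overall shape of a file
  # whose root is ConnectionNetwork. It is written as YAML for readability,
  # although the stored files are JSON. Scalar values are copied from the slot
  # examples in this module. The empty `connections` list is a placeholder,
  # because PersonConnection is defined in ./PersonConnection and its fields
  # are not shown in this module. The optional network_analysis is left out.
  #
  #   source_metadata:
  #     source_url: https://www.linkedin.com/search/results/people/?network=%5B%22F%22%2C%22S%22%2C%22O%22%5D
  #     scraped_timestamp: '2025-12-09T22:00:00Z'
  #     scrape_method: manual_linkedin_browse
  #     target_profile: giovannafossati
  #     target_name: Giovanna Fossati
  #     connections_extracted: 776
  #   connections: []   # placeholder: the 776 PersonConnection entries are omitted here
  #
  # Assuming the linkml-validate CLI is what the "-C ConnectionNetwork" comment
  # refers to, a check could look roughly like the line below, where both
  # paths are hypothetical placeholders:
  #
  #   linkml-validate -s <schema.yaml> -C ConnectionNetwork <connections.json>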
  ConnectionSourceMetadata:
    class_uri: prov:Activity
    description: |
      Provenance metadata about how the connections were extracted.

      Records the extraction context including:
      - Source URL (LinkedIn search or profile page)
      - When the extraction occurred
      - Which method was used (manual browse, automated scrape)
      - Target profile being analyzed
      - Count of connections extracted

      **Scrape Methods**:
      - manual_linkedin_browse: Manual copy-paste while logged in
      - linkedin_html_parser: Parsed from saved HTML file
      - exa_search: Extracted via Exa API
      - automated_scraper: Automated scraping tool
    exact_mappings:
    - prov:Activity
    slots:
    - connections_extracted
    - notes
    - scrape_method
    - scraped_timestamp
    - source_url
    - specificity_annotation
    - target_name
    - target_profile
    - template_specificity
    slot_usage:
      source_url:
        description: |
          URL of the LinkedIn page from which the connections were extracted.
          Usually a LinkedIn search results URL or profile connections page.
        slot_uri: prov:used
        range: uri
        required: true
        examples:
        - value: https://www.linkedin.com/search/results/people/?network=%5B%22F%22%2C%22S%22%2C%22O%22%5D
          description: LinkedIn connection search URL
      scraped_timestamp:
        description: |
          ISO 8601 timestamp when the connections were extracted.
          Critical for tracking network changes over time.
        slot_uri: prov:endedAtTime
        range: datetime
        required: true
        examples:
        - value: '2025-12-09T22:00:00Z'
      scrape_method:
        description: |
          Method used to extract the connection data.

          Values:
          - manual_linkedin_browse: Manual extraction while logged in
          - linkedin_html_parser: Parsed from saved HTML file
          - exa_search: Extracted via Exa API
          - automated_scraper: Automated scraping tool
        slot_uri: prov:wasAssociatedWith
        range: ScrapeMethodEnum
        required: true
        examples:
        - value: manual_linkedin_browse
      target_profile:
        description: |
          LinkedIn slug of the profile whose connections were extracted.
          Format: lowercase alphanumeric with hyphens.
        slot_uri: dct:subject
        range: string
        required: true
        pattern: ^[a-z0-9-]+$
        examples:
        - value: giovannafossati
        - value: alexandr-belov-bb547b46
      target_name:
        description: |
          Full display name of the target profile.
          The person whose connections were extracted.
        slot_uri: schema:name
        range: string
        required: true
        examples:
        - value: Giovanna Fossati
        - value: Alexandr Belov
      connections_extracted:
        description: |
          Total number of connections extracted from this source.
          Used for validation and completeness tracking.
        slot_uri: schema:numberOfItems
        range: integer
        required: true
        minimum_value: 0
        examples:
        - value: 776
      notes:
        description: |
          Optional notes about the extraction process.
          May reference raw source files or explain any issues.
        slot_uri: schema:description
        range: string
        examples:
        - value: Raw scrape in giovannafossati_connections_20251209T220000Z_note-max100p-1st2nd3th.md
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
    comments:
    - Aligns with PROV-O Activity pattern
    - scraped_timestamp maps to prov:endedAtTime
    - target_profile is the LinkedIn slug being analyzed
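  # Illustrative sketch (not part of the schema): how the target_profile
  # pattern ^[a-z0-9-]+$ treats a few values. The first two are the slot's own
  # examples; the third is a made-up counterexample.
  #
  #   target_profile: giovannafossati           # accepted
  #   target_profile: alexandr-belov-bb547b46   # accepted
  #   target_profile: Giovanna_Fossati          # rejected: uppercase letters and an underscore
  #
  # Assuming the file name reuses target_profile and scraped_timestamp (the
  # examples in this module are consistent with that, but the schema does not
  # enforce it), the metadata and the file name would line up as follows:
  #
  #   target_profile: giovannafossati
  #   scraped_timestamp: '2025-12-09T22:00:00Z'
  #   # file name: giovannafossati_connections_20251209T220000Z.json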
  NetworkAnalysis:
    class_uri: schema:DataFeedItem
    description: |
      Aggregated statistics about the connection network.

      Provides summary metrics for quick analysis:
      - Total connections extracted
      - Heritage-relevant count and percentage
      - Breakdown by heritage type (GLAMORCUBESFIXPHDNT)

      **Example**:
      ```json
      {
        "total_connections_extracted": 776,
        "heritage_relevant_count": 456,
        "heritage_relevant_percentage": 58.8,
        "connections_by_heritage_type": {
          "A": 45,
          "M": 89,
          "D": 112,
          "R": 78
        }
      }
      ```
    slots:
    - connections_by_heritage_type
    - heritage_relevant_count
    - heritage_relevant_percentage
    - specificity_annotation
    - template_specificity
    - total_connections_extracted
    slot_usage:
      total_connections_extracted:
        description: Total number of connections in the network
        slot_uri: schema:numberOfItems
        range: integer
        required: true
        minimum_value: 0
      heritage_relevant_count:
        description: Number of connections marked as heritage-relevant
        slot_uri: hc:heritageRelevantCount
        range: integer
        required: true
        minimum_value: 0
      heritage_relevant_percentage:
        description: Percentage of connections that are heritage-relevant (0-100)
        slot_uri: hc:heritageRelevantPercentage
        range: float
        minimum_value: 0.0
        maximum_value: 100.0
        examples:
        - value: 58.8
      connections_by_heritage_type:
        description: |
          Breakdown of heritage-relevant connections by type code.
          Keys are single-letter GLAMORCUBESFIXPHDNT codes.
        slot_uri: hc:connectionsByHeritageType
        range: HeritageTypeCount
        multivalued: true
        inlined: true
        inlined_as_list: true
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
    comments:
    - Optional aggregation - can be computed from connections array
    - Useful for quick heritage sector analysis
  HeritageTypeCount:
    class_uri: schema:PropertyValue
    description: |
      Count of connections for a specific heritage type.
      Used in network_analysis.connections_by_heritage_type.
    slots:
    - count
    - heritage_type_code
    - specificity_annotation
    - template_specificity
    slot_usage:
      heritage_type_code:
        description: Single-letter heritage type code (G,L,A,M,O,R,C,U,B,E,S,F,I,X,P,H,D,N,T)
        slot_uri: schema:propertyID
        range: string
        required: true
        pattern: ^[GLAMORCUBESFIXPHDNT]$
      count:
        description: Number of connections of this heritage type
        slot_uri: schema:value
        range: integer
        required: true
        minimum_value: 0
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
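  # Illustrative sketch (not part of the schema): connections_by_heritage_type
  # is declared multivalued with inlined_as_list: true, and no identifier or
  # key is declared for HeritageTypeCount in this module. A validator would
  # therefore presumably expect a list of code/count objects rather than the
  # code-to-count mapping shown in the NetworkAnalysis description. The same
  # figures in list form, written as YAML for readability:
  #
  #   network_analysis:
  #     total_connections_extracted: 776
  #     heritage_relevant_count: 456
  #     heritage_relevant_percentage: 58.8   # 100 * 456 / 776, rounded to one decimal
  #     connections_by_heritage_type:
  #     - heritage_type_code: A
  #       count: 45
  #     - heritage_type_code: M
  #       count: 89
  #     - heritage_type_code: D
  #       count: 112
  #     - heritage_type_code: R
  #       count: 78
  #
  # Whether existing data files use the list form or the mapping form is not
  # stated here, so treat this as an assumption to check against real files.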
enums:
  ScrapeMethodEnum:
    description: |
      Methods used to extract LinkedIn connection data.
      Determines data quality and potential limitations.
    permissible_values:
      manual_linkedin_browse:
        description: Manual extraction while logged into LinkedIn
        meaning: prov:SoftwareAgent
      linkedin_html_parser:
        description: Parsed from saved LinkedIn HTML file
        meaning: prov:SoftwareAgent
      exa_search:
        description: Extracted via Exa API search
        meaning: prov:SoftwareAgent
      automated_scraper:
        description: Automated scraping tool
        meaning: prov:SoftwareAgent
slots:
  source_metadata:
    description: Provenance metadata about the extraction
    range: ConnectionSourceMetadata
  connections:
    description: Array of connection entries
    range: PersonConnection
    multivalued: true
  network_analysis:
    description: Aggregated network statistics
    range: NetworkAnalysis
  source_url:
    description: URL where data was extracted from
    range: uri
  scraped_timestamp:
    description: When the extraction occurred
    range: datetime
  scrape_method:
    description: Method used for extraction
    range: ScrapeMethodEnum
  target_profile:
    description: LinkedIn slug of target profile
    range: string
  target_name:
    description: Display name of target profile
    range: string
  connections_extracted:
    description: Number of connections extracted
    range: integer
  total_connections_extracted:
    description: Total connection count
    range: integer
  heritage_relevant_count:
    description: Count of heritage-relevant connections
    range: integer
  heritage_relevant_percentage:
    description: Percentage of heritage-relevant connections
    range: float
  connections_by_heritage_type:
    description: Breakdown by heritage type code
    range: HeritageTypeCount
    multivalued: true
  heritage_type_code:
    description: Single-letter heritage type code
    range: string
  count:
    description: Count value
    range: integer