Person Identity Classes:
- PersonName: Full name modeling with components (given_name, surname_prefix, base_surname, patronym, initials) following Dutch naming conventions
- PersonConnection: Professional network connections with heritage relevance scoring
- ConnectionNetwork: Network-level analysis and statistics

LinkedIn Profile Schema:
- LinkedInProfile: Complete professional profile structure
- WorkExperience: Employment history with heritage institution detection
- EducationCredential: Academic background and qualifications
- LanguageProficiency: Language skills with ISO 639-1 codes

Supporting Classes:
- ExtractionMetadata: Provenance tracking for extracted profile data
- HeritageRelevance: GLAMORCUBESFIXPHDNT type scoring and classification

Slots (17 person-related slots):
- Name components: given_name, base_surname, surname_prefix, patronym, initials
- Identity: age, birth_date, birth_place, death_place, gender_identity, pronouns
- Professional: occupation, religion
- References: literal_name, name_specification, has_person_name, extraction_metadata

Enums:
- HeritageTypeEnum: GLAMORCUBESFIXPHDNT type codes for heritage relevance
403 lines
12 KiB
YAML
# Connection Network Class
# Collection of LinkedIn connections with source metadata

id: https://nde.nl/ontology/hc/class/ConnectionNetwork
name: connection_network_class
title: Connection Network Class
version: 1.0.0

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
  xsd: http://www.w3.org/2001/XMLSchema#

imports:
  - linkml:types
  - ../metadata
  - ./PersonConnection

default_range: string

classes:

  ConnectionNetwork:
    class_uri: schema:ItemList
    tree_root: true
    description: |
      Collection of LinkedIn network connections with source metadata.

      This is the root class for connection JSON files stored at:
      `data/custodian/person/connection/bu/{linkedin_slug}_connections_{timestamp}.json`

      Each file contains:
      - **source_metadata**: Provenance of the extraction (who, when, how)
      - **connections**: Array of PersonConnection entries (the actual network data)
      - **network_analysis**: Optional aggregated statistics

      **Use Cases**:
      - Heritage sector network analysis
      - Cross-custodian relationship discovery
      - Staff member connection patterns
      - Professional community mapping

      **File Naming Convention**:
      `{linkedin-slug}_connections_{ISO-timestamp}.json`, where the timestamp is a
      compact ISO 8601 UTC timestamp (`YYYYMMDDTHHMMSSZ`).

      Example: `giovannafossati_connections_20251209T220000Z.json`
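
      The naming convention can be applied mechanically. The sketch below uses
      hypothetical helpers, not part of this schema or any existing tool, to build and
      parse such file names. It assumes the slug matches the `target_profile` pattern
      `^[a-z0-9-]+$` and that the timestamp is expressed in UTC:

      ```python
      import re
      from datetime import datetime, timezone

      # Hypothetical helpers illustrating the documented naming convention.
      STAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # compact ISO 8601, e.g. 20251209T220000Z
      FILE_NAME = re.compile(r"(?P<slug>[a-z0-9-]+)_connections_(?P<stamp>\d{8}T\d{6}Z)\.json")


      def connection_file_name(linkedin_slug: str, scraped_at: datetime) -> str:
          """Build '{slug}_connections_{timestamp}.json' from a timezone-aware datetime."""
          if scraped_at.tzinfo is None:
              raise ValueError("scraped_at must be timezone-aware")
          stamp = scraped_at.astimezone(timezone.utc).strftime(STAMP_FORMAT)
          return f"{linkedin_slug}_connections_{stamp}.json"


      def parse_connection_file_name(file_name: str) -> tuple[str, datetime]:
          """Split a connection file name back into (slug, UTC timestamp)."""
          match = FILE_NAME.fullmatch(file_name)
          if match is None:
              raise ValueError(f"not a connection file name: {file_name!r}")
          stamp = datetime.strptime(match["stamp"], STAMP_FORMAT)
          return match["slug"], stamp.replace(tzinfo=timezone.utc)
      ```

      Side files, such as the raw-scrape note in the `notes` example, do not match
      this pattern, so the parser rejects them.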

      **Example JSON Structure**:
      ```json
      {
        "source_metadata": {
          "source_url": "https://www.linkedin.com/search/results/people/...",
          "scraped_timestamp": "2025-12-09T22:00:00Z",
          "scrape_method": "manual_linkedin_browse",
          "target_profile": "giovannafossati",
          "target_name": "Giovanna Fossati",
          "connections_extracted": 776
        },
        "connections": [
          { "connection_id": "...", "name": "...", ... }
        ],
        "network_analysis": {
          "total_connections_extracted": 776,
          "heritage_relevant_count": 456,
          "heritage_relevant_percentage": 58.8
        }
      }
      ```
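
      A file following this structure can be sanity-checked before it is loaded
      against the schema. The sketch below is illustrative only: the function name is
      hypothetical. Comparing `connections_extracted` with the length of the
      `connections` array is one reading of that slot's "validation and completeness
      tracking" purpose, not a documented rule.

      ```python
      import json

      REQUIRED_KEYS = ("source_metadata", "connections")  # network_analysis is optional


      def check_connection_file(text: str) -> list[str]:
          """Return human-readable problems found in a connection network JSON document."""
          data = json.loads(text)
          problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in data]
          if problems:
              return problems
          actual = len(data["connections"])
          declared = data["source_metadata"].get("connections_extracted")
          if declared != actual:
              problems.append(f"connections_extracted ({declared}) does not match len(connections) ({actual})")
          analysis = data.get("network_analysis")
          if analysis is not None and analysis.get("total_connections_extracted") != actual:
              problems.append("network_analysis.total_connections_extracted does not match len(connections)")
          return problems
      ```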

    exact_mappings:
      - schema:ItemList
    close_mappings:
      - prov:Collection

    slots:
      - source_metadata
      - connections
      - network_analysis

    slot_usage:
      source_metadata:
        description: "Provenance metadata about the connection extraction"
        range: ConnectionSourceMetadata
        required: true
        inlined: true

      connections:
        description: "Array of connection entries from the LinkedIn network"
        range: PersonConnection
        required: true
        multivalued: true
        inlined: true
        inlined_as_list: true

      network_analysis:
        description: "Aggregated statistics about the connection network"
        range: NetworkAnalysis
        inlined: true

    comments:
      - "Root class for connection network JSON files (tree_root: true)"
      - "Per AGENTS.md Rule 15: ALL connections must be fully registered"
      - "Enables heritage sector network analysis"
      - "File naming: {linkedin-slug}_connections_{timestamp}.json"

    see_also:
      - "https://schema.org/ItemList"

  ConnectionSourceMetadata:
    class_uri: prov:Activity
    description: |
      Provenance metadata about how the connections were extracted.

      Records the extraction context, including:
      - Source URL (LinkedIn search or profile page)
      - When the extraction occurred
      - Which method was used (manual browse, automated scrape)
      - Target profile being analyzed
      - Count of connections extracted

      **Scrape Methods**:
      - manual_linkedin_browse: Manual copy-paste while logged in
      - linkedin_html_parser: Parsed from saved HTML file
      - exa_search: Extracted via Exa API
      - automated_scraper: Automated scraping tool
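
      When LinkML tooling is not at hand, the `slot_usage` constraints of this class
      can be mirrored in plain Python. This is a hand-written sketch, not code
      generated from the schema. The function and constant names are hypothetical.

      ```python
      import re
      from datetime import datetime

      REQUIRED = ("source_url", "scraped_timestamp", "scrape_method",
                  "target_profile", "target_name", "connections_extracted")
      SCRAPE_METHODS = {"manual_linkedin_browse", "linkedin_html_parser",
                        "exa_search", "automated_scraper"}  # ScrapeMethodEnum values
      SLUG = re.compile(r"[a-z0-9-]+")  # used with fullmatch, i.e. ^[a-z0-9-]+$


      def validate_source_metadata(meta: dict) -> list[str]:
          """Return constraint violations (an empty list if the record looks valid)."""
          errors = [f"missing {name}" for name in REQUIRED if name not in meta]
          if errors:
              return errors
          if not SLUG.fullmatch(meta["target_profile"]):
              errors.append("target_profile is not a lowercase LinkedIn slug")
          if meta["scrape_method"] not in SCRAPE_METHODS:
              errors.append(f"unknown scrape_method: {meta['scrape_method']}")
          count = meta["connections_extracted"]
          if isinstance(count, bool) or not isinstance(count, int) or count < 0:
              errors.append("connections_extracted must be a non-negative integer")
          try:
              # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
              # so it is rewritten as an explicit UTC offset first.
              datetime.fromisoformat(meta["scraped_timestamp"].replace("Z", "+00:00"))
          except (AttributeError, ValueError):
              errors.append("scraped_timestamp is not an ISO 8601 string")
          return errors
      ```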

    exact_mappings:
      - prov:Activity

    slots:
      - source_url
      - scraped_timestamp
      - scrape_method
      - target_profile
      - target_name
      - connections_extracted
      - notes

    slot_usage:
      source_url:
        description: |
          URL of the LinkedIn page from which the connections were extracted.
          Usually a LinkedIn search results URL or a profile's connections page.
        slot_uri: prov:used
        range: uri
        required: true
        examples:
          - value: "https://www.linkedin.com/search/results/people/?network=%5B%22F%22%2C%22S%22%2C%22O%22%5D"
            description: "LinkedIn connection search URL"

      scraped_timestamp:
        description: |
          ISO 8601 timestamp of when the connections were extracted.
          Critical for tracking network changes over time.
        slot_uri: prov:endedAtTime
        range: datetime
        required: true
        examples:
          - value: "2025-12-09T22:00:00Z"

      scrape_method:
        description: |
          Method used to extract the connection data.

          Values:
          - manual_linkedin_browse: Manual extraction while logged in
          - linkedin_html_parser: Parsed from saved HTML file
          - exa_search: Extracted via Exa API
          - automated_scraper: Automated scraping tool
        slot_uri: prov:wasAssociatedWith
        range: ScrapeMethodEnum
        required: true
        examples:
          - value: "manual_linkedin_browse"

      target_profile:
        description: |
          LinkedIn slug of the profile whose connections were extracted.
          Format: lowercase letters, digits, and hyphens.
        slot_uri: dct:subject
        range: string
        required: true
        pattern: "^[a-z0-9-]+$"
        examples:
          - value: "giovannafossati"
          - value: "alexandr-belov-bb547b46"

      target_name:
        description: |
          Full display name of the target profile, i.e. the person whose
          connections were extracted.
        slot_uri: schema:name
        range: string
        required: true
        examples:
          - value: "Giovanna Fossati"
          - value: "Alexandr Belov"

      connections_extracted:
        description: |
          Total number of connections extracted from this source.
          Used for validation and completeness tracking.
        slot_uri: schema:numberOfItems
        range: integer
        required: true
        minimum_value: 0
        examples:
          - value: 776

      notes:
        description: |
          Optional notes about the extraction process.
          May reference raw source files or explain any issues.
        slot_uri: schema:description
        range: string
        examples:
          - value: "Raw scrape in giovannafossati_connections_20251209T220000Z_note-max100p-1st2nd3th.md"

    comments:
      - "Aligns with PROV-O Activity pattern"
      - "scraped_timestamp maps to prov:endedAtTime"
      - "target_profile is the LinkedIn slug being analyzed"

  NetworkAnalysis:
    class_uri: schema:DataFeedItem
    description: |
      Aggregated statistics about the connection network.

      Provides summary metrics for quick analysis:
      - Total connections extracted
      - Heritage-relevant count and percentage
      - Breakdown by heritage type (GLAMORCUBESFIXPHDNT)

      **Example**:
      ```json
      {
        "total_connections_extracted": 776,
        "heritage_relevant_count": 456,
        "heritage_relevant_percentage": 58.8,
        "connections_by_heritage_type": {
          "A": 45,
          "M": 89,
          "D": 112,
          "R": 78
        }
      }
      ```
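
      These statistics can be derived from the `connections` array. The sketch below
      assumes each connection is a dict with hypothetical `heritage_relevant` (bool)
      and `heritage_type` (single-letter code) keys. The actual field names are
      defined by PersonConnection. Because `connections_by_heritage_type` is declared
      with `inlined_as_list: true`, the sketch emits it as a list of
      HeritageTypeCount-shaped objects rather than the keyed object in the example.

      ```python
      from collections import Counter


      def summarize_network(connections: list[dict]) -> dict:
          """Compute NetworkAnalysis-shaped statistics from connection dicts."""
          total = len(connections)
          relevant = [c for c in connections if c.get("heritage_relevant")]
          by_type = Counter(c["heritage_type"] for c in relevant if c.get("heritage_type"))
          # Percentage on a 0-100 scale, rounded to one decimal as in the example.
          percentage = round(100 * len(relevant) / total, 1) if total else 0.0
          return {
              "total_connections_extracted": total,
              "heritage_relevant_count": len(relevant),
              "heritage_relevant_percentage": percentage,
              "connections_by_heritage_type": [
                  {"heritage_type_code": code, "count": n}
                  for code, n in sorted(by_type.items())
              ],
          }
      ```

      With 456 heritage-relevant connections out of 776, the percentage is
      round(100 * 456 / 776, 1) = 58.8, which matches the example.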

    slots:
      - total_connections_extracted
      - heritage_relevant_count
      - heritage_relevant_percentage
      - connections_by_heritage_type

    slot_usage:
      total_connections_extracted:
        description: "Total number of connections in the network"
        slot_uri: schema:numberOfItems
        range: integer
        required: true
        minimum_value: 0

      heritage_relevant_count:
        description: "Number of connections marked as heritage-relevant"
        slot_uri: hc:heritageRelevantCount
        range: integer
        required: true
        minimum_value: 0

      heritage_relevant_percentage:
        description: "Percentage of connections that are heritage-relevant (heritage_relevant_count / total_connections_extracted * 100, range 0-100)"
        slot_uri: hc:heritageRelevantPercentage
        range: float
        minimum_value: 0.0
        maximum_value: 100.0
        examples:
          - value: 58.8

      connections_by_heritage_type:
        description: |
          Breakdown of heritage-relevant connections by type code.
          Each entry is identified by a single-letter GLAMORCUBESFIXPHDNT code.
        slot_uri: hc:connectionsByHeritageType
        range: HeritageTypeCount
        multivalued: true
        inlined: true
        inlined_as_list: true

    comments:
      - "Optional aggregation: can be computed from the connections array"
      - "Useful for quick heritage sector analysis"

  HeritageTypeCount:
    class_uri: schema:PropertyValue
    description: |
      Count of connections for a specific heritage type.
      Used in network_analysis.connections_by_heritage_type.
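
      Source data may hold the breakdown as a keyed object, as in the NetworkAnalysis
      example (e.g. `{"A": 45}`), whereas this class models one entry per code. A
      minimal conversion sketch follows. The function name is hypothetical, and the
      code check reuses the `heritage_type_code` pattern declared below.

      ```python
      import re

      HERITAGE_CODE = re.compile(r"[GLAMORCUBESFIXPHDNT]")  # fullmatch == ^[...]$


      def to_heritage_type_counts(breakdown: dict[str, int]) -> list[dict]:
          """Convert {"A": 45, ...} into a list of HeritageTypeCount-shaped dicts."""
          result = []
          for code, count in breakdown.items():
              if not HERITAGE_CODE.fullmatch(code):
                  raise ValueError(f"invalid heritage type code: {code!r}")
              if count < 0:
                  raise ValueError(f"count for {code} must be non-negative")
              result.append({"heritage_type_code": code, "count": count})
          return result
      ```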

    slots:
      - heritage_type_code
      - count

    slot_usage:
      heritage_type_code:
        description: "Single-letter heritage type code (G,L,A,M,O,R,C,U,B,E,S,F,I,X,P,H,D,N,T)"
        slot_uri: schema:propertyID
        range: string
        required: true
        pattern: "^[GLAMORCUBESFIXPHDNT]$"

      count:
        description: "Number of connections of this heritage type"
        slot_uri: schema:value
        range: integer
        required: true
        minimum_value: 0

enums:
  ScrapeMethodEnum:
    description: |
      Methods used to extract LinkedIn connection data.
      The method determines data quality and potential limitations.
    permissible_values:
      manual_linkedin_browse:
        description: "Manual extraction while logged into LinkedIn"
        meaning: prov:Person
      linkedin_html_parser:
        description: "Parsed from saved LinkedIn HTML file"
        meaning: prov:SoftwareAgent
      exa_search:
        description: "Extracted via Exa API search"
        meaning: prov:SoftwareAgent
      automated_scraper:
        description: "Automated scraping tool"
        meaning: prov:SoftwareAgent

slots:
  source_metadata:
    description: "Provenance metadata about the extraction"
    range: ConnectionSourceMetadata

  connections:
    description: "Array of connection entries"
    range: PersonConnection
    multivalued: true

  network_analysis:
    description: "Aggregated network statistics"
    range: NetworkAnalysis

  source_url:
    description: "URL from which the data was extracted"
    range: uri

  scraped_timestamp:
    description: "When the extraction occurred"
    range: datetime

  scrape_method:
    description: "Method used for extraction"
    range: ScrapeMethodEnum

  target_profile:
    description: "LinkedIn slug of the target profile"
    range: string

  target_name:
    description: "Display name of the target profile"
    range: string

  connections_extracted:
    description: "Number of connections extracted"
    range: integer

  notes:
    description: "Optional notes about the extraction"
    range: string

  total_connections_extracted:
    description: "Total connection count"
    range: integer

  heritage_relevant_count:
    description: "Count of heritage-relevant connections"
    range: integer

  heritage_relevant_percentage:
    description: "Percentage of heritage-relevant connections"
    range: float

  connections_by_heritage_type:
    description: "Breakdown by heritage type code"
    range: HeritageTypeCount
    multivalued: true

  heritage_type_code:
    description: "Single-letter heritage type code"
    range: string

  count:
    description: "Count value"
    range: integer