# Connection Network Class
# Collection of LinkedIn connections with source metadata

id: https://nde.nl/ontology/hc/class/ConnectionNetwork
name: connection_network_class
title: Connection Network Class
version: 1.0.0

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
  xsd: http://www.w3.org/2001/XMLSchema#

imports:
  - linkml:types
  - ../metadata
  - ./PersonConnection

default_range: string

classes:

  ConnectionNetwork:
    class_uri: schema:ItemList
    tree_root: true
    description: |
      Collection of LinkedIn network connections with source metadata.

      This is the root class for connection JSON files stored at:
      `data/custodian/person/connection/bu/{linkedin_slug}_connections_{timestamp}.json`

      Each file contains:
      - **source_metadata**: Provenance about the extraction (who, when, how)
      - **connections**: Array of PersonConnection entries (the actual network data)
      - **network_analysis**: Optional aggregated statistics

      **Use Cases**:
      - Heritage sector network analysis
      - Cross-custodian relationship discovery
      - Staff member connection patterns
      - Professional community mapping

      **File Naming Convention**:
      `{linkedin-slug}_connections_{ISO-timestamp}.json`

      Example: `giovannafossati_connections_20251209T220000Z.json`

      **Example JSON Structure**:
      ```json
      {
        "source_metadata": {
          "source_url": "https://www.linkedin.com/search/results/people/...",
          "scraped_timestamp": "2025-12-09T22:00:00Z",
          "scrape_method": "manual_linkedin_browse",
          "target_profile": "giovannafossati",
          "target_name": "Giovanna Fossati",
          "connections_extracted": 776
        },
        "connections": [
          {
            "connection_id": "...",
            "name": "...",
            ...
          }
        ],
        "network_analysis": {
          "total_connections_extracted": 776,
          "heritage_relevant_count": 456,
          "heritage_relevant_percentage": 58.8
        }
      }
      ```
    exact_mappings:
      - schema:ItemList
    close_mappings:
      - prov:Collection
    slots:
      - source_metadata
      - connections
      - network_analysis
    slot_usage:
      source_metadata:
        description: "Provenance metadata about the connection extraction"
        range: ConnectionSourceMetadata
        required: true
        inlined: true
      connections:
        description: "Array of connection entries from the LinkedIn network"
        range: PersonConnection
        required: true
        multivalued: true
        inlined: true
        inlined_as_list: true
      network_analysis:
        description: "Aggregated statistics about the connection network"
        range: NetworkAnalysis
        inlined: true
    comments:
      - "Root class for connection network JSON files (tree_root: true)"
      - "Per AGENTS.md Rule 15: ALL connections must be fully registered"
      - "Enables heritage sector network analysis"
      - "File naming: {linkedin-slug}_connections_{timestamp}.json"
    see_also:
      - "https://schema.org/ItemList"

  ConnectionSourceMetadata:
    class_uri: prov:Activity
    description: |
      Provenance metadata about how the connections were extracted.

      Records the extraction context including:
      - Source URL (LinkedIn search or profile page)
      - When the extraction occurred
      - Which method was used (manual browse, automated scrape)
      - Target profile being analyzed
      - Count of connections extracted

      **Scrape Methods**:
      - manual_linkedin_browse: Manual copy-paste while logged in
      - linkedin_html_parser: Parsed from saved HTML file
      - exa_search: Extracted via Exa API
      - automated_scraper: Automated scraping tool
    exact_mappings:
      - prov:Activity
    slots:
      - source_url
      - scraped_timestamp
      - scrape_method
      - target_profile
      - target_name
      - connections_extracted
      - notes
    slot_usage:
      source_url:
        description: |
          URL of the LinkedIn page from which connections were extracted.
          Usually a LinkedIn search results URL or profile connections page.
        slot_uri: prov:used
        range: uri
        required: true
        examples:
          - value: "https://www.linkedin.com/search/results/people/?network=%5B%22F%22%2C%22S%22%2C%22O%22%5D"
            description: "LinkedIn connection search URL"
      scraped_timestamp:
        description: |
          ISO 8601 timestamp when the connections were extracted.
          Critical for tracking network changes over time.
        slot_uri: prov:endedAtTime
        range: datetime
        required: true
        examples:
          - value: "2025-12-09T22:00:00Z"
      scrape_method:
        description: |
          Method used to extract the connection data.

          Values:
          - manual_linkedin_browse: Manual extraction while logged in
          - linkedin_html_parser: Parsed from saved HTML file
          - exa_search: Extracted via Exa API
          - automated_scraper: Automated scraping tool
        slot_uri: prov:wasAssociatedWith
        range: ScrapeMethodEnum
        required: true
        examples:
          - value: "manual_linkedin_browse"
      target_profile:
        description: |
          LinkedIn slug of the profile whose connections were extracted.
          Format: lowercase alphanumeric with hyphens.
        slot_uri: dct:subject
        range: string
        required: true
        pattern: "^[a-z0-9-]+$"
        examples:
          - value: "giovannafossati"
          - value: "alexandr-belov-bb547b46"
      target_name:
        description: |
          Full display name of the target profile.
          The person whose connections were extracted.
        slot_uri: schema:name
        range: string
        required: true
        examples:
          - value: "Giovanna Fossati"
          - value: "Alexandr Belov"
      connections_extracted:
        description: |
          Total number of connections extracted from this source.
          Used for validation and completeness tracking.
        slot_uri: schema:numberOfItems
        range: integer
        required: true
        minimum_value: 0
        examples:
          - value: 776
      notes:
        description: |
          Optional notes about the extraction process.
          May reference raw source files or explain any issues.
        slot_uri: schema:description
        range: string
        examples:
          - value: "Raw scrape in giovannafossati_connections_20251209T220000Z_note-max100p-1st2nd3th.md"
    comments:
      - "Aligns with PROV-O Activity pattern"
      - "scraped_timestamp maps to prov:endedAtTime"
      - "target_profile is the LinkedIn slug being analyzed"

  NetworkAnalysis:
    class_uri: schema:DataFeedItem
    description: |
      Aggregated statistics about the connection network.

      Provides summary metrics for quick analysis:
      - Total connections extracted
      - Heritage-relevant count and percentage
      - Breakdown by heritage type (GLAMORCUBESFIXPHDNT)

      **Example**:
      ```json
      {
        "total_connections_extracted": 776,
        "heritage_relevant_count": 456,
        "heritage_relevant_percentage": 58.8,
        "connections_by_heritage_type": {
          "A": 45,
          "M": 89,
          "D": 112,
          "R": 78
        }
      }
      ```
    slots:
      - total_connections_extracted
      - heritage_relevant_count
      - heritage_relevant_percentage
      - connections_by_heritage_type
    slot_usage:
      total_connections_extracted:
        description: "Total number of connections in the network"
        slot_uri: schema:numberOfItems
        range: integer
        required: true
        minimum_value: 0
      heritage_relevant_count:
        description: "Number of connections marked as heritage-relevant"
        slot_uri: hc:heritageRelevantCount
        range: integer
        required: true
        minimum_value: 0
      heritage_relevant_percentage:
        description: "Percentage of connections that are heritage-relevant (0-100)"
        slot_uri: hc:heritageRelevantPercentage
        range: float
        minimum_value: 0.0
        maximum_value: 100.0
        examples:
          - value: 58.8
      connections_by_heritage_type:
        description: |
          Breakdown of heritage-relevant connections by type code.
          Keys are single-letter GLAMORCUBESFIXPHDNT codes.
        slot_uri: hc:connectionsByHeritageType
        range: HeritageTypeCount
        multivalued: true
        inlined: true
        inlined_as_list: true
    comments:
      - "Optional aggregation - can be computed from connections array"
      - "Useful for quick heritage sector analysis"

  HeritageTypeCount:
    class_uri: schema:PropertyValue
    description: |
      Count of connections for a specific heritage type.
      Used in network_analysis.connections_by_heritage_type.
    slots:
      - heritage_type_code
      - count
    slot_usage:
      heritage_type_code:
        description: "Single-letter heritage type code (G,L,A,M,O,R,C,U,B,E,S,F,I,X,P,H,D,N,T)"
        slot_uri: schema:propertyID
        range: string
        required: true
        pattern: "^[GLAMORCUBESFIXPHDNT]$"
      count:
        description: "Number of connections of this heritage type"
        slot_uri: schema:value
        range: integer
        required: true
        minimum_value: 0

enums:

  ScrapeMethodEnum:
    description: |
      Methods used to extract LinkedIn connection data.
      Determines data quality and potential limitations.
    permissible_values:
      manual_linkedin_browse:
        description: "Manual extraction while logged into LinkedIn"
        meaning: prov:SoftwareAgent
      linkedin_html_parser:
        description: "Parsed from saved LinkedIn HTML file"
        meaning: prov:SoftwareAgent
      exa_search:
        description: "Extracted via Exa API search"
        meaning: prov:SoftwareAgent
      automated_scraper:
        description: "Automated scraping tool"
        meaning: prov:SoftwareAgent

slots:

  source_metadata:
    description: "Provenance metadata about the extraction"
    range: ConnectionSourceMetadata

  connections:
    description: "Array of connection entries"
    range: PersonConnection
    multivalued: true

  network_analysis:
    description: "Aggregated network statistics"
    range: NetworkAnalysis

  source_url:
    description: "URL where data was extracted from"
    range: uri

  scraped_timestamp:
    description: "When the extraction occurred"
    range: datetime

  scrape_method:
    description: "Method used for extraction"
    range: ScrapeMethodEnum

  target_profile:
    description: "LinkedIn slug of target profile"
    range: string

  target_name:
    description: "Display name of target profile"
    range: string

  connections_extracted:
    description: "Number of connections extracted"
    range: integer

  notes:
    description: "Optional notes about extraction"
    range: string

  total_connections_extracted:
    description: "Total connection count"
    range: integer

  heritage_relevant_count:
    description: "Count of heritage-relevant connections"
    range: integer

  heritage_relevant_percentage:
    description: "Percentage of heritage-relevant connections"
    range: float

  connections_by_heritage_type:
    description: "Breakdown by heritage type code"
    range: HeritageTypeCount
    multivalued: true

  heritage_type_code:
    description: "Single-letter heritage type code"
    range: string

  count:
    description: "Count value"
    range: integer
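
The constraints declared above (required slots, the `target_profile` pattern, the `ScrapeMethodEnum` values, and the 0-100 percentage range) can be spot-checked on a parsed connection file before running full schema validation. The following is a minimal Python sketch under stated assumptions: `check_connection_network` is a hypothetical helper, not part of this schema or of any LinkML tooling, and it hand-codes only a subset of the rules rather than reading them from the schema.

```python
import re

# Constraints copied by hand from the schema above (assumption: kept in sync manually).
SLUG_PATTERN = re.compile(r"^[a-z0-9-]+$")  # target_profile pattern
SCRAPE_METHODS = {  # ScrapeMethodEnum permissible values
    "manual_linkedin_browse",
    "linkedin_html_parser",
    "exa_search",
    "automated_scraper",
}
REQUIRED_METADATA = (  # slots marked required in ConnectionSourceMetadata
    "source_url", "scraped_timestamp", "scrape_method",
    "target_profile", "target_name", "connections_extracted",
)


def check_connection_network(doc):
    """Return a list of problems found in a parsed connection file (hypothetical helper)."""
    problems = []

    meta = doc.get("source_metadata")
    if not isinstance(meta, dict):
        problems.append("source_metadata is required")
        meta = {}
    else:
        for field in REQUIRED_METADATA:
            if field not in meta:
                problems.append(f"source_metadata.{field} is required")

    # Pattern and enum checks only run when the value is present.
    slug = meta.get("target_profile")
    if slug is not None and not SLUG_PATTERN.match(str(slug)):
        problems.append(f"target_profile {slug!r} does not match ^[a-z0-9-]+$")

    method = meta.get("scrape_method")
    if method is not None and method not in SCRAPE_METHODS:
        problems.append(f"scrape_method {method!r} is not in ScrapeMethodEnum")

    extracted = meta.get("connections_extracted")
    if extracted is not None and (not isinstance(extracted, int) or extracted < 0):
        problems.append("connections_extracted must be an integer >= 0")

    if not isinstance(doc.get("connections"), list):
        problems.append("connections is required and must be an array")

    # network_analysis is optional; only the percentage range is checked here.
    analysis = doc.get("network_analysis")
    if isinstance(analysis, dict):
        pct = analysis.get("heritage_relevant_percentage")
        if pct is not None and not 0.0 <= pct <= 100.0:
            problems.append("heritage_relevant_percentage must be within 0-100")

    return problems
```

A caller would typically load a file with `json.load` and pass the resulting dictionary to the helper; an empty list means none of the checked rules was violated, not that the file is fully schema-valid.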