# LinkedIn Profile Privacy Handling Rule

## 🚨 CRITICAL: Store Basic Data for Inaccessible Profiles

**When LinkedIn profiles return 403 errors due to privacy settings, store the available basic data with metadata explaining limited enrichment rather than skipping the profile entirely.**

### What Constitutes a 403 Error

- HTTP status code 403 from LinkedIn profile URLs
- "SOURCE_NOT_AVAILABLE" error tag from EXA API
- Profile accessible only to logged-in LinkedIn users
- Privacy-protected profiles

### Required Data Structure for 403 Profiles

When a profile is inaccessible, create a JSON file with the following structure. String values describe what belongs in each field.

```json
{
  "extraction_metadata": {
    "source_file": "path/to/staff_list",
    "staff_id": "unique_identifier",
    "extraction_date": "ISO_timestamp",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "full_profile_url",
    "cost_usd": 0,
    "request_id": "exa_request_id",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Full Name from staff list",
    "linkedin_url": "profile_url_from_staff_list",
    "headline": "Headline from staff list",
    "location": "Location from staff list (if available)",
    "heritage_relevant": true,
    "heritage_type": "A/L/M/E/D/G/O/R/C/U/B/E/S/F/I/X/P/H/D/N/T",
    "connections": "Connection count from staff list (if available)",
    "mutual_connections": "Mutual connections from staff list (if available)",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}
```

### Field Mappings from Staff List

When a profile is inaccessible (403 error), use these mappings:

| Staff List Field | Profile Data Field | Notes |
|------------------|--------------------|-------|
| `name` | `profile_data.name` | Full name from staff list |
| `headline` | `profile_data.headline` | Professional headline |
| `degree` | NOT stored | Connection degree, not a profile attribute |
| `mutual_connections` | `profile_data.mutual_connections` | If available |
| `heritage_relevant` | `profile_data.heritage_relevant` | Heritage relevance flag (boolean, `true` or `false`) |
| `heritage_type` | `profile_data.heritage_type` | Heritage institution type |
| `linkedin_profile_url` | `profile_data.linkedin_url` | Profile URL |
| `linkedin_slug` | NOT stored | Used only for filename generation |

### Null/Empty Values for Inaccessible Profiles

Set these fields to `null` or empty arrays when a profile is inaccessible:

- `about` - `null` - No profile summary available
- `experience` - `[]` - Cannot extract work history
- `education` - `[]` - Cannot extract education history
- `skills` - `[]` - Cannot extract skills
- `languages` - `[]` - Cannot extract languages
- `heritage_relevant_experience` - `[]` - Cannot tag specific roles
- `profile_image_url` - `null` - Cannot access profile photos
- `photo_urls` - `null` - Cannot access profile photos

### Extraction Error Metadata

Always include detailed error metadata inside `extraction_metadata`:

```json
"extraction_error": {
  "error_type": "HTTP_403_PRIVATE_PROFILE",
  "error_message": "LinkedIn profile not accessible due to privacy settings",
  "http_status": 403,
  "occurred_on": "2025-12-13T16:00:00Z",
  "retry_possible": false,
  "data_source": "staff_list_only"
}
```

### File Naming Convention

Use the same naming convention as accessible profiles:

```
{linkedin-slug}_{ISO-timestamp}.json
```

Example: `anne-kool_20251213T160000Z.json`

### Rationale

1. **Data Preservation**: Even basic data (name, role, heritage relevance) is valuable for network analysis
2. **Transparency**: Clear documentation of why full enrichment wasn't possible
3. **Consistency**: Same file structure as accessible profiles, with null values for missing data
4. **Future Re-attempt**: Metadata indicates whether a retry might be possible (generally not for 403 errors)
5. **Network Analysis**: Basic connection data enables heritage sector relationship mapping

### Implementation

When encountering a 403 error:

1. Create a JSON file with the structure above
2. Use staff list data for the available fields
3. Set every field that would normally be extracted from the profile to `null` or an empty array, as listed above
4. Include comprehensive error metadata
5. Continue with the next profile

### Example Output

```json
{
  "extraction_metadata": {
    "source_file": "data/custodian/person/affiliated/parsed/the-dutch-inspectorate-of-education_staff_20251210T155416Z.json",
    "staff_id": "the-dutch-inspectorate-of-education_staff_0098_anne_kool",
    "extraction_date": "2025-12-13T16:00:00Z",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "cost_usd": 0,
    "request_id": "1887bedfed30b7ab01175de94996b54b",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Anne Kool",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "headline": "Student aan Tilburg University",
    "location": null,
    "heritage_relevant": true,
    "heritage_type": "E",
    "connections": null,
    "mutual_connections": "",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}
```

This rule ensures that even privacy-protected profiles contribute to the heritage sector dataset while maintaining transparency about data limitations.
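
### Implementation Sketch (Python)

The implementation steps in this rule can be sketched in Python roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names `build_403_record` and `record_filename` are hypothetical, and the staff list keys `staff_id`, `location`, and `connections` are assumptions, since the field mapping table does not name them. The remaining keys and constant values are taken from the structure described in this rule.

```python
import json
from datetime import datetime, timezone


def build_403_record(staff_entry, source_file, request_id, occurred_on):
    """Build the fallback record for a profile that returned HTTP 403.

    `occurred_on` must be a timezone-aware datetime.
    """
    timestamp = occurred_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    url = staff_entry["linkedin_profile_url"]
    return {
        "extraction_metadata": {
            "source_file": source_file,
            "staff_id": staff_entry.get("staff_id"),  # assumed key name
            "extraction_date": timestamp,
            "extraction_method": "exa_crawling_exa",
            "extraction_agent": "claude-opus-4.5",
            "linkedin_url": url,
            "cost_usd": 0,
            "request_id": request_id,
            "extraction_error": {
                "error_type": "HTTP_403_PRIVATE_PROFILE",
                "error_message": "LinkedIn profile not accessible due to privacy settings",
                "http_status": 403,
                "occurred_on": timestamp,
                "retry_possible": False,
                "data_source": "staff_list_only",
            },
        },
        "profile_data": {
            # Fields copied from the staff list; `degree` and
            # `linkedin_slug` are deliberately not stored.
            "name": staff_entry["name"],
            "linkedin_url": url,
            "headline": staff_entry.get("headline"),
            "location": staff_entry.get("location"),  # assumed key name
            "heritage_relevant": staff_entry.get("heritage_relevant"),
            "heritage_type": staff_entry.get("heritage_type"),
            "connections": staff_entry.get("connections"),  # assumed key name
            "mutual_connections": staff_entry.get("mutual_connections"),
            # Fields that would require access to the profile itself.
            "about": None,
            "experience": [],
            "education": [],
            "skills": [],
            "languages": [],
            "heritage_relevant_experience": [],
            "profile_image_url": None,
            "photo_urls": None,
        },
    }


def record_filename(linkedin_slug, occurred_on):
    """Return `{linkedin-slug}_{ISO-timestamp}.json` with a compact UTC timestamp."""
    compact = occurred_on.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{linkedin_slug}_{compact}.json"


if __name__ == "__main__":
    staff = {
        "name": "Anne Kool",
        "headline": "Student aan Tilburg University",
        "heritage_relevant": True,
        "heritage_type": "E",
        "mutual_connections": "",
        "linkedin_profile_url": "https://www.linkedin.com/in/anne-kool",
        "linkedin_slug": "anne-kool",
    }
    when = datetime(2025, 12, 13, 16, 0, 0, tzinfo=timezone.utc)
    record = build_403_record(staff, "path/to/staff_list", "exa_request_id", when)
    print(record_filename(staff["linkedin_slug"], when))
    print(json.dumps(record, indent=2, ensure_ascii=False))
```

Keeping the record builder separate from the filename helper means the caller decides where and whether to write the file, and can move on to the next profile regardless of the outcome.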