LinkedIn Profile Privacy Handling Rule
🚨 CRITICAL: Store Basic Data for Inaccessible Profiles
When LinkedIn profiles return 403 errors due to privacy settings, store the available basic data with metadata explaining limited enrichment rather than skipping the profile entirely.
What Constitutes a 403 Error
- HTTP status code 403 from LinkedIn profile URLs
- "SOURCE_NOT_AVAILABLE" error tag from EXA API
- Profile accessible only to logged-in LinkedIn users
- Privacy-protected profiles
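The first two signals in the list above can be checked mechanically. The following is a minimal sketch, assuming the crawler exposes an HTTP status code and an error tag as plain values; the function name and its parameters are hypothetical and do not correspond to any real client API.

```python
from typing import Optional


def is_private_profile_error(http_status: Optional[int] = None,
                             error_tag: Optional[str] = None) -> bool:
    """Return True when a crawl result should be handled under this rule.

    Both arguments are assumptions about what the crawler reports:
    an HTTP status code and/or an error tag string.
    """
    # Signal 1: HTTP status code 403 from the LinkedIn profile URL.
    if http_status == 403:
        return True
    # Signal 2: the "SOURCE_NOT_AVAILABLE" error tag.
    return error_tag == "SOURCE_NOT_AVAILABLE"
```

The remaining two signals (login-only and privacy-protected profiles) are not directly observable from a status code, so this sketch treats them as already covered by the two checks.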
Required Data Structure for 403 Profiles
When a profile is inaccessible, create a JSON file with:
{
  "extraction_metadata": {
    "source_file": "path/to/staff_list",
    "staff_id": "unique_identifier",
    "extraction_date": "ISO_timestamp",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "full_profile_url",
    "cost_usd": 0,
    "request_id": "exa_request_id",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Full Name from staff list",
    "linkedin_url": "profile_url_from_staff_list",
    "headline": "Headline from staff list",
    "location": "Location from staff list (if available)",
    "heritage_relevant": true/false,
    "heritage_type": "A/L/M/E/D/G/O/R/C/U/B/E/S/F/I/X/P/H/D/N/T",
    "connections": "Connection count from staff list (if available)",
    "mutual_connections": "Mutual connections from staff list (if available)",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}
Field Mappings from Staff List
When a profile is inaccessible (403 error), use these mappings:
| Staff List Field | Profile Data Field | Notes |
|---|---|---|
| name | profile_data.name | Full name from staff list |
| headline | profile_data.headline | Professional headline |
| degree | NOT stored | Connection degree, not profile attribute |
| mutual_connections | profile_data.mutual_connections | If available |
| heritage_relevant | profile_data.heritage_relevant | Heritage relevance flag |
| heritage_type | profile_data.heritage_type | Heritage institution type |
| linkedin_profile_url | profile_data.linkedin_url | Profile URL |
| linkedin_slug | NOT stored | Used only for filename generation |
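The mapping table can be expressed as a small lookup. This is a sketch only, assuming a staff list entry is available as a dictionary keyed by the staff list field names; `FIELD_MAP` and `map_staff_entry` are hypothetical names introduced here for illustration.

```python
from typing import Any, Dict

# Staff list field -> profile_data field, following the mapping table.
# `degree` and `linkedin_slug` are deliberately absent: they are NOT stored.
FIELD_MAP: Dict[str, str] = {
    "name": "name",
    "headline": "headline",
    "mutual_connections": "mutual_connections",
    "heritage_relevant": "heritage_relevant",
    "heritage_type": "heritage_type",
    "linkedin_profile_url": "linkedin_url",
}


def map_staff_entry(staff_entry: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only the mapped fields; fields missing from the entry become None."""
    return {target: staff_entry.get(source) for source, target in FIELD_MAP.items()}
```

Using an explicit allow-list means unmapped staff list fields are dropped by default, which matches the "NOT stored" rows in the table.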
Null/Empty Values for Inaccessible Profiles
Set these fields to null or empty arrays when the profile is inaccessible:
- about - null - No profile summary available
- experience - [] - Cannot extract work history
- education - [] - Cannot extract education history
- skills - [] - Cannot extract skills
- languages - [] - Cannot extract languages
- heritage_relevant_experience - [] - Cannot tag specific roles
- profile_image_url - null - Cannot access profile photos
- photo_urls - null - Cannot access profile photos
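A record can be checked against this list before it is written. The sketch below is one possible check, assuming `profile_data` has been loaded as a Python dictionary; the function and constant names are hypothetical.

```python
from typing import Any, Dict, List

# Fields that must be null for an inaccessible profile.
NULL_FIELDS = ("about", "profile_image_url", "photo_urls")
# Fields that must be present as empty arrays.
EMPTY_LIST_FIELDS = (
    "experience",
    "education",
    "skills",
    "languages",
    "heritage_relevant_experience",
)


def find_unexpected_values(profile_data: Dict[str, Any]) -> List[str]:
    """Return the names of fields that violate the null/empty convention."""
    problems = []
    for field in NULL_FIELDS:
        # A missing key is treated the same as null in this sketch.
        if profile_data.get(field) is not None:
            problems.append(field)
    for field in EMPTY_LIST_FIELDS:
        # A missing key is reported: the array must be present and empty.
        if profile_data.get(field) != []:
            problems.append(field)
    return problems
```

An empty return value means the record follows the convention; any names returned point at fields that were filled in or omitted by mistake.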
Extraction Error Metadata
Always include detailed error metadata:
"extraction_error": {
  "error_type": "HTTP_403_PRIVATE_PROFILE",
  "error_message": "LinkedIn profile not accessible due to privacy settings",
  "http_status": 403,
  "occurred_on": "2025-12-13T16:00:00Z",
  "retry_possible": false,
  "data_source": "staff_list_only"
}
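To keep this block identical across files, it may help to generate it in one place. The following is a minimal sketch, assuming the error time is available as a timezone-aware `datetime`; `build_extraction_error` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_extraction_error(occurred_on: Optional[datetime] = None) -> Dict[str, Any]:
    """Build the extraction_error block for a privacy-protected profile.

    Pass a timezone-aware datetime; a naive one would be interpreted as
    local time by astimezone(). Defaults to the current UTC time.
    """
    moment = occurred_on or datetime.now(timezone.utc)
    return {
        "error_type": "HTTP_403_PRIVATE_PROFILE",
        "error_message": "LinkedIn profile not accessible due to privacy settings",
        "http_status": 403,
        # Same timestamp shape as the example: 2025-12-13T16:00:00Z
        "occurred_on": moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retry_possible": False,
        "data_source": "staff_list_only",
    }
```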
File Naming Convention
Use the same naming convention as accessible profiles:
{linkedin-slug}_{ISO-timestamp}.json
Example: anne-kool_20251213T160000Z.json
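The filename can be derived from the slug and the extraction time. This is a sketch under the assumption that the timestamp is written in UTC in the compact form shown in the example; `profile_filename` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def profile_filename(linkedin_slug: str, moment: datetime) -> str:
    """Return {linkedin-slug}_{ISO-timestamp}.json for a timezone-aware datetime."""
    # Compact ISO 8601 form in UTC, e.g. 20251213T160000Z
    stamp = moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{linkedin_slug}_{stamp}.json"
```

Calling `profile_filename("anne-kool", datetime(2025, 12, 13, 16, 0, 0, tzinfo=timezone.utc))` yields `anne-kool_20251213T160000Z.json`, matching the example above.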
Rationale
- Data Preservation: Even basic data (name, role, heritage relevance) is valuable for network analysis
- Transparency: Clear documentation of why full enrichment wasn't possible
- Consistency: Same file structure as accessible profiles with null values for missing data
- Future Re-attempt: Metadata records whether a retry might be possible (generally not for 403 errors)
- Network Analysis: Basic connection data enables heritage sector relationship mapping
Implementation
When encountering a 403 error:
- Create JSON file with structure above
- Use staff list data for available fields
- Set all extracted fields to null/empty where appropriate
- Include comprehensive error metadata
- Continue with next profile
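The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the pipeline's actual code: `store_inaccessible_profile` is a hypothetical function, `staff_entry` is assumed to be a dictionary keyed by staff list field names (including optional `location` and `connections` keys), and `metadata` is assumed to carry the remaining `extraction_metadata` values such as `source_file`, `staff_id`, and `request_id`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def store_inaccessible_profile(staff_entry: Dict[str, Any],
                               metadata: Dict[str, Any],
                               output_dir: Path,
                               moment: datetime) -> Path:
    """Write the basic-data JSON file for a 403 profile and return its path."""
    utc = moment.astimezone(timezone.utc)  # expects a timezone-aware datetime
    record = {
        "extraction_metadata": {
            **metadata,
            "extraction_error": {
                "error_type": "HTTP_403_PRIVATE_PROFILE",
                "error_message": "LinkedIn profile not accessible due to privacy settings",
                "http_status": 403,
                "occurred_on": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "retry_possible": False,
                "data_source": "staff_list_only",
            },
        },
        "profile_data": {
            # Fields taken from the staff list (None when unavailable).
            "name": staff_entry.get("name"),
            "linkedin_url": staff_entry.get("linkedin_profile_url"),
            "headline": staff_entry.get("headline"),
            "location": staff_entry.get("location"),
            "heritage_relevant": staff_entry.get("heritage_relevant"),
            "heritage_type": staff_entry.get("heritage_type"),
            "connections": staff_entry.get("connections"),
            "mutual_connections": staff_entry.get("mutual_connections"),
            # Fields that could not be extracted.
            "about": None,
            "experience": [],
            "education": [],
            "skills": [],
            "languages": [],
            "heritage_relevant_experience": [],
            "profile_image_url": None,
            "photo_urls": None,
        },
    }
    # linkedin_slug is used only for the filename, never stored in the record.
    filename = f"{staff_entry['linkedin_slug']}_{utc.strftime('%Y%m%dT%H%M%SZ')}.json"
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / filename
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

The caller would invoke this once per inaccessible profile and then move on to the next staff list entry, so a 403 never interrupts the batch.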
Example Output
{
  "extraction_metadata": {
    "source_file": "data/custodian/person/affiliated/parsed/the-dutch-inspectorate-of-education_staff_20251210T155416Z.json",
    "staff_id": "the-dutch-inspectorate-of-education_staff_0098_anne_kool",
    "extraction_date": "2025-12-13T16:00:00Z",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "cost_usd": 0,
    "request_id": "1887bedfed30b7ab01175de94996b54b",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Anne Kool",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "headline": "Student aan Tilburg University",
    "location": null,
    "heritage_relevant": true,
    "heritage_type": "E",
    "connections": null,
    "mutual_connections": "",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}
This rule ensures that even privacy-protected profiles contribute to the heritage sector dataset while maintaining transparency about data limitations.