# LinkedIn Profile Privacy Handling Rule

## 🚨 CRITICAL: Store Basic Data for Inaccessible Profiles

**When a LinkedIn profile returns a 403 error because of its privacy settings, store the basic data already available from the staff list, together with metadata explaining the limited enrichment, rather than skipping the profile entirely.**

### What Constitutes a 403 Error

- HTTP status code 403 from a LinkedIn profile URL
- `SOURCE_NOT_AVAILABLE` error tag from the EXA API
- Profile accessible only to logged-in LinkedIn users
- Privacy-protected profile

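A minimal Python sketch of the check. It assumes EXA reports a per-URL failure as a status dict whose `error` object carries `tag` and `httpStatusCode` keys; those key names and the function name `is_inaccessible_profile` are assumptions to verify against the responses you actually receive. The sketch only covers the two machine-readable signals; login-only and privacy-protected profiles are assumed to surface through one of them.

```python
from typing import Any, Mapping


def is_inaccessible_profile(status: Mapping[str, Any]) -> bool:
    """Return True if a crawl status indicates a privacy-restricted profile.

    Assumed (hypothetical) shape, to be checked against real EXA responses:
        {"id": "...", "status": "error",
         "error": {"tag": "SOURCE_NOT_AVAILABLE", "httpStatusCode": 403}}
    """
    # Treat a missing or null error object as "no error".
    error = status.get("error") or {}
    # Either signal is enough: an explicit HTTP 403 or the "source not available" tag.
    return error.get("httpStatusCode") == 403 or error.get("tag") == "SOURCE_NOT_AVAILABLE"
```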
### Required Data Structure for 403 Profiles

When a profile is inaccessible, create a JSON file with the following structure. String values are placeholders describing where each value comes from; `heritage_relevant` takes a JSON boolean (`true` or `false`).

```json
{
  "extraction_metadata": {
    "source_file": "path/to/staff_list",
    "staff_id": "unique_identifier",
    "extraction_date": "ISO_timestamp",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "full_profile_url",
    "cost_usd": 0,
    "request_id": "exa_request_id",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "ISO_timestamp",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Full Name from staff list",
    "linkedin_url": "profile_url_from_staff_list",
    "headline": "Headline from staff list",
    "location": "Location from staff list (if available)",
    "heritage_relevant": true/false,
    "heritage_type": "A/L/M/E/D/G/O/R/C/U/B/S/F/I/X/P/H/N/T",
    "connections": "Connection count from staff list (if available)",
    "mutual_connections": "Mutual connections from staff list (if available)",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}
```

### Field Mappings from Staff List

When a profile is inaccessible (403 error), use these mappings:

| Staff List Field | Profile Data Field | Notes |
|------------------|--------------------|-------|
| `name` | `profile_data.name` | Full name from staff list |
| `headline` | `profile_data.headline` | Professional headline |
| `degree` | NOT stored | Connection degree, not a profile attribute |
| `mutual_connections` | `profile_data.mutual_connections` | If available |
| `heritage_relevant` | `profile_data.heritage_relevant` | Heritage relevance flag |
| `heritage_type` | `profile_data.heritage_type` | Heritage institution type |
| `linkedin_profile_url` | `profile_data.linkedin_url` | Profile URL |
| `linkedin_slug` | NOT stored | Used only for filename generation |

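The table can be expressed as a lookup dictionary. This sketch (hypothetical names `STAFF_TO_PROFILE_FIELDS` and `map_staff_fields`) assumes each staff-list entry is a dict keyed by the staff-list field names above, and it chooses to emit `None` for a mapped field the entry lacks; unmapped keys such as `degree` and `linkedin_slug` are simply never copied.

```python
from typing import Any, Mapping

# Staff-list field -> profile_data field, per the mapping table.
# `degree` and `linkedin_slug` are deliberately absent: they are not stored.
STAFF_TO_PROFILE_FIELDS = {
    "name": "name",
    "headline": "headline",
    "mutual_connections": "mutual_connections",
    "heritage_relevant": "heritage_relevant",
    "heritage_type": "heritage_type",
    "linkedin_profile_url": "linkedin_url",
}


def map_staff_fields(staff_entry: Mapping[str, Any]) -> dict:
    """Copy mapped staff-list values into a partial profile_data dict.

    A mapped field missing from the entry becomes None (an assumption of
    this sketch), so every mapped key is always present in the output.
    """
    return {
        profile_key: staff_entry.get(staff_key)
        for staff_key, profile_key in STAFF_TO_PROFILE_FIELDS.items()
    }
```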
### Null/Empty Values for Inaccessible Profiles

Set these fields to `null` or an empty array when the profile is inaccessible:

- `about`: `null` (no profile summary available)
- `experience`: `[]` (work history cannot be extracted)
- `education`: `[]` (education history cannot be extracted)
- `skills`: `[]` (skills cannot be extracted)
- `languages`: `[]` (languages cannot be extracted)
- `heritage_relevant_experience`: `[]` (no individual roles to tag)
- `profile_image_url`: `null` (profile photo not accessible)
- `photo_urls`: `null` (profile photos not accessible)

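A small helper, hypothetically named `empty_enrichment_fields`, can supply these defaults. Building a fresh dict on every call matters in Python: a shared module-level dict would let one record's `skills` list leak into every other record.

```python
from typing import Any


def empty_enrichment_fields() -> dict:
    """Placeholders for fields that can only come from the profile page itself."""
    # A new dict (with new lists) on every call, so records never share mutable state.
    fields: dict[str, Any] = {
        "about": None,
        "experience": [],
        "education": [],
        "skills": [],
        "languages": [],
        "heritage_relevant_experience": [],
        "profile_image_url": None,
        "photo_urls": None,
    }
    return fields
```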
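### Extraction Error Metadata

Always include the complete error block, nested under `extraction_metadata`:

```json
"extraction_error": {
  "error_type": "HTTP_403_PRIVATE_PROFILE",
  "error_message": "LinkedIn profile not accessible due to privacy settings",
  "http_status": 403,
  "occurred_on": "2025-12-13T16:00:00Z",
  "retry_possible": false,
  "data_source": "staff_list_only"
}
```

`retry_possible` is `false` because a retry is generally not expected to succeed for a 403, and `data_source: "staff_list_only"` marks every `profile_data` value as coming from the staff list rather than from the profile page.
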
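A sketch of a builder for this block (the function name `build_extraction_error` is hypothetical). It formats `occurred_on` in UTC with a trailing `Z`, matching the timestamps used throughout this rule:

```python
from datetime import datetime, timezone


def build_extraction_error(occurred_on: datetime) -> dict:
    """Build the extraction_error block for a privacy-restricted (403) profile."""
    return {
        "error_type": "HTTP_403_PRIVATE_PROFILE",
        "error_message": "LinkedIn profile not accessible due to privacy settings",
        "http_status": 403,
        # ISO 8601 in UTC with a trailing "Z", e.g. "2025-12-13T16:00:00Z".
        "occurred_on": occurred_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # 403s are generally not retryable, so no retry is flagged.
        "retry_possible": False,
        # Only the staff list contributed data to this record.
        "data_source": "staff_list_only",
    }
```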
### File Naming Convention

Use the same naming convention as accessible profiles, where the slug is the staff list's `linkedin_slug` value:

```
{linkedin-slug}_{ISO-timestamp}.json
```

Example: `anne-kool_20251213T160000Z.json`

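A sketch of the naming rule, assuming the compact UTC timestamp format (`YYYYMMDDTHHMMSSZ`) seen in the example; `profile_filename` is a hypothetical name:

```python
from datetime import datetime, timezone


def profile_filename(linkedin_slug: str, when: datetime) -> str:
    """Return '{linkedin-slug}_{ISO-timestamp}.json' with a compact UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{linkedin_slug}_{stamp}.json"


print(profile_filename("anne-kool", datetime(2025, 12, 13, 16, 0, tzinfo=timezone.utc)))
# anne-kool_20251213T160000Z.json
```

Converting to UTC before formatting keeps filenames sortable and unambiguous even if the pipeline runs in a different local time zone.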
### Rationale

1. **Data Preservation**: Even basic data (name, headline, heritage relevance) is valuable for network analysis.
2. **Transparency**: The error metadata records why full enrichment was not possible.
3. **Consistency**: Files use the same structure as accessible profiles, with `null` or empty values for missing data.
4. **Future Re-attempt**: `retry_possible` indicates whether a retry might succeed (generally not for 403 errors).
5. **Network Analysis**: Basic connection data enables heritage sector relationship mapping.

### Implementation

When encountering a 403 error:

1. Create a JSON file with the structure above.
2. Fill the available fields from the staff list data.
3. Set every field that would have been extracted from the profile page to `null` or `[]`.
4. Include the complete `extraction_error` metadata.
5. Continue with the next profile.

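The steps above can be sketched as a single Python function. It is an illustration, not the project's actual pipeline: `write_restricted_profile` is a hypothetical name, the staff-list keys `location`, `connections`, and `linkedin_slug` are assumed (the mapping table does not list the first two), and the `extraction_method`/`extraction_agent` values are copied from the example output.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Mapping, Optional


def write_restricted_profile(
    staff_entry: Mapping[str, Any],
    *,
    source_file: str,
    staff_id: str,
    request_id: str,
    out_dir: Path,
    now: Optional[datetime] = None,
) -> Path:
    """Steps 1-4: build the staff-list-only record and write it to disk."""
    if now is None:
        now = datetime.now(timezone.utc)
    utc = now.astimezone(timezone.utc)
    iso = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    url = staff_entry.get("linkedin_profile_url")
    record = {
        "extraction_metadata": {
            "source_file": source_file,
            "staff_id": staff_id,
            "extraction_date": iso,
            "extraction_method": "exa_crawling_exa",
            "extraction_agent": "claude-opus-4.5",
            "linkedin_url": url,
            "cost_usd": 0,
            "request_id": request_id,
            # Step 4: complete error metadata.
            "extraction_error": {
                "error_type": "HTTP_403_PRIVATE_PROFILE",
                "error_message": "LinkedIn profile not accessible due to privacy settings",
                "http_status": 403,
                "occurred_on": iso,
                "retry_possible": False,
                "data_source": "staff_list_only",
            },
        },
        "profile_data": {
            # Step 2: staff-list values ("location" and "connections" keys are assumed).
            "name": staff_entry.get("name"),
            "linkedin_url": url,
            "headline": staff_entry.get("headline"),
            "location": staff_entry.get("location"),
            "heritage_relevant": staff_entry.get("heritage_relevant"),
            "heritage_type": staff_entry.get("heritage_type"),
            "connections": staff_entry.get("connections"),
            "mutual_connections": staff_entry.get("mutual_connections"),
            # Step 3: nothing could be extracted from the profile page itself.
            "about": None,
            "experience": [],
            "education": [],
            "skills": [],
            "languages": [],
            "heritage_relevant_experience": [],
            "profile_image_url": None,
            "photo_urls": None,
        },
    }
    # Step 1: one JSON file per profile, named like accessible profiles.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{staff_entry['linkedin_slug']}_{utc.strftime('%Y%m%dT%H%M%SZ')}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Step 5 then amounts to calling this function and moving straight on to the next staff entry in the calling loop, without retrying the crawl.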
### Example Output

```json
{
  "extraction_metadata": {
    "source_file": "data/custodian/person/affiliated/parsed/the-dutch-inspectorate-of-education_staff_20251210T155416Z.json",
    "staff_id": "the-dutch-inspectorate-of-education_staff_0098_anne_kool",
    "extraction_date": "2025-12-13T16:00:00Z",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "cost_usd": 0,
    "request_id": "1887bedfed30b7ab01175de94996b54b",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Anne Kool",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "headline": "Student aan Tilburg University",
    "location": null,
    "heritage_relevant": true,
    "heritage_type": "E",
    "connections": null,
    "mutual_connections": "",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}
```

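Before committing a batch, records can be checked against the rules in this document. This hypothetical `check_restricted_record` validator returns a list of violations, so a record shaped like the example output should produce an empty list. Which checks to enforce is a choice of this sketch, not part of the rule itself.

```python
from typing import Any, Mapping

# Fields that must be empty arrays or null for a 403 record.
EMPTY_LIST_FIELDS = ("experience", "education", "skills", "languages", "heritage_relevant_experience")
NULL_FIELDS = ("about", "profile_image_url", "photo_urls")
ERROR_KEYS = ("error_type", "error_message", "occurred_on", "retry_possible", "data_source")


def check_restricted_record(record: Mapping[str, Any]) -> list:
    """Return a list of rule violations for a 403 record (empty list = compliant)."""
    problems = []
    err = record.get("extraction_metadata", {}).get("extraction_error", {})
    if err.get("http_status") != 403:
        problems.append("extraction_error.http_status must be 403")
    for key in ERROR_KEYS:
        if key not in err:
            problems.append(f"extraction_error.{key} is missing")
    profile = record.get("profile_data", {})
    if not profile.get("name"):
        problems.append("profile_data.name must come from the staff list")
    for key in EMPTY_LIST_FIELDS:
        if profile.get(key) != []:
            problems.append(f"profile_data.{key} must be []")
    for key in NULL_FIELDS:
        # The key must be present and explicitly null.
        if key not in profile or profile[key] is not None:
            problems.append(f"profile_data.{key} must be null")
    return problems
```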
This rule ensures that even privacy-protected profiles contribute to the heritage sector dataset while maintaining transparency about data limitations.