glam/.opencode/LINKEDIN_PRIVACY_403_RULE.md
2025-12-14 17:09:55 +01:00

# LinkedIn Profile Privacy Handling Rule
## 🚨 CRITICAL: Store Basic Data for Inaccessible Profiles
**When LinkedIn profiles return 403 errors due to privacy settings, store the available basic data with metadata explaining limited enrichment rather than skipping the profile entirely.**
### What Constitutes a 403 Error
- HTTP status code 403 from LinkedIn profile URLs
- "SOURCE_NOT_AVAILABLE" error tag from EXA API
- Profile accessible only to logged-in LinkedIn users
- Privacy-protected profiles
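
The first two signals above can be checked mechanically. The sketch below is a minimal illustration, not the project's actual detection code: the function name and the assumption that the crawler result has already been reduced to a status code and an optional error tag are both hypothetical.

```python
from typing import Optional

# Error tag named in this rule. How the crawler surfaces it is an assumption.
SOURCE_NOT_AVAILABLE = "SOURCE_NOT_AVAILABLE"


def is_private_profile_error(http_status: Optional[int], error_tag: Optional[str]) -> bool:
    """Return True when either signal marks the profile as inaccessible."""
    return http_status == 403 or error_tag == SOURCE_NOT_AVAILABLE
```

Either signal on its own is treated as sufficient here, since the rule lists them as alternatives.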
### Required Data Structure for 403 Profiles
When a profile is inaccessible, create a JSON file with:
```json
{
"extraction_metadata": {
"source_file": "path/to/staff_list",
"staff_id": "unique_identifier",
"extraction_date": "ISO_timestamp",
"extraction_method": "exa_crawling_exa",
"extraction_agent": "claude-opus-4.5",
"linkedin_url": "full_profile_url",
"cost_usd": 0,
"request_id": "exa_request_id",
"extraction_error": {
"error_type": "HTTP_403_PRIVATE_PROFILE",
"error_message": "LinkedIn profile not accessible due to privacy settings",
"http_status": 403,
"occurred_on": "2025-12-13T16:00:00Z",
"retry_possible": false,
"data_source": "staff_list_only"
}
},
"profile_data": {
"name": "Full Name from staff list",
"linkedin_url": "profile_url_from_staff_list",
"headline": "Headline from staff list",
"location": "Location from staff list (if available)",
"heritage_relevant": "true or false (JSON boolean from staff list)",
"heritage_type": "A/L/M/E/D/G/O/R/C/U/B/E/S/F/I/X/P/H/D/N/T",
"connections": "Connection count from staff list (if available)",
"mutual_connections": "Mutual connections from staff list (if available)",
"about": null,
"experience": [],
"education": [],
"skills": [],
"languages": [],
"heritage_relevant_experience": [],
"profile_image_url": null,
"photo_urls": null
}
}
```
### Field Mappings from Staff List
When a profile is inaccessible (403 error), populate `profile_data` from the staff list using these mappings:
| Staff List Field | Profile Data Field | Notes |
|----------------|-------------------|-------|
| `name` | `profile_data.name` | Full name from staff list |
| `headline` | `profile_data.headline` | Professional headline |
| `degree` | NOT stored | Connection degree, not profile attribute |
| `mutual_connections` | `profile_data.mutual_connections` | If available |
| `heritage_relevant` | `profile_data.heritage_relevant` | Heritage relevance flag |
| `heritage_type` | `profile_data.heritage_type` | Heritage institution type |
| `linkedin_profile_url` | `profile_data.linkedin_url` | Profile URL |
| `linkedin_slug` | NOT stored | Used only for filename generation |
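
The mapping table can be expressed as a small lookup. This is a sketch under the assumption that a staff list entry is a flat dictionary keyed by the field names in the table; `STAFF_TO_PROFILE` and `map_staff_entry` are hypothetical names, and returning `None` for absent keys is an illustrative choice.

```python
from typing import Any, Dict

# Staff list key -> profile_data key, following the table above.
# `degree` and `linkedin_slug` are deliberately absent because they are not stored.
STAFF_TO_PROFILE = {
    "name": "name",
    "headline": "headline",
    "mutual_connections": "mutual_connections",
    "heritage_relevant": "heritage_relevant",
    "heritage_type": "heritage_type",
    "linkedin_profile_url": "linkedin_url",
}


def map_staff_entry(staff_entry: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only the mapped fields; keys missing from the entry become None."""
    return {
        profile_key: staff_entry.get(staff_key)
        for staff_key, profile_key in STAFF_TO_PROFILE.items()
    }
```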
### Null/Empty Values for Inaccessible Profiles
Set these fields to `null` or an empty array when the profile is inaccessible:
- `about` - `null` - No profile summary available
- `experience` - `[]` - Cannot extract work history
- `education` - `[]` - Cannot extract education history
- `skills` - `[]` - Cannot extract skills
- `languages` - `[]` - Cannot extract languages
- `heritage_relevant_experience` - `[]` - Cannot tag specific roles
- `profile_image_url` - `null` - Cannot access profile photos
- `photo_urls` - `null` - Cannot access profile photos
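
One way to apply these defaults consistently is a single helper that overwrites the profile-only fields. The names `INACCESSIBLE_DEFAULTS` and `with_inaccessible_defaults` are hypothetical; the values mirror the list above (`None` serializes to JSON `null`).

```python
import copy
from typing import Any, Dict

# Fields that can only be filled from the live profile, with their empty values.
INACCESSIBLE_DEFAULTS: Dict[str, Any] = {
    "about": None,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": None,
    "photo_urls": None,
}


def with_inaccessible_defaults(profile_data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of profile_data with every profile-only field emptied."""
    result = dict(profile_data)
    # Deep copy so each record gets its own list objects.
    result.update(copy.deepcopy(INACCESSIBLE_DEFAULTS))
    return result
```

The deep copy avoids sharing one list instance between records, which would otherwise let a later mutation leak across profiles.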
### Extraction Error Metadata
Always include detailed error metadata:
```json
"extraction_error": {
"error_type": "HTTP_403_PRIVATE_PROFILE",
"error_message": "LinkedIn profile not accessible due to privacy settings",
"http_status": 403,
"occurred_on": "2025-12-13T16:00:00Z",
"retry_possible": false,
"data_source": "staff_list_only"
}
```
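
A helper along the following lines could produce this block. It is a sketch: `build_extraction_error` is a hypothetical name, and it assumes the caller passes a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_extraction_error(occurred_on: datetime) -> Dict[str, Any]:
    """Build the extraction_error block for a privacy-protected profile.

    occurred_on is assumed to be timezone-aware; it is normalized to UTC.
    """
    stamp = occurred_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "error_type": "HTTP_403_PRIVATE_PROFILE",
        "error_message": "LinkedIn profile not accessible due to privacy settings",
        "http_status": 403,
        "occurred_on": stamp,
        "retry_possible": False,
        "data_source": "staff_list_only",
    }
```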
### File Naming Convention
Use the same naming convention as accessible profiles:
```
{linkedin-slug}_{ISO-timestamp}.json
```
Example: `anne-kool_20251213T160000Z.json`
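
The filename can be derived from the slug and a UTC timestamp in compact ISO 8601 form. A minimal sketch, with `profile_filename` as a hypothetical helper name and a timezone-aware `datetime` assumed as input:

```python
from datetime import datetime, timezone


def profile_filename(linkedin_slug: str, extracted_at: datetime) -> str:
    """Return '{linkedin-slug}_{ISO-timestamp}.json' with the timestamp in UTC."""
    stamp = extracted_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{linkedin_slug}_{stamp}.json"
```

Called with the slug `anne-kool` and 16:00:00 UTC on 2025-12-13, this yields the example filename above.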
### Rationale
1. **Data Preservation**: Even basic data (name, role, heritage relevance) is valuable for network analysis
2. **Transparency**: Clear documentation of why full enrichment wasn't possible
3. **Consistency**: Same file structure as accessible profiles with null values for missing data
4. **Future Re-attempt**: Metadata indicates if retry might be possible (generally not for 403 errors)
5. **Network Analysis**: Basic connection data enables heritage sector relationship mapping
### Implementation
When encountering a 403 error:
1. Create JSON file with structure above
2. Use staff list data for available fields
3. Set every field that requires profile access to `null` or `[]`, as listed above
4. Include comprehensive error metadata
5. Continue with next profile
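
The steps above can be sketched as a processing loop. Everything named here is hypothetical: `PrivateProfileError`, the `fetch_profile` callable, and the assumption that each staff entry is a dictionary carrying `staff_id` and `linkedin_slug` keys. Some metadata fields (`source_file`, `request_id`, and so on) are omitted for brevity.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List


class PrivateProfileError(Exception):
    """Hypothetical error raised by a fetcher on HTTP 403 / SOURCE_NOT_AVAILABLE."""


def build_fallback_record(entry: Dict[str, Any], timestamp: str) -> Dict[str, Any]:
    """Steps 1-4: staff list data plus empty profile fields and error metadata."""
    return {
        "extraction_metadata": {
            "staff_id": entry.get("staff_id"),
            "extraction_date": timestamp,
            "linkedin_url": entry.get("linkedin_profile_url"),
            "extraction_error": {
                "error_type": "HTTP_403_PRIVATE_PROFILE",
                "error_message": "LinkedIn profile not accessible due to privacy settings",
                "http_status": 403,
                "occurred_on": timestamp,
                "retry_possible": False,
                "data_source": "staff_list_only",
            },
        },
        "profile_data": {
            "name": entry.get("name"),
            "linkedin_url": entry.get("linkedin_profile_url"),
            "headline": entry.get("headline"),
            "heritage_relevant": entry.get("heritage_relevant"),
            "heritage_type": entry.get("heritage_type"),
            "mutual_connections": entry.get("mutual_connections"),
            "about": None,
            "experience": [],
            "education": [],
            "skills": [],
            "languages": [],
            "heritage_relevant_experience": [],
            "profile_image_url": None,
            "photo_urls": None,
        },
    }


def process_staff_list(
    entries: Iterable[Dict[str, Any]],
    fetch_profile: Callable[[Dict[str, Any]], Dict[str, Any]],
    out_dir: Path,
    timestamp: str,
) -> List[Path]:
    """Write one JSON file per entry; a 403 yields a fallback record, not a skip."""
    compact = timestamp.replace("-", "").replace(":", "")
    written = []
    for entry in entries:
        try:
            record = fetch_profile(entry)
        except PrivateProfileError:
            record = build_fallback_record(entry, timestamp)
        path = out_dir / f"{entry['linkedin_slug']}_{compact}.json"
        path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
        written.append(path)  # Step 5: continue with the next profile.
    return written
```

Catching the error inside the loop keeps one private profile from aborting the whole batch.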
### Example Output
```json
{
"extraction_metadata": {
"source_file": "data/custodian/person/affiliated/parsed/the-dutch-inspectorate-of-education_staff_20251210T155416Z.json",
"staff_id": "the-dutch-inspectorate-of-education_staff_0098_anne_kool",
"extraction_date": "2025-12-13T16:00:00Z",
"extraction_method": "exa_crawling_exa",
"extraction_agent": "claude-opus-4.5",
"linkedin_url": "https://www.linkedin.com/in/anne-kool",
"cost_usd": 0,
"request_id": "1887bedfed30b7ab01175de94996b54b",
"extraction_error": {
"error_type": "HTTP_403_PRIVATE_PROFILE",
"error_message": "LinkedIn profile not accessible due to privacy settings",
"http_status": 403,
"occurred_on": "2025-12-13T16:00:00Z",
"retry_possible": false,
"data_source": "staff_list_only"
}
},
"profile_data": {
"name": "Anne Kool",
"linkedin_url": "https://www.linkedin.com/in/anne-kool",
"headline": "Student aan Tilburg University",
"location": null,
"heritage_relevant": true,
"heritage_type": "E",
"connections": null,
"mutual_connections": "",
"about": null,
"experience": [],
"education": [],
"skills": [],
"languages": [],
"heritage_relevant_experience": [],
"profile_image_url": null,
"photo_urls": null
}
}
```
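
A record such as the one above could be checked against this rule before it is written. The checker below is an illustrative sketch with hypothetical names; it covers only the error metadata and the empty profile fields, not the full schema.

```python
from typing import Any, Dict, List

EMPTY_LIST_FIELDS = (
    "experience", "education", "skills", "languages", "heritage_relevant_experience",
)
NULL_FIELDS = ("about", "profile_image_url", "photo_urls")


def check_inaccessible_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the record conforms."""
    problems: List[str] = []
    error = record.get("extraction_metadata", {}).get("extraction_error")
    if not error or error.get("http_status") != 403:
        problems.append("extraction_error with http_status 403 is missing")
    profile = record.get("profile_data", {})
    for field in EMPTY_LIST_FIELDS:
        if profile.get(field) != []:
            problems.append(f"{field} should be []")
    for field in NULL_FIELDS:
        if field not in profile or profile[field] is not None:
            problems.append(f"{field} should be null")
    return problems
```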
This rule ensures that even privacy-protected profiles contribute to the heritage sector dataset while maintaining transparency about data limitations.