glam/.opencode/LINKEDIN_PRIVACY_403_RULE.md
2025-12-14 17:09:55 +01:00

LinkedIn Profile Privacy Handling Rule

🚨 CRITICAL: Store Basic Data for Inaccessible Profiles

When LinkedIn profiles return 403 errors due to privacy settings, store the available basic data with metadata explaining limited enrichment rather than skipping the profile entirely.

What Constitutes a 403 Error

  • HTTP status code 403 from LinkedIn profile URLs
  • "SOURCE_NOT_AVAILABLE" error tag from EXA API
  • Profile accessible only to logged-in LinkedIn users
  • Privacy-protected profiles
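
As a minimal sketch, the first two conditions above can be checked with a small predicate. The function name and its two parameters are hypothetical; how the HTTP status and the EXA error tag are obtained from a response is not specified in this rule and is left to the caller.

```python
from typing import Optional


def is_private_profile_error(http_status: Optional[int], error_tag: Optional[str]) -> bool:
    """Return True when a fetch result matches the 403 conditions listed above.

    Both arguments are assumed to have been extracted by the caller;
    either may be None when the corresponding signal is absent.
    """
    return http_status == 403 or error_tag == "SOURCE_NOT_AVAILABLE"
```

A profile for which this returns True would be stored with basic data instead of being skipped.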

Required Data Structure for 403 Profiles

When a profile is inaccessible, create a JSON file with:

{
  "extraction_metadata": {
    "source_file": "path/to/staff_list",
    "staff_id": "unique_identifier",
    "extraction_date": "ISO_timestamp",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "full_profile_url",
    "cost_usd": 0,
    "request_id": "exa_request_id",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Full Name from staff list",
    "linkedin_url": "profile_url_from_staff_list",
    "headline": "Headline from staff list",
    "location": "Location from staff list (if available)",
    "heritage_relevant": true/false,
    "heritage_type": "A/L/M/E/D/G/O/R/C/U/B/S/F/I/X/P/H/N/T",
    "connections": "Connection count from staff list (if available)",
    "mutual_connections": "Mutual connections from staff list (if available)",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}

Field Mappings from Staff List

When a profile is inaccessible (403 error), use these mappings:

| Staff List Field | Profile Data Field | Notes |
|---|---|---|
| name | profile_data.name | Full name from staff list |
| headline | profile_data.headline | Professional headline |
| degree | NOT stored | Connection degree, not profile attribute |
| mutual_connections | profile_data.mutual_connections | If available |
| heritage_relevant | profile_data.heritage_relevant | Heritage relevance flag |
| heritage_type | profile_data.heritage_type | Heritage institution type |
| linkedin_profile_url | profile_data.linkedin_url | Profile URL |
| linkedin_slug | NOT stored | Used only for filename generation |
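
The mappings can be sketched as a small function that turns one staff list entry into a `profile_data` block. This is an illustration, not the project's actual code: the function name is hypothetical, and the staff list keys `location` and `connections` are assumptions, since the mapping lists no source field for them.

```python
from typing import Any, Dict


def profile_data_from_staff_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Build profile_data for an inaccessible profile from a staff list entry.

    `degree` and `linkedin_slug` are deliberately not copied.
    """
    return {
        "name": entry.get("name"),
        "linkedin_url": entry.get("linkedin_profile_url"),
        "headline": entry.get("headline"),
        # Assumed staff list keys; None when the staff list has no such field.
        "location": entry.get("location"),
        "heritage_relevant": entry.get("heritage_relevant"),
        "heritage_type": entry.get("heritage_type"),
        "connections": entry.get("connections"),
        "mutual_connections": entry.get("mutual_connections"),
        # Fields that need profile access stay null or empty.
        "about": None,
        "experience": [],
        "education": [],
        "skills": [],
        "languages": [],
        "heritage_relevant_experience": [],
        "profile_image_url": None,
        "photo_urls": None,
    }
```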

Null/Empty Values for Inaccessible Profiles

Set these fields to null or empty arrays when the profile is inaccessible:

  • about - null - No profile summary available
  • experience - [] - Cannot extract work history
  • education - [] - Cannot extract education history
  • skills - [] - Cannot extract skills
  • languages - [] - Cannot extract languages
  • heritage_relevant_experience - [] - Cannot tag specific roles
  • profile_image_url - null - Cannot access profile photos
  • photo_urls - null - Cannot access profile photos
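
A check along these lines could confirm that a stored record follows the list above. It is only a sketch: the constant and function names are hypothetical, and a missing key is treated as a violation, which is an assumption rather than something this rule states.

```python
from typing import Any, Dict, List

# Expected placeholder for each field that cannot be enriched after a 403.
EXPECTED_EMPTY: Dict[str, Any] = {
    "about": None,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": None,
    "photo_urls": None,
}


def fields_violating_rule(profile_data: Dict[str, Any]) -> List[str]:
    """Return the names of fields that are missing or hold unexpected values."""
    return [
        field
        for field, expected in EXPECTED_EMPTY.items()
        if field not in profile_data or profile_data[field] != expected
    ]
```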

Extraction Error Metadata

Always include detailed error metadata:

"extraction_error": {
  "error_type": "HTTP_403_PRIVATE_PROFILE",
  "error_message": "LinkedIn profile not accessible due to privacy settings",
  "http_status": 403,
  "occurred_on": "2025-12-13T16:00:00Z",
  "retry_possible": false,
  "data_source": "staff_list_only"
}
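
One way to produce this block consistently is a small builder, sketched here under the assumption that the caller passes a timezone-aware `datetime`. The function name is hypothetical. The constant values are copied from the block above.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_extraction_error(occurred_on: datetime) -> Dict[str, Any]:
    """Build the extraction_error block for a privacy-protected profile.

    `occurred_on` should be timezone-aware; it is converted to UTC
    and formatted like the timestamps used in this document.
    """
    return {
        "error_type": "HTTP_403_PRIVATE_PROFILE",
        "error_message": "LinkedIn profile not accessible due to privacy settings",
        "http_status": 403,
        "occurred_on": occurred_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retry_possible": False,
        "data_source": "staff_list_only",
    }
```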

File Naming Convention

Use the same naming convention as accessible profiles:

{linkedin-slug}_{ISO-timestamp}.json

Example: anne-kool_20251213T160000Z.json
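
The convention can be sketched as follows, assuming the timestamp is the compact UTC form seen in the example filename. The function name is hypothetical.

```python
from datetime import datetime, timezone


def profile_filename(linkedin_slug: str, when: datetime) -> str:
    """Return '{linkedin-slug}_{ISO-timestamp}.json' with a compact UTC timestamp.

    `when` should be timezone-aware so the conversion to UTC is unambiguous.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{linkedin_slug}_{stamp}.json"
```

Calling `profile_filename("anne-kool", datetime(2025, 12, 13, 16, 0, 0, tzinfo=timezone.utc))` reproduces the example name above.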

Rationale

  1. Data Preservation: Even basic data (name, role, heritage relevance) is valuable for network analysis
  2. Transparency: Clear documentation of why full enrichment wasn't possible
  3. Consistency: Same file structure as accessible profiles with null values for missing data
  4. Future Re-attempt: The retry_possible flag records whether a retry might succeed (generally not for 403 errors)
  5. Network Analysis: Basic connection data enables heritage sector relationship mapping

Implementation

When encountering a 403 error:

  1. Create JSON file with structure above
  2. Use staff list data for available fields
  3. Set every field that requires profile access to null or an empty array
  4. Include comprehensive error metadata
  5. Continue with next profile
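
The steps above can be sketched as a single write function that the extraction loop calls before moving on to the next profile. It is a hedged illustration: the function name, its parameters, and the output directory layout are assumptions, and the `profile_data` and `extraction_error` dictionaries are expected to be prepared by the caller.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def save_inaccessible_profile(
    metadata: Dict[str, Any],
    profile_data: Dict[str, Any],
    extraction_error: Dict[str, Any],
    linkedin_slug: str,
    output_dir: str,
    now: datetime,
) -> Path:
    """Assemble the record for a 403 profile and write it as JSON.

    `metadata` holds the remaining extraction_metadata fields
    (source_file, staff_id, and so on); `now` should be timezone-aware.
    """
    record = {
        "extraction_metadata": {**metadata, "extraction_error": extraction_error},
        "profile_data": profile_data,
    }
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(output_dir) / f"{linkedin_slug}_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII names readable in the file.
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Returning the path instead of raising lets the caller log the file and continue with the next profile.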

Example Output

{
  "extraction_metadata": {
    "source_file": "data/custodian/person/affiliated/parsed/the-dutch-inspectorate-of-education_staff_20251210T155416Z.json",
    "staff_id": "the-dutch-inspectorate-of-education_staff_0098_anne_kool",
    "extraction_date": "2025-12-13T16:00:00Z",
    "extraction_method": "exa_crawling_exa",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "cost_usd": 0,
    "request_id": "1887bedfed30b7ab01175de94996b54b",
    "extraction_error": {
      "error_type": "HTTP_403_PRIVATE_PROFILE",
      "error_message": "LinkedIn profile not accessible due to privacy settings",
      "http_status": 403,
      "occurred_on": "2025-12-13T16:00:00Z",
      "retry_possible": false,
      "data_source": "staff_list_only"
    }
  },
  "profile_data": {
    "name": "Anne Kool",
    "linkedin_url": "https://www.linkedin.com/in/anne-kool",
    "headline": "Student aan Tilburg University",
    "location": null,
    "heritage_relevant": true,
    "heritage_type": "E",
    "connections": null,
    "mutual_connections": "",
    "about": null,
    "experience": [],
    "education": [],
    "skills": [],
    "languages": [],
    "heritage_relevant_experience": [],
    "profile_image_url": null,
    "photo_urls": null
  }
}

This rule ensures that even privacy-protected profiles contribute to the heritage sector dataset while maintaining transparency about data limitations.