glam/.opencode/LINKEDIN_PHOTO_CDN_RULE.md
2025-12-10 13:01:13 +01:00

Rule 16: LinkedIn Photo URLs Must Be CDN URLs, Not Overlay Pages

Core Rule

🚨 CRITICAL: When storing LinkedIn profile photos, store the ACTUAL CDN image URL from media.licdn.com, NOT the overlay page URL.

The LinkedIn photo overlay page (/overlay/photo/) is trivially derivable from any profile URL and provides no value. The actual image file URL from LinkedIn's CDN is what must be extracted and stored.

URL Transformation

| URL Type | Example | Store? |
|---|---|---|
| Profile URL | https://www.linkedin.com/in/giovannafossati/ | Store in linkedin_profile_url |
| Overlay Page URL | https://www.linkedin.com/in/giovannafossati/overlay/photo/ | NEVER STORE (derivable) |
| CDN Image URL | https://media.licdn.com/dms/image/v2/C4D03AQ.../profile-displayphoto-shrink_800_800/... | Store in linkedin_photo_url |

Why This Matters

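  1. Overlay URLs are derivable: {profile_url}/overlay/photo/ (after trimming any trailing slash from the profile URL) - no information value
  2. Overlay URLs require JavaScript: the overlay is a rendered page, not an image file, so it cannot be embedded or displayed directly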
  3. CDN URLs are direct links: Can be embedded, downloaded, verified
  4. CDN URLs prove extraction effort: Demonstrate actual profile access

Derivability Rule

If a URL can be trivially derived from another stored URL, DO NOT store it separately.

linkedin_profile_url → overlay/photo/  ← DERIVABLE, don't store
linkedin_profile_url → media.licdn.com CDN URL  ← NOT DERIVABLE, must store
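
To illustrate why the overlay URL carries no information, here is a minimal sketch that rebuilds it from a stored profile URL. The function name derive_overlay_url is hypothetical; the only assumption is that the overlay path is the profile URL followed by /overlay/photo/, as described above.

```python
def derive_overlay_url(profile_url: str) -> str:
    """Rebuild the overlay page URL from a stored profile URL."""
    # Trim the trailing slash first so both ".../in/name" and
    # ".../in/name/" produce the same result.
    return profile_url.rstrip("/") + "/overlay/photo/"


if __name__ == "__main__":
    print(derive_overlay_url("https://www.linkedin.com/in/giovannafossati"))
```

Because this is a pure string operation on linkedin_profile_url, storing its result adds nothing that the profile URL does not already provide.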

Implementation

CORRECT Storage Pattern

{
  "linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
  "linkedin_photo_url": "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M"
}

WRONG Storage Pattern (NEVER DO THIS)

{
  "linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
  "linkedin_photo_url": "https://www.linkedin.com/in/giovannafossati/overlay/photo/"
}

How to Extract CDN URLs

Method 1: Browser (Manual)

  1. Go to profile → Click photo → Right-click → "Copy Image Address"
  2. URL should start with https://media.licdn.com/dms/image/

Method 2: Playwright Automation

# Navigate to overlay page, extract img[src*="media.licdn.com"]
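
The one-line description above could be fleshed out roughly as follows, using Playwright's Python sync API. This is a sketch under assumptions, not the project's actual extractor: the function names pick_cdn_url and extract_photo_url are hypothetical, and the code does not handle login, although LinkedIn usually requires an authenticated session before the overlay page shows the photo.

```python
from typing import Iterable, Optional

CDN_PREFIX = "https://media.licdn.com/dms/image/"


def pick_cdn_url(srcs: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first src that looks like a LinkedIn CDN image URL, else None."""
    for src in srcs:
        if src and src.startswith(CDN_PREFIX):
            return src
    return None


def extract_photo_url(profile_url: str) -> Optional[str]:
    """Open the overlay page and read the CDN image URL from its <img> tags."""
    # Imported lazily so pick_cdn_url stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    overlay_url = profile_url.rstrip("/") + "/overlay/photo/"
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(overlay_url)
            # Same selector as in the note above.
            imgs = page.locator('img[src*="media.licdn.com"]')
            srcs = [imgs.nth(i).get_attribute("src") for i in range(imgs.count())]
        finally:
            browser.close()
    return pick_cdn_url(srcs)
```

Keeping the URL selection in a separate pure function makes it possible to check the filtering logic without launching a browser.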

Method 3: Exa MCP Tools

Use exa_crawling_exa with the profile URL and look for CDN URLs in the response.
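
Whatever tool returns the page content, looking for CDN URLs in the response comes down to a text scan. The sketch below assumes the response has already been reduced to a plain string; the function name find_cdn_urls and the regular expression are illustrative and not part of the Exa tooling.

```python
import re
from typing import List

# Matches CDN image URLs up to the first whitespace, quote, or closing bracket.
CDN_URL_RE = re.compile(r"https://media\.licdn\.com/dms/image/[^\s\"'<>)\]]+")


def find_cdn_urls(text: str) -> List[str]:
    """Return the distinct CDN image URLs found in text, in order of appearance."""
    seen: List[str] = []
    for url in CDN_URL_RE.findall(text):
        if url not in seen:
            seen.append(url)
    return seen
```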

Fallback: Alternative Photo Sources

When the LinkedIn CDN URL cannot be extracted, set linkedin_photo_url to null and record alternative sources in a photo_urls object:

{
  "linkedin_photo_url": null,
  "photo_urls": {
    "indiana_university_blog": "https://blogs.libraries.indiana.edu/.../headshot.jpeg",
    "screen_daily": "https://d1nslcd7m2225b.cloudfront.net/.../photo.jpeg",
    "primary": "https://blogs.libraries.indiana.edu/.../headshot.jpeg",
    "photo_credit": "Indiana University"
  }
}
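
A consumer of this structure might resolve the photo to display along the following lines. This is a sketch: the function name resolve_photo_url is hypothetical, and it assumes the primary key names the preferred fallback, which the example above suggests but does not state outright.

```python
from typing import Optional


def resolve_photo_url(profile: dict) -> Optional[str]:
    """Prefer the LinkedIn CDN URL; otherwise fall back to photo_urls.primary."""
    url = profile.get("linkedin_photo_url")
    if url:
        return url
    # Assumption: "primary" points at the preferred alternative source.
    photo_urls = profile.get("photo_urls") or {}
    return photo_urls.get("primary")
```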

Validation

When reviewing person profile JSON files:

  1. Valid: linkedin_photo_url is null OR starts with https://media.licdn.com/
  2. Invalid: linkedin_photo_url contains /overlay/photo/ - FIX IMMEDIATELY
  3. Invalid: linkedin_photo_url equals linkedin_profile_url - FIX IMMEDIATELY
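
The three checks above can be automated with a small helper. This is a minimal sketch; the function name check_photo_url and the message strings are hypothetical, and the input is assumed to be an already-parsed profile dictionary.

```python
from typing import List

CDN_PREFIX = "https://media.licdn.com/"


def check_photo_url(profile: dict) -> List[str]:
    """Return a list of problems with linkedin_photo_url (empty if valid)."""
    problems: List[str] = []
    photo = profile.get("linkedin_photo_url")
    if photo is None:
        return problems  # null is allowed
    if "/overlay/photo/" in photo:
        problems.append("overlay page URL stored as photo URL")
    if photo == profile.get("linkedin_profile_url"):
        problems.append("photo URL equals profile URL")
    if not photo.startswith(CDN_PREFIX):
        problems.append("photo URL is not a media.licdn.com URL")
    return problems
```

An empty list means the profile passes; any entry corresponds to a case marked FIX IMMEDIATELY or to a non-CDN URL.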

CDN URL Structure Reference

https://media.licdn.com/dms/image/v2/{IMAGE_ID}/profile-displayphoto-shrink_{SIZE}_{SIZE}/profile-displayphoto-shrink_{SIZE}_{SIZE}/0/{TIMESTAMP}?e={EXPIRY}&v=beta&t={TOKEN}

Sizes: 100_100, 200_200, 400_400, 800_800 (prefer 800_800)
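
For tooling that needs to inspect a stored URL, the template above can be split into its parts. The sketch below recognizes only the exact shape shown in this reference, and real CDN URLs may differ. The function name parse_cdn_url and the dictionary keys are hypothetical, and the query values are returned as raw strings without interpreting them.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Path layout taken from the template above; {SIZE} must repeat consistently.
_PATH_RE = re.compile(
    r"^/dms/image/v2/(?P<image_id>[^/]+)"
    r"/profile-displayphoto-shrink_(?P<size>\d+)_(?P=size)"
    r"/profile-displayphoto-shrink_(?P=size)_(?P=size)"
    r"/0/(?P<timestamp>\d+)$"
)


def parse_cdn_url(url: str) -> Optional[dict]:
    """Split a CDN photo URL into its template fields, or return None."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.netloc != "media.licdn.com":
        return None
    match = _PATH_RE.match(parts.path)
    if match is None:
        return None
    query = parse_qs(parts.query)
    return {
        "image_id": match.group("image_id"),
        "size": int(match.group("size")),
        "timestamp": match.group("timestamp"),
        "expiry": query.get("e", [None])[0],
        "token": query.get("t", [None])[0],
    }
```

The size field can then be compared against 800 when choosing among several candidate URLs for the same image.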

See Also

  • docs/LINKEDIN_PHOTO_URL_EXTRACTION.md - Complete extraction documentation
  • .opencode/EXA_LINKEDIN_EXTRACTION_RULES.md - Exa MCP extraction rules
  • .opencode/PERSON_DATA_REFERENCE_PATTERN.md - Person profile structure

Rule Number: 16
Created: 2025-12-09
Status: PRODUCTION