Rule 16: LinkedIn Photo URLs Must Be CDN URLs, Not Overlay Pages
Core Rule
🚨 CRITICAL: When storing LinkedIn profile photos, store the ACTUAL CDN image URL from media.licdn.com, NOT the overlay page URL.
The LinkedIn photo overlay page (/overlay/photo/) is trivially derivable from any profile URL and provides no value. The actual image file URL from LinkedIn's CDN is what must be extracted and stored.
URL Transformation
| URL Type | Example | Store? |
|---|---|---|
| Profile URL | https://www.linkedin.com/in/giovannafossati/ | Store in `linkedin_profile_url` |
| Overlay Page URL | https://www.linkedin.com/in/giovannafossati/overlay/photo/ | ❌ NEVER STORE (derivable) |
| CDN Image URL | https://media.licdn.com/dms/image/v2/C4D03AQ.../profile-displayphoto-shrink_800_800/... | ✅ Store in `linkedin_photo_url` |
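As a rough sketch, the three row types in the table can be told apart by host and path. The function name `classify_linkedin_url` and its return labels are hypothetical. The host and path checks assume LinkedIn keeps the URL shapes shown in the examples.

```python
from urllib.parse import urlparse

# Hypothetical helper: hosts assumed to serve public profile pages.
LINKEDIN_HOSTS = {"linkedin.com", "www.linkedin.com"}

def classify_linkedin_url(url: str) -> str:
    """Label a URL with one of the row types from the URL Transformation table."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path
    if host == "media.licdn.com" and path.startswith("/dms/image/"):
        return "cdn_image"         # store in linkedin_photo_url
    if host in LINKEDIN_HOSTS and path.startswith("/in/"):
        if "/overlay/photo" in path:
            return "overlay_page"  # never store: derivable from the profile URL
        return "profile"           # store in linkedin_profile_url
    return "other"
```

A URL labelled `overlay_page` should never reach `linkedin_photo_url`.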
Why This Matters
- Overlay URLs are derivable: `{profile_url}overlay/photo/` carries no information value.
- Overlay URLs require JavaScript: they cannot be directly embedded or rendered.
- CDN URLs are direct links: they can be embedded, downloaded, and verified.
- CDN URLs prove extraction effort: they demonstrate actual profile access.
Derivability Rule
If a URL can be trivially derived from another stored URL, DO NOT store it separately.
linkedin_profile_url → overlay/photo/ ← DERIVABLE, don't store
linkedin_profile_url → media.licdn.com CDN URL ← NOT DERIVABLE, must store
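The derivability rule can be expressed directly in code. In this sketch, `derive_overlay_url` and `is_derivable_from_profile` are hypothetical helper names. The only assumption is the `{profile_url}overlay/photo/` pattern stated above. The trailing slash is normalized so that profile URLs stored with or without it behave the same way.

```python
def derive_overlay_url(profile_url: str) -> str:
    # Hypothetical helper: normalize the trailing slash so ".../giovannafossati"
    # and ".../giovannafossati/" yield the same overlay URL.
    return profile_url.rstrip("/") + "/overlay/photo/"

def is_derivable_from_profile(candidate: str, profile_url: str) -> bool:
    # Hypothetical helper: True when the candidate adds nothing beyond the
    # profile URL itself (it is the profile URL or its overlay page).
    derivable = {
        profile_url.rstrip("/"),
        derive_overlay_url(profile_url).rstrip("/"),
    }
    return candidate.rstrip("/") in derivable
```

A CDN URL never passes this check, because its host and image ID cannot be computed from the profile URL.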
Implementation
CORRECT Storage Pattern
{
"linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
"linkedin_photo_url": "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M"
}
WRONG Storage Pattern (NEVER DO THIS)
{
"linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
"linkedin_photo_url": "https://www.linkedin.com/in/giovannafossati/overlay/photo/"
}
How to Extract CDN URLs
Method 1: Browser (Manual)
- Go to profile → Click photo → Right-click → "Copy Image Address"
- The URL should start with `https://media.licdn.com/dms/image/`
Method 2: Playwright Automation
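page.goto(f"{profile_url}overlay/photo/")  # assumes a logged-in Playwright `page` and a profile_url ending in "/"
photo_url = page.locator('img[src*="media.licdn.com"]').first.get_attribute("src")  # the CDN URL to store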
Method 3: Exa MCP Tools
Use exa_crawling_exa with the profile URL and look for CDN URLs in the response.
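Methods 2 and 3 both come down to scanning raw page content for CDN URLs. This is a minimal sketch. It assumes the input is either HTML (for example from Playwright's `page.content()`) or the plain text returned by an Exa crawl. The regex and the `profile-displayphoto` filter are heuristics based on the CDN URL structure documented in this rule, not a LinkedIn guarantee. The name `extract_profile_photo_urls` is hypothetical.

```python
import html
import re

# Heuristic: a media.licdn.com image URL runs until the first whitespace,
# quote, angle bracket, or closing bracket (as in HTML attributes or Markdown links).
CDN_URL_RE = re.compile(r"https://media\.licdn\.com/dms/image/[^\s\"'<>)\]]+")

def extract_profile_photo_urls(text: str) -> list[str]:
    """Return unique profile-photo CDN URLs found in HTML or plain text, in order."""
    found: list[str] = []
    for match in CDN_URL_RE.findall(text):
        url = html.unescape(match)  # HTML attributes encode "&" as "&amp;"
        # Keep only profile display photos, skipping other licdn images.
        if "profile-displayphoto" in url and url not in found:
            found.append(url)
    return found
```

If several sizes are found, the guidance under CDN URL Structure Reference applies: prefer the `800_800` variant.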
Fallback: Alternative Photo Sources
When the LinkedIn CDN URL cannot be extracted, set `linkedin_photo_url` to `null` and use the `photo_urls` object:
{
"linkedin_photo_url": null,
"photo_urls": {
"indiana_university_blog": "https://blogs.libraries.indiana.edu/.../headshot.jpeg",
"screen_daily": "https://d1nslcd7m2225b.cloudfront.net/.../photo.jpeg",
"primary": "https://blogs.libraries.indiana.edu/.../headshot.jpeg",
"photo_credit": "Indiana University"
}
}
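A consumer of these files might pick a display photo as follows. This is a sketch. `resolve_display_photo` is a hypothetical helper. Treating `photo_urls["primary"]` as the preferred fallback is an assumption based on the key name in the example above.

```python
from typing import Optional

def resolve_display_photo(person: dict) -> Optional[str]:
    """Hypothetical helper: prefer the LinkedIn CDN URL, else photo_urls["primary"]."""
    cdn_url = person.get("linkedin_photo_url")
    if cdn_url:
        return cdn_url
    # Fallback: the "primary" entry of photo_urls, if present.
    photo_urls = person.get("photo_urls") or {}
    return photo_urls.get("primary")
```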
Validation
When reviewing person profile JSON files:
- ✅ `linkedin_photo_url` is `null` OR starts with `https://media.licdn.com/`
- ❌ `linkedin_photo_url` contains `/overlay/photo/`: FIX IMMEDIATELY
- ❌ `linkedin_photo_url` equals `linkedin_profile_url`: FIX IMMEDIATELY
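The three checks can be automated with a small validator. This is a minimal sketch. It assumes each person profile JSON file has already been loaded into a Python dict. The function name and error messages are hypothetical. The overlay check matches `/overlay/photo` without the trailing slash so that it also catches variants stored without one.

```python
CDN_PREFIX = "https://media.licdn.com/"

def validate_linkedin_photo(person: dict) -> list[str]:
    """Hypothetical validator: return Rule 16 violations (empty list means pass)."""
    errors: list[str] = []
    photo = person.get("linkedin_photo_url")
    profile = person.get("linkedin_profile_url")
    if photo is None:
        return errors  # null is allowed; fallback photos live in photo_urls
    if "/overlay/photo" in photo:
        errors.append("linkedin_photo_url is an overlay page URL")
    if profile and photo.rstrip("/") == profile.rstrip("/"):
        errors.append("linkedin_photo_url equals linkedin_profile_url")
    if not photo.startswith(CDN_PREFIX):
        errors.append("linkedin_photo_url is not a media.licdn.com CDN URL")
    return errors
```

To review a directory of profiles, each file could be loaded with `json.load`, and any record that returns a non-empty list can be reported.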
CDN URL Structure Reference
https://media.licdn.com/dms/image/v2/{IMAGE_ID}/profile-displayphoto-shrink_{SIZE}_{SIZE}/profile-displayphoto-shrink_{SIZE}_{SIZE}/0/{TIMESTAMP}?e={EXPIRY}&v=beta&t={TOKEN}
Sizes: 100_100, 200_200, 400_400, 800_800 (prefer 800_800)
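The structure above can be parsed to check a stored URL. This is a sketch. It assumes the path matches the documented pattern exactly, with the size repeated in both segments, and that `e` is a Unix timestamp in seconds. The function name `parse_cdn_photo_url` is hypothetical.

```python
import re
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

# Path shape from the reference above (size must repeat in both segments).
PATH_RE = re.compile(
    r"^/dms/image/v2/(?P<image_id>[^/]+)/"
    r"profile-displayphoto-shrink_(?P<size>\d+_\d+)/"
    r"profile-displayphoto-shrink_(?P=size)/0/(?P<timestamp>\d+)$"
)

def parse_cdn_photo_url(url: str) -> dict:
    """Hypothetical parser: split a CDN photo URL into its documented parts."""
    parsed = urlparse(url)
    if parsed.netloc != "media.licdn.com":
        raise ValueError("not a media.licdn.com URL")
    match = PATH_RE.match(parsed.path)
    if match is None:
        raise ValueError("path does not match the documented structure")
    query = parse_qs(parsed.query)
    expiry = int(query["e"][0]) if "e" in query else None
    return {
        "image_id": match["image_id"],
        "size": match["size"],
        "timestamp": match["timestamp"],
        # Assumption: e= is a Unix timestamp in seconds.
        "expires_at": datetime.fromtimestamp(expiry, tz=timezone.utc) if expiry else None,
    }
```

The `size` field lets a reviewer flag records that stored a smaller variant than `800_800`. If `e` really is an expiry, `expires_at` also indicates when a stored link may stop resolving.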
See Also
- `docs/LINKEDIN_PHOTO_URL_EXTRACTION.md` - Complete extraction documentation
- `.opencode/EXA_LINKEDIN_EXTRACTION_RULES.md` - Exa MCP extraction rules
- `.opencode/PERSON_DATA_REFERENCE_PATTERN.md` - Person profile structure
Rule Number: 16
Created: 2025-12-09
Status: PRODUCTION