LinkedIn Photo URL Extraction Rule
Problem
LinkedIn profile URLs like https://www.linkedin.com/in/giovannafossati/ have a trivially derivable photo overlay page:
- Profile URL: https://www.linkedin.com/in/giovannafossati/
- Overlay URL: https://www.linkedin.com/in/giovannafossati/overlay/photo/
The overlay URL is useless for data storage because:
- It requires JavaScript rendering to display the actual image
- It cannot be directly embedded in applications
- It provides no direct access to the image file
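To make "trivially derivable" concrete, here is a minimal sketch of the derivation. The function name `overlay_url` is hypothetical and only illustrates why storing the overlay URL adds no information beyond the profile URL.

```python
def overlay_url(profile_url: str) -> str:
    """Derive the photo overlay page URL from a LinkedIn profile URL."""
    # Strip any trailing slash so ".../in/slug" and ".../in/slug/" behave the same.
    return profile_url.rstrip("/") + "/overlay/photo/"


print(overlay_url("https://www.linkedin.com/in/giovannafossati/"))
# https://www.linkedin.com/in/giovannafossati/overlay/photo/
```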
Solution: Extract the Actual CDN Photo URL
The image shown on the LinkedIn photo overlay page is served from LinkedIn's CDN at media.licdn.com. That CDN URL, not the overlay page URL, is what must be extracted and stored.
Example
Profile: https://www.linkedin.com/in/giovannafossati/
WRONG (overlay page - derivable, useless):
https://www.linkedin.com/in/giovannafossati/overlay/photo/
CORRECT (actual CDN image - must be extracted and stored):
https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M
CDN URL Structure
LinkedIn photo CDN URLs follow this pattern:
https://media.licdn.com/dms/image/v2/{IMAGE_ID}/profile-displayphoto-shrink_{SIZE}_{SIZE}/profile-displayphoto-shrink_{SIZE}_{SIZE}/0/{TIMESTAMP}?e={EXPIRY}&v=beta&t={TOKEN}
Components:
- Host: media.licdn.com
- Path: /dms/image/v2/{IMAGE_ID}/profile-displayphoto-shrink_{SIZE}_{SIZE}/...
- Size Options: 100_100, 200_200, 400_400, 800_800
- Expiry (e=): Unix timestamp when URL expires
- Token (t=): Authentication/integrity token
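The components above can be pulled out of a URL with the standard library. This is a sketch under the assumption that URLs follow the pattern shown in this section. `parse_cdn_url` and the returned dictionary keys are illustrative names, not part of any existing tooling.

```python
import re
from urllib.parse import urlsplit, parse_qs

EXAMPLE = (
    "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/"
    "profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/"
    "0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M"
)

# Matches the start of the path: /dms/image/v2/{IMAGE_ID}/profile-displayphoto-shrink_{SIZE}_{SIZE}/
PATH_RE = re.compile(
    r"^/dms/image/v2/(?P<image_id>[^/]+)/profile-displayphoto-shrink_(?P<size>\d+)_\d+/"
)


def parse_cdn_url(url: str) -> dict:
    """Split a LinkedIn photo CDN URL into the components listed above."""
    parts = urlsplit(url)
    match = PATH_RE.match(parts.path)
    if parts.hostname != "media.licdn.com" or match is None:
        raise ValueError(f"not a LinkedIn photo CDN URL: {url}")
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "image_id": match["image_id"],
        "size": int(match["size"]),
        # e= and t= are read defensively in case a URL lacks them.
        "expiry": int(query["e"][0]) if "e" in query else None,
        "token": query["t"][0] if "t" in query else None,
    }


info = parse_cdn_url(EXAMPLE)
print(info["image_id"], info["size"], info["expiry"])
# C4D03AQHQCBcoih82SQ 800 1766620800
```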
Size Preference
Always prefer the largest available size for archival purposes:
- 800_800 (preferred)
- 400_400
- 200_200
- 100_100
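When a page exposes several renditions of the same photo, the preference order can be applied mechanically. The sketch below assumes the size is readable from the `shrink_{SIZE}_{SIZE}` path segment. `pick_largest` and the candidate URLs (with the placeholder ID `EXAMPLE_ID`) are hypothetical.

```python
import re

SIZE_RE = re.compile(r"profile-displayphoto-shrink_(\d+)_\d+")


def pick_largest(urls):
    """Return the candidate URL with the largest shrink size, or None."""
    sized = []
    for url in urls:
        match = SIZE_RE.search(url)
        if match:
            sized.append((int(match.group(1)), url))
    if not sized:
        return None
    # max() on the numeric size picks 800 over 400, 200 and 100.
    return max(sized, key=lambda pair: pair[0])[1]


# Placeholder candidates for illustration only (not real image IDs).
candidates = [
    "https://media.licdn.com/dms/image/v2/EXAMPLE_ID/profile-displayphoto-shrink_200_200/a",
    "https://media.licdn.com/dms/image/v2/EXAMPLE_ID/profile-displayphoto-shrink_800_800/b",
    "https://media.licdn.com/dms/image/v2/EXAMPLE_ID/profile-displayphoto-shrink_400_400/c",
]
print(pick_largest(candidates))
```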
Extraction Workflow
Method 1: Browser Inspection (Manual)
- Navigate to profile: https://www.linkedin.com/in/{slug}/
- Click on profile photo to open overlay
- Right-click on the photo → "Copy Image Address"
- The URL should start with https://media.licdn.com/dms/image/
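The final check in the steps above can be automated so that an overlay or profile URL copied by mistake is caught before it is stored. This is a sketch only. `classify_photo_url` and its return labels are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit


def classify_photo_url(value):
    """Label a candidate value as 'cdn', 'overlay', 'profile', 'missing' or 'other'."""
    if value is None:
        return "missing"
    parts = urlsplit(value)
    if parts.hostname == "media.licdn.com" and parts.path.startswith("/dms/image/"):
        return "cdn"
    if parts.hostname in ("www.linkedin.com", "linkedin.com"):
        # Check the overlay suffix first: overlay paths also start with /in/.
        if parts.path.rstrip("/").endswith("/overlay/photo"):
            return "overlay"
        if parts.path.startswith("/in/"):
            return "profile"
    return "other"


print(classify_photo_url("https://www.linkedin.com/in/giovannafossati/overlay/photo/"))
# overlay
```

Only values labelled `cdn` belong in `linkedin_photo_url`.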
Method 2: Playwright Automation
from typing import Optional

from playwright.sync_api import sync_playwright


def extract_linkedin_photo_url(profile_url: str) -> Optional[str]:
    """Extract actual CDN photo URL from LinkedIn profile."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()

            # Navigate to profile
            page.goto(profile_url)
            page.wait_for_load_state('networkidle')

            # Click on profile photo to open overlay
            page.click('button[aria-label*="photo"]')

            # Wait for the CDN image to appear, then extract its URL
            img = page.wait_for_selector('img[src*="media.licdn.com"]')
            return img.get_attribute('src') if img else None
        finally:
            # Close the browser even if navigation or a selector times out
            browser.close()
Method 3: Exa MCP Tool
When using exa_crawling_exa or exa_linkedin_search_exa, look for URLs matching:
https://media\.licdn\.com/dms/image/v2/[^/]+/profile-displayphoto-shrink_\d+_\d+/[^?]+\?[^\s"']+
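A minimal sketch of applying this pattern to crawled text with Python's `re` module. `find_photo_urls` and the sample text are illustrative. If the crawled content is raw HTML, `&` may arrive encoded as `&amp;`, which this sketch does not handle.

```python
import re

# Same pattern as above; the raw triple-quoted string lets it contain both quote characters.
CDN_PHOTO_RE = re.compile(
    r"""https://media\.licdn\.com/dms/image/v2/[^/]+/profile-displayphoto-shrink_\d+_\d+/[^?]+\?[^\s"']+"""
)


def find_photo_urls(text: str) -> list:
    """Return every LinkedIn photo CDN URL found in the given text."""
    # The pattern has no capture groups, so findall() returns whole matches.
    return CDN_PHOTO_RE.findall(text)


sample = (
    'profile https://www.linkedin.com/in/giovannafossati/ image: '
    '"https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/'
    'profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/'
    '0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M" end'
)
print(find_photo_urls(sample))
```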
JSON Storage Format
In person profile JSON files (data/custodian/person/*.json):
{
"linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
"linkedin_photo_url": "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M",
"profile_data": {...}
}
When CDN URL Not Available
If the actual CDN URL cannot be extracted, set linkedin_photo_url to null and use a photo_urls object with alternative sources:
{
"linkedin_profile_url": "https://www.linkedin.com/in/anne-gant-59908a18",
"linkedin_photo_url": null,
"photo_urls": {
"source_name": "https://example.com/photo.jpg",
"primary": "https://example.com/photo.jpg",
"photo_notes": "LinkedIn CDN URL not available. Using alternative source."
}
}
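The two storage shapes above can be produced by one helper. This is a sketch, not the project's actual writer. `build_photo_fields` is a hypothetical name, and using the first alternative source as `primary` is an assumption made for illustration.

```python
import json

CDN_PREFIX = "https://media.licdn.com/dms/image/"


def build_photo_fields(profile_url, cdn_url=None, alternatives=None):
    """Build the photo-related fields of a person profile record."""
    record = {"linkedin_profile_url": profile_url, "linkedin_photo_url": None}
    if cdn_url and cdn_url.startswith(CDN_PREFIX):
        record["linkedin_photo_url"] = cdn_url
    elif alternatives:
        # alternatives maps a source name to a photo URL.
        photo_urls = dict(alternatives)
        # Assumption: the first alternative listed is treated as primary.
        photo_urls["primary"] = next(iter(alternatives.values()))
        photo_urls["photo_notes"] = (
            "LinkedIn CDN URL not available. Using alternative source."
        )
        record["photo_urls"] = photo_urls
    return record


record = build_photo_fields(
    "https://www.linkedin.com/in/anne-gant-59908a18",
    cdn_url=None,
    alternatives={"source_name": "https://example.com/photo.jpg"},
)
print(json.dumps(record, indent=2))
```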
URL Expiration
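IMPORTANT: LinkedIn CDN URLs have an expiration timestamp (e= parameter).
- URLs typically expire in 1-2 years, but the window can be much shorter: the example URL above (e=1766620800) expires on 2025-12-25 UTC, so check the e= value of each URL rather than assuming a lifetime
- The token (t=) becomes invalid after expiry
- For long-term archival, consider:
  - Downloading and storing the image locally
  - Recording the extraction timestamp
  - Planning for periodic re-extraction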
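Because `e=` is a plain Unix timestamp, expiry can be checked without contacting LinkedIn. The sketch below decodes it with the standard library. `expiry_of` and `is_expired` are hypothetical helper names, and treating a URL with no `e=` parameter as not expired is an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

URL = (
    "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/"
    "profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/"
    "0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M"
)


def expiry_of(url):
    """Return the e= expiry as an aware UTC datetime, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("e")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """True if the URL's e= timestamp is at or before `now` (default: current time)."""
    expires = expiry_of(url)
    if expires is None:
        return False  # Assumption: no e= parameter means no known expiry.
    now = now or datetime.now(timezone.utc)
    return now >= expires


print(expiry_of(URL))
# 2025-12-25 00:00:00+00:00
```

A re-extraction job could run `is_expired` over stored records and queue the stale ones.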
Anti-Patterns
❌ WRONG: Store overlay page URL
"linkedin_photo_url": "https://www.linkedin.com/in/giovannafossati/overlay/photo/"
This is derivable from the profile URL and requires JavaScript to render.
❌ WRONG: Store profile URL in photo field
"linkedin_photo_url": "https://www.linkedin.com/in/giovannafossati/"
This is not a photo URL at all.
✅ CORRECT: Store actual CDN URL
"linkedin_photo_url": "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/profile-displayphoto-shrink_800_800/..."
Related Documentation
- .opencode/EXA_LINKEDIN_EXTRACTION_RULES.md - LinkedIn profile extraction with Exa MCP
- .opencode/PERSON_DATA_REFERENCE_PATTERN.md - Person profile file structure
- AGENTS.md Rule 14 - Exa MCP LinkedIn Profile Extraction
Created: 2025-12-09
Version: 1.0
Status: PRODUCTION