# Rule 16: LinkedIn Photo URLs Must Be CDN URLs, Not Overlay Pages

## Core Rule

**🚨 CRITICAL: When storing LinkedIn profile photos, store the ACTUAL CDN image URL from `media.licdn.com`, NOT the overlay page URL.**

The LinkedIn photo overlay page (`/overlay/photo/`) is **trivially derivable** from any profile URL and provides no value. The actual image file URL from LinkedIn's CDN is what must be extracted and stored.

## URL Transformation

| URL Type | Example | Store? |
|----------|---------|--------|
| Profile URL | `https://www.linkedin.com/in/giovannafossati/` | Store in `linkedin_profile_url` |
| Overlay Page URL | `https://www.linkedin.com/in/giovannafossati/overlay/photo/` | ❌ NEVER STORE (derivable) |
| CDN Image URL | `https://media.licdn.com/dms/image/v2/C4D03AQ.../profile-displayphoto-shrink_800_800/...` | ✅ Store in `linkedin_photo_url` |

## Why This Matters

1. **Overlay URLs are derivable**: appending `overlay/photo/` to the profile URL yields the overlay URL, so storing it adds no information
2. **Overlay URLs require JavaScript**: the overlay is a rendered page, not an image file, so it cannot be embedded or displayed directly
3. **CDN URLs are direct links**: they can be embedded, downloaded, and verified
4. **CDN URLs prove extraction effort**: they demonstrate that the profile was actually accessed

## Derivability Rule

**If a URL can be trivially derived from another stored URL, DO NOT store it separately.**

```
linkedin_profile_url → overlay/photo/           ← DERIVABLE, don't store
linkedin_profile_url → media.licdn.com CDN URL  ← NOT DERIVABLE, must store
```
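
To make the derivability point concrete, here is a minimal sketch showing that the overlay URL needs nothing beyond the stored profile URL. The function name `derive_overlay_url` is hypothetical and not part of any existing codebase; the trailing-slash normalisation is an assumption added so that both forms of the profile URL used in this document work.

```python
def derive_overlay_url(profile_url: str) -> str:
    """Build the photo overlay page URL from a LinkedIn profile URL.

    Hypothetical helper: it only illustrates that the overlay URL carries
    no information beyond what linkedin_profile_url already holds.
    """
    # Normalise the trailing slash so ".../in/name" and ".../in/name/" both work.
    return profile_url.rstrip("/") + "/overlay/photo/"
```

No comparable function can be written for the CDN URL, because its image ID, timestamp, and token are not contained in the profile URL.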

## Implementation

### CORRECT Storage Pattern

```json
{
  "linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
  "linkedin_photo_url": "https://media.licdn.com/dms/image/v2/C4D03AQHQCBcoih82SQ/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1517545267594?e=1766620800&v=beta&t=R1_3Tm1cgNanjfgJZkXHBUiQcQik7_QSdt94d87I52M"
}
```

### WRONG Storage Pattern (NEVER DO THIS)

```json
{
  "linkedin_profile_url": "https://www.linkedin.com/in/giovannafossati",
  "linkedin_photo_url": "https://www.linkedin.com/in/giovannafossati/overlay/photo/"
}
```

## How to Extract CDN URLs

### Method 1: Browser (Manual)
1. Go to profile → Click photo → Right-click → "Copy Image Address"
2. URL should start with `https://media.licdn.com/dms/image/`

### Method 2: Playwright Automation
```python
# Navigate to overlay page, extract img[src*="media.licdn.com"]
```
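
The one-line outline above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not a tested scraper: it uses Playwright's synchronous Python API, the names `pick_profile_photo` and `extract_cdn_photo_url` are hypothetical, and it omits authentication entirely, although LinkedIn will usually require a logged-in session before the overlay renders. The `profile-displayphoto` filter is taken from the CDN URL structure documented in this rule.

```python
from typing import Iterable, Optional

CDN_PREFIX = "https://media.licdn.com/"
IMG_SELECTOR = 'img[src*="media.licdn.com"]'


def pick_profile_photo(srcs: Iterable[str]) -> Optional[str]:
    """Return the first CDN URL that looks like a profile display photo."""
    for src in srcs:
        if src.startswith(CDN_PREFIX) and "profile-displayphoto" in src:
            return src
    return None


def extract_cdn_photo_url(profile_url: str) -> Optional[str]:
    """Open the overlay page and read the CDN image URL from its <img> tags.

    Assumption: the browser context is already allowed to view the profile.
    Login handling is deliberately left out of this sketch.
    """
    # Imported lazily so pick_profile_photo works without Playwright installed.
    from playwright.sync_api import sync_playwright

    overlay_url = profile_url.rstrip("/") + "/overlay/photo/"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(overlay_url, wait_until="domcontentloaded")
            # Raises a timeout error if no CDN image appears within 10 seconds.
            page.wait_for_selector(IMG_SELECTOR, timeout=10_000)
            srcs = page.locator(IMG_SELECTOR).evaluate_all(
                "els => els.map(e => e.src)"
            )
        finally:
            browser.close()
    return pick_profile_photo(srcs)
```

Keeping the selection logic in `pick_profile_photo` separates it from the browser session, so it can be checked against plain lists of `src` values.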

### Method 3: Exa MCP Tools
Use `exa_crawling_exa` with the profile URL and look for CDN URLs in the response.
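
"Looking for CDN URLs in the response" can be done with a simple pattern match once the response text is available. The sketch below makes no assumption about the Exa tool's response format beyond it being text; the helper name `find_cdn_urls` and the regular expression are illustrative, and the pattern may need adjusting if URLs arrive HTML-escaped.

```python
import re
from typing import List

# Matches LinkedIn CDN image URLs up to the first whitespace, quote, or bracket.
CDN_URL_RE = re.compile(r"https://media\.licdn\.com/dms/image/[^\s\"'<>)\]]+")


def find_cdn_urls(text: str) -> List[str]:
    """Return the distinct CDN image URLs found in text, in order of appearance."""
    found: List[str] = []
    for url in CDN_URL_RE.findall(text):
        if url not in found:
            found.append(url)
    return found
```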

## Fallback: Alternative Photo Sources

When the LinkedIn CDN URL cannot be extracted, set `linkedin_photo_url` to `null` and record alternative sources in the `photo_urls` object:

```json
{
  "linkedin_photo_url": null,
  "photo_urls": {
    "indiana_university_blog": "https://blogs.libraries.indiana.edu/.../headshot.jpeg",
    "screen_daily": "https://d1nslcd7m2225b.cloudfront.net/.../photo.jpeg",
    "primary": "https://blogs.libraries.indiana.edu/.../headshot.jpeg",
    "photo_credit": "Indiana University"
  }
}
```
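
A consumer of these records might resolve the photo to display as sketched below. The precedence (LinkedIn CDN URL first, then `photo_urls.primary`) is an assumption inferred from the word "fallback" above, and `best_photo_url` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def best_photo_url(profile: Dict[str, Any]) -> Optional[str]:
    """Pick the photo URL to use for a person profile record.

    Assumed precedence: linkedin_photo_url, then photo_urls["primary"].
    """
    linkedin = profile.get("linkedin_photo_url")
    if linkedin:
        return linkedin
    # photo_urls may be missing or null when no alternative source was found.
    photo_urls = profile.get("photo_urls") or {}
    return photo_urls.get("primary")
```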

## Validation

When reviewing person profile JSON files:

1. ✅ `linkedin_photo_url` is `null` OR starts with `https://media.licdn.com/`
2. ❌ `linkedin_photo_url` contains `/overlay/photo/` - **FIX IMMEDIATELY**
3. ❌ `linkedin_photo_url` equals `linkedin_profile_url` - **FIX IMMEDIATELY**
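
The three checks above translate directly into a small validator. This is a minimal sketch; the function name `validate_linkedin_photo_url` and the error messages are illustrative, and the input is assumed to be an already-parsed profile dictionary.

```python
from typing import Any, Dict, List

CDN_PREFIX = "https://media.licdn.com/"


def validate_linkedin_photo_url(profile: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors: List[str] = []
    photo = profile.get("linkedin_photo_url")
    if photo is None:
        return errors  # Check 1: null is allowed.

    profile_url = profile.get("linkedin_profile_url")
    if "/overlay/photo/" in photo:
        errors.append("linkedin_photo_url is an overlay page URL")  # Check 2
    if profile_url and photo.rstrip("/") == profile_url.rstrip("/"):
        errors.append("linkedin_photo_url equals linkedin_profile_url")  # Check 3
    if not photo.startswith(CDN_PREFIX):
        errors.append("linkedin_photo_url does not start with " + CDN_PREFIX)  # Check 1
    return errors
```

Returning every violation instead of stopping at the first makes a review pass over many files more informative.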

## CDN URL Structure Reference

```
https://media.licdn.com/dms/image/v2/{IMAGE_ID}/profile-displayphoto-shrink_{SIZE}_{SIZE}/profile-displayphoto-shrink_{SIZE}_{SIZE}/0/{TIMESTAMP}?e={EXPIRY}&v=beta&t={TOKEN}
```

Sizes: `100_100`, `200_200`, `400_400`, `800_800` (prefer `800_800`)
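
The structure above can be taken apart with the standard library, for example to confirm that a stored URL uses the preferred size. The sketch below is illustrative only: `describe_cdn_url` is a hypothetical name, and reading the `e` parameter as a Unix timestamp in seconds is an assumption, since this rule labels it `{EXPIRY}` without stating its unit.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict
from urllib.parse import parse_qs, urlsplit

SIZE_RE = re.compile(r"profile-displayphoto-shrink_(\d+)_(\d+)")


def describe_cdn_url(url: str) -> Dict[str, Any]:
    """Extract host, size, and (assumed) expiry from a LinkedIn CDN photo URL."""
    parts = urlsplit(url)
    size = SIZE_RE.search(parts.path)
    expiry = parse_qs(parts.query).get("e", [None])[0]
    expires_at = None
    if expiry and expiry.isdigit():
        # Assumption: `e` is seconds since the Unix epoch.
        expires_at = datetime.fromtimestamp(int(expiry), tz=timezone.utc)
    return {
        "host": parts.netloc,
        "size": f"{size.group(1)}_{size.group(2)}" if size else None,
        "expires_at": expires_at,
    }
```

A reviewer could use the `size` field to flag records that store a smaller variant than `800_800`.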

## See Also

- `docs/LINKEDIN_PHOTO_URL_EXTRACTION.md` - Complete extraction documentation
- `.opencode/EXA_LINKEDIN_EXTRACTION_RULES.md` - Exa MCP extraction rules
- `.opencode/PERSON_DATA_REFERENCE_PATTERN.md` - Person profile structure

---

**Rule Number**: 16
**Created**: 2025-12-09
**Status**: PRODUCTION