# Person Data Reference Pattern

## Rule: Reference Person Files Instead of Inline Duplication

**🚨 CRITICAL: When person profile data is already stored in `data/custodian/person/`, custodian files MUST reference the file path instead of duplicating the full profile inline.**

This pattern reduces data duplication, keeps a single source of truth for person data, and makes updates easier to manage.

---

## Directory Structure

```
data/custodian/
├── person/                                  # Canonical person profile storage
│   ├── alexandr-belov-bb547b46_20251210T120000Z.json
│   ├── giovanna-fossati_20251209T170000Z.json
│   └── ...
├── NL-NH-AMS-U-EFM-eye_filmmuseum.yaml      # Custodian file references person/
└── ...
```

---

## Pattern: Reference vs. Inline

### ❌ WRONG - Full Inline Duplication

```yaml
collection_management_specialist:
  - name: Alexandr Belov
    role: Collection/Information Specialist for film-related materials
    department: Collection and Research Center
    linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
    location: Amsterdam, North Holland, Netherlands
    current: true
    source: linkedin_exa
    linkedin_connections: 94
    linkedin_followers: 94
    about: >-
      International university librarian with information organization skills...
      [60+ lines of profile data]
    total_experience_years: 15.5
    skills:
      - MARC 21 cataloging
      - RDA cataloging
      # ... 10 more skills
    languages:
      - language: English
        proficiency: fluent
      # ... 7 more languages
    education:
      - degree: Bachelor
        field: Linguistics
    career_history:
      # ... extensive career data
    provenance:
      source_urls:
        - https://www.linkedin.com/in/alexandr-belov-bb547b46
      extraction_tool: exa_web_search_exa
      extraction_timestamp: '2025-12-10T12:00:00Z'
```

### ✅ CORRECT - File Path Reference

```yaml
collection_management_specialist:
  - name: Alexandr Belov
    role: Collection/Information Specialist for film-related materials
    department: Collection and Research Center
    linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
    current: true
    person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
```

---

## When to Use Each Pattern

### Use File Path Reference When:
- ✅ Full profile data has already been extracted and saved
- ✅ Person has extensive career history, skills, education
- ✅ Profile data exceeds ~10 lines
- ✅ Same person might be referenced by multiple custodians

### Use Inline Data When:
- ✅ Only basic info available (name, role, LinkedIn URL)
- ✅ Person has minimal profile data
- ✅ Quick enrichment without full profile extraction
- ✅ Temporary/placeholder entry before full extraction

---

## File Naming Convention for Person Profiles

**Format**: `{linkedin-slug}_{ISO-timestamp}.json`

**Examples**:
```
alexandr-belov-bb547b46_20251210T120000Z.json
giovanna-fossati_20251209T170000Z.json
sandra-den-hamer-66024510_20251209T190000Z.json
```

**Components**:
- `linkedin-slug`: The unique part of LinkedIn URL (e.g., `alexandr-belov-bb547b46`)
- `ISO-timestamp`: Full timestamp with timezone (YYYYMMDDTHHMMSSZ)

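The naming format above can be sketched in Python. This is a minimal sketch, not the project's actual tooling: the helper names `linkedin_slug`, `person_filename`, and `parse_person_filename` are hypothetical, and it assumes the slug is the last path segment of the LinkedIn URL and that timestamps are stored in UTC (the trailing `Z`).

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # YYYYMMDDTHHMMSSZ, as in the examples above


def linkedin_slug(linkedin_url: str) -> str:
    """Return the last path segment of a LinkedIn profile URL (assumed to be the slug)."""
    path = urlparse(linkedin_url).path  # e.g. "/in/alexandr-belov-bb547b46"
    return path.rstrip("/").split("/")[-1]


def person_filename(linkedin_url: str, extracted_at: datetime) -> str:
    """Build `{linkedin-slug}_{ISO-timestamp}.json`.

    `extracted_at` should be timezone-aware; it is converted to UTC first.
    """
    stamp = extracted_at.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{linkedin_slug(linkedin_url)}_{stamp}.json"


def parse_person_filename(filename: str) -> tuple[str, datetime]:
    """Split a person filename back into (slug, UTC datetime)."""
    stem = filename.removesuffix(".json")
    # Split on the last underscore so a slug containing "_" stays intact.
    slug, stamp = stem.rsplit("_", 1)
    parsed = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return slug, parsed
```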
---

## Person Profile JSON Structure

```json
{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/{slug}",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  },
  "linkedin_profile_url": "https://www.linkedin.com/in/{slug}",
  "profile_data": {
    "name": "Full Name",
    "headline": "Current Role",
    "location": "City, Region, Country",
    "current_company": "Organization Name",
    "department": "Department Name",
    "connections": 94,
    "followers": 94,
    "about": "Professional summary...",
    "total_experience_years": 15.5,
    "skills": ["Skill 1", "Skill 2", ...],
    "languages": [
      {"language": "English", "proficiency": "fluent"},
      ...
    ],
    "education": [...],
    "career_history": [...],
    "international_experience": [...],
    "project_experience": [...]
  },
  "raw_exa_response_summary": {
    "source_url": "https://www.linkedin.com/in/{slug}",
    "search_type": "site-specific LinkedIn search",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
```

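A person file can be checked against the template above before custodian files start pointing at it. The following is a rough sketch only: the template does not say which keys are mandatory, so treating the four top-level keys and `profile_data.name` as required is an assumption, and `missing_profile_keys` and `load_profile` are hypothetical helper names.

```python
import json

# Top-level keys taken from the template above. Treating all of them as
# required is an assumption of this sketch, not a documented rule.
REQUIRED_TOP_LEVEL = (
    "exa_search_metadata",
    "linkedin_profile_url",
    "profile_data",
    "raw_exa_response_summary",
)


def missing_profile_keys(profile: dict) -> list[str]:
    """Return the expected keys that are absent from a parsed person profile."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in profile]
    # The display name is the one field every custodian entry repeats.
    if "name" not in profile.get("profile_data", {}):
        missing.append("profile_data.name")
    return missing


def load_profile(path: str) -> dict:
    """Read a person profile file and fail loudly if expected keys are missing."""
    with open(path, encoding="utf-8") as handle:
        profile = json.load(handle)
    missing = missing_profile_keys(profile)
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return profile
```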
---

## Minimal Reference in Custodian File

When referencing a person file, the custodian YAML needs only:

```yaml
- name: Alexandr Belov                    # Display name (required)
  role: Collection/Information Specialist # Current role at this institution
  department: Collection and Research Center  # Department (if known)
  linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46  # For linking
  current: true                           # Still employed at institution?
  person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
```

**Optional additional fields** (if not in person file or institution-specific):
- `start_date`: When they started at THIS institution
- `note`: Institution-specific notes
- `source`: Where this association was discovered

---

## Migration: Converting Inline to Reference

When you find inline person data that should be a file reference:

1. **Create person file**:
   ```bash
   # Save to data/custodian/person/{slug}_{timestamp}.json
   ```

2. **Update custodian YAML**:
   ```yaml
   # Replace 50+ lines of inline data with:
   - name: Person Name
     role: Their Role
     linkedin_url: https://www.linkedin.com/in/{slug}
     current: true
     person_profile_path: data/custodian/person/{slug}_{timestamp}.json
   ```

3. **Verify the person file exists** at `person_profile_path` before removing the inline data

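The verify-then-replace step can be sketched as a small Python helper. This is an illustration under assumptions, not an existing project script: `to_reference` and `KEEP_INLINE` are hypothetical names, and the set of fields kept inline simply mirrors the minimal reference shown earlier (optional institution-specific fields such as `start_date`, `note`, or `source` could be added to it). The helper refuses to produce the reduced entry unless the person file is already on disk, so inline data is never dropped without a saved copy.

```python
from pathlib import Path

# Fields that stay in the custodian YAML. This mirrors the minimal reference
# entry; the exact set is an assumption of this sketch.
KEEP_INLINE = ("name", "role", "department", "linkedin_url", "current")


def to_reference(entry: dict, profile_path: str, repo_root: Path) -> dict:
    """Return the reduced custodian entry that points at an existing person file.

    `profile_path` is the repository-relative path stored in the YAML,
    e.g. "data/custodian/person/{slug}_{timestamp}.json".
    """
    # Verify the person file exists before any inline data is discarded.
    if not (repo_root / profile_path).is_file():
        raise FileNotFoundError(f"person file not found: {profile_path}")
    reference = {key: entry[key] for key in KEEP_INLINE if key in entry}
    reference["person_profile_path"] = profile_path
    return reference
```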
---

## Cross-Custodian References

When the same person works at multiple institutions:

```yaml
# In NL-NH-AMS-U-EFM-eye_filmmuseum.yaml
former_directors:
  - name: Sandra den Hamer
    role: Director
    tenure_start: '2010-01'
    tenure_end: '2023-02'
    person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json

# In NL-ZH-DHA-O-NFF-netherlands_filmfonds.yaml
management:
  - name: Sandra den Hamer
    role: Interim CEO
    start_date: '2023-05'
    current: true
    person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json
```

Both institutions reference the SAME person file, which remains the single source of truth.

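Because shared references are plain path strings, it is straightforward to list which custodian files point at the same person. The sketch below assumes the custodian YAML files have already been parsed into dictionaries (for example with a YAML library) and that person entries sit in top-level lists such as `former_directors` or `management`; the function name `custodians_by_person` is hypothetical.

```python
from collections import defaultdict


def custodians_by_person(custodians: dict[str, dict]) -> dict[str, list[str]]:
    """Map each person_profile_path to the custodian files that reference it.

    `custodians` maps a custodian filename to its parsed YAML content.
    Only top-level list sections are scanned (an assumption of this sketch).
    """
    index: dict[str, set[str]] = defaultdict(set)
    for filename, content in custodians.items():
        for section in content.values():
            if not isinstance(section, list):
                continue  # skip scalar or mapping fields
            for entry in section:
                if isinstance(entry, dict) and "person_profile_path" in entry:
                    index[entry["person_profile_path"]].add(filename)
    # Sort filenames so the result is stable across runs.
    return {path: sorted(files) for path, files in index.items()}
```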
---

## Provenance Tracking

The person file contains full provenance:

```json
{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/alexandr-belov-bb547b46",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  }
}
```

The custodian file does NOT need to duplicate this; it inherits provenance from the referenced file.

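One way to read "inherits provenance" is that a consumer follows `person_profile_path` and reads the provenance block from the person file on demand. The helper below is a minimal sketch of that lookup; `resolve_provenance` is a hypothetical name, and it assumes `exa_search_metadata` is the provenance block, as in the example above.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_provenance(entry: dict, repo_root: Path) -> Optional[dict]:
    """Return the provenance block of the person file a custodian entry points to.

    Returns None for inline entries that carry no `person_profile_path`.
    """
    profile_path = entry.get("person_profile_path")
    if profile_path is None:
        return None  # inline entry: nothing to inherit
    with open(repo_root / profile_path, encoding="utf-8") as handle:
        profile = json.load(handle)
    # Assumed location of provenance inside the person file.
    return profile.get("exa_search_metadata")
```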
---

## See Also

- `AGENTS.md` - Rule 5: NEVER Delete Enriched Data
- `.opencode/DATA_PRESERVATION_RULES.md` - Data preservation guidelines
- `schemas/20251121/linkml/modules/classes/PersonObservation.yaml` - PiCo-based person modeling