Person Data Reference Pattern
Rule: Reference Person Files Instead of Inline Duplication
🚨 CRITICAL: When person profile data is already stored in data/custodian/person/, custodian files MUST reference the file path instead of duplicating the full profile inline.
This pattern reduces data duplication, keeps a single source of truth for person data, and makes updates easier to manage.
Directory Structure
data/custodian/
├── person/                                  # Canonical person profile storage
│   ├── alexandr-belov-bb547b46_20251210T120000Z.json
│   ├── giovanna-fossati_20251209T170000Z.json
│   └── ...
├── NL-NH-AMS-U-EFM-eye_filmmuseum.yaml      # Custodian file references person/
└── ...
Pattern: Reference vs. Inline
❌ WRONG - Full Inline Duplication
collection_management_specialist:
  - name: Alexandr Belov
    role: Collection/Information Specialist for film-related materials
    department: Collection and Research Center
    linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
    location: Amsterdam, North Holland, Netherlands
    current: true
    source: linkedin_exa
    linkedin_connections: 94
    linkedin_followers: 94
    about: >-
      International university librarian with information organization skills...
      [60+ lines of profile data]
    total_experience_years: 15.5
    skills:
      - MARC 21 cataloging
      - RDA cataloging
      # ... 10 more skills
    languages:
      - language: English
        proficiency: fluent
      # ... 7 more languages
    education:
      - degree: Bachelor
        field: Linguistics
    career_history:
      # ... extensive career data
    provenance:
      source_urls:
        - https://www.linkedin.com/in/alexandr-belov-bb547b46
      extraction_tool: exa_web_search_exa
      extraction_timestamp: '2025-12-10T12:00:00Z'
✅ CORRECT - File Path Reference
collection_management_specialist:
  - name: Alexandr Belov
    role: Collection/Information Specialist for film-related materials
    department: Collection and Research Center
    linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
    current: true
    person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
When to Use Each Pattern
Use File Path Reference When:
- ✅ Full profile data has already been extracted and saved
- ✅ Person has extensive career history, skills, education
- ✅ Profile data exceeds ~10 lines
- ✅ Same person might be referenced by multiple custodians
Use Inline Data When:
- ✅ Only basic info available (name, role, LinkedIn URL)
- ✅ Person has minimal profile data
- ✅ Quick enrichment without full profile extraction
- ✅ Temporary/placeholder entry before full extraction
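The two lists above can be read as a small decision procedure. The following is only a sketch of one possible reading: the function name `choose_pattern`, the three return labels, and the strict cutoff of 10 lines are assumptions made for illustration, not part of the rule itself.

```python
from pathlib import Path
from typing import Optional

# The "~10 lines" guideline from the list above; the exact cutoff is a judgment call.
INLINE_LINE_LIMIT = 10


def choose_pattern(profile_file: Optional[Path], inline_line_count: int) -> str:
    """Return 'reference', 'extract-then-reference', or 'inline' for one person entry."""
    if profile_file is not None and profile_file.is_file():
        # A saved profile must be referenced, never duplicated inline.
        return "reference"
    if inline_line_count > INLINE_LINE_LIMIT:
        # Too much data to keep inline: save a person file first, then reference it.
        return "extract-then-reference"
    # Basic or placeholder info (name, role, LinkedIn URL) can stay inline.
    return "inline"
```

An entry with only a name, role, and LinkedIn URL stays inline under this reading, while a 60-line inline profile is flagged for extraction.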
File Naming Convention for Person Profiles
Format: {linkedin-slug}_{ISO-timestamp}.json
Examples:
alexandr-belov-bb547b46_20251210T120000Z.json
giovanna-fossati_20251209T170000Z.json
sandra-den-hamer-66024510_20251209T190000Z.json
Components:
- linkedin-slug: The unique part of LinkedIn URL (e.g., alexandr-belov-bb547b46)
- ISO-timestamp: Full timestamp with timezone (YYYYMMDDTHHMMSSZ)
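As an illustration of the naming convention, here is a minimal sketch that builds and parses such filenames. The helper names `build_person_filename` and `parse_person_filename` are hypothetical, and the set of characters allowed in a slug (letters, digits, hyphens) is an assumption based on the examples above.

```python
import re
from datetime import datetime, timezone

# {linkedin-slug}_{ISO-timestamp}.json; the slug character set is an assumption.
FILENAME_RE = re.compile(r"^(?P<slug>[A-Za-z0-9-]+)_(?P<ts>\d{8}T\d{6}Z)\.json$")
TS_FORMAT = "%Y%m%dT%H%M%SZ"


def build_person_filename(slug: str, when: datetime) -> str:
    """Return '{slug}_{YYYYMMDDTHHMMSSZ}.json' with the timestamp converted to UTC."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime(TS_FORMAT)
    return f"{slug}_{stamp}.json"


def parse_person_filename(filename: str) -> tuple[str, datetime]:
    """Split a person filename into (slug, timezone-aware UTC datetime)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a person profile filename: {filename!r}")
    when = datetime.strptime(match["ts"], TS_FORMAT).replace(tzinfo=timezone.utc)
    return match["slug"], when
```

Parsing `giovanna-fossati_20251209T170000Z.json` with this sketch yields the slug `giovanna-fossati` and a UTC datetime of 2025-12-09 17:00:00.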
Person Profile JSON Structure
{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/{slug}",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  },
  "linkedin_profile_url": "https://www.linkedin.com/in/{slug}",
  "profile_data": {
    "name": "Full Name",
    "headline": "Current Role",
    "location": "City, Region, Country",
    "current_company": "Organization Name",
    "department": "Department Name",
    "connections": 94,
    "followers": 94,
    "about": "Professional summary...",
    "total_experience_years": 15.5,
    "skills": ["Skill 1", "Skill 2", ...],
    "languages": [
      {"language": "English", "proficiency": "fluent"},
      ...
    ],
    "education": [...],
    "career_history": [...],
    "international_experience": [...],
    "project_experience": [...]
  },
  "raw_exa_response_summary": {
    "source_url": "https://www.linkedin.com/in/{slug}",
    "search_type": "site-specific LinkedIn search",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
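A quick structural check against the template above might look like the following sketch. Whether all four top-level sections are actually mandatory is an assumption, and `missing_sections` is a hypothetical helper name.

```python
# Top-level keys taken from the template above; treating all of them as required is an assumption.
EXPECTED_SECTIONS = (
    "exa_search_metadata",
    "linkedin_profile_url",
    "profile_data",
    "raw_exa_response_summary",
)


def missing_sections(profile: dict) -> list[str]:
    """Return the expected top-level keys that a loaded person profile lacks."""
    return [key for key in EXPECTED_SECTIONS if key not in profile]
```

An empty result means the profile has every section shown in the template; it says nothing about the fields inside each section.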
Minimal Reference in Custodian File
When referencing a person file, the custodian YAML needs only:
- name: Alexandr Belov  # Display name (required)
  role: Collection/Information Specialist  # Current role at this institution
  department: Collection and Research Center  # Department (if known)
  linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46  # For linking
  current: true  # Still employed at institution?
  person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
Optional additional fields (if not in person file or institution-specific):
- start_date: When they started at THIS institution
- note: Institution-specific notes
- source: Where this association was discovered
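To show how a consumer might follow such a reference, here is a minimal sketch that loads the person file for one custodian entry. It assumes the entry has already been parsed from YAML into a dict and that `person_profile_path` is relative to the repository root; `load_person_profile` is a hypothetical helper, not an existing function in this project.

```python
import json
from pathlib import Path


def load_person_profile(entry: dict, repo_root: Path) -> dict:
    """Load the person JSON that a custodian entry points to."""
    if not entry.get("name"):
        raise ValueError("custodian entry needs a 'name'")
    rel_path = entry.get("person_profile_path")
    if not rel_path:
        raise ValueError(f"{entry['name']}: no person_profile_path set")
    # Assumption: the path is stored relative to the repository root.
    profile_file = repo_root / rel_path
    if not profile_file.is_file():
        raise FileNotFoundError(f"{entry['name']}: missing {profile_file}")
    with profile_file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Failing loudly on a missing file keeps dangling references from going unnoticed.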
Migration: Converting Inline to Reference
When you find inline person data that should be a file reference:
- Create person file:
    # Save to data/custodian/person/{slug}_{timestamp}.json
- Update custodian YAML:
    # Replace 50+ lines of inline data with:
    - name: Person Name
      role: Their Role
      linkedin_url: https://linkedin.com/in/{slug}
      current: true
      person_profile_path: data/custodian/person/{slug}_{timestamp}.json
- Verify file exists before removing inline data
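The migration steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's tooling: `migrate_inline_entry` and `REFERENCE_FIELDS` are hypothetical names, the inline entry is assumed to be an already-parsed dict, and the inline data is stored unchanged under `profile_data` rather than mapped field by field onto the profile structure.

```python
import json
from pathlib import Path

# Fields kept in the custodian entry after migration (taken from the reference example).
REFERENCE_FIELDS = ("name", "role", "department", "linkedin_url", "current")


def migrate_inline_entry(entry: dict, slug: str, timestamp: str, repo_root: Path) -> dict:
    """Save inline person data to a person file and return the slim reference entry."""
    rel_path = f"data/custodian/person/{slug}_{timestamp}.json"
    target = repo_root / rel_path
    if target.exists():
        raise FileExistsError(f"{target} already exists; reference it instead of rewriting")

    # Step 1: create the person file. Keeping the inline data verbatim is an assumption.
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {"linkedin_profile_url": entry.get("linkedin_url"), "profile_data": dict(entry)}
    target.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")

    # Step 3 (done before step 2 takes effect): verify the file before dropping inline data.
    saved = json.loads(target.read_text(encoding="utf-8"))
    if saved["profile_data"] != entry:
        raise RuntimeError(f"{target} does not round-trip; keep the inline data")

    # Step 2: build the replacement entry for the custodian YAML.
    slim = {key: entry[key] for key in REFERENCE_FIELDS if key in entry}
    slim["person_profile_path"] = rel_path
    return slim
```

The function only returns the slim entry once the saved file has been read back successfully, so the inline data is never dropped on the strength of an unverified write.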
Cross-Custodian References
When the same person works at multiple institutions:
# In NL-NH-AMS-U-EFM-eye_filmmuseum.yaml
former_directors:
  - name: Sandra den Hamer
    role: Director
    tenure_start: '2010-01'
    tenure_end: '2023-02'
    person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json

# In NL-ZH-DHA-O-NFF-netherlands_filmfonds.yaml
management:
  - name: Sandra den Hamer
    role: Interim CEO
    start_date: '2023-05'
    current: true
    person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json
Both institutions reference the SAME person file - single source of truth.
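One way to see which person files are shared is to index references by path. The sketch below assumes the person entries of each custodian file have already been parsed into a list of dicts; `index_shared_profiles` is a hypothetical helper name.

```python
from collections import defaultdict


def index_shared_profiles(custodians: dict[str, list[dict]]) -> dict[str, list[str]]:
    """Map each person_profile_path to the custodian files that reference it.

    Only paths referenced by more than one custodian file are returned.
    """
    index = defaultdict(list)
    for custodian_file, entries in custodians.items():
        for entry in entries:
            path = entry.get("person_profile_path")
            if path and custodian_file not in index[path]:
                index[path].append(custodian_file)
    return {path: files for path, files in index.items() if len(files) > 1}
```

For the example above, the result would contain a single key, the Sandra den Hamer person file, mapped to both custodian filenames.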
Provenance Tracking
The person file contains full provenance:
{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/alexandr-belov-bb547b46",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  }
}
The custodian file does NOT need to duplicate this - it inherits provenance from the referenced file.
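Provenance inheritance could be resolved at read time along these lines. This is a sketch only: `provenance_for` is a hypothetical helper, the path is assumed to be relative to the repository root, and the fallback to an inline `provenance` key mirrors the inline example earlier in this document rather than any confirmed schema.

```python
import json
from pathlib import Path


def provenance_for(entry: dict, repo_root: Path) -> dict:
    """Return the provenance that applies to one custodian entry."""
    rel_path = entry.get("person_profile_path")
    if rel_path:
        # Referenced entries inherit provenance from the person file.
        profile = json.loads((repo_root / rel_path).read_text(encoding="utf-8"))
        return profile.get("exa_search_metadata", {})
    # Inline entries carry their own provenance block, if any.
    return entry.get("provenance", {})
```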
See Also
- AGENTS.md - Rule 5: NEVER Delete Enriched Data
- .opencode/DATA_PRESERVATION_RULES.md - Data preservation guidelines
- schemas/20251121/linkml/modules/classes/PersonObservation.yaml - PiCo-based person modeling