# Person Data Reference Pattern

## Rule: Reference Person Files Instead of Inline Duplication

**🚨 CRITICAL: When person profile data is already stored in `data/custodian/person/`, custodian files MUST reference the file path instead of duplicating the full profile inline.**

This pattern reduces data duplication, ensures single-source-of-truth for person data, and makes updates easier to manage.

---

## Directory Structure

```
data/custodian/
├── person/                                  # Canonical person profile storage
│   ├── alexandr-belov-bb547b46_20251210T120000Z.json
│   ├── giovanna-fossati_20251209T170000Z.json
│   └── ...
├── NL-NH-AMS-U-EFM-eye_filmmuseum.yaml      # Custodian file references person/
└── ...
```

---

## Pattern: Reference vs. Inline

### ❌ WRONG - Full Inline Duplication

```yaml
collection_management_specialist:
  - name: Alexandr Belov
    role: Collection/Information Specialist for film-related materials
    department: Collection and Research Center
    linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
    location: Amsterdam, North Holland, Netherlands
    current: true
    source: linkedin_exa
    linkedin_connections: 94
    linkedin_followers: 94
    about: >-
      International university librarian with information organization skills...
      [60+ lines of profile data]
    total_experience_years: 15.5
    skills:
      - MARC 21 cataloging
      - RDA cataloging
      # ... 10 more skills
    languages:
      - language: English
        proficiency: fluent
      # ... 7 more languages
    education:
      - degree: Bachelor
        field: Linguistics
    career_history:
      # ... extensive career data
    provenance:
      source_urls:
        - https://www.linkedin.com/in/alexandr-belov-bb547b46
      extraction_tool: exa_web_search_exa
      extraction_timestamp: '2025-12-10T12:00:00Z'
```

### ✅ CORRECT - File Path Reference

```yaml
collection_management_specialist:
  - name: Alexandr Belov
    role: Collection/Information Specialist for film-related materials
    department: Collection and Research Center
    linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
    current: true
    person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
```

---

## When to Use Each Pattern

### Use File Path Reference When:

- ✅ Full profile data has already been extracted and saved
- ✅ Person has extensive career history, skills, education
- ✅ Profile data exceeds ~10 lines
- ✅ Same person might be referenced by multiple custodians

### Use Inline Data When:

- ✅ Only basic info available (name, role, LinkedIn URL)
- ✅ Person has minimal profile data
- ✅ Quick enrichment without full profile extraction
- ✅ Temporary/placeholder entry before full extraction

---

## File Naming Convention for Person Profiles

**Format**: `{linkedin-slug}_{ISO-timestamp}.json`

**Examples**:

```
alexandr-belov-bb547b46_20251210T120000Z.json
giovanna-fossati_20251209T170000Z.json
sandra-den-hamer-66024510_20251209T190000Z.json
```

**Components**:

- `linkedin-slug`: The unique part of LinkedIn URL (e.g., `alexandr-belov-bb547b46`)
- `ISO-timestamp`: Full timestamp with timezone (YYYYMMDDTHHMMSSZ)

---

## Person Profile JSON Structure

```json
{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/{slug}",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  },
  "linkedin_profile_url": "https://www.linkedin.com/in/{slug}",
  "profile_data": {
    "name": "Full Name",
    "headline": "Current Role",
    "location": "City, Region, Country",
    "current_company": "Organization Name",
    "department": "Department Name",
    "connections": 94,
    "followers": 94,
    "about": "Professional summary...",
    "total_experience_years": 15.5,
    "skills": ["Skill 1", "Skill 2", ...],
    "languages": [
      {"language": "English", "proficiency": "fluent"},
      ...
    ],
    "education": [...],
    "career_history": [...],
    "international_experience": [...],
    "project_experience": [...]
  },
  "raw_exa_response_summary": {
    "source_url": "https://www.linkedin.com/in/{slug}",
    "search_type": "site-specific LinkedIn search",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
```

---

## Minimal Reference in Custodian File

When referencing a person file, the custodian YAML needs only:

```yaml
- name: Alexandr Belov                        # Display name (required)
  role: Collection/Information Specialist     # Current role at this institution
  department: Collection and Research Center  # Department (if known)
  linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46  # For linking
  current: true                               # Still employed at institution?
  person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
```

**Optional additional fields** (if not in person file or institution-specific):

- `start_date`: When they started at THIS institution
- `note`: Institution-specific notes
- `source`: Where this association was discovered

---

## Migration: Converting Inline to Reference

When you find inline person data that should be a file reference:

1. **Create person file**:
   ```bash
   # Save to data/custodian/person/{slug}_{timestamp}.json
   ```

2. **Update custodian YAML**:
   ```yaml
   # Replace 50+ lines of inline data with:
   - name: Person Name
     role: Their Role
     linkedin_url: https://linkedin.com/in/{slug}
     current: true
     person_profile_path: data/custodian/person/{slug}_{timestamp}.json
   ```

3. **Verify file exists** before removing inline data

---

## Cross-Custodian References

When the same person works at multiple institutions:

```yaml
# In NL-NH-AMS-U-EFM-eye_filmmuseum.yaml
former_directors:
  - name: Sandra den Hamer
    role: Director
    tenure_start: '2010-01'
    tenure_end: '2023-02'
    person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json

# In NL-ZH-DHA-O-NFF-netherlands_filmfonds.yaml
management:
  - name: Sandra den Hamer
    role: Interim CEO
    start_date: '2023-05'
    current: true
    person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json
```

Both institutions reference the SAME person file - single source of truth.

---

## Provenance Tracking

The person file contains full provenance:

```json
{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/alexandr-belov-bb547b46",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  }
}
```

The custodian file does NOT need to duplicate this - it inherits provenance from the referenced file.

---

## See Also

- `AGENTS.md` - Rule 5: NEVER Delete Enriched Data
- `.opencode/DATA_PRESERVATION_RULES.md` - Data preservation guidelines
- `schemas/20251121/linkml/modules/classes/PersonObservation.yaml` - PiCo-based person modeling
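
---

## Sketch: Checking Names and References

The file naming convention and the "verify file exists" migration step described above could be automated roughly as follows. This is a minimal sketch, not tooling that exists in the repository: the helper names (`person_filename`, `parse_person_filename`, `missing_profiles`) are hypothetical, the slug pattern assumes lowercase letters, digits, and hyphens only, and the custodian entries are assumed to be already parsed from YAML into plain dicts.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical helpers for illustration; none of these names come from the repo.
# Assumption: slugs contain only lowercase letters, digits, and hyphens.
FILENAME_RE = re.compile(r"^(?P<slug>[a-z0-9-]+)_(?P<ts>\d{8}T\d{6}Z)\.json$")
TS_FORMAT = "%Y%m%dT%H%M%SZ"  # YYYYMMDDTHHMMSSZ, always UTC


def person_filename(slug: str, extracted_at: datetime) -> str:
    """Build '{linkedin-slug}_{ISO-timestamp}.json'.

    `extracted_at` should be timezone-aware; it is converted to UTC.
    """
    stamp = extracted_at.astimezone(timezone.utc).strftime(TS_FORMAT)
    return f"{slug}_{stamp}.json"


def parse_person_filename(name: str) -> tuple[str, datetime]:
    """Split a profile filename back into (slug, UTC timestamp)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a person profile filename: {name!r}")
    stamp = datetime.strptime(match["ts"], TS_FORMAT).replace(tzinfo=timezone.utc)
    return match["slug"], stamp


def missing_profiles(entries: list[dict], repo_root: Path) -> list[str]:
    """Return every person_profile_path that does not exist under repo_root.

    Entries without a person_profile_path (inline-only entries) are skipped.
    """
    missing = []
    for entry in entries:
        ref = entry.get("person_profile_path")
        if ref is not None and not (repo_root / ref).is_file():
            missing.append(ref)
    return missing


if __name__ == "__main__":
    ts = datetime(2025, 12, 10, 12, 0, 0, tzinfo=timezone.utc)
    print(person_filename("alexandr-belov-bb547b46", ts))
    # alexandr-belov-bb547b46_20251210T120000Z.json
```

Running `missing_profiles` over a custodian file's staff entries before deleting any inline data gives a list of broken references; an empty list means every referenced profile is present and the inline copy is safe to replace.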