glam/.opencode/PERSON_DATA_REFERENCE_PATTERN.md
2025-12-10 13:01:13 +01:00


Person Data Reference Pattern

Rule: Reference Person Files Instead of Inline Duplication

🚨 CRITICAL: When person profile data is already stored in data/custodian/person/, custodian files MUST reference the file path instead of duplicating the full profile inline.

This pattern reduces data duplication, keeps a single source of truth for each person's data, and makes updates easier: a profile change is made once, in the person file.


Directory Structure

data/custodian/
├── person/                              # Canonical person profile storage
│   ├── alexandr-belov-bb547b46_20251210T120000Z.json
│   ├── giovanna-fossati_20251209T170000Z.json
│   └── ...
├── NL-NH-AMS-U-EFM-eye_filmmuseum.yaml  # Custodian file references person/
└── ...

Pattern: Reference vs. Inline

WRONG - Full Inline Duplication

collection_management_specialist:
- name: Alexandr Belov
  role: Collection/Information Specialist for film-related materials
  department: Collection and Research Center
  linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
  location: Amsterdam, North Holland, Netherlands
  current: true
  source: linkedin_exa
  linkedin_connections: 94
  linkedin_followers: 94
  about: >-
    International university librarian with information organization skills...
    [60+ lines of profile data]    
  total_experience_years: 15.5
  skills:
  - MARC 21 cataloging
  - RDA cataloging
  # ... 10 more skills
  languages:
  - language: English
    proficiency: fluent
  # ... 7 more languages
  education:
  - degree: Bachelor
    field: Linguistics
  career_history:
  # ... extensive career data
  provenance:
    source_urls:
    - https://www.linkedin.com/in/alexandr-belov-bb547b46
    extraction_tool: exa_web_search_exa
    extraction_timestamp: '2025-12-10T12:00:00Z'

CORRECT - File Path Reference

collection_management_specialist:
- name: Alexandr Belov
  role: Collection/Information Specialist for film-related materials
  department: Collection and Research Center
  linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46
  current: true
  person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json
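Tools that need the full profile can follow person_profile_path when reading a custodian file. The sketch below is a minimal illustration, not an existing project tool. It assumes the custodian YAML has already been parsed into Python dicts (for example with a YAML library, not shown) and that paths are relative to the repository root. load_person_profile and repo_root are hypothetical names.

```python
import json
from pathlib import Path


def load_person_profile(entry: dict, repo_root: Path) -> dict:
    """Return the person file referenced by a custodian entry, or {} if none.

    Hypothetical helper: assumes person_profile_path is relative to repo_root.
    """
    rel_path = entry.get("person_profile_path")
    if rel_path is None:
        # Inline-only entry: nothing to resolve
        return {}
    profile_file = repo_root / rel_path
    if not profile_file.is_file():
        # A dangling reference is a data error, not an empty profile
        raise FileNotFoundError(f"person_profile_path not found: {profile_file}")
    with profile_file.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Raising on a missing file, rather than returning {}, keeps a broken reference from silently looking like a person with no profile.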

When to Use Each Pattern

Use File Path Reference When:

  • Full profile data has already been extracted and saved
  • Person has extensive career history, skills, education
  • Profile data exceeds ~10 lines
  • Same person might be referenced by multiple custodians

Use Inline Data When:

  • Only basic info available (name, role, LinkedIn URL)
  • Person has minimal profile data
  • Quick enrichment without full profile extraction
  • Temporary/placeholder entry before full extraction
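Two of the criteria above can be approximated with a simple heuristic. This is a hypothetical sketch, not a project validator. The EXTENDED_FIELDS set, the block-style YAML line estimate, and the default threshold of 10 lines are assumptions drawn from the bullets above. It cannot check the other two criteria (whether the profile has already been saved, or whether several custodians cite the same person), because those need repository-wide context.

```python
# Fields that normally only appear once a full profile has been extracted (assumed list)
EXTENDED_FIELDS = ("about", "skills", "languages", "education", "career_history")


def yaml_line_estimate(value) -> int:
    """Rough number of lines a value occupies in block-style YAML."""
    if isinstance(value, dict):
        # key line, plus nested lines for mappings/sequences
        return sum(1 + yaml_line_estimate(v) if isinstance(v, (dict, list)) else 1
                   for v in value.values())
    if isinstance(value, list):
        # one "- item" line per scalar; nested mappings count their own lines
        return sum(yaml_line_estimate(v) if isinstance(v, (dict, list)) else 1
                   for v in value)
    return 1


def should_reference(entry: dict, max_inline_lines: int = 10) -> bool:
    """True if an inline person entry looks like a full profile."""
    if any(field in entry for field in EXTENDED_FIELDS):
        return True
    return yaml_line_estimate(entry) > max_inline_lines
```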

File Naming Convention for Person Profiles

Format: {linkedin-slug}_{ISO-timestamp}.json

Examples:

alexandr-belov-bb547b46_20251210T120000Z.json
giovanna-fossati_20251209T170000Z.json
sandra-den-hamer-66024510_20251209T190000Z.json

Components:

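  • linkedin-slug: The unique final path segment of the LinkedIn profile URL, i.e. the part after /in/ (e.g., alexandr-belov-bb547b46)
  • ISO-timestamp: The extraction time as an ISO 8601 basic-format UTC timestamp (YYYYMMDDTHHMMSSZ, where the trailing Z denotes UTC)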
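The convention can be enforced with a small builder and parser. This sketch assumes slugs consist of lowercase ASCII letters, digits, and hyphens, as in the examples above; LinkedIn slugs containing other characters would need a looser pattern. The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlparse

# {linkedin-slug}_{YYYYMMDDTHHMMSSZ}.json; the slug character set is an assumption
PROFILE_FILENAME_RE = re.compile(r"^(?P<slug>[a-z0-9-]+)_(?P<ts>\d{8}T\d{6}Z)\.json$")
TS_FORMAT = "%Y%m%dT%H%M%SZ"


def linkedin_slug(url: str) -> str:
    """Return the path segment after /in/ in a LinkedIn profile URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "in":
        raise ValueError(f"not a LinkedIn /in/ profile URL: {url}")
    return parts[1]


def profile_filename(url: str, extracted_at: datetime) -> str:
    """Build {slug}_{timestamp}.json, normalising the timestamp to UTC."""
    if extracted_at.tzinfo is None:
        raise ValueError("extracted_at must be timezone-aware")
    ts = extracted_at.astimezone(timezone.utc).strftime(TS_FORMAT)
    return f"{linkedin_slug(url)}_{ts}.json"


def parse_profile_filename(name: str) -> tuple[str, datetime]:
    """Split a conforming file name into (slug, UTC datetime)."""
    m = PROFILE_FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"does not follow the naming convention: {name}")
    ts = datetime.strptime(m["ts"], TS_FORMAT).replace(tzinfo=timezone.utc)
    return m["slug"], ts
```

Rejecting naive datetimes avoids silently writing local time with a Z suffix.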

Person Profile JSON Structure

{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/{slug}",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  },
  "linkedin_profile_url": "https://www.linkedin.com/in/{slug}",
  "profile_data": {
    "name": "Full Name",
    "headline": "Current Role",
    "location": "City, Region, Country",
    "current_company": "Organization Name",
    "department": "Department Name",
    "connections": 94,
    "followers": 94,
    "about": "Professional summary...",
    "total_experience_years": 15.5,
    "skills": ["Skill 1", "Skill 2", ...],
    "languages": [
      {"language": "English", "proficiency": "fluent"},
      ...
    ],
    "education": [...],
    "career_history": [...],
    "international_experience": [...],
    "project_experience": [...]
  },
  "raw_exa_response_summary": {
    "source_url": "https://www.linkedin.com/in/{slug}",
    "search_type": "site-specific LinkedIn search",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
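The structure above uses ... placeholders, so it is not valid JSON as written. A light structural check can catch malformed person files before a custodian file references them. The required key sets below are inferred from the example and are assumptions. The authoritative person model is the LinkML schema under See Also, which this sketch does not implement.

```python
import json

# Inferred from the example structure (assumption, not the LinkML schema)
REQUIRED_TOP_LEVEL = {"exa_search_metadata", "linkedin_profile_url", "profile_data"}
REQUIRED_METADATA = {"query", "search_timestamp", "extraction_tool", "extraction_agent"}


def check_person_file(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = [f"missing top-level key: {k}" for k in sorted(REQUIRED_TOP_LEVEL - set(doc))]
    meta = doc.get("exa_search_metadata", {})
    if isinstance(meta, dict):
        problems += [f"missing exa_search_metadata.{k}"
                     for k in sorted(REQUIRED_METADATA - set(meta))]
    else:
        problems.append("exa_search_metadata must be an object")
    profile = doc.get("profile_data", {})
    if not isinstance(profile, dict) or not profile.get("name"):
        problems.append("profile_data.name is required")
    return problems
```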

Minimal Reference in Custodian File

When referencing a person file, the custodian YAML needs only:

- name: Alexandr Belov                    # Display name (required)
  role: Collection/Information Specialist # Current role at this institution
  department: Collection and Research Center  # Department (if known)
  linkedin_url: https://www.linkedin.com/in/alexandr-belov-bb547b46  # For linking
  current: true                           # Still employed at institution?
  person_profile_path: data/custodian/person/alexandr-belov-bb547b46_20251210T120000Z.json

Optional additional fields (include only when the information is specific to this institution or missing from the person file):

  • start_date: When they started at THIS institution
  • note: Institution-specific notes
  • source: Where this association was discovered
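Reducing a full inline entry to the minimal form amounts to keeping a whitelist of fields and adding the path. This is a minimal sketch, assuming the entry is a parsed YAML mapping. The field tuples mirror the lists above, and the function name is hypothetical. Other institution-specific keys, such as tenure_start and tenure_end, would have to be added to the whitelist to survive.

```python
# Fields kept in the custodian file (from the minimal reference above)
MINIMAL_FIELDS = ("name", "role", "department", "linkedin_url", "current")
# Institution-specific optional fields
OPTIONAL_FIELDS = ("start_date", "note", "source")


def to_minimal_reference(inline_entry: dict, profile_path: str) -> dict:
    """Drop profile details from an inline entry and point it at the person file."""
    if not inline_entry.get("name"):
        raise ValueError("name is required in a custodian person entry")
    # Preserve the whitelist order so the output reads like the example above
    ref = {k: inline_entry[k] for k in MINIMAL_FIELDS + OPTIONAL_FIELDS if k in inline_entry}
    ref["person_profile_path"] = profile_path
    return ref
```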

Migration: Converting Inline to Reference

When you find inline person data that should be a file reference:

  1. Create the person file:

    # Save the full profile to data/custodian/person/{slug}_{timestamp}.json

  2. Verify that the person file exists and contains the full profile before removing any inline data.

  3. Update the custodian YAML:

    # Replace 50+ lines of inline data with:
    - name: Person Name
      role: Their Role
      linkedin_url: https://www.linkedin.com/in/{slug}
      current: true
      person_profile_path: data/custodian/person/{slug}_{timestamp}.json
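The write-then-verify ordering can be sketched as a single function that returns the slim entry only after the person file has been written and read back. Assumptions: parsed dicts as input, paths relative to repo_root, and a display-name comparison as the sanity check (profiles whose names differ, for example in diacritics, would need a better match key). The function refuses to overwrite an existing person file, in the spirit of the data preservation rules listed under See Also. migrate_entry is a hypothetical name.

```python
import json
from pathlib import Path

# Fields kept in the slim custodian entry
KEEP = ("name", "role", "department", "linkedin_url", "current")


def migrate_entry(entry: dict, person_doc: dict, repo_root: Path, rel_path: str) -> dict:
    """Write the person file, verify it, and only then return the slim entry.

    If anything fails an exception is raised, so the caller never replaces
    the inline data with a reference to a missing or wrong file.
    """
    target = repo_root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    target.write_text(json.dumps(person_doc, indent=2, ensure_ascii=False), encoding="utf-8")

    # Verify: the file exists, parses, and describes the same person
    reloaded = json.loads(target.read_text(encoding="utf-8"))
    if reloaded.get("profile_data", {}).get("name") != entry.get("name"):
        raise ValueError("person file name does not match custodian entry")

    slim = {k: entry[k] for k in KEEP if k in entry}
    slim["person_profile_path"] = rel_path
    return slim
```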


Cross-Custodian References

When the same person works at multiple institutions:

# In NL-NH-AMS-U-EFM-eye_filmmuseum.yaml
former_directors:
- name: Sandra den Hamer
  role: Director
  tenure_start: '2010-01'
  tenure_end: '2023-02'
  person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json

# In NL-ZH-DHA-O-NFF-netherlands_filmfonds.yaml
management:
- name: Sandra den Hamer
  role: Interim CEO
  start_date: '2023-05'
  current: true
  person_profile_path: data/custodian/person/sandra-den-hamer-66024510_20251209T190000Z.json

Both custodian files reference the SAME person file, so it remains the single source of truth.
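To see which custodians share a person file, an index keyed on person_profile_path is enough. This sketch assumes each custodian file has already been parsed into a dict. It only scans top-level list-valued sections such as former_directors and management, so person lists nested deeper would need a recursive walk. The function name is hypothetical.

```python
from collections import defaultdict


def index_person_references(custodians: dict) -> dict:
    """Map each person_profile_path to the (custodian file, section) pairs citing it.

    custodians maps a custodian file name to its parsed YAML content.
    """
    index = defaultdict(list)
    for custodian_file, doc in custodians.items():
        for section, value in doc.items():
            if not isinstance(value, list):
                continue  # only top-level person lists are scanned
            for entry in value:
                if isinstance(entry, dict) and "person_profile_path" in entry:
                    index[entry["person_profile_path"]].append((custodian_file, section))
    return dict(index)
```

Filtering the index for paths cited by more than one custodian file lists the cross-custodian cases, e.g. {p: refs for p, refs in index.items() if len({f for f, _ in refs}) > 1}.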


Provenance Tracking

The person file contains full provenance:

{
  "exa_search_metadata": {
    "query": "site:linkedin.com/in/alexandr-belov-bb547b46",
    "search_timestamp": "2025-12-10T12:00:00Z",
    "extraction_tool": "exa_web_search_exa",
    "extraction_agent": "claude-sonnet-4-20250514"
  }
}

The custodian file does NOT need to duplicate this - it inherits provenance from the referenced file.
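Resolving provenance through the reference can be sketched as below. It assumes the person file layout shown earlier, with exa_search_metadata at the top level. When no person file is referenced, it falls back to an inline provenance block like the one in the inline example at the top of this document. provenance_for is a hypothetical name.

```python
import json
from pathlib import Path


def provenance_for(entry: dict, repo_root: Path) -> dict:
    """Return provenance for a custodian person entry.

    Referenced entries inherit exa_search_metadata from the person file;
    inline entries fall back to their own provenance block (if any).
    """
    rel_path = entry.get("person_profile_path")
    if rel_path is None:
        return entry.get("provenance", {})
    doc = json.loads((repo_root / rel_path).read_text(encoding="utf-8"))
    meta = doc.get("exa_search_metadata")
    if meta is None:
        # The referenced file is expected to carry full provenance
        raise KeyError(f"{rel_path} has no exa_search_metadata")
    return meta
```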


See Also

  • AGENTS.md - Rule 5: NEVER Delete Enriched Data
  • .opencode/DATA_PRESERVATION_RULES.md - Data preservation guidelines
  • schemas/20251121/linkml/modules/classes/PersonObservation.yaml - PiCo-based person modeling