# Person-Custodian Data Architecture

## Overview

This document describes the data architecture for managing person/staff information in the GLAM Heritage Custodian project. The architecture follows a **Single Source of Truth** pattern: person entity files contain all person-specific data, while custodian files contain only references and affiliation provenance.

## Table of Contents

1. [Architecture Principles](#architecture-principles)
2. [Directory Structure](#directory-structure)
3. [Data Model](#data-model)
4. [Person Entity Files](#person-entity-files)
5. [Custodian YAML Files](#custodian-yaml-files)
6. [Data Flow](#data-flow)
7. [Scripts and Tools](#scripts-and-tools)
8. [Examples](#examples)
9. [Migration Guide](#migration-guide)
10. [FAQ](#faq)
11. [Related Documentation](#related-documentation)

---

## Architecture Principles

### 1. Single Source of Truth

**Person entity files are the authoritative source for all person data.**

- Profile information (name, headline, about, experience, education, skills)
- Web claims (provenance for extracted data)
- Affiliations (all custodians this person is associated with)

### 2. Separation of Concerns

**Different data types live in different locations:**

| Concern | Location | Rationale |
|---------|----------|-----------|
| Who is this person? | Entity file | Reusable across custodians |
| What is their background? | Entity file | Belongs to the person, not the custodian |
| Where did we get this data? | Entity file (`web_claims`) | Provenance is per-claim |
| How are they affiliated? | Custodian file | Relationship-specific data |
| When did we observe this? | Both | Entity has claim timestamps; custodian has affiliation timestamp |

### 3. No Data Duplication

**Same person appearing at multiple institutions → ONE entity file**

```
Person: Sandra den Hamer
├── Entity: data/custodian/person/entity/sandra-den-hamer-66024510_20251209T190000Z.json
│   └── affiliations: [EYE Filmmuseum, Netherlands Film Fund]
│
├── Reference: data/custodian/NL-NH-AMS-U-EFM.yaml
│   └── linkedin_profile_path: → entity file
│
└── Reference: data/custodian/NL-ZH-DHA-O-NFF.yaml
    └── linkedin_profile_path: → entity file (SAME file!)
```

### 4. Cross-Custodian Career Tracking

Entity files track all affiliations, enabling queries such as:

- "Who has worked at multiple archives?"
- "Show career paths in the heritage sector"
- "Find people who moved from museums to archives"

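The first of these queries can be sketched in a few lines of Python over entity files that have already been parsed into dicts. This is a minimal sketch, not one of the project's scripts: the function name `people_at_multiple_archives` is hypothetical, and it assumes that `heritage_type: "A"` marks an archive, as in the Nationaal Archief examples in this document.

```python
from typing import Iterable


def people_at_multiple_archives(entities: Iterable[dict]) -> list[str]:
    """Return names of persons affiliated with two or more distinct archives.

    Assumption for this sketch: heritage_type "A" denotes an archive.
    Each dict is expected to follow the person entity file schema.
    """
    result = []
    for entity in entities:
        # Count distinct custodians, not affiliation records.
        archive_slugs = {
            aff.get("custodian_slug")
            for aff in entity.get("affiliations", [])
            if aff.get("heritage_type") == "A"
        }
        if len(archive_slugs) >= 2:
            result.append(entity["profile_data"]["name"])
    return result
```

To run it over the whole entity directory, load each file with `json.loads(path.read_text())` for every `path` in `Path("data/custodian/person/entity").glob("*.json")`.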

---

## Directory Structure

```
data/custodian/
├── person/
│   │
│   ├── entity/                          # SINGLE SOURCE OF TRUTH
│   │   ├── bibianvanreeken_20251211T000000Z.json
│   │   ├── giovanna-fossati_20251209T170000Z.json
│   │   ├── sandra-den-hamer-66024510_20251209T190000Z.json
│   │   └── ...
│   │
│   ├── affiliated/                      # Staff lists by custodian
│   │   ├── manual/                      # Raw HTML/MD input files
│   │   │   └── nationaal-archief_staff_20251214.html
│   │   └── parsed/                      # Parsed JSON staff lists
│   │       ├── nationaal-archief_staff_20251214T112147Z.json
│   │       ├── noord-hollands-archief_staff_20251214T143055Z.json
│   │       └── ...
│   │
│   └── connection/                      # Professional network data
│       ├── manual/                      # Raw connection lists
│       │   └── giovanna-fossati_connections_20251211.md
│       └── parsed/                      # Parsed connection JSON
│           └── giovanna-fossati_connections_20251211T140000Z.json
│
├── NL-ZH-DHA-A-NA.yaml                  # Custodian files reference entity/
├── NL-NH-HAA-A-NHA.yaml
├── NL-GE-ARN-A-GA.yaml
├── NL-UT-UTR-A-UA.yaml
└── ...
```

### File Naming Conventions

| File Type | Pattern | Example |
|-----------|---------|---------|
| Person entity | `{linkedin_slug}_{ISO_timestamp}.json` | `bibianvanreeken_20251211T000000Z.json` |
| Staff list (parsed) | `{custodian_slug}_staff_{ISO_timestamp}.json` | `nationaal-archief_staff_20251214T112147Z.json` |
| Connections | `{linkedin_slug}_connections_{ISO_timestamp}.json` | `giovanna-fossati_connections_20251211T140000Z.json` |

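Because slugs may contain hyphens and digits (`sandra-den-hamer-66024510`), the timestamp is best anchored at the end of the name when splitting a filename back into its parts. A minimal sketch for the person entity pattern; the regex and both helper names are assumptions. The pattern also matches staff-list and connection filenames (returning e.g. `nationaal-archief_staff` as the slug), so apply it only inside `entity/`.

```python
import re
from datetime import datetime, timezone

# {slug}_{YYYYMMDDTHHMMSSZ}.json; the greedy slug group backtracks so the
# timestamp is always taken from the end of the name.
ENTITY_FILENAME = re.compile(r"^(?P<slug>.+)_(?P<ts>\d{8}T\d{6}Z)\.json$")


def parse_entity_filename(filename: str) -> tuple[str, datetime]:
    """Split an entity filename into (slug, UTC timestamp)."""
    match = ENTITY_FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not an entity filename: {filename!r}")
    ts = datetime.strptime(match["ts"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return match["slug"], ts


def entity_filename(slug: str, ts: datetime) -> str:
    """Build an entity filename from a slug and a timezone-aware datetime."""
    return f"{slug}_{ts.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}.json"
```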

---

## Data Model

### Conceptual Model

```
┌──────────────────┐         ┌──────────────────┐
│  Person Entity   │         │    Custodian     │
│                  │   N:M   │                  │
│ - profile_data   │◄───────►│ - name           │
│ - web_claims     │         │ - ghcid          │
│ - affiliations   │         │ - staff[]        │
│                  │         │                  │
└──────────────────┘         └──────────────────┘
         │                            │
         │ 1:N                        │ 1:N
         ▼                            ▼
┌──────────────────┐         ┌──────────────────┐
│    Web Claim     │         │   Staff Entry    │
│                  │         │                  │
│ - claim_type     │         │ - person_id      │
│ - claim_value    │         │ - person_name    │
│ - source_url     │         │ - role_title     │
│ - retrieved_on   │         │ - affiliation_   │
│ - retrieval_     │         │   provenance     │
│   agent          │         │ - linkedin_      │
│                  │         │   profile_path   │
└──────────────────┘         └──────────────────┘
```

### Key Relationships

| Relationship | Cardinality | Description |
|--------------|-------------|-------------|
| Person ↔ Custodian | N:M | Person can work at multiple custodians; custodian has multiple staff |
| Person → WebClaim | 1:N | One person has many provenance claims |
| Person → Affiliation | 1:N | One person has many affiliations (tracked in entity file) |
| Custodian → StaffEntry | 1:N | One custodian has many staff entries |

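To make the cardinalities concrete, here is a minimal in-memory sketch of the model using Python dataclasses. The class names and the reduced field sets are illustrative assumptions, not types defined by the project's scripts.

```python
from dataclasses import dataclass, field


@dataclass
class WebClaim:
    claim_type: str        # e.g. "full_name", "role_title"
    claim_value: str
    source_url: str
    retrieved_on: str      # ISO 8601, e.g. "2025-12-14T11:21:47Z"
    retrieval_agent: str   # e.g. "linkedin_html_parser"


@dataclass
class Affiliation:
    custodian_name: str
    custodian_slug: str
    role_title: str
    current: bool
    observed_on: str


@dataclass
class PersonEntity:
    linkedin_profile_url: str
    name: str
    web_claims: list[WebClaim] = field(default_factory=list)       # Person → WebClaim, 1:N
    affiliations: list[Affiliation] = field(default_factory=list)  # Person → Affiliation, 1:N

    def custodian_slugs(self) -> set[str]:
        """Custodians this person links to (one side of the N:M relation)."""
        return {a.custodian_slug for a in self.affiliations}


@dataclass
class StaffEntry:
    person_id: str
    person_name: str
    role_title: str
    linkedin_profile_path: str  # reference to the entity file, not a copy of it
```

`StaffEntry` carries only a path to the entity file, so the N:M relation is expressed through references rather than copied profile data.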

---

## Person Entity Files

### Location

`data/custodian/person/entity/{linkedin_slug}_{timestamp}.json`

### Complete Schema

```jsonc
{
  "extraction_metadata": {
    "source_file": "string",        // Path to source staff list
    "staff_id": "string",           // Unique identifier
    "extraction_date": "ISO8601",   // When profile was extracted
    "extraction_method": "string",  // exa_contents, exa_crawling_exa, manual
    "extraction_agent": "string",   // claude-opus-4.5 for manual, empty for automated
    "linkedin_url": "string",       // Full LinkedIn profile URL
    "cost_usd": 0,                  // API cost (0 for Exa contents)
    "request_id": "string"          // Optional: Exa request ID
  },

  "linkedin_profile_url": "string", // Canonical LinkedIn URL

  "profile_data": {
    "name": "string",               // Full name
    "headline": "string",           // Current role/headline
    "location": "string",           // City, Region, Country
    "connections": "string",        // "500 connections • 2,135 followers"
    "about": "string",              // Professional summary
    "experience": [                 // Work history
      {
        "title": "string",
        "company": "string",
        "duration": "string",
        "location": "string",
        "description": "string"
      }
    ],
    "education": [                  // Education history
      {
        "school": "string",
        "degree": "string",
        "field": "string",
        "years": "string"
      }
    ],
    "skills": ["string"],           // Skills list
    "languages": [                  // Languages
      {
        "language": "string",
        "proficiency": "string"
      }
    ],
    "profile_image_url": "string"   // CDN URL for profile photo
  },

  "web_claims": [                   // Provenance for extracted data
    {
      "claim_type": "string",       // full_name, role_title, location, etc.
      "claim_value": "string",      // The extracted value
      "source_url": "string",       // Where it was found
      "retrieved_on": "ISO8601",    // When it was retrieved
      "retrieval_agent": "string"   // linkedin_html_parser, exa_crawling_exa, etc.
    }
  ],

  "affiliations": [                 // All known custodian associations
    {
      "custodian_name": "string",   // Full custodian name
      "custodian_slug": "string",   // Normalized slug
      "role_title": "string",       // Role at this custodian
      "heritage_relevant": true,    // Is this a heritage role?
      "heritage_type": "A",         // GLAMORCUBESFIXPHDNT type code
      "current": true,              // Currently employed?
      "observed_on": "ISO8601",     // When this affiliation was observed
      "source_url": "string"        // Where this was observed
    }
  ]
}
```

### Required Fields

| Field | Required | Notes |
|-------|----------|-------|
| `extraction_metadata.extraction_date` | YES | ISO 8601 timestamp |
| `extraction_metadata.linkedin_url` | YES | Full LinkedIn profile URL |
| `linkedin_profile_url` | YES | Canonical URL (may duplicate above) |
| `profile_data.name` | YES | Full name |
| `web_claims` | YES | At least one claim (usually `full_name`) |
| `affiliations` | NO | May be empty if no custodian association is known |

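A check for these requirements might look like the following sketch. The function name is hypothetical, and treating empty strings and empty lists as missing is an assumption; the project may validate differently.

```python
def missing_required_fields(entity: dict) -> list[str]:
    """Return the required fields that are absent or empty in a parsed entity.

    Dotted names refer to nested keys. `affiliations` is optional and
    therefore not checked.
    """
    problems = []
    for dotted in (
        "extraction_metadata.extraction_date",
        "extraction_metadata.linkedin_url",
        "linkedin_profile_url",
        "profile_data.name",
    ):
        value = entity
        for key in dotted.split("."):
            # Walk down the nested dicts; anything non-dict counts as missing.
            value = value.get(key) if isinstance(value, dict) else None
        if not value:
            problems.append(dotted)
    # web_claims must hold at least one claim.
    if not entity.get("web_claims"):
        problems.append("web_claims")
    return problems
```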

---

## Custodian YAML Files

### Location

`data/custodian/{GHCID}.yaml`

### Staff Entry Schema

```yaml
person_observations:
  staff:
    - person_id: string              # Unique identifier ({custodian_slug}_staff_{NNNN}_{name_slug})
      person_name: string            # Full name (for display/search)
      role_title: string             # Current role at this custodian
      heritage_relevant: boolean     # Is this a heritage-relevant role?
      heritage_type: string          # GLAMORCUBESFIXPHDNT type code
      current: boolean               # Currently employed?

      # AFFILIATION PROVENANCE - when/how was this association observed?
      affiliation_provenance:
        source_url: string           # Where this association was found
        retrieved_on: string         # ISO 8601 timestamp
        retrieval_agent: string      # Tool used (linkedin_html_parser, etc.)

      # REFERENCES to person entity file
      linkedin_profile_url: string   # For quick access/linking
      linkedin_profile_path: string  # Path to entity JSON file
```

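The examples in this document use `person_id` values such as `nationaal-archief_staff_0001_bibian_van_reeken`. The sketch below reproduces that shape; the zero-padded four-digit index and the name normalisation (strip diacritics, lowercase, collapse other characters to `_`) are inferred from the examples rather than documented rules, and `make_person_id` is a hypothetical name.

```python
import re
import unicodedata


def make_person_id(custodian_slug: str, index: int, person_name: str) -> str:
    """Build a person_id like 'nationaal-archief_staff_0001_bibian_van_reeken'.

    Assumed normalisation: decompose accented characters and drop the accents,
    lowercase, and replace every run of non-alphanumerics with "_".
    """
    ascii_name = (
        unicodedata.normalize("NFKD", person_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    name_slug = re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")
    return f"{custodian_slug}_staff_{index:04d}_{name_slug}"
```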
### What NOT to Include

**Never put these in custodian YAML:**

- `web_claims` - Belongs in entity file
- `profile_data` - Belongs in entity file
- `experience` - Belongs in entity file
- `education` - Belongs in entity file
- `skills` - Belongs in entity file
- `about` - Belongs in entity file
- Full profile content of any kind

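This rule can be checked mechanically. The sketch below assumes the custodian YAML has already been loaded into a dict (for instance with PyYAML's `yaml.safe_load`) and inspects only the top-level keys of each staff entry; `misplaced_keys` is a hypothetical helper, not an existing script.

```python
# Keys that belong in the person entity file, never in a custodian staff entry.
FORBIDDEN_STAFF_KEYS = {"web_claims", "profile_data", "experience", "education", "skills", "about"}


def misplaced_keys(custodian_doc: dict) -> dict[str, list[str]]:
    """Map person_id -> forbidden keys found in that staff entry."""
    observations = custodian_doc.get("person_observations") or {}
    found = {}
    for entry in observations.get("staff") or []:
        bad = sorted(key for key in entry if key in FORBIDDEN_STAFF_KEYS)
        if bad:
            found[entry.get("person_id", "<no person_id>")] = bad
    return found
```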

---

## Data Flow

### Complete Pipeline

```
┌────────────────────────────────────────┐
│           DATA FLOW PIPELINE           │
└────────────────────────────────────────┘

PHASE 1: DATA COLLECTION
────────────────────────
LinkedIn Company Page
    │
    ▼ (Save HTML)
data/custodian/person/affiliated/manual/{slug}_staff_{date}.html


PHASE 2: PARSING
────────────────
Manual HTML file
    │
    ▼ (parse_linkedin_html.py)
data/custodian/person/affiliated/parsed/{slug}_staff_{timestamp}.json
    │
    │ Contains: List of {name, headline, linkedin_url, heritage_relevant}
    │


PHASE 3: PROFILE EXTRACTION
───────────────────────────
Parsed staff list
    │
    ▼ (Exa crawling OR manual extraction)
data/custodian/person/entity/{person_slug}_{timestamp}.json
    │
    │ Contains: Full profile_data, web_claims, affiliations
    │


PHASE 4: LINKING
────────────────
Entity files + Staff list + Custodian YAML
    │
    ▼ (link_person_observations.py)
    │
    ├──► Custodian YAML updated with:
    │      - person_observations.staff[] entries
    │      - affiliation_provenance
    │      - linkedin_profile_path references
    │
    └──► Entity files updated with:
           - web_claims (if not present)
           - affiliations array (new custodian added)
```

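Because entity filenames embed a timestamp, looking up the entity file for a slug in Phases 3 and 4 means matching on the slug part of the name. The sketch below returns the newest matching file; choosing the most recent timestamp when several exist is an assumption (the pipeline does not say how re-extractions are handled), and `latest_entity_file` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Optional

_ENTITY_NAME = re.compile(r"^(?P<slug>.+)_(?P<ts>\d{8}T\d{6}Z)\.json$")


def latest_entity_file(entity_dir: Path, slug: str) -> Optional[Path]:
    """Return the newest entity file whose slug equals `slug`, or None."""
    candidates = []
    for path in entity_dir.iterdir():
        match = _ENTITY_NAME.match(path.name)
        # Exact slug comparison, so "jan" never matches "jan-2_...".
        if match and match["slug"] == slug:
            # YYYYMMDDTHHMMSSZ strings sort chronologically as plain text.
            candidates.append((match["ts"], path))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```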
### Script Responsibilities

| Script | Input | Output | Purpose |
|--------|-------|--------|---------|
| `parse_linkedin_html.py` | Raw HTML | `affiliated/parsed/*.json` | Extract staff list |
| `fetch_linkedin_profiles_exa.py` | Staff list | `entity/*.json` | Extract full profiles |
| `link_person_observations.py` | Custodian YAML + staff list + entity files | Updated YAML + entity files | Create references |

---

## Scripts and Tools

### parse_linkedin_html.py

**Purpose**: Parse LinkedIn company "People" pages to extract staff lists.

**Usage**:

```bash
python scripts/parse_linkedin_html.py \
  "data/custodian/person/affiliated/manual/Nationaal Archief_ People _ LinkedIn.html" \
  --custodian-name "Nationaal Archief" \
  --custodian-slug "nationaal-archief" \
  --output data/custodian/person/affiliated/parsed/nationaal-archief_staff_20251214T112147Z.json
```

**Output**: JSON file with staff entries containing:

- `name`, `headline`, `linkedin_url`
- `heritage_relevant`, `heritage_type`
- `degree` (LinkedIn connection degree)

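Entity filenames are keyed on the LinkedIn slug, so the `linkedin_url` in each parsed staff entry has to be reduced to its slug before the matching entity file can be found. A minimal sketch, assuming profile URLs of the form `https://www.linkedin.com/in/{slug}`; percent-encoded slugs are returned as-is, and `linkedin_slug` is a hypothetical helper.

```python
from urllib.parse import urlparse


def linkedin_slug(linkedin_url: str) -> str:
    """Extract the profile slug from a LinkedIn /in/ URL.

    Query strings and trailing slashes are ignored; anything that is not
    an /in/{slug} URL (e.g. a company page) raises ValueError.
    """
    parts = [p for p in urlparse(linkedin_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "in":
        raise ValueError(f"not a LinkedIn profile URL: {linkedin_url!r}")
    return parts[1]
```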
### link_person_observations.py

**Purpose**: Link person entity files to custodian YAML files.

**Usage**:

```bash
python scripts/link_person_observations.py \
  --custodian-file data/custodian/NL-ZH-DHA-A-NA.yaml \
  --staff-file data/custodian/person/affiliated/parsed/nationaal-archief_staff_20251214T112147Z.json \
  --entity-dir data/custodian/person/entity
```

**Actions**:

1. Reads staff list to get person identifiers
2. Finds matching entity files in `entity/`
3. Updates custodian YAML with `person_observations.staff[]`
4. Adds `affiliation_provenance` and `linkedin_profile_path`
5. Updates entity files with new affiliations and web_claims

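The actions above can be sketched for a single staff row as plain dict manipulation, assuming the custodian YAML and the entity JSON have already been loaded (for example with PyYAML's `yaml.safe_load` and `json.load`) and are written back afterwards. This is not the code of `link_person_observations.py`: the function `link_person`, its parameters, and copying the parsed `headline` into `role_title` (mirroring the examples in this document) are assumptions, and the `web_claims` update is left out.

```python
def link_person(custodian_doc: dict, custodian_name: str, custodian_slug: str,
                entity: dict, entity_path: str, person_id: str,
                row: dict, provenance: dict) -> None:
    """Sketch of actions 3-5 (affiliations only) for one parsed staff row.

    Hypothetical helper. Mutates `custodian_doc` and `entity` in place;
    running it twice with the same inputs adds nothing new.
    """
    # Actions 3-4: a reference-only staff entry in the custodian YAML.
    observations = custodian_doc.setdefault("person_observations", {})
    staff = observations.setdefault("staff", [])
    if not any(s.get("person_id") == person_id for s in staff):
        staff.append({
            "person_id": person_id,
            "person_name": row["name"],
            "role_title": row.get("headline", ""),
            "heritage_relevant": row.get("heritage_relevant", False),
            "heritage_type": row.get("heritage_type"),
            "current": True,  # assumption: a People page lists current staff
            "affiliation_provenance": dict(provenance),
            "linkedin_profile_url": row["linkedin_url"],
            "linkedin_profile_path": entity_path,  # a path, never copied profile data
        })

    # Action 5 (affiliations only): record this custodian in the entity file.
    affiliations = entity.setdefault("affiliations", [])
    if not any(a.get("custodian_slug") == custodian_slug for a in affiliations):
        affiliations.append({
            "custodian_name": custodian_name,
            "custodian_slug": custodian_slug,
            "role_title": row.get("headline", ""),
            "heritage_relevant": row.get("heritage_relevant", False),
            "heritage_type": row.get("heritage_type"),
            "current": True,
            "observed_on": provenance["retrieved_on"],
            "source_url": provenance["source_url"],
        })
```

The membership checks on `person_id` and `custodian_slug` make the sketch safe to re-run when a staff list is parsed again.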
### fetch_linkedin_profiles_exa.py

**Purpose**: Extract full LinkedIn profiles using the Exa API.

**Usage**:

```bash
python scripts/fetch_linkedin_profiles_exa.py \
  --staff-file data/custodian/person/affiliated/parsed/nationaal-archief_staff_20251214T112147Z.json \
  --output-dir data/custodian/person/entity \
  --limit 50
```

---

## Examples

### Example 1: Complete Person Entity File

```json
{
  "extraction_metadata": {
    "source_file": "data/custodian/person/affiliated/parsed/nationaal-archief_staff_20251214T112147Z.json",
    "staff_id": "nationaal-archief_staff_0001_bibian_van_reeken",
    "extraction_date": "2025-12-14T11:21:47Z",
    "extraction_method": "exa_contents",
    "extraction_agent": "claude-opus-4.5",
    "linkedin_url": "https://www.linkedin.com/in/bibianvanreeken",
    "cost_usd": 0
  },
  "linkedin_profile_url": "https://www.linkedin.com/in/bibianvanreeken",
  "profile_data": {
    "name": "Bibian van Reeken",
    "headline": "Projectmanager Digitalisering bij het Nationaal Archief",
    "location": "The Hague, South Holland, Netherlands",
    "connections": "500+ connections",
    "about": "Experienced project manager specializing in digitization...",
    "experience": [
      {
        "title": "Projectmanager Digitalisering",
        "company": "Nationaal Archief",
        "duration": "3 years",
        "location": "The Hague, Netherlands"
      }
    ],
    "education": [
      {
        "school": "Leiden University",
        "degree": "Master",
        "field": "History"
      }
    ],
    "skills": ["Project Management", "Digitization", "Archives"]
  },
  "web_claims": [
    {
      "claim_type": "full_name",
      "claim_value": "Bibian van Reeken",
      "source_url": "https://www.linkedin.com/in/bibianvanreeken",
      "retrieved_on": "2025-12-14T11:21:47Z",
      "retrieval_agent": "linkedin_html_parser"
    },
    {
      "claim_type": "role_title",
      "claim_value": "Projectmanager Digitalisering bij het Nationaal Archief",
      "source_url": "https://www.linkedin.com/in/bibianvanreeken",
      "retrieved_on": "2025-12-14T11:21:47Z",
      "retrieval_agent": "linkedin_html_parser"
    }
  ],
  "affiliations": [
    {
      "custodian_name": "Nationaal Archief",
      "custodian_slug": "nationaal-archief",
      "role_title": "Projectmanager Digitalisering bij het Nationaal Archief",
      "heritage_relevant": true,
      "heritage_type": "A",
      "current": true,
      "observed_on": "2025-12-14T11:21:47Z",
      "source_url": "https://www.linkedin.com/company/nationaal-archief/people/"
    }
  ]
}
```

### Example 2: Custodian YAML Staff Section

```yaml
person_observations:
  staff:
    - person_id: nationaal-archief_staff_0001_bibian_van_reeken
      person_name: Bibian van Reeken
      role_title: Projectmanager Digitalisering bij het Nationaal Archief
      heritage_relevant: true
      heritage_type: A
      current: true
      affiliation_provenance:
        source_url: https://www.linkedin.com/company/nationaal-archief/people/
        retrieved_on: '2025-12-14T11:21:47Z'
        retrieval_agent: linkedin_html_parser
      linkedin_profile_url: https://www.linkedin.com/in/bibianvanreeken
      linkedin_profile_path: data/custodian/person/entity/bibianvanreeken_20251211T000000Z.json

    - person_id: nationaal-archief_staff_0002_jan_de_vries
      person_name: Jan de Vries
      role_title: Senior Archivist
      heritage_relevant: true
      heritage_type: A
      current: true
      affiliation_provenance:
        source_url: https://www.linkedin.com/company/nationaal-archief/people/
        retrieved_on: '2025-12-14T11:21:47Z'
        retrieval_agent: linkedin_html_parser
      linkedin_profile_url: https://www.linkedin.com/in/jandevries12345
      linkedin_profile_path: data/custodian/person/entity/jandevries12345_20251214T150000Z.json
```

### Example 3: Cross-Custodian Reference

A person affiliated with two custodians (one former role, one current role):

**Entity file** (`sandra-den-hamer-66024510_20251209T190000Z.json`):

```json
{
  "affiliations": [
    {
      "custodian_name": "EYE Filmmuseum",
      "custodian_slug": "eye-filmmuseum",
      "role_title": "Director",
      "current": false,
      "observed_on": "2025-12-09T19:00:00Z"
    },
    {
      "custodian_name": "Netherlands Film Fund",
      "custodian_slug": "netherlands-filmfonds",
      "role_title": "Interim CEO",
      "current": true,
      "observed_on": "2025-12-14T10:00:00Z"
    }
  ]
}
```

**Custodian 1** (`NL-NH-AMS-U-EFM.yaml`):

```yaml
person_observations:
  staff:
    - person_id: eye-filmmuseum_staff_0001_sandra_den_hamer
      person_name: Sandra den Hamer
      role_title: Director
      current: false
      linkedin_profile_path: data/custodian/person/entity/sandra-den-hamer-66024510_20251209T190000Z.json
```

**Custodian 2** (`NL-ZH-DHA-O-NFF.yaml`):

```yaml
person_observations:
  staff:
    - person_id: netherlands-filmfonds_staff_0001_sandra_den_hamer
      person_name: Sandra den Hamer
      role_title: Interim CEO
      current: true
      linkedin_profile_path: data/custodian/person/entity/sandra-den-hamer-66024510_20251209T190000Z.json
```

**Note**: Both custodians reference the SAME entity file!

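Shared references like this can be found by scanning all custodian files. The sketch assumes each custodian YAML has been loaded into a dict keyed by its GHCID (the filename without `.yaml`); `shared_entity_files` is a hypothetical helper that reports every entity file referenced by more than one custodian.

```python
from collections import defaultdict


def shared_entity_files(custodians: dict[str, dict]) -> dict[str, list[str]]:
    """Map entity path -> sorted GHCIDs, for paths referenced by 2+ custodians."""
    refs: dict[str, set[str]] = defaultdict(set)
    for ghcid, doc in custodians.items():
        observations = doc.get("person_observations") or {}
        for entry in observations.get("staff") or []:
            path = entry.get("linkedin_profile_path")
            if path:
                refs[path].add(ghcid)
    return {path: sorted(ghcids) for path, ghcids in refs.items() if len(ghcids) > 1}
```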

---

## Migration Guide

### Migrating from Inline Web Claims

If you have custodian files with inline `web_claims`, migrate them:

**Before** (incorrect):

```yaml
person_observations:
  staff:
    - person_id: example_staff_0001_john_doe
      person_name: John Doe
      web_claims:                    # WRONG - should not be here
        - claim_type: full_name
          claim_value: John Doe
```

**After** (correct):

```yaml
person_observations:
  staff:
    - person_id: example_staff_0001_john_doe
      person_name: John Doe
      affiliation_provenance:
        source_url: https://www.linkedin.com/company/example/people/
        retrieved_on: '2025-12-14T11:21:47Z'
        retrieval_agent: linkedin_html_parser
      linkedin_profile_path: data/custodian/person/entity/johndoe_20251214T000000Z.json
```

**Migration steps**:

1. Create entity file with profile data + web claims
2. Remove `web_claims` from custodian YAML
3. Add `affiliation_provenance` block
4. Add `linkedin_profile_path` reference

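Most of these steps operate on loaded dicts and can be scripted. The sketch below is a hypothetical helper, not an existing migration tool: it moves inline claims into an already-loaded entity dict (writing that dict out as the entity file in step 1 is left out), skips claims the entity already holds, and adds the provenance block and path reference.

```python
def migrate_staff_entry(entry: dict, entity: dict, entity_path: str,
                        provenance: dict) -> None:
    """Move inline web_claims from a custodian staff entry into the entity dict.

    Hypothetical helper; mutates both dicts in place. A claim already in the
    entity (same claim_type and claim_value) is not duplicated.
    """
    claims = entity.setdefault("web_claims", [])
    existing = {(c.get("claim_type"), c.get("claim_value")) for c in claims}
    for claim in entry.pop("web_claims", []):       # step 2: remove from custodian YAML
        key = (claim.get("claim_type"), claim.get("claim_value"))
        if key not in existing:                      # step 1: claims live in the entity
            claims.append(claim)
            existing.add(key)
    entry.setdefault("affiliation_provenance", dict(provenance))  # step 3
    entry["linkedin_profile_path"] = entity_path                   # step 4
```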

---

## FAQ

### Q: Why separate entity files from custodian files?

**A**: To avoid data duplication. A person working at 3 custodians would otherwise have their profile data copied 3 times. With this architecture, there is ONE entity file referenced 3 times.

### Q: Where do web claims go?

**A**: Always in the person entity file, never in custodian YAML. Web claims are about the person, not about their affiliation.

### Q: What if I don't have a LinkedIn URL?

**A**: You can still create an entity file using other sources (institutional website, manual research). Use a different slug pattern based on the available identifier.

### Q: Can a person have multiple entity files?

**A**: Ideally no: one person = one entity file. If duplicates are created by accident, they can be merged later. Within a custodian file the key identifier is `person_id`, but it is custodian-specific (Example 3 gives the same person a different `person_id` at each custodian); across custodians, the shared key is the entity file referenced by `linkedin_profile_path`.

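One way to find accidental duplicates is to group entity files by their `linkedin_profile_url`, which every entity file must contain. This is a sketch under that assumption (the document does not prescribe a duplicate-detection key); URLs are compared after removing a trailing slash, and `duplicate_entities` is a hypothetical name.

```python
from collections import defaultdict


def duplicate_entities(entities: dict[str, dict]) -> dict[str, list[str]]:
    """Map linkedin_profile_url -> filenames, for URLs found in 2+ entity files.

    `entities` maps an entity filename to its parsed JSON content.
    """
    by_url = defaultdict(list)
    for filename, entity in entities.items():
        # Assumed normalisation: ignore a trailing slash only.
        url = (entity.get("linkedin_profile_url") or "").rstrip("/")
        if url:
            by_url[url].append(filename)
    return {url: sorted(names) for url, names in by_url.items() if len(names) > 1}
```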
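### Q: What timestamp format should I use?

**A**: In filenames, use UTC in ISO 8601 basic format (no separators): `YYYYMMDDTHHMMSSZ` (e.g., `20251214T112147Z`). Inside JSON and YAML values such as `extraction_date`, `retrieved_on`, and `observed_on`, the examples use the extended format with separators (e.g., `2025-12-14T11:21:47Z`).
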
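The examples in this document use the compact form in filenames (`20251214T112147Z`) and the separated ISO 8601 extended form inside JSON and YAML values (`2025-12-14T11:21:47Z`). A small sketch for producing both from a timezone-aware `datetime` and converting between them; the helper names are hypothetical.

```python
from datetime import datetime, timezone

BASIC = "%Y%m%dT%H%M%SZ"          # filenames, e.g. 20251214T112147Z
EXTENDED = "%Y-%m-%dT%H:%M:%SZ"   # JSON/YAML values, e.g. 2025-12-14T11:21:47Z


def utc_stamp(dt: datetime, fmt: str = BASIC) -> str:
    """Format a timezone-aware datetime as a UTC timestamp string."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return dt.astimezone(timezone.utc).strftime(fmt)


def basic_to_extended(stamp: str) -> str:
    """Convert the filename form to the separated form used inside JSON/YAML."""
    return datetime.strptime(stamp, BASIC).strftime(EXTENDED)
```

For a fresh filename timestamp, call `utc_stamp(datetime.now(timezone.utc))`.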

---

## Related Documentation

- **Agent Rules**: See `AGENTS.md` Rule 27
- **Agent Rule File**: `.opencode/PERSON_CUSTODIAN_DATA_ARCHITECTURE.md`
- **Person Reference Pattern**: `.opencode/PERSON_DATA_REFERENCE_PATTERN.md`
- **LinkedIn Extraction**: `.opencode/EXA_LINKEDIN_EXTRACTION_RULES.md`
- **Data Fabrication**: `.opencode/DATA_FABRICATION_PROHIBITION.md` (Rule 21)