glam/.opencode/CUSTODIAN_STAFF_PARSING_RULE.md
2025-12-10 13:01:13 +01:00

Rule 18: Custodian Staff Parsing from LinkedIn Company Pages

When manually registering heritage custodian staff from LinkedIn company "People" pages, use the parse_custodian_staff.py script to convert raw text files into structured JSON.

Overview

LinkedIn company pages have a "People" section that lists the LinkedIn members associated with the organization, showing each person's name, headline (usually the job title), connection degree, and mutual connections. This data is valuable for understanding the heritage-sector workforce and for network analysis.

File Locations

| Type | Location |
|---|---|
| Raw input files | `data/custodian/person/manual_hc/{slug}-{timestamp}.md` |
| Parsed output files | `data/custodian/person/{slug}_staff_{timestamp}.json` |
| Parser script | `scripts/parse_custodian_staff.py` |
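
Because of this naming convention, the output path and the registration timestamp can be derived from the input filename alone. The following is a minimal sketch. It assumes the timestamp is always the text after the last hyphen and is recorded in UTC (the sample output below uses a "Z" suffix). derive_output is a hypothetical helper, not a function of parse_custodian_staff.py:

```python
from datetime import datetime
from pathlib import Path

# Hypothetical helper (not part of parse_custodian_staff.py) illustrating the
# {slug}-{timestamp}.md -> {slug}_staff_{timestamp}.json naming convention.
OUTPUT_DIR = Path("data/custodian/person")


def derive_output(input_path: str) -> tuple[Path, str]:
    """Return (output JSON path, ISO-8601 registration timestamp) for a raw input file."""
    stem = Path(input_path).stem              # "collectie_overijssel-20251210T0055"
    slug, timestamp = stem.rsplit("-", 1)     # split on the LAST hyphen only
    output_path = OUTPUT_DIR / f"{slug}_staff_{timestamp}.json"
    # Assumption: the filename timestamp is UTC, hence the literal "Z" suffix.
    registered = datetime.strptime(timestamp, "%Y%m%dT%H%M").strftime("%Y-%m-%dT%H:%M:%SZ")
    return output_path, registered


if __name__ == "__main__":
    out, ts = derive_output(
        "data/custodian/person/manual_hc/collectie_overijssel-20251210T0055.md")
    print(out.as_posix())  # data/custodian/person/collectie_overijssel_staff_20251210T0055.json
    print(ts)              # 2025-12-10T00:55:00Z
```

Splitting on the last hyphen keeps the sketch working for slugs that themselves contain hyphens.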

Input File Format

Raw files are created by copy-pasting from LinkedIn company "People" pages. The header holds the organization's metadata. Each staff entry that follows repeats the person's name, then gives the connection degree, the headline, and, when present, a mutual-connections line:

```text
Collectie Overijssel logo
Collectie Overijssel

Museums, Historical Sites, and Zoos
Zwolle, Overijssel
2K followers
51-200 employees

58 associated members

Annelien Vos-Keen
Annelien Vos-Keen
2nd degree connection · 2nd
Data Analist / KPI- en procesexpert
Thomas van Maaren, Bob Coret, and 4 other mutual connections

Martine de Boer
Martine de Boer
2nd degree connection · 2nd
Collectiespecialist bij Collectie Overijssel
...
```
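
Each entry is anchored by its "degree connection" line, so a parser can locate that line and read the fields around it. The sketch below shows one way to do this. It is not the logic of parse_custodian_staff.py. parse_staff_records and DEGREE_RE are hypothetical names. The sketch assumes every entry has a degree line (entries without one would be skipped) and that the headline directly follows it. SAMPLE is abridged from the example above:

```python
import re

# Matches e.g. "2nd degree connection · 2nd" and captures "2nd".
DEGREE_RE = re.compile(r"^(\S+) degree connection")

SAMPLE = """\
Collectie Overijssel
58 associated members

Annelien Vos-Keen
Annelien Vos-Keen
2nd degree connection · 2nd
Data Analist / KPI- en procesexpert
Thomas van Maaren, Bob Coret, and 4 other mutual connections

Martine de Boer
Martine de Boer
2nd degree connection · 2nd
Collectiespecialist bij Collectie Overijssel
"""


def parse_staff_records(raw: str) -> list[dict]:
    """Hypothetical sketch: build one record per 'degree connection' line."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    records = []
    for i, line in enumerate(lines):
        match = DEGREE_RE.match(line)
        if not match or i == 0:
            continue
        record = {
            "name": lines[i - 1],          # second copy of the repeated name
            "degree": match.group(1),
            "headline": None,
            "mutual_connections": None,
        }
        # Assumption: the headline directly follows the degree line.
        if i + 1 < len(lines) and not DEGREE_RE.match(lines[i + 1]):
            record["headline"] = lines[i + 1]
        # Optional line such as "..., and 4 other mutual connections".
        if i + 2 < len(lines) and "mutual connection" in lines[i + 2]:
            record["mutual_connections"] = lines[i + 2]
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in parse_staff_records(SAMPLE):
        print(rec["name"], "|", rec["degree"], "|", rec["headline"])
```

A headline-less entry would be misread by this sketch, because the next person's name would be taken as the headline. A production parser needs a sturdier rule for that case.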

Usage

```bash
python scripts/parse_custodian_staff.py <input_file> <output_file> \
    --custodian-name "Custodian Name" \
    --custodian-slug "custodian-slug"
```

Example:

```bash
python scripts/parse_custodian_staff.py \
    data/custodian/person/manual_hc/collectie_overijssel-20251210T0055.md \
    data/custodian/person/collectie_overijssel_staff_20251210T0055.json \
    --custodian-name "Collectie Overijssel" \
    --custodian-slug "collectie-overijssel"
```

Dry-run mode (parse without writing):

```bash
python scripts/parse_custodian_staff.py input.md output.json \
    --custodian-name "Name" --custodian-slug "slug" --dry-run
```
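
The command-line interface above maps onto a standard argparse parser. The following is a minimal sketch. It assumes both custodian flags are required and that dry-run mode simply reports instead of writing. build_parser and write_result are hypothetical names, not taken from the script:

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented command-line interface."""
    parser = argparse.ArgumentParser(
        description="Convert a pasted LinkedIn company 'People' page into staff JSON.")
    parser.add_argument("input_file", help="raw .md file from data/custodian/person/manual_hc/")
    parser.add_argument("output_file", help="destination .json file")
    # Assumption: both custodian flags are mandatory.
    parser.add_argument("--custodian-name", required=True, help='e.g. "Collectie Overijssel"')
    parser.add_argument("--custodian-slug", required=True, help='e.g. "collectie-overijssel"')
    parser.add_argument("--dry-run", action="store_true",
                        help="parse and report, but do not write output_file")
    return parser


def write_result(result: dict, output_file: str, dry_run: bool) -> bool:
    """Write result as UTF-8 JSON unless dry_run is set; return True if a file was written."""
    if dry_run:
        print(f"[dry-run] would write {len(result.get('staff', []))} staff entries to {output_file}")
        return False
    Path(output_file).write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    return True


if __name__ == "__main__":
    # Same arguments as the dry-run command above; nothing is written.
    args = build_parser().parse_args([
        "input.md", "output.json",
        "--custodian-name", "Name", "--custodian-slug", "slug", "--dry-run",
    ])
    write_result({"staff": []}, args.output_file, args.dry_run)
```

Writing with ensure_ascii=False keeps Dutch diacritics such as the "ö" in "Coördinator" readable in the JSON file.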

Output Structure

Abridged example (the staff array shows only the first of the 39 extracted entries):

```json
{
  "custodian_metadata": {
    "custodian_name": "Collectie Overijssel",
    "custodian_slug": "collectie-overijssel",
    "name": "Collectie Overijssel",
    "industry": "Museums, Historical Sites, and Zoos",
    "location": { "city": "Zwolle", "region": "Overijssel" },
    "follower_count": "2K",
    "employee_count": "51-200",
    "associated_members": 58
  },
  "source_metadata": {
    "source_type": "linkedin_company_people_page",
    "registered_timestamp": "2025-12-10T00:55:00Z",
    "registration_method": "manual_linkedin_browse",
    "staff_extracted": 39
  },
  "staff": [
    {
      "staff_id": "collectie-overijssel_staff_0000_annelien_vos_keen",
      "name": "Annelien Vos-Keen",
      "name_type": "full",
      "degree": "2nd",
      "headline": "Data Analist / KPI- en procesexpert",
      "mutual_connections": "Thomas van Maaren, Bob Coret, and 4 other mutual connections",
      "heritage_relevant": true,
      "heritage_type": "D"
    }
  ],
  "staff_analysis": {
    "total_staff_extracted": 39,
    "heritage_relevant_count": 25,
    "heritage_relevant_percentage": 64.1,
    "staff_by_heritage_type": { "A": 4, "D": 1, "E": 1, "M": 18, "S": 1 },
    "staff_by_degree": { "1st": 2, "2nd": 37 },
    "staff_by_name_type": { "abbreviated": 1, "full": 38 },
    "common_roles": { "Medewerker": 7, "Coördinator": 5, "Beheerder": 4 }
  },
  "provenance": {
    "data_source": "LINKEDIN_MANUAL_REGISTER",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
```
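
The staff_analysis block is a set of aggregates over the staff array. The sketch below shows how its counting fields could be computed. It assumes each staff entry carries the heritage_relevant, heritage_type, degree, and name_type keys shown above. common_roles is left out because the rule for extracting roles from headlines is not documented here. summarize_staff is a hypothetical name:

```python
from collections import Counter


def summarize_staff(staff: list[dict]) -> dict:
    """Sketch of the counting fields of staff_analysis (common_roles omitted)."""
    total = len(staff)
    relevant = [s for s in staff if s.get("heritage_relevant")]

    def tally(entries: list[dict], key: str) -> dict:
        # Sorting the keys matches the ordering in the sample output.
        return dict(sorted(Counter(e[key] for e in entries).items()))

    return {
        "total_staff_extracted": total,
        "heritage_relevant_count": len(relevant),
        # Percentage rounded to one decimal place, as in the sample (64.1).
        "heritage_relevant_percentage": round(100 * len(relevant) / total, 1) if total else 0.0,
        "staff_by_heritage_type": tally(relevant, "heritage_type"),
        "staff_by_degree": tally(staff, "degree"),
        "staff_by_name_type": tally(staff, "name_type"),
    }
```

The sample output has 25 heritage-relevant entries out of 39. That gives 25 / 39 = 0.641, so heritage_relevant_percentage is 64.1.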

Staff ID Format

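```text
{custodian_slug}_staff_{index:04d}_{name_slug}
```

Here index is the zero-based index of the entry, zero-padded to four digits (the first entry is 0000). name_slug is the lowercased name with spaces and hyphens replaced by underscores.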

Examples:

  • collectie-overijssel_staff_0000_annelien_vos_keen
  • nationaal-archief_staff_0042_afelonne_doek
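
The name part of the ID is a lowercase slug in which runs of spaces and hyphens become single underscores, as in "Annelien Vos-Keen" → annelien_vos_keen. The following is a sketch. It assumes accents are stripped to plain ASCII, because the examples above do not show how diacritics are handled. name_slug and staff_id are hypothetical names:

```python
import re
import unicodedata


def name_slug(name: str) -> str:
    """Lowercase ASCII slug with non-alphanumeric runs collapsed to '_'."""
    # Assumption: diacritics are stripped (e.g. "ö" -> "o"); the source does not say.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def staff_id(custodian_slug: str, index: int, name: str) -> str:
    """Build {custodian_slug}_staff_{index:04d}_{name_slug}."""
    return f"{custodian_slug}_staff_{index:04d}_{name_slug(name)}"


if __name__ == "__main__":
    print(staff_id("collectie-overijssel", 0, "Annelien Vos-Keen"))
    # collectie-overijssel_staff_0000_annelien_vos_keen
```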

Heritage Type Detection

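Each staff headline is checked for heritage relevance by matching it against keyword lists for the GLAMORCUBESFIXPHDNT heritage-type codes. A matching headline is marked heritage_relevant, and its code is stored in heritage_type. The codes and keywords include: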

| Code | Type | Keywords |
|---|---|---|
| A | Archive | archief, archivist, archivaris, nationaal archief |
| M | Museum | museum, curator, conservator, collectie |
| L | Library | library, bibliotheek, librarian |
| D | Digital | digital, data, developer, digitalisering |
| E | Education | university, professor, docent, educatie |
| R | Research | research, onderzoek, historicus |
| ... | ... | ... |
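
The following is a minimal keyword matcher over the rows listed above. It assumes case-insensitive substring matching, and that the first matching code in table order wins. The actual precedence rules, and the keywords for the elided codes, are not documented here. detect_heritage_type is a hypothetical name:

```python
from typing import Optional, Tuple

# Only the rows listed in the table above; keywords for the elided codes are unknown.
HERITAGE_KEYWORDS = {
    "A": ["archief", "archivist", "archivaris", "nationaal archief"],
    "M": ["museum", "curator", "conservator", "collectie"],
    "L": ["library", "bibliotheek", "librarian"],
    "D": ["digital", "data", "developer", "digitalisering"],
    "E": ["university", "professor", "docent", "educatie"],
    "R": ["research", "onderzoek", "historicus"],
}


def detect_heritage_type(headline: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (heritage_relevant, heritage_type) for a staff headline."""
    text = (headline or "").lower()
    # Assumption: case-insensitive substring match; the first code in table order wins.
    for code, keywords in HERITAGE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return True, code
    return False, None
```

For the sample headline "Data Analist / KPI- en procesexpert", the sketch matches "data" and returns (True, "D"), which is consistent with the sample output. Substring matching is coarse and can produce false positives on unrelated words that happen to contain a keyword.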

Name Types

| Type | Description | Example |
|---|---|---|
| full | Complete first + last name | "Vincent Robijn" |
| abbreviated | Contains single-letter initial | "Fairoesh N.", "A.J. Gevers" |
| anonymous | Privacy-hidden profile | "LinkedIn Member" |
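
The three categories can be told apart with a simple token check. The following is a sketch. It assumes "LinkedIn Member" is the only anonymous placeholder, and that any token made up only of initials (such as "N." or "A.J.") marks a name as abbreviated. classify_name is a hypothetical name:

```python
import re

# A token made only of initials: "N", "N.", "A.J." (\w also matches accented letters).
INITIALS_RE = re.compile(r"(?:\w\.)+|\w")


def classify_name(name: str) -> str:
    """Return 'anonymous', 'abbreviated', or 'full' for a displayed LinkedIn name."""
    if name.strip() == "LinkedIn Member":   # assumption: the only privacy placeholder
        return "anonymous"
    if any(INITIALS_RE.fullmatch(token) for token in name.split()):
        return "abbreviated"
    return "full"
```

Under this sketch, a single-word name with no initials would still count as "full", even though it lacks a last name.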

Existing Files

| Custodian | Staff Count | Output File |
|---|---|---|
| Collectie Overijssel | 39 | `collectie_overijssel_staff_20251210T0055.json` |
| Nationaal Archief | 373 | `nationaal_archief_staff_20251209T2354.json` |

Integration with Other Scripts

This script complements parse_linkedin_connections.py (Rule 15):

| Script | Purpose | Input |
|---|---|---|
| `parse_linkedin_connections.py` | Parse a PERSON's connections | Individual profile connections |
| `parse_custodian_staff.py` | Parse an ORGANIZATION's staff | Company "People" page |

See Also

  • Rule 15: Connection Data Registration
  • Rule 14: Exa MCP LinkedIn Profile Extraction
  • AGENTS.md - Complete agent instructions