# Rule 18: Custodian Staff Parsing from LinkedIn Company Pages

**When manually registering heritage custodian staff from LinkedIn company "People" pages, use the `parse_custodian_staff.py` script to convert raw text files into structured JSON.**

## Overview

LinkedIn company pages have a "People" section that lists all staff members with their names, job titles, connection degree, and mutual connections. This data is valuable for understanding the heritage sector workforce and building network analysis.

## File Locations

| Type | Location |
|------|----------|
| **Raw input files** | `data/custodian/person/manual_hc/{slug}-{timestamp}.md` |
| **Parsed output files** | `data/custodian/person/{slug}_staff_{timestamp}.json` |
| **Parser script** | `scripts/parse_custodian_staff.py` |

## Input File Format

Raw files are created by copy-pasting from LinkedIn company "People" pages:

```
Collectie Overijssel logo
Collectie Overijssel
Museums, Historical Sites, and Zoos
Zwolle, Overijssel
2K followers
51-200 employees
58 associated members

Annelien Vos-Keen
Annelien Vos-Keen
2nd degree connection · 2nd
Data Analist / KPI- en procesexpert
Thomas van Maaren, Bob Coret, and 4 other mutual connections

Martine de Boer
Martine de Boer
2nd degree connection · 2nd
Collectiespecialist bij Collectie Overijssel
...
```

## Usage

The script takes the raw input file and the output file as positional arguments, followed by the custodian name and slug:

```bash
python scripts/parse_custodian_staff.py <input_file> <output_file> \
    --custodian-name "Custodian Name" \
    --custodian-slug "custodian-slug"
```

**Example**:

```bash
python scripts/parse_custodian_staff.py \
    data/custodian/person/manual_hc/collectie_overijssel-20251210T0055.md \
    data/custodian/person/collectie_overijssel_staff_20251210T0055.json \
    --custodian-name "Collectie Overijssel" \
    --custodian-slug "collectie-overijssel"
```

**Dry-run mode** (parse without writing):

```bash
python scripts/parse_custodian_staff.py input.md output.json \
    --custodian-name "Name" --custodian-slug "slug" --dry-run
```

## Output Structure

```json
{
  "custodian_metadata": {
    "custodian_name": "Collectie Overijssel",
    "custodian_slug": "collectie-overijssel",
    "name": "Collectie Overijssel",
    "industry": "Museums, Historical Sites, and Zoos",
    "location": {
      "city": "Zwolle",
      "region": "Overijssel"
    },
    "follower_count": "2K",
    "employee_count": "51-200",
    "associated_members": 58
  },
  "source_metadata": {
    "source_type": "linkedin_company_people_page",
    "registered_timestamp": "2025-12-10T00:55:00Z",
    "registration_method": "manual_linkedin_browse",
    "staff_extracted": 39
  },
  "staff": [
    {
      "staff_id": "collectie-overijssel_staff_0000_annelien_vos_keen",
      "name": "Annelien Vos-Keen",
      "name_type": "full",
      "degree": "2nd",
      "headline": "Data Analist / KPI- en procesexpert",
      "mutual_connections": "Thomas van Maaren, Bob Coret, and 4 other mutual connections",
      "heritage_relevant": true,
      "heritage_type": "D"
    }
  ],
  "staff_analysis": {
    "total_staff_extracted": 39,
    "heritage_relevant_count": 25,
    "heritage_relevant_percentage": 64.1,
    "staff_by_heritage_type": {
      "A": 4,
      "D": 1,
      "E": 1,
      "M": 18,
      "S": 1
    },
    "staff_by_degree": {
      "1st": 2,
      "2nd": 37
    },
    "staff_by_name_type": {
      "abbreviated": 1,
      "full": 38
    },
    "common_roles": {
      "Medewerker": 7,
      "Coördinator": 5,
      "Beheerder": 4
    }
  },
  "provenance": {
    "data_source": "LINKEDIN_MANUAL_REGISTER",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
```

## Staff ID Format

```
{custodian_slug}_staff_{index:04d}_{name_slug}
```

**Examples**:

- `collectie-overijssel_staff_0000_annelien_vos_keen`
- `nationaal-archief_staff_0042_afelonne_doek`

## Heritage Type Detection

Staff headlines are analyzed for heritage relevance using GLAMORCUBESFIXPHDNT keywords:

| Code | Type | Keywords |
|------|------|----------|
| A | Archive | archief, archivist, archivaris, nationaal archief |
| M | Museum | museum, curator, conservator, collectie |
| L | Library | library, bibliotheek, librarian |
| D | Digital | digital, data, developer, digitalisering |
| E | Education | university, professor, docent, educatie |
| R | Research | research, onderzoek, historicus |
| ... | ... | ... |

## Name Types

| Type | Description | Example |
|------|-------------|---------|
| `full` | Complete first + last name | "Vincent Robijn" |
| `abbreviated` | Contains single-letter initial | "Fairoesh N.", "A.J. Gevers" |
| `anonymous` | Privacy-hidden profile | "LinkedIn Member" |

## Existing Files

| Custodian | Staff Count | Output File |
|-----------|-------------|-------------|
| Collectie Overijssel | 39 | `collectie_overijssel_staff_20251210T0055.json` |
| Nationaal Archief | 373 | `nationaal_archief_staff_20251209T2354.json` |

## Integration with Other Scripts

This script complements `parse_linkedin_connections.py` (Rule 15):

| Script | Purpose | Input |
|--------|---------|-------|
| `parse_linkedin_connections.py` | Parse PERSON's connections | Individual profile connections |
| `parse_custodian_staff.py` | Parse ORGANIZATION's staff | Company "People" page |

## See Also

- Rule 15: Connection Data Registration
- Rule 14: Exa MCP LinkedIn Profile Extraction
- `AGENTS.md` - Complete agent instructions
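
The Staff ID Format, Heritage Type Detection, and Name Types sections describe three small per-record derivations. The following is a minimal sketch of how they could be implemented, not the actual code of `parse_custodian_staff.py`. The function names (`slugify_name`, `make_staff_id`, `classify_name_type`, `detect_heritage_type`) are hypothetical, and two behaviors are assumptions: keywords are matched as case-insensitive substrings, and when several types match, the first one in table order wins. Only the keyword rows listed in this rule are included, since the table is truncated.

```python
import re
import unicodedata
from typing import Optional

# Keyword rows copied from the Heritage Type Detection table. The table is
# truncated in this rule, so the remaining type codes are deliberately absent.
HERITAGE_KEYWORDS = {
    "A": ["archief", "archivist", "archivaris", "nationaal archief"],
    "M": ["museum", "curator", "conservator", "collectie"],
    "L": ["library", "bibliotheek", "librarian"],
    "D": ["digital", "data", "developer", "digitalisering"],
    "E": ["university", "professor", "docent", "educatie"],
    "R": ["research", "onderzoek", "historicus"],
}

# A token counts as an initial if it is a single letter ("N") or one or
# more letter-plus-period groups ("N.", "A.J.").
_INITIAL = re.compile(r"(?:[^\W\d_]\.)+|[^\W\d_]")


def slugify_name(name: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with underscores."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def make_staff_id(custodian_slug: str, index: int, name: str) -> str:
    """Build an ID of the form {custodian_slug}_staff_{index:04d}_{name_slug}."""
    return f"{custodian_slug}_staff_{index:04d}_{slugify_name(name)}"


def classify_name_type(name: str) -> str:
    """Return 'anonymous', 'abbreviated', or 'full' per the Name Types table."""
    if name.strip() == "LinkedIn Member":
        return "anonymous"
    if any(_INITIAL.fullmatch(token) for token in name.split()):
        return "abbreviated"
    return "full"


def detect_heritage_type(headline: str) -> Optional[str]:
    """Return the first type code whose keyword occurs in the headline.

    Assumption: substring matching, first match in table order wins.
    Returns None when no listed keyword matches.
    """
    text = headline.lower()
    for code, keywords in HERITAGE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return code
    return None
```

Under these assumptions, `make_staff_id("collectie-overijssel", 0, "Annelien Vos-Keen")` reproduces the first example ID above, and `detect_heritage_type("Data Analist / KPI- en procesexpert")` yields `"D"`, matching the sample output record. Substring matching is deliberately simple and can over-match (for example, `data` inside an unrelated word), so a production parser may prefer word-boundary matching.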