Rule 18: Custodian Staff Parsing from LinkedIn Company Pages
When manually registering heritage custodian staff from LinkedIn company "People" pages, use the parse_custodian_staff.py script to convert raw text files into structured JSON.
Overview
LinkedIn company pages have a "People" section that lists staff members with their names, job titles, connection degree, and mutual connections. This data helps characterize the heritage sector workforce and supports network analysis.
File Locations
| Type | Location |
|---|---|
| Raw input files | data/custodian/person/manual_hc/{slug}-{timestamp}.md |
| Parsed output files | data/custodian/person/{slug}_staff_{timestamp}.json |
| Parser script | scripts/parse_custodian_staff.py |
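The naming convention in the table can be sketched in Python. This is a minimal illustration, not the script's actual logic: the helper names `derive_output_path` and `timestamp_to_iso` are hypothetical, the `YYYYMMDDTHHMM` timestamp layout is inferred from the example filenames, and treating the timestamp as UTC is an assumption.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Assumed raw filename layout: {slug}-{timestamp}.md, timestamp as YYYYMMDDTHHMM.
RAW_NAME_RE = re.compile(r"^(?P<slug>.+)-(?P<timestamp>\d{8}T\d{4})$")
OUTPUT_DIR = PurePosixPath("data/custodian/person")


def derive_output_path(raw_path: str) -> str:
    """Map a raw input path to the parsed-output path from the table (hypothetical helper)."""
    stem = PurePosixPath(raw_path).stem
    match = RAW_NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected raw file name: {raw_path}")
    return str(OUTPUT_DIR / f"{match['slug']}_staff_{match['timestamp']}.json")


def timestamp_to_iso(timestamp: str) -> str:
    """Convert a filename timestamp to ISO 8601, assuming it is in UTC."""
    parsed = datetime.strptime(timestamp, "%Y%m%dT%H%M").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Note that the example filenames use underscores in the slug part (`collectie_overijssel`), while `--custodian-slug` uses hyphens (`collectie-overijssel`); the sketch passes the filename slug through unchanged.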
Input File Format
Raw files are created by copy-pasting from LinkedIn company "People" pages:
Collectie Overijssel logo
Collectie Overijssel
Museums, Historical Sites, and Zoos
Zwolle, Overijssel
2K followers
51-200 employees
58 associated members
Annelien Vos-Keen
Annelien Vos-Keen
2nd degree connection · 2nd
Data Analist / KPI- en procesexpert
Thomas van Maaren, Bob Coret, and 4 other mutual connections
Martine de Boer
Martine de Boer
2nd degree connection · 2nd
Collectiespecialist bij Collectie Overijssel
...
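To illustrate how the pasted text maps onto staff records, here is a rough sketch that anchors on the "degree connection" line. It is not the actual parser: `parse_staff_blocks` is a hypothetical name, and the sketch assumes the name sits on the line before the degree line, the headline on the line after, and an optional mutual-connections line after that. Entries without a degree line (for example privacy-hidden profiles) would be missed.

```python
import re

# Assumed shape of the degree line, e.g. "2nd degree connection · 2nd".
DEGREE_RE = re.compile(r"^(1st|2nd|3rd\+?) degree connection")


def parse_staff_blocks(text: str) -> list[dict]:
    """Extract staff entries from pasted 'People' page text (illustrative only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    staff = []
    for i, line in enumerate(lines):
        match = DEGREE_RE.match(line)
        if match is None or i == 0:
            continue
        entry = {
            "name": lines[i - 1],  # the name is repeated; take the copy just above
            "degree": match.group(1),
            "headline": lines[i + 1] if i + 1 < len(lines) else None,
            "mutual_connections": None,
        }
        if i + 2 < len(lines) and "mutual connection" in lines[i + 2]:
            entry["mutual_connections"] = lines[i + 2]
        staff.append(entry)
    return staff
```

A known weakness of this approach: if a profile has no headline, the line after the degree line is the next person's name, so a real parser needs extra checks.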
Usage
python scripts/parse_custodian_staff.py <input_file> <output_file> \
--custodian-name "Custodian Name" \
--custodian-slug "custodian-slug"
Example:
python scripts/parse_custodian_staff.py \
data/custodian/person/manual_hc/collectie_overijssel-20251210T0055.md \
data/custodian/person/collectie_overijssel_staff_20251210T0055.json \
--custodian-name "Collectie Overijssel" \
--custodian-slug "collectie-overijssel"
Dry-run mode (parse without writing):
python scripts/parse_custodian_staff.py input.md output.json \
--custodian-name "Name" --custodian-slug "slug" --dry-run
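For readers who want to see how such a command line could be wired up, the following `argparse` sketch mirrors the documented arguments. It is an assumption about the interface, not the script's source: whether `--custodian-name` and `--custodian-slug` are actually required is not stated here, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Parse LinkedIn company 'People' page text into staff JSON."
    )
    parser.add_argument("input_file", help="raw copy-pasted text file")
    parser.add_argument("output_file", help="destination JSON file")
    # Marking these as required is an assumption of this sketch.
    parser.add_argument("--custodian-name", required=True)
    parser.add_argument("--custodian-slug", required=True)
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="parse without writing the output file",
    )
    return parser
```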
Output Structure
{
  "custodian_metadata": {
    "custodian_name": "Collectie Overijssel",
    "custodian_slug": "collectie-overijssel",
    "name": "Collectie Overijssel",
    "industry": "Museums, Historical Sites, and Zoos",
    "location": { "city": "Zwolle", "region": "Overijssel" },
    "follower_count": "2K",
    "employee_count": "51-200",
    "associated_members": 58
  },
  "source_metadata": {
    "source_type": "linkedin_company_people_page",
    "registered_timestamp": "2025-12-10T00:55:00Z",
    "registration_method": "manual_linkedin_browse",
    "staff_extracted": 39
  },
  "staff": [
    {
      "staff_id": "collectie-overijssel_staff_0000_annelien_vos_keen",
      "name": "Annelien Vos-Keen",
      "name_type": "full",
      "degree": "2nd",
      "headline": "Data Analist / KPI- en procesexpert",
      "mutual_connections": "Thomas van Maaren, Bob Coret, and 4 other mutual connections",
      "heritage_relevant": true,
      "heritage_type": "D"
    }
  ],
  "staff_analysis": {
    "total_staff_extracted": 39,
    "heritage_relevant_count": 25,
    "heritage_relevant_percentage": 64.1,
    "staff_by_heritage_type": { "A": 4, "D": 1, "E": 1, "M": 18, "S": 1 },
    "staff_by_degree": { "1st": 2, "2nd": 37 },
    "staff_by_name_type": { "abbreviated": 1, "full": 38 },
    "common_roles": { "Medewerker": 7, "Coördinator": 5, "Beheerder": 4 }
  },
  "provenance": {
    "data_source": "LINKEDIN_MANUAL_REGISTER",
    "data_tier": "TIER_3_CROWD_SOURCED"
  }
}
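The `staff_analysis` block is a set of aggregates over the `staff` list. A minimal sketch of how such figures could be derived follows. `summarize_staff` is a hypothetical name, and the sketch assumes heritage types are counted only for heritage-relevant staff, which matches the example (4 + 1 + 1 + 18 + 1 = 25). `common_roles` is left out because its extraction method is not described here.

```python
from collections import Counter


def summarize_staff(staff: list[dict]) -> dict:
    """Compute aggregate counts similar to staff_analysis (illustrative sketch)."""
    total = len(staff)
    relevant = [p for p in staff if p.get("heritage_relevant")]
    percentage = round(100 * len(relevant) / total, 1) if total else 0.0
    return {
        "total_staff_extracted": total,
        "heritage_relevant_count": len(relevant),
        "heritage_relevant_percentage": percentage,
        # Assumption: only heritage-relevant staff carry a heritage_type.
        "staff_by_heritage_type": dict(
            Counter(p["heritage_type"] for p in relevant if p.get("heritage_type"))
        ),
        "staff_by_degree": dict(Counter(p["degree"] for p in staff if p.get("degree"))),
        "staff_by_name_type": dict(
            Counter(p["name_type"] for p in staff if p.get("name_type"))
        ),
    }
```

With 25 relevant entries out of 39, the percentage rounds to 64.1, as in the example output.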
Staff ID Format
{custodian_slug}_staff_{index:04d}_{name_slug}
Examples:
collectie-overijssel_staff_0000_annelien_vos_keen
nationaal-archief_staff_0042_afelonne_doek
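The ID format can be reproduced with a short sketch. The name-slug rule used here (lowercase, diacritics stripped, runs of non-alphanumeric characters collapsed to a single underscore) is an assumption inferred from the examples; `slugify_name` and `make_staff_id` are hypothetical names.

```python
import re
import unicodedata


def slugify_name(name: str) -> str:
    """Lowercase a name and join its alphanumeric parts with underscores (assumed rule)."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def make_staff_id(custodian_slug: str, index: int, name: str) -> str:
    """Format {custodian_slug}_staff_{index:04d}_{name_slug}."""
    return f"{custodian_slug}_staff_{index:04d}_{slugify_name(name)}"
```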
Heritage Type Detection
Staff headlines are analyzed for heritage relevance using GLAMORCUBESFIXPHDNT keywords:
| Code | Type | Keywords |
|---|---|---|
| A | Archive | archief, archivist, archivaris, nationaal archief |
| M | Museum | museum, curator, conservator, collectie |
| L | Library | library, bibliotheek, librarian |
| D | Digital | digital, data, developer, digitalisering |
| E | Education | university, professor, docent, educatie |
| R | Research | research, onderzoek, historicus |
| ... | ... | ... |
Name Types
| Type | Description | Example |
|---|---|---|
| full | Complete first + last name | "Vincent Robijn" |
| abbreviated | Contains single-letter initial | "Fairoesh N.", "A.J. Gevers" |
| anonymous | Privacy-hidden profile | "LinkedIn Member" |
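The three name types can be told apart with a small classifier. The sketch below is one possible reading of the table, not the script's own rule: it treats the literal placeholder "LinkedIn Member" as anonymous and any standalone single letter (followed by a period, whitespace, or the end of the string) as an initial. `classify_name` is a hypothetical name.

```python
import re

# A single letter that starts a token and is followed by ".", whitespace, or end of string.
INITIAL_RE = re.compile(r"(?<!\w)[^\W\d_](?=\.|\s|$)")


def classify_name(name: str) -> str:
    """Classify a display name as 'anonymous', 'abbreviated', or 'full' (assumed rules)."""
    cleaned = name.strip()
    if cleaned.lower() == "linkedin member":
        return "anonymous"
    if INITIAL_RE.search(cleaned):
        return "abbreviated"
    return "full"
```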
Existing Files
| Custodian | Staff Count | Output File |
|---|---|---|
| Collectie Overijssel | 39 | collectie_overijssel_staff_20251210T0055.json |
| Nationaal Archief | 373 | nationaal_archief_staff_20251209T2354.json |
Integration with Other Scripts
This script complements parse_linkedin_connections.py (Rule 15):
| Script | Purpose | Input |
|---|---|---|
| parse_linkedin_connections.py | Parse PERSON's connections | Individual profile connections |
| parse_custodian_staff.py | Parse ORGANIZATION's staff | Company "People" page |
See Also
- Rule 15: Connection Data Registration
- Rule 14: Exa MCP LinkedIn Profile Extraction
- AGENTS.md: Complete agent instructions