data(person/entity): add 83,845 LinkedIn profile extractions from company pages
Bulk extraction of heritage-professional profiles from LinkedIn company pages
using the extract_persons_with_provenance.py script.
Key characteristics:
- Source: LinkedIn company 'People' pages for heritage institutions
- Filename pattern: {linkedin-slug}_{timestamp}.json
- Total size: ~3.6 GB
- Includes: profile_data, heritage_relevance, affiliations, web_claims
- Provenance: Full XPath + archived HTML references (Rule 6 compliant)
- Dual timestamps: statement_created_at + source_archived_at (Rule 35)
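A minimal sketch for spot-checking a file from this batch. It assumes that the
four sections listed above (profile_data, heritage_relevance, affiliations,
web_claims) are top-level keys of each JSON object. The nesting is not stated
here, so treat that as an assumption. The helper names are hypothetical and are
not part of extract_persons_with_provenance.py.

```python
import json

# Sections named in this commit message. Assumption: they are top-level keys.
REQUIRED_SECTIONS = ("profile_data", "heritage_relevance", "affiliations", "web_claims")


def load_profile(path: str) -> dict:
    """Load one extraction file, assumed to be a UTF-8 encoded JSON object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_sections(record: dict) -> list[str]:
    """Return the expected top-level sections that are absent from a record."""
    return [key for key in REQUIRED_SECTIONS if key not in record]
```

Only the presence of each section is checked. Their inner structure is not
described in this message and is left unvalidated.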
Extraction metadata includes:
- extraction_agent: extract_persons_with_provenance.py
- source_file: Original archived HTML filename
- source_archived_at: When the LinkedIn page was captured
- schema_version: 1.0.0
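The metadata fields above could be checked with something like the hypothetical
helper below. It makes three assumptions. First, the fields sit in a dict that
is passed in directly. Second, both timestamps are ISO 8601 strings with
matching offset style. Third, a statement should never predate the archive it
was extracted from. That ordering rule is an illustrative assumption and is not
stated in Rule 35.

```python
from datetime import datetime

# Field names from this commit message; where they live in the JSON is an assumption.
REQUIRED_METADATA = ("extraction_agent", "source_file", "source_archived_at", "schema_version")


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_metadata(metadata: dict, statement_created_at: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"missing {key}" for key in REQUIRED_METADATA if key not in metadata]
    version = metadata.get("schema_version")
    if version is not None and version != "1.0.0":
        problems.append(f"unexpected schema_version {version}")
    archived = metadata.get("source_archived_at")
    # Assumed rule: the extracted statement cannot be older than its archived source.
    if archived and parse_ts(statement_created_at) < parse_ts(archived):
        problems.append("statement_created_at precedes source_archived_at")
    return problems
```

statement_created_at is passed as an argument rather than looked up, because
this message does not say where it is stored.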
Note: URL-encoded filenames preserve international characters (Arabic,
Hebrew, Chinese, Turkish, accented Latin, etc.)
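To recover the original slug from a filename, split off the timestamp first
and then percent-decode the slug. Below is a minimal sketch using
urllib.parse.unquote. It assumes the timestamp follows the last underscore and
contains no underscore itself. The timestamp format is not specified here, so
it is returned unparsed. The function name is hypothetical.

```python
from urllib.parse import unquote


def split_filename(filename: str) -> tuple[str, str]:
    """Split '{linkedin-slug}_{timestamp}.json' into (decoded slug, raw timestamp)."""
    if not filename.endswith(".json"):
        raise ValueError(f"not a .json file: {filename}")
    stem = filename[: -len(".json")]
    # Split on the last literal underscore before decoding, so decoded text
    # cannot shift the boundary between slug and timestamp.
    slug, sep, timestamp = stem.rpartition("_")
    if not sep or not slug:
        raise ValueError(f"no _{{timestamp}} suffix in: {filename}")
    # Percent-decoding restores non-ASCII characters such as Arabic or Turkish letters.
    return unquote(slug), timestamp
```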