# Rule 31: Organizational Subdivision Extraction

**🚨 CRITICAL: When extracting person/staff affiliations, ALWAYS capture organizational subdivisions (departments, teams, units, divisions, sections) as structured data. This information is highly valuable for understanding institutional structure.**

## Why This Matters

Organizational subdivisions reveal:
1. **Institutional Structure**: How heritage custodians organize their work
2. **Expertise Clustering**: Where specific skills/knowledge are concentrated
3. **Contact Routing**: Who to contact for specific inquiries
4. **Network Analysis**: How teams connect across institutions
5. **Career Tracking**: Movement between teams/departments over time

## What to Extract

| Subdivision Type | Examples | Field Name |
|------------------|----------|------------|
| **Department** | Collections Department, Conservation Department | `department` |
| **Team** | Data Science Team, Digitization Team | `team` |
| **Unit** | Research Unit, Acquisitions Unit | `unit` |
| **Division** | Public Services Division, Technical Services | `division` |
| **Section** | Photographs Section, Maps Section | `section` |
| **Lab/Center** | Conservation Lab, Research Center | `lab_or_center` |
| **Office** | Director's Office, Communications Office | `office` |

## Detection Patterns

### LinkedIn Headlines
Parse subdivision indicators from headlines:

```
"Kadaster Data Science Team | BSc Artificial Intelligence UU"
↓
organization: Kadaster
team: Data Science Team

"Senior Curator, Asian Art Department | Rijksmuseum"
↓
organization: Rijksmuseum
department: Asian Art Department

"Head of Conservation Lab | British Museum"
↓
organization: British Museum
lab_or_center: Conservation Lab
```

### Keywords to Detect

| Language | Keywords |
|----------|----------|
| **English** | team, department, dept., division, unit, section, lab, laboratory, center, centre, office, group, branch |
| **Dutch** | team, afdeling, afd., divisie, eenheid, sectie, laboratorium, centrum, kantoor, groep |
| **German** | Team, Abteilung, Abt., Division, Einheit, Sektion, Labor, Zentrum, Büro, Gruppe |
| **French** | équipe, département, dép., division, unité, section, laboratoire, centre, bureau, groupe |

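
One way to act on these keywords is to normalize a token and look it up in a keyword-to-field map. The sketch below is illustrative only: `KEYWORD_TO_FIELD` and `classify_keyword` are hypothetical names, and the assignment of each keyword to a field is an assumption. In particular, *group*, *branch*, and their translations are left unmapped because the extraction table defines no matching field for them.

```python
import unicodedata
from typing import Optional

# Assumed mapping from detection keywords to the field names in "What to Extract".
# Keys are lowercase, without the trailing dot of abbreviations.
KEYWORD_TO_FIELD = {
    # English (also covers German "Team"/"Division" and French "division"/"section"/"centre")
    "team": "team", "department": "department", "dept": "department",
    "division": "division", "unit": "unit", "section": "section",
    "lab": "lab_or_center", "laboratory": "lab_or_center",
    "center": "lab_or_center", "centre": "lab_or_center", "office": "office",
    # Dutch
    "afdeling": "department", "afd": "department", "divisie": "division",
    "eenheid": "unit", "sectie": "section", "laboratorium": "lab_or_center",
    "centrum": "lab_or_center", "kantoor": "office",
    # German
    "abteilung": "department", "abt": "department", "einheit": "unit",
    "sektion": "section", "labor": "lab_or_center", "zentrum": "lab_or_center",
    "büro": "office",
    # French
    "équipe": "team", "département": "department", "dép": "department",
    "unité": "unit", "laboratoire": "lab_or_center", "bureau": "office",
}


def classify_keyword(token: str) -> Optional[str]:
    """Return the field name for a subdivision keyword, or None if unmapped."""
    # Normalize Unicode, drop the trailing dot of abbreviations, ignore case.
    normalized = unicodedata.normalize("NFC", token).strip().rstrip(".").casefold()
    return KEYWORD_TO_FIELD.get(normalized)


print(classify_keyword("Abt."))    # German abbreviation for Abteilung
print(classify_keyword("équipe"))  # French for team
print(classify_keyword("groep"))   # no field defined for groups
```

Returning `None` for unmapped keywords keeps the decision about groups and branches with the caller instead of guessing a type.
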
## Data Structure

### In Person Entity Files

```json
{
  "profile_data": {
    "name": "Aron Noordhoek",
    "headline": "Kadaster Data Science Team | BSc Artificial Intelligence UU"
  },
  "affiliations": [
    {
      "custodian_name": "Kadaster",
      "custodian_slug": "kadaster",
      "role_title": "Data Science Team Member",
      "subdivision": {
        "type": "team",
        "name": "Data Science Team",
        "parent_subdivision": null,
        "extraction_source": "linkedin_headline"
      },
      "heritage_relevant": true,
      "heritage_type": "D",
      "current": true
    }
  ]
}
```

### Nested Subdivisions

Some organizations have hierarchical subdivisions:

```json
{
  "subdivision": {
    "type": "section",
    "name": "Photographs Section",
    "parent_subdivision": {
      "type": "department",
      "name": "Collections Department"
    },
    "extraction_source": "institutional_website"
  }
}
```

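
A nested record like this can be flattened into a readable path by following `parent_subdivision` links upward. The helper below is a minimal sketch; `subdivision_path` is a hypothetical name, and the sketch assumes every node carries a `name` key.

```python
from typing import Any, Dict, List


def subdivision_path(subdivision: Dict[str, Any]) -> List[str]:
    """Return subdivision names ordered from the outermost parent to the leaf."""
    names: List[str] = []
    node = subdivision
    while node:  # stops at a missing or null parent_subdivision
        names.append(node["name"])
        node = node.get("parent_subdivision")
    names.reverse()
    return names


example = {
    "type": "section",
    "name": "Photographs Section",
    "parent_subdivision": {"type": "department", "name": "Collections Department"},
    "extraction_source": "institutional_website",
}
print(" > ".join(subdivision_path(example)))
# prints: Collections Department > Photographs Section
```
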
## Extraction Workflow

```
1. PARSE headline/role text
   ↓
2. IDENTIFY subdivision keywords (team, department, etc.)
   ↓
3. EXTRACT subdivision name
   ↓
4. CLASSIFY subdivision type
   ↓
5. CHECK for parent subdivisions (if hierarchical)
   ↓
6. STORE in structured format
```

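
The workflow can be sketched as a small heuristic for English headlines. Everything here is an assumption for illustration: `KEYWORD_TYPES`, `extract_subdivision`, the regular expression, and the requirement that the caller already knows the custodian name. Step 5 (parent lookup) is not attempted, because a headline alone rarely states the hierarchy.

```python
import re
from typing import Any, Dict, Optional

# Assumed English-only keyword map, title-cased as keywords appear in headlines.
KEYWORD_TYPES = {
    "Team": "team", "Department": "department", "Unit": "unit",
    "Division": "division", "Section": "section", "Office": "office",
    "Laboratory": "lab_or_center", "Lab": "lab_or_center",
    "Center": "lab_or_center", "Centre": "lab_or_center",
}

# One or more capitalized words followed by a subdivision keyword.
_SUBDIVISION_RE = re.compile(
    r"\b((?:[A-Z][\w'&-]*\s+)+)(" + "|".join(KEYWORD_TYPES) + r")\b"
)


def extract_subdivision(headline: str, organization: str) -> Optional[Dict[str, Any]]:
    """Apply steps 1-4 and 6 to one headline; return None when nothing matches."""
    match = _SUBDIVISION_RE.search(headline)   # steps 1-2: parse and identify
    if match is None:
        return None
    name = match.group(0)                      # step 3: extract the name
    if name.startswith(organization + " "):    # drop a leading custodian name
        name = name[len(organization) + 1:]
    return {                                   # steps 4 and 6: classify and store
        "type": KEYWORD_TYPES[match.group(2)],
        "name": name,
        "parent_subdivision": None,            # step 5 is not attempted here
        "extraction_source": "linkedin_headline",
    }


print(extract_subdivision(
    "Kadaster Data Science Team | BSc Artificial Intelligence UU", "Kadaster"))
print(extract_subdivision("Head of Conservation Lab | British Museum", "British Museum"))
```

The pattern relies on capitalization, so it will miss lowercase headlines and may misfire when a custodian's own name contains a keyword (for example a name ending in "Center"); results from a heuristic like this still need review.
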
## LinkUp Search Strategy

When using LinkUp to enrich profiles, search specifically for subdivision information:

```python
# Example values; replace with the person and custodian being enriched
person_name = "Aron Noordhoek"
organization = "Kadaster"

# Good queries for subdivision discovery
queries = [
    f'"{person_name}" "{organization}" department team',
    f'"{organization}" organizational structure',
    f'"{organization}" staff directory departments',
]
```

## Examples from Real Data

### Example 1: Kadaster Data Science Team
```json
{
  "name": "Aron Noordhoek",
  "affiliations": [{
    "custodian_name": "Kadaster",
    "subdivision": {
      "type": "team",
      "name": "Data Science Team"
    }
  }]
}
```

### Example 2: Museum Department
```json
{
  "name": "Sarah Johnson",
  "affiliations": [{
    "custodian_name": "Rijksmuseum",
    "subdivision": {
      "type": "department",
      "name": "Paintings Conservation Department"
    }
  }]
}
```

### Example 3: Archive Unit
```json
{
  "name": "Thomas van Berg",
  "affiliations": [{
    "custodian_name": "Nationaal Archief",
    "subdivision": {
      "type": "unit",
      "name": "Digital Preservation Unit",
      "parent_subdivision": {
        "type": "department",
        "name": "Collection Care"
      }
    }
  }]
}
```

## Validation Rules

1. **Subdivision name MUST NOT be empty** if type is specified
2. **Type MUST be one of**: department, team, unit, division, section, lab_or_center, office
3. **extraction_source MUST be specified**: linkedin_headline, institutional_website, linkedin_experience, manual
4. **Parent subdivision (if any) MUST have valid type and name**

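
These rules translate fairly directly into a checker. The following is a sketch, not an existing project utility: `validate_subdivision` and its helper are hypothetical names. It assumes a missing type counts as a violation of rule 2, and that rule 3 restricts `extraction_source` to the four listed values.

```python
from typing import Any, Dict, List

ALLOWED_TYPES = {"department", "team", "unit", "division",
                 "section", "lab_or_center", "office"}
ALLOWED_SOURCES = {"linkedin_headline", "institutional_website",
                   "linkedin_experience", "manual"}


def _check_type_and_name(node: Dict[str, Any], label: str) -> List[str]:
    """Check that a node has an allowed type and a non-empty name."""
    errors = []
    if node.get("type") not in ALLOWED_TYPES:
        errors.append(f"{label}: invalid type {node.get('type')!r}")
    if not str(node.get("name") or "").strip():
        errors.append(f"{label}: name must not be empty")
    return errors


def validate_subdivision(subdivision: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = _check_type_and_name(subdivision, "subdivision")        # rules 1 and 2
    if subdivision.get("extraction_source") not in ALLOWED_SOURCES:  # rule 3
        errors.append("subdivision: extraction_source missing or unknown")
    parent = subdivision.get("parent_subdivision")
    if parent is not None:                                           # rule 4
        errors.extend(_check_type_and_name(parent, "parent_subdivision"))
    return errors


record = {
    "type": "section",
    "name": "Photographs Section",
    "parent_subdivision": {"type": "department", "name": "Collections Department"},
    "extraction_source": "institutional_website",
}
print(validate_subdivision(record))
```

Collecting all violations, rather than raising on the first one, makes it easier to report every problem in a record in a single pass.
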
## Integration with Existing Rules

This rule complements:
- **Rule 12**: Person Data Reference Pattern
- **Rule 18**: Custodian Staff Parsing
- **Rule 20**: Person Entity Profiles
- **Rule 27**: Person-Custodian Data Architecture

## Provenance

When subdivision info comes from web sources, include provenance:

```json
{
  "subdivision": {
    "type": "department",
    "name": "Conservation Department",
    "provenance": {
      "source_url": "https://www.museum.org/about/staff",
      "retrieved_on": "2025-12-15T20:00:00Z",
      "retrieval_agent": "firecrawl"
    }
  }
}
```

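
A provenance block in this shape could be produced by a small helper at retrieval time. This is a sketch under assumptions: `make_provenance` is a hypothetical name, and the timestamp is rendered in UTC with a trailing `Z` to match the example above.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def make_provenance(source_url: str, retrieval_agent: str,
                    retrieved_on: Optional[datetime] = None) -> Dict[str, str]:
    """Build a provenance block; defaults to the current time in UTC."""
    moment = retrieved_on or datetime.now(timezone.utc)
    return {
        "source_url": source_url,
        # Convert to UTC before formatting so the trailing "Z" is accurate.
        "retrieved_on": moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retrieval_agent": retrieval_agent,
    }


fixed = datetime(2025, 12, 15, 20, 0, 0, tzinfo=timezone.utc)
print(make_provenance("https://www.museum.org/about/staff", "firecrawl", fixed))
```

Pass timezone-aware datetimes where possible: `astimezone` interprets a naive datetime as local time, which may not be what the caller intended.
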
## See Also

- `schemas/20251121/linkml/modules/classes/OrganizationalUnit.yaml` (if exists)
- `.opencode/PERSON_CUSTODIAN_DATA_ARCHITECTURE.md`
- `AGENTS.md` Rule 31