glam/.opencode/ORGANIZATIONAL_SUBDIVISION_EXTRACTION.md
2025-12-15 22:31:41 +01:00

# Rule 31: Organizational Subdivision Extraction
**🚨 CRITICAL: When extracting person/staff affiliations, ALWAYS capture organizational subdivisions (departments, teams, units, divisions, sections) as structured data. This information is highly valuable for understanding institutional structure.**
## Why This Matters
Organizational subdivisions reveal:
1. **Institutional Structure**: How heritage custodians organize their work
2. **Expertise Clustering**: Where specific skills/knowledge are concentrated
3. **Contact Routing**: Who to contact for specific inquiries
4. **Network Analysis**: How teams connect across institutions
5. **Career Tracking**: Movement between teams/departments over time
## What to Extract
| Subdivision Type | Examples | Field Name |
|------------------|----------|------------|
| **Department** | Collections Department, Conservation Department | `department` |
| **Team** | Data Science Team, Digitization Team | `team` |
| **Unit** | Research Unit, Acquisitions Unit | `unit` |
| **Division** | Public Services Division, Technical Services Division | `division` |
| **Section** | Photographs Section, Maps Section | `section` |
| **Lab/Center** | Conservation Lab, Research Center | `lab_or_center` |
| **Office** | Director's Office, Communications Office | `office` |
## Detection Patterns
### LinkedIn Headlines
Parse subdivision indicators from headlines:
```
"Kadaster Data Science Team | BSc Artificial Intelligence UU"
  organization: Kadaster
  team: Data Science Team

"Senior Curator, Asian Art Department | Rijksmuseum"
  organization: Rijksmuseum
  department: Asian Art Department

"Head of Conservation Lab | British Museum"
  organization: British Museum
  lab_or_center: Conservation Lab
```
### Keywords to Detect
| Language | Keywords |
|----------|----------|
| **English** | team, department, dept., division, unit, section, lab, laboratory, center, centre, office, group, branch |
| **Dutch** | team, afdeling, afd., divisie, eenheid, sectie, laboratorium, centrum, kantoor, groep |
| **German** | Team, Abteilung, Abt., Division, Einheit, Sektion, Labor, Zentrum, Büro, Gruppe |
| **French** | équipe, département, dép., division, unité, section, laboratoire, centre, bureau, groupe |
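
A minimal sketch of how the keyword table above could drive detection. The names `SUBDIVISION_KEYWORDS` and `find_subdivision_keywords` are hypothetical and do not refer to existing project code; matching is case-insensitive and whole-word only, which is an assumption rather than a stated requirement.

```python
import re

# Keyword lists copied from the table above; abbreviations keep their trailing dot.
SUBDIVISION_KEYWORDS = {
    "en": ["team", "department", "dept.", "division", "unit", "section", "lab",
           "laboratory", "center", "centre", "office", "group", "branch"],
    "nl": ["team", "afdeling", "afd.", "divisie", "eenheid", "sectie",
           "laboratorium", "centrum", "kantoor", "groep"],
    "de": ["Team", "Abteilung", "Abt.", "Division", "Einheit", "Sektion",
           "Labor", "Zentrum", "Büro", "Gruppe"],
    "fr": ["équipe", "département", "dép.", "division", "unité", "section",
           "laboratoire", "centre", "bureau", "groupe"],
}


def find_subdivision_keywords(text: str) -> list[tuple[str, str]]:
    """Return (language, keyword) pairs whose keyword occurs as a whole word."""
    hits = []
    for lang, keywords in SUBDIVISION_KEYWORDS.items():
        for kw in keywords:
            # Lookarounds serve as word boundaries that also work for "dept."
            pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.append((lang, kw))
    return hits
```

Because several keywords (for example `team` and `division`) appear in more than one language, a single headline can produce hits for multiple languages; the sketch reports all of them rather than guessing the language.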
## Data Structure
### In Person Entity Files
```json
{
  "profile_data": {
    "name": "Aron Noordhoek",
    "headline": "Kadaster Data Science Team | BSc Artificial Intelligence UU"
  },
  "affiliations": [
    {
      "custodian_name": "Kadaster",
      "custodian_slug": "kadaster",
      "role_title": "Data Science Team Member",
      "subdivision": {
        "type": "team",
        "name": "Data Science Team",
        "parent_subdivision": null,
        "extraction_source": "linkedin_headline"
      },
      "heritage_relevant": true,
      "heritage_type": "D",
      "current": true
    }
  ]
}
```
### Nested Subdivisions
Some organizations have hierarchical subdivisions:
```json
{
  "subdivision": {
    "type": "section",
    "name": "Photographs Section",
    "parent_subdivision": {
      "type": "department",
      "name": "Collections Department"
    },
    "extraction_source": "institutional_website"
  }
}
```
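
As an illustration of consuming the nested structure above, the following sketch walks the `parent_subdivision` chain and returns the names from the outermost parent down to the leaf. The helper name `subdivision_path` is hypothetical and not part of any existing module.

```python
from typing import Optional


def subdivision_path(subdivision: Optional[dict]) -> list[str]:
    """Return subdivision names ordered from the outermost parent to the leaf."""
    names = []
    node = subdivision
    while node is not None:
        names.append(node["name"])
        # A missing or null parent ends the walk.
        node = node.get("parent_subdivision")
    names.reverse()
    return names


example = {
    "type": "section",
    "name": "Photographs Section",
    "parent_subdivision": {"type": "department", "name": "Collections Department"},
    "extraction_source": "institutional_website",
}
print(" > ".join(subdivision_path(example)))
```

For the example above this prints `Collections Department > Photographs Section`.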
## Extraction Workflow
```
1. PARSE headline/role text
2. IDENTIFY subdivision keywords (team, department, etc.)
3. EXTRACT subdivision name
4. CLASSIFY subdivision type
5. CHECK for parent subdivisions (if hierarchical)
6. STORE in structured format
```
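
The six steps above can be sketched roughly as follows for LinkedIn headlines. This is a simplified illustration under stated assumptions, not the project's actual extractor: `extract_subdivision`, `KEYWORD_TO_TYPE`, and `ROLE_PREFIX` are hypothetical names, only English keywords are mapped, the custodian name is assumed to be known in advance, and the role-prefix list is a guess.

```python
import re
from typing import Optional

# Assumed mapping from English keywords to the `type` values used in this rule.
KEYWORD_TO_TYPE = {
    "department": "department", "dept.": "department",
    "team": "team", "unit": "unit", "division": "division",
    "section": "section", "lab": "lab_or_center", "laboratory": "lab_or_center",
    "center": "lab_or_center", "centre": "lab_or_center", "office": "office",
}

# Hypothetical role prefixes to strip, e.g. "Head of Conservation Lab".
ROLE_PREFIX = re.compile(r"^(?:head|member|director|manager) of\s+", re.IGNORECASE)


def extract_subdivision(headline: str, custodian_name: str,
                        source: str = "linkedin_headline") -> Optional[dict]:
    # 1. PARSE: split the headline into segments on "|" and ","
    for segment in (s.strip() for s in re.split(r"[|,]", headline)):
        # 2. IDENTIFY: look for a subdivision keyword as a whole word
        for keyword, sub_type in KEYWORD_TO_TYPE.items():
            match = re.search(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)",
                              segment, flags=re.IGNORECASE)
            if match is None:
                continue
            # 3. EXTRACT: keep text up to the keyword, then drop the
            #    custodian name and any leading role prefix
            name = segment[:match.end()]
            name = re.sub(re.escape(custodian_name), "", name, flags=re.IGNORECASE)
            name = ROLE_PREFIX.sub("", " ".join(name.split()))
            if not name:
                continue
            # 4. CLASSIFY: the matched keyword determines the type
            # 5. CHECK: a headline alone rarely reveals a parent, so it stays None
            # 6. STORE: return the structure used in person entity files
            return {
                "type": sub_type,
                "name": name,
                "parent_subdivision": None,
                "extraction_source": source,
            }
    return None
```

Called with the headline `"Head of Conservation Lab | British Museum"` and custodian `"British Museum"`, the sketch yields type `lab_or_center` and name `Conservation Lab`. Headlines with more than one keyword in the same segment would need a tie-breaking rule that this sketch does not attempt.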
## LinkUp Search Strategy
When using LinkUp to enrich profiles, specifically search for subdivision information:
```python
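# Example inputs (placeholders; substitute the profile being enriched)
person_name = "Aron Noordhoek"
organization = "Kadaster"

# Good queries for subdivision discovery
queries = [
    f'"{person_name}" "{organization}" department team',
    f'"{organization}" organizational structure',
    f'"{organization}" staff directory departments',
]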
```
## Examples from Real Data
### Example 1: Kadaster Data Science Team
```json
{
  "name": "Aron Noordhoek",
  "affiliations": [{
    "custodian_name": "Kadaster",
    "subdivision": {
      "type": "team",
      "name": "Data Science Team"
    }
  }]
}
```
### Example 2: Museum Department
```json
{
  "name": "Sarah Johnson",
  "affiliations": [{
    "custodian_name": "Rijksmuseum",
    "subdivision": {
      "type": "department",
      "name": "Paintings Conservation Department"
    }
  }]
}
```
### Example 3: Archive Unit
```json
{
  "name": "Thomas van Berg",
  "affiliations": [{
    "custodian_name": "Nationaal Archief",
    "subdivision": {
      "type": "unit",
      "name": "Digital Preservation Unit",
      "parent_subdivision": {
        "type": "department",
        "name": "Collection Care"
      }
    }
  }]
}
```
## Validation Rules
1. **Subdivision name MUST NOT be empty** if type is specified
2. **Type MUST be one of**: department, team, unit, division, section, lab_or_center, office
3. **extraction_source MUST be specified**: linkedin_headline, institutional_website, linkedin_experience, manual
4. **Parent subdivision (if any) MUST have valid type and name**
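
The four rules above could be checked with something like the following sketch. The function and constant names are hypothetical; the allowed values are copied from rules 2 and 3, and only one level of parent is checked, which is an assumption about how far rule 4 is meant to reach.

```python
VALID_TYPES = {"department", "team", "unit", "division",
               "section", "lab_or_center", "office"}
VALID_SOURCES = {"linkedin_headline", "institutional_website",
                 "linkedin_experience", "manual"}


def _check_type_and_name(node: dict, label: str) -> list[str]:
    """Rules 1 and 2 (and rule 4 when applied to a parent)."""
    errors = []
    if node.get("type") not in VALID_TYPES:
        errors.append(f"{label}.type must be one of {sorted(VALID_TYPES)}")
    name = node.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append(f"{label}.name must not be empty")
    return errors


def validate_subdivision(subdivision: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = _check_type_and_name(subdivision, "subdivision")
    # Rule 3: extraction_source is mandatory and restricted to known values
    if subdivision.get("extraction_source") not in VALID_SOURCES:
        errors.append(
            f"subdivision.extraction_source must be one of {sorted(VALID_SOURCES)}"
        )
    # Rule 4: a parent, when present, needs a valid type and name
    parent = subdivision.get("parent_subdivision")
    if parent is not None:
        errors.extend(_check_type_and_name(parent, "parent_subdivision"))
    return errors
```

Returning a list of messages instead of raising on the first failure lets a batch job report every problem in a record at once.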
## Integration with Existing Rules
This rule complements:
- **Rule 12**: Person Data Reference Pattern
- **Rule 18**: Custodian Staff Parsing
- **Rule 20**: Person Entity Profiles
- **Rule 27**: Person-Custodian Data Architecture
## Provenance
When subdivision info comes from web sources, include provenance:
```json
{
  "subdivision": {
    "type": "department",
    "name": "Conservation Department",
    "provenance": {
      "source_url": "https://www.museum.org/about/staff",
      "retrieved_on": "2025-12-15T20:00:00Z",
      "retrieval_agent": "firecrawl"
    }
  }
}
```
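
A small sketch of attaching such a provenance block in code. The helper `with_provenance` is hypothetical; it assumes `retrieved_on` should be a UTC timestamp in the `YYYY-MM-DDTHH:MM:SSZ` form shown above and that callers pass a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from typing import Optional


def with_provenance(subdivision: dict, source_url: str, retrieval_agent: str,
                    retrieved_on: Optional[datetime] = None) -> dict:
    """Return a copy of the subdivision with a provenance block attached."""
    if retrieved_on is None:
        retrieved_on = datetime.now(timezone.utc)
    # Normalise to UTC and format with a trailing "Z", as in the example above
    stamp = retrieved_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        **subdivision,
        "provenance": {
            "source_url": source_url,
            "retrieved_on": stamp,
            "retrieval_agent": retrieval_agent,
        },
    }
```

The input dictionary is left unmodified, so the same extracted subdivision can be reused when several sources confirm it.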
## See Also
- `schemas/20251121/linkml/modules/classes/OrganizationalUnit.yaml` (if it exists)
- `.opencode/PERSON_CUSTODIAN_DATA_ARCHITECTURE.md`
- `AGENTS.md` Rule 31