glam/frontend/public/schemas/20251121/linkml/rules/README.md
kempersc 3a6ead8fde feat: Add legal form filtering rule for CustodianName
2025-12-09 16:58:41 +01:00

Value Standardization Rules

Location: schemas/20251121/linkml/rules/
Purpose: Data transformation and processing rules for achieving standardized values required by Heritage Custodian (HC) classes.


About These Rules

These rules sit outside the LinkML schema itself. They document how data values are:

  • Transformed
  • Converted
  • Processed
  • Normalized

to achieve the standardized values required by particular HC classes.

IMPORTANT: These are NOT LinkML validation rules. They are processing instructions for data pipelines and extraction agents.


Rule Categories

1. Name Standardization Rules

| Rule ID | File | Applies To | Summary |
|---------|------|------------|---------|
| LEGAL-FORM-FILTER | LEGAL_FORM_FILTER.md | CustodianName | Remove legal form terms (Stichting, Foundation, Inc.) from emic names |
| ABBREV-CHAR-FILTER | ABBREVIATION_RULES.md | GHCID abbreviation | Remove special characters (&, /, +, @) and normalize diacritics to ASCII |
| TRANSLIT-ISO | TRANSLITERATION.md | GHCID abbreviation | Transliterate non-Latin scripts (Cyrillic, CJK, Arabic) using ISO standards |
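
As a rough illustration of the first two rules, the sketch below strips legal form terms and then removes special characters and diacritics. The function names, the three-term list, and the sample names are assumptions made for this example; the authoritative term list and edge cases are documented in LEGAL_FORM_FILTER.md and ABBREVIATION_RULES.md.

```python
import re
import unicodedata

# Hypothetical, incomplete term list taken from the summary above;
# the full list belongs to LEGAL_FORM_FILTER.md.
LEGAL_FORM_TERMS = ("Stichting", "Foundation", "Inc.")


def strip_legal_form(name: str) -> str:
    """LEGAL-FORM-FILTER sketch: drop legal form terms from a name."""
    for term in LEGAL_FORM_TERMS:
        # Match the term only as a whole word, case-insensitively.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        name = re.sub(pattern, " ", name, flags=re.IGNORECASE)
    # Collapse repeated whitespace and trim leftover spaces and commas.
    return re.sub(r"\s+", " ", name).strip(" ,")


def normalize_abbreviation_chars(text: str) -> str:
    """ABBREV-CHAR-FILTER sketch: remove &, /, +, @ and strip diacritics."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[&/+@]", "", without_marks)


if __name__ == "__main__":
    print(strip_legal_form("Stichting Voorbeeld Museum"))
    # Voorbeeld Museum
    print(normalize_abbreviation_chars("Bibliothèque/Médiathèque"))
    # BibliothequeMediatheque
```

Letters that have no Unicode decomposition (for example "ø" or "ß") pass through this sketch unchanged, so a real implementation would likely need an explicit mapping for them.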

2. Geographic Standardization Rules

| Rule ID | File | Applies To | Summary |
|---------|------|------------|---------|
| GEONAMES-SETTLEMENT | GEONAMES_SETTLEMENT.md | Settlement codes | Use GeoNames as the single source for settlement names |
| FEATURE-CODE-FILTER | GEONAMES_SETTLEMENT.md | Reverse geocoding | Only use PPL* feature codes, excluding PPLX (neighborhoods) |
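
The feature-code filter can be sketched as a small predicate applied to reverse-geocoding candidates. The candidate dictionaries, their keys (`name`, `feature_code`, `distance_km`), and the place names are hypothetical; GEONAMES_SETTLEMENT.md may define an explicit allow-list rather than the prefix check assumed here.

```python
from typing import Dict, List, Optional


def is_settlement_feature(feature_code: str) -> bool:
    """Accept populated-place codes (PPL, PPLA, PPLC, ...) but never PPLX."""
    code = feature_code.strip().upper()
    return code.startswith("PPL") and code != "PPLX"


def pick_settlement(candidates: List[Dict]) -> Optional[Dict]:
    """Return the nearest candidate that passes the filter, or None."""
    allowed = [c for c in candidates if is_settlement_feature(c["feature_code"])]
    return min(allowed, key=lambda c: c["distance_km"], default=None)


if __name__ == "__main__":
    # Invented candidates: the neighborhood is closer but must be skipped.
    nearby = [
        {"name": "Example Quarter", "feature_code": "PPLX", "distance_km": 0.2},
        {"name": "Exampletown", "feature_code": "PPL", "distance_km": 1.5},
    ]
    print(pick_settlement(nearby)["name"])  # Exampletown
```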

3. Web Observation Rules

| Rule ID | File | Applies To | Summary |
|---------|------|------------|---------|
| XPATH-PROVENANCE | XPATH_PROVENANCE.md | WebClaim | Every web claim MUST have an XPath pointer to archived HTML |
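
A minimal sketch of what checking a claim against archived HTML could look like is given below. The `WebClaim` fields shown here are assumptions for illustration and not the schema's actual slots. The standard-library `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so a real pipeline would probably use a proper HTML parser instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class WebClaim:
    """Hypothetical claim shape; field names are illustrative only."""
    claim_value: str
    xpath: str
    archived_html: str


def verify_claim(claim: WebClaim) -> bool:
    """Return True if the XPath resolves and the node text contains the value."""
    if not claim.xpath:
        # A claim without an XPath pointer violates the rule outright.
        return False
    root = ET.fromstring(claim.archived_html)
    node = root.find(claim.xpath)
    if node is None:
        return False
    text = "".join(node.itertext()).strip()
    return claim.claim_value in text


if __name__ == "__main__":
    # Invented, well-formed snippet standing in for an archived page.
    html = "<html><body><h1>Example Heritage Archive</h1></body></html>"
    ok = WebClaim("Example Heritage Archive", "./body/h1", html)
    print(verify_claim(ok))  # True
```

ElementTree paths are relative to the root element, which is why the example pointer starts with `./body` rather than `/html/body`.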

4. Schema Evolution Rules

| Rule ID | File | Applies To | Summary |
|---------|------|------------|---------|
| ENUM-TO-CLASS | ENUM_TO_CLASS.md | Enums/Classes | When an enum is promoted to a class hierarchy, delete the original enum |

GLAMORCUBESFIXPHDNT Taxonomy Applicability

Each rule primarily applies to certain custodian types:

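| Rule | Primary Types | All Types |
|------|---------------|-----------|
| LEGAL-FORM-FILTER | All | Yes |
| ABBREV-CHAR-FILTER (special characters) | All | Yes |
| ABBREV-CHAR-FILTER (diacritics) | All | Yes |
| TRANSLIT-ISO | International (non-Latin script countries) | Partial |
| GEONAMES-SETTLEMENT | All | Yes |
| XPATH-PROVENANCE | D (Digital platforms) | Partial |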

Integration with bronhouder.nl

These rules are displayed under a separate "Regels" (Rules) category on the bronhouder.nl LinkML visualization page, distinct from:

  • Classes
  • Slots
  • Enums
  • Instances

Each rule includes:

  • Rule ID (short identifier)
  • Applicable class(es)
  • GLAMORCUBESFIXPHDNT type indicator
  • Transformation examples
  • Implementation code (Python)

Rule Template

New rules should follow this template:

# Rule Title

**Rule ID**: SHORT-ID  
**Status**: MANDATORY | RECOMMENDED | OPTIONAL  
**Applies To**: Class or slot name  
**Created**: YYYY-MM-DD  
**Updated**: YYYY-MM-DD

---

## Summary

One-paragraph summary of what this rule does.

---

## Rationale

Why this rule exists (numbered list of reasons).

---

## Specification

Detailed specification with examples.

---

## Implementation

Python code showing how to implement this rule.

---

## Examples

| Input | Output | Explanation |
|-------|--------|-------------|

---

## Related Rules

- Other related rules

---

## Changelog

| Date | Change |
|------|--------|
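
One way to check that a new rule file follows the template above is to look for its metadata fields and section headings. This is only a sketch: the function name is hypothetical, and the required lists are read directly off the template rather than from any existing tooling.

```python
import re
from typing import List

# Sections and metadata fields as they appear in the template above.
REQUIRED_SECTIONS = [
    "Summary", "Rationale", "Specification", "Implementation",
    "Examples", "Related Rules", "Changelog",
]
REQUIRED_FIELDS = ["Rule ID", "Status", "Applies To", "Created", "Updated"]


def missing_template_parts(markdown_text: str) -> List[str]:
    """List the template sections and fields absent from a rule file."""
    # Level-2 headings such as "## Summary".
    headings = set(
        re.findall(r"^## +(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE)
    )
    # Bold metadata keys such as "**Rule ID**: SHORT-ID".
    fields = set(
        re.findall(r"^\*\*(.+?)\*\*:", markdown_text, flags=re.MULTILINE)
    )
    missing = [f"section: {s}" for s in REQUIRED_SECTIONS if s not in headings]
    missing += [f"field: {f}" for f in REQUIRED_FIELDS if f not in fields]
    return missing


if __name__ == "__main__":
    draft = "# Rule Title\n\n**Rule ID**: SHORT-ID  \n\n## Summary\n\nText.\n"
    for part in missing_template_parts(draft):
        print(part)
```

An empty result means every heading and field from the template was found; it says nothing about whether the content under each heading is adequate.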

File List

rules/
├── README.md                  # This file (rule index)
├── ABBREVIATION_RULES.md      # ABBREV-CHAR-FILTER: Special char + diacritics normalization
├── LEGAL_FORM_FILTER.md       # LEGAL-FORM-FILTER: Legal form removal from emic names
├── GEONAMES_SETTLEMENT.md     # GEONAMES-SETTLEMENT: Geographic standardization via GeoNames
├── XPATH_PROVENANCE.md        # XPATH-PROVENANCE: WebClaim XPath requirements
├── TRANSLITERATION.md         # TRANSLIT-ISO: Non-Latin script transliteration
└── ENUM_TO_CLASS.md           # ENUM-TO-CLASS: Schema evolution pattern