# Value Standardization Rules

**Location**: `schemas/20251121/linkml/rules/`

**Purpose**: Data transformation and processing rules for achieving standardized values required by Heritage Custodian (HC) classes.
## About These Rules

These rules are formally outside the LinkML schema convention but document HOW data values are:

- Transformed
- Converted
- Processed
- Normalized

to achieve the standardized values required by particular HC classes.

**IMPORTANT**: These are NOT LinkML validation rules. They are processing instructions for data pipelines and extraction agents.
## Rule Categories

### 1. Name Standardization Rules

| Rule ID | File | Applies To | Summary |
|---|---|---|---|
| LEGAL-FORM-FILTER | LEGAL_FORM_FILTER.md | CustodianName | Remove legal form terms (Stichting, Foundation, Inc.) from emic names |
| ABBREV-CHAR-FILTER | ABBREVIATION_RULES.md | GHCID abbreviation | Remove special characters (&, /, +, @) and normalize diacritics to ASCII |
| TRANSLIT-ISO | TRANSLITERATION.md | GHCID abbreviation | Transliterate non-Latin scripts (Cyrillic, CJK, Arabic) using ISO standards |
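
The two character-level rules in this table can be illustrated with a short Python sketch. It is not the implementation from LEGAL_FORM_FILTER.md or ABBREVIATION_RULES.md: the term list holds only the three examples named in the table, and `remove_legal_forms` and `filter_abbreviation_chars` are hypothetical names.

```python
import re
import unicodedata

# Illustrative subset: only the examples named in the rules table.
# The authoritative term list is maintained in LEGAL_FORM_FILTER.md.
LEGAL_FORM_TERMS = ("Stichting", "Foundation", "Inc.")

# Characters that ABBREV-CHAR-FILTER removes from GHCID abbreviations.
SPECIAL_CHARS = frozenset("&/+@")


def remove_legal_forms(emic_name: str, terms=LEGAL_FORM_TERMS) -> str:
    """LEGAL-FORM-FILTER sketch: drop legal form designations from an emic name."""
    result = emic_name
    for term in terms:
        # Match the term only as a standalone token, case-insensitively.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        result = re.sub(pattern, "", result, flags=re.IGNORECASE)
    # Collapse leftover whitespace and trim dangling separators such as ", ".
    return re.sub(r"\s+", " ", result).strip(" ,")


def filter_abbreviation_chars(abbreviation: str) -> str:
    """ABBREV-CHAR-FILTER sketch: strip diacritics and special characters."""
    # NFKD splits accented letters into base letter + combining mark (Ä -> A + U+0308).
    decomposed = unicodedata.normalize("NFKD", abbreviation)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in without_marks if ch not in SPECIAL_CHARS)


print(remove_legal_forms("Stichting Rijksmuseum"))  # Rijksmuseum
print(filter_abbreviation_chars("M&Ä"))  # MA
```

NFKD decomposition only removes combining accents; letters such as `ø` or `ß` have no decomposition and would need an explicit mapping, which this sketch does not attempt.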
### 2. Geographic Standardization Rules

| Rule ID | File | Applies To | Summary |
|---|---|---|---|
| GEONAMES-SETTLEMENT | GEONAMES_SETTLEMENT.md | Settlement codes | Use GeoNames as single source for settlement names |
| FEATURE-CODE-FILTER | GEONAMES_SETTLEMENT.md | Reverse geocoding | Only use `PPL*` feature codes, never `PPLX` (neighborhoods) |
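
A minimal sketch of FEATURE-CODE-FILTER applied to reverse-geocoding results. The `GeoNamesCandidate` record, its field names, the example distances, and the choice of the nearest eligible place are assumptions for illustration; only the `PPL*`/`PPLX` condition comes from the rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GeoNamesCandidate:
    # Hypothetical record shape for one reverse-geocoding hit.
    name: str
    feature_code: str  # GeoNames feature code, e.g. "PPLC" or "PPLX"
    distance_km: float


def is_settlement(feature_code: str) -> bool:
    """FEATURE-CODE-FILTER: accept PPL* codes, but never PPLX (section of a populated place)."""
    return feature_code.startswith("PPL") and feature_code != "PPLX"


def nearest_settlement(candidates: Iterable[GeoNamesCandidate]) -> Optional[GeoNamesCandidate]:
    """Return the closest eligible settlement, or None if nothing qualifies."""
    eligible = [c for c in candidates if is_settlement(c.feature_code)]
    # Assumption: among eligible places, the nearest one wins.
    return min(eligible, key=lambda c: c.distance_km, default=None)


# Illustrative input: a neighborhood hit that is closer than the city itself.
hits = [
    GeoNamesCandidate("Jordaan", "PPLX", 0.4),
    GeoNamesCandidate("Amsterdam", "PPLC", 1.5),
]
print(nearest_settlement(hits).name)  # Amsterdam
```

Filtering before ranking is the point of the rule: a neighborhood is often the geometrically closest hit, so ranking first and filtering afterwards would tend to surface it.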
### 3. Web Observation Rules

| Rule ID | File | Applies To | Summary |
|---|---|---|---|
| XPATH-PROVENANCE | XPATH_PROVENANCE.md | WebClaim | Every web claim MUST have XPath pointer to archived HTML |
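
The sketch below assumes a hypothetical `WebClaim` shape with `claim_value`, `source_url`, and `xpath` fields and shows two checks: a claim without an XPath pointer is rejected, and a claim is verified by resolving its XPath against the archived HTML. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, so real archived pages would typically need a full HTML parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class WebClaim:
    # Hypothetical field names; XPATH_PROVENANCE.md defines the real slots.
    claim_value: str
    source_url: str
    xpath: str

    def __post_init__(self) -> None:
        # XPATH-PROVENANCE: a claim without an XPath pointer is rejected outright.
        if not self.xpath:
            raise ValueError("WebClaim has no XPath pointer")


def resolve_simple_xpath(root: ET.Element, xpath: str) -> Optional[ET.Element]:
    """Resolve an absolute path such as /html/body/div[2]/h1.

    Only simple child steps (with optional [n] positions) are handled,
    because ElementTree implements just a subset of XPath.
    """
    if not xpath.startswith("/"):
        raise ValueError(f"expected an absolute XPath, got {xpath!r}")
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    rest = "/".join(steps[1:])
    return root.find(rest) if rest else root


def verify_claim(claim: WebClaim, archived_html: str) -> bool:
    """Check that the archived page contains the claimed value at claim.xpath."""
    root = ET.fromstring(archived_html)  # needs well-formed (XHTML-like) markup
    node = resolve_simple_xpath(root, claim.xpath)
    return node is not None and (node.text or "").strip() == claim.claim_value


archived = "<html><body><h1>Rijksmuseum</h1><p>Museumstraat 1</p></body></html>"
claim = WebClaim("Rijksmuseum", "https://example.org/", "/html/body/h1")
print(verify_claim(claim, archived))  # True
```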
### 4. Schema Evolution Rules

| Rule ID | File | Applies To | Summary |
|---|---|---|---|
| ENUM-TO-CLASS | ENUM_TO_CLASS.md | Enums/Classes | When enum promoted to class hierarchy, delete original enum |
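
One way to spot enums left behind after promotion, sketched under two assumptions: the schema has been loaded from YAML into a plain dict with the LinkML top-level keys `classes`, `enums`, and `slots`, and a promoted enum `FooEnum` becomes a class `Foo` (an assumed naming convention). The function names and the example schema are hypothetical.

```python
import copy


def find_promoted_enums(schema: dict) -> list:
    """Return enum names whose base name now also exists as a class.

    Assumed convention (illustrative only): enum FooEnum is promoted to class Foo.
    """
    class_names = set(schema.get("classes", {}))
    promoted = []
    for enum_name in schema.get("enums", {}):
        base = enum_name[: -len("Enum")] if enum_name.endswith("Enum") else enum_name
        if base in class_names:
            promoted.append(enum_name)
    return promoted


def slots_still_using(schema: dict, enum_name: str) -> list:
    """Top-level slots whose range still points at the enum; repoint these before deleting it."""
    return [
        slot_name
        for slot_name, slot in schema.get("slots", {}).items()
        if (slot or {}).get("range") == enum_name
    ]


def drop_promoted_enums(schema: dict) -> dict:
    """ENUM-TO-CLASS sketch: return a copy of the schema without superseded enums."""
    cleaned = copy.deepcopy(schema)
    for enum_name in find_promoted_enums(schema):
        del cleaned["enums"][enum_name]
    return cleaned


# Hypothetical schema fragment, shaped like parsed LinkML YAML.
schema = {
    "classes": {"CustodianType": {}, "Museum": {"is_a": "CustodianType"}},
    "enums": {
        "CustodianTypeEnum": {"permissible_values": {"MUSEUM": {}}},
        "StatusEnum": {"permissible_values": {"ACTIVE": {}}},
    },
    "slots": {"custodian_type": {"range": "CustodianTypeEnum"}},
}
print(find_promoted_enums(schema))  # ['CustodianTypeEnum']
print(slots_still_using(schema, "CustodianTypeEnum"))  # ['custodian_type']
```

Deleting an enum while a slot still ranges over it would leave the schema inconsistent, which is why the sketch reports such slots separately instead of rewriting them.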
## GLAMORCUBESFIXPHDNT Taxonomy Applicability

Each rule primarily applies to certain custodian types:

| Rule | Primary Types | All Types |
|---|---|---|
| LEGAL-FORM-FILTER | All | ✅ |
| ABBREV-SPECIAL-CHAR | All | ✅ |
| ABBREV-DIACRITICS | All | ✅ |
| TRANSLIT-ISO | International (non-Latin script countries) | Partial |
| GEONAMES-SETTLEMENT | All | ✅ |
| XPATH-PROVENANCE | D (Digital platforms) | Partial |
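
The transliteration rule (TRANSLIT-ISO) only matters for custodians whose emic names use a non-Latin script. As an illustration, here is a sketch covering Russian Cyrillic with the ISO 9:1995 letter table; CJK and Arabic scripts, and any project-specific conventions in TRANSLITERATION.md, are out of scope, and `transliterate_iso9` is a hypothetical name.

```python
# ISO 9:1995 letter mapping for the Russian alphabet only (a subset of the standard).
_ISO9_UPPER = {
    "А": "A", "Б": "B", "В": "V", "Г": "G", "Д": "D", "Е": "E", "Ё": "Ë",
    "Ж": "Ž", "З": "Z", "И": "I", "Й": "J", "К": "K", "Л": "L", "М": "M",
    "Н": "N", "О": "O", "П": "P", "Р": "R", "С": "S", "Т": "T", "У": "U",
    "Ф": "F", "Х": "H", "Ц": "C", "Ч": "Č", "Ш": "Š", "Щ": "Ŝ", "Ъ": "ʺ",
    "Ы": "Y", "Ь": "ʹ", "Э": "È", "Ю": "Û", "Я": "Â",
}
# Lowercase pairs are derived from the uppercase table.
ISO9_RU = {**_ISO9_UPPER, **{cy.lower(): lat.lower() for cy, lat in _ISO9_UPPER.items()}}
_TABLE = str.maketrans(ISO9_RU)


def transliterate_iso9(text: str) -> str:
    """TRANSLIT-ISO sketch for Russian Cyrillic; other characters pass through unchanged."""
    return text.translate(_TABLE)


print(transliterate_iso9("Москва"))  # Moskva
```

ISO 9 is a one-letter-to-one-letter system, so its output keeps diacritics such as `ž`; any further reduction to ASCII for GHCID abbreviations is a separate step not shown here.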
## Integration with bronhouder.nl

These rules are displayed under a separate "Regels" (Rules) category on the bronhouder.nl LinkML visualization page, distinct from:

- Classes
- Slots
- Enums
- Instances

Each rule includes:

- Rule ID (short identifier)
- Applicable class(es)
- GLAMORCUBESFIXPHDNT type indicator
- Transformation examples
- Implementation code (Python)
## Rule Template

New rules should follow this template:

```markdown
# Rule Title

**Rule ID**: SHORT-ID
**Status**: MANDATORY | RECOMMENDED | OPTIONAL
**Applies To**: Class or slot name
**Created**: YYYY-MM-DD
**Updated**: YYYY-MM-DD

---

## Summary

One-paragraph summary of what this rule does.

---

## Rationale

Why this rule exists (numbered list of reasons).

---

## Specification

Detailed specification with examples.

---

## Implementation

Python code showing how to implement this rule.

---

## Examples

| Input | Output | Explanation |
|-------|--------|-------------|

---

## Related Rules

- Other related rules

---

## Changelog

| Date | Change |
|------|--------|
```
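
Because every rule file shares the header fields of this template, an index such as the "Regels" category on bronhouder.nl could be built by reading them. How bronhouder.nl actually loads the rules is not described here, so the following parser is a hypothetical sketch: it collects the title and the `**Key**: value` lines before the first `---` and checks `Status` against the three values the template allows.

```python
import re

# Matches template header lines such as "**Rule ID**: SHORT-ID".
HEADER_FIELD = re.compile(r"^\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.+)$")
ALLOWED_STATUS = {"MANDATORY", "RECOMMENDED", "OPTIONAL"}


def parse_rule_header(markdown: str) -> dict:
    """Collect the title and **Key**: value fields above the first '---' separator."""
    fields = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        if line == "---":
            break  # the header ends at the first horizontal rule
        if line.startswith("# ") and "title" not in fields:
            fields["title"] = line[2:].strip()
            continue
        match = HEADER_FIELD.match(line)
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    status = fields.get("Status")
    if status is not None and status not in ALLOWED_STATUS:
        raise ValueError(f"unexpected Status {status!r}")
    return fields


sample = (
    "# Legal Form Filter\n\n"
    "**Rule ID**: LEGAL-FORM-FILTER\n"
    "**Status**: MANDATORY\n"
    "**Applies To**: CustodianName\n\n"
    "---\n\n"
    "## Summary\n"
)
print(parse_rule_header(sample)["Rule ID"])  # LEGAL-FORM-FILTER
```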
## File List

```text
rules/
├── README.md # This file (rule index)
├── ABBREVIATION_RULES.md # ABBREV-CHAR-FILTER: Special char + diacritics normalization
├── LEGAL_FORM_FILTER.md # LEGAL-FORM-FILTER: Legal form removal from emic names
├── GEONAMES_SETTLEMENT.md # GEONAMES-SETTLEMENT: Geographic standardization via GeoNames
├── XPATH_PROVENANCE.md # XPATH-PROVENANCE: WebClaim XPath requirements
├── TRANSLITERATION.md # TRANSLIT-ISO: Non-Latin script transliteration
└── ENUM_TO_CLASS.md # ENUM-TO-CLASS: Schema evolution pattern
```