glam/.opencode/TRANSLITERATION_STANDARDS.md
kempersc 271545fa8b docs: add Z.AI GLM API and transliteration rules to AGENTS.md
- Add Rule 11 for Z.AI Coding Plan API usage (not BigModel)
- Add transliteration standards for non-Latin scripts
- Document GLM model options and Python implementation
2025-12-08 14:58:22 +01:00

Transliteration Standards for Non-Latin Scripts

Rule ID: TRANSLIT-ISO
Status: MANDATORY
Applies To: GHCID abbreviation generation from emic names in non-Latin scripts
Created: 2025-12-08


Summary

When generating GHCID abbreviations from institution names written in non-Latin scripts, the emic name MUST first be transliterated to Latin characters using the designated ISO or recognized standard for that script.

This rule affects 170 institutions across 21 languages with non-Latin writing systems.

Key Principles

  1. Emic name is preserved - The original script is stored in custodian_name.emic_name
  2. Transliteration is for processing only - Used to generate abbreviations
  3. ISO/recognized standards required - No ad-hoc romanization
  4. Deterministic output - Same input always produces same Latin output
  5. Existing GHCIDs grandfathered - Only applies to NEW custodians
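
Principle 4 (deterministic output) also depends on the input: the same visible name can arrive precomposed (NFC) or decomposed (NFD), and a character-by-character mapping would treat the two differently. The sketch below is one possible way to guard against that. `canonical_emic` is a hypothetical helper, not part of the production script.

```python
import unicodedata


def canonical_emic(name: str) -> str:
    """Return the NFC form of an emic name with whitespace collapsed.

    Hypothetical helper: apply it before any transliteration table is
    consulted so that equivalent inputs map to the same Latin output.
    """
    composed = unicodedata.normalize('NFC', name)
    return ' '.join(composed.split())


# 'й' may be one code point (U+0439) or 'и' + combining breve (U+0438 U+0306).
precomposed = 'Музе\u0439'
decomposed = 'Музе\u0438\u0306'
assert precomposed != decomposed
assert canonical_emic(precomposed) == canonical_emic(decomposed)
```

The stored `custodian_name.emic_name` is left untouched; only the copy handed to the transliteration step is normalized.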

Transliteration Standards by Script/Language

Cyrillic Scripts

| Language | ISO Code | Standard | Library/Tool | Notes |
|---|---|---|---|---|
| Russian | ru | ISO 9:1995 | transliterate | Scientific transliteration |
| Ukrainian | uk | ISO 9:1995 | transliterate | Includes Ukrainian-specific letters |
| Bulgarian | bg | ISO 9:1995 | transliterate | Uses same Cyrillic base |
| Serbian | sr | ISO 9:1995 | transliterate | Serbian Cyrillic variant |
| Kazakh | kk | ISO 9:1995 | transliterate | Cyrillic-based (pre-2023) |

ISO 9:1995 Mapping (Core Characters):

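| Cyrillic | Latin | Cyrillic | Latin |
|---|---|---|---|
| А а | A a | П п | P p |
| Б б | B b | Р р | R r |
| В в | V v | С с | S s |
| Г г | G g | Т т | T t |
| Д д | D d | У у | U u |
| Е е | E e | Ф ф | F f |
| Ё ё | Ë ë | Х х | H h |
| Ж ж | Ž ž | Ц ц | C c |
| З з | Z z | Ч ч | Č č |
| И и | I i | Ш ш | Š š |
| Й й | J j | Щ щ | Ŝ ŝ |
| К к | K k | Ъ ъ | ʺ (hard sign) |
| Л л | L l | Ы ы | Y y |
| М м | M m | Ь ь | ʹ (soft sign) |
| Н н | N n | Э э | È è |
| О о | O o | Ю ю | Û û |
|  |  | Я я | Â â |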

Example:

Input:  Институт восточных рукописей РАН
ISO 9:  Institut vostočnyh rukopisej RAN
Abbrev: IVRRAN (no diacritics remain to normalize; the acronym RAN is kept whole)
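
The mapping table and example above can be sketched as a one-to-one lookup followed by diacritic folding. This is a minimal illustration covering only the Russian alphabet from the table. The names `ISO9_RU`, `to_iso9` and `fold_to_ascii` are assumptions made for this example.

```python
import unicodedata

# ISO 9:1995 values for the Russian alphabet (lowercase; uppercase is derived).
ISO9_RU = {
    'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'е': 'e', 'ё': 'ë',
    'ж': 'ž', 'з': 'z', 'и': 'i', 'й': 'j', 'к': 'k', 'л': 'l', 'м': 'm',
    'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r', 'с': 's', 'т': 't', 'у': 'u',
    'ф': 'f', 'х': 'h', 'ц': 'c', 'ч': 'č', 'ш': 'š', 'щ': 'ŝ', 'ъ': 'ʺ',
    'ы': 'y', 'ь': 'ʹ', 'э': 'è', 'ю': 'û', 'я': 'â',
}


def to_iso9(text: str) -> str:
    """Map each Cyrillic letter to its ISO 9 value, preserving case."""
    out = []
    for ch in text:
        latin = ISO9_RU.get(ch.lower())
        if latin is None:
            out.append(ch)              # spaces, digits, Latin letters
        elif ch.isupper():
            out.append(latin.upper())
        else:
            out.append(latin)
    return ''.join(out)


def fold_to_ascii(text: str) -> str:
    """Strip combining marks after NFD decomposition (č -> c, â -> a)."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')


iso = to_iso9('Институт восточных рукописей РАН')
print(iso)                 # Institut vostočnyh rukopisej RAN
print(fold_to_ascii(iso))  # Institut vostocnyh rukopisej RAN
```

Because every Cyrillic letter maps to exactly one Latin value, the same input always yields the same output, which is what Principle 4 requires.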

CJK Scripts

Chinese (Hanzi)

| Variant | Standard | Library/Tool | Notes |
|---|---|---|---|
| Simplified | Hanyu Pinyin (ISO 7098) | pypinyin | Standard PRC romanization |
| Traditional | Hanyu Pinyin | pypinyin | Same standard applies |

Pinyin Rules:

  • Tone marks are OMITTED for abbreviation (diacritics removed anyway)
  • Word boundaries follow natural spacing
  • Proper nouns capitalized

Example:

Input:  东巴文化博物院
Pinyin: Dōngbā Wénhuà Bówùyuàn
ASCII:  Dongba Wenhua Bowuyuan
Abbrev: DWB

Japanese (Kanji/Kana)

| Standard | Library/Tool | Notes |
|---|---|---|
| Modified Hepburn | pykakasi, romkan | Most widely used internationally |

Hepburn Rules:

  • Long vowels: ō, ū (normalized to o, u for abbreviation)
  • Particles: は (wa), を (o), へ (e)
  • Syllabic n: ん = n (before vowels: n')

Example:

Input:  国立中央博物館
Romaji: Kokuritsu Chūō Hakubutsukan
ASCII:  Kokuritsu Chuo Hakubutsukan
Abbrev: KCH

Korean (Hangul)

| Standard | Library/Tool | Notes |
|---|---|---|
| Revised Romanization (RR) | korean-romanizer, hangul-romanize | Official South Korean standard (2000) |

RR Rules:

  • No diacritics (unlike McCune-Reischauer)
  • Consonant assimilation reflected in spelling
  • Word boundaries at natural breaks

Example:

Input:  독립기념관
RR:     Dongnip Ginyeomgwan
Abbrev: DG
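
The RR rule that consonant assimilation is reflected in the spelling is the reason a per-syllable lookup is not sufficient. The sketch below decomposes each Hangul syllable arithmetically and romanizes it in isolation. It is illustrative only: the table names and `naive_romanize` are assumptions, and the values for final consonant clusters are simplified.

```python
HANGUL_BASE = 0xAC00   # first precomposed syllable (가)
HANGUL_LAST = 0xD7A3   # last precomposed syllable (힣)

# RR values in Unicode jamo order: 19 initials, 21 medials, 28 finals.
INITIALS = ['g', 'kk', 'n', 'd', 'tt', 'r', 'm', 'b', 'pp', 's', 'ss', '',
            'j', 'jj', 'ch', 'k', 't', 'p', 'h']
MEDIALS = ['a', 'ae', 'ya', 'yae', 'eo', 'e', 'yeo', 'ye', 'o', 'wa', 'wae',
           'oe', 'yo', 'u', 'wo', 'we', 'wi', 'yu', 'eu', 'ui', 'i']
FINALS = ['', 'k', 'k', 'k', 'n', 'n', 'n', 't', 'l', 'k', 'm', 'l', 'l',
          'l', 'p', 'l', 'm', 'p', 'p', 't', 't', 'ng', 't', 't', 'k', 't',
          'p', 't']


def decompose(syllable: str) -> tuple:
    """Return (initial, medial, final) indices for one Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    return index // 588, (index % 588) // 28, index % 28


def naive_romanize(text: str) -> str:
    """Romanize syllable by syllable, ignoring assimilation between syllables."""
    out = []
    for ch in text:
        if HANGUL_BASE <= ord(ch) <= HANGUL_LAST:
            i, m, f = decompose(ch)
            out.append(INITIALS[i] + MEDIALS[m] + FINALS[f])
        else:
            out.append(ch)
    return ''.join(out)


print(naive_romanize('독립기념관'))  # dokripginyeomgwan
```

The naive result `dokrip...` differs from the correct RR form `Dongnip`, because ㄱ followed by ㄹ assimilates to `ngn`. In this case the initial letters happen to match, but that is not guaranteed in general, so a library that implements the assimilation rules is still needed.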

Arabic Script

| Language | ISO Code | Standard | Library/Tool | Notes |
|---|---|---|---|---|
| Arabic | ar | ISO 233-2:1993 | arabic-transliteration | Simplified standard |
| Persian/Farsi | fa | ISO 233-3:1999 | persian-transliteration | Persian extensions |
| Urdu | ur | ISO 233-3 + Urdu extensions | urdu-transliteration | Additional characters |

ISO 233 Mapping (Core Arabic):

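| Arabic | Name | Latin |
|---|---|---|
| ا | Alif | ā / a |
| ب | Ba | b |
| ت | Ta | t |
| ث | Tha | ṯ / th |
| ج | Jim | ǧ / j |
| ح | Ha | ḥ / h |
| خ | Kha | ḫ / kh |
| د | Dal | d |
| ذ | Dhal | ḏ / dh |
| ر | Ra | r |
| ز | Zay | z |
| س | Sin | s |
| ش | Shin | š / sh |
| ص | Sad | ṣ / s |
| ض | Dad | ḍ / d |
| ط | Ta | ṭ / t |
| ظ | Za | ẓ / z |
| ع | Ayn | ʿ |
| غ | Ghayn | ġ / gh |
| ف | Fa | f |
| ق | Qaf | q |
| ك | Kaf | k |
| ل | Lam | l |
| م | Mim | m |
| ن | Nun | n |
| ه | Ha | h |
| و | Waw | w / ū |
| ي | Ya | y / ī |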

Example (Arabic):

Input:  المكتبة الوطنية للمملكة المغربية
ISO:    al-Maktaba al-Waṭanīya lil-Mamlaka al-Maġribīya
ASCII:  al-Maktaba al-Wataniya lil-Mamlaka al-Maghribiya
Abbrev: MWMM (skip "al-" articles)

Example (Persian):

Input:  وزارت امور خارجه ایران
ISO:    Vezārat-e Omur-e Khāreǧe-ye Īrān
ASCII:  Vezarat-e Omur-e Khareje-ye Iran
Abbrev: VOKI (skip "e" connector)

Hebrew Script

| Standard | Library/Tool | Notes |
|---|---|---|
| ISO 259-3:1999 | hebrew-transliteration | Simplified romanization |

ISO 259 Mapping:

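| Hebrew | Name | Latin |
|---|---|---|
| א | Aleph | ʾ / (silent) |
| ב | Bet | b / v |
| ג | Gimel | g |
| ד | Dalet | d |
| ה | He | h |
| ו | Vav | v / o / u |
| ז | Zayin | z |
| ח | Chet | ḥ / ch |
| ט | Tet | ṭ / t |
| י | Yod | y / i |
| כ ך | Kaf | k / kh |
| ל | Lamed | l |
| מ ם | Mem | m |
| נ ן | Nun | n |
| ס | Samekh | s |
| ע | Ayin | ʿ / (silent) |
| פ ף | Pe | p / f |
| צ ץ | Tsade | ṣ / ts |
| ק | Qof | q / k |
| ר | Resh | r |
| ש | Shin/Sin | š / s |
| ת | Tav | t |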

Example:

Input:  ארכיון הסיפור העממי בישראל
ISO:    Arḵiyon ha-Sipur ha-ʿAmami be-Yiśraʾel
ASCII:  Arkhiyon ha-Sipur ha-Amami be-Yisrael
Abbrev: ASAY (skip the article "ha-" and the preposition "be-")

Greek Script

| Standard | Library/Tool | Notes |
|---|---|---|
| ISO 843:1997 | greek-transliteration | Romanization of Greek |

ISO 843 Mapping:

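| Greek | Latin | Greek | Latin |
|---|---|---|---|
| Α α | A a | Ν ν | N n |
| Β β | V v | Ξ ξ | X x |
| Γ γ | G g | Ο ο | O o |
| Δ δ | D d | Π π | P p |
| Ε ε | E e | Ρ ρ | R r |
| Ζ ζ | Z z | Σ σ ς | S s |
| Η η | Ī ī | Τ τ | T t |
| Θ θ | Th th | Υ υ | Y y |
| Ι ι | I i | Φ φ | F f |
| Κ κ | K k | Χ χ | Ch ch |
| Λ λ | L l | Ψ ψ | Ps ps |
| Μ μ | M m | Ω ω | Ō ō |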

Example:

Input:  Αρχαιολογικό Μουσείο Θεσσαλονίκης
ISO:    Archaiologikó Mouseío Thessaloníkīs
ASCII:  Archaiologiko Mouseio Thessalonikis
Abbrev: AMT

Indic Scripts

| Language | Script | Standard | Library/Tool |
|---|---|---|---|
| Hindi | Devanagari | ISO 15919 | indic-transliteration |
| Bengali | Bengali | ISO 15919 | indic-transliteration |
| Nepali | Devanagari | ISO 15919 | indic-transliteration |
| Sinhala | Sinhala | ISO 15919 | indic-transliteration |

ISO 15919 Core Consonants (Devanagari):

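| Devanagari | Latin | Devanagari | Latin |
|---|---|---|---|
| क | ka | त | ta |
| ख | kha | थ | tha |
| ग | ga | द | da |
| घ | gha | ध | dha |
| ङ | ṅa | न | na |
| च | ca | प | pa |
| छ | cha | फ | pha |
| ज | ja | ब | ba |
| झ | jha | भ | bha |
| ञ | ña | म | ma |
| ट | ṭa | य | ya |
| ठ | ṭha | र | ra |
| ड | ḍa | ल | la |
| ढ | ḍha | व | va |
| ण | ṇa | श | śa |
|  |  | ष | ṣa |
|  |  | स | sa |
|  |  | ह | ha |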

Example (Hindi):

Input:  राजस्थान प्राच्यविद्या प्रतिष्ठान
ISO:    Rājasthāna Prācyavidyā Pratiṣṭhāna
ASCII:  Rajasthana Pracyavidya Pratishthana
Abbrev: RPP

Southeast Asian Scripts

| Language | Script | Standard | Library/Tool |
|---|---|---|---|
| Thai | Thai | ISO 11940-2 | thai-romanization |
| Khmer | Khmer | ALA-LC | khmer-romanization |

Thai Example:

Input:  สำนักหอจดหมายเหตุแห่งชาติ
ISO:    Samnak Ho Chotmaihet Haeng Chat
Abbrev: SHCHC

Khmer Example:

Input:  សារមន្ទីរទួលស្លែង
ALA-LC: Sāramanṭīr Tūl Slèṅ
ASCII:  Saramantir Tuol Sleng
Abbrev: STS

Other Scripts

| Language | Script | Standard | Library/Tool |
|---|---|---|---|
| Armenian | Armenian | ISO 9985 | armenian-transliteration |
| Georgian | Georgian | ISO 9984 | georgian-transliteration |

Armenian Example:

Input:  Մատենադարան
ISO:    Matenadaran
Abbrev: M

Georgian Example:

Input:  ხელნაწერთა ეროვნული ცენტრი
ISO:    Xelnacert'a Erovnuli C'entri
ASCII:  Khelnacerta Erovnuli Centri
Abbrev: KEC

Implementation

Python Transliteration Utility

#!/usr/bin/env python3
"""
Transliteration utility for GHCID abbreviation generation.
Uses ISO and recognized standards for each script/language.
"""

import unicodedata
from typing import Optional

# Try importing transliteration libraries
try:
    from pypinyin import pinyin, Style
    HAS_PYPINYIN = True
except ImportError:
    HAS_PYPINYIN = False

try:
    import pykakasi
    HAS_PYKAKASI = True
except ImportError:
    HAS_PYKAKASI = False

try:
    from transliterate import translit
    HAS_TRANSLITERATE = True
except ImportError:
    HAS_TRANSLITERATE = False


def detect_script(text: str) -> str:
    """
    Detect the primary script of the input text.
    
    Returns one of:
    - 'latin': Latin alphabet
    - 'cyrillic': Cyrillic script
    - 'chinese': Chinese characters (Hanzi)
    - 'japanese': Japanese (mixed Kanji/Kana)
    - 'korean': Korean Hangul
    - 'arabic': Arabic script (includes Persian, Urdu)
    - 'hebrew': Hebrew script
    - 'greek': Greek script
    - 'devanagari': Devanagari (Hindi, Nepali, Sanskrit)
    - 'bengali': Bengali script
    - 'thai': Thai script
    - 'armenian': Armenian script
    - 'georgian': Georgian script
    - 'unknown': Cannot determine
    """
    script_ranges = {
        'cyrillic': (0x0400, 0x04FF),
        'arabic': (0x0600, 0x06FF),
        'hebrew': (0x0590, 0x05FF),
        'devanagari': (0x0900, 0x097F),
        'bengali': (0x0980, 0x09FF),
        'thai': (0x0E00, 0x0E7F),
        'greek': (0x0370, 0x03FF),
        'armenian': (0x0530, 0x058F),
        'georgian': (0x10A0, 0x10FF),
        'korean': (0xAC00, 0xD7AF),  # Hangul syllables
        'japanese_hiragana': (0x3040, 0x309F),
        'japanese_katakana': (0x30A0, 0x30FF),
        'chinese': (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    }
    
    script_counts = {script: 0 for script in script_ranges}
    latin_count = 0
    
    for char in text:
        code = ord(char)
        
        # Check Latin
        if ('a' <= char <= 'z') or ('A' <= char <= 'Z'):
            latin_count += 1
            continue
            
        # Check other scripts
        for script, (start, end) in script_ranges.items():
            if start <= code <= end:
                script_counts[script] += 1
                break
    
    # Determine primary script
    if latin_count > 0 and all(c == 0 for c in script_counts.values()):
        return 'latin'
    
    # Find max non-Latin script
    max_script = max(script_counts, key=script_counts.get)
    if script_counts[max_script] > 0:
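        # Handle Japanese (can be Kanji + Kana).
        # Kanji-only names contain no kana and are reported as 'chinese',
        # so callers should pass lang='ja' explicitly for Japanese names.
        if max_script in ('japanese_hiragana', 'japanese_katakana', 'chinese'):
            if script_counts['japanese_hiragana'] > 0 or script_counts['japanese_katakana'] > 0:
                return 'japanese'
            return 'chinese'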
        return max_script
    
    return 'latin' if latin_count > 0 else 'unknown'


def transliterate_cyrillic(text: str, lang: str = 'ru') -> str:
    """
    Transliterate Cyrillic text to Latin.

    NOTE: the `transliterate` library applies its own language packs, which
    use ASCII digraphs (e.g. "zh", "ch") rather than ISO 9:1995 diacritics.
    The fallback table below follows the ISO 9:1995 mapping in this document.
    """
    if HAS_TRANSLITERATE:
        try:
            return translit(text, lang, reversed=True)
        except Exception:
            # Unsupported language code or library error: use the table below
            pass
    
    # Fallback: ISO 9:1995 one-to-one mapping. The hard and soft signs
    # (ISO 9: ʺ and ʹ) are dropped because they never reach an abbreviation.
    cyrillic_map = {
        'А': 'A', 'Б': 'B', 'В': 'V', 'Г': 'G', 'Д': 'D', 'Е': 'E',
        'Ё': 'Ë', 'Ж': 'Ž', 'З': 'Z', 'И': 'I', 'Й': 'J', 'К': 'K',
        'Л': 'L', 'М': 'M', 'Н': 'N', 'О': 'O', 'П': 'P', 'Р': 'R',
        'С': 'S', 'Т': 'T', 'У': 'U', 'Ф': 'F', 'Х': 'H', 'Ц': 'C',
        'Ч': 'Č', 'Ш': 'Š', 'Щ': 'Ŝ', 'Ъ': '', 'Ы': 'Y', 'Ь': '',
        'Э': 'È', 'Ю': 'Û', 'Я': 'Â',
        'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'е': 'e',
        'ё': 'ë', 'ж': 'ž', 'з': 'z', 'и': 'i', 'й': 'j', 'к': 'k',
        'л': 'l', 'м': 'm', 'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r',
        'с': 's', 'т': 't', 'у': 'u', 'ф': 'f', 'х': 'h', 'ц': 'c',
        'ч': 'č', 'ш': 'š', 'щ': 'ŝ', 'ъ': '', 'ы': 'y', 'ь': '',
        'э': 'è', 'ю': 'û', 'я': 'â',
        # Ukrainian additions (ISO 9:1995)
        'І': 'Ì', 'і': 'ì', 'Ї': 'Ï', 'ї': 'ï', 'Є': 'Ê', 'є': 'ê',
        'Ґ': 'G̀', 'ґ': 'g̀',
    }
    return ''.join(cyrillic_map.get(c, c) for c in text)


def transliterate_chinese(text: str) -> str:
    """Transliterate Chinese to Pinyin."""
    if HAS_PYPINYIN:
        # Get pinyin without tone marks
        result = pinyin(text, style=Style.NORMAL)
        return ' '.join([''.join(p) for p in result])
    
    # Fallback: return as-is (requires manual handling)
    return text


def transliterate_japanese(text: str) -> str:
    """Transliterate Japanese to Romaji (Hepburn)."""
    if HAS_PYKAKASI:
        kakasi = pykakasi.kakasi()
        result = kakasi.convert(text)
        return ' '.join([item['hepburn'] for item in result])
    
    # Fallback: return as-is
    return text


def transliterate_korean(text: str) -> str:
    """Transliterate Korean Hangul to Revised Romanization."""
    # Korean romanization is complex - use library if available
    try:
        from korean_romanizer.romanizer import Romanizer
        r = Romanizer(text)
        return r.romanize()
    except ImportError:
        pass
    
    # Fallback: basic Hangul syllable decomposition
    # This is a simplified implementation
    return text


def transliterate_arabic(text: str) -> str:
    """Transliterate Arabic script to Latin (ISO 233 simplified)."""
    arabic_map = {
        'ا': 'a', 'أ': 'a', 'إ': 'i', 'آ': 'a',
        'ب': 'b', 'ت': 't', 'ث': 'th', 'ج': 'j',
        'ح': 'h', 'خ': 'kh', 'د': 'd', 'ذ': 'dh',
        'ر': 'r', 'ز': 'z', 'س': 's', 'ش': 'sh',
        'ص': 's', 'ض': 'd', 'ط': 't', 'ظ': 'z',
        'ع': "'", 'غ': 'gh', 'ف': 'f', 'ق': 'q',
        'ك': 'k', 'ل': 'l', 'م': 'm', 'ن': 'n',
        'ه': 'h', 'و': 'w', 'ي': 'y', 'ى': 'a',
        'ة': 'a', 'ء': "'",
        # Persian additions
        'پ': 'p', 'چ': 'ch', 'ژ': 'zh', 'گ': 'g',
        'ک': 'k', 'ی': 'i',
    }
    result = []
    for c in text:
        if c in arabic_map:
            result.append(arabic_map[c])
        elif c == ' ' or c.isalnum():
            result.append(c)
    return ''.join(result)


def transliterate_hebrew(text: str) -> str:
    """Transliterate Hebrew to Latin (ISO 259 simplified)."""
    hebrew_map = {
        'א': '', 'ב': 'v', 'ג': 'g', 'ד': 'd', 'ה': 'h',
        'ו': 'v', 'ז': 'z', 'ח': 'ch', 'ט': 't', 'י': 'y',
        'כ': 'k', 'ך': 'k', 'ל': 'l', 'מ': 'm', 'ם': 'm',
        'נ': 'n', 'ן': 'n', 'ס': 's', 'ע': '', 'פ': 'f',
        'ף': 'f', 'צ': 'ts', 'ץ': 'ts', 'ק': 'k', 'ר': 'r',
        'ש': 'sh', 'ת': 't',
    }
    result = []
    for c in text:
        if c in hebrew_map:
            result.append(hebrew_map[c])
        elif c == ' ' or c.isalnum():
            result.append(c)
    return ''.join(result)


def transliterate_greek(text: str) -> str:
    """Transliterate Greek to Latin (ISO 843)."""
    greek_map = {
        'Α': 'A', 'α': 'a', 'Β': 'V', 'β': 'v', 'Γ': 'G', 'γ': 'g',
        'Δ': 'D', 'δ': 'd', 'Ε': 'E', 'ε': 'e', 'Ζ': 'Z', 'ζ': 'z',
        'Η': 'I', 'η': 'i', 'Θ': 'Th', 'θ': 'th', 'Ι': 'I', 'ι': 'i',
        'Κ': 'K', 'κ': 'k', 'Λ': 'L', 'λ': 'l', 'Μ': 'M', 'μ': 'm',
        'Ν': 'N', 'ν': 'n', 'Ξ': 'X', 'ξ': 'x', 'Ο': 'O', 'ο': 'o',
        'Π': 'P', 'π': 'p', 'Ρ': 'R', 'ρ': 'r', 'Σ': 'S', 'σ': 's',
        'ς': 's', 'Τ': 'T', 'τ': 't', 'Υ': 'Y', 'υ': 'y', 'Φ': 'F',
        'φ': 'f', 'Χ': 'Ch', 'χ': 'ch', 'Ψ': 'Ps', 'ψ': 'ps',
        'Ω': 'O', 'ω': 'o',
    }
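    # Accented letters (e.g. ό, ί) are not keys in greek_map: decompose them
    # and drop the combining marks first so the base letters are mapped.
    decomposed = unicodedata.normalize('NFD', text)
    base = ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
    return ''.join(greek_map.get(c, c) for c in base)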


def transliterate_devanagari(text: str) -> str:
    """Transliterate Devanagari to Latin (IAST scheme, which is close to ISO 15919)."""
    try:
        from indic_transliteration import sanscript
        from indic_transliteration.sanscript import transliterate as indic_translit
        return indic_translit(text, sanscript.DEVANAGARI, sanscript.IAST)
    except ImportError:
        pass
    
    # Fallback: basic mapping
    # This would need a full Devanagari character map
    return text


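def transliterate_thai(text: str) -> str:
    """
    Transliterate Thai to Latin (Royal Thai General System).

    Uses PyThaiNLP's 'royin' engine. RTGS is the transcription system on
    which ISO 11940-2 is based, so output may differ from it in details.
    """
    try:
        from pythainlp.transliterate import romanize
        return romanize(text, engine='royin')
    except ImportError:
        pass
    
    # Fallback: return as-is (requires manual handling)
    return text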


def transliterate(text: str, lang: Optional[str] = None) -> str:
    """
    Transliterate text from non-Latin script to Latin.
    
    Args:
        text: Input text in any script
        lang: Optional ISO 639-1 language code (e.g., 'ru', 'zh', 'ko')
              If not provided, script is auto-detected.
    
    Returns:
        Transliterated text in Latin characters.
    """
    if not text:
        return text
    
    # Detect script if language not provided
    if lang:
        script_map = {
            'ru': 'cyrillic', 'uk': 'cyrillic', 'bg': 'cyrillic',
            'sr': 'cyrillic', 'kk': 'cyrillic',
            'zh': 'chinese',
            'ja': 'japanese',
            'ko': 'korean',
            'ar': 'arabic', 'fa': 'arabic', 'ur': 'arabic',
            'he': 'hebrew',
            'el': 'greek',
            'hi': 'devanagari', 'ne': 'devanagari',
            'bn': 'bengali',
            'th': 'thai',
            'hy': 'armenian',
            'ka': 'georgian',
        }
        script = script_map.get(lang, detect_script(text))
    else:
        script = detect_script(text)
    
    # Apply appropriate transliteration
    transliterators = {
        'cyrillic': lambda t: transliterate_cyrillic(t, lang or 'ru'),
        'chinese': transliterate_chinese,
        'japanese': transliterate_japanese,
        'korean': transliterate_korean,
        'arabic': transliterate_arabic,
        'hebrew': transliterate_hebrew,
        'greek': transliterate_greek,
        'devanagari': transliterate_devanagari,
        'thai': transliterate_thai,
        'latin': lambda t: t,  # No transliteration needed
    }
    
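    # Scripts without an entry (e.g. bengali, armenian, georgian) are returned
    # untransliterated and should be caught by validation or manual review.
    translit_func = transliterators.get(script, lambda t: t)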
    result = translit_func(text)
    
    # Normalize diacritics to ASCII
    normalized = unicodedata.normalize('NFD', result)
    ascii_result = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')
    
    return ascii_result


def transliterate_for_abbreviation(emic_name: str, lang: str) -> str:
    """
    Transliterate emic name for GHCID abbreviation generation.
    
    This is the main entry point for GHCID generation scripts.
    
    Args:
        emic_name: Institution name in original script
        lang: ISO 639-1 language code
    
    Returns:
        Transliterated name ready for abbreviation extraction
    """
    # Step 1: Transliterate to Latin
    latin = transliterate(emic_name, lang)
    
    # Step 2: Normalize diacritics (handled in transliterate())
    
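    # Step 3: Remove special characters (except spaces)
    import re
    # Drop apostrophe-like marks outright so they do not split a word in two
    latin = re.sub(r"['’ʹʺʾʿ]", '', latin)
    # Replace any other non-letter (e.g. hyphens, digits) with a space
    clean = re.sub(r'[^a-zA-Z\s]', ' ', latin)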
    
    # Step 4: Normalize whitespace
    clean = ' '.join(clean.split())
    
    return clean


# Example usage
if __name__ == '__main__':
    test_cases = [
        ('Институт восточных рукописей РАН', 'ru'),
        ('东巴文化博物院', 'zh'),
        ('독립기념관', 'ko'),
        ('राजस्थान प्राच्यविद्या प्रतिष्ठान', 'hi'),
        ('المكتبة الوطنية للمملكة المغربية', 'ar'),
        ('ארכיון הסיפור העממי בישראל', 'he'),
        ('Αρχαιολογικό Μουσείο Θεσσαλονίκης', 'el'),
    ]
    
    for name, lang in test_cases:
        result = transliterate_for_abbreviation(name, lang)
        print(f'{lang}: {name}')
        print(f'    → {result}')
        print()
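
The kanji/kana handling in `detect_script` above has a practical consequence: a Japanese name written only in kanji cannot be told apart from Chinese by its characters alone. The sketch below makes this visible with a different technique from the code-point ranges used above: it groups letters by the first word of their Unicode character name. `script_histogram` is a hypothetical helper for illustration.

```python
import unicodedata
from collections import Counter


def script_histogram(text: str) -> Counter:
    """Count letters by the first word of their Unicode character name."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue                      # skip spaces, digits, combining signs
        name = unicodedata.name(ch, '')
        if name:
            counts[name.split()[0]] += 1  # e.g. 'CJK', 'HIRAGANA', 'CYRILLIC'
    return counts


print(script_histogram('国立中央博物館'))   # kanji only: every letter is 'CJK'
print(script_histogram('東京の博物館'))     # one 'HIRAGANA' letter reveals Japanese
```

This is why `transliterate_for_abbreviation` takes an explicit `lang` argument rather than relying on detection alone.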

Skip Words by Language

When extracting abbreviations from transliterated text, skip these articles/prepositions:

Arabic

  • al- (the definite article)
  • bi-, li-, fi- (prepositions)

Hebrew

  • ha- (the)
  • ve- (and)
  • be-, le-, me- (prepositions)

Persian

  • -e, -ye (ezafe connector)
  • va (and)

CJK Languages

  • No skip words (particles are integral to meaning)

Indic Languages

  • ka, ki, ke (Hindi: of)
  • aur (Hindi: and)
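
The skip-word lists above can be applied as in the following sketch. It is an assumption-laden illustration, not the production logic: `SKIP_WORDS` and `extract_abbreviation` are hypothetical names, `lil` (the contraction of li- and al-) is added so that the Arabic example in this document works, and keeping all-uppercase tokens whole is inferred from the IVRRAN example.

```python
import re

# Skip words per language, taken from the lists in this section.
SKIP_WORDS = {
    'ar': {'al', 'bi', 'li', 'fi', 'lil'},   # 'lil' is an assumption (li + al)
    'he': {'ha', 've', 'be', 'le', 'me'},
    'fa': {'e', 'ye', 'va'},
    'hi': {'ka', 'ki', 'ke', 'aur'},
}


def extract_abbreviation(latin_name: str, lang: str) -> str:
    """Build an abbreviation from an ASCII-folded, transliterated name."""
    skip = SKIP_WORDS.get(lang, set())
    # Split on whitespace and hyphens so "al-Maktaba" yields "al" and "Maktaba".
    tokens = [t for t in re.split(r'[\s\-]+', latin_name) if t]
    parts = []
    for token in tokens:
        if token.lower() in skip:
            continue
        if len(token) > 1 and token.isupper():
            parts.append(token)            # assumption: acronyms are kept whole
        else:
            parts.append(token[0].upper())
    return ''.join(parts)


print(extract_abbreviation('al-Maktaba al-Wataniya lil-Mamlaka al-Maghribiya', 'ar'))  # MWMM
print(extract_abbreviation('Vezarat-e Omur-e Khareje-ye Iran', 'fa'))                  # VOKI
print(extract_abbreviation('Arkhiyon ha-Sipur ha-Amami be-Yisrael', 'he'))             # ASAY
```

Splitting on hyphens treats prefixed articles (Arabic, Hebrew) and suffixed connectors (Persian ezafe) uniformly, at the cost of also splitting genuinely hyphenated names.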

Validation

Check Transliteration Output

def validate_transliteration(result: str) -> bool:
    """
    Validate that transliteration output contains only ASCII letters and spaces,
    with at least one letter.
    """
    import re
    return bool(re.fullmatch(r'[a-zA-Z\s]+', result)) and not result.isspace()

Manual Review Queue

Non-Latin institutions should be flagged for manual review if:

  1. Transliteration library not available for that script
  2. Confidence in transliteration is low
  3. Institution has multiple official romanizations
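
The three criteria above could be collected into a single check, as in this sketch. `review_reasons` is a hypothetical helper. Treating output that is not clean ASCII as "low confidence" is an assumption, since this document does not define a confidence measure. The spellings in the usage example are illustrative values only.

```python
import re
from typing import Iterable, List


def review_reasons(result: str, library_available: bool,
                   official_romanizations: Iterable[str] = ()) -> List[str]:
    """Return the reasons a record should be queued for manual review."""
    reasons = []
    if not library_available:
        reasons.append('no transliteration library for this script')
    # Assumption: anything other than space-separated ASCII words is low confidence.
    if not re.fullmatch(r'[A-Za-z]+(?: [A-Za-z]+)*', result):
        reasons.append('output is not clean ASCII (low confidence)')
    if len({r.casefold() for r in official_romanizations}) > 1:
        reasons.append('multiple official romanizations')
    return reasons


# Untransliterated output with no library available triggers two reasons.
print(review_reasons('독립기념관', library_available=False))
# Two competing spellings (illustrative values) trigger the third.
print(review_reasons('Tuol Sleng', True, ['Tuol Sleng', 'Toul Sleng']))
```

An empty list means the record can proceed to abbreviation extraction without review.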

Related Documentation

  • AGENTS.md - Rule 12: Transliteration Standards
  • ABBREVIATION_SPECIAL_CHAR_RULE.md - Character filtering after transliteration
  • docs/TRANSLITERATION_CONVENTIONS.md - Extended examples and edge cases
  • scripts/transliterate_emic_names.py - Production transliteration script

Changelog

| Date | Change |
|---|---|
| 2025-12-08 | Initial standards document created |