# glam/docs/convention/schema/entity_annotation_rules_instance.yaml
# 2025-12-02 14:36:01 +01:00

# Entity Annotation Rules Instance - Section 3 of Convention v1.4.3
# Captures all entity annotation rules for the Gado2 annotation scheme
annotation_scheme: "Gado2"
no_double_tagging: true
context_sensitive: true
# General annotation policies
general_rules:
- >-
  Never use double tagging (e.g., 'Leiden University' is only tagged as an
  organisation, not as a place)
- The exact same word can be tagged in different ways depending on its context
- When in doubt, check how words are tagged in previous ground truths
- >-
  This convention distinguishes titles from general designations: titles are
  used/accepted by subjects (emic description), whereas general designations
  are imposed upon subjects (etic description)
- "Transcription status: Done → Final → Ground Truth (GT)"
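# Worked examples (a hypothetical sketch, not part of the Gado2 convention
# or its schema). The commented `worked_examples` layout below, with
# text/spans/basis keys, is an assumed format used only to show how the
# general rules above apply to sample phrases taken from this file.
# worked_examples:
# - text: "Leiden University"
#   spans:
#   - {surface: "Leiden University", type: ORGANISATION}
#   basis: no_double_tagging (not additionally tagged as PLACE)
# - text: "De NRC besloot"
#   spans:
#   - {surface: "NRC", type: ORGANISATION}
#   basis: context_sensitive (ORG_EXC001, TXT_EXC002)
# - text: "In de NRC stond"
#   spans:
#   - {surface: "NRC", type: TEXTUAL_REFERENCE}
#   basis: context_sensitive (ORG_EXC005, TXT_INC002)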
entity_types:
# 3.1 PERSON
- entity_type: PERSON
  description: >-
    Personal names and specific references to persons (named or unnamed).
    Includes persons, animals with proper names, fictional characters,
    gods, spirits, saints and prophets.
  subcategories:
  - name: personal_names
    description: Given name and surname, patronyms
    examples:
    - "Gouverneur-Generaal van Starkenborgh"
    - "President Soekarno"
    - "Sultan Muzaffar"
    - "Dr. Raden Soetomo"
    - "De Heer Balkenende"
    - "mr. dr. Beel"
    - "Tumenggung Sukapura"
    - "Officier Sukapura"
    - "Officier Tumenggung Sukapura"
    - "Lieutenant Winkelaar"
    - "Lieutenant Adolph Winkelaar"
    - "Mr. van Agt"
    - "Mej. den Uyl"
    - "Marquis De La Chetardie"
    - "Meneer Drees"
    - "gesaghebber Van Arrewijne"
    - "gewesen Sicacalsen Serlasker Abdul Nabij"
    - "princesse Gamoelamoe (translation of Ratu Gamulamu)"
    - "Engelschen vice admiraal Cornisch"
    - "Japara 's resident Lepeltak"
    - "Japaras resident Falck"
    - "resident Falck"
    - "vice admiraal Cornisch"
  - name: animal_names
    description: Proper names of animals
    examples:
    - "Lassie"
    - "Golden Retriever"
  - name: fictional_characters
    description: Proper names of fictional characters
    examples:
    - "Kuifje"
  - name: religious_figures
    description: Gods, spirits, saints and prophets
    examples:
    - "God"
    - "Jesus"
    - "Mohammed"
    - "Heilige Geest"
    - "Rama"
  - name: specific_references
    description: Specific references to a person without their name being mentioned
    examples:
    - "de Koning van Pruisen"
    - "de Koningin van Nederland"
    - "de Sultan van Ternate"
    - "de Minister van Buitenlandse Zaken"
    - "de Gouverneur-Generaal van Nederl.-Indië"
    - "de Koning"
    - "de Sultan"
    - "de Directeur"
    - "Tenno Heika"
    - "zijn Vrouw"
    - "haar slaaf"
    - "de geseijde slavin"
    - "de eerder genoemde leider"
    - "den Capiteijn"
    - "des Conincx"
    - "dit Jongetie"
    - "dat kind"
    - "Zijne Majesteit"
    - "haaren Coning"
    - "dezen Prins"
    - "dien Coning"
    - "onsen Capiteijn"
    - "Bapak itu (Indonesian)"
    - "gemelte gouvern:r"
  inclusion_rules:
  - rule_id: PER_INC001
    description: Include titles with personal names only in exceptional cases
    conditions:
    - "If titles/designations are required to identify a person (e.g., women referred to by their husband's surname)"
    - "If only the given name or the surname is mentioned, titles are always included, but general designations are not"
    examples:
    - "Gouverneur-Generaal van Starkenborgh"
    - "President Soekarno"
    - "Sultan Muzaffar"
  - rule_id: PER_INC002
    description: Include words between title and personal name
    conditions:
    - "The distance between a title and a personal name can vary within a single phrase"
    - "If another word stands between the title and the personal name, it is tagged along"
    examples:
    - "Engelschen vice admiraal Cornisch"
  - rule_id: PER_INC003
    description: Include articles and demonstratives with specific references
    conditions:
    - "Articles, demonstratives, genitive forms and possessive pronouns are tagged along"
    examples:
    - "de Koning van Pruisen"
    - "zijn Vrouw"
    - "des Conincx"
    - "dit Jongetie"
  - rule_id: PER_INC004
    description: Include backreferences
    conditions:
    - "References that point back to previously mentioned persons"
    examples:
    - "gemelte gouvern:r"
  exclusion_rules:
  - rule_id: PER_EXC001
    description: Do not tag series of titles
    rationale: >-
      Series of titles (with or without place names within them) are not
      included
    examples:
    - "Engelschen vice admiraal (in series) - only tag final title + name"
  - rule_id: PER_EXC002
    description: Do not tag abstract references
    rationale: >-
      Abstract references (often indicated by plural forms or indefinite
      articles) are not tagged
    examples:
    - "Een President"
    - "Een Premier"
    - "Een resident"
    - "Een generaal-majoor"
    - "Een Voorzitter"
    - "hij ontving de titel dr."
    - "ambassadeurs"
    - "Koninginnen"
    - "Koningen"
    - "draagt de titel Graaf"
    - "die persoon"
    - "de mannen"
    - "de vrouwen"
    - "die lieden"
    - "de kindere"
    - "Bapak (Indonesian)"
    - "De slaaf Louis"
  - rule_id: PER_EXC003
    description: Do not tag pronouns
    rationale: >-
      Pronouns are not tagged. They are consistent enough to be traced
      through regular expressions and do not require entity processing.
    examples:
    - "hij"
  - rule_id: PER_EXC004
    description: Do not tag general designations with complete personal names
    rationale: Improves the entity linking process
    examples:
    - "geseijde slavin tanjong (slavin = designation, not tagged)"
    - "de slaaf Anthonij (slaaf = designation, not tagged)"
    - "de Javaanschen Cap:n Soeta Wangsa (Javaanschen Cap:n = designation)"
    - "seekeren Moor op Batavia gen:t Anthonij (Moor = designation)"
  - rule_id: PER_EXC005
    description: Do not include associated organisations and places
    rationale: These are tagged separately
    examples:
    - "De Minister van buitenlandse zaken (buitenlandse zaken = organisation, separate)"
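  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format; each entry only restates a PERSON rule above as input text and
  # the spans that rule produces.
  # worked_examples:
  # - text: "Dr. Raden Soetomo"
  #   spans:
  #   - {surface: "Dr. Raden Soetomo", type: PERSON}
  #   basis: personal_names (titles kept with the name)
  # - text: "de slaaf Anthonij"
  #   spans:
  #   - {surface: "Anthonij", type: PERSON}
  #   basis: PER_EXC004 (designation "slaaf" stays outside the PERSON span)
  # - text: "Een President"
  #   spans: []
  #   basis: PER_EXC002 (abstract reference)
  # - text: "hij"
  #   spans: []
  #   basis: PER_EXC003 (pronoun)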
# 3.2 PLACE
- entity_type: PLACE
  description: >-
    Geographic locations including streets, cities, provinces, countries,
    continents, infrastructure, landforms, public spaces, buildings, and
    astronomical objects.
  subcategories:
  - name: street_names_addresses
    description: Street names and addresses
    examples:
    - "Laan van Meerdervoort"
  - name: trajectories_directions
    description: Trajectories and wind directions
    examples:
    - "schoone wegh met seer schoone water plaatsen versien, streckende meest O:Z:O. en W:N:W. aen de noordt zijde"
    - "Oostzijde van Molenvliet"
  - name: locations
    description: Village, City, Province, Country, Continent, Diocese, Zones
    examples:
    - "Sectie [0-9, I-V]"
  - name: infrastructure
    description: Bridge, Harbour, Dam
    examples: []
  - name: landform
    description: >-
      Mountain, Mountain range, Forest, River, Spring, Rice fields, Valley,
      Garden, Nature reserve, National park, Plantation, Beach
    examples: []
  - name: public_space
    description: >-
      Square, Field, Theatre, Museum, School, Market, Aeroplane, Station,
      Swimming pool, Hospital, Sports field, Cinema, Exhibition, Campus,
      Launch pad, Club, House, University, Library, Prayer room,
      Medical centre, Parking garage, Playground, Burial ground, Business park
    examples: []
  - name: companies_as_places
    description: >-
      Companies only if contextualised as places (otherwise organisation):
      Pharmacy, Bar, Restaurant, Eatery, Depot, Hotel, Hostel, Factory,
      Nightclub, Music venue
    examples: []
  - name: buildings
    description: >-
      House, Housing complex, Monastery, Kindergarten, Flat, Barracks, Fort,
      Care home, Shopping centre, Palace
    examples: []
  - name: astronomical_objects
    description: Sun, Earth, Moon, Planets, Comets (often named after persons/dates)
    examples: []
  - name: coordinates
    description: Geographic coordinates
    examples:
    - "Z. breete van 35 — 31 ten langte 5 — 15"
    - "parabel van 36 — 50"
    - "3½ graed of 42 mijlen bewesten"
  inclusion_rules:
  - rule_id: PLC_INC001
    description: Include relevant adjectives
    conditions:
    - "Directional and descriptive adjectives are tagged together with place names"
    examples:
    - "Noord/Oost/Zuid/West(-kust)"
    - "Aziatische-Afrikaanse landen"
    - "Amerikaanse defensie-gebied"
    - "Asia Raya"
    - "Muang Thai"
    - "Indonesische eilandenrijk"
    - "Indonesische republiek"
    - "Delische plantagegebied"
    - "West-Javaanse Bandoeng"
    - "Joodse gedeelte"
  - rule_id: PLC_INC002
    description: Include metonymy references
    conditions:
    - "Metonymy does not disqualify a place name (entity linking will later classify it as an organisation)"
    examples:
    - "Moskou weigert"
    - "Den Haag bepaalt"
    - "Nederland scoort"
  - rule_id: PLC_INC003
    description: Include articles for generic place references
    conditions:
    - "Articles are included when the place reference is generic"
    examples:
    - "De Laan van Meerdervoort"
    - "in het comptoir"
  exclusion_rules:
  - rule_id: PLC_EXC001
    description: Do not include articles for specific place names
    rationale: Articles are not part of proper place names
    examples:
    - "Amsterdam (not 'de Amsterdam')"
  - rule_id: PLC_EXC002
    description: Do not tag representatives of places as places
    rationale: Representatives are person references
    examples:
    - "De ambassadeur van Nederland (De ambassadeur = Person, not Nederland)"
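  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format that restates the PLACE rules above as input text and spans.
  # worked_examples:
  # - text: "Amsterdam"
  #   spans:
  #   - {surface: "Amsterdam", type: PLACE}
  #   basis: PLC_EXC001 (no article in a proper place name)
  # - text: "Moskou weigert"
  #   spans:
  #   - {surface: "Moskou", type: PLACE}
  #   basis: PLC_INC002 (metonymy is still tagged as PLACE)
  # - text: "West-Javaanse Bandoeng"
  #   spans:
  #   - {surface: "West-Javaanse Bandoeng", type: PLACE}
  #   basis: PLC_INC001 (descriptive adjective included)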
# 3.3 ORGANISATION
- entity_type: ORGANISATION
  description: >-
    Organizations including companies, institutions, governments, branches,
    associations, legislative bodies, political parties, military forces,
    sports teams, meetings, bands, religious orders, and ships.
  subcategories:
  - name: companies
    description: Studio, Bank, etc.
    examples: []
  - name: branches
    description: Departments and organizational branches
    examples:
    - "ING Rotterdam"
    - "Rekenkamer Gemeente Rotterdam"
    - "Afdelingsbestuur NVM afdeling Arnhem"
    - "NVM Arnhem"
  - name: associations
    description: Cooperative, Market (if not a place)
    examples: []
  - name: public_facilities
    description: School, University (if not already tagged as a place)
    examples: []
  - name: legislative_body
    description: Tweede Kamer, etc.
    examples: []
  - name: grand_residence
    description: Palace (when referring to the institution)
    examples: []
  - name: printer
    description: Printing houses
    examples: []
  - name: news_agency
    description: News agencies and media organizations
    examples:
    - "De NRC besloot"
    - "medewerkers bij De Expres"
    - "(cor. Volkskrant)"
    - "hoofdredacteur van het Bataviaasch Nieuwsblad"
  - name: media_campaign
    description: PR campaigns
    examples:
    - "Hij organiseert de PR campagne"
  - name: factory
    description: Manufacturing facilities as organizations
    examples: []
  - name: political_party
    description: Political parties
    examples: []
  - name: international_organisation
    description: International organizations
    examples:
    - "Verenigde Naties"
  - name: resistance_movement
    description: Resistance movements
    examples: []
  - name: authorities
    description: Government, Ministries, Councils, Courts
    examples:
    - "Ministerie van Financiën"
    - "Overzeese Rijksdelen"
    - "Buitenlandse Zaken"
    - "Raad voor Aangelegenheden van Indonesië (RAVI)"
  - name: dynasties
    description: Royal and ruling dynasties
    examples:
    - "Omajjaden dynastie"
    - "Nasriden dynastie"
  - name: military_forces
    description: Army, Army units
    examples: []
  - name: sports_team
    description: Sports teams
    examples: []
  - name: sport_tournament
    description: Championship, Match
    examples: []
  - name: meeting
    description: Conference
    examples: []
  - name: band_orchestra
    description: Musical groups
    examples:
    - "Dé Carels"
  - name: theatre_group
    description: Theatre groups
    examples: []
  - name: religious_order
    description: Religious orders
    examples: []
  - name: ship
    description: Named ships
    examples:
    - "Stoomschip Sumatra"
  inclusion_rules:
  - rule_id: ORG_INC001
    description: Tag branches together with place names when no preposition intervenes
    conditions:
    - "Place names indicating branches are only tagged if there is no preposition between the organisation and the place name"
    examples:
    - "ING Rotterdam (ING and Rotterdam tagged together as organisation)"
    - "Rekenkamer Gemeente Rotterdam"
    - "NVM Arnhem"
  - rule_id: ORG_INC002
    description: Tag frequently repeated references to organisational groups
    conditions:
    - "Frequently repeated denominations that refer to organisations are tagged"
    examples:
    - "aen generael en raden (raden = council itself)"
  exclusion_rules:
  - rule_id: ORG_EXC001
    description: Do not include articles
    rationale: Articles are not part of organisation names
    examples:
    - "Tweede Kamer (not 'de Tweede Kamer' in annotation)"
  - rule_id: ORG_EXC002
    description: Do not tag abbreviations separately
    rationale: Abbreviations are tagged with full name or separately as needed
    examples: []
  - rule_id: ORG_EXC003
    description: Do not tag groups lacking formal structure
    rationale: These are considered denominations
    examples:
    - "De jongerenbeweging"
    - "De vakbond bewegingen"
    - "Aleis"
  - rule_id: ORG_EXC004
    description: Do not tag representatives as organisations
    rationale: These are persons or denominations
    examples:
    - "De Minister van Buitenlandse Zaken (Minister = person)"
  - rule_id: ORG_EXC005
    description: Tag publications as textual references, not organisations
    conditions:
    - "When the publication itself is referenced, not the company"
    examples:
    - "In de NRC stond ... (NRC = textual reference)"
  - rule_id: ORG_EXC006
    description: Separate place names that follow a preposition
    rationale: The place is tagged separately
    examples:
    - "Rechtbank te Carcassonne (Carcassonne = place, separate)"
    - "Regering van Oostenrijk (Oostenrijk = place, separate)"
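  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format that restates the ORGANISATION rules above as input text and spans.
  # worked_examples:
  # - text: "ING Rotterdam"
  #   spans:
  #   - {surface: "ING Rotterdam", type: ORGANISATION}
  #   basis: ORG_INC001 (branch place name without a preposition)
  # - text: "Rechtbank te Carcassonne"
  #   spans:
  #   - {surface: "Rechtbank", type: ORGANISATION}
  #   - {surface: "Carcassonne", type: PLACE}
  #   basis: ORG_EXC006 (place name after a preposition is tagged separately)
  # - text: "de Tweede Kamer"
  #   spans:
  #   - {surface: "Tweede Kamer", type: ORGANISATION}
  #   basis: ORG_EXC001 (article excluded)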
# 3.4 DENOMINATION
- entity_type: DENOMINATION
  description: >-
    Ethnicity, profession, religion, demonym, ideology, language, community
    references. Includes adjectives referring to places/organisations/religions,
    demonyms, languages, pejorative terms, professions, ideological affiliations,
    and group references.
  subcategories:
  - name: adjective_phrases
    description: Phrases containing adjectives referring to place/organisation/religion/ideology/language/community
    examples:
    - "Islamitische gemeenschap"
    - "Marxistische overtuiging"
    - "Turkse taal"
  - name: demonym
    description: References to people from a place
    examples:
    - "Chinees"
    - "China"
    - "Arabier"
    - "Westerlingen"
    - "Bandunger"
    - "Brabander"
    - "Europeanen"
    - "Europeesche dame"
    - "Indonesische jeugd"
  - name: language
    description: Language names as nouns
    examples:
    - "Duits"
    - "Fries"
    - "Azerbeidzjaans"
  - name: abstract_bureaucratic
    description: >-
      Implicit references to ideological groups or departments, typical of
      late 20th/early 21st century texts
    examples:
    - "mix van communicatie- en beleidsverantwoordelijken"
  - name: time_zone
    description: Time zone references
    examples:
    - "Zuid-Sumatratijd"
  - name: religion
    description: Religion names as nouns
    examples:
    - "Protestantisme"
  - name: pejorative
    description: Pejorative terms
    examples:
    - "Slaaf"
    - "Koelie"
    - "Inlander"
    - "Zwarte"
    - "Boschnegers"
    - "Roodharige barbaren"
    - "Totok"
    - "Mohammedanen"
  - name: profession
    description: Professional titles and roles
    examples:
    - "Schrijver"
    - "Klerk"
    - "Agent"
    - "Ministers"
    - "Vertegenwoordiger"
    - "Hoofden"
    - "Mandoer"
  - name: religious_ideological_members
    description: Members of religions/ideologies
    examples:
    - "Kapitalist"
    - "Communist"
    - "Moslim"
    - "Christen"
    - "Pro-Russisch"
    - "Anti-abortus"
  - name: group_references
    description: General nouns referring to groups through prepositions/possessives
    examples:
    - "Volk van West-Irian (West-Irian = place)"
    - "De heren van de Volkskrant (Volkskrant = organisation)"
    - "KPN medewerkers (KPN = organisation)"
    - "die van sammadang (sammadang = place)"
    - "Baccherachs volck (Baccherach = person)"
  inclusion_rules:
  - rule_id: DEN_INC001
    description: Tag both adjective and noun in denomination phrases
    conditions:
    - "When a phrase contains an adjective referring to place/organisation/religion/ideology"
    examples:
    - "Islamitische gemeenschap (both words tagged)"
    - "Marxistische overtuiging"
  - rule_id: DEN_INC002
    description: Tag a profession or pejorative term that appears without a personal name according to the PERSON rules
    conditions:
    - "A profession or pejorative term occurs without a personal name"
    - "Follow rules from section 3.1.A"
    examples:
    - "De minister (= person, not denomination)"
    - "De slaaf rent weg (de slaaf = person)"
  exclusion_rules:
  - rule_id: DEN_EXC001
    description: Do not tag organisations as denominations
    rationale: Organisations have formal structure
    examples:
    - "Nederlandse groep"
    - "De Nederlandse Bank (= organisation)"
  - rule_id: DEN_EXC002
    description: Do not tag currencies as denominations
    rationale: Currencies are textual references or quantities
    examples:
    - "Spaenschen reael (= textual reference or quantity)"
  - rule_id: DEN_EXC003
    description: Do not tag denominations with numerals as denominations
    rationale: These are quantities
    examples:
    - "Twee Nederlanders (= quantity)"
    - "Een twintigtal soldaten (= quantity)"
  - rule_id: DEN_EXC004
    description: Do not tag associated places/organisations/persons
    rationale: These are tagged separately with their own type
    examples:
    - "Volk van West-Irian (West-Irian = place, separate)"
    - "KPN medewerkers (KPN = organisation, separate)"
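  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format; the "medewerkers" span reflects one reading of group_references
  # together with DEN_EXC004.
  # worked_examples:
  # - text: "Islamitische gemeenschap"
  #   spans:
  #   - {surface: "Islamitische gemeenschap", type: DENOMINATION}
  #   basis: DEN_INC001 (adjective and noun tagged together)
  # - text: "KPN medewerkers"
  #   spans:
  #   - {surface: "KPN", type: ORGANISATION}
  #   - {surface: "medewerkers", type: DENOMINATION}
  #   basis: DEN_EXC004 (associated organisation tagged separately)
  # - text: "Twee Nederlanders"
  #   spans:
  #   - {surface: "Twee Nederlanders", type: QUANTITY}
  #   basis: DEN_EXC003 / QTY_INC002 (numeral makes it a quantity)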
# 3.5 QUANTITY
- entity_type: QUANTITY
  description: >-
    Quantities including currency, merchandise counts, people counts,
    age, school class, weapons, settlements, area, distance, calibre,
    enumerations, carat, degree, and weight.
  subcategories:
  - name: currency
    description: Monetary amounts
    examples:
    - "ƒ 1.50"
    - "twee gulden"
  - name: merchandise
    description: Counts of goods
    examples: []
  - name: people
    description: Counts of people
    examples:
    - "23: inlandsche zieken"
  - name: troops
    description: Military unit quantities
    examples:
    - "vijfhonderd weerbare mannen"
    - "4. van de macassaeren"
  - name: age
    description: Age expressions
    examples:
    - "honderdjarige leeftijd"
  - name: school_class
    description: School class levels
    examples:
    - "derde klas middelbare school"
  - name: weapon
    description: Weapon quantities
    examples:
    - "ruijm 1000. stx:s schiet geweeren"
  - name: settlement
    description: Settlement counts
    examples:
    - "twee negorijen"
  - name: area
    description: Area measurements
    examples: []
  - name: distance
    description: Distance measurements
    examples:
    - "25 mijlen"
    - "2 mijl in zee"
    - "drie â vier dagen varens"
    - "5 uuren oostwaartsheenen"
  - name: calibre
    description: Weapon calibre
    examples:
    - "kaliber 6.5"
  - name: enumeration
    description: Lists of counted items
    examples:
    - "3 schootels, zadel, 2 stijgh beugels"
  - name: carat
    description: Gold/gem quality measure
    examples:
    - "gouden sieraden 22, 23 en 24 Kt"
  - name: degree
    description: Degree measurements
    examples: []
  - name: weight
    description: Weight measurements
    examples:
    - "14940 石 quiksilver"
  inclusion_rules:
  - rule_id: QTY_INC001
    description: Infer single items in enumerations
    conditions:
    - "In enumerations, items without explicit numbers are assumed to be singular"
    examples:
    - "3 schootels, zadel, 2 stijgh beugels (zadel = 1 saddle, tagged as quantity)"
  - rule_id: QTY_INC002
    description: Tag denominations with numerals as quantities
    conditions:
    - "Denominations preceded by numerals or quantitative adjectives become quantities"
    examples:
    - "Twee Nederlanders (= quantity)"
    - "Een twintigtal soldaten (twintigtal soldaten = quantity)"
  - rule_id: QTY_INC003
    description: Tag travel time as a distance, not as a temporal reference
    conditions:
    - "Time expressions measuring distance are quantities, not temporal references"
    examples:
    - "drie â vier dagen varens (not temporal reference)"
    - "5 uuren oostwaartsheenen (not temporal reference)"
  exclusion_rules:
  - rule_id: QTY_EXC001
    description: Do not tag textual references as quantities
    rationale: Written sources are textual references even when preceded by a number
    examples:
    - "2 brieven (= textual reference, not quantity)"
  - rule_id: QTY_EXC002
    description: Do not tag associated organisations as quantities
    rationale: The organisation is tagged separately
    examples:
    - "derde klas middelbare school (middelbare school = organisation)"
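  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format that restates the QUANTITY rules above as input text and spans.
  # worked_examples:
  # - text: "Een twintigtal soldaten"
  #   spans:
  #   - {surface: "twintigtal soldaten", type: QUANTITY}
  #   basis: QTY_INC002
  # - text: "drie â vier dagen varens"
  #   spans:
  #   - {surface: "drie â vier dagen varens", type: QUANTITY}
  #   basis: QTY_INC003 (travel time is a distance, not TEMPORAL_REFERENCE)
  # - text: "derde klas middelbare school"
  #   spans:
  #   - {surface: "derde klas", type: QUANTITY}
  #   - {surface: "middelbare school", type: ORGANISATION}
  #   basis: QTY_EXC002
  # - text: "2 brieven"
  #   spans:
  #   - {surface: "2 brieven", type: TEXTUAL_REFERENCE}
  #   basis: QTY_EXC001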
# 3.6 TEMPORAL_REFERENCE
- entity_type: TEMPORAL_REFERENCE
  description: >-
    Temporal references including days, dates, campaigns/wars, holidays,
    canonised periods, genitives, and temporal adjectives.
  subcategories:
  - name: days
    description: References to specific days
    examples:
    - "Morgen"
    - "Vanochtend"
    - "Hedenmiddag"
    - "gisteravond"
  - name: days_of_week
    description: Weekday names
    examples:
    - "Maandag"
  - name: deictic_temporal
    description: Deictic temporal adverbs
    examples:
    - "gisteren"
    - "morgen"
  - name: dates
    description: Dates in any calendar
    examples:
    - "Vrijdag 8 November 1957"
  - name: campaigns_wars
    description: Campaigns/wars when referring to time periods
    examples:
    - "Hongitochten"
    - "Tweede Wereldoorlog"
    - "1ste Nederlandse militaire actie"
  - name: holidays_festivals
    description: Holiday and festival names
    examples:
    - "Nieuwjaar"
    - "geboortedag van de Profeet Mohammad"
    - "Heldendag"
  - name: canonised_periods
    description: Canonised historical periods (any historiography)
    examples:
    - "Middeleeuwen (Western Europe)"
    - "Periode van de Strijdende Staten (China)"
    - "Zaman Hindu-budis (Indonesia)"
  - name: genitives
    description: Genitive temporal expressions
    examples:
    - "9 dezer"
    - "Dezer dagen"
  - name: temporal_adjectives
    description: Temporal adjectives before place names
    examples:
    - "eighteenth-century Europe"
  inclusion_rules:
  - rule_id: TMP_INC001
    description: Always tag days unless the full date is written out
    conditions:
    - "Days are tagged unless a full date is already present"
    examples:
    - "Afgelopen Vrijdag (tagged)"
    - "Vrijdag 8 November 1957 (not tagged separately, full date)"
  - rule_id: TMP_INC002
    description: Tag campaigns/wars as temporal references when they refer to a period
    conditions:
    - "A campaign or war is tagged as a temporal reference when contextualised to refer to a time period"
    examples:
    - "Hongitochten"
    - "Tweede Wereldoorlog"
  exclusion_rules:
  - rule_id: TMP_EXC001
    description: Do not tag days when the full date is written
    rationale: The full date already captures the information
    examples:
    - "Vrijdag 8 November 1957 (entire date tagged, not Vrijdag separately)"
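  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format that restates the TEMPORAL_REFERENCE rules above as spans.
  # worked_examples:
  # - text: "Vrijdag 8 November 1957"
  #   spans:
  #   - {surface: "Vrijdag 8 November 1957", type: TEMPORAL_REFERENCE}
  #   basis: TMP_EXC001 (one span; "Vrijdag" is not tagged separately)
  # - text: "9 dezer"
  #   spans:
  #   - {surface: "9 dezer", type: TEMPORAL_REFERENCE}
  #   basis: genitives
  # - text: "Middeleeuwen"
  #   spans:
  #   - {surface: "Middeleeuwen", type: TEMPORAL_REFERENCE}
  #   basis: canonised_periods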
# 3.7 TEXTUAL_REFERENCE
- entity_type: TEXTUAL_REFERENCE
  description: >-
    References to written sources, documents, laws, titles of cultural works,
    inventory numbers, accounts, currency types, telephone numbers, URLs,
    programs, policies, agreements, sanctions, statements, land surveys,
    stocks, registers, meeting minutes, activities with recorded minutes,
    slogans, proverbs, flags, and honours.
  subcategories:
  - name: radio_frequencies
    description: Radio frequencies
    examples: []
  - name: board_games
    description: Board game names
    examples: []
  - name: reports
    description: Reports and documents
    examples: []
  - name: cultural_titles
    description: Titles of books, songs, movies, pamphlets, records, manuscripts, musicals, programs, magazines, newspapers, journals
    examples:
    - "Stabat Mater van Pergolesi (Pergolesi = person)"
  - name: inventory_numbers
    description: Archive inventory numbers
    examples:
    - "AVS INV. 61855-3"
  - name: accounts
    description: Bank and postal accounts
    examples:
    - "Giro 158225"
    - "postgirorekening No. 400"
  - name: currency_types
    description: Currency types (not amounts)
    examples:
    - "Spaenschen reael"
    - "reael pedangh ofte rycxdaelder"
  - name: mint_runs
    description: Print/mint runs of currency
    examples:
    - "aan gehaalde staven, namentlijk No 72, 73, 74, 76"
  - name: telephone_numbers
    description: Telephone numbers
    examples:
    - "Tel. 020 -72 84 61"
  - name: mailing_lists
    description: Mailing lists and message systems
    examples:
    - "zij organiseerde de 'message box'"
  - name: academic_references
    description: Academic citations
    examples:
    - "(Scheveningen, 1914)"
    - "(Scholte, 1995)"
  - name: page_numbers
    description: Page number references
    examples:
    - "Pagina 39"
  - name: religious_texts
    description: Religious text references
    examples:
    - "Surah 17:19"
  - name: urls
    description: Web URLs
    examples:
    - "www.colonialarchitecture.eu"
  - name: programs
    description: Software or organizational programs
    examples: []
  - name: policies
    description: Publicly announced policies
    examples:
    - "passen-stelsel"
    - "pers breidel"
    - "Manokwari-plan"
    - "Pax Neerlandica"
    - "non-coöperatie"
    - "presidentieel besluit No. 9"
    - "Handvest der Verenigde Naties"
  - name: agreements
    description: Written agreements
    examples: []
  - name: sanctions
    description: Written sanctions
    examples:
    - "Poenale Sanctie"
  - name: statements
    description: Written/recorded statements
    examples:
    - "communicatie-uitingen"
  - name: laws
    description: Legal references
    examples:
    - "artikel 156 alinea 2"
  - name: land_surveys
    description: Land survey documentation
    examples:
    - "verponding No. 63"
  - name: stocks
    description: Stock certificates and registered numbers
    examples:
    - "Controleursgrant no. 3"
    - "acte ddo. 12 December 1927 No. 59"
  - name: advertisement_registers
    description: Registers of advertisements
    examples:
    - "No. 245 44 regels (44 regels = quantity)"
  - name: meeting_minutes
    description: Meeting minutes
    examples: []
  - name: recorded_activities
    description: Activities with recorded minutes/reports
    examples:
    - "Conference"
    - "Forum"
    - "Concert"
  - name: slogans
    description: Political or advertising slogans
    examples: []
  - name: proverbs
    description: Proverbs and adages (not idioms)
    examples:
    - "Wie honing wil eten moet lijden dat de bijen hem steken"
    - "j'en passe et des meilleurs"
  - name: flags
    description: Flag descriptions
    examples:
    - "Rood, wit, blauw"
    - "Bendera Kokki"
  - name: honours
    description: Titles of honours and awards
    examples:
    - "Ridder in de orde van Oranje Nassau"
  inclusion_rules:
  - rule_id: TXT_INC001
    description: Tag currency types as textual references
    conditions:
    - "Currency types (not amounts) are textual references"
    examples:
    - "Spaenschen reael"
  - rule_id: TXT_INC002
    description: Tag publications as textual references, not organisations
    conditions:
    - "When the publication itself is mentioned, not the company"
    examples:
    - "In de NRC stond ... (NRC = textual reference)"
  - rule_id: TXT_INC003
    description: Tag recorded activities as textual references
    conditions:
    - "Activities of which minutes or reports have been recorded"
    examples:
    - "Conference"
    - "Forum"
    - "Concert"
  exclusion_rules:
  - rule_id: TXT_EXC001
    description: Do not tag currency amounts as textual references
    rationale: Amounts are quantities
    examples:
    - "ƒ 1.50 (= quantity)"
  - rule_id: TXT_EXC002
    description: Do not tag news agencies as textual references when they act as organisations
    rationale: Context determines whether it is an organisation or a textual reference
    examples:
    - "De NRC besloot ... (NRC = organisation)"
    - "In de NRC stond ... (NRC = textual reference)"
  - rule_id: TXT_EXC003
    description: Do not confuse with quantities
    rationale: Written sources remain textual references even when they are counted
    examples:
    - "2 brieven (= textual reference to the letters themselves, not the count)"
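  # Worked examples (a hypothetical sketch, not part of the Gado2 convention
  # or its schema). The commented `worked_examples` layout is an assumed
  # format that restates the TEXTUAL_REFERENCE rules above as spans.
  # worked_examples:
  # - text: "Spaenschen reael"
  #   spans:
  #   - {surface: "Spaenschen reael", type: TEXTUAL_REFERENCE}
  #   basis: TXT_INC001 (currency type)
  # - text: "ƒ 1.50"
  #   spans:
  #   - {surface: "ƒ 1.50", type: QUANTITY}
  #   basis: TXT_EXC001 (currency amount)
  # - text: "No. 245 44 regels"
  #   spans:
  #   - {surface: "No. 245", type: TEXTUAL_REFERENCE}
  #   - {surface: "44 regels", type: QUANTITY}
  #   basis: advertisement_registers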