# 1036 lines, 33 KiB, YAML
# Entity Annotation Rules Instance - Section 3 of Convention v1.4.3
# Captures all entity annotation rules for the Gado2 annotation scheme

annotation_scheme: "Gado2"
no_double_tagging: true
context_sensitive: true

# General annotation policies
general_rules:
  - Never use double tagging (e.g., 'Leiden University' is only tagged as
    organisation, not a place)
  - The exact same word can be tagged in different ways depending on its context
  - When in doubt, check how words are tagged in previous ground truths
  - |-
    This convention distinguishes titles from general designations:
    * Titles are used/accepted by subjects (emic description)
    * General designations are imposed upon subjects (etic description)
  - "Transcription status: Done → Final → Ground Truth (GT)"

entity_types:
  # 3.1 PERSON
  - entity_type: PERSON
    description: >-
      Personal names and specific references to persons (named or unnamed).
      Includes persons, animals with proper names, fictional characters,
      gods, spirits, saints and prophets.

    subcategories:
      - name: personal_names
        description: Given name and surname, patronyms
        examples:
          - "Gouverneur-Generaal van Starkenborgh"
          - "President Soekarno"
          - "Sultan Muzaffar"
          - "Dr. Raden Soetomo"
          - "De Heer Balkenende"
          - "mr. dr. Beel"
          - "Tumenggung Sukapura"
          - "Officier Sukapura"
          - "Officier Tumenggung Sukapura"
          - "Lieutenant Winkelaar"
          - "Lieutenant Adolph Winkelaar"
          - "Mr. van Agt"
          - "Mej. den Uyl"
          - "Marquis De La Chetardie"
          - "Meneer Drees"
          - "gesaghebber Van Arrewijne"
          - "gewesen Sicacalsen Serlasker Abdul Nabij"
          - "princesse Gamoelamoe (translation of Ratu Gamulamu)"
          - "Engelschen vice admiraal Cornisch"
          - "Japara 's resident Lepeltak"
          - "Japaras resident Falck"
          - "resident Falck"
          - "vice admiraal Cornisch"

      - name: animal_names
        description: Proper names of animals
        examples:
          - "Lassie"
          - "Golden Retriever"

      - name: fictional_characters
        description: Proper names of fictional characters
        examples:
          - "Kuifje"

      - name: religious_figures
        description: Gods, spirits, saints and prophets
        examples:
          - "God"
          - "Jesus"
          - "Mohammed"
          - "Heilige Geest"
          - "Rama"

      - name: specific_references
        description: Specific references to a person without their name being mentioned
        examples:
          - "de Koning van Pruisen"
          - "de Koningin van Nederland"
          - "de Sultan van Ternate"
          - "de Minister van Buitenlandse Zaken"
          - "de Gouverneur-Generaal van Nederl.-Indië"
          - "de Koning"
          - "de Sultan"
          - "de Directeur"
          - "Tenno Heika"
          - "zijn Vrouw"
          - "haar slaaf"
          - "de geseijde slavin"
          - "de eerder genoemde leider"
          - "den Capiteijn"
          - "des Conincx"
          - "dit Jongetie"
          - "dat kind"
          - "Zijne Majesteit"
          - "haaren Coning"
          - "dezen Prins"
          - "dien Coning"
          - "onsen Capiteijn"
          - "Bapak itu (Indonesian)"
          - "gemelte gouvern:r"

    inclusion_rules:
      - rule_id: PER_INC001
        description: Include titles with personal names only in exceptional cases
        conditions:
          - "If titles/designations are required to identify a person (e.g., women referred to by husband's surname)"
          - "If only given name or surname are mentioned, titles are always included, but general designations are not"
        examples:
          - "Gouverneur-Generaal van Starkenborgh"
          - "President Soekarno"
          - "Sultan Muzaffar"

      - rule_id: PER_INC002
        description: Include words between title and personal name
        conditions:
          - "The distance between a title and personal name can vary within a single phrase"
          - "If another word stands between the title and the personal name, it will be tagged along"
        examples:
          - "Engelschen vice admiraal Cornisch"

      - rule_id: PER_INC003
        description: Include articles and demonstratives with specific references
        conditions:
          - "The article/demonstrative/genitive case/possessive pronouns are tagged along"
        examples:
          - "de Koning van Pruisen"
          - "zijn Vrouw"
          - "des Conincx"
          - "dit Jongetie"

      - rule_id: PER_INC004
        description: Include backreferences
        conditions:
          - "References that point back to previously mentioned persons"
        examples:
          - "gemelte gouvern:r"

    exclusion_rules:
      - rule_id: PER_EXC001
        description: Do not tag series of titles
        rationale: >-
          Series of titles (with or without indications of place names within
          them) are not included
        examples:
          - "Engelschen vice admiraal (in series) - only tag final title + name"

      - rule_id: PER_EXC002
        description: Do not tag abstract references
        rationale: >-
          Abstract references (often indicated by plural forms or indefinite
          articles) are not tagged
        examples:
          - "Een President"
          - "Een Premier"
          - "Een resident"
          - "Een generaal-majoor"
          - "Een Voorzitter"
          - "hij ontving de titel dr."
          - "ambassadeurs"
          - "Koninginnen"
          - "Koningen"
          - "draagt de titel Graaf"
          - "die persoon"
          - "de mannen"
          - "de vrouwen"
          - "die lieden"
          - "de kindere"
          - "Bapak (Indonesian)"
          - "De slaaf Louis"

      - rule_id: PER_EXC003
        description: Do not tag pronouns
        rationale: >-
          Pronouns are not tagged. They are consistent enough to be traced
          through regular expressions and do not require entity processing.
        examples:
          - "hij"

      - rule_id: PER_EXC004
        description: Do not tag general designations with complete personal names
        rationale: Enhances entity linking process
        examples:
          - "geseijde slavin tanjong (slavin = designation, not tagged)"
          - "de slaaf Anthonij (slaaf = designation, not tagged)"
          - "de Javaanschen Cap:n Soeta Wangsa (Javaanschen Cap:n = designation)"
          - "seekeren Moor op Batavia gen:t Anthonij (Moor = designation)"

      - rule_id: PER_EXC005
        description: Do not include associated organisations and places
        rationale: These are tagged separately
        examples:
          - "De Minister van buitenlandse zaken (buitenlandse zaken = organisation, separate)"

  # 3.2 PLACE
  - entity_type: PLACE
    description: >-
      Geographic locations including streets, cities, provinces, countries,
      continents, infrastructure, landforms, public spaces, buildings, and
      astronomical objects.

    subcategories:
      - name: street_names_addresses
        description: Street names and addresses
        examples:
          - "Laan van Meerdervoort"

      - name: trajectories_directions
        description: Trajectories and wind directions
        examples:
          - "schoone wegh met seer schoone water plaatsen versien, streckende meest O:Z:O. en W:N:W. aen de noordt zijde"
          - "Oostzijde van Molenvliet"

      - name: locations
        description: Dorp, Stad, Provincie, Land, Continent, Bisdom, Zones
        examples:
          - "Sectie [0-9, I-V]"

      - name: infrastructure
        description: Brug, Haven, Dam
        examples: []

      - name: landform
        description: Berg, Gebergte, Bos, Rivier, Bron, Rijstvelden, Vallei, Tuin, Natuurreservaat, Nationaal park, Plantage, Strand
        examples: []

      - name: public_space
        description: >-
          Plein, Veld, Theater, Museum, School, Markt, Vliegtuig, Station,
          Zwembad, Ziekenhuis, Sportveld, Bioscoop, Tentoonstelling, Campus,
          Lanceerplatform, Club, Huis, Universiteit, Bibliotheek, Bidruimte,
          Medisch centrum, Parkeergarage, Speeltuin, Grafplaats, Bedrijvenpark
        examples: []

      - name: companies_as_places
        description: >-
          Companies only if contextualised as places (otherwise organisation):
          Apotheek, Bar, Restaurant, Eettent, Depot, Hotel, Hostel, Fabriek,
          Nachtclub, Muziekpodium
        examples: []

      - name: buildings
        description: >-
          Huis, Wooncomplex, Klooster, Kleuterzaal, Flat, Kazerne, Fort,
          Verzorgingshuis, Winkelcentrum, Paleis
        examples: []

      - name: astronomical_objects
        description: Zon, Aarde, Maan, Planeten, Comets (often named after persons/dates)
        examples: []

      - name: coordinates
        description: Geographic coordinates
        examples:
          - "Z. breete van 35 — 31 ten langte 5 — 15"
          - "parabel van 36 — 50"
          - "3½ graed of 42 mijlen bewesten"

    inclusion_rules:
      - rule_id: PLC_INC001
        description: Include relevant adjectives
        conditions:
          - "Directional and descriptive adjectives are tagged with place names"
        examples:
          - "Noord/Oost/Zuid/West(-kust)"
          - "Aziatische-Afrikaanse landen"
          - "Amerikaanse defensie-gebied"
          - "Asia Raya"
          - "Muang Thai"
          - "Indonesische eilandenrijk"
          - "Indonesische republiek"
          - "Delische plantagegebied"
          - "West-Javaanse Bandoeng"
          - "Joodse gedeelte"

      - rule_id: PLC_INC002
        description: Include metonymy references
        conditions:
          - "Metonymy does not disqualify a place name (entity linking will classify as organisation later)"
        examples:
          - "Moskou weigert"
          - "Den Haag bepaalt"
          - "Nederland scoort"

      - rule_id: PLC_INC003
        description: Include articles for generic place references
        conditions:
          - "Articles are included when the place reference is generic"
        examples:
          - "De Laan van Meerdervoort"
          - "in het comptoir"

    exclusion_rules:
      - rule_id: PLC_EXC001
        description: Do not include articles for specific place names
        rationale: Articles are not part of proper place names
        examples:
          - "Amsterdam (not 'de Amsterdam')"

      - rule_id: PLC_EXC002
        description: Do not tag representation of places as places
        rationale: These are person references
        examples:
          - "De ambassadeur van Nederland (De ambassadeur = Person, not Nederland)"

  # 3.3 ORGANISATION
  - entity_type: ORGANISATION
    description: >-
      Organizations including companies, institutions, governments, branches,
      associations, legislative bodies, political parties, military forces,
      sports teams, meetings, bands, religious orders, and ships.

    subcategories:
      - name: companies
        description: Studio, Bank, etc.
        examples: []

      - name: branches
        description: Departments and organizational branches
        examples:
          - "ING Rotterdam"
          - "Rekenkamer Gemeente Rotterdam"
          - "Afdelingsbestuur NVM afdeling Arnhem"
          - "NVM Arnhem"

      - name: associations
        description: Coöperatie, Markt (if not place)
        examples: []

      - name: public_facilities
        description: School, Universiteit (if not already tagged as place)
        examples: []

      - name: legislative_body
        description: Tweede kamer, etc.
        examples: []

      - name: grand_residence
        description: Paleis (when referring to institution)
        examples: []

      - name: printer
        description: Printing houses
        examples: []

      - name: news_agency
        description: News agencies and media organizations
        examples:
          - "De NRC besloot"
          - "medewerkers bij De Expres"
          - "(cor. Volkskrant)"
          - "hoofdredacteur van het Bataviaasch Nieuwsblad"

      - name: media_campaign
        description: PR campaigns
        examples:
          - "Hij organiseert de PR campagne"

      - name: factory
        description: Manufacturing facilities as organizations
        examples: []

      - name: political_party
        description: Political parties
        examples: []

      - name: international_organisation
        description: International organizations
        examples:
          - "Verenigde Naties"

      - name: resistance_movement
        description: Resistance movements
        examples: []

      - name: authorities
        description: >-
          Government, Ministries, Councils, Courts
        examples:
          - "Ministerie van Financiën"
          - "Overzeese Rijksdelen"
          - "Buitenlandse Zaken"
          - "Raad voor Aangelegenheden van Indonesië (RAVI)"

      - name: dynasties
        description: Royal and ruling dynasties
        examples:
          - "Omajjaden dynastie"
          - "Nasriden dynastie"

      - name: military_forces
        description: Army, Army units
        examples: []

      - name: sports_team
        description: Sports teams
        examples: []

      - name: sport_tournament
        description: Championship, Match
        examples: []

      - name: meeting
        description: Conference
        examples: []

      - name: band_orchestra
        description: Musical groups
        examples:
          - "Dé Carels"

      - name: theatre_group
        description: Theatre groups
        examples: []

      - name: religious_order
        description: Religious orders
        examples: []

      - name: ship
        description: Named ships
        examples:
          - "Stoomschip Sumatra"

    inclusion_rules:
      - rule_id: ORG_INC001
        description: Tag branches with placenames without prepositions
        conditions:
          - "Placenames indicating branches are only tagged if there is no preposition between the organization and the placename"
        examples:
          - "ING Rotterdam (both tagged as organisation)"
          - "Rekenkamer Gemeente Rotterdam"
          - "NVM Arnhem"

      - rule_id: ORG_INC002
        description: Tag frequently repeated references to organisational groups
        conditions:
          - "Frequently repeated references to denominations which refer to organisations are tagged"
        examples:
          - "aen generael en raden (raden = council itself)"

    exclusion_rules:
      - rule_id: ORG_EXC001
        description: Do not include articles
        rationale: Articles are not part of organization names
        examples:
          - "Tweede Kamer (not 'de Tweede Kamer' in annotation)"

      - rule_id: ORG_EXC002
        description: Do not tag abbreviations separately
        rationale: Abbreviations are tagged with full name or separately as needed
        examples: []

      - rule_id: ORG_EXC003
        description: Do not tag groups lacking formal structure
        rationale: These are considered denominations
        examples:
          - "De jongerenbeweging"
          - "De vakbond bewegingen"
          - "Aleis"

      - rule_id: ORG_EXC004
        description: Do not tag representatives as organisations
        rationale: These are persons or denominations
        examples:
          - "De Minister van Buitenlandse Zaken (Minister = person)"

      - rule_id: ORG_EXC005
        description: Tag publications as textual references not organisations
        conditions:
          - "When the publication itself is referenced, not the company"
        examples:
          - "In de NRC stond ... (NRC = textual reference)"

      - rule_id: ORG_EXC006
        description: Separate placenames with prepositions
        rationale: Place is tagged separately
        examples:
          - "Rechtbank te Carcassonne (Carcassonne = place, separate)"
          - "Regering van Oostenrijk (Oostenrijk = place, separate)"

  # 3.4 DENOMINATION
  - entity_type: DENOMINATION
    description: >-
      Ethnicity, profession, religion, demonym, ideology, language, community
      references. Includes adjectives referring to places/organisations/religions,
      demonyms, languages, pejorative terms, professions, ideological affiliations,
      and group references.

    subcategories:
      - name: adjective_phrases
        description: Phrases containing adjectives referring to place/organisation/religion/ideology/language/community
        examples:
          - "Islamitische gemeenschap"
          - "Marxistische overtuiging"
          - "Turkse taal"

      - name: demonym
        description: References to people from a place
        examples:
          - "Chinees"
          - "China"
          - "Arabier"
          - "Westerlingen"
          - "Bandunger"
          - "Brabander"
          - "Europeanen"
          - "Europeesche dame"
          - "Indonesische jeugd"

      - name: language
        description: Language names as nouns
        examples:
          - "Duits"
          - "Fries"
          - "Azerbeidzjaans"

      - name: abstract_bureaucratic
        description: >-
          Implicitly refer to ideological groups or departments, typical for
          late 20th/early 21st century texts
        examples:
          - "mix van communicatie- en beleidsverantwoordelijken"

      - name: time_zone
        description: Time zone references
        examples:
          - "Zuid-Sumatratijd"

      - name: religion
        description: Religion names as nouns
        examples:
          - "Protestantisme"

      - name: pejorative
        description: Pejorative terms
        examples:
          - "Slaaf"
          - "Koelie"
          - "Inlander"
          - "Zwarte"
          - "Boschnegers"
          - "Roodharige barbaren"
          - "Totok"
          - "Mohammedanen"

      - name: profession
        description: Professional titles and roles
        examples:
          - "Schrijver"
          - "Klerk"
          - "Agent"
          - "Ministers"
          - "Vertegenwoordiger"
          - "Hoofden"
          - "Mandoer"

      - name: religious_ideological_members
        description: Members of religions/ideologies
        examples:
          - "Kapitalist"
          - "Communist"
          - "Moslim"
          - "Christen"
          - "Pro-Russisch"
          - "Anti-abortus"

      - name: group_references
        description: General nouns referring to groups through prepositions/possessives
        examples:
          - "Volk van West-Irian (West-Irian = place)"
          - "De heren van de Volkskrant (Volkskrant = organisation)"
          - "KPN medewerkers (KPN = organisation)"
          - "die van sammadang (sammadang = place)"
          - "Baccherachs volck (Baccherach = person)"

    inclusion_rules:
      - rule_id: DEN_INC001
        description: Tag both adjective and noun in denomination phrases
        conditions:
          - "When a phrase contains an adjective referring to place/organisation/religion/ideology"
        examples:
          - "Islamitische gemeenschap (both words tagged)"
          - "Marxistische overtuiging"

      - rule_id: DEN_INC002
        description: Tag a profession/pejorative when it appears alone, without a personal name
        conditions:
          - "Reference to profession or pejorative occurs without a personal name"
          - "Follow rules from section 3.1.A"
        examples:
          - "De minister (= person, not denomination)"
          - "De slaaf rent weg (de slaaf = person)"

    exclusion_rules:
      - rule_id: DEN_EXC001
        description: Do not tag organisations as denominations
        rationale: Organisations have formal structure
        examples:
          - "Nederlandse groep"
          - "De Nederlandse Bank (= organisation)"

      - rule_id: DEN_EXC002
        description: Do not tag currencies as denominations
        rationale: Currencies are textual references or quantities
        examples:
          - "Spaenschen reael (= textual reference or quantity)"

      - rule_id: DEN_EXC003
        description: Do not tag denominations with numerals as denominations
        rationale: These are quantities
        examples:
          - "Twee Nederlanders (= quantity)"
          - "Een twintigtal soldaten (= quantity)"

      - rule_id: DEN_EXC004
        description: Do not tag associated places/organisations/persons
        rationale: These are tagged separately with their own type
        examples:
          - "Volk van West-Irian (West-Irian = place, separate)"
          - "KPN medewerkers (KPN = organisation, separate)"

  # 3.5 QUANTITY
  - entity_type: QUANTITY
    description: >-
      Quantities including currency, merchandise counts, people counts,
      age, school class, weapons, settlements, area, distance, calibre,
      enumerations, carat, degree, and weight.

    subcategories:
      - name: currency
        description: Monetary amounts
        examples:
          - "ƒ 1.50"
          - "twee gulden"

      - name: merchandise
        description: Counts of goods
        examples: []

      - name: people
        description: Counts of people
        examples:
          - "23: inlandsche zieken"

      - name: troops
        description: Military unit quantities
        examples:
          - "vijfhonderd weerbare mannen"
          - "4. van de macassaeren"

      - name: age
        description: Age expressions
        examples:
          - "honderdjarige leeftijd"

      - name: school_class
        description: School class levels
        examples:
          - "derde klas middelbare school"

      - name: weapon
        description: Weapon quantities
        examples:
          - "ruijm 1000. stx:s schiet geweeren"

      - name: settlement
        description: Settlement counts
        examples:
          - "twee negorijen"

      - name: area
        description: Area measurements
        examples: []

      - name: distance
        description: Distance measurements
        examples:
          - "25 mijlen"
          - "2 mijl in zee"
          - "drie â vier dagen varens"
          - "5 uuren oostwaartsheenen"

      - name: calibre
        description: Weapon calibre
        examples:
          - "kaliber 6.5"

      - name: enumeration
        description: Lists of counted items
        examples:
          - "3 schootels, zadel, 2 stijgh beugels"

      - name: carat
        description: Gold/gem quality measure
        examples:
          - "gouden sieraden 22, 23 en 24 Kt"

      - name: degree
        description: Degree measurements
        examples: []

      - name: weight
        description: Weight measurements
        examples:
          - "14940 石 quiksilver"

    inclusion_rules:
      - rule_id: QTY_INC001
        description: Infer single items in enumerations
        conditions:
          - "In enumerations, items without explicit numbers are assumed to be singular"
        examples:
          - "3 schootels, zadel, 2 stijgh beugels (zadel = 1 saddle, tagged as quantity)"

      - rule_id: QTY_INC002
        description: Tag denominations with numerals as quantities
        conditions:
          - "Denominations preceded by numerals or quantitative adjectives become quantities"
        examples:
          - "Twee Nederlanders (= quantity)"
          - "Een twintigtal soldaten (twintigtal soldaten = quantity)"

      - rule_id: QTY_INC003
        description: Tag travel time as distance not temporal
        conditions:
          - "Time expressions measuring distance are quantities not temporal references"
        examples:
          - "drie â vier dagen varens (not temporal reference)"
          - "5 uuren oostwaartsheenen (not temporal reference)"

    exclusion_rules:
      - rule_id: QTY_EXC001
        description: Do not tag textual references as quantities
        rationale: Written sources are textual references even with numbers
        examples:
          - "2 brieven (= textual reference, not quantity)"

      - rule_id: QTY_EXC002
        description: Do not tag associated organisations as quantities
        rationale: Organisation is tagged separately
        examples:
          - "derde klas middelbare school (middelbare school = organisation)"

  # 3.6 TEMPORAL_REFERENCE
  - entity_type: TEMPORAL_REFERENCE
    description: >-
      Temporal references including days, dates, campaigns/wars, holidays,
      canonised periods, genitives, and temporal adjectives.

    subcategories:
      - name: days
        description: References to specific days
        examples:
          - "Morgen"
          - "Vanochtend"
          - "Hedenmiddag"
          - "gisteravond"

      - name: days_of_week
        description: Weekday names
        examples:
          - "Maandag"

      - name: deictic_temporal
        description: Deictic temporal adverbs
        examples:
          - "gisteren"
          - "morgen"

      - name: dates
        description: Dates in every calendar
        examples:
          - "Vrijdag 8 November 1957"

      - name: campaigns_wars
        description: Campaigns/wars when referring to time periods
        examples:
          - "Hongitochten"
          - "Tweede Wereldoorlog"
          - "1ste Nederlandse militaire actie"

      - name: holidays_festivals
        description: Holiday and festival names
        examples:
          - "Nieuwjaar"
          - "geboortedag van de Profeet Mohammad"
          - "Heldendag"

      - name: canonised_periods
        description: Canonised historical periods (any historiography)
        examples:
          - "Middeleeuwen (Western Europe)"
          - "Periode van de Strijdende Staten (China)"
          - "Zaman Hindu-budis (Indonesia)"

      - name: genitives
        description: Genitive temporal expressions
        examples:
          - "9 dezer"
          - "Dezer dagen"

      - name: temporal_adjectives
        description: Temporal adjectives before place names
        examples:
          - "eighteenth-century Europe"

    inclusion_rules:
      - rule_id: TMP_INC001
        description: Always tag days unless date is fully written
        conditions:
          - "Days are tagged unless a full date is already present"
        examples:
          - "Afgelopen Vrijdag (tagged)"
          - "Vrijdag 8 November 1957 (not tagged separately, full date)"

      - rule_id: TMP_INC002
        description: Tag campaigns/wars as temporal when referring to period
        conditions:
          - "Campaign/War is tagged as temporal reference when contextualised to refer to time period"
        examples:
          - "Hongitochten"
          - "Tweede Wereldoorlog"

    exclusion_rules:
      - rule_id: TMP_EXC001
        description: Do not tag days when full date is written
        rationale: Full date already captures the information
        examples:
          - "Vrijdag 8 November 1957 (entire date tagged, not Vrijdag separately)"
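    # Illustrative sketch only, kept as comments so the data above is unchanged.
    # It shows how TMP_INC001 and TMP_EXC001 interact: a weekday inside a fully
    # written date yields one span for the whole date, never a nested span for
    # the weekday. The keys `text`, `spans`, `type` and `span` are hypothetical
    # and not part of this file's schema; the span boundary of the first item is
    # an assumption taken from how the example is listed under TMP_INC001.
    #
    # - text: "Afgelopen Vrijdag"
    #   spans:
    #     - {type: TEMPORAL_REFERENCE, span: "Afgelopen Vrijdag"}
    # - text: "Vrijdag 8 November 1957"
    #   spans:
    #     - {type: TEMPORAL_REFERENCE, span: "Vrijdag 8 November 1957"}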

  # 3.7 TEXTUAL_REFERENCE
  - entity_type: TEXTUAL_REFERENCE
    description: >-
      References to written sources, documents, laws, titles of cultural works,
      inventory numbers, accounts, currency types, telephone numbers, URLs,
      programs, policies, agreements, sanctions, statements, surveys,
      stocks, registers, meeting minutes, activities with recorded minutes,
      slogans, proverbs, flags, and honours.

    subcategories:
      - name: radio_frequencies
        description: Radio frequencies
        examples: []

      - name: board_games
        description: Board game names
        examples: []

      - name: reports
        description: Reports and documents
        examples: []

      - name: cultural_titles
        description: Titles of books, songs, movies, pamphlets, records, manuscripts, musicals, programs, magazines, newspapers, journals
        examples:
          - "Stabat Mater van Pergolesi (Pergolesi = person)"

      - name: inventory_numbers
        description: Archive inventory numbers
        examples:
          - "AVS INV. 61855-3"

      - name: accounts
        description: Bank and postal accounts
        examples:
          - "Giro 158225"
          - "postgirorekening No. 400"

      - name: currency_types
        description: Currency types (not amounts)
        examples:
          - "Spaenschen reael"
          - "reael pedangh ofte rycxdaelder"

      - name: mint_runs
        description: Print/mint runs of currency
        examples:
          - "aan gehaalde staven, namentlijk No 72, 73, 74, 76"

      - name: telephone_numbers
        description: Telephone numbers
        examples:
          - "Tel. 020 -72 84 61"

      - name: mailing_lists
        description: Mailing lists and message systems
        examples:
          - "zij organiseerde de 'message box'"

      - name: academic_references
        description: Academic citations
        examples:
          - "(Scheveningen, 1914)"
          - "(Scholte, 1995)"

      - name: page_numbers
        description: Page number references
        examples:
          - "Pagina 39"

      - name: religious_texts
        description: Religious text references
        examples:
          - "Surah 17:19"

      - name: urls
        description: Web URLs
        examples:
          - "www.colonialarchitecture.eu"

      - name: programs
        description: Software or organizational programs
        examples: []

      - name: policies
        description: Publicly announced policies
        examples:
          - "passen-stelsel"
          - "pers breidel"
          - "Manokwari-plan"
          - "Pax Neerlandica"
          - "non-coöperatie"
          - "presidentieel besluit No. 9"
          - "Handvest der Verenigde Naties"

      - name: agreements
        description: Written agreements
        examples: []

      - name: sanctions
        description: Written sanctions
        examples:
          - "Poenale Sanctie"

      - name: statements
        description: Written/recorded statements
        examples:
          - "communicatie-uitingen"

      - name: laws
        description: Legal references
        examples:
          - "artikel 156 alinea 2"

      - name: land_surveys
        description: Land survey documentation
        examples:
          - "verponding No. 63"

      - name: stocks
        description: Stock certificates and registered numbers
        examples:
          - "Controleursgrant no. 3"
          - "acte ddo. 12 December 1927 No. 59"

      - name: advertisement_registers
        description: Registers of advertisements
        examples:
          - "No. 245 44 regels (44 regels = quantity)"

      - name: meeting_minutes
        description: Meeting minutes
        examples: []

      - name: recorded_activities
        description: Activities with recorded minutes/reports
        examples:
          - "Conference"
          - "Forum"
          - "Concert"

      - name: slogans
        description: Political or advertising slogans
        examples: []

      - name: proverbs
        description: Proverbs and adages (not idioms)
        examples:
          - "Wie honing wil eten moet lijden dat de bijen hem steken"
          - "j'en passe et des meilleurs"

      - name: flags
        description: Flag descriptions
        examples:
          - "Rood, wit, blauw"
          - "Bendera Kokki"

      - name: honours
        description: Titles of honours and awards
        examples:
          - "Ridder in de orde van Oranje Nassau"

    inclusion_rules:
      - rule_id: TXT_INC001
        description: Tag currency types as textual references
        conditions:
          - "Currency types (not amounts) are textual references"
        examples:
          - "Spaenschen reael"

      - rule_id: TXT_INC002
        description: Tag publications as textual references not organisations
        conditions:
          - "When the publication itself is mentioned, not the company"
        examples:
          - "In de NRC stond ... (NRC = textual reference)"

      - rule_id: TXT_INC003
        description: Tag recorded activities as textual references
        conditions:
          - "Activities of which minutes or reports have been recorded"
        examples:
          - "Conference"
          - "Forum"
          - "Concert"

    exclusion_rules:
      - rule_id: TXT_EXC001
        description: Do not tag currency amounts as textual references
        rationale: Amounts are quantities
        examples:
          - "ƒ 1.50 (= quantity)"

      - rule_id: TXT_EXC002
        description: Do not tag news agencies as textual references when organizational
        rationale: Context determines if organisation or textual reference
        examples:
          - "De NRC besloot ... (NRC = organisation)"
          - "In de NRC stond ... (NRC = textual reference)"

      - rule_id: TXT_EXC003
        description: Do not confuse with quantities
        rationale: Number of documents is quantity, not textual reference
        examples:
          - "2 brieven (= textual reference to the letters themselves, not the count)"
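    # Illustrative sketch only, kept as comments so the data above is unchanged.
    # It shows the context-sensitive split described by TXT_INC002 and
    # TXT_EXC002: the same token "NRC" is an organisation when it acts and a
    # textual reference when something is printed in it. The keys `text`,
    # `spans`, `type` and `span` are hypothetical and not part of this file's
    # schema; leaving the article "de" outside both spans follows ORG_EXC001
    # and is an assumption for the textual reference.
    #
    # - text: "De NRC besloot ..."
    #   spans:
    #     - {type: ORGANISATION, span: "NRC"}
    # - text: "In de NRC stond ..."
    #   spans:
    #     - {type: TEXTUAL_REFERENCE, span: "NRC"}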