# Entity Annotation Rules Instance - Section 3 of Convention v1.4.3
# Captures all entity annotation rules for the Gado2 annotation scheme
annotation_scheme: "Gado2"
no_double_tagging: true
context_sensitive: true

# General annotation policies
general_rules:
  - "Never use double tagging (e.g., 'Leiden University' is only tagged as organisation, not a place)"
  - "The exact same word can be tagged in different ways depending on its context"
  - "When in doubt, check how words are tagged in previous ground truths"
  - |-
    This convention distinguishes titles from general designations:
    * Titles are used/accepted by subjects (emic description)
    * General designations are imposed upon subjects (etic description)
  - "Transcription status: Done → Final → Ground Truth (GT)"

entity_types:
  # 3.1 PERSON
  - entity_type: PERSON
    description: >-
      Personal names and specific references to persons (named or unnamed).
      Includes persons, animals with proper names, fictional characters, gods,
      spirits, saints and prophets.
    subcategories:
      - name: personal_names
        description: Given name and surname, patronyms
        examples:
          - "Gouverneur-Generaal van Starkenborgh"
          - "President Soekarno"
          - "Sultan Muzaffar"
          - "Dr. Raden Soetomo"
          - "De Heer Balkenende"
          - "mr. dr. Beel"
          - "Tumenggung Sukapura"
          - "Officier Suksapura"
          - "Officier Tumenggung Sukapura"
          - "Lieutenant Winkelaar"
          - "Lieutenant Adolph Winkelaar"
          - "Mr. van Agt"
          - "Mej.
            den Uyl"
          - "Marquis De La Chetardie"
          - "Meneer Drees"
          - "gesaghebber Van Arrewijne"
          - "gewesen Sicacalsen Serlasker Abdul Nabij"
          - "princesse Gamoelamoe (vertaling van Ratu Gamulamu)"
          - "Engelschen vice admiraal Cornisch"
          - "Japara 's resident Lepeltak"
          - "Japaras resident Falck"
          - "resident Falck"
          - "vice admiraal Cornisch"
      - name: animal_names
        description: Proper names of animals
        examples:
          - "Lassie"
          - "Golden Retriever"
      - name: fictional_characters
        description: Proper names of fictional characters
        examples:
          - "Kuifje"
      - name: religious_figures
        description: Gods, spirits, saints and prophets
        examples:
          - "God"
          - "Jesus"
          - "Mohammed"
          - "Heilige Geest"
          - "Rama"
      - name: specific_references
        description: Specific references to a person without their name being mentioned
        examples:
          - "de Koning van Pruisen"
          - "de Koningin van Nederland"
          - "de Sultan van Ternate"
          - "de Minister van Buitenlandse Zaken"
          - "de Gouverneur-Generaal van Nederl.-Indië"
          - "de Koning"
          - "de Sultan"
          - "de Directeur"
          - "Tenno Heika"
          - "zijn Vrouw"
          - "haar slaaf"
          - "de geseijde slavin"
          - "de eerder genoemde leider"
          - "den Capiteijn"
          - "des Conincx"
          - "dit Jongetie"
          - "dat kind"
          - "Zijne Majesteit"
          - "haaren Coning"
          - "dezen Prins"
          - "dien Coning"
          - "onsen Capiteijn"
          - "Bapak itu (Indonesian)"
          - "gemelte gouvern:r"
    inclusion_rules:
      - rule_id: PER_INC001
        description: Include titles with personal names only in exceptional cases
        conditions:
          - "If titles/designations are required to identify a person (e.g., women referred to by husband's surname)"
          - "If only given name or surname are mentioned, titles are always included, but general designations are not"
        examples:
          - "Gouverneur-Generaal van Starkenborgh"
          - "President Soekarno"
          - "Sultan Muzaffar"
      - rule_id: PER_INC002
        description: Include words between title and personal name
        conditions:
          - "The distance between a title and personal name can vary within a single phrase"
          - "If another word stands between the title and the personal name, it will be tagged
            along"
        examples:
          - "Engelschen vice admiraal Cornisch"
      - rule_id: PER_INC003
        description: Include articles and demonstratives with specific references
        conditions:
          - "The article/demonstrative/genitive case/possessive pronouns are tagged along"
        examples:
          - "de Koning van Pruisen"
          - "zijn Vrouw"
          - "des Conincx"
          - "dit Jongetie"
      - rule_id: PER_INC004
        description: Include backreferences
        conditions:
          - "References that point back to previously mentioned persons"
        examples:
          - "gemelte gouvern:r"
    exclusion_rules:
      - rule_id: PER_EXC001
        description: Do not tag series of titles
        rationale: >-
          Series of titles (with or without indications of place names within
          them) are not included
        examples:
          - "Engelschen vice admiraal (in series) - only tag final title + name"
      - rule_id: PER_EXC002
        description: Do not tag abstract references
        rationale: >-
          Abstract references (often indicated by plural forms or indefinite
          articles) are not tagged
        examples:
          - "Een President"
          - "Een Premier"
          - "Een resident"
          - "Een generaal-majoor"
          - "Een Voorzitter"
          - "hij ontving de titel dr."
          - "ambassadeurs"
          - "Koninginnen"
          - "Koningen"
          - "draagt de titel Graaf"
          - "die persoon"
          - "de mannen"
          - "de vrouwen"
          - "die lieden"
          - "de kindere"
          - "Bapak (Indonesisch)"
          - "De slaaf Louis"
      - rule_id: PER_EXC003
        description: Do not tag pronouns
        rationale: >-
          Pronouns are not tagged. They are consistent enough to be traced
          through regular expressions and do not require entity processing.
        examples:
          - "hij"
      - rule_id: PER_EXC004
        description: Do not tag general designations with complete personal names
        rationale: Enhances entity linking process
        examples:
          - "geseijde slavin tanjong (slavin = designation, not tagged)"
          - "de slaaf Anthonij (slaaf = designation, not tagged)"
          - "de Javaanschen Cap:n Soeta Wangsa (Javaanschen Cap:n = designation)"
          - "seekeren Moor op Batavia gen:t Anthonij (Moor = designation)"
      - rule_id: PER_EXC005
        description: Do not include associated organisations and places
        rationale: These are tagged separately
        examples:
          - "De Minister van buitenlandse zaken (buitenlandse zaken = organisation, separate)"

  # 3.2 PLACE
  - entity_type: PLACE
    description: >-
      Geographic locations including streets, cities, provinces, countries,
      continents, infrastructure, landforms, public spaces, buildings, and
      astronomical objects.
    subcategories:
      - name: street_names_addresses
        description: Street names and addresses
        examples:
          - "Laan van Meerdervoort"
      - name: trajectories_directions
        description: Trajectories and wind directions
        examples:
          - "schoone wegh met seer schoone water plaatsen versien, streckende meest O:Z:O. en W:N:W.
            aen de noordt zijde"
          - "Oostzijde van Molenvliet"
      - name: locations
        description: Dorp, Stad, Provincie, Land, Continent, Bisdom, Zones
        examples:
          - "Sectie [0-9, I-V]"
      - name: infrastructure
        description: Brug, Haven, Dam
        examples: []
      - name: landform
        description: Berg, Gebergte, Bos, Rivier, Bron, Rijstvelden, Vallei, Tuin, Natuurreservaat, Nationaal park, Plantage, Strand
        examples: []
      - name: public_space
        description: >-
          Plein, Veld, Theater, Museum, School, Markt, Vliegtuig, Station,
          Zwembad, Ziekenhuis, Sportveld, Bioscoop, Tentoonstelling, Campus,
          Lanceerplatform, Club, Huis, Universiteit, Bibliotheek, Bidruimte,
          Medisch centrum, Parkeergarage, Speeltuin, Grafplaats, Bedrijvenpark
        examples: []
      - name: companies_as_places
        description: >-
          Companies only if contextualised as places (otherwise organisation):
          Apotheek, Bar, Restaurant, Eettent, Depot, Hotel, Hostel, Fabriek,
          Nachtclub, Muziekpodium
        examples: []
      - name: buildings
        description: >-
          Huis, Wooncomplex, Klooster, Kleuterzaal, Flat, Kazerne, Fort,
          Verzorgingshuis, Winkelcentrum, Paleis
        examples: []
      - name: astronomical_objects
        description: Zon, Aarde, Maan, Planeten, Comets (often named after persons/dates)
        examples: []
      - name: coordinates
        description: Geographic coordinates
        examples:
          - "Z.
            breete van 35 — 31 ten langte 5 — 15"
          - "parabel van 36 — 50"
          - "3½ graed of 42 mijlen bewesten"
    inclusion_rules:
      - rule_id: PLC_INC001
        description: Include relevant adjectives
        conditions:
          - "Directional and descriptive adjectives are tagged with place names"
        examples:
          - "Noord/Oost/Zuid/West(-kust)"
          - "Aziatische-Afrikaanse landen"
          - "Amerikaanse defensie-gebied"
          - "Asia Raya"
          - "Muang Thai"
          - "Indonesische eilandenrijk"
          - "Indonesische republiek"
          - "Delische plantagegebied"
          - "West-Javaanse Bandoeng"
          - "Joodse gedeelte"
      - rule_id: PLC_INC002
        description: Include metonymy references
        conditions:
          - "Metonymy does not disqualify a place name (entity linking will classify as organisation later)"
        examples:
          - "Moskou weigert"
          - "Den Haag bepaalt"
          - "Nederland scoort"
      - rule_id: PLC_INC003
        description: Include articles for generic place references
        conditions:
          - "Articles are included when the place reference is generic"
        examples:
          - "De Laan van Meerdervoort"
          - "in het comptoir"
    exclusion_rules:
      - rule_id: PLC_EXC001
        description: Do not include articles for specific place names
        rationale: Articles are not part of proper place names
        examples:
          - "Amsterdam (not 'de Amsterdam')"
      - rule_id: PLC_EXC002
        description: Do not tag representation of places as places
        rationale: These are person references
        examples:
          - "De ambassadeur van Nederland (De ambassadeur = Person, not Nederland)"

  # 3.3 ORGANISATION
  - entity_type: ORGANISATION
    description: >-
      Organizations including companies, institutions, governments, branches,
      associations, legislative bodies, political parties, military forces,
      sports teams, meetings, bands, religious orders, and ships.
    subcategories:
      - name: companies
        description: Studio, Bank, etc.
        examples: []
      - name: branches
        description: Departments and organizational branches
        examples:
          - "ING Rotterdam"
          - "Rekenkamer Gemeente Rotterdam"
          - "Afdelingsbestuur NVM afdeling Arnhem"
          - "NVM Arnhem"
      - name: associations
        description: Coöperatie, Markt (if not place)
        examples: []
      - name: public_facilities
        description: School, Universiteit (if not already tagged as place)
        examples: []
      - name: legislative_body
        description: Tweede kamer, etc.
        examples: []
      - name: grand_residence
        description: Paleis (when referring to institution)
        examples: []
      - name: printer
        description: Printing houses
        examples: []
      - name: news_agency
        description: News agencies and media organizations
        examples:
          - "De NRC besloot"
          - "medewerkers bij De Expres"
          - "(cor. Volkskrant)"
          - "hoofdredacteur van het Bataviaasch Nieuwsblad"
      - name: media_campaign
        description: PR campaigns
        examples:
          - "Hij organiseert de PR campagne"
      - name: factory
        description: Manufacturing facilities as organizations
        examples: []
      - name: political_party
        description: Political parties
        examples: []
      - name: international_organisation
        description: International organizations
        examples:
          - "Verenigde Naties"
      - name: resistance_movement
        description: Resistance movements
        examples: []
      - name: authorities
        description: >-
          Government, Ministries, Councils, Courts
        examples:
          - "Ministerie van Financiën"
          - "Overzeese Rijksdelen"
          - "Buitenlandse Zaken"
          - "Raad voor Aangelegenheden van Indonesië (RAVI)"
      - name: dynasties
        description: Royal and ruling dynasties
        examples:
          - "Omajjaden dynastie"
          - "Nasriden dynastie"
      - name: military_forces
        description: Army, Army units
        examples: []
      - name: sports_team
        description: Sports teams
        examples: []
      - name: sport_tournament
        description: Championship, Match
        examples: []
      - name: meeting
        description: Conference
        examples: []
      - name: band_orchestra
        description: Musical groups
        examples:
          - "Dé Carels"
      - name: theatre_group
        description: Theatre groups
        examples: []
      - name: religious_order
        description: Religious
          orders
        examples: []
      - name: ship
        description: Named ships
        examples:
          - "Stoomschip Sumatra"
    inclusion_rules:
      - rule_id: ORG_INC001
        description: Tag branches with placenames without prepositions
        conditions:
          - "Placenames indicating branches are only tagged if there is no preposition between the organization and the placename"
        examples:
          - "ING Rotterdam (both tagged as organisation)"
          - "Rekenkamer Gemeente Rotterdam"
          - "NVM Arnhem"
      - rule_id: ORG_INC002
        description: Tag frequently repeated references to organisational groups
        conditions:
          - "Frequently repeated references to denominations which refer to organisations are tagged"
        examples:
          - "aen generael en raden (raden = council itself)"
    exclusion_rules:
      - rule_id: ORG_EXC001
        description: Do not include articles
        rationale: Articles are not part of organization names
        examples:
          - "Tweede Kamer (not 'de Tweede Kamer' in annotation)"
      - rule_id: ORG_EXC002
        description: Do not tag abbreviations separately
        rationale: Abbreviations are tagged with full name or separately as needed
        examples: []
      - rule_id: ORG_EXC003
        description: Do not tag groups lacking formal structure
        rationale: These are considered denominations
        examples:
          - "De jongerenbeweging"
          - "De vakbond bewegingen"
          - "Aleis"
      - rule_id: ORG_EXC004
        description: Do not tag representatives as organisations
        rationale: These are persons or denominations
        examples:
          - "De Minister van Buitenlandse Zaken (Minister = person)"
      - rule_id: ORG_EXC005
        description: Tag publications as textual references not organisations
        conditions:
          - "When the publication itself is referenced, not the company"
        examples:
          - "In de NRC stond ...
            (NRC = textual reference)"
      - rule_id: ORG_EXC006
        description: Separate placenames with prepositions
        rationale: Place is tagged separately
        examples:
          - "Rechtbank te Carcassonne (Carcassonne = place, separate)"
          - "Regering van Oostenrijk (Oostenrijk = place, separate)"

  # 3.4 DENOMINATION
  - entity_type: DENOMINATION
    description: >-
      Ethnicity, profession, religion, demonym, ideology, language, community
      references. Includes adjectives referring to places/organisations/religions,
      demonyms, languages, pejorative terms, professions, ideological
      affiliations, and group references.
    subcategories:
      - name: adjective_phrases
        description: Phrases containing adjectives referring to place/organisation/religion/ideology/language/community
        examples:
          - "Islamitische gemeenschap"
          - "Marxistische overtuiging"
          - "Turkse taal"
      - name: demonym
        description: References to people from a place
        examples:
          - "Chinees"
          - "China"
          - "Arabier"
          - "Westerlingen"
          - "Bandunger"
          - "Brabander"
          - "Europeanen"
          - "Europeesche dame"
          - "Indonesische jeugd"
      - name: language
        description: Language names as nouns
        examples:
          - "Duits"
          - "Fries"
          - "Azerbeidzjaans"
      - name: abstract_bureaucratic
        description: >-
          Implicitly refer to ideological groups or departments, typical for
          late 20th/early 21st century texts
        examples:
          - "mix van communicatie- en beleidsverantwoordelijken"
      - name: time_zone
        description: Time zone references
        examples:
          - "Zuid-Sumatratijd"
      - name: religion
        description: Religion names as nouns
        examples:
          - "Protestantisme"
      - name: pejorative
        description: Pejorative terms
        examples:
          - "Slaaf"
          - "Koelie"
          - "Inlander"
          - "Zwarte"
          - "Boschnegers"
          - "Roodharige barbaren"
          - "Totok"
          - "Mohammedanen"
      - name: profession
        description: Professional titles and roles
        examples:
          - "Schrijver"
          - "Klerk"
          - "Agent"
          - "Ministers"
          - "Vertegenwoordiger"
          - "Hoofden"
          - "Mandoer"
      - name: religious_ideological_members
        description: Members of religions/ideologies
        examples:
          - "Kapitalist"
          - "Communist"
          - "Moslim"
          - "Christen"
          - "Pro-Russisch"
          - "Anti-abortus"
      - name: group_references
        description: General nouns referring to groups through prepositions/possessives
        examples:
          - "Volk van West-Irian (West-Irian = place)"
          - "De heren van de Volkskrant (Volkskrant = organisation)"
          - "KPN medewerkers (KPN = organisation)"
          - "die van sammadang (sammadang = place)"
          - "Baccherachs volck (Baccherach = person)"
    inclusion_rules:
      - rule_id: DEN_INC001
        description: Tag both adjective and noun in denomination phrases
        conditions:
          - "When a phrase contains an adjective referring to place/organisation/religion/ideology"
        examples:
          - "Islamitische gemeenschap (both words tagged)"
          - "Marxistische overtuiging"
      - rule_id: DEN_INC002
        description: Tag profession/pejorative when it appears alone without a name
        conditions:
          - "Reference to profession or pejorative occurs without a personal name"
          - "Follow rules from section 3.1.A"
        examples:
          - "De minister (= person, not denomination)"
          - "De slaaf rent weg (de slaaf = person)"
    exclusion_rules:
      - rule_id: DEN_EXC001
        description: Do not tag organisations as denominations
        rationale: Organisations have formal structure
        examples:
          - "Nederlandse groep"
          - "De Nederlandse Bank (= organisation)"
      - rule_id: DEN_EXC002
        description: Do not tag currencies as denominations
        rationale: Currencies are textual references or quantities
        examples:
          - "Spaenschen reael (= textual reference or quantity)"
      - rule_id: DEN_EXC003
        description: Do not tag denominations with numerals as denominations
        rationale: These are quantities
        examples:
          - "Twee Nederlanders (= quantity)"
          - "Een twintigtal soldaten (= quantity)"
      - rule_id: DEN_EXC004
        description: Do not tag associated places/organisations/persons
        rationale: These are tagged separately with their own type
        examples:
          - "Volk van West-Irian (West-Irian = place, separate)"
          - "KPN medewerkers (KPN = organisation, separate)"

  # 3.5 QUANTITY
  - entity_type: QUANTITY
    description: >-
      Quantities including currency, merchandise counts, people counts, age, school
      class, weapons, settlements, area, distance, calibre, enumerations, carat,
      degree, and weight.
    subcategories:
      - name: currency
        description: Monetary amounts
        examples:
          - "ƒ 1.50"
          - "twee gulden"
      - name: merchandise
        description: Counts of goods
        examples: []
      - name: people
        description: Counts of people
        examples:
          - "23: inlandsche zieken"
      - name: troops
        description: Military unit quantities
        examples:
          - "vijfhonderd weerbare mannen"
          - "4. van de macassaeren"
      - name: age
        description: Age expressions
        examples:
          - "honderdjarige leeftijd"
      - name: school_class
        description: School class levels
        examples:
          - "derde klas middelbare school"
      - name: weapon
        description: Weapon quantities
        examples:
          - "ruijm 1000. stx:s schiet geweeren"
      - name: settlement
        description: Settlement counts
        examples:
          - "twee negorijen"
      - name: area
        description: Area measurements
        examples: []
      - name: distance
        description: Distance measurements
        examples:
          - "25 mijlen"
          - "2 mijl in zee"
          - "drie â vier dagen varens"
          - "5 uuren oostwaartsheenen"
      - name: calibre
        description: Weapon calibre
        examples:
          - "kaliber 6.5"
      - name: enumeration
        description: Lists of counted items
        examples:
          - "3 schootels, zadel, 2 stijgh beugels"
      - name: carat
        description: Gold/gem quality measure
        examples:
          - "gouden sieraden 22, 23 en 24 Kt"
      - name: degree
        description: Degree measurements
        examples: []
      - name: weight
        description: Weight measurements
        examples:
          - "14940 石 quiksilver"
    inclusion_rules:
      - rule_id: QTY_INC001
        description: Infer single items in enumerations
        conditions:
          - "In enumerations, items without explicit numbers are assumed to be singular"
        examples:
          - "3 schootels, zadel, 2 stijgh beugels (zadel = 1 saddle, tagged as quantity)"
      - rule_id: QTY_INC002
        description: Tag denominations with numerals as quantities
        conditions:
          - "Denominations preceded by numerals or quantitative adjectives become quantities"
        examples:
          - "Twee Nederlanders (= quantity)"
          - "Een twintigtal soldaten (twintigtal soldaten = quantity)"
      - rule_id:
          QTY_INC003
        description: Tag travel time as distance not temporal
        conditions:
          - "Time expressions measuring distance are quantities not temporal references"
        examples:
          - "drie â vier dagen varens (not temporal reference)"
          - "5 uuren oostwaartsheenen (not temporal reference)"
    exclusion_rules:
      - rule_id: QTY_EXC001
        description: Do not tag textual references as quantities
        rationale: Written sources are textual references even with numbers
        examples:
          - "2 brieven (= textual reference, not quantity)"
      - rule_id: QTY_EXC002
        description: Do not tag associated organisations as quantities
        rationale: Organisation is tagged separately
        examples:
          - "derde klas middelbare school (middelbare school = organisation)"

  # 3.6 TEMPORAL_REFERENCE
  - entity_type: TEMPORAL_REFERENCE
    description: >-
      Temporal references including days, dates, campaigns/wars, holidays,
      canonised periods, genitives, and temporal adjectives.
    subcategories:
      - name: days
        description: References to specific days
        examples:
          - "Morgen"
          - "Vanochtend"
          - "Hedenmiddag"
          - "gisteravond"
      - name: days_of_week
        description: Weekday names
        examples:
          - "Maandag"
      - name: deictic_temporal
        description: Deictic temporal pronouns
        examples:
          - "gisteren"
          - "morgen"
      - name: dates
        description: Dates in every calendar
        examples:
          - "Vrijdag 8 November 1957"
      - name: campaigns_wars
        description: Campaigns/wars when referring to time periods
        examples:
          - "Hongitochten"
          - "Tweede Wereldoorlog"
          - "1ste Nederlandse militaire actie"
      - name: holidays_festivals
        description: Holiday and festival names
        examples:
          - "Nieuwjaar"
          - "geboortedag van de Profeet Mohammad"
          - "Heldendag"
      - name: canonised_periods
        description: Canonised historical periods (any historiography)
        examples:
          - "Middeleeuwen (Western Europe)"
          - "Periode van de Strijdende Staten (China)"
          - "Zaman Hindu-budis (Indonesia)"
      - name: genitives
        description: Genitive temporal expressions
        examples:
          - "9 dezer"
          - "Dezer dagen"
      - name: temporal_adjectives
        description: Temporal adjectives before place
          names
        examples:
          - "eighteenth-century Europe"
    inclusion_rules:
      - rule_id: TMP_INC001
        description: Always tag days unless date is fully written
        conditions:
          - "Days are tagged unless a full date is already present"
        examples:
          - "Afgelopen Vrijdag (tagged)"
          - "Vrijdag 8 November 1957 (not tagged separately, full date)"
      - rule_id: TMP_INC002
        description: Tag campaigns/wars as temporal when referring to period
        conditions:
          - "Campaign/War is tagged as temporal reference when contextualised to refer to time period"
        examples:
          - "Hongitochten"
          - "Tweede Wereldoorlog"
    exclusion_rules:
      - rule_id: TMP_EXC001
        description: Do not tag days when full date is written
        rationale: Full date already captures the information
        examples:
          - "Vrijdag 8 November 1957 (entire date tagged, not Vrijdag separately)"

  # 3.7 TEXTUAL_REFERENCE
  - entity_type: TEXTUAL_REFERENCE
    description: >-
      References to written sources, documents, laws, titles of cultural works,
      inventory numbers, accounts, currency types, telephone numbers, URLs,
      programs, policies, agreements, sanctions, statements, surveys, stocks,
      registers, meeting minutes, activities with recorded minutes, slogans,
      proverbs, flags, and honours.
    subcategories:
      - name: radio_frequencies
        description: Radio frequencies
        examples: []
      - name: board_games
        description: Board game names
        examples: []
      - name: reports
        description: Reports and documents
        examples: []
      - name: cultural_titles
        description: Titles of books, songs, movies, pamphlets, records, manuscripts, musicals, programs, magazines, newspapers, journals
        examples:
          - "Stabat Mater van Pergolesi (Pergolesi = person)"
      - name: inventory_numbers
        description: Archive inventory numbers
        examples:
          - "AVS INV. 61855-3"
      - name: accounts
        description: Bank and postal accounts
        examples:
          - "Giro 158225"
          - "postgirorekening No.
            400"
      - name: currency_types
        description: Currency types (not amounts)
        examples:
          - "Spaenschen reael"
          - "reael pedangh ofte rycxdaelder"
      - name: mint_runs
        description: Print/mint runs of currency
        examples:
          - "aan gehaalde staven, namentlijk No 72, 73, 74, 76"
      - name: telephone_numbers
        description: Telephone numbers
        examples:
          - "Tel. 020 -72 84 61"
      - name: mailing_lists
        description: Mailing lists and message systems
        examples:
          - "zij organiseerde de 'message box'"
      - name: academic_references
        description: Academic citations
        examples:
          - "(Scheveningen, 1914)"
          - "(Scholte, 1995)"
      - name: page_numbers
        description: Page number references
        examples:
          - "Pagina 39"
      - name: religious_texts
        description: Religious text references
        examples:
          - "Surah 17:19"
      - name: urls
        description: Web URLs
        examples:
          - "www.colonialarchitecture.eu"
      - name: programs
        description: Software or organizational programs
        examples: []
      - name: policies
        description: Publicly announced policies
        examples:
          - "passen-stelsel"
          - "pers breidel"
          - "Manokwari-plan"
          - "Pax Neerlandica"
          - "non-coöperatie"
          - "presidentieel besluit No. 9"
          - "Handvest der Verenigde Naties"
      - name: agreements
        description: Written agreements
        examples: []
      - name: sanctions
        description: Written sanctions
        examples:
          - "Poenale Sanctie"
      - name: statements
        description: Written/recorded statements
        examples:
          - "communicatie-uitingen"
      - name: laws
        description: Legal references
        examples:
          - "artikel 156 alinea 2"
      - name: land_surveys
        description: Land survey documentation
        examples:
          - "verponding No. 63"
      - name: stocks
        description: Stock certificates and registered numbers
        examples:
          - "Controleursgrant no. 3"
          - "acte ddo. 12 December 1927 No. 59"
      - name: advertisement_registers
        description: Registers of advertisements
        examples:
          - "No.
            245 44 regels (44 regels = quantity)"
      - name: meeting_minutes
        description: Meeting minutes
        examples: []
      - name: recorded_activities
        description: Activities with recorded minutes/reports
        examples:
          - "Conference"
          - "Forum"
          - "Concert"
      - name: slogans
        description: Political or advertising slogans
        examples: []
      - name: proverbs
        description: Proverbs and adages (not idioms)
        examples:
          - "Wie honing wil eten moet lijden dat de bijen hem steken"
          - "j'en passe et des meilleurs"
      - name: flags
        description: Flag descriptions
        examples:
          - "Rood, wit, blauw"
          - "Bendera Kokki"
      - name: honours
        description: Titles of honours and awards
        examples:
          - "Ridder in de orde van Oranje Nassau"
    inclusion_rules:
      - rule_id: TXT_INC001
        description: Tag currency types as textual references
        conditions:
          - "Currency types (not amounts) are textual references"
        examples:
          - "Spaenschen reael"
      - rule_id: TXT_INC002
        description: Tag publications as textual references not organisations
        conditions:
          - "When the publication itself is mentioned, not the company"
        examples:
          - "In de NRC stond ... (NRC = textual reference)"
      - rule_id: TXT_INC003
        description: Tag recorded activities as textual references
        conditions:
          - "Activities of which minutes or reports have been recorded"
        examples:
          - "Conference"
          - "Forum"
          - "Concert"
    exclusion_rules:
      - rule_id: TXT_EXC001
        description: Do not tag currency amounts as textual references
        rationale: Amounts are quantities
        examples:
          - "ƒ 1.50 (= quantity)"
      - rule_id: TXT_EXC002
        description: Do not tag news agencies as textual references when organizational
        rationale: Context determines if organisation or textual reference
        examples:
          - "De NRC besloot ... (NRC = organisation)"
          - "In de NRC stond ... (NRC = textual reference)"
      - rule_id: TXT_EXC003
        description: Do not confuse with quantities
        rationale: Number of documents is quantity, not textual reference
        examples:
          - "2 brieven (= textual reference to the letters themselves, not the count)"
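
# Worked examples (illustrative sketch, not part of Convention v1.4.3).
# The rules above are context-sensitive, so the same word can receive different
# tags. The fragment below restates a few of the rule examples as span-level
# decisions to make that concrete. The key names used here (worked_examples,
# phrase, spans, text, tag, rule) are hypothetical and only serve this
# illustration; the phrases and tag decisions are copied from the rule
# examples above, and only spans that those examples state explicitly are listed.
worked_examples:
  - phrase: "De NRC besloot"
    spans:
      - text: "NRC"                      # article is left out (ORG_EXC001)
        tag: ORGANISATION
        rule: TXT_EXC002
  - phrase: "In de NRC stond"
    spans:
      - text: "NRC"                      # the publication itself is meant
        tag: TEXTUAL_REFERENCE
        rule: TXT_INC002
  - phrase: "Rechtbank te Carcassonne"
    spans:
      - text: "Carcassonne"              # preposition present, so tagged separately
        tag: PLACE
        rule: ORG_EXC006
  - phrase: "Twee Nederlanders"
    spans:
      - text: "Twee Nederlanders"        # numeral turns the denomination into a quantity
        tag: QUANTITY
        rule: QTY_INC002
  - phrase: "Vrijdag 8 November 1957"
    spans:
      - text: "Vrijdag 8 November 1957"  # full date tagged as one span
        tag: TEMPORAL_REFERENCE
        rule: TMP_EXC001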