glam/docs/convention/schema/layout_rules_instance.yaml
2025-12-02 14:36:01 +01:00

# Layout Rules Instance - Section 1 of Convention v1.4.3
# Captures all layout correction rules for transcribed documents
text_region_types:
  - name: PARAGRAPH
    description: >-
      The running text within the type area. This is the main body text
      of the document.
    ordering_rules: >-
      Text lines within paragraphs should be ordered top-to-bottom based
      on baseline coordinates.
  - name: PAGE_NUMBER
    description: >-
      A number (in digits or written out), a letter, or a combination of
      both, indicating the order of a page or folium in a book or other type
      of writing.
    ordering_rules: >-
      Page numbers are typically located at top or bottom margins and should
      be identified as separate regions.
  - name: HEADER
    description: >-
      A general text at the top margin of a page or paragraph which can be
      assigned to multiple sections within a source.
    ordering_rules: >-
      Headers should be ordered before main body text when processing pages.
  - name: FOOTER
    description: >-
      A general text at the lower margin of a page or paragraph which can be
      assigned to multiple sections within a source.
    ordering_rules: >-
      Footers should be ordered after main body text when processing pages.
  - name: HEADING
    description: >-
      A title or index designation which only applies to a single section of
      a source, i.e., the paragraphs written directly underneath it.
    ordering_rules: >-
      Headings should be ordered immediately before the paragraphs they describe.
  - name: FOOTNOTE
    description: >-
      Indexed annotations and references which occur underneath the running text
      across multiple pages in a successive order.
    ordering_rules: >-
      Footnotes should be ordered by their index number/symbol and associated
      with their reference in main text.
  - name: TABLE
    description: >-
      Indices in which the layout of the text is more important than the syntax.
    ordering_rules: >-
      Table cells should preserve their row-column structure. Cell content
      should be ordered left-to-right, top-to-bottom.
  - name: MARGINALIA
    description: >-
      Notes, scribbles, and commentary in the margins of pages.
    ordering_rules: >-
      Marginalia should be associated with adjacent main text but marked as
      separate regions.
  - name: CAPTION
    description: >-
      A description of an image, located close to it, often directly
      underneath.
    ordering_rules: >-
      Captions should be associated with their images and ordered after the
      image they describe.
  - name: COLOPHON
    description: >-
      A piece of text or section of a page in which the author or scribes of
      a textual source are mentioned or in which the creation, place of writing,
      or the delivery of the source are specified.
    ordering_rules: >-
      Colophons typically appear at the end of documents or sections.
baseline_rules:
  - rule_id: BL001
    description: Remove transcribed text on pages in the background
    applies_to: >-
      Text regions that do not belong to the current page being transcribed
    action: REMOVE
  - rule_id: BL002
    description: Shorten baselines extending to decorative textual elements
    applies_to: >-
      Baselines that incorrectly extend into decorative elements such as
      illuminated letters, flourishes, or ornamental borders
    action: SHORTEN
  - rule_id: BL003
    description: Add space dividers for unusually long distances between words
    applies_to: >-
      Distances between words which are longer than usual considering the
      handwriting style
    action: ADJUST
  - rule_id: BL004
    description: Split baseline when word distance exceeds half baseline length
    applies_to: >-
      When the distance between words extends beyond half of the total length
      of the baseline
    action: SPLIT
  - rule_id: BL005
    description: Connect inserted texts to main baseline
    applies_to: >-
      Inserted texts between baselines need to be connected to the main baseline
      of which they are part. This also applies to Lombardic capitals.
    action: MERGE
  - rule_id: BL006
    description: Cut text region in half when lines cross columns
    applies_to: >-
      Text lines that cross columns or text regions that extend too far
    action: SPLIT
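# Worked illustration of BL004 (a sketch only; the figures and the pixel unit
# are hypothetical and not part of the convention):
#   baseline length: 1200 px, so half the baseline is 1200 / 2 = 600 px
#   gap of 650 px between two words: 650 > 600, so the baseline is split (SPLIT)
#   gap of 400 px between two words: 400 < 600, so BL004 does not apply; BL003
#   may still apply if the gap is unusually long for the handwriting style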
text_line_ordering:
  method: coordinate-based
  applies_to_region: PARAGRAPH
  description: >-
    In Transkribus Expert Client: Click on 'Layout', select the text region
    (or 'Page' to order text regions), then click on 'Assign Child Shapes'.
    Text lines will be ordered based on their coordinates.