# Layout Rules Instance - Section 1 of Convention v1.4.3
# Captures all layout correction rules for transcribed documents

text_region_types:
  - name: PARAGRAPH
    description: >-
      The running text within the type area. This is the main body text
      of the document.
    ordering_rules: >-
      Text lines within paragraphs should be ordered top-to-bottom based
      on baseline coordinates.

  - name: PAGE_NUMBER
    description: >-
      A number (in digits or written out), letter, or combination of both
      that indicates the order of a page or folium in a book or other type
      of writing.
    ordering_rules: >-
      Page numbers are typically located at top or bottom margins and should
      be identified as separate regions.

  - name: HEADER
    description: >-
      A general text at the top margin of a page or paragraph which can be
      assigned to multiple sections within a source.
    ordering_rules: >-
      Headers should be ordered before main body text when processing pages.

  - name: FOOTER
    description: >-
      A general text at the lower margin of a page or paragraph which can be
      assigned to multiple sections within a source.
    ordering_rules: >-
      Footers should be ordered after main body text when processing pages.

  - name: HEADING
    description: >-
      A title or index designation which only applies to a single section of
      a source, i.e., the paragraphs written directly underneath it.
    ordering_rules: >-
      Headings should be ordered immediately before the paragraphs they describe.

  - name: FOOTNOTE
    description: >-
      Indexed annotations and references which occur underneath the running text
      across multiple pages in a successive order.
    ordering_rules: >-
      Footnotes should be ordered by their index number/symbol and associated
      with their reference in main text.

  - name: TABLE
    description: >-
      Indices in which the layout of the text is more important than the syntax.
    ordering_rules: >-
      Table cells should preserve their row-column structure. Cell content
      should be ordered left-to-right, top-to-bottom.

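  # Illustrative sketch of the TABLE ordering rule above, using a hypothetical
  # 2x2 table that is not part of the convention. Reading left-to-right, then
  # top-to-bottom, would give this cell order:
  #   row 1, col 1 -> row 1, col 2 -> row 2, col 1 -> row 2, col 2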
  - name: MARGINALIA
    description: >-
      Notes, scribbles, and commentary in the margins of pages.
    ordering_rules: >-
      Marginalia should be associated with adjacent main text but marked as
      separate regions.

  - name: CAPTION
    description: >-
      Description of an image, located close to it and often directly
      underneath it.
    ordering_rules: >-
      Captions should be associated with their images and ordered after the
      image they describe.

  - name: COLOPHON
    description: >-
      A piece of text or section of a page in which the author or scribes of
      a textual source are mentioned or in which the creation, place of writing,
      or the delivery of the source are specified.
    ordering_rules: >-
      Colophons typically appear at the end of documents or sections.

baseline_rules:
  - rule_id: BL001
    description: Remove transcribed text on pages in the background
    applies_to: >-
      Text regions that do not belong to the current page being transcribed
    action: REMOVE

  - rule_id: BL002
    description: Shorten baselines extending to decorative textual elements
    applies_to: >-
      Baselines that incorrectly extend into decorative elements such as
      illuminated letters, flourishes, or ornamental borders
    action: SHORTEN

  - rule_id: BL003
    description: Add space dividers for unusually long distances between words
    applies_to: >-
      Distances between words which are longer than usual considering the
      handwriting style
    action: ADJUST

  - rule_id: BL004
    description: Split baseline when word distance exceeds half baseline length
    applies_to: >-
      When the distance between words extends beyond half of the total length
      of the baseline
    action: SPLIT

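  # Illustrative check for BL004, using hypothetical pixel values that are not
  # taken from the convention. A baseline 1200 px long has a half-length of
  # 600 px, so:
  #   gap of 650 px between two words -> exceeds 600 px -> SPLIT
  #   gap of 500 px between two words -> below 600 px -> BL004 does not apply
  #     (BL003 may still call for a space divider)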
  - rule_id: BL005
    description: Connect inserted texts to main baseline
    applies_to: >-
      Inserted texts between baselines need to be connected to the main baseline
      of which they are part. This also applies to Lombardic capitals.
    action: MERGE

  - rule_id: BL006
    description: Cut text region in half when lines cross columns
    applies_to: >-
      Text lines that cross columns or text regions that extend too far
    action: SPLIT

text_line_ordering:
  method: coordinate-based
  applies_to_region: PARAGRAPH
  description: >-
    In Transkribus Expert Client: Click on 'Layout', select the text region
    (or 'Page' to order text regions), then click on 'Assign Child Shapes'.
    Text lines will be ordered based on their coordinates.
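# Illustrative sketch of coordinate-based ordering, using hypothetical baseline
# y-coordinates and assuming y grows downward as in image coordinates. It
# follows the top-to-bottom rule stated for PARAGRAPH regions:
#   line A at y=340, line B at y=120, line C at y=230
#   resulting reading order: B (120), C (230), A (340)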