# Layout Rules Instance - Section 1 of Convention v1.4.3
# Captures all layout correction rules for transcribed documents

text_region_types:
  - name: PARAGRAPH
    description: >-
      The running text within the type area. This is the main body text of the
      document.
    ordering_rules: >-
      Text lines within paragraphs should be ordered top-to-bottom based on
      baseline coordinates.

  - name: PAGE_NUMBER
    description: >-
      A number (in digits or written out), a letter, or a combination of both
      indicating the order of a page or folium in a book or other type of
      writing.
    ordering_rules: >-
      Page numbers are typically located at top or bottom margins and should be
      identified as separate regions.

  - name: HEADER
    description: >-
      A general text at the top margin of a page or paragraph which can be
      assigned to multiple sections within a source.
    ordering_rules: >-
      Headers should be ordered before main body text when processing pages.

  - name: FOOTER
    description: >-
      A general text at the lower margin of a page or paragraph which can be
      assigned to multiple sections within a source.
    ordering_rules: >-
      Footers should be ordered after main body text when processing pages.

  - name: HEADING
    description: >-
      A title or index designation which only applies to a single section of a
      source, i.e., the paragraphs written directly underneath it.
    ordering_rules: >-
      Headings should be ordered immediately before the paragraphs they
      describe.

  - name: FOOTNOTE
    description: >-
      Indexed annotations and references which occur underneath the running
      text across multiple pages in a successive order.
    ordering_rules: >-
      Footnotes should be ordered by their index number/symbol and associated
      with their reference in main text.

  - name: TABLE
    description: >-
      Indices in which the layout of the text is more important than the
      syntax.
    ordering_rules: >-
      Table cells should preserve their row-column structure. Cell content
      should be ordered left-to-right, top-to-bottom.

  - name: MARGINALIA
    description: >-
      Notes, scribbles, and commentary in the margins of pages.
    ordering_rules: >-
      Marginalia should be associated with adjacent main text but marked as
      separate regions.

  - name: CAPTION
    description: >-
      Description of an image, located close to the image and often directly
      underneath it.
    ordering_rules: >-
      Captions should be associated with their images and ordered after the
      image they describe.

  - name: COLOPHON
    description: >-
      A piece of text or section of a page in which the author or scribes of a
      textual source are mentioned or in which the creation, place of writing,
      or the delivery of the source are specified.
    ordering_rules: >-
      Colophons typically appear at the end of documents or sections.

baseline_rules:
  - rule_id: BL001
    description: Remove transcribed text on pages in the background
    applies_to: >-
      Text regions that do not belong to the current page being transcribed
    action: REMOVE

  - rule_id: BL002
    description: Shorten baselines extending to decorative textual elements
    applies_to: >-
      Baselines that incorrectly extend into decorative elements such as
      illuminated letters, flourishes, or ornamental borders
    action: SHORTEN

  - rule_id: BL003
    description: Add space dividers for unusually long distances between words
    applies_to: >-
      Distances between words which are longer than usual considering the
      handwriting style
    action: ADJUST

  - rule_id: BL004
    description: Split baseline when word distance exceeds half baseline length
    applies_to: >-
      Baselines in which the distance between two words exceeds half of the
      total length of the baseline
    action: SPLIT

  - rule_id: BL005
    description: Connect inserted texts to main baseline
    applies_to: >-
      Inserted texts between baselines need to be connected to the main
      baseline of which they are part. This also applies to Lombardic capitals.
    action: MERGE

  - rule_id: BL006
    description: Cut text region in half when lines cross columns
    applies_to: >-
      Text lines that cross columns or text regions that extend too far
    action: SPLIT

text_line_ordering:
  method: coordinate-based
  applies_to_region: PARAGRAPH
  description: >-
    In Transkribus Expert Client: Click on 'Layout', select the text region
    (or 'Page' to order text regions), then click on 'Assign Child Shapes'.
    Text lines will be ordered based on their coordinates.
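
# Illustrative notes (hypothetical values, not part of the convention):
# - Coordinate-based ordering, assuming image coordinates with y increasing
#   downward: lines with baseline y = 120, 80, 200 would be ordered
#   80, 120, 200 (top-to-bottom).
# - BL003 vs. BL004: on a baseline 1000 px long, a 520 px gap between two
#   words exceeds half the baseline length (500 px), so BL004 (SPLIT) would
#   apply. A shorter gap that is still unusually wide for the handwriting,
#   e.g. 300 px, would fall under BL003 (ADJUST) instead.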