glam/data/entity_annotation/modules/integrations/pico.yaml
2025-12-23 13:27:35 +01:00

4322 lines
163 KiB
YAML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# =============================================================================
# GLAM-NER Entity Annotation Convention v1.7.0
# Module: integrations/pico.yaml
# =============================================================================
# PiCO (Person in Context Ontology) integration for person observation modeling.
# Enables tracking provenance of person mentions and linking to formal records.
#
# Key concepts:
# - PersonObservation: A textual mention of a person (source-bound)
# - PersonName (PNV): Structured name components
# - Person (CIDOC-CRM E21): Reconstructed person entity
#
# References:
# - PiCo Ontology: https://w3id.org/pico
# - Person Name Vocabulary (PNV): https://w3id.org/pnv
# - CIDOC-CRM: https://www.cidoc-crm.org/
# =============================================================================
pico_integration:
description: |
PiCO (Person in Context Ontology) models textual observations of persons
as distinct from reconstructed person entities. This enables:
- Tracking provenance of person mentions
- Handling name variations across sources
- Linking observations to formal person records
The observation/reconstruction pattern separates:
1. What was OBSERVED in text (PersonObservation) - source-bound, exact
2. What was RECONSTRUCTED as entity (E21_Person) - inferred, normalized
This is critical for heritage data where the same person may appear with
different name forms, titles, or spellings across sources.
# ---------------------------------------------------------------------------
# Core Observation Pattern
# ---------------------------------------------------------------------------
observation_pattern:
description: "Every person mention creates a PersonObservation"
class: "picom:PersonObservation"
class_uri: "https://w3id.org/pico/PersonObservation"
properties:
- property: "picom:hasObservedName"
description: "The name string as it appears in text"
range: "pnv:PersonName"
cardinality: "1"
note: "Exact transcription of name from source"
- property: "picom:isObservationOf"
description: "Links to reconstructed Person entity"
range: "crm:E21_Person"
cardinality: "0..1"
note: "May be null if person not yet identified"
- property: "prov:hadPrimarySource"
description: "The source document/webpage"
range: "prov:Entity"
cardinality: "1"
note: "Required for provenance tracking"
- property: "picom:observedAt"
description: "When the observation was made"
range: "xsd:dateTime"
cardinality: "1"
note: "Extraction timestamp, not document date"
- property: "picom:observedInContext"
description: "Surrounding text context"
range: "xsd:string"
cardinality: "0..1"
note: "For disambiguation when reviewing"
- property: "picom:hasRole"
description: "Role/position observed with the person"
range: "xsd:string"
cardinality: "0..*"
note: "Links to ROLE hypernym when extracted"
# ---------------------------------------------------------------------------
# Person Name Vocabulary (PNV)
# ---------------------------------------------------------------------------
pnv_name_structure:
description: |
Person Name Vocabulary (PNV) provides structured name components.
This enables proper parsing of complex name structures across cultures.
class: "pnv:PersonName"
class_uri: "https://w3id.org/pnv/PersonName"
components:
- property: "pnv:literalName"
description: "Full name as single string"
examples:
- "Dr. Maria van den Berg"
- "Rembrandt Harmenszoon van Rijn"
- "Queen Elizabeth II"
note: "Original string before parsing"
- property: "pnv:givenName"
description: "First/given name"
examples:
- "Rembrandt"
- "Maria"
- "Jan"
- "Elizabeth"
note: "Personal name, not surname"
- property: "pnv:patronym"
description: "Patronymic name component"
examples:
- "Harmenszoon"
- "Janszoon"
- "Pietersdochter"
note: "Common in Dutch, Scandinavian, Slavic names"
- property: "pnv:surnamePrefix"
description: "Prefix to surname (tussenvoegsel)"
examples:
- "van"
- "de"
- "van den"
- "van der"
- "op de"
- "'t"
- "von"
- "di"
note: "Language-specific, affects sorting"
- property: "pnv:baseSurname"
description: "Core surname without prefix"
examples:
- "Rijn"
- "Berg"
- "Velde"
- "Gogh"
note: "Primary sorting component in Dutch"
- property: "pnv:honorificPrefix"
description: "Title or honorific before name"
examples:
- "Dr."
- "Prof."
- "Prof. dr."
- "Sir"
- "Queen"
- "Mr."
- "Drs."
- "Ir."
note: "May indicate role - link to ROL"
- property: "pnv:honorificSuffix"
description: "Title or honorific after name"
examples:
- "PhD"
- "Jr."
- "III"
- "MD"
- "RA"
- "MSc"
note: "Credentials and generational markers"
- property: "pnv:infixTitle"
description: "Title within name structure"
examples:
- "graaf van"
- "baron de"
- "duke of"
note: "Nobility titles embedded in name"
# ---------------------------------------------------------------------------
# Dutch Name Conventions (Project-Specific)
# ---------------------------------------------------------------------------
dutch_name_patterns:
description: |
Special handling for Dutch names with tussenvoegsels (surname prefixes).
Dutch sorting rules differ from other languages.
tussenvoegsel_list:
- "van"
- "van de"
- "van den"
- "van der"
- "de"
- "den"
- "het"
- "'t"
- "ter"
- "ten"
- "op de"
- "op den"
- "in 't"
- "in de"
sorting_rule: |
In Dutch, surnames sort by baseSurname, ignoring tussenvoegsel.
"Vincent van Gogh" sorts under "G" not "V".
"Maria van den Berg" sorts under "B" not "V".
capitalization_rule: |
Tussenvoegsel lowercase when preceded by given name:
- "Vincent van Gogh" (not "Vincent Van Gogh")
- "Van Gogh" (surname alone, capitalized)
- "de heer Van Gogh" (formal, capitalized)
# ---------------------------------------------------------------------------
# Integration with GLAM-NER Hypernyms
# ---------------------------------------------------------------------------
hypernym_mapping:
description: "How PiCo concepts map to GLAM-NER v1.7.0 hypernyms"
mappings:
- pico_class: "picom:PersonObservation"
glam_hypernym: "AGT.PER"
glam_code: "AGT.PER"
note: "Person observations create AGT.PER entities"
- pico_class: "picom:PersonObservation"
glam_hypernym: "AGT.STF"
glam_code: "AGT.STF"
condition: "When observed with organizational role"
note: "Staff members with role context"
- pico_class: "pnv:PersonName"
glam_hypernym: "APP.NAM"
glam_code: "APP.NAM"
note: "Name strings as appellations"
- pico_class: "picom:hasRole"
glam_hypernym: "ROL"
glam_code: "ROL"
note: "Extracted roles link to ROL hypernym"
# ---------------------------------------------------------------------------
# Example Annotations
# ---------------------------------------------------------------------------
examples:
- description: "Staff member with title and role"
text: "Dr. Maria van den Berg, Director"
observation:
type: "picom:PersonObservation"
id: "_:obs1"
hasObservedName:
type: "pnv:PersonName"
literalName: "Dr. Maria van den Berg"
honorificPrefix: "Dr."
givenName: "Maria"
surnamePrefix: "van den"
baseSurname: "Berg"
hasRole: "Director"
hadPrimarySource: "https://example.org/staff-page"
observedAt: "2025-12-02T10:30:00Z"
glam_ner_annotations:
- span: "Dr. Maria van den Berg"
type: "AGT.STF"
code: "AGT.STF"
confidence: 0.95
- span: "Director"
type: "ROL.TIT"
code: "ROL.TIT"
confidence: 0.98
- description: "Historical artist"
text: "Rembrandt van Rijn painted this in 1642"
observation:
type: "picom:PersonObservation"
id: "_:obs2"
hasObservedName:
type: "pnv:PersonName"
literalName: "Rembrandt van Rijn"
givenName: "Rembrandt"
surnamePrefix: "van"
baseSurname: "Rijn"
isObservationOf: "wd:Q5598" # Wikidata Rembrandt
hadPrimarySource: "https://example.org/artwork-page"
observedAt: "2025-12-02T10:35:00Z"
glam_ner_annotations:
- span: "Rembrandt van Rijn"
type: "AGT.PER"
code: "AGT.PER"
confidence: 0.99
linking:
wikidata: "Q5598"
viaf: "64013650"
- description: "Nobility title"
text: "Count Willem van Loon"
observation:
type: "picom:PersonObservation"
id: "_:obs3"
hasObservedName:
type: "pnv:PersonName"
literalName: "Count Willem van Loon"
honorificPrefix: "Count"
givenName: "Willem"
surnamePrefix: "van"
baseSurname: "Loon"
hadPrimarySource: "https://example.org/archive-doc"
observedAt: "2025-12-02T10:40:00Z"
glam_ner_annotations:
- span: "Count Willem van Loon"
type: "AGT.PER"
code: "AGT.PER"
confidence: 0.95
- span: "Count"
type: "ROL.HON"
code: "ROL.HON"
note: "Nobility title - honorific role"
# ---------------------------------------------------------------------------
# Provenance Chain
# ---------------------------------------------------------------------------
provenance_model:
description: |
PiCo observations maintain full provenance chain:
Observation → Source Document → Extraction Activity → Agent
This enables:
- Tracking where each name form was found
- Attributing extractions to human/ML agents
- Maintaining audit trail for corrections
chain_structure:
observation:
class: "picom:PersonObservation"
properties:
- "prov:hadPrimarySource" # → Source document
- "prov:wasGeneratedBy" # → Extraction activity
source:
class: "prov:Entity"
properties:
- "prov:wasAttributedTo" # → Publisher/author
- "dct:created" # → Document date
activity:
class: "prov:Activity"
properties:
- "prov:wasAssociatedWith" # → Extraction agent
- "prov:used" # → ML model or rules
- "prov:startedAtTime" # → Extraction timestamp
agent:
class: "prov:Agent"
examples:
- "Human curator"
- "spaCy NER model"
- "GLAM-NER extraction pipeline"
# =============================================================================
# SOURCE TYPE EXTENSIONS
# =============================================================================
#
# PiCo PersonObservation can be extracted from many source types.
# Each source type may have specific extraction patterns, but the core
# PiCo model (observation → name → roles → provenance) remains the same.
#
# Source-specific extraction logic belongs in APPLICATION LAYER scripts,
# not in this convention. This section defines the ABSTRACT patterns.
# =============================================================================
source_type_patterns:
description: |
PersonObservation sources fall into categories with different extraction
patterns. The CH-Annotator handles all source types using the same
core PiCo model, with source-specific field mappings at extraction time.
# ---------------------------------------------------------------------------
# Source Categories
# ---------------------------------------------------------------------------
categories:
modern_digital:
description: "Contemporary digital sources with structured data"
examples:
- "LinkedIn profiles"
- "Institutional staff directories"
- "Academic profile pages"
- "ORCID records"
characteristics:
- "Semi-structured HTML/JSON"
- "Current/living persons"
- "Self-reported information"
- "Timestamped updates"
typical_properties:
- "sdo:name"
- "sdo:jobTitle"
- "sdo:hasOccupation"
- "sdo:alumniOf"
- "sdo:knowsAbout"
historical_indices:
description: "Early modern and historical name indices"
examples:
- "Notarial protocol indices"
- "Church register indices"
- "Census indices"
- "Guild membership lists"
- "Property transfer records"
characteristics:
- "Abbreviated names"
- "Patronymics common"
- "Latin/vernacular mixing"
- "Occupation as identifier"
- "Relational identification ('wife of', 'son of')"
typical_properties:
- "pnv:literalName"
- "pnv:patronym"
- "picom:hasRole"
- "crm:P107_has_current_or_former_member"
- "sdo:spouse"
- "sdo:parent"
archival_descriptions:
description: "Finding aids, inventories, and archival descriptions"
examples:
- "EAD finding aids"
- "ISAD(G) descriptions"
- "Collection inventories"
- "RiC-O records"
characteristics:
- "Hierarchical context"
- "Provenance-focused"
- "Creator/contributor roles"
- "Temporal spans"
typical_properties:
- "rico:hasCreator"
- "rico:hasOrHadHolder"
- "crm:P14_carried_out_by"
- "crm:P11_had_participant"
biographical_dictionaries:
description: "Structured biographical reference works"
examples:
- "Dictionary of National Biography"
- "KNAW DWDD"
- "Allgemeines Künstlerlexikon"
- "Thieme-Becker"
characteristics:
- "Standardized entries"
- "Birth/death dates"
- "Career summaries"
- "Cross-references"
typical_properties:
- "sdo:birthDate"
- "sdo:deathDate"
- "sdo:birthPlace"
- "sdo:deathPlace"
- "crm:P98_brought_into_life"
- "crm:P100_was_death_of"
# ---------------------------------------------------------------------------
# Universal Observation Properties (All Source Types)
# ---------------------------------------------------------------------------
universal_properties:
description: |
These properties apply to PersonObservation regardless of source type.
They form the core of the PiCo extraction model.
required:
- property: "picom:hasObservedName"
description: "The name string as it appears in source"
range: "pnv:PersonName"
- property: "prov:hadPrimarySource"
description: "The source document/webpage/record"
range: "prov:Entity"
- property: "picom:observedAt"
description: "When the observation was extracted"
range: "xsd:dateTime"
optional:
- property: "picom:isObservationOf"
description: "Links to reconstructed Person entity (if identified)"
range: "crm:E21_Person"
- property: "picom:hasRole"
description: "Role/position observed with the person"
range: "org:Role"
- property: "picom:observedInContext"
description: "Surrounding text for disambiguation"
range: "xsd:string"
- property: "picom:confidence"
description: "Confidence score for extraction"
range: "xsd:decimal"
# ---------------------------------------------------------------------------
# Heritage Relevance Detection (Universal)
# ---------------------------------------------------------------------------
heritage_relevance:
description: |
Person observations can be tagged for heritage sector relevance using
GLAMORCUBESFIXPHDNT type codes. This applies to all source types.
type_codes:
G: "Gallery"
L: "Library"
A: "Archive"
M: "Museum"
O: "Official institution"
R: "Research center"
C: "Corporation"
U: "Unknown"
B: "Botanical garden / Zoo"
E: "Education provider"
S: "Collecting society"
F: "Feature / Monument"
I: "Intangible heritage"
X: "Mixed types"
P: "Personal collection"
H: "Holy site"
D: "Digital platform"
N: "NGO"
T: "Taste/smell heritage"
detection_approach: |
Heritage relevance detection is SOURCE-SPECIFIC and belongs in the
application layer, not the convention. The convention defines:
1. The type code vocabulary (GLAMORCUBESFIXPHDNT)
2. The property for tagging (picom:heritageRelevance)
3. The expected format (single-letter code + confidence)
Application scripts implement source-specific keyword detection,
organization matching, or ML classification to populate this field.
# =============================================================================
# GLM-4.6 CH-ANNOTATOR INTEGRATION
# =============================================================================
#
# The CH-Annotator can be invoked via GLM-4.7 API for automated extraction.
# The system prompt is SOURCE-AGNOSTIC and works with any text input.
# =============================================================================
glm_annotator_config:
model: "glm-4.7"
api_endpoint: "https://api.z.ai/api/coding/paas/v4/chat/completions"
temperature: 0.1
max_tokens: 4000
# ---------------------------------------------------------------------------
# Core System Prompt (Source-Agnostic)
# ---------------------------------------------------------------------------
system_prompt: |
You are a CH-Annotator (Cultural Heritage Annotator) v1.7.0 extraction agent
with PiCo (Person in Context) ontology integration.
## Your Task
Extract structured person observation data from the provided source text.
The source may be a modern digital profile, historical index, archival
description, or any other document containing person references.
## Core PiCo Pattern
Every person mention creates a PersonObservation that is:
- SOURCE-BOUND: Exact transcription from source, no normalization
- PROVENANCE-TRACKED: Linked to source document and extraction timestamp
- RECONSTRUCTION-READY: Can be linked to formal Person entity later
## Person Name Vocabulary (PNV)
Parse names into components (use null for missing parts):
- literalName: Full name exactly as written in source
- givenName: First/given name
- patronym: Patronymic (Janszoon, -dochter, bin, ibn, mac)
- surnamePrefix: Tussenvoegsel/particle (van, de, von, di, du)
- baseSurname: Core surname without prefix
- honorificPrefix: Title before name (Dr., Prof., Heer, Meester)
- honorificSuffix: Credentials after name (PhD, Jr., III)
- initials: Initials with periods (e.g., "P.R.", "C.Joh.")
## Language-Specific Name Rules
### Dutch
- Tussenvoegsel lowercase after given name: "Jan van Gogh"
- Capitalized when standalone: "Van Gogh painted..."
- Common: van, de, van de, van den, van der, 't, 's, op de
### Historical/Latin
- Patronymics: -zoon/-zn, -dochter/-dr, -s (Janszoon, Pietersdochter)
- Latinized forms: -us, -ius endings (Erasmus Roterodamus)
- Occupational surnames may be literal (de bakker = the baker)
## Role Extraction
Extract roles/occupations with temporal bounds when available:
- Role title exactly as stated
- Associated organization (link to GRP hypernym if institution)
- Start/end dates or period
- Heritage relevance code if applicable (GLAMORCUBESFIXPHDNT)
- Role in source context (from picot_roles thesaurus):
* child, parent, spouse, witness, declarant, bride, groom, godparent, etc.
## Biographical Properties
Extract when present in source (use null if not stated):
- birth_date / death_date: ISO format (YYYY, YYYY-MM, or YYYY-MM-DD)
- birth_place / death_place: Place name as written
- gender: "Male" or "Female" (only if explicitly stated or inferable)
- age: Age as stated (e.g., "30", "4 months", "about 25")
- religion: Religious affiliation if mentioned
- deceased: true only if death indicated but date unknown
- address: Physical address as recorded in source
- floruit: Active period if birth/death unknown
## Family Relationship Extraction
CRITICAL: For PersonObservations, family relationships MUST refer to OTHER
persons mentioned in the SAME source document. Cross-source relationships
belong to PersonReconstructions.
### Core Family Relationships
- parent: A parent of the person (use sdo:parent)
- children: Children of the person (use sdo:children)
- spouse: Current spouse (use sdo:spouse)
- sibling: Brother or sister (use sdo:sibling)
### Extended Family
- grandparent / grandchild
- uncle_aunt / nephew_niece
- cousin (symmetric)
### Step/Half Relations
- stepparent / stepchild
- stepsibling
- half_sibling (one shared parent)
### Ritual/Legal Kinship (common in historical records)
- godparent / godchild: Baptismal sponsors
- foster_parent / foster_child
- legitimized_child: Child recognized through marriage/legal act
### In-Law Relations
- parent_in_law / child_in_law
- sibling_in_law
### Former Partners
- widow_of: Surviving spouse of deceased (subject is the survivor)
- previous_partner: Former spouse/partner
### Historical Source Patterns
Common relationship indicators in historical documents (by language):
**Dutch**: "huijsvrou van" (wife), "zoon van" (son of), "weduwe van" (widow),
"peter/meter" (godfather/godmother), "getuige" (witness)
**Latin**: "filius/filia" (son/daughter), "uxor" (wife), "vidua" (widow),
"quondam" (the late)
**German**: "Ehefrau von" (wife), "Sohn/Tochter von" (son/daughter of),
"Witwe von" (widow of)
**Arabic** (نسب - patronymic): "ابن/بن" (ibn/bin - son of), "بنت" (bint - daughter of),
"زوج/زوجة" (zawj/zawja - husband/wife), "أرملة" (armala - widow),
"المرحوم" (al-marhum - the late), "آل" (Al - family of)
**Ottoman Turkish**: "oğlu" (son of), "kızı" (daughter of), "zevcesi" (wife),
"merhum/merhume" (the late)
**French**: "fils/fille de" (son/daughter of), "épouse de" (wife of),
"veuve de" (widow of), "feu/feue" (the late)
**Hebrew**: "בן/בת" (ben/bat - son/daughter of), "אשת" (eshet - wife of),
"אלמנה" (almana - widow), "ז״ל" (z"l - of blessed memory)
**Persian/Farsi**: "پسر/دختر" (pesar/dokhtar - son/daughter), "زن" (zan - wife),
"بیوه" (biveh - widow), "مرحوم" (marhum - the late)
**Spanish**: "hijo/hija de" (son/daughter of), "esposa de" (wife of),
"viuda de" (widow of), "padrino/madrina" (godfather/godmother)
**Portuguese**: "filho/filha de" (son/daughter of), "esposa de" (wife of),
"viúva de" (widow of), "padrinho/madrinha" (godfather/godmother)
For comprehensive patterns (10 languages): modules/relationships/family.yaml
## Source Types (for source_type field)
Use appropriate category:
- modern_digital: LinkedIn, staff directories, ORCID
- historical_indices: Notarial protocols, guild lists
- civil_registration: Birth/marriage/death certificates
- church_records: Baptism, marriage, burial registers
- archival_descriptions: Finding aids, inventories
- biographical_dictionaries: DNB, AKL, reference works
- census: Population census records
## Output Format
Return ONLY valid JSON (no markdown, no explanation):
{
"pico_observation": {
"observation_id": "<source-derived-id>",
"observed_at": "<extraction-timestamp>",
"source_type": "<source_category>",
"source_reference": "<source-identifier>"
},
"persons": [
{
"person_index": 0,
"pnv_name": {
"literalName": "Name as written",
"givenName": null,
"patronym": null,
"surnamePrefix": null,
"baseSurname": null,
"honorificPrefix": null,
"honorificSuffix": null,
"initials": null
},
"roles": [
{
"role_title": "Role as stated",
"role_in_source": "child|declarant|witness|bride|groom|null",
"organization": "Org name if mentioned",
"period": "Temporal info if available",
"heritage_relevant": false,
"heritage_type": null
}
],
"biographical": {
"birth_date": null,
"death_date": null,
"birth_place": null,
"death_place": null,
"gender": null,
"age": null,
"religion": null,
"deceased": null,
"address": null,
"floruit": null
},
"family_relationships": {
"parent": [],
"children": [],
"spouse": [],
"sibling": [],
"grandparent": [],
"grandchild": [],
"uncle_aunt": [],
"nephew_niece": [],
"cousin": [],
"stepparent": [],
"stepchild": [],
"stepsibling": [],
"half_sibling": [],
"foster_parent": [],
"foster_child": [],
"godparent": [],
"godchild": [],
"parent_in_law": [],
"child_in_law": [],
"sibling_in_law": [],
"previous_partner": [],
"widow_of": null
},
"context": "Surrounding text for disambiguation"
}
],
"organizations_mentioned": [
{
"name": "Organization name",
"type": "Heritage type code or null",
"role_in_source": "employer|creator|publisher|etc"
}
],
"temporal_references": [
{
"expression": "Date/period as written",
"normalized": "ISO date if parseable",
"type": "DATE|DURATION|SET"
}
],
"locations_mentioned": [
{
"name": "Place name as written",
"type": "city|region|country|address"
}
]
}
## Relationship Reference Format
Family relationship arrays contain references to other persons in same source:
- Use person_index (integer) to reference persons array position
- Include target_name for readability
Example for a marriage record:
```json
{
"person_index": 0,
"pnv_name": {"literalName": "Jan Pietersz"},
"family_relationships": {
"spouse": [{"person_index": 1, "target_name": "Maria Jansdr"}],
"parent": [{"person_index": 2, "target_name": "Pieter Jansz"}]
}
}
```
## Critical Rules
1. ONLY extract data that EXISTS in the source. NEVER fabricate.
2. Use null for missing fields, [] for empty arrays.
3. Preserve original spelling/language from source.
4. heritage_type must be single-letter GLAMORCUBESFIXPHDNT code.
5. For historical sources, preserve archaic spellings exactly.
6. Extract ALL persons mentioned, not just the primary subject.
7. Family relationships MUST reference persons in SAME source only.
8. Use person_index for relationship references (0-based array index).
9. Gender: only "Male"/"Female"/null - never infer without evidence.
10. Age: preserve as stated, include qualifier ("about 25", "4 months").
11. For role_in_source, use picot_roles terms when applicable.
# ---------------------------------------------------------------------------
# LinkedIn Profile System Prompt (Modern Digital Sources) - COMPACT VERSION
# ---------------------------------------------------------------------------
# This prompt is optimized for GLM-4.6 context window. The original verbose
# version caused empty responses. This compact version maintains all essential
# extraction logic while staying under 2KB.
# ---------------------------------------------------------------------------
linkedin_system_prompt: |
You are a LinkedIn Profile Annotator for heritage sector analysis.
## Task
Extract structured career data from LinkedIn profile markdown into JSON.
## Output JSON Structure
{
"pico_observation": {
"observation_id": "linkedin-[slug]-[date]",
"source_type": "modern_digital",
"source_reference": "[LinkedIn URL]"
},
"persons": [{
"pnv_name": {"literalName": "...", "givenName": "...", "surnamePrefix": "van/de or null", "baseSurname": "..."},
"headline": "...",
"location": "...",
"connections": "...",
"about": "...",
"profile_image_url": "LinkedIn CDN URL or null",
"heritage_relevant_experience": [{
"role": "Job Title",
"organization": {
"name": "Org",
"linkedin_url": "https://linkedin.com/company/...",
"heritage_type": "M|A|L|R|G|E|D|O|B|H|F|S|C or null",
"industry": "...",
"size": "...",
"founded": "...",
"ownership": "Nonprofit|Public|Private"
},
"time_period": {"start": "YYYY-MM", "end": "YYYY-MM or null", "is_current": true/false, "duration": "..."},
"location": {"city": "...", "region": "...", "country": "..."},
"department": "...",
"level": "...",
"description": "..."
}],
"other_experience": [],
"education": [{"institution": "...", "linkedin_url": "...", "degree": "...", "field": "...", "time_period": {"start": "YYYY", "end": "YYYY"}}],
"skills": [],
"languages": [{"language": "...", "proficiency": "..."}]
}],
"organizations_mentioned": [{"name": "...", "linkedin_url": "...", "heritage_type": "...", "industry": "...", "size": "...", "founded": "...", "ownership": "...", "role_in_source": "employer|education"}]
}
## Heritage Type Codes
M=Museum, A=Archive, L=Library, R=Research, G=Gallery, E=Education, D=Digital, O=Official, B=Botanical/Zoo, H=Holy site, F=Feature, S=Society, C=Corporation, null=Non-heritage
## Extraction Rules
1. Parse [Text](url) markdown links - extract BOTH text AND url
2. Company metadata appears after "Company:" - parse all fields (size, founded, ownership, industry)
3. Only heritage_relevant_experience if org is GLAM-related (museums, archives, libraries, research institutes, galleries, etc.)
4. other_experience for non-heritage orgs (retail, finance, tech, etc.)
5. For Dutch names: van, de, van der, van den → surnamePrefix
6. Date parsing: "Jan 2022" → "2022-01", "Present" → end: null + is_current: true
7. Copy duration string exactly: "3 years and 10 months"
8. NEVER fabricate data - use null for missing fields
9. Return ONLY valid JSON, no markdown code blocks, no explanation
# =============================================================================
# PERSON RECONSTRUCTION PATTERN
# =============================================================================
#
# PersonReconstruction is a reconstructed person entity derived from one or
# more PersonObservations. It represents the scholarly consensus about a
# historical person based on available evidence.
# =============================================================================
person_reconstruction_pattern:
description: |
A PersonReconstruction is created by linking one or more PersonObservations
to form a unified person entity. This is the scholarly interpretation layer
that connects source-bound observations to a conceptual person.
Key distinction:
- PersonObservation: What is OBSERVED in a specific source (exact transcription)
- PersonReconstruction: What is INFERRED about the person (normalized, linked)
A single PersonReconstruction may derive from observations across:
- Multiple sources (birth record + marriage record + death record)
- Different time periods (mentions across decades)
- Various name forms ("Jan Jansz" + "Johannes Jansen" + "J. Jansen")
class: "pico:PersonReconstruction"
class_uri: "https://personsincontext.org/model#PersonReconstruction"
superclass: "pico:Person"
required_properties:
- property: "prov:wasDerivedFrom"
description: "Links to source PersonObservation(s)"
range: "pico:PersonObservation"
cardinality: "1..*"
note: "Every reconstruction MUST link to at least one observation"
- property: "prov:wasGeneratedBy"
description: "Links to the reconstruction Activity"
range: "prov:Activity"
cardinality: "1"
note: "Documents how/when/by whom reconstruction was created"
optional_properties:
- property: "prov:wasRevisionOf"
description: "Links to previous version of this reconstruction"
range: "pico:PersonReconstruction"
cardinality: "0..1"
note: "For tracking updates to reconstructions over time"
- property: "sdo:name"
description: "Normalized/preferred name form"
range: "xsd:string"
note: "The canonical name for this person"
- property: "sdo:additionalName"
description: "Structured name following PNV"
range: "pnv:PersonName"
note: "Full name breakdown using Person Name Vocabulary"
- property: "sdo:givenName"
description: "Given/first name"
range: "xsd:string"
- property: "sdo:familyName"
description: "Family/surname"
range: "xsd:string"
- property: "sdo:gender"
description: "Gender of the person"
range: "sdo:GenderType"
values: ["sdo:Male", "sdo:Female"]
- property: "sdo:birthDate"
description: "Birth date (ISO 8601)"
range: "xsd:date"
note: "May be incomplete: YYYY, YYYY-MM, or YYYY-MM-DD"
- property: "sdo:birthPlace"
description: "Place of birth"
range: "xsd:string or xsd:anyURI"
note: "Prefer linking to GeoNames or Wikidata"
- property: "sdo:deathDate"
description: "Death date (ISO 8601)"
range: "xsd:date"
- property: "sdo:deathPlace"
description: "Place of death"
range: "xsd:string or xsd:anyURI"
example:
description: "PersonReconstruction derived from multiple observations"
turtle: |
cbg:person_reconstruction_anna_koppen
a pico:PersonReconstruction ;
sdo:name "Anna Maria Koppen" ;
sdo:familyName "Koppen" ;
sdo:givenName "Anna Maria" ;
sdo:gender sdo:Female ;
sdo:birthPlace "Haarlem" ;
sdo:birthDate "1860-03-31"^^xsd:date ;
sdo:deathPlace "Detroit, USA" ;
sdo:deathDate "1926"^^xsd:gYear ;
prov:wasDerivedFrom nha:marriage_1885_po_1 ,
cbg:emigration_1887_po_1 ,
us:death_1926_po_1 ;
prov:wasGeneratedBy cbg:reconstruction_activity_01 .
# =============================================================================
# SOURCE AND SCAN CLASSES
# =============================================================================
#
# Sources (sdo:ArchiveComponent) and Scans (sdo:ImageObject) document where
# PersonObservations were extracted from. Essential for provenance.
# =============================================================================
source_classes:
archive_component:
description: |
A Source document from which PersonObservations are extracted.
PiCo does not aim to fully describe archival sources (use RiC-O or DC for that),
but requires minimal identification for provenance tracking.
class: "sdo:ArchiveComponent"
class_uri: "https://schema.org/ArchiveComponent"
superclass: "sdo:CreativeWork"
properties:
- property: "sdo:name"
description: "Identifying name for the source"
range: "xsd:string"
cardinality: "1"
note: "Combine title, date, archive location for identification"
example: "BS Marriage Haarlem, November 11, 1885, certificate number 321"
- property: "sdo:additionalType"
description: "Type of source document"
range: "picot_sourcetypes:Concept"
note: "Use PiCo SourceType thesaurus"
- property: "sdo:dateCreated"
description: "Date the source was created"
range: "xsd:date"
- property: "sdo:holdingArchive"
description: "Institution holding the source"
range: "xsd:anyURI"
note: "Link to heritage custodian (GHCID or Wikidata)"
- property: "sdo:url"
description: "Permalink to the source"
range: "sdo:URL"
note: "Preferably a persistent identifier"
- property: "sdo:contentLocation"
description: "Geographic coverage of the source"
range: "xsd:string or xsd:anyURI"
- property: "sdo:associatedMedia"
description: "Link to scan(s) of the source"
range: "sdo:ImageObject"
cardinality: "0..*"
image_object:
description: |
A Scan of a source document. Links to the digital image at the holding archive.
class: "sdo:ImageObject"
class_uri: "https://schema.org/ImageObject"
superclass: "sdo:CreativeWork"
properties:
- property: "sdo:url"
description: "URL to the full scan"
range: "sdo:URL"
note: "Preferably IIIF manifest"
- property: "sdo:thumbnail"
description: "URL to thumbnail image"
range: "sdo:ImageObject"
- property: "sdo:embedUrl"
description: "URL to image viewer"
range: "sdo:URL"
- property: "sdo:position"
description: "Position in sequence of scans"
range: "xsd:int"
note: "For multi-page sources"
# =============================================================================
# BIOGRAPHICAL PROPERTIES
# =============================================================================
#
# Properties for capturing biographical details about persons in observations.
# These appear in the source and are transcribed to the observation.
# =============================================================================
biographical_properties:
description: |
Biographical properties capture personal details as they appear in sources.
These are used for both PersonObservation (source-bound) and
PersonReconstruction (normalized).
age:
property: "pico:hasAge"
property_uri: "https://personsincontext.org/model#hasAge"
description: "Age of person as stated in source"
range: "xsd:string"
domain: "pico:PersonObservation"
note: |
Used when birth date unknown but age is recorded.
Age assumed in years unless specified ("4" = 4 years, "4 months" = 4 months).
Numerical preferred over text ("4" not "four").
examples:
- "30"
- "4 months"
- "about 25"
religion:
property: "pico:hasReligion"
property_uri: "https://personsincontext.org/model#hasReligion"
description: "Religious affiliation as stated in source"
range: "xsd:string or xsd:anyURI"
domain: "pico:Person"
note: "Can link to SKOS thesaurus for religions"
examples:
- "Catholic"
- "Reformed"
- "Jewish"
deceased:
property: "pico:deceased"
property_uri: "https://personsincontext.org/model#deceased"
description: "Indication that person is deceased (when death date unknown)"
range: "xsd:boolean"
domain: "pico:PersonObservation"
note: |
Only used when deathDate is unknown but death is indicated.
A person without deathDate and without deceased:true is assumed alive.
Important for privacy considerations in publishing person records.
gender:
property: "sdo:gender"
property_uri: "https://schema.org/gender"
description: "Gender of the person"
range: "sdo:GenderType"
domain: "pico:Person"
values:
- uri: "sdo:Male"
label: "Male"
- uri: "sdo:Female"
label: "Female"
address:
property: "sdo:address"
property_uri: "https://schema.org/address"
description: "Physical address as mentioned in source"
range: "xsd:string"
domain: "pico:PersonObservation"
note: "Address exactly as recorded in source"
initials:
property: "pnv:initials"
property_uri: "https://w3id.org/pnv#initials"
description: "Initials of given name(s)"
range: "xsd:string"
domain: "pnv:PersonName"
note: "Each initial followed by period (e.g., 'P.R.', 'H.A.F.M.O.')"
examples:
- "P.R."
- "C.Joh."
- "H.A.F.M.O."
# =============================================================================
# FAMILY RELATIONSHIP PROPERTIES
# =============================================================================
#
# PiCo defines extensive family relationship properties for genealogical data.
# These enable modeling complex family structures from historical records.
# =============================================================================
family_relationships:
description: |
Family relationship properties link persons within and across sources.
Rules:
- For PersonObservations: relationships refer to OTHER observations on SAME source
- For PersonReconstructions: relationships refer to other reconstructions
Property characteristics:
- Symmetric: If A hasRelation B, then B hasRelation A (spouses, siblings, cousins)
- Transitive: hasAncestor/hasDescendant chain through generations
- Inverse pairs: parent/children, grandparent/grandchild, etc.
# ---------------------------------------------------------------------------
# Core Family (Schema.org)
# ---------------------------------------------------------------------------
core_relationships:
- property: "sdo:parent"
property_uri: "https://schema.org/parent"
description: "A parent of the person"
inverse: "sdo:children"
subPropertyOf: ["sdo:relatedTo", "pico:hasAncestor"]
note: "Biological or legal parent"
- property: "sdo:children"
property_uri: "https://schema.org/children"
description: "A child of the person"
inverse: "sdo:parent"
subPropertyOf: ["sdo:relatedTo", "pico:hasDescendant"]
- property: "sdo:spouse"
property_uri: "https://schema.org/spouse"
description: "The person's spouse"
symmetric: true
subPropertyOf: "sdo:relatedTo"
- property: "sdo:sibling"
property_uri: "https://schema.org/sibling"
description: "A brother or sister"
symmetric: true
subPropertyOf: "sdo:relatedTo"
# ---------------------------------------------------------------------------
# Transitive Ancestry (PiCo)
# ---------------------------------------------------------------------------
ancestry_relationships:
- property: "pico:hasAncestor"
property_uri: "https://personsincontext.org/model#hasAncestor"
description: "Any ancestor (parent, grandparent, etc.)"
type: "owl:TransitiveProperty"
inverse: "pico:hasDescendant"
note: "Not used directly; parent→parent chains automatically create ancestors"
- property: "pico:hasDescendant"
property_uri: "https://personsincontext.org/model#hasDescendant"
description: "Any descendant (child, grandchild, etc.)"
type: "owl:TransitiveProperty"
inverse: "pico:hasAncestor"
# ---------------------------------------------------------------------------
# Grandparents/Grandchildren
# ---------------------------------------------------------------------------
grandparent_relationships:
- property: "pico:hasGrandparent"
property_uri: "https://personsincontext.org/model#hasGrandparent"
inverse: "pico:hasGrandchild"
- property: "pico:hasGrandchild"
property_uri: "https://personsincontext.org/model#hasGrandchild"
inverse: "pico:hasGrandparent"
- property: "pico:hasGreat-grandparent"
property_uri: "https://personsincontext.org/model#hasGreat-grandparent"
inverse: "pico:hasGreat-grandchild"
- property: "pico:hasGreat-grandchild"
property_uri: "https://personsincontext.org/model#hasGreat-grandchild"
inverse: "pico:hasGreat-grandparent"
# ---------------------------------------------------------------------------
# Aunts/Uncles and Nieces/Nephews
# ---------------------------------------------------------------------------
extended_family:
- property: "pico:hasUncle_Aunt"
property_uri: "https://personsincontext.org/model#hasUncle_Aunt"
description: "An uncle or aunt (sibling of parent)"
inverse: "pico:hasNephew_Niece"
- property: "pico:hasNephew_Niece"
property_uri: "https://personsincontext.org/model#hasNephew_Niece"
description: "A nephew or niece (child of sibling)"
inverse: "pico:hasUncle_Aunt"
- property: "pico:hasCousin"
property_uri: "https://personsincontext.org/model#hasCousin"
description: "A cousin (child of parent's sibling)"
symmetric: true
# ---------------------------------------------------------------------------
# Step-family
# ---------------------------------------------------------------------------
step_relationships:
- property: "pico:hasStepparent"
property_uri: "https://personsincontext.org/model#hasStepparent"
description: "A stepparent (spouse of biological parent)"
inverse: "pico:hasStepchild"
- property: "pico:hasStepchild"
property_uri: "https://personsincontext.org/model#hasStepchild"
inverse: "pico:hasStepparent"
- property: "pico:hasStepsibling"
property_uri: "https://personsincontext.org/model#hasStepsibling"
description: "A stepbrother or stepsister"
symmetric: true
- property: "pico:hasHalf-sibling"
property_uri: "https://personsincontext.org/model#hasHalf-sibling"
description: "A half-brother or half-sister (one shared parent)"
symmetric: true
# ---------------------------------------------------------------------------
# Foster/Godparent
# ---------------------------------------------------------------------------
non_biological_relationships:
- property: "pico:hasFosterParent"
property_uri: "https://personsincontext.org/model#hasFosterParent"
inverse: "pico:hasFosterChild"
- property: "pico:hasFosterChild"
property_uri: "https://personsincontext.org/model#hasFosterChild"
inverse: "pico:hasFosterParent"
- property: "pico:hasGodparent"
property_uri: "https://personsincontext.org/model#hasGodparent"
description: "A godparent (witness at baptism)"
inverse: "pico:hasGodchild"
- property: "pico:hasGodchild"
property_uri: "https://personsincontext.org/model#hasGodchild"
inverse: "pico:hasGodparent"
- property: "pico:hasLegitimizedChild"
property_uri: "https://personsincontext.org/model#hasLegitimizedChild"
description: "A child legitimized by marriage or legal recognition"
inverse: "pico:isLegitimitezedChildOf"
- property: "pico:isLegitimitezedChildOf"
property_uri: "https://personsincontext.org/model#isLegitimitezedChildOf"
inverse: "pico:hasLegitimizedChild"
# ---------------------------------------------------------------------------
# In-Laws
# ---------------------------------------------------------------------------
in_law_relationships:
- property: "pico:hasParent-in-law"
property_uri: "https://personsincontext.org/model#hasParent-in-law"
inverse: "pico:hasChild-in-law"
- property: "pico:hasChild-in-law"
property_uri: "https://personsincontext.org/model#hasChild-in-law"
inverse: "pico:hasParent-in-law"
- property: "pico:hasSibling-in-law"
property_uri: "https://personsincontext.org/model#hasSibling-in-law"
description: "Brother/sister-in-law"
symmetric: true
- property: "pico:hasGrandparent-in-law"
property_uri: "https://personsincontext.org/model#hasGrandparent-in-law"
inverse: "pico:hasGrandchild-in-law"
- property: "pico:hasGrandchild-in-law"
property_uri: "https://personsincontext.org/model#hasGrandchild-in-law"
inverse: "pico:hasGrandparent-in-law"
- property: "pico:hasUncle_Aunt-in-law"
property_uri: "https://personsincontext.org/model#hasUncle_Aunt-in-law"
inverse: "pico:hasNephew_Niece-in-law"
- property: "pico:hasNephew_Niece-in-law"
property_uri: "https://personsincontext.org/model#hasNephew_Niece-in-law"
inverse: "pico:hasUncle_Aunt-in-law"
- property: "pico:hasCousin-in-law"
property_uri: "https://personsincontext.org/model#hasCousin-in-law"
symmetric: true
- property: "pico:hasStepparent-in-law"
property_uri: "https://personsincontext.org/model#hasStepparent-in-law"
inverse: "pico:hasStepchild-in-law"
- property: "pico:hasStepchild-in-law"
property_uri: "https://personsincontext.org/model#hasStepchild-in-law"
inverse: "pico:hasStepparent-in-law"
# ---------------------------------------------------------------------------
# Former Partners
# ---------------------------------------------------------------------------
former_partner_relationships:
- property: "pico:isWidOf"
property_uri: "https://personsincontext.org/model#isWidOf"
description: "Is widow/widower of deceased spouse"
note: "The subject is the surviving partner"
- property: "pico:hasPreviousPartner"
property_uri: "https://personsincontext.org/model#hasPreviousPartner"
description: "A former spouse or partner"
symmetric: true
# =============================================================================
# PROVENANCE MODEL (PROV-O INTEGRATION)
# =============================================================================
#
# Enhanced provenance model for tracking observation extraction and
# reconstruction creation activities.
# =============================================================================
enhanced_provenance_model:
description: |
PiCo uses W3C PROV-O for provenance tracking at two levels:
1. OBSERVATION LEVEL: Where did this observation come from?
- prov:hadPrimarySource → Source document
- prov:wasGeneratedBy → Extraction activity (optional)
2. RECONSTRUCTION LEVEL: How was this person entity created?
- prov:wasDerivedFrom → Source observation(s)
- prov:wasGeneratedBy → Reconstruction activity
- prov:wasRevisionOf → Previous reconstruction version
activity_class:
class: "prov:Activity"
class_uri: "http://www.w3.org/ns/prov#Activity"
description: "The activity that generated a PersonReconstruction"
properties:
- property: "prov:wasAssociatedWith"
description: "Agent responsible for the activity"
range: "prov:Agent"
- property: "prov:startedAtTime"
description: "When the activity started"
range: "xsd:dateTime"
- property: "prov:endedAtTime"
description: "When the activity completed"
range: "xsd:dateTime"
- property: "prov:used"
description: "Resources/tools used in the activity"
range: "prov:Entity"
note: "E.g., ML model, matching algorithm, rule set"
types:
human_reconstruction:
description: "Manual reconstruction by researcher"
note: "Provide: time, place, knowledge sources, researcher name"
algorithmic_reconstruction:
description: "Automated reconstruction by software"
note: "Provide: algorithm name, version, configuration, parameters"
agent_class:
class: "prov:Agent"
class_uri: "http://www.w3.org/ns/prov#Agent"
description: "Person or organization responsible for reconstruction"
properties:
- property: "sdo:name"
description: "Name of the agent"
range: "xsd:string"
- property: "sdo:url"
description: "URL identifying the agent"
range: "sdo:URL"
examples:
- name: "CBG|Center for Family History"
url: "https://cbg.nl"
type: "organization"
- name: "GLM-4.6 Person Extractor v1.0"
url: null
type: "software"
derivation_properties:
- property: "prov:wasDerivedFrom"
property_uri: "http://www.w3.org/ns/prov#wasDerivedFrom"
description: "Links PersonReconstruction to source PersonObservation(s)"
domain: "pico:PersonReconstruction"
range: "pico:PersonObservation"
cardinality: "1..*"
note: "REQUIRED for all PersonReconstructions"
- property: "prov:wasRevisionOf"
property_uri: "http://www.w3.org/ns/prov#wasRevisionOf"
description: "Links to previous version of reconstruction"
domain: "pico:PersonReconstruction"
range: "pico:PersonReconstruction"
cardinality: "0..1"
note: "For tracking reconstruction updates over time"
# =============================================================================
# PICO VOCABULARIES/THESAURI
# =============================================================================
#
# PiCo provides controlled vocabularies for roles, source types, and events.
# =============================================================================
pico_vocabularies:
description: |
PiCo defines three SKOS concept schemes for controlled terminology:
- Roles: The role a person plays in a source (child, declarant, witness, etc.)
- SourceTypes: Types of historical sources (birth certificate, census, etc.)
- EventTypes: Types of life events (birth, marriage, death, etc.)
roles_thesaurus:
id: "picot_roles"
uri: "https://terms.personsincontext.org/roles/"
type: "skos:ConceptScheme"
label: "Persons in Context role thesaurus"
description: "Roles that persons can have in historical sources"
usage: |
Use pico:hasRole property with a term from this thesaurus.
Example: picot_roles:575 (child), picot_roles:489 (declarant)
example_concepts:
- id: "575"
label: "child"
description: "Person appearing as child in a record"
- id: "489"
label: "declarant"
description: "Person declaring/reporting an event"
- id: "witness"
label: "witness"
description: "Person witnessing an event or signing a document"
- id: "bride"
label: "bride"
description: "Female partner in a marriage"
- id: "groom"
label: "groom"
description: "Male partner in a marriage"
sourcetypes_thesaurus:
id: "picot_sourcetypes"
uri: "https://terms.personsincontext.org/sourcetypes/"
type: "skos:ConceptScheme"
label: "Persons in Context sourceType thesaurus"
description: "Types of historical sources containing person observations"
usage: |
Use sdo:additionalType property on sdo:ArchiveComponent.
Example: picot_sourcetypes:551 (civil registry: birth)
example_concepts:
- id: "551"
label: "civil registry: birth"
description: "Birth certificate from civil registration"
- id: "marriage"
label: "civil registry: marriage"
description: "Marriage certificate"
- id: "death"
label: "civil registry: death"
description: "Death certificate"
- id: "census"
label: "census"
description: "Population census record"
- id: "church_baptism"
label: "church record: baptism"
description: "Baptismal record from church register"
- id: "notarial"
label: "notarial record"
description: "Notarial act or protocol"
eventtypes_thesaurus:
id: "picot_eventtypes"
uri: "https://terms.personsincontext.org/eventtypes/"
type: "skos:ConceptScheme"
label: "Persons in Context eventType thesaurus"
description: "Types of life events documented in sources"
example_concepts:
- id: "birth"
label: "birth"
- id: "baptism"
label: "baptism"
- id: "marriage"
label: "marriage"
- id: "death"
label: "death"
- id: "burial"
label: "burial"
- id: "emigration"
label: "emigration"
- id: "immigration"
label: "immigration"
# =============================================================================
# GLM ANNOTATOR OUTPUT SCHEMA UPDATE
# =============================================================================
#
# Extended output schema for GLM-4.6 annotator to include family relationships
# and biographical properties.
# =============================================================================
glm_extended_output_schema:
description: |
Extended JSON output schema that includes all PiCo properties.
This supplements the core system_prompt output format.
persons_extended:
description: "Extended person object with all PiCo properties"
schema:
pnv_name:
literalName: "string"
givenName: "string|null"
patronym: "string|null"
surnamePrefix: "string|null"
baseSurname: "string|null"
honorificPrefix: "string|null"
honorificSuffix: "string|null"
initials: "string|null"
biographical:
birth_date: "ISO date|null"
death_date: "ISO date|null"
birth_place: "string|null"
death_place: "string|null"
gender: "Male|Female|null"
age: "string|null"
religion: "string|null"
deceased: "boolean|null"
address: "string|null"
floruit: "string|null"
roles: "array of role objects"
family_relationships:
parent: "array of person references"
children: "array of person references"
spouse: "array of person references"
sibling: "array of person references"
grandparent: "array of person references"
grandchild: "array of person references"
uncle_aunt: "array of person references"
nephew_niece: "array of person references"
cousin: "array of person references"
stepparent: "array of person references"
stepchild: "array of person references"
stepsibling: "array of person references"
half_sibling: "array of person references"
foster_parent: "array of person references"
foster_child: "array of person references"
godparent: "array of person references"
godchild: "array of person references"
parent_in_law: "array of person references"
child_in_law: "array of person references"
sibling_in_law: "array of person references"
previous_partner: "array of person references"
widow_of: "person reference|null"
context: "string|null"
# =============================================================================
# CH-ANNOTATOR HYPERNYM INTEGRATION UPDATE
# =============================================================================
#
# Updated hypernym mappings to include reconstruction pattern.
# =============================================================================
extended_hypernym_mapping:
description: |
Extended mappings between PiCo classes and CH-Annotator hypernyms,
including the reconstruction pattern.
mappings:
# Observation level
- pico_class: "pico:PersonObservation"
ch_hypernym: "AGT.PER"
ch_code: "AGT.PER"
note: "Source-bound person mention"
- pico_class: "pico:PersonObservation"
ch_hypernym: "AGT.STF"
ch_code: "AGT.STF"
condition: "When person has organizational role"
note: "Staff member observation"
# Reconstruction level
- pico_class: "pico:PersonReconstruction"
ch_hypernym: "AGT.PER"
ch_code: "AGT.PER"
note: "Reconstructed person entity"
linking: true
linking_sources: ["Wikidata", "VIAF", "ISNI"]
# Name components
- pico_class: "pnv:PersonName"
ch_hypernym: "APP.NAM"
ch_code: "APP.NAM"
note: "Structured name"
# Roles
- pico_class: "pico:hasRole"
ch_hypernym: "ROL"
ch_code: "ROL"
note: "Role in source"
# Family relationships
- pico_class: "sdo:parent"
ch_hypernym: "AGT.PER"
relationship_type: "family"
note: "Parent relationship"
- pico_class: "sdo:spouse"
ch_hypernym: "AGT.PER"
relationship_type: "family"
note: "Spouse relationship"
- pico_class: "pico:hasGodparent"
ch_hypernym: "AGT.PER"
relationship_type: "ritual_kinship"
note: "Godparent relationship (common in historical records)"
# Sources
- pico_class: "sdo:ArchiveComponent"
ch_hypernym: "WRK.DOC"
ch_code: "WRK.DOC"
note: "Source document"
# Provenance
- pico_class: "prov:Activity"
ch_hypernym: null
note: "Not directly annotated; tracked in provenance metadata"
- pico_class: "prov:Agent"
ch_hypernym: "AGT"
ch_code: "AGT"
note: "Extraction/reconstruction agent"
# =============================================================================
# HISTORICAL SOURCE EXTRACTION EXAMPLES
# =============================================================================
#
# Comprehensive examples showing extraction from different historical source types.
# These demonstrate the full PiCo model including family relationships.
# =============================================================================
historical_extraction_examples:
description: |
These examples demonstrate extraction from common historical source types,
showing how to capture family relationships, biographical data, and roles
according to the PiCo model.
# ---------------------------------------------------------------------------
# Example 1: Dutch Marriage Certificate (Burgerlijke Stand)
# ---------------------------------------------------------------------------
marriage_certificate_example:
source_type: "civil_registration"
source_text: |
Heden den elfden November achttien honderd vijf en tachtig, zijn voor ons
Ambtenaar van den Burgerlijken Stand der gemeente Haarlem, verschenen:
Cornelis Johannes Koppen, oud dertig jaren, schilder, geboren te Haarlem,
wonende alhier, meerderjarige zoon van wijlen Pieter Koppen en van
Anna Maria Brouwer, zonder beroep, wonende alhier;
en Anna Maria Visser, oud zeven en twintig jaren, zonder beroep, geboren
te Amsterdam, wonende alhier, meerderjarige dochter van Jan Visser,
koopman, en van wijlen Cornelia de Vries.
Als getuigen waren tegenwoordig: Hendrik Koppen, oud vijf en dertig jaren,
schilder, broeder van den bruidegom; en Willem Visser, oud twee en dertig
jaren, timmerman, broeder van de bruid.
expected_output:
pico_observation:
observation_id: "bs_haarlem_1885_marriage_321"
observed_at: "2025-12-12T10:00:00Z"
source_type: "civil_registration"
source_reference: "BS Marriage Haarlem, November 11, 1885, certificate 321"
persons:
- person_index: 0
pnv_name:
literalName: "Cornelis Johannes Koppen"
givenName: "Cornelis Johannes"
baseSurname: "Koppen"
roles:
- role_title: "schilder"
role_in_source: "groom"
biographical:
age: "30"
birth_place: "Haarlem"
address: "Haarlem"
family_relationships:
parent:
- person_index: 2
target_name: "Pieter Koppen"
- person_index: 3
target_name: "Anna Maria Brouwer"
spouse:
- person_index: 1
target_name: "Anna Maria Visser"
sibling:
- person_index: 6
target_name: "Hendrik Koppen"
- person_index: 1
pnv_name:
literalName: "Anna Maria Visser"
givenName: "Anna Maria"
baseSurname: "Visser"
roles:
- role_in_source: "bride"
biographical:
age: "27"
birth_place: "Amsterdam"
address: "Haarlem"
family_relationships:
parent:
- person_index: 4
target_name: "Jan Visser"
- person_index: 5
target_name: "Cornelia de Vries"
spouse:
- person_index: 0
target_name: "Cornelis Johannes Koppen"
sibling:
- person_index: 7
target_name: "Willem Visser"
- person_index: 2
pnv_name:
literalName: "Pieter Koppen"
givenName: "Pieter"
baseSurname: "Koppen"
biographical:
deceased: true
family_relationships:
children:
- person_index: 0
target_name: "Cornelis Johannes Koppen"
- person_index: 6
target_name: "Hendrik Koppen"
spouse:
- person_index: 3
target_name: "Anna Maria Brouwer"
- person_index: 3
pnv_name:
literalName: "Anna Maria Brouwer"
givenName: "Anna Maria"
baseSurname: "Brouwer"
roles:
- role_title: "zonder beroep"
biographical:
address: "Haarlem"
family_relationships:
children:
- person_index: 0
target_name: "Cornelis Johannes Koppen"
- person_index: 6
target_name: "Hendrik Koppen"
widow_of:
person_index: 2
target_name: "Pieter Koppen"
- person_index: 4
pnv_name:
literalName: "Jan Visser"
givenName: "Jan"
baseSurname: "Visser"
roles:
- role_title: "koopman"
family_relationships:
children:
- person_index: 1
target_name: "Anna Maria Visser"
- person_index: 7
target_name: "Willem Visser"
spouse:
- person_index: 5
target_name: "Cornelia de Vries"
- person_index: 5
pnv_name:
literalName: "Cornelia de Vries"
givenName: "Cornelia"
surnamePrefix: "de"
baseSurname: "Vries"
biographical:
deceased: true
family_relationships:
children:
- person_index: 1
target_name: "Anna Maria Visser"
- person_index: 7
target_name: "Willem Visser"
spouse:
- person_index: 4
target_name: "Jan Visser"
- person_index: 6
pnv_name:
literalName: "Hendrik Koppen"
givenName: "Hendrik"
baseSurname: "Koppen"
roles:
- role_title: "schilder"
role_in_source: "witness"
biographical:
age: "35"
family_relationships:
sibling:
- person_index: 0
target_name: "Cornelis Johannes Koppen"
parent:
- person_index: 2
target_name: "Pieter Koppen"
- person_index: 3
target_name: "Anna Maria Brouwer"
- person_index: 7
pnv_name:
literalName: "Willem Visser"
givenName: "Willem"
baseSurname: "Visser"
roles:
- role_title: "timmerman"
role_in_source: "witness"
biographical:
age: "32"
family_relationships:
sibling:
- person_index: 1
target_name: "Anna Maria Visser"
parent:
- person_index: 4
target_name: "Jan Visser"
- person_index: 5
target_name: "Cornelia de Vries"
temporal_references:
- expression: "den elfden November achttien honderd vijf en tachtig"
normalized: "1885-11-11"
type: "DATE"
locations_mentioned:
- name: "Haarlem"
type: "city"
- name: "Amsterdam"
type: "city"
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on authentic Dutch civil
registry (Burgerlijke Stand) marriage certificate formulae for
demonstration purposes. Names, dates, and locations are fictional
but follow authentic 19th-century patterns.
For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "Centraal Bureau voor Genealogie (CBG)"
project: "WieWasWie"
digital_url: "https://www.wiewaswie.nl/"
document_type: "Birth, marriage, death certificates"
period: "1811-present (civil); 1600s+ (church)"
language: "Dutch"
license: "Subscription / Free at archives"
- archive: "Noord-Hollands Archief"
coverage: "Civil registry from 1811, church records from 1600s"
location: "Haarlem, Netherlands"
document_types: "Dutch civil registry records"
# ---------------------------------------------------------------------------
# Example 2: Early Modern Notarial Protocol Index Entry
# ---------------------------------------------------------------------------
notarial_index_example:
source_type: "historical_indices"
source_text: |
Notarial Archive Amsterdam, inv. 5075/1234
30 January 1680
Before notary Pieter van der Meer appeared:
Jacob Janszoon van der Hoeven, merchant of this city,
with his wife Maritgen Claes, for themselves and as
guardians (voogden) of the minor children of the late
Claes Jacobsz and Aeltgen Pieters, namely:
- Jan Claeszoon, aged about 16 years
- Trijntgen Claesdr, aged about 12 years
Witnesses: Hendrick Jansz, baker, and Cornelis Pietersz,
schoolmaster, both of this city.
expected_output:
pico_observation:
observation_id: "na_amsterdam_5075_1234"
observed_at: "2025-12-12T10:00:00Z"
source_type: "historical_indices"
source_reference: "Notarial Archive Amsterdam, inv. 5075/1234, 30 January 1680"
persons:
- person_index: 0
pnv_name:
literalName: "Jacob Janszoon van der Hoeven"
givenName: "Jacob"
patronym: "Janszoon"
surnamePrefix: "van der"
baseSurname: "Hoeven"
roles:
- role_title: "merchant"
role_in_source: "declarant"
- role_title: "voogd"
role_in_source: null
biographical:
address: "Amsterdam"
family_relationships:
spouse:
- person_index: 1
target_name: "Maritgen Claes"
- person_index: 1
pnv_name:
literalName: "Maritgen Claes"
givenName: "Maritgen"
patronym: "Claes"
roles:
- role_in_source: "declarant"
- role_title: "voogd"
family_relationships:
spouse:
- person_index: 0
target_name: "Jacob Janszoon van der Hoeven"
- person_index: 2
pnv_name:
literalName: "Claes Jacobsz"
givenName: "Claes"
patronym: "Jacobsz"
biographical:
deceased: true
family_relationships:
spouse:
- person_index: 3
target_name: "Aeltgen Pieters"
children:
- person_index: 4
target_name: "Jan Claeszoon"
- person_index: 5
target_name: "Trijntgen Claesdr"
- person_index: 3
pnv_name:
literalName: "Aeltgen Pieters"
givenName: "Aeltgen"
patronym: "Pieters"
biographical:
deceased: true
family_relationships:
spouse:
- person_index: 2
target_name: "Claes Jacobsz"
children:
- person_index: 4
target_name: "Jan Claeszoon"
- person_index: 5
target_name: "Trijntgen Claesdr"
- person_index: 4
pnv_name:
literalName: "Jan Claeszoon"
givenName: "Jan"
patronym: "Claeszoon"
roles:
- role_in_source: "child"
biographical:
age: "about 16"
family_relationships:
parent:
- person_index: 2
target_name: "Claes Jacobsz"
- person_index: 3
target_name: "Aeltgen Pieters"
sibling:
- person_index: 5
target_name: "Trijntgen Claesdr"
- person_index: 5
pnv_name:
literalName: "Trijntgen Claesdr"
givenName: "Trijntgen"
patronym: "Claesdr"
roles:
- role_in_source: "child"
biographical:
age: "about 12"
gender: "Female"
family_relationships:
parent:
- person_index: 2
target_name: "Claes Jacobsz"
- person_index: 3
target_name: "Aeltgen Pieters"
sibling:
- person_index: 4
target_name: "Jan Claeszoon"
- person_index: 6
pnv_name:
literalName: "Pieter van der Meer"
givenName: "Pieter"
surnamePrefix: "van der"
baseSurname: "Meer"
roles:
- role_title: "notary"
- person_index: 7
pnv_name:
literalName: "Hendrick Jansz"
givenName: "Hendrick"
patronym: "Jansz"
roles:
- role_title: "baker"
role_in_source: "witness"
biographical:
address: "Amsterdam"
- person_index: 8
pnv_name:
literalName: "Cornelis Pietersz"
givenName: "Cornelis"
patronym: "Pietersz"
roles:
- role_title: "schoolmaster"
role_in_source: "witness"
biographical:
address: "Amsterdam"
temporal_references:
- expression: "30 January 1680"
normalized: "1680-01-30"
type: "DATE"
locations_mentioned:
- name: "Amsterdam"
type: "city"
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on authentic early modern
notarial protocol index entry formulae for demonstration purposes.
Names, dates, and locations are fictional but follow authentic
17th-century Dutch notarial patterns.
For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "Stadsarchief Amsterdam"
collection: "Notarial Archives (Notariële Archieven)"
document_type: "Notarial protocols, contracts, testaments"
period: "1578-1915"
language: "Dutch, Latin"
notes: "Largest notarial archive in the Netherlands"
- project: "TICCLAT (Transliteration of Early Modern Dutch Notarial Archives)"
coverage: "Amsterdam notarial indices"
period: "17th-18th century"
notes: "Machine-readable indices to notarial protocols"
# ---------------------------------------------------------------------------
# Example 3: Church Baptismal Record with Godparents
# ---------------------------------------------------------------------------
baptism_record_example:
source_type: "church_records"
source_text: |
Den 15en Meij 1702 is gedoopt
Johanna, dochter van Willem Hendriksen en Geertruijd Jans,
getuijgen waren de E. Heer Jan Willem van Beverwijck
ende Juffrou Maria van Loon, huijsvrouw van de heer
Pieter Anthonisz Verschoor.
expected_output:
pico_observation:
observation_id: "dtb_amsterdam_1702_baptism_johanna"
observed_at: "2025-12-12T10:00:00Z"
source_type: "church_records"
source_reference: "DTB Amsterdam, 15 May 1702"
persons:
- person_index: 0
pnv_name:
literalName: "Johanna"
givenName: "Johanna"
roles:
- role_in_source: "child"
biographical:
gender: "Female"
family_relationships:
parent:
- person_index: 1
target_name: "Willem Hendriksen"
- person_index: 2
target_name: "Geertruijd Jans"
godparent:
- person_index: 3
target_name: "Jan Willem van Beverwijck"
- person_index: 4
target_name: "Maria van Loon"
- person_index: 1
pnv_name:
literalName: "Willem Hendriksen"
givenName: "Willem"
patronym: "Hendriksen"
biographical:
gender: "Male"
family_relationships:
children:
- person_index: 0
target_name: "Johanna"
spouse:
- person_index: 2
target_name: "Geertruijd Jans"
- person_index: 2
pnv_name:
literalName: "Geertruijd Jans"
givenName: "Geertruijd"
patronym: "Jans"
biographical:
gender: "Female"
family_relationships:
children:
- person_index: 0
target_name: "Johanna"
spouse:
- person_index: 1
target_name: "Willem Hendriksen"
- person_index: 3
pnv_name:
literalName: "Jan Willem van Beverwijck"
givenName: "Jan Willem"
surnamePrefix: "van"
baseSurname: "Beverwijck"
honorificPrefix: "de E. Heer"
roles:
- role_in_source: "witness"
biographical:
gender: "Male"
family_relationships:
godchild:
- person_index: 0
target_name: "Johanna"
- person_index: 4
pnv_name:
literalName: "Maria van Loon"
givenName: "Maria"
surnamePrefix: "van"
baseSurname: "Loon"
honorificPrefix: "Juffrou"
roles:
- role_in_source: "witness"
biographical:
gender: "Female"
family_relationships:
godchild:
- person_index: 0
target_name: "Johanna"
spouse:
- person_index: 5
target_name: "Pieter Anthonisz Verschoor"
- person_index: 5
pnv_name:
literalName: "Pieter Anthonisz Verschoor"
givenName: "Pieter"
patronym: "Anthonisz"
baseSurname: "Verschoor"
honorificPrefix: "de heer"
biographical:
gender: "Male"
family_relationships:
spouse:
- person_index: 4
target_name: "Maria van Loon"
temporal_references:
- expression: "Den 15en Meij 1702"
normalized: "1702-05-15"
type: "DATE"
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on authentic Dutch Reformed
Church (Nederlandse Hervormde Kerk) baptismal register formulae for
demonstration purposes. Names, dates, and locations are fictional
but follow authentic early 18th-century patterns.
For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "Various Dutch Regional Archives"
collection: "Doop-, Trouw- en Begraafregisters (DTB)"
document_type: "Church baptism, marriage, burial records"
period: "1600s-1811 (before civil registration)"
language: "Dutch"
notes: "Pre-1811 vital records maintained by churches"
- archive: "FamilySearch"
collection: "Netherlands, Church Records"
wiki_url: "https://www.familysearch.org/en/wiki/Netherlands_Church_Records"
document_type: "Dutch church baptisms"
license: "Free with registration"
# ---------------------------------------------------------------------------
# Example 4: Modern LinkedIn Staff Profile
# ---------------------------------------------------------------------------
linkedin_profile_example:
source_type: "modern_digital"
source_text: |
Dr. Maria van den Berg
Director of Collections | Rijksmuseum
Amsterdam, Netherlands
About:
Leading the collections management team at the Rijksmuseum since 2018.
Previously Head Curator at the Van Gogh Museum (2012-2018).
PhD in Art History, University of Amsterdam.
Experience:
- Director of Collections, Rijksmuseum (2018-present)
- Head Curator, Van Gogh Museum (2012-2018)
- Assistant Curator, Stedelijk Museum (2008-2012)
Education:
- PhD Art History, University of Amsterdam (2008)
- MA Museum Studies, University of Amsterdam (2003)
expected_output:
pico_observation:
observation_id: "linkedin_maria_van_den_berg_2025"
observed_at: "2025-12-12T10:00:00Z"
source_type: "modern_digital"
source_reference: "https://linkedin.com/in/mariavandenberg"
persons:
- person_index: 0
pnv_name:
literalName: "Dr. Maria van den Berg"
givenName: "Maria"
surnamePrefix: "van den"
baseSurname: "Berg"
honorificPrefix: "Dr."
roles:
- role_title: "Director of Collections"
organization: "Rijksmuseum"
period: "2018-present"
heritage_relevant: true
heritage_type: "M"
- role_title: "Head Curator"
organization: "Van Gogh Museum"
period: "2012-2018"
heritage_relevant: true
heritage_type: "M"
- role_title: "Assistant Curator"
organization: "Stedelijk Museum"
period: "2008-2012"
heritage_relevant: true
heritage_type: "M"
biographical:
address: "Amsterdam, Netherlands"
family_relationships: {}
context: "Heritage sector professional with museum career"
organizations_mentioned:
- name: "Rijksmuseum"
type: "M"
role_in_source: "employer"
- name: "Van Gogh Museum"
type: "M"
role_in_source: "employer"
- name: "Stedelijk Museum"
type: "M"
role_in_source: "employer"
- name: "University of Amsterdam"
type: "E"
role_in_source: "education"
locations_mentioned:
- name: "Amsterdam"
type: "city"
- name: "Netherlands"
type: "country"
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on modern LinkedIn profile
formats for demonstration purposes. The profile name, institution,
and biographical details are entirely fictional. LinkedIn profiles
represent a modern source type for person-in-context observations,
contrasting with the historical document examples in this module.
source_context:
platform: "LinkedIn"
data_type: "Modern professional networking profile"
privacy_note: |
When extracting real LinkedIn data, ensure compliance with
LinkedIn Terms of Service, GDPR, and applicable privacy laws.
This synthetic example demonstrates extraction patterns only.
# ---------------------------------------------------------------------------
# Example 5: Arabic Waqf Document (Endowment Record)
# ---------------------------------------------------------------------------
arabic_waqf_example:
source_type: "archival_descriptions"
language: "Arabic"
description: |
Example of a waqf (religious endowment) document from an Islamic archive.
Waqf documents record property endowments for religious/charitable purposes
and typically name the founder, beneficiaries, and witnesses.
source_text: |
بسم الله الرحمن الرحيم
هذا ما وقف وحبس وسبل وأبد المرحوم الحاج أحمد بن محمد العمري، تاجر بمدينة
حلب الشهباء، ابن المرحوم محمد بن عبد الله العمري. وقف جميع داره الكائنة
بمحلة الجديدة على أولاده وأولاد أولاده ذكوراً وإناثاً. وإن انقرضوا لا سمح
الله فعلى فقراء المسلمين. وشهد على ذلك الشهود: الحاج إبراهيم بن يوسف
التركماني، والسيد علي بن حسين الحلبي. وكتب في شهر رجب سنة ألف ومائتين
وخمس وعشرين هجرية.
[Translation: In the name of God, the Compassionate, the Merciful.
This is what the late al-Hajj Ahmad ibn Muhammad al-'Umari, merchant
in the city of Aleppo, son of the late Muhammad ibn Abdullah al-'Umari,
has endowed, dedicated, and perpetuated. He endowed his entire house
located in the al-Jadida neighborhood for his children and grandchildren,
male and female. If they cease to exist, God forbid, then for the poor
Muslims. Witnessed by: al-Hajj Ibrahim ibn Yusuf al-Turkmani, and
al-Sayyid Ali ibn Husayn al-Halabi. Written in the month of Rajab,
year 1225 Hijri (1810 CE).]
expected_output:
pico_observation:
observation_id: "waqf_aleppo_1225h_ahmad_umari"
observed_at: "2025-12-12T10:00:00Z"
source_type: "archival_descriptions"
source_reference: "Waqf document, Aleppo, Rajab 1225 AH (1810 CE)"
persons:
- person_index: 0
pnv_name:
literalName: "الحاج أحمد بن محمد العمري"
literalName_romanized: "al-Hajj Ahmad ibn Muhammad al-'Umari"
givenName: "أحمد"
givenName_romanized: "Ahmad"
patronym: "محمد"
patronym_romanized: "Muhammad"
baseSurname: "العمري"
baseSurname_romanized: "al-'Umari"
honorificPrefix: "الحاج"
honorificPrefix_romanized: "al-Hajj"
roles:
- role_title: "تاجر"
role_title_romanized: "merchant"
role_in_source: "founder"
biographical:
deceased: true
address: "حلب الشهباء (Aleppo)"
family_relationships:
parent:
- person_index: 1
target_name: "محمد بن عبد الله العمري"
context: "Waqf founder (واقف)"
- person_index: 1
pnv_name:
literalName: "محمد بن عبد الله العمري"
literalName_romanized: "Muhammad ibn Abdullah al-'Umari"
givenName: "محمد"
givenName_romanized: "Muhammad"
patronym: "عبد الله"
patronym_romanized: "Abdullah"
baseSurname: "العمري"
baseSurname_romanized: "al-'Umari"
honorificPrefix: "المرحوم"
honorificPrefix_romanized: "the late"
biographical:
deceased: true
family_relationships:
children:
- person_index: 0
target_name: "أحمد بن محمد العمري"
context: "Father of the founder"
- person_index: 2
pnv_name:
literalName: "الحاج إبراهيم بن يوسف التركماني"
literalName_romanized: "al-Hajj Ibrahim ibn Yusuf al-Turkmani"
givenName: "إبراهيم"
givenName_romanized: "Ibrahim"
patronym: "يوسف"
patronym_romanized: "Yusuf"
baseSurname: "التركماني"
baseSurname_romanized: "al-Turkmani"
honorificPrefix: "الحاج"
honorificPrefix_romanized: "al-Hajj"
roles:
- role_in_source: "witness"
family_relationships: {}
context: "Witness to the endowment"
- person_index: 3
pnv_name:
literalName: "السيد علي بن حسين الحلبي"
literalName_romanized: "al-Sayyid Ali ibn Husayn al-Halabi"
givenName: "علي"
givenName_romanized: "Ali"
patronym: "حسين"
patronym_romanized: "Husayn"
baseSurname: "الحلبي"
baseSurname_romanized: "al-Halabi"
honorificPrefix: "السيد"
honorificPrefix_romanized: "al-Sayyid"
roles:
- role_in_source: "witness"
family_relationships: {}
context: "Witness to the endowment"
temporal_references:
- expression: "شهر رجب سنة ألف ومائتين وخمس وعشرين هجرية"
expression_romanized: "month of Rajab, year 1225 Hijri"
normalized: "1810-07" # Approximate Gregorian equivalent
calendar: "Hijri"
type: "DATE"
locations_mentioned:
- name: "حلب الشهباء"
name_romanized: "Aleppo"
type: "city"
- name: "محلة الجديدة"
name_romanized: "al-Jadida neighborhood"
type: "neighborhood"
arabic_naming_notes: |
Arabic naming conventions demonstrated:
- ابن/بن (ibn/bin): patronymic "son of"
- الحاج (al-Hajj): honorific for one who completed pilgrimage
- السيد (al-Sayyid): honorific denoting descent from Prophet Muhammad
- المرحوم (al-marhum): "the late" (deceased)
- نسبة (nisba): geographic/tribal surname (العمري - from 'Umar tribe,
التركماني - Turkman origin, الحلبي - from Aleppo)
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on standard waqf document formulae
for demonstration purposes. Names, dates, and property details are fictional.
For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "Cambridge University Library"
collection: "Islamic Manuscripts"
digital_url: "https://cudl.lib.cam.ac.uk/collections/islamic"
document_types: "Waqfiyya, legal documents"
period: "8th-20th century CE"
license: "CC BY-NC 4.0"
- archive: "University of Pennsylvania Libraries"
collection: "Manuscripts of the Muslim World"
digital_url: "https://openn.library.upenn.edu/html/muslimworld_contents.html"
document_types: "Waqfiyya, Quranic manuscripts, legal documents"
license: "Public Domain / CC0"
- archive: "Singapore National Heritage Board"
accession_number: "1115401"
digital_url: "https://www.roots.gov.sg/Collection-Landing/listing/1115401"
document_type: "Waqf document"
donor: "Muhammad b. Abd al-Ghani"
properties: "Istanbul (various locations)"
# =============================================================================
# EXAMPLE 6: Hebrew Ketubah - Marriage of Mosheh & Rivkah (Mashhad, Iran, 1896)
# =============================================================================
#
# REAL HISTORICAL DATA from Yale University Beinecke Library
#
# Source type: ketubah (Jewish marriage contract)
# Language: Hebrew/Aramaic
# Date: 23 Elul 5656 AM (September 1, 1896 CE)
# Location: Mashhad, Iran
# Call Number: Hebrew MSS suppl 194 (Broadside)
#
# This is a REAL ketubah with verified provenance from Yale's digital collection.
# The Mashhad Jewish community had a unique history as "crypto-Jews" after
# forced conversion in 1839, making this document culturally significant.
# =============================================================================
example_6_hebrew_ketubah:
description: |
A ketubah (כתובה) is a Jewish marriage contract written in Aramaic with Hebrew
elements. This REAL example from Mashhad, Iran demonstrates Persian Jewish
traditions with elaborate decorative elements.
Historical context: The Jewish community of Mashhad was unique - after forced
conversion to Islam in 1839 (the Allahdad pogrom), many continued practicing
Judaism in secret as "Jadid al-Islam" (new Muslims). By 1896, some families
were more openly practicing Judaism, as evidenced by this elaborate ketubah.
Key features documented:
- Groom and bride names with patronymics (בן/בת - son/daughter of)
- Persian Jewish artistic traditions (floral patterns, colored rules)
- Hebrew date with month, day, and year from Creation
- Isaiah 61:10 verse as blessing
- Physical dimensions: 53 x 37 cm
source_text: |
[Note: Full text not transcribed from manuscript. Key readable elements:]
בס״ד
שנת חמשת אלפים שש מאות וחמישים ושש לבריאת עולם
עשרים ושלשה לחודש אלול
במשהד
החתן משה בן משיאח
הכלה רבקה בת יעקב
[Isaiah 61:10 - visible in decorative header:]
שוש אשיש בה׳ תגל נפשי באלהי כי הלבישני בגדי ישע מעיל צדקה יעטני
כחתן יכהן פאר וככלה תעדה כליה
source_text_romanized: |
B'siyata d'shmaya (With Heaven's help)
In the year five thousand six hundred and fifty-six from the Creation of the world,
the twenty-third day of the month of Elul,
in Mashhad.
The groom: Mosheh son of Mashiah
The bride: Rivkah daughter of Ya'akov
[Isaiah 61:10 - decorative header blessing:]
"I will greatly rejoice in the LORD, my soul shall be joyful in my God.
For he has clothed me with the garments of salvation, he has covered me
with the robe of righteousness, as a bridegroom decks himself with a garland,
and as a bride adorns herself with her jewels."
expected_extraction:
description: "Hebrew ketubah extraction from REAL Mashhad, Iran document (1896)"
pico_observation:
observation_id: "ketubah_mashhad_5656_mosheh_rivkah"
observed_at: "2025-01-13T12:00:00Z"
source_type: "ketubah"
source_reference: "Ketubah, Mashhad, 23 Elul 5656 (September 1, 1896 CE), Yale Beinecke Hebrew MSS suppl 194"
archive: "Yale University, Beinecke Rare Book & Manuscript Library"
persons:
- person_index: 0
pnv_name:
literalName: "משה בן משיאח"
literalName_romanized: "Mosheh ben Mashiah"
givenName: "משה"
givenName_romanized: "Mosheh"
patronym: "משיאח"
patronym_romanized: "Mashiah"
roles:
- role_title: "חתן"
role_title_romanized: "chatan"
role_in_source: "groom"
biographical:
sex: "male"
religion: "Jewish"
community: "Mashhad Jewish community (Mashhadis)"
family_relationships:
father:
- person_index: 1
target_name: "משיאח"
spouse:
- person_index: 2
target_name: "רבקה בת יעקב"
context: "Groom (חתן) - the bridegroom in the marriage contract"
- person_index: 1
pnv_name:
literalName: "משיאח"
literalName_romanized: "Mashiah"
givenName: "משיאח"
givenName_romanized: "Mashiah"
biographical:
sex: "male"
note: "Name meaning 'Messiah' - common Persian Jewish name"
family_relationships:
child:
- person_index: 0
target_name: "משה"
context: "Father of the groom (implicit from patronymic)"
- person_index: 2
pnv_name:
literalName: "רבקה בת יעקב"
literalName_romanized: "Rivkah bat Ya'akov"
givenName: "רבקה"
givenName_romanized: "Rivkah"
givenName_english: "Rebecca"
patronym: "יעקב"
patronym_romanized: "Ya'akov"
roles:
- role_title: "כלה"
role_title_romanized: "kallah"
role_in_source: "bride"
biographical:
sex: "female"
religion: "Jewish"
community: "Mashhad Jewish community (Mashhadis)"
family_relationships:
father:
- person_index: 3
target_name: "יעקב"
spouse:
- person_index: 0
target_name: "משה בן משיאח"
context: "Bride (כלה) - daughter of Ya'akov"
- person_index: 3
pnv_name:
literalName: "יעקב"
literalName_romanized: "Ya'akov"
givenName: "יעקב"
givenName_romanized: "Ya'akov"
givenName_english: "Jacob"
biographical:
sex: "male"
note: "Biblical patriarch name - common in Jewish communities"
family_relationships:
child:
- person_index: 2
target_name: "רבקה"
context: "Father of the bride (implicit from patronymic)"
temporal_references:
- expression: "עשרים ושלשה לחודש אלול שנת חמשת אלפים שש מאות וחמישים ושש לבריאת עולם"
expression_romanized: "23rd day of the month of Elul, year 5656 from Creation"
normalized_gregorian: "1896-09-01"
calendar: "Hebrew"
type: "DATE"
components:
day: 23
month: "אלול (Elul)"
month_number: 6
year_hebrew: 5656
year_gregorian: 1896
era: "לבריאת עולם (from Creation)"
notes: "Elul is the 6th month of the civil year, 12th of the ecclesiastical year"
locations_mentioned:
- name: "משהד"
name_romanized: "Mashhad"
name_persian: "مشهد"
type: "city"
country: "Iran (then Qajar Persia)"
modern_country: "Iran"
coordinates: "36.2972, 59.6067"
historical_context: |
Mashhad is a major city in northeastern Iran, holy city of Shia Islam
(shrine of Imam Reza). The Jewish community dated to ancient times but
faced forced conversion in 1839. By 1896, some families openly practiced
Judaism while others remained crypto-Jews.
physical_description:
dimensions: "53 x 37 cm"
material: "ink and paint on paper"
decoration: |
- Red and green rules divide the paper into rectangular sections
- Middle section contains the ketubah text
- Top and sides filled with elaborate arch and floral patterns
- Colors: blue, gold, and silver paint
- Strips of red paper pasted on all four sides as frame
condition: "Some damage to the text containing the Isaiah quote and to the borders"
script: "Hebrew square script"
hebrew_naming_notes: |
Hebrew/Jewish naming conventions demonstrated in this REAL document:
1. PATRONYMIC SYSTEM:
- בן (ben): "son of" - used for males
- בת (bat): "daughter of" - used for females
- Example: משה בן משיאח = "Mosheh son of Mashiah"
2. PERSIAN JEWISH NAMES:
- משיאח (Mashiah/Messiah): Common Persian Jewish given name
- רבקה (Rivkah/Rebecca): Biblical matriarch name
- יעקב (Ya'akov/Jacob): Biblical patriarch name
3. KETUBAH STRUCTURE:
- Opening: בס״ד (B'siyata d'Shmaya - With Heaven's help)
- Date: Hebrew calendar from Creation (anno mundi)
- Location: City name in Hebrew transliteration
- Parties: Groom (חתן) and Bride (כלה) with patronymics
- Blessing: Often biblical verses (here Isaiah 61:10)
4. MASHHAD JEWISH CONTEXT:
- Community known as "Mashhadis" or "Jadid al-Islam"
- After 1839 pogrom, many practiced Judaism secretly
- Unique artistic traditions in ketubah decoration
- Persian influences in ornamentation style
provenance:
data_status: "REAL_HISTORICAL_DATA"
archive: "Yale University, Beinecke Rare Book & Manuscript Library"
collection: "Hebrew Manuscripts Supplement"
call_number: "Hebrew MSS suppl 194 (Broadside)"
catalog_record: "8574921"
object_id: "2067542"
digital_url: "https://digital.library.yale.edu/catalog/2067542"
iiif_manifest: "https://digital.library.yale.edu/manifests/2067542"
pdf_url: "https://digital.library.yale.edu/pdfs/2067542.pdf"
document_date_hebrew: "23 Elul 5656"
document_date_gregorian: "1896-09-01"
document_place: "Mashhad, Iran"
contributors:
groom: "Mosheh ben Mashiah"
bride: "Rivkah bat Ya'akov"
physical_extent: "1 leaf, 53 x 37 cm, color illustrations"
languages:
- "Hebrew"
- "Official Aramaic (700-300 BCE); Imperial Aramaic (700-300 BCE)"
subjects:
geographic: "Mashhad (Iran) -- Religious life and customs"
topical:
- "Ketubah -- Iran -- Mashhad"
- "Prenuptial agreements (Jewish law)"
genres:
- "Autographs"
- "Illustrations"
- "Ketubahs"
- "Manuscripts"
- "Marginalia"
rights: |
The use of this image may be subject to the copyright law of the
United States (Title 17, United States Code) or to site license or
other rights management terms and conditions. The person using the
image is liable for any infringement.
access_date: "2025-01-13"
citation: |
"Ketubah : Mashhad, Iran, 1896, September 1," Yale University Library,
Beinecke Rare Book and Manuscript Library, Hebrew MSS suppl 194 (Broadside),
Object ID 2067542. Digital Collections, https://digital.library.yale.edu/catalog/2067542
(accessed January 13, 2025).
verification_notes: |
This is a REAL historical document with verified provenance:
- Held at Yale University Beinecke Rare Book & Manuscript Library
- Fully digitized and publicly accessible
- Catalog record #8574921 with complete metadata
- Both principal parties (groom and bride) are named in Yale's catalog
- Physical dimensions and condition documented
- High-resolution images available via IIIF manifest
- Document represents unique Mashhad Jewish community traditions
# =============================================================================
# EXAMPLE 7: Spanish Colonial Baptism Record
# =============================================================================
# Source type: Libro de bautismos (Baptismal register)
# Language: Spanish
# Date: 1742
# Location: Mexico City, New Spain
# =============================================================================
example_7_spanish_colonial_baptism:
description: |
Spanish colonial baptismal records from New Spain (Mexico) include rich
genealogical data with casta (racial/social classification) designations
and compadrazgo (godparent) relationships. These records are invaluable
for tracing both family lineages and social networks.
Key features:
- Casta designations (español, mestizo, mulato, indio, etc.)
- Legitimacy markers (hijo legítimo vs hijo natural)
- Compadrazgo (godparent relationships)
- Place of origin (vecino de, natural de)
- Ecclesiastical formulae
source_text: |
En la ciudad de México, a veinte y tres días del mes de febrero de mil
setecientos cuarenta y dos años, yo el Br. Don Antonio de Mendoza,
teniente de cura de esta santa iglesia catedral, bauticé solemnemente,
puse óleo y crisma a Juan José, español, hijo legítimo de Don Pedro
García de la Cruz, español, natural de la villa de Puebla de los Ángeles,
y de Doña María Josefa de los Reyes, española, natural de esta ciudad.
Fueron sus padrinos Don Francisco Xavier de Castañeda, español, vecino
de esta ciudad, y Doña Ana María de la Encarnación, su legítima esposa,
a quienes advertí el parentesco espiritual y obligaciones que contrajeron.
Y lo firmé.
Br. Don Antonio de Mendoza
expected_extraction:
description: "Spanish colonial baptism demonstrating casta system and compadrazgo"
pico_observation:
observation_id: "bautismo_mexico_1742_juan_jose_garcia"
observed_at: "2025-12-12T12:00:00Z"
source_type: "baptismal_register"
source_reference: "Libro de Bautismos, Catedral de México, 23 Feb 1742"
persons:
- person_index: 0
pnv_name:
literalName: "Juan José"
givenName: "Juan José"
roles:
- role_title: "bautizado"
role_in_source: "baptized"
biographical:
casta: "español"
legitimacy: "hijo legítimo"
religion: "Catholic"
family_relationships:
parent:
- person_index: 1
target_name: "Don Pedro García de la Cruz"
- person_index: 2
target_name: "Doña María Josefa de los Reyes"
godparent:
- person_index: 3
target_name: "Don Francisco Xavier de Castañeda"
- person_index: 4
target_name: "Doña Ana María de la Encarnación"
context: "Infant being baptized"
- person_index: 1
pnv_name:
literalName: "Don Pedro García de la Cruz"
givenName: "Pedro"
surnamePrefix: "García de"
baseSurname: "la Cruz"
honorificPrefix: "Don"
biographical:
casta: "español"
origin: "natural de la villa de Puebla de los Ángeles"
family_relationships:
spouse:
- person_index: 2
target_name: "Doña María Josefa de los Reyes"
children:
- person_index: 0
target_name: "Juan José"
context: "Father of the baptized child"
- person_index: 2
pnv_name:
literalName: "Doña María Josefa de los Reyes"
givenName: "María Josefa"
surnamePrefix: "de"
baseSurname: "los Reyes"
honorificPrefix: "Doña"
biographical:
casta: "española"
origin: "natural de esta ciudad"
family_relationships:
spouse:
- person_index: 1
target_name: "Don Pedro García de la Cruz"
children:
- person_index: 0
target_name: "Juan José"
context: "Mother of the baptized child"
- person_index: 3
pnv_name:
literalName: "Don Francisco Xavier de Castañeda"
givenName: "Francisco Xavier"
surnamePrefix: "de"
baseSurname: "Castañeda"
honorificPrefix: "Don"
roles:
- role_title: "padrino"
role_in_source: "godfather"
biographical:
casta: "español"
residence: "vecino de esta ciudad"
family_relationships:
spouse:
- person_index: 4
target_name: "Doña Ana María de la Encarnación"
godchildren:
- person_index: 0
target_name: "Juan José"
compadre:
- person_index: 1
target_name: "Don Pedro García de la Cruz"
context: "Godfather (padrino)"
- person_index: 4
pnv_name:
literalName: "Doña Ana María de la Encarnación"
givenName: "Ana María"
surnamePrefix: "de"
baseSurname: "la Encarnación"
honorificPrefix: "Doña"
roles:
- role_title: "madrina"
role_in_source: "godmother"
biographical:
marital_status: "legítima esposa"
family_relationships:
spouse:
- person_index: 3
target_name: "Don Francisco Xavier de Castañeda"
godchildren:
- person_index: 0
target_name: "Juan José"
comadre:
- person_index: 2
target_name: "Doña María Josefa de los Reyes"
context: "Godmother (madrina)"
- person_index: 5
pnv_name:
literalName: "Br. Don Antonio de Mendoza"
givenName: "Antonio"
surnamePrefix: "de"
baseSurname: "Mendoza"
honorificPrefix: "Br. Don"
roles:
- role_title: "teniente de cura"
role_in_source: "officiant"
biographical:
ecclesiastical_position: "teniente de cura de esta santa iglesia catedral"
family_relationships: {}
context: "Priest who performed the baptism"
temporal_references:
- expression: "a veinte y tres días del mes de febrero de mil setecientos cuarenta y dos años"
normalized: "1742-02-23"
calendar: "Gregorian"
type: "DATE"
locations_mentioned:
- name: "ciudad de México"
type: "city"
administrative_entity: "New Spain"
- name: "santa iglesia catedral"
type: "church"
full_name: "Catedral Metropolitana de la Asunción de la Santísima Virgen María"
- name: "villa de Puebla de los Ángeles"
type: "city"
modern_name: "Puebla"
administrative_entity: "New Spain"
colonial_naming_notes: |
Spanish colonial naming conventions demonstrated:
- Don/Doña: honorific indicating Spanish (peninsular or criollo) status
- Br. (Bachiller): academic degree, often for clergy
- Casta system: español, mestizo, mulato, indio, etc.
- "natural de": indicates place of birth
- "vecino de": indicates place of residence
- "hijo legítimo": legitimate child (parents married)
- "hijo natural": illegitimate child (parents not married)
- Compadrazgo: godparent relationship creating spiritual kinship
- Padrino/madrina: godfather/godmother
- Compadre/comadre: relationship between godparents and parents
- "parentesco espiritual": spiritual kinship with religious obligations
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on standard Spanish colonial
baptismal formulae for demonstration purposes. Names, dates, and
locations are fictional but follow authentic 17th-century patterns.
For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "Brigham Young University"
collection: "Script Tutorial - Spanish Colonial Baptisms"
digital_url: "https://script.byu.edu/spanish-handwriting/documents/church-records/baptisms"
document_type: "Tutorial with real transcription examples"
license: "Educational use"
- archive: "FamilySearch"
collection: "Mexico, Yucatán, Catholic Church Records, 1543-1977"
collection_id: "1909116"
digital_url: "https://www.familysearch.org/en/search/collection/1909116"
document_type: "Baptisms, marriages, deaths"
license: "Free with registration"
notes: "Contains some of earliest New World records (from 1543)"
- archive: "Archivo General de la Nación (AGN)"
location: "Mexico City, Mexico"
collection: "Colonial parish records"
document_type: "Spanish colonial baptismal records"
period: "16th-20th century CE"
languages: "Spanish, Nahuatl, Latin"
# =============================================================================
# EXAMPLE 8: Italian Notarial Act (1654 CE, Venice)
# =============================================================================
#
# Demonstrates extraction from an Italian notarial act showing:
# - Italian naming conventions (patronymic "fu", "quondam")
# - Venetian nobility titles (Nobil Homo, Magnifico)
# - Profession-based surnames (Fabbro, Ferrari)
# - Parish-based location (contrada, sestiere)
# - Compare/comare (godparent equivalents in civil context)
# =============================================================================
extraction_examples:
- example_id: "italian_notarial_act"
source_language: "Italian"
source_script: "Latin"
source_period: "1654 CE"
source_type: "notarial_act"
source_text: |
Adì 15 Marzo 1654, in Venetia.
Presenti: Il Nobil Homo Messer Giovanni Battista Morosini fu
quondam Magnifico Messer Andrea, della contrada di San Marco,
et sua moglie la Nobil Donna Madonna Caterina Contarini fu
quondam Messer Francesco. Testimoni: Messer Pietro fu Paolo
Fabbro, habitante nella contrada di San Polo, et Messer Marco
Antonio Ferrari fu Giovanni, bottegaio in Rialto. Rogato io
Notaro Antonio Zen fu quondam Messer Giacomo, Notaro publico
di Venetia.
expected_output:
pico_observation:
observation_id: "notarial_venice_1654-03-15_morosini"
source_type: "notarial_act"
source_reference: "Notarial act, Venice, March 15, 1654"
persons:
- person_index: 0
pnv_name:
literalName: "Il Nobil Homo Messer Giovanni Battista Morosini"
givenName: "Giovanni Battista"
baseSurname: "Morosini"
honorificPrefix: "Il Nobil Homo Messer"
roles:
- role_title: "principal party"
role_in_source: "party to act"
biographical:
social_status: "Venetian nobility"
patronymic: "fu quondam Magnifico Messer Andrea"
father_status: "deceased (quondam)"
family_relationships:
father:
- person_index: 1
target_name: "Magnifico Messer Andrea Morosini"
spouse:
- person_index: 2
target_name: "Nobil Donna Madonna Caterina Contarini"
context: "Principal party, Venetian noble"
- person_index: 1
pnv_name:
literalName: "Magnifico Messer Andrea Morosini"
givenName: "Andrea"
baseSurname: "Morosini"
honorificPrefix: "Magnifico Messer"
roles: []
biographical:
social_status: "Venetian nobility"
deceased: true
deceased_marker: "quondam"
family_relationships:
child:
- person_index: 0
target_name: "Giovanni Battista Morosini"
context: "Father of Giovanni Battista, deceased"
- person_index: 2
pnv_name:
literalName: "Nobil Donna Madonna Caterina Contarini"
givenName: "Caterina"
baseSurname: "Contarini"
honorificPrefix: "Nobil Donna Madonna"
roles:
- role_title: "moglie"
role_in_source: "wife"
biographical:
social_status: "Venetian nobility"
patronymic: "fu quondam Messer Francesco"
family_relationships:
father:
- person_index: 3
target_name: "Messer Francesco Contarini"
spouse:
- person_index: 0
target_name: "Giovanni Battista Morosini"
context: "Wife of Giovanni Battista"
- person_index: 3
pnv_name:
literalName: "Messer Francesco Contarini"
givenName: "Francesco"
baseSurname: "Contarini"
honorificPrefix: "Messer"
roles: []
biographical:
deceased: true
deceased_marker: "quondam"
family_relationships:
child:
- person_index: 2
target_name: "Caterina Contarini"
context: "Father of Caterina, deceased"
- person_index: 4
pnv_name:
literalName: "Messer Pietro fu Paolo Fabbro"
givenName: "Pietro"
baseSurname: "Fabbro"
honorificPrefix: "Messer"
roles:
- role_title: "testimone"
role_in_source: "witness"
biographical:
patronymic: "fu Paolo"
residence: "contrada di San Polo"
family_relationships:
father:
- person_index: 5
target_name: "Paolo Fabbro"
context: "First witness"
- person_index: 5
pnv_name:
literalName: "Paolo Fabbro"
givenName: "Paolo"
baseSurname: "Fabbro"
roles: []
biographical:
deceased: true
family_relationships:
child:
- person_index: 4
target_name: "Pietro Fabbro"
context: "Father of witness Pietro, deceased"
- person_index: 6
pnv_name:
literalName: "Messer Marco Antonio Ferrari fu Giovanni"
givenName: "Marco Antonio"
baseSurname: "Ferrari"
honorificPrefix: "Messer"
roles:
- role_title: "testimone"
role_in_source: "witness"
biographical:
patronymic: "fu Giovanni"
occupation: "bottegaio"
workplace: "Rialto"
family_relationships:
father:
- person_index: 7
target_name: "Giovanni Ferrari"
context: "Second witness, shopkeeper"
- person_index: 7
pnv_name:
literalName: "Giovanni Ferrari"
givenName: "Giovanni"
baseSurname: "Ferrari"
roles: []
biographical:
deceased: true
family_relationships:
child:
- person_index: 6
target_name: "Marco Antonio Ferrari"
context: "Father of witness Marco Antonio, deceased"
- person_index: 8
pnv_name:
literalName: "Notaro Antonio Zen fu quondam Messer Giacomo"
givenName: "Antonio"
baseSurname: "Zen"
honorificPrefix: "Notaro"
roles:
- role_title: "notaro"
role_in_source: "notary"
biographical:
patronymic: "fu quondam Messer Giacomo"
occupation: "Notaro publico di Venetia"
family_relationships:
father:
- person_index: 9
target_name: "Messer Giacomo Zen"
context: "Notary who drafted the act"
- person_index: 9
pnv_name:
literalName: "Messer Giacomo Zen"
givenName: "Giacomo"
baseSurname: "Zen"
honorificPrefix: "Messer"
roles: []
biographical:
deceased: true
deceased_marker: "quondam"
family_relationships:
child:
- person_index: 8
target_name: "Antonio Zen"
context: "Father of notary, deceased"
temporal_references:
- expression: "Adì 15 Marzo 1654"
normalized: "1654-03-15"
calendar: "Gregorian"
type: "DATE"
locations_mentioned:
- name: "Venetia"
name_modern: "Venice"
type: "city"
- name: "contrada di San Marco"
type: "parish/district"
parent: "Venice"
- name: "contrada di San Polo"
type: "parish/district"
parent: "Venice"
- name: "Rialto"
type: "district/market"
parent: "Venice"
italian_naming_notes: |
Italian notarial naming conventions demonstrated:
- "fu" / "quondam": indicates deceased father (Latin survival)
- "Magnifico Messer": high honorific for nobility
- "Nobil Homo" / "Nobil Donna": Venetian noble titles
- "Madonna": honorific for married noble women
- Profession surnames: Fabbro (smith), Ferrari (ironworker)
- "habitante in/nella": residence indicator
- "bottegaio": shopkeeper
- Venetian patronato system reflected in naming
- Contrada: parish neighborhood system of Venice
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on authentic 17th-century
Venetian notarial document formulae for demonstration purposes.
Names, dates, and locations are fictional but follow period-accurate
conventions. For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "Italian Ministry of Culture"
project: "Antenati (Ancestors)"
digital_url: "https://antenati.cultura.gov.it/"
venice_url: "https://antenati.cultura.gov.it/archivio/state-archives-of-venezia/?lang=en"
document_type: "Civil registry, notarial acts, parish records"
period: "15th century+"
license: "Open Access"
- archive: "University of California Libraries"
collection: "Italian Notarial Documents Collection"
finding_aid: "https://oac.cdlib.org/findaid/ark:%2F13030%2Fc8v412zd"
document_count: "168 documents"
period: "1465-1635 CE"
locations: "Venice, Padua, Verona"
languages: "Latin, Italian (Venetian)"
- project: "SION-Digit (Sources for the History of Italian Jewish Notarial Documents)"
coverage: "Venice, Bordeaux, Amsterdam"
period: "16th-18th century CE"
focus: "Jewish community notarial acts"
languages: "Italian, Hebrew, Ladino"
# =============================================================================
# EXAMPLE 9: Greek Orthodox Parish Register (1875 CE, Thessaloniki)
# =============================================================================
#
# Demonstrates extraction from a Greek Orthodox baptismal register showing:
# - Greek script with romanization
# - Greek patronymics (του + genitive)
# - Godparent system (νονός/νονά)
# - Orthodox naming conventions
# - Deceased marker (μακαρίτης/μακαρίτισσα)
# =============================================================================
- example_id: "greek_baptismal_register"
source_language: "Greek"
source_script: "Greek"
source_period: "1875 CE"
source_type: "baptismal_register"
source_text: |
Ἐν Θεσσαλονίκῃ, τῇ δεκάτῃ πέμπτῃ Μαρτίου τοῦ ἔτους 1875.
Ἐβαπτίσθη ὁ Δημήτριος, υἱὸς τοῦ Νικολάου Παπαδοπούλου,
ἐμπόρου, καὶ τῆς νομίμου αὐτοῦ συζύγου Ἑλένης τῆς τοῦ
μακαρίτου Γεωργίου Οἰκονόμου. Νονὸς ὁ Κωνσταντῖνος
Καρατζᾶς τοῦ Ἰωάννου, ἰατρός. Ἱερεύς: ὁ Πρωτοπρεσβύτερος
Ἀθανάσιος Χρυσοστόμου.
expected_output:
pico_observation:
observation_id: "baptism_thessaloniki_1875-03-15_papadopoulos"
source_type: "baptismal_register"
source_reference: "Greek Orthodox baptismal register, Thessaloniki, March 15, 1875"
persons:
- person_index: 0
pnv_name:
literalName: "Δημήτριος"
literalName_romanized: "Dimitrios"
givenName: "Δημήτριος"
givenName_romanized: "Dimitrios"
roles:
- role_title: "βαπτισθείς"
role_in_source: "baptized infant"
biographical:
sex: "male"
religion: "Greek Orthodox"
family_relationships:
father:
- person_index: 1
target_name: "Νικόλαος Παπαδόπουλος"
mother:
- person_index: 2
target_name: "Ἑλένη"
godfather:
- person_index: 4
target_name: "Κωνσταντῖνος Καρατζᾶς"
context: "Baptized infant"
- person_index: 1
pnv_name:
literalName: "Νικόλαος Παπαδόπουλος"
literalName_romanized: "Nikolaos Papadopoulos"
givenName: "Νικόλαος"
givenName_romanized: "Nikolaos"
baseSurname: "Παπαδόπουλος"
baseSurname_romanized: "Papadopoulos"
roles:
- role_title: "πατήρ"
role_in_source: "father"
biographical:
occupation: "ἔμπορος (merchant)"
family_relationships:
child:
- person_index: 0
target_name: "Δημήτριος"
spouse:
- person_index: 2
target_name: "Ἑλένη"
context: "Father of the baptized, merchant"
- person_index: 2
pnv_name:
literalName: "Ἑλένη τῆς τοῦ μακαρίτου Γεωργίου Οἰκονόμου"
literalName_romanized: "Eleni tis tou makaritou Georgiou Oikonomou"
givenName: "Ἑλένη"
givenName_romanized: "Eleni"
roles:
- role_title: "μήτηρ"
role_in_source: "mother"
biographical:
marital_status: "νομίμη σύζυγος (lawful wife)"
patronymic: "τῆς τοῦ μακαρίτου Γεωργίου Οἰκονόμου"
family_relationships:
father:
- person_index: 3
target_name: "Γεώργιος Οἰκονόμος"
child:
- person_index: 0
target_name: "Δημήτριος"
spouse:
- person_index: 1
target_name: "Νικόλαος Παπαδόπουλος"
context: "Mother of the baptized"
- person_index: 3
pnv_name:
literalName: "μακαρίτης Γεώργιος Οἰκονόμος"
literalName_romanized: "makaritis Georgios Oikonomos"
givenName: "Γεώργιος"
givenName_romanized: "Georgios"
baseSurname: "Οἰκονόμος"
baseSurname_romanized: "Oikonomos"
roles: []
biographical:
deceased: true
deceased_marker: "μακαρίτης"
family_relationships:
child:
- person_index: 2
target_name: "Ἑλένη"
context: "Maternal grandfather, deceased"
- person_index: 4
pnv_name:
literalName: "Κωνσταντῖνος Καρατζᾶς τοῦ Ἰωάννου"
literalName_romanized: "Konstantinos Karatzas tou Ioannou"
givenName: "Κωνσταντῖνος"
givenName_romanized: "Konstantinos"
baseSurname: "Καρατζᾶς"
baseSurname_romanized: "Karatzas"
roles:
- role_title: "νονός"
role_in_source: "godfather"
biographical:
occupation: "ἰατρός (physician)"
patronymic: "τοῦ Ἰωάννου"
family_relationships:
father:
- person_index: 5
target_name: "Ἰωάννης Καρατζᾶς"
godchild:
- person_index: 0
target_name: "Δημήτριος"
context: "Godfather, physician"
- person_index: 5
pnv_name:
literalName: "Ἰωάννης Καρατζᾶς"
literalName_romanized: "Ioannis Karatzas"
givenName: "Ἰωάννης"
givenName_romanized: "Ioannis"
baseSurname: "Καρατζᾶς"
baseSurname_romanized: "Karatzas"
roles: []
biographical: {}
family_relationships:
child:
- person_index: 4
target_name: "Κωνσταντῖνος Καρατζᾶς"
context: "Father of godfather"
- person_index: 6
pnv_name:
literalName: "Πρωτοπρεσβύτερος Ἀθανάσιος Χρυσοστόμου"
literalName_romanized: "Protopresbyteros Athanasios Chrysostomou"
givenName: "Ἀθανάσιος"
givenName_romanized: "Athanasios"
patronymic: "Χρυσοστόμου"
patronymic_romanized: "Chrysostomou"
honorificPrefix: "Πρωτοπρεσβύτερος"
roles:
- role_title: "ἱερεύς"
role_in_source: "priest"
biographical:
ecclesiastical_rank: "Πρωτοπρεσβύτερος (Protopresbyter/Archpriest)"
family_relationships: {}
context: "Officiating priest"
temporal_references:
- expression: "τῇ δεκάτῃ πέμπτῃ Μαρτίου τοῦ ἔτους 1875"
expression_romanized: "ti dekati pempti Martiou tou etous 1875"
normalized: "1875-03-15"
calendar: "Julian"
type: "DATE"
note: "Greek Orthodox used Julian calendar; Gregorian equivalent: March 27, 1875"
locations_mentioned:
- name: "Θεσσαλονίκη"
name_romanized: "Thessaloniki"
type: "city"
modern_country: "Greece"
historical_context: "Ottoman Empire (Selanik vilayet)"
greek_naming_notes: |
Greek Orthodox naming conventions demonstrated:
- "τοῦ" + genitive: patronymic marker ("son/daughter of")
- "μακαρίτης/μακαρίτισσα": deceased marker ("the late")
- "νομίμη σύζυγος": lawful wife
- "νονός/νονά": godfather/godmother
- Surnames from occupations: Παπαδόπουλος (priest's son), Οἰκονόμος (steward)
- Ecclesiastical titles: Πρωτοπρεσβύτερος (Archpriest)
- Polytonic Greek orthography common in 19th century
- Julian calendar used by Greek Orthodox Church
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on authentic Greek Orthodox
baptismal register formulae for demonstration purposes. Names, dates,
and locations are fictional but follow 19th-century conventions.
For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "FamilySearch"
wiki_url: "https://www.familysearch.org/en/wiki/Greece_Church_Records"
document_type: "Baptisms, marriages, deaths"
period: "17th century - 1925 CE"
language: "Greek"
license: "Free with registration"
notes: "Greek Orthodox records are primary source before 1925 civil registration"
- archive: "Γενικά Αρχεία του Κράτους (General State Archives of Greece)"
abbreviation: "GAK"
document_type: "Church records, civil registry, Ottoman-era documents"
period: "15th century - present"
languages: "Greek, Ottoman Turkish"
notes: "National archive with records from all Greek regions"
- resource: "Greek Ancestry"
coverage: "Village church records guide"
document_type: "Baptismal registers, marriage registers"
notes: "Guides to accessing island and mainland records"
# =============================================================================
# EXAMPLE 10: Russian Imperial Metrical Book - Birth of Stefan Nowicki (1894)
# =============================================================================
#
# REAL HISTORICAL DATA from Archiwum Państwowe w Poznaniu
#
# Demonstrates extraction from a Russian Imperial metrical book showing:
# - Cyrillic script with romanization
# - Polish names recorded in Russian (Congress Poland context)
# - Pre-revolutionary orthography (ъ, ѣ)
# - Julian/Gregorian calendar dual dating
# - Восприемники (godparents/sponsors)
# - Village-level vital records
#
# Source: BYU Script Tutorial - fully transcribed with verification
# =============================================================================
- example_id: "russian_metrical_book_osiek_wielki_1894"
source_language: "Russian"
source_script: "Cyrillic (pre-1918 orthography)"
source_period: "1894 CE (Gregorian) / 1893 CE (Julian)"
source_type: "metrical_book"
document_subtype: "birth_record"
source_text: |
Любины
Состаялосъ въ деревнѣ осѣкъ велькій двадцать седьмаго Декабря
/:восьмаго Января:/ тысяча восемьсоть девяносто третяго (четвертаго) года
въ одинадцать часовъ утра Явился Янъ Новицкій /:Jan Nowicki:/
сорока лѣтъ отъ роду земледѣлецъ изъ Любинъ, въ присутствіи
Францишка Новицкаго сорока лѣтъ, и Михаила Влодарчика
шестидесяти лѣтъ отъ роду, обоихъ земледѣльцевъ изъ Любинъ
и предьявилъ намъ младенца мужскаго пола, объявляя
что онъ родился въ Любинахъ двадцать пятаго Декабря
/:шестаго Января:/ текущаго года, въ четыре часа вечеромъ
отъ законной его жены Маріанны изъ Адамковъ /:Mary-
anny z Adamkow:/ тридцати лѣтъ отъ роду, младенцу
этому при святомъ крещеніи совершенномъ сего
числа дано имя Стефанъ /:Stefan:/ а воспріемниками
его были Войцех Гаудынъ, и Катаржина Гембка.
Актъ сей объявляющему и свидѣтелямъ негра-
мотнымъ прочитанъ нами только подписанъ
Ксндзъ Павелъ Выборскій
source_text_romanized: |
Lyubiny
Sostoyalos' v derevne Osek Vel'kiy dvadtsat' sed'mago Dekabrya
/:vos'mago Yanvarya:/ tysyacha vosem'sot' devyanosto tret'yago (chetvertago) goda
v odinnadtsat' chasov utra Yavilsya Yan Novitskiy /:Jan Nowicki:/
soroka let ot rodu zemledelets iz Lyubin, v prisutstvii
Frantsishka Novitskago soroka let, i Mikhaila Vlodarchika
shestidesyati let ot rodu, oboikh zemledeltsev iz Lyubin
i pred'yavil nam mladentsa muzhskago pola, ob'yavlyaya
chto on rodilsya v Lyubinakh dvadtsat' pyatago Dekabrya
/:shestago Yanvarya:/ tekushchago goda, v chetyre chasa vecherom
ot zakonnoy ego zheny Marianny iz Adamkov /:Mary-
anny z Adamkow:/ tridtsati let ot rodu, mladentsu
etomu pri svyatom kreshchenii sovershennom sego
chisla dano imya Stefan /:Stefan:/ a vospriyemnikami
ego byli Voytsekh Gaudyn, i Katarzhina Gembka.
Akt sey ob'yavlyayushchemu i svidetel'yam negra-
motnym prochitan nami tol'ko podpisan
Ksndz Pavel Vyborskiy
source_text_english: |
Lubin
It happened in the village of Osiek Wielki on the twenty-seventh of December
/:eighth of January:/ in the year one thousand eight hundred ninety-three (four)
at eleven o'clock in the morning. Appeared Jan Nowicki /:Jan Nowicki:/
forty years of age, farmer from Lubin, in the presence of
Franciszek Nowicki, forty years old, and Michał Włodarczyk
sixty years of age, both farmers from Lubin
and presented to us an infant of the male sex, declaring
that he was born in Lubin on the twenty-fifth of December
/:sixth of January:/ of the current year, at four o'clock in the evening
of his lawful wife Marianna née Adamkow /:Mary-
anna z Adamkow:/ thirty years of age. To this infant,
at the holy baptism performed on this
date, was given the name Stefan /:Stefan:/ and his godparents
were Wojciech Gaudyn and Katarzyna Gembka.
This act, to the declarant and to the illiterate witnesses,
was read by us and only signed.
Priest Paweł Wyborski
expected_output:
pico_observation:
observation_id: "birth_osiek_wielki_1894_stefan_nowicki"
source_type: "metrical_book"
source_reference: "Akta stanu cywilnego Parafii Rzymskokatolickiej Osiek Wielki, Reference Code 54/792/0/6.1/140, scan 4/76"
archive: "Archiwum Państwowe w Poznaniu Oddział w Koninie"
persons:
- person_index: 0
pnv_name:
literalName: "Стефанъ Новицкій"
literalName_romanized: "Stefan Novitskiy"
literalName_polish: "Stefan Nowicki"
givenName: "Стефанъ"
givenName_romanized: "Stefan"
baseSurname: "Новицкій"
baseSurname_romanized: "Novitskiy"
baseSurname_polish: "Nowicki"
roles:
- role_title: "младенецъ"
role_in_source: "infant"
biographical:
sex: "male"
religion: "Roman Catholic"
birth_date_julian: "1893-12-25"
birth_date_gregorian: "1894-01-06"
baptism_date_julian: "1893-12-27"
baptism_date_gregorian: "1894-01-08"
birth_place: "Любины (Lubin)"
birth_time: "4 o'clock in the evening"
family_relationships:
father:
- person_index: 1
target_name: "Янъ Новицкій"
mother:
- person_index: 2
target_name: "Маріанна изъ Адамковъ"
godfather:
- person_index: 5
target_name: "Войцех Гаудынъ"
godmother:
- person_index: 6
target_name: "Катаржина Гембка"
context: "Newborn infant, subject of the birth registration"
- person_index: 1
pnv_name:
literalName: "Янъ Новицкій"
literalName_romanized: "Yan Novitskiy"
literalName_polish: "Jan Nowicki"
givenName: "Янъ"
givenName_romanized: "Yan"
givenName_polish: "Jan"
baseSurname: "Новицкій"
baseSurname_romanized: "Novitskiy"
baseSurname_polish: "Nowicki"
roles:
- role_title: "отецъ"
role_in_source: "father"
- role_title: "объявляющій"
role_in_source: "declarant"
biographical:
sex: "male"
age: 40
age_expression: "сорока лѣтъ отъ роду"
occupation: "земледѣлецъ (farmer)"
residence: "Любины (Lubin)"
literacy: "illiterate (implied - act read to him)"
family_relationships:
child:
- person_index: 0
target_name: "Стефанъ Новицкій"
spouse:
- person_index: 2
target_name: "Маріанна изъ Адамковъ"
possible_relative:
- person_index: 3
target_name: "Францишекъ Новицкій"
relationship_type: "same surname - possibly brother or cousin"
context: "Father of the infant, farmer from Lubin, appeared to register the birth"
- person_index: 2
pnv_name:
literalName: "Маріанна изъ Адамковъ"
literalName_romanized: "Marianna iz Adamkov"
literalName_polish: "Maryanna z Adamkow"
givenName: "Маріанна"
givenName_romanized: "Marianna"
givenName_polish: "Maryanna"
maidenName: "Адамковъ"
maidenName_romanized: "Adamkov"
maidenName_polish: "Adamkow"
roles:
- role_title: "мать"
role_in_source: "mother"
biographical:
sex: "female"
age: 30
age_expression: "тридцати лѣтъ отъ роду"
marital_status: "законная жена (lawful wife)"
maiden_name_marker: "изъ (née/z)"
family_relationships:
child:
- person_index: 0
target_name: "Стефанъ Новицкій"
spouse:
- person_index: 1
target_name: "Янъ Новицкій"
context: "Mother of the infant, lawful wife of Jan Nowicki"
- person_index: 3
pnv_name:
literalName: "Францишекъ Новицкій"
literalName_romanized: "Frantsishek Novitskiy"
literalName_polish: "Franciszek Nowicki"
givenName: "Францишекъ"
givenName_romanized: "Frantsishek"
givenName_polish: "Franciszek"
baseSurname: "Новицкій"
baseSurname_romanized: "Novitskiy"
baseSurname_polish: "Nowicki"
roles:
- role_title: "свидѣтель"
role_in_source: "witness"
biographical:
sex: "male"
age: 40
age_expression: "сорока лѣтъ"
occupation: "земледѣлецъ (farmer)"
residence: "Любины (Lubin)"
literacy: "illiterate (неграмотный)"
family_relationships:
possible_relative:
- person_index: 1
target_name: "Янъ Новицкій"
relationship_type: "same surname, same age, same village - possibly brother"
context: "First witness, farmer from Lubin, same surname as father"
- person_index: 4
pnv_name:
literalName: "Михаилъ Влодарчикъ"
literalName_romanized: "Mikhail Vlodarchik"
literalName_polish: "Michał Włodarczyk"
givenName: "Михаилъ"
givenName_romanized: "Mikhail"
givenName_polish: "Michał"
baseSurname: "Влодарчикъ"
baseSurname_romanized: "Vlodarchik"
baseSurname_polish: "Włodarczyk"
roles:
- role_title: "свидѣтель"
role_in_source: "witness"
biographical:
sex: "male"
age: 60
age_expression: "шестидесяти лѣтъ отъ роду"
occupation: "земледѣлецъ (farmer)"
residence: "Любины (Lubin)"
literacy: "illiterate (неграмотный)"
family_relationships: {}
context: "Second witness, farmer from Lubin, age 60"
- person_index: 5
pnv_name:
literalName: "Войцех Гаудынъ"
literalName_romanized: "Voytsekh Gaudyn"
literalName_polish: "Wojciech Gaudyn"
givenName: "Войцех"
givenName_romanized: "Voytsekh"
givenName_polish: "Wojciech"
baseSurname: "Гаудынъ"
baseSurname_romanized: "Gaudyn"
baseSurname_polish: "Gaudyn"
roles:
- role_title: "воспріемникъ"
role_in_source: "godfather"
biographical:
sex: "male"
family_relationships:
godchild:
- person_index: 0
target_name: "Стефанъ Новицкій"
context: "Godfather (baptismal sponsor)"
- person_index: 6
pnv_name:
literalName: "Катаржина Гембка"
literalName_romanized: "Katarzhina Gembka"
literalName_polish: "Katarzyna Gembka"
givenName: "Катаржина"
givenName_romanized: "Katarzhina"
givenName_polish: "Katarzyna"
baseSurname: "Гембка"
baseSurname_romanized: "Gembka"
baseSurname_polish: "Gembka"
roles:
- role_title: "воспріемница"
role_in_source: "godmother"
biographical:
sex: "female"
family_relationships:
godchild:
- person_index: 0
target_name: "Стефанъ Новицкій"
context: "Godmother (baptismal sponsor)"
- person_index: 7
pnv_name:
literalName: "Ксндзъ Павелъ Выборскій"
literalName_romanized: "Ksndz Pavel Vyborskiy"
literalName_polish: "Ksiądz Paweł Wyborski"
givenName: "Павелъ"
givenName_romanized: "Pavel"
givenName_polish: "Paweł"
baseSurname: "Выборскій"
baseSurname_romanized: "Vyborskiy"
baseSurname_polish: "Wyborski"
honorificPrefix: "Ксндзъ (Priest)"
roles:
- role_title: "ксндзъ"
role_in_source: "priest"
- role_title: "registrar"
role_in_source: "signed the act"
biographical:
sex: "male"
ecclesiastical_status: "Roman Catholic priest"
literacy: "literate (only signer)"
family_relationships: {}
context: "Officiating priest who performed baptism and signed the registration"
temporal_references:
- expression: "тысяча восемьсоть девяносто третяго (четвертаго) года"
expression_romanized: "tysyacha vosem'sot' devyanosto tret'yago (chetvertago) goda"
normalized_julian: "1893"
normalized_gregorian: "1894"
calendar: "Dual (Julian/Gregorian)"
type: "YEAR"
note: "Document shows both Julian (1893) and Gregorian (1894) years"
- expression: "двадцать седьмаго Декабря /:восьмаго Января:/"
expression_romanized: "dvadtsat' sed'mago Dekabrya /:vos'mago Yanvarya:/"
normalized_julian: "1893-12-27"
normalized_gregorian: "1894-01-08"
calendar: "Dual (Julian/Gregorian)"
type: "DATE"
event: "registration and baptism"
- expression: "двадцать пятаго Декабря /:шестаго Января:/"
expression_romanized: "dvadtsat' pyatago Dekabrya /:shestago Yanvarya:/"
normalized_julian: "1893-12-25"
normalized_gregorian: "1894-01-06"
calendar: "Dual (Julian/Gregorian)"
type: "DATE"
event: "birth"
note: "Born on Christmas Day (Julian calendar)"
- expression: "въ четыре часа вечеромъ"
expression_romanized: "v chetyre chasa vecherom"
normalized: "16:00"
type: "TIME"
event: "birth"
- expression: "въ одинадцать часовъ утра"
expression_romanized: "v odinnadtsat' chasov utra"
normalized: "11:00"
type: "TIME"
event: "registration"
locations_mentioned:
- name: "Осѣкъ Велькій"
name_romanized: "Osek Vel'kiy"
name_polish: "Osiek Wielki"
type: "village (derevnya)"
modern_location: "Greater Poland Voivodeship, Poland"
coordinates: "52.2461, 18.6207"
geonames_url: "https://www.google.com/maps/place/Osiek+Wielki,+Poland"
- name: "Любины"
name_romanized: "Lyubiny"
name_polish: "Lubin"
type: "village"
note: "Village where the family resided and child was born"
- name: "Parafia Rzymskokatolicka Osiek Wielki"
type: "parish"
note: "Roman Catholic Parish of Osiek Wielki - registration authority"
russian_naming_notes: |
Congress Poland naming conventions demonstrated in this REAL document:
1. DUAL SCRIPT NOTATION:
- Polish names recorded in both Russian Cyrillic AND Latin script
- Example: "Янъ Новицкій /:Jan Nowicki:/"
- Slashes and colons mark the Latin/Polish original
2. PRE-REVOLUTIONARY ORTHOGRAPHY:
- Hard sign (ъ) at end of words: Новицкій, Стефанъ
- Yat (ѣ) instead of е: лѣтъ, деревнѣ, свидѣтелямъ
- -аго/-яго genitive endings (later simplified to -ого/-его)
3. POLISH MAIDEN NAME CONVENTION:
- "изъ Адамковъ" = "z Adamkow" = née Adamkow
- "изъ" (from) marks maiden/birth name
4. WITNESSES (свидѣтели):
- Two male witnesses required for registration
- Both noted as illiterate (неграмотнымъ)
- Father (declarant) also illiterate - act "read" to them
5. CALENDAR SYSTEM:
- Russian Empire used Julian calendar
- Congress Poland (under Russian rule) noted both dates
- 12-day difference in 1893-1894
- Format: Julian date /:Gregorian date:/
6. GODPARENTS (воспріемники):
- Male: воспріемникъ (godfather)
- Female: воспріемница (godmother)
- Not necessarily from same family as parents
7. SOCIAL/OCCUPATIONAL TERMS:
- земледѣлецъ = farmer/agriculturalist
- ксндзъ = ksiądz (Polish priest title, from German "Knez")
provenance:
data_status: "REAL_HISTORICAL_DATA"
archive: "Archiwum Państwowe w Poznaniu Oddział w Koninie"
archive_english: "State Archive in Poznań, Konin Branch"
collection: "Akta stanu cywilnego Parafii Rzymskokatolickiej Osiek Wielki (pow. kolski)"
collection_english: "Civil Registration Records of the Roman Catholic Parish of Osiek Wielki (Koło district)"
reference_code: "54/792/0/6.1/140"
scan_number: "4 of 76"
document_date_julian: "1893-12-27"
document_date_gregorian: "1894-01-08"
digital_url: "https://szukajwarchiwach.gov.pl"
tutorial_url: "https://script.byu.edu/russian-handwriting/transcription/birth/osiek-wielki-poland/1894"
license: "Public domain (historical document over 100 years old)"
citation: |
"Akta stanu cywilnego Parafii Rzymskokatolickiej Osiek Wielki (pow. kolski),"
Archiwum Państwowe w Poznaniu Oddział w Koninie, Szukaj w Archiwach
(szukajwarchiwach.gov.pl: accessed 25 January 2023), entry for Stefan Novitsky,
Catholic birth record, 6 January 1894 (Gregorian date), Osiek Wielki, Czołowo,
Koło, Kaliska, Russian Empire, Reference Code 54/792/0/6.1/140, scan no. 4 of 76.
transcription_source:
institution: "Brigham Young University"
project: "Script Tutorial"
url: "https://script.byu.edu/russian-handwriting/transcription/birth/osiek-wielki-poland/1894"
access_date: "2025-01-13"
notes: "Complete line-by-line transcription with Russian original, romanization, and English translation"
verification_notes: |
This is a REAL historical document with verified transcription:
- Original held at Polish State Archives (Archiwum Państwowe)
- Transcribed and verified by BYU Script Tutorial paleographers
- All 8 persons are real historical individuals
- Names provided in both Russian Cyrillic and Polish Latin script in original
- Stefan Nowicki born 6 January 1894 (Gregorian) in Lubin village
- Family: farmers (земледѣльцы) in Greater Poland region
- Document context: Congress Poland under Russian Imperial rule
# ---------------------------------------------------------------------------
# Example 11: Ottoman Turkish Sijill (Court Record)
# ---------------------------------------------------------------------------
# Period: 1258 AH (1842 CE)
# Source: Şer'iyye Sicili (Sharia Court Register), Demirciköy
# Language: Ottoman Turkish (Arabic script)
# Key features:
# - Honorific titles: Ağa, Efendi, Çelebi, Hatun
# - Patronymics: bin (son of), bint (daughter of)
# - Deceased markers: merhum/merhume (المرحوم/المرحومة)
# - Hijri calendar
# - Mixed Arabic-Turkish vocabulary
# - Court terminology: mahkeme, şahid, mübayi', ba'i
# ---------------------------------------------------------------------------
- example_id: "ottoman_sijill"
source_language: "Ottoman Turkish"
source_script: "Arabic"
source_period: "1258 AH (1842 CE)"
source_type: "sijill"
document_subtype: "property_sale"
archive_context: "Şer'iyye Sicilleri (Islamic Court Registers)"
source_text: |
بسم الله الرحمن الرحيم
مجلس شرع شريفده محمد آغا بن عبد الله مرحوم قصبه دميرجی‌کوی
ساکنلرندن محمد بن احمد افندی و زوجه‌سی فاطمه خاتون بنت علی‌اوغلو
حاضر اولوب محمد آغا طرفندن یکری بش غروش بدل معلوم ایله صاتیلدی
شهود الحال: حسن افندی بن عمر، ابراهیم چلبی بن مصطفی
فی اوائل شهر رجب سنة ١٢٥٨
source_text_romanized: |
Bismillahirrahmanirrahim
Meclis-i şer'-i şerifde Mehmed Ağa bin Abdullah merhum kasaba Demirciköy
sakinlerinden Mehmed bin Ahmed Efendi ve zevcesi Fatma Hatun bint Ali-oğlu
hazır olub Mehmed Ağa tarafından yirmi beş guruş bedel-i ma'lum ile satıldı
Şuhud al-hal: Hasan Efendi bin Ömer, İbrahim Çelebi bin Mustafa
Fi evail-i şehr-i Receb sene 1258
source_text_english: |
In the name of God, the Merciful, the Compassionate
In the noble Sharia court, Mehmed Ağa son of the late Abdullah, [sold to]
residents of the town of Demirciköy, Mehmed son of Ahmed Efendi and his
wife Fatma Hatun daughter of Ali-oğlu, who were present, for the known
price of twenty-five guruş, [the property] was sold by Mehmed Ağa.
Witnesses present: Hasan Efendi son of Ömer, İbrahim Çelebi son of Mustafa
In early Receb of the year 1258 [Hijri]
expected_output:
pico_observation:
observation_id: "sijill_demircikoy_1258ah_sale"
source_type: "sijill"
source_reference: "Şer'iyye Sicili, Demirciköy, Receb 1258 AH"
persons:
- person_index: 0
pnv_name:
literalName: "محمد آغا بن عبد الله"
literalName_romanized: "Mehmed Ağa bin Abdullah"
givenName: "محمد"
givenName_romanized: "Mehmed"
title: "آغا (Ağa)"
patronymic: "بن عبد الله"
patronymic_romanized: "bin Abdullah"
roles:
- role_title: "با‌ئع (ba'i)"
role_in_source: "seller"
biographical:
sex: "male"
status: "deceased"
deceased_marker: "مرحوم (merhum)"
social_rank: "Ağa (military/landowning class)"
family_relationships:
father:
- name: "عبد الله (Abdullah)"
status: "deceased"
context: "Seller (deceased), Ağa = military/landowning"
- person_index: 1
pnv_name:
literalName: "محمد بن احمد افندی"
literalName_romanized: "Mehmed bin Ahmed Efendi"
givenName: "محمد"
givenName_romanized: "Mehmed"
title: "افندی (Efendi)"
patronymic: "بن احمد"
patronymic_romanized: "bin Ahmed"
roles:
- role_title: "مشتری (müşteri)"
role_in_source: "buyer"
biographical:
sex: "male"
residence: "Demirciköy"
social_rank: "Efendi (educated class)"
family_relationships:
father:
- name: "احمد (Ahmed)"
spouse:
- person_index: 2
target_name: "Fatma Hatun"
context: "Buyer, Efendi = literate/administrative"
- person_index: 2
pnv_name:
literalName: "فاطمه خاتون بنت علی‌اوغلو"
literalName_romanized: "Fatma Hatun bint Ali-oğlu"
givenName: "فاطمه"
givenName_romanized: "Fatma"
title: "خاتون (Hatun)"
patronymic: "بنت علی‌اوغلو"
patronymic_romanized: "bint Ali-oğlu"
roles:
- role_title: "مشتری (müşteri)"
role_in_source: "buyer"
- role_title: "زوجه (zevce)"
role_in_source: "wife"
biographical:
sex: "female"
marital_status: "married"
social_rank: "Hatun (respectable woman)"
family_relationships:
father:
- name: "علی‌اوغلو (Ali-oğlu)"
spouse:
- person_index: 1
target_name: "Mehmed Efendi"
context: "Wife of buyer, co-purchaser"
- person_index: 3
pnv_name:
literalName: "حسن افندی بن عمر"
literalName_romanized: "Hasan Efendi bin Ömer"
givenName: "حسن"
givenName_romanized: "Hasan"
title: "افندی (Efendi)"
patronymic: "بن عمر"
patronymic_romanized: "bin Ömer"
roles:
- role_title: "شاهد (şahid)"
role_in_source: "witness"
biographical:
sex: "male"
social_rank: "Efendi"
family_relationships:
father:
- name: "عمر (Ömer)"
context: "First witness"
- person_index: 4
pnv_name:
literalName: "ابراهیم چلبی بن مصطفی"
literalName_romanized: "İbrahim Çelebi bin Mustafa"
givenName: "ابراهیم"
givenName_romanized: "İbrahim"
title: "چلبی (Çelebi)"
patronymic: "بن مصطفی"
patronymic_romanized: "bin Mustafa"
roles:
- role_title: "شاهد (şahid)"
role_in_source: "witness"
biographical:
sex: "male"
social_rank: "Çelebi (gentleman/merchant)"
family_relationships:
father:
- name: "مصطفی (Mustafa)"
context: "Second witness"
temporal_references:
- expression: "فی اوائل شهر رجب سنة ١٢٥٨"
expression_romanized: "fi evail-i şehr-i Receb sene 1258"
normalized: "1842-07"
calendar: "Hijri"
type: "DATE"
conversion_note: "Receb 1258 AH ≈ July-August 1842 CE"
locations_mentioned:
- name: "قصبه دميرجی‌کوی"
name_romanized: "kasaba Demirciköy"
type: "town (kasaba)"
- name: "مجلس شرع شريف"
name_romanized: "meclis-i şer'-i şerif"
type: "court"
ottoman_naming_notes: |
Ottoman Turkish naming conventions:
HONORIFIC TITLES:
- آغا (Ağa): Military commander, landowner
- افندی (Efendi): Educated person, official
- چلبی (Çelebi): Gentleman, merchant
- خاتون (Hatun): Respectable woman
PATRONYMIC PATTERNS:
- بن (bin): Son of (Arabic)
- بنت (bint): Daughter of (Arabic)
- اوغلو (-oğlu): Son of (Turkish)
DECEASED MARKERS:
- مرحوم (merhum): The late (man)
- مرحومه (merhume): The late (woman)
CALENDAR: Hijri lunar (354/355 days)
Receb 1258 AH ≈ July-August 1842 CE
provenance:
data_status: "SYNTHETIC_EXAMPLE"
notes: |
This example uses synthetic data based on authentic Ottoman Turkish
sijill (court register) formulae for demonstration purposes. Names,
dates, and locations are fictional but follow authentic 19th-century
patterns. For real examples, see PROVENANCE_SOURCES.md.
related_real_sources:
- archive: "OpenJerusalem Project"
collection: "Jerusalem Sharia Court Registers"
digital_url: "https://www.openjerusalem.org/"
ark_identifier: "ark:/58142/PfV7b"
volume_count: "102 registers"
period: "1834-1920 CE"
languages: "Ottoman Turkish, Arabic"
license: "Open Access"
document_types: "Property sales, marriage contracts, inheritance, waqf"
- archive: "İslam Araştırmaları Merkezi (ISAM)"
collection: "Istanbul Kadı Sicilleri"
digital_url: "http://www.kadisicilleri.org/"
volume_count: "40+ volumes online"
document_count: "40,000+ documents"
period: "16th-19th century CE"
language: "Ottoman Turkish"
license: "Research access"
- archive: "Istanbul Metropolitan Municipality"
project: "History of Istanbul"
digital_url: "https://istanbultarihi.ist/434-istanbul-sharia-court-registers"
volume_count: "~10,000 volumes"
courts: "26 different courts"
period: "1453-1922 CE"
notes: "Largest collection of Ottoman court records in existence"
- archive: "Harvard University"
project: "Ottoman Court Records Project (OCRP)"
digital_url: "https://cmes.fas.harvard.edu/projects/ocrp"
document_types: "Sijill transcriptions, translations"
period: "16th-19th century CE"
# =============================================================================
# END OF MODULE
# =============================================================================