# Type 'F' (FEATURES) Extraction Summary
## Overview
This document describes the extraction of all Wikidata entities with type 'F' (FEATURES - Physical landscape features with heritage significance) from the GLAMORCUBEPSXHFN taxonomy.
## Extraction Details
- **Source File**: `hyponyms_curated_full.yaml`
- **Output File**: `hyponyms_curated_full_f.yaml`
- **Extraction Date**: 2025-11-22
- **Total Entries Extracted**: 298
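
The extraction amounts to keeping the entries whose `type` list contains `F`. The following is a minimal sketch of that filter, not the project's actual script. `entry_types` and `filter_by_type` are hypothetical helper names, the second sample entry is invented for illustration, and the code assumes the curated fields may sit either under a `curated` key or directly on the entry.

```python
from typing import Any


def entry_types(entry: dict[str, Any]) -> list[str]:
    """Return an entry's type codes, whether or not they are nested under 'curated'."""
    curated = entry.get("curated") or entry
    return list(curated.get("type") or [])


def filter_by_type(entries: list[dict[str, Any]], code: str = "F") -> list[dict[str, Any]]:
    """Keep only the entries whose type list contains the given code."""
    return [e for e in entries if code in entry_types(e)]


# Toy data standing in for the parsed `hypernym` list of the source file.
# The second entry (and its type code) is made up for illustration.
sample = [
    {"curated": {"label": "Q1802963", "hypernym": ["building"], "type": ["F"]}},
    {"curated": {"label": "Q33506", "hypernym": ["museum"], "type": ["M"]}},
]

features = filter_by_type(sample)
print([e["curated"]["label"] for e in features])
```

With PyYAML installed, the input list would typically come from `yaml.safe_load(f)["hypernym"]` on the opened source file.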
## Type 'F' Definition
According to the GLAMORCUBEPSXHFN taxonomy:
**FEATURES (F)**: Physical landscape features with heritage significance
- Examples: monuments, sculptures, statues, memorials, landmarks, cemeteries
- Not institutions maintaining collections, but the physical features themselves
- Heritage sites with cultural or historical significance
## Distribution by Hypernym Category
The 298 extracted entries are distributed across the following hypernym categories:
| Hypernym Category | Count | Percentage |
|-------------------------|-------|------------|
| heritage site | 144 | 48.3% |
| building | 33 | 11.1% |
| protected area | 23 | 7.7% |
| structure | 12 | 4.0% |
| museum | 8 | 2.7% |
| park | 7 | 2.3% |
| infrastructure | 6 | 2.0% |
| grave | 6 | 2.0% |
| space | 5 | 1.7% |
| memory space | 5 | 1.7% |
| information point | 5 | 1.7% |
| natural monument | 5 | 1.7% |
| tomb | 5 | 1.7% |
| object | 4 | 1.3% |
| geographical object | 4 | 1.3% |
| Other (27 categories) | 26 | 8.7% |
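
The percentages in the table are each count divided by the 298 total, rounded to one decimal place. A short sketch of that tally follows. It assumes one category label per entry, which this document does not state, and `distribution` is a hypothetical helper name.

```python
from collections import Counter


def distribution(hypernyms: list[str]) -> list[tuple[str, int, float]]:
    """Return (category, count, percentage) rows sorted by descending count."""
    counts = Counter(hypernyms)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


# Counts for the two largest categories in the table above;
# the remaining 121 entries are lumped together for brevity.
labels = ["heritage site"] * 144 + ["building"] * 33 + ["(all others)"] * 121

rows = distribution(labels)
for category, count, pct in rows:
    print(f"{category:<15} {count:>4} {pct:>5}%")
```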
## Notable Examples
### Buildings
- Q1802963: mansion
- Q317557: parish church
- Q1021645: office building
### Structures
- Q336164: sewerage pumping station
- Q15710813: physical structure
- Q1411945: civil engineering construction
### Heritage Sites
- Q3694: vacation property
- Q2927789: buitenplaats (Dutch country estate)
- Q136396228: sacred shrine (Bali)
### Objects
- Q16686448: artificial object
- Q860861: sculpture
- Q223557: physical object
### Settlements
- Q124250988: urban settlement
- Q3957: town
## File Structure
The extracted YAML file maintains the same structure as the source file:
```yaml
metadata:
  source_file: data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml
  filter_criteria: 'type: F (FEATURES - Physical landscape features with heritage significance)'
  extraction_date: '2025-11-20T18:28:12.123353+00:00'
  count: 298
sources:
  [original source metadata preserved]
hypernym:
- curated:
    label: Q1802963
    hypernym:
    - building
    type:
    - F
  wikidata:
    [full Wikidata entity data]
  enrichment_status: success
  identifier: Q1802963
  enrichment_date: '2025-11-20T18:28:12.600122+00:00'
[... 297 more entries]
```
## File Statistics
- **File Size**: 2.2 MB
- **Line Count**: 90,950 lines
- **Format**: YAML with UTF-8 encoding
## Usage
This file can be used for:
1. **Ontology mapping**: Understanding which Wikidata entities represent physical features
2. **Schema validation**: Ensuring FEATURES classification is correctly applied
3. **Geographic analysis**: Identifying heritage landscape features
4. **Data quality**: Cross-referencing with other heritage databases
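
For the schema validation use, a first check could confirm that the declared `count` matches the number of entries and that every entry carries type `F`. The sketch below works on an already parsed document. `check_extract` is a hypothetical name, the sample data is invented, and the nesting of curated fields under `curated` is an assumption.

```python
from typing import Any


def check_extract(doc: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed extract (empty list if none)."""
    problems = []
    entries = doc.get("hypernym") or []
    declared = (doc.get("metadata") or {}).get("count")
    if declared != len(entries):
        problems.append(f"metadata.count is {declared} but {len(entries)} entries found")
    for i, entry in enumerate(entries):
        # Assumption: curated fields may be nested under 'curated' or sit on the entry.
        curated = entry.get("curated") or entry
        if "F" not in (curated.get("type") or []):
            problems.append(f"entry {i} ({curated.get('label')}) is not type F")
    return problems


# Invented two-entry document: the count is right, but one entry has the wrong type.
sample_doc = {
    "metadata": {"count": 2},
    "hypernym": [
        {"curated": {"label": "Q1802963", "type": ["F"]}},
        {"curated": {"label": "Q33506", "type": ["M"]}},
    ],
}

problems = check_extract(sample_doc)
for line in problems:
    print(line)
```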
## Schema Alignment
These entities should map to the following ontology classes in the LinkML schema:
- **CIDOC-CRM**: `crm:E27_Site` (physical sites)
- **Schema.org**: `schema:Place`, `schema:LandmarksOrHistoricalBuildings`
- **RiC-O**: `rico:Place` (archival context)
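
The class names above are written as CURIEs (prefix plus local name). A small sketch of expanding them to full URIs follows. The namespace URIs are the commonly published ones for each ontology and are an assumption here, since the project's own prefix declarations are not shown in this document. `expand` is a hypothetical helper.

```python
# Assumed namespace URIs; the project's LinkML prefix map may differ.
PREFIXES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "schema": "https://schema.org/",
    "rico": "https://www.ica.org/standards/RiC/ontology#",
}

# The four classes listed in this section.
FEATURE_CLASS_CURIES = [
    "crm:E27_Site",
    "schema:Place",
    "schema:LandmarksOrHistoricalBuildings",
    "rico:Place",
]


def expand(curie: str) -> str:
    """Expand a prefix:local CURIE using PREFIXES; raises KeyError on an unknown prefix."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


for curie in FEATURE_CLASS_CURIES:
    print(curie, "->", expand(curie))
```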
## Next Steps
1. Map each Q-number to appropriate LinkML `class_uri`
2. Create `HeritageCustodian` records for institutions managing these features
3. Link physical features to custodian organizations via `manages_feature` relationship
4. Geocode locations for mapping visualization
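
Steps 2 and 3 could be prototyped with plain data classes before the LinkML classes exist. Only the names `HeritageCustodian` and `manages_feature` come from this document. The `Feature` class, all field names, and the custodian name are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A physical feature, identified by the Q-number of its Wikidata type (hypothetical shape)."""
    wikidata_id: str
    label: str


@dataclass
class HeritageCustodian:
    """An institution that manages zero or more features (hypothetical shape)."""
    name: str
    manages_feature: list[Feature] = field(default_factory=list)


# Invented custodian name; the feature type is taken from the Buildings examples.
custodian = HeritageCustodian(name="Example Estate Trust")
custodian.manages_feature.append(Feature("Q1802963", "mansion"))

print(custodian)
```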
## References
- Source taxonomy: GLAMORCUBEPSXHFN (19-type heritage classification)
- Base ontologies: `/data/ontology/` directory
- Schema definitions: `/schemas/20251121/linkml/01_custodian_name.yaml`
---
**Generated**: 2025-11-22
**Extraction Script**: Python script that parses the source YAML and keeps entries whose `type` list contains `F`