FEATURES (F) - Initial Extraction Analysis
Date: 2025-11-11 21:17:54
Status: ✅ COMPLETE (6 of 6 root classes extracted)
Extraction Summary
| Metric | Value |
|---|---|
| Total bindings | 53,389 |
| Unique classes (Q-numbers) | 26,498 |
| Unique languages | 45 |
| Total unique terms | 38,524 |
| Root classes covered | 6/6 (monument, sculpture, statue, memorial, landmark, cemetery) |
Root Class Distribution
| Feature Type | Root Q-Number | Unique Classes | Total Terms | Avg Terms/Class |
|---|---|---|---|---|
| CEMETERY | Q39614 | 208 | 1,437 | 6.9 |
| LANDMARK | Q7075 | 393 | 2,654 | 6.8 |
| MEMORIAL | Q5003624 | 750 | 5,609 | 7.5 |
| MONUMENT | Q4989906 | 1,421 | 10,393 | 7.3 |
| SCULPTURE | Q860861 | 24,962 | 29,907 | 1.2 |
| STATUE | Q179700 | 425 | 3,389 | 8.0 |
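The Avg Terms/Class column is total terms divided by unique classes. As a minimal sketch, the Python below recomputes that column from the figures in the table and totals both count columns; the variable and function names are illustrative and not part of the extraction pipeline.

```python
# Figures copied from the Root Class Distribution table:
# feature type -> (unique classes, total terms).
ROOTS = {
    "CEMETERY": (208, 1437),
    "LANDMARK": (393, 2654),
    "MEMORIAL": (750, 5609),
    "MONUMENT": (1421, 10393),
    "SCULPTURE": (24962, 29907),
    "STATUE": (425, 3389),
}


def avg_terms_per_class(classes: int, terms: int) -> float:
    """Average number of terms per class, rounded to one decimal like the table."""
    return round(terms / classes, 1)


averages = {name: avg_terms_per_class(c, t) for name, (c, t) in ROOTS.items()}
total_terms = sum(t for _, t in ROOTS.values())
total_class_memberships = sum(c for c, _ in ROOTS.values())

for name, avg in averages.items():
    print(f"{name:<10} {avg}")
print("sum of per-root terms:", total_terms)
print("sum of per-root classes:", total_class_memberships)
```

The per-root term counts sum to 53,389, the total bindings figure in the Extraction Summary. The per-root class counts sum to 28,159, more than the 26,498 unique classes, which suggests that some classes sit under more than one root; this matches the multi-valued Feature Types column in the multilingual table.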
Language Distribution
Top 25 Languages
| Language | Terms | % of 53,389 Bindings |
|---|---|---|
| en | 28,918 | 54.16% |
| de | 2,275 | 4.26% |
| fr | 1,928 | 3.61% |
| es | 1,842 | 3.45% |
| ja | 1,758 | 3.29% |
| nl | 1,313 | 2.46% |
| zh | 1,202 | 2.25% |
| it | 1,154 | 2.16% |
| ru | 1,118 | 2.09% |
| tr | 1,012 | 1.90% |
| pt | 886 | 1.66% |
| cs | 878 | 1.64% |
| ca | 817 | 1.53% |
| uk | 808 | 1.51% |
| ar | 787 | 1.47% |
| hu | 671 | 1.26% |
| sv | 656 | 1.23% |
| pl | 627 | 1.17% |
| ko | 597 | 1.12% |
| he | 531 | 0.99% |
| fi | 474 | 0.89% |
| da | 404 | 0.76% |
| el | 346 | 0.65% |
| fa | 338 | 0.63% |
| id | 260 | 0.49% |
Language Balance
- English: 28,918 terms (54.2%)
- Non-English: 24,471 terms (45.8%)
Coverage Assessment
Geographic Coverage:
- ✅ European: de, fr, es, nl, it, ru, pt, cs, ca, uk, sv, pl, el, ro, hu, da, no, fi, sr, bg, hr
- ✅ Asian: ja, zh, ko, ar, tr, fa, he, hi, th, vi, id, ms, bn, ur, ta, te, pa, mr
- ⚠️ African: sw, am, ha, yo, zu, af, so, ti, lg, ny (low representation)
- ⚠️ Indigenous: Limited coverage (target for Phase 2 - Exa research)
Most Multilingual Classes (Top 20)
| Q-Number | Languages | English Label | Feature Types | Example Translations |
|---|---|---|---|---|
| Q173387 | 40 | grave | memorial, monument, statue | de:Grab; fr:tombe; es:sepultura |
| Q4989906 | 40 | monument | statue | de:Denkmal; fr:monument; es:monumento |
| Q162875 | 38 | mausoleum | memorial, monument | de:Mausoleum; fr:mausolée; es:mausoleo |
| Q381885 | 38 | tomb | memorial, monument, statue | de:Grab; fr:tombeau; es:tumba |
| Q80793 | 38 | sundial | monument | de:Sonnenuhr; fr:cadran solaire; es:reloj de sol |
| Q212805 | 37 | digital library | landmark | de:digitale Bibliothek; fr:bibliothèque numérique; |
| Q28564 | 37 | public library | landmark | de:öffentliche Bibliothek; fr:bibliothèque publiqu |
| Q20350 | 37 | moai | monument | de:Moai; fr:moaï; es:moái |
| Q22806 | 36 | national library | landmark | de:Nationalbibliothek; fr:bibliothèque nationale; |
| Q143912 | 35 | triumphal arch | monument, statue | de:Triumphbogen; fr:arc de triomphe; es:arco de tr |
| Q170980 | 35 | obelisk | monument, statue | de:Obelisk; fr:obélisque; es:obelisco |
| Q178743 | 35 | stele | monument, statue | de:Stele; fr:stèle; es:estela |
| Q856234 | 34 | academic library | landmark | de:Universitätsbibliothek; fr:bibliothèque univers |
| Q180927 | 34 | mastaba | memorial, monument | de:Mastaba; fr:mastaba; es:mastaba |
| Q10145 | 34 | milestone | memorial, monument | de:Meilenstein; fr:borne routière; es:miliario |
| Q5003624 | 34 | memorial | monument, statue | de:Gedenkstätte; fr:mémorial; es:monumento conmemo |
| Q172896 | 33 | catacombs | cemetery | de:Katakombe; fr:catacombes; es:catacumbas |
| Q200141 | 33 | necropolis | cemetery | de:Nekropole; fr:nécropole; es:necrópolis |
| Q1081138 | 33 | historic site | memorial, monument, sculpture | de:historische Stätte; fr:site historique; es:siti |
| Q193457 | 33 | mud volcano | monument | de:Schlammvulkan; fr:volcan de boue; es:volcán de |
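A ranking like the one above can be derived by counting the distinct languages attached to each class. The sketch below assumes the bindings have already been reduced to `(Q-number, language)` pairs; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def rank_by_language_count(rows, top=20):
    """Rank classes by number of distinct label languages.

    rows: iterable of (qid, lang) pairs; duplicates are ignored.
    Returns a list of (qid, language_count), highest count first,
    with ties broken by Q-number for a stable order.
    """
    langs_by_class = defaultdict(set)
    for qid, lang in rows:
        langs_by_class[qid].add(lang)
    ranked = sorted(
        ((qid, len(langs)) for qid, langs in langs_by_class.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top]


# Tiny illustrative sample, not the real extraction output.
sample = [
    ("Q173387", "de"), ("Q173387", "fr"), ("Q173387", "es"),
    ("Q20350", "de"), ("Q20350", "fr"),
    ("Q173387", "de"),  # duplicate binding, counted once
]
ranking = rank_by_language_count(sample)
print(ranking)
```

Collecting languages in a set means a class with several labels or aliases in the same language still counts that language once.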
Feature Type Language Distribution
CEMETERY
Top 10 Languages:
- en: 277 terms (19.3%)
- es: 117 terms (8.1%)
- de: 106 terms (7.4%)
- fr: 86 terms (6.0%)
- tr: 81 terms (5.6%)
- ja: 73 terms (5.1%)
- nl: 69 terms (4.8%)
- it: 58 terms (4.0%)
- uk: 56 terms (3.9%)
- ru: 51 terms (3.5%)
LANDMARK
Top 10 Languages:
- en: 471 terms (17.7%)
- es: 189 terms (7.1%)
- zh: 162 terms (6.1%)
- fr: 158 terms (6.0%)
- de: 154 terms (5.8%)
- hu: 123 terms (4.6%)
- ja: 118 terms (4.4%)
- nl: 105 terms (4.0%)
- cs: 102 terms (3.8%)
- it: 96 terms (3.6%)
MEMORIAL
Top 10 Languages:
- en: 924 terms (16.5%)
- de: 436 terms (7.8%)
- fr: 382 terms (6.8%)
- ja: 316 terms (5.6%)
- es: 307 terms (5.5%)
- nl: 276 terms (4.9%)
- ru: 220 terms (3.9%)
- zh: 211 terms (3.8%)
- it: 209 terms (3.7%)
- tr: 190 terms (3.4%)
MONUMENT
Top 10 Languages:
- en: 1,677 terms (16.1%)
- de: 900 terms (8.7%)
- fr: 684 terms (6.6%)
- ja: 600 terms (5.8%)
- es: 583 terms (5.6%)
- nl: 515 terms (5.0%)
- it: 417 terms (4.0%)
- tr: 401 terms (3.9%)
- ru: 399 terms (3.8%)
- zh: 398 terms (3.8%)
SCULPTURE
Top 10 Languages:
- en: 25,048 terms (83.8%)
- ja: 468 terms (1.6%)
- de: 428 terms (1.4%)
- es: 418 terms (1.4%)
- fr: 398 terms (1.3%)
- pt: 357 terms (1.2%)
- ar: 326 terms (1.1%)
- zh: 258 terms (0.9%)
- it: 227 terms (0.8%)
- ru: 223 terms (0.7%)
STATUE
Top 10 Languages:
- en: 521 terms (15.4%)
- de: 251 terms (7.4%)
- es: 228 terms (6.7%)
- fr: 220 terms (6.5%)
- ja: 183 terms (5.4%)
- it: 147 terms (4.3%)
- ru: 144 terms (4.2%)
- nl: 140 terms (4.1%)
- zh: 132 terms (3.9%)
- ca: 98 terms (2.9%)
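The per-feature English counts above can be cross-checked against the overall Language Balance figures. The sketch below uses only numbers reported in this document; the names are illustrative. It also shows how much of the English share comes from the sculpture class.

```python
TOTAL_BINDINGS = 53389  # from the Extraction Summary

# English term counts per feature type, copied from the lists above.
EN_BY_FEATURE = {
    "CEMETERY": 277,
    "LANDMARK": 471,
    "MEMORIAL": 924,
    "MONUMENT": 1677,
    "SCULPTURE": 25048,
    "STATUE": 521,
}
SCULPTURE_TOTAL = 29907  # total sculpture terms, from the Root Class Distribution


def share(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


english = sum(EN_BY_FEATURE.values())
non_english = TOTAL_BINDINGS - english
english_share = share(english, TOTAL_BINDINGS)

# English share once the sculpture class is set aside.
english_share_without_sculpture = share(
    english - EN_BY_FEATURE["SCULPTURE"],
    TOTAL_BINDINGS - SCULPTURE_TOTAL,
)

print(english, non_english, english_share, english_share_without_sculpture)
```

With sculpture set aside, the English share falls from 54.2% to about 16.5%, in line with the 15-19% seen in the other five feature types.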
Data Quality Notes
Strengths
- Comprehensive Coverage: All 6 root classes extracted (monument, sculpture, statue, memorial, landmark, cemetery)
- Large Scale: 26,498 unique classes across 45 languages
- Multilingual Depth: The 20 most multilingual classes each have labels in 33-40 languages
- European/Asian Balance: Strong coverage across European and major Asian languages
Challenges
- English Dominance: 54.2% of terms are English, driven largely by the sculpture class, where 83.8% of terms are English
- African Language Gap: Very low representation (sw, am, ha, yo, zu combined < 1%)
- Query Limit: The initial query hit the 50,000-binding limit, so the statue root required a supplemental query
- Sculpture Bias: Sculpture class dominates with 24,962 classes (94.2% of total)
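One way to work around the binding limit noted above is to issue one query per root class and flag any result that fills the limit. The sketch below only builds query text and checks result sizes. The query shape (`wdt:P279*` for transitive "subclass of", `rdfs:label` for terms, and the variable names `?class` and `?label`) is an assumption for illustration; the contents of the actual `.sparql` files are not shown in this report.

```python
# Root Q-numbers copied from the Root Class Distribution table.
ROOTS = {
    "CEMETERY": "Q39614",
    "LANDMARK": "Q7075",
    "MEMORIAL": "Q5003624",
    "MONUMENT": "Q4989906",
    "SCULPTURE": "Q860861",
    "STATUE": "Q179700",
}
BINDING_LIMIT = 50_000  # limit reported in the Challenges list

# Assumed query shape: all transitive subclasses of a root, with their labels.
QUERY_TEMPLATE = """\
SELECT ?class ?label WHERE {{
  ?class wdt:P279* wd:{root} .
  ?class rdfs:label ?label .
}}
LIMIT {limit}
"""


def build_queries(roots, limit=BINDING_LIMIT):
    """Return one SPARQL query string per root class."""
    return {
        name: QUERY_TEMPLATE.format(root=qid, limit=limit)
        for name, qid in roots.items()
    }


def is_truncated(bindings, limit=BINDING_LIMIT):
    """A result that fills the limit may have been cut off and needs a follow-up query."""
    return len(bindings) >= limit


queries = build_queries(ROOTS)
print(queries["STATUE"])
```

Splitting by root keeps each result set smaller, at the cost of duplicate bindings for classes that sit under several roots; those would need deduplication when the results are merged.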
Recommended Actions
- ✅ Phase 1 Complete: Wikidata extraction successful
- 🔄 Phase 2 Target: Use Exa web research to enrich African and Indigenous language terms
- 📊 Quality Check: Validate top multilingual classes for translation accuracy
- 🔍 Depth Analysis: Investigate sculpture class depth distribution (may have deep/narrow branches)
Next Steps
- Immediate: Move to next GLAMORCUBEPSXH class (O - OFFICIAL_INSTITUTION)
- Post-extraction: Create combined vocabulary analysis across all 15 classes
- Phase 2 Planning: Identify specific African/Indigenous language targets for web research
Extraction Method: Wikidata SPARQL Query Service
Query Files:
- F/sparql/features_hyponyms.sparql (main query, 5 root classes)
- F/sparql/features_statue_hyponyms.sparql (supplemental query)
Raw Data: F/sparql/hyponyms_raw.json
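The Extraction Summary metrics can be recomputed from the raw data. The sketch below assumes the file uses the standard SPARQL JSON results layout (`results.bindings`, with `xml:lang` on language-tagged literals). The variable names `class` and `label`, and the definition of a unique term as a distinct (language, text) pair, are assumptions; the sample data is illustrative.

```python
import json


def summarize(results: dict, class_var: str = "class", label_var: str = "label") -> dict:
    """Count bindings, unique classes, languages, and (language, text) terms."""
    bindings = results["results"]["bindings"]
    classes, languages, terms = set(), set(), set()
    for binding in bindings:
        # Entity URIs end in the Q-number, e.g. .../entity/Q173387.
        classes.add(binding[class_var]["value"].rsplit("/", 1)[-1])
        label = binding[label_var]
        lang = label.get("xml:lang", "")
        languages.add(lang)
        terms.add((lang, label["value"]))
    return {
        "bindings": len(bindings),
        "classes": len(classes),
        "languages": len(languages),
        "terms": len(terms),
    }


# Small sample in SPARQL JSON results format; "Grab" labels two classes in German.
sample = json.loads("""
{"results": {"bindings": [
  {"class": {"type": "uri", "value": "http://www.wikidata.org/entity/Q173387"},
   "label": {"type": "literal", "xml:lang": "de", "value": "Grab"}},
  {"class": {"type": "uri", "value": "http://www.wikidata.org/entity/Q381885"},
   "label": {"type": "literal", "xml:lang": "de", "value": "Grab"}},
  {"class": {"type": "uri", "value": "http://www.wikidata.org/entity/Q173387"},
   "label": {"type": "literal", "xml:lang": "fr", "value": "tombe"}}
]}}
""")
summary = summarize(sample)
print(summary)
```

To run this on the real data, pass the result of `json.load` on F/sparql/hyponyms_raw.json to `summarize`, adjusting the variable names to those used in the query. Because one label can be shared by several classes, as with de:Grab for Q173387 and Q381885, the unique term count can be lower than the binding count.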