# Wikidata SPARQL Queries for Global GLAMORCUBEPSXHF Extraction

This document provides SPARQL queries for extracting all GLAMORCUBEPSXHF institutions worldwide from Wikidata. The acronym covers Galleries, Libraries, Archives, Museums, Official institutions, Research centers, Corporations, Universities, Botanical gardens/zoos, Educational providers, Personal collections, Societies, miXed institutions, Holy sites, and Features.

## Table of Contents

1. [Generic SPARQL Query Template](#generic-sparql-query-template)
2. [Institution Type Mappings](#institution-type-mappings)
3. [Country Codes with Wikidata QIDs](#country-codes-with-wikidata-qids)
4. [Usage Examples](#usage-examples)
5. [Query Optimization Strategies](#query-optimization-strategies)
6. [Storage Format](#storage-format)
7. [Next Steps](#next-steps)

---

## Generic SPARQL Query Template

This generic query extracts the available metadata for heritage institutions in a specific country. Replace `{COUNTRY_QID}` with the country's Wikidata QID from the country tables below, and `{LANGUAGES}` with a comma-separated list of language codes.

```sparql
# Generic GLAMORCUBEPSXHF Institution Query
# Replace {COUNTRY_QID} with actual country QID (e.g., Q55 for Netherlands)
# Replace {LANGUAGES} with comma-separated language codes (e.g., "nl,en" for Dutch/English)

SELECT DISTINCT
  ?item ?itemLabel ?itemDescription ?itemAltLabel
  ?instType ?instTypeLabel
  ?coords ?latitude ?longitude
  ?streetAddress ?postalCode ?city ?cityLabel ?region ?regionLabel
  ?isil ?viaf ?wikidataQID ?website ?email ?phone
  ?inception ?dissolved ?foundingDate
  ?parent ?parentLabel ?partOf ?partOfLabel
  ?collectionSize ?collectionType
  ?image ?logo
WHERE {
  # =============================================================================
  # INSTITUTION TYPE FILTER (Museums, Libraries, Archives, Galleries, etc.)
  # =============================================================================

  VALUES ?instTypeClass {
    wd:Q33506     # museum
    wd:Q7075      # library
    wd:Q166118    # archive
    wd:Q2668072   # art gallery
    wd:Q5282129   # cultural center
    wd:Q3152824   # research center
    wd:Q294163    # public institution (official institutions)
    wd:Q3918      # university
    wd:Q167346    # botanical garden
    wd:Q43229     # organization (for societies, corporations)
    wd:Q16970     # church building (holy sites)
    wd:Q44539     # temple (holy sites)
    wd:Q32815     # mosque (holy sites)
    wd:Q34627     # synagogue (holy sites)
    wd:Q162378    # sanctuary (holy sites)
  }

  # Instance of a listed type, or of a direct subclass of it
  ?item wdt:P31/wdt:P279? ?instTypeClass .

  # =============================================================================
  # COUNTRY FILTER
  # =============================================================================

  ?item wdt:P17 wd:{COUNTRY_QID} .  # country

  # =============================================================================
  # CAPTURE INSTITUTION TYPE
  # =============================================================================

  ?item wdt:P31 ?instType .

  # =============================================================================
  # GEOGRAPHIC DATA
  # =============================================================================

  # Coordinates (lat/lon); geof: is the GeoSPARQL function prefix predefined by the query service
  OPTIONAL {
    ?item wdt:P625 ?coords .
    BIND(geof:latitude(?coords) AS ?latitude)
    BIND(geof:longitude(?coords) AS ?longitude)
  }

  # Physical address components
  OPTIONAL { ?item wdt:P6375 ?streetAddress . }  # street address
  OPTIONAL { ?item wdt:P281 ?postalCode . }      # postal code

  # City/municipality
  OPTIONAL {
    ?item wdt:P131 ?city .                       # located in administrative territory
    ?city wdt:P31/wdt:P279? wd:Q515 .            # city
  }

  # Region/province/state
  OPTIONAL {
    ?item wdt:P131 ?region .
    ?region wdt:P31/wdt:P279? wd:Q10864048 .     # first-level admin division
  }

  # =============================================================================
  # IDENTIFIERS
  # =============================================================================

  OPTIONAL { ?item wdt:P791 ?isil . }            # ISIL code
  OPTIONAL { ?item wdt:P214 ?viaf . }            # VIAF ID
  OPTIONAL { ?item wdt:P856 ?website . }         # official website
  OPTIONAL { ?item wdt:P968 ?email . }           # email address
  OPTIONAL { ?item wdt:P1329 ?phone . }          # phone number

  # Extract Wikidata QID from URI
  BIND(STRAFTER(STR(?item), "http://www.wikidata.org/entity/") AS ?wikidataQID)

  # =============================================================================
  # TEMPORAL DATA
  # =============================================================================

  OPTIONAL { ?item wdt:P571 ?inception . }       # inception date
  OPTIONAL { ?item wdt:P576 ?dissolved . }       # dissolved/abolished date
  OPTIONAL { ?item wdt:P1619 ?foundingDate . }   # date of official opening

  # =============================================================================
  # ORGANIZATIONAL RELATIONSHIPS
  # =============================================================================

  OPTIONAL { ?item wdt:P749 ?parent . }          # parent organization
  OPTIONAL { ?item wdt:P361 ?partOf . }          # part of (larger entity)

  # =============================================================================
  # COLLECTION METADATA
  # =============================================================================

  OPTIONAL { ?item wdt:P1436 ?collectionSize . } # collection or exhibition size
  OPTIONAL { ?item wdt:P195 ?collectionType . }  # collection

  # =============================================================================
  # MEDIA
  # =============================================================================

  OPTIONAL { ?item wdt:P18 ?image . }            # image
  OPTIONAL { ?item wdt:P154 ?logo . }            # logo

  # =============================================================================
  # LABELS AND DESCRIPTIONS (Multilingual)
  # =============================================================================

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "{LANGUAGES}" .
    ?item rdfs:label ?itemLabel .
    ?item schema:description ?itemDescription .
    ?item skos:altLabel ?itemAltLabel .
    ?instType rdfs:label ?instTypeLabel .
    ?city rdfs:label ?cityLabel .
    ?region rdfs:label ?regionLabel .
    ?parent rdfs:label ?parentLabel .
    ?partOf rdfs:label ?partOfLabel .
  }
}
ORDER BY ?itemLabel
LIMIT 1000
```

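Filling in the two placeholders can be scripted. The following is a minimal sketch, not part of the project code: the function name `fill_template` and the validation patterns are assumptions made for illustration. It uses plain string replacement because the SPARQL text itself contains literal braces, which would clash with `str.format`.

```python
import re

# Assumed validation patterns (illustrative, not an official grammar)
QID_PATTERN = re.compile(r"Q[1-9]\d*")
LANG_PATTERN = re.compile(r"[a-z]{2,3}(-[a-z0-9]+)*")


def fill_template(template: str, country_qid: str, languages: list[str]) -> str:
    """Substitute {COUNTRY_QID} and {LANGUAGES} in a query template."""
    if not QID_PATTERN.fullmatch(country_qid):
        raise ValueError(f"not a Wikidata QID: {country_qid!r}")
    for lang in languages:
        if not LANG_PATTERN.fullmatch(lang):
            raise ValueError(f"unexpected language code: {lang!r}")
    # str.replace leaves every other brace in the SPARQL text untouched
    return (
        template
        .replace("{COUNTRY_QID}", country_qid)
        .replace("{LANGUAGES}", ",".join(languages))
    )


# Shortened stand-in for the full template above
TEMPLATE = (
    "?item wdt:P17 wd:{COUNTRY_QID} . "
    'bd:serviceParam wikibase:language "{LANGUAGES}" .'
)
print(fill_template(TEMPLATE, "Q55", ["nl", "en"]))
```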

---

## Institution Type Mappings

Wikidata QIDs for each GLAMORCUBEPSXHF category:

| GLAMORCUBEPSXHF Type | Code | Wikidata QID(s) | Wikidata Label | Notes |
|---------------------|------|-----------------|----------------|-------|
| **GALLERY** | G | Q2668072 | art gallery | Commercial and exhibition galleries |
| **LIBRARY** | L | Q7075 | library | Public, academic, specialized libraries |
| **ARCHIVE** | A | Q166118 | archive | Government, corporate, community archives |
| **MUSEUM** | M | Q33506 | museum | Art, history, science, natural history museums |
| **OFFICIAL_INSTITUTION** | O | Q294163, Q5341295 | public institution, government agency | Official heritage agencies |
| **RESEARCH_CENTER** | R | Q3152824, Q31855 | research center, research institute | Heritage research institutes |
| **CORPORATION** | C | Q4830453, Q783794 | business, company | Corporate heritage collections |
| **UNIVERSITY** | U | Q3918 | university | Universities with heritage collections |
| **BOTANICAL_ZOO** | B | Q167346, Q43501 | botanical garden, zoo | Natural heritage sites |
| **EDUCATION_PROVIDER** | E | Q2385804, Q3914 | educational institution, school | Schools with heritage collections |
| **PERSONAL_COLLECTION** | P | Q23058176 | private collection | Personal collections (rare in Wikidata) |
| **COLLECTING_SOCIETY** | S | Q43229 | organization | Numismatic, philatelic, historical societies |
| **HOLY_SITES** | H | Q16970, Q44539, Q32815, Q34627, Q162378 | church, temple, mosque, synagogue, sanctuary | Religious heritage sites |
| **FEATURES** | F | Q4989906, Q860861, Q5003624, Q5003551 | monument, sculpture, statue, memorial | Physical landmarks with heritage significance |
| **MIXED** | X | (Multiple types) | - | Institutions with multiple types |

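When post-processing results, the mapping table can serve as a lookup from `?instType` QIDs to type codes. The sketch below is an illustration under assumptions: `QID_TO_CODE` covers only a subset of the rows, `classify` is a hypothetical helper, and it matches exact QIDs only (a subclass such as an art museum would need its own entry).

```python
# Subset of the mapping table; extend with the remaining rows as needed.
QID_TO_CODE = {
    "Q2668072": "G",  # art gallery
    "Q7075": "L",     # library
    "Q166118": "A",   # archive
    "Q33506": "M",    # museum
    "Q3918": "U",     # university
    "Q16970": "H",    # church building
    "Q44539": "H",    # temple
    "Q32815": "H",    # mosque
    "Q34627": "H",    # synagogue
    "Q162378": "H",   # sanctuary
}


def classify(type_qids):
    """Return the single-letter code for an item's P31 values.

    Returns None when no QID is recognised and "X" (MIXED) when the
    QIDs map to more than one distinct code.
    """
    codes = {QID_TO_CODE[q] for q in type_qids if q in QID_TO_CODE}
    if not codes:
        return None
    if len(codes) == 1:
        return next(iter(codes))
    return "X"


print(classify(["Q33506"]))           # a museum
print(classify(["Q33506", "Q7075"]))  # museum and library
```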
### Refined Query by Institution Type

For targeted extraction, query each type separately:

**Museums Only:**
```sparql
?item wdt:P31/wdt:P279? wd:Q33506 .  # museum (including direct subtypes)
```

**Libraries Only:**
```sparql
?item wdt:P31/wdt:P279? wd:Q7075 .   # library (public, academic, national)
```

**Archives Only:**
```sparql
?item wdt:P31/wdt:P279? wd:Q166118 . # archive
```

**Holy Sites with Collections:**
```sparql
VALUES ?holyType { wd:Q16970 wd:Q44539 wd:Q32815 wd:Q34627 wd:Q162378 }
?item wdt:P31 ?holyType .
# Keep only sites that carry a collection-related statement
OPTIONAL { ?item wdt:P1436 ?collectionSize . }  # collection or exhibition size
FILTER(BOUND(?collectionSize) || EXISTS { ?item wdt:P195 ?collection . })
```


---

## Country Codes with Wikidata QIDs

List of 197 countries and territories with ISO 3166-1 alpha-2 codes and Wikidata QIDs for use in SPARQL queries. Kosovo uses the user-assigned code XK, which is not an official ISO 3166-1 code.

### Africa (54 countries)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Algeria | DZ | Q262 | 🇩🇿 |
| Angola | AO | Q916 | 🇦🇴 |
| Benin | BJ | Q962 | 🇧🇯 |
| Botswana | BW | Q963 | 🇧🇼 |
| Burkina Faso | BF | Q965 | 🇧🇫 |
| Burundi | BI | Q967 | 🇧🇮 |
| Cameroon | CM | Q1009 | 🇨🇲 |
| Cape Verde | CV | Q1011 | 🇨🇻 |
| Central African Republic | CF | Q929 | 🇨🇫 |
| Chad | TD | Q657 | 🇹🇩 |
| Comoros | KM | Q970 | 🇰🇲 |
| Congo (Brazzaville) | CG | Q971 | 🇨🇬 |
| Congo (Kinshasa) | CD | Q974 | 🇨🇩 |
| Djibouti | DJ | Q977 | 🇩🇯 |
| Egypt | EG | Q79 | 🇪🇬 |
| Equatorial Guinea | GQ | Q983 | 🇬🇶 |
| Eritrea | ER | Q986 | 🇪🇷 |
| Eswatini (Swaziland) | SZ | Q1050 | 🇸🇿 |
| Ethiopia | ET | Q115 | 🇪🇹 |
| Gabon | GA | Q1000 | 🇬🇦 |
| Gambia | GM | Q1005 | 🇬🇲 |
| Ghana | GH | Q117 | 🇬🇭 |
| Guinea | GN | Q1006 | 🇬🇳 |
| Guinea-Bissau | GW | Q1007 | 🇬🇼 |
| Ivory Coast | CI | Q1008 | 🇨🇮 |
| Kenya | KE | Q114 | 🇰🇪 |
| Lesotho | LS | Q1013 | 🇱🇸 |
| Liberia | LR | Q1014 | 🇱🇷 |
| Libya | LY | Q1016 | 🇱🇾 |
| Madagascar | MG | Q1019 | 🇲🇬 |
| Malawi | MW | Q1020 | 🇲🇼 |
| Mali | ML | Q912 | 🇲🇱 |
| Mauritania | MR | Q1025 | 🇲🇷 |
| Mauritius | MU | Q1027 | 🇲🇺 |
| Morocco | MA | Q1028 | 🇲🇦 |
| Mozambique | MZ | Q1029 | 🇲🇿 |
| Namibia | NA | Q1030 | 🇳🇦 |
| Niger | NE | Q1032 | 🇳🇪 |
| Nigeria | NG | Q1033 | 🇳🇬 |
| Rwanda | RW | Q1037 | 🇷🇼 |
| São Tomé and Príncipe | ST | Q1039 | 🇸🇹 |
| Senegal | SN | Q1041 | 🇸🇳 |
| Seychelles | SC | Q1042 | 🇸🇨 |
| Sierra Leone | SL | Q1044 | 🇸🇱 |
| Somalia | SO | Q1045 | 🇸🇴 |
| South Africa | ZA | Q258 | 🇿🇦 |
| South Sudan | SS | Q958 | 🇸🇸 |
| Sudan | SD | Q1049 | 🇸🇩 |
| Tanzania | TZ | Q924 | 🇹🇿 |
| Togo | TG | Q945 | 🇹🇬 |
| Tunisia | TN | Q948 | 🇹🇳 |
| Uganda | UG | Q1036 | 🇺🇬 |
| Zambia | ZM | Q953 | 🇿🇲 |
| Zimbabwe | ZW | Q954 | 🇿🇼 |

### Americas (35 countries)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Antigua and Barbuda | AG | Q781 | 🇦🇬 |
| Argentina | AR | Q414 | 🇦🇷 |
| Bahamas | BS | Q778 | 🇧🇸 |
| Barbados | BB | Q244 | 🇧🇧 |
| Belize | BZ | Q242 | 🇧🇿 |
| Bolivia | BO | Q750 | 🇧🇴 |
| Brazil | BR | Q155 | 🇧🇷 |
| Canada | CA | Q16 | 🇨🇦 |
| Chile | CL | Q298 | 🇨🇱 |
| Colombia | CO | Q739 | 🇨🇴 |
| Costa Rica | CR | Q800 | 🇨🇷 |
| Cuba | CU | Q241 | 🇨🇺 |
| Dominica | DM | Q784 | 🇩🇲 |
| Dominican Republic | DO | Q786 | 🇩🇴 |
| Ecuador | EC | Q736 | 🇪🇨 |
| El Salvador | SV | Q792 | 🇸🇻 |
| Grenada | GD | Q769 | 🇬🇩 |
| Guatemala | GT | Q774 | 🇬🇹 |
| Guyana | GY | Q734 | 🇬🇾 |
| Haiti | HT | Q790 | 🇭🇹 |
| Honduras | HN | Q783 | 🇭🇳 |
| Jamaica | JM | Q766 | 🇯🇲 |
| Mexico | MX | Q96 | 🇲🇽 |
| Nicaragua | NI | Q811 | 🇳🇮 |
| Panama | PA | Q804 | 🇵🇦 |
| Paraguay | PY | Q733 | 🇵🇾 |
| Peru | PE | Q419 | 🇵🇪 |
| Saint Kitts and Nevis | KN | Q763 | 🇰🇳 |
| Saint Lucia | LC | Q760 | 🇱🇨 |
| Saint Vincent and the Grenadines | VC | Q757 | 🇻🇨 |
| Suriname | SR | Q730 | 🇸🇷 |
| Trinidad and Tobago | TT | Q754 | 🇹🇹 |
| United States | US | Q30 | 🇺🇸 |
| Uruguay | UY | Q77 | 🇺🇾 |
| Venezuela | VE | Q717 | 🇻🇪 |

### Asia (49 countries/territories)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Afghanistan | AF | Q889 | 🇦🇫 |
| Armenia | AM | Q399 | 🇦🇲 |
| Azerbaijan | AZ | Q227 | 🇦🇿 |
| Bahrain | BH | Q398 | 🇧🇭 |
| Bangladesh | BD | Q902 | 🇧🇩 |
| Bhutan | BT | Q917 | 🇧🇹 |
| Brunei | BN | Q921 | 🇧🇳 |
| Cambodia | KH | Q424 | 🇰🇭 |
| China | CN | Q148 | 🇨🇳 |
| Cyprus | CY | Q229 | 🇨🇾 |
| East Timor (Timor-Leste) | TL | Q574 | 🇹🇱 |
| Georgia | GE | Q230 | 🇬🇪 |
| India | IN | Q668 | 🇮🇳 |
| Indonesia | ID | Q252 | 🇮🇩 |
| Iran | IR | Q794 | 🇮🇷 |
| Iraq | IQ | Q796 | 🇮🇶 |
| Israel | IL | Q801 | 🇮🇱 |
| Japan | JP | Q17 | 🇯🇵 |
| Jordan | JO | Q810 | 🇯🇴 |
| Kazakhstan | KZ | Q232 | 🇰🇿 |
| Kuwait | KW | Q817 | 🇰🇼 |
| Kyrgyzstan | KG | Q813 | 🇰🇬 |
| Laos | LA | Q819 | 🇱🇦 |
| Lebanon | LB | Q822 | 🇱🇧 |
| Malaysia | MY | Q833 | 🇲🇾 |
| Maldives | MV | Q826 | 🇲🇻 |
| Mongolia | MN | Q711 | 🇲🇳 |
| Myanmar (Burma) | MM | Q836 | 🇲🇲 |
| Nepal | NP | Q837 | 🇳🇵 |
| North Korea | KP | Q423 | 🇰🇵 |
| Oman | OM | Q842 | 🇴🇲 |
| Pakistan | PK | Q843 | 🇵🇰 |
| Palestine | PS | Q219060 | 🇵🇸 |
| Philippines | PH | Q928 | 🇵🇭 |
| Qatar | QA | Q846 | 🇶🇦 |
| Saudi Arabia | SA | Q851 | 🇸🇦 |
| Singapore | SG | Q334 | 🇸🇬 |
| South Korea | KR | Q884 | 🇰🇷 |
| Sri Lanka | LK | Q854 | 🇱🇰 |
| Syria | SY | Q858 | 🇸🇾 |
| Taiwan | TW | Q865 | 🇹🇼 |
| Tajikistan | TJ | Q863 | 🇹🇯 |
| Thailand | TH | Q869 | 🇹🇭 |
| Turkey | TR | Q43 | 🇹🇷 |
| Turkmenistan | TM | Q874 | 🇹🇲 |
| United Arab Emirates | AE | Q878 | 🇦🇪 |
| Uzbekistan | UZ | Q265 | 🇺🇿 |
| Vietnam | VN | Q881 | 🇻🇳 |
| Yemen | YE | Q805 | 🇾🇪 |

### Europe (45 countries/territories)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Albania | AL | Q222 | 🇦🇱 |
| Andorra | AD | Q228 | 🇦🇩 |
| Austria | AT | Q40 | 🇦🇹 |
| Belarus | BY | Q184 | 🇧🇾 |
| Belgium | BE | Q31 | 🇧🇪 |
| Bosnia and Herzegovina | BA | Q225 | 🇧🇦 |
| Bulgaria | BG | Q219 | 🇧🇬 |
| Croatia | HR | Q224 | 🇭🇷 |
| Czech Republic | CZ | Q213 | 🇨🇿 |
| Denmark | DK | Q35 | 🇩🇰 |
| Estonia | EE | Q191 | 🇪🇪 |
| Finland | FI | Q33 | 🇫🇮 |
| France | FR | Q142 | 🇫🇷 |
| Germany | DE | Q183 | 🇩🇪 |
| Greece | GR | Q41 | 🇬🇷 |
| Hungary | HU | Q28 | 🇭🇺 |
| Iceland | IS | Q189 | 🇮🇸 |
| Ireland | IE | Q27 | 🇮🇪 |
| Italy | IT | Q38 | 🇮🇹 |
| Kosovo | XK | Q1246 | 🇽🇰 |
| Latvia | LV | Q211 | 🇱🇻 |
| Liechtenstein | LI | Q347 | 🇱🇮 |
| Lithuania | LT | Q37 | 🇱🇹 |
| Luxembourg | LU | Q32 | 🇱🇺 |
| Malta | MT | Q233 | 🇲🇹 |
| Moldova | MD | Q217 | 🇲🇩 |
| Monaco | MC | Q235 | 🇲🇨 |
| Montenegro | ME | Q236 | 🇲🇪 |
| Netherlands | NL | Q55 | 🇳🇱 |
| North Macedonia | MK | Q221 | 🇲🇰 |
| Norway | NO | Q20 | 🇳🇴 |
| Poland | PL | Q36 | 🇵🇱 |
| Portugal | PT | Q45 | 🇵🇹 |
| Romania | RO | Q218 | 🇷🇴 |
| Russia | RU | Q159 | 🇷🇺 |
| San Marino | SM | Q238 | 🇸🇲 |
| Serbia | RS | Q403 | 🇷🇸 |
| Slovakia | SK | Q214 | 🇸🇰 |
| Slovenia | SI | Q215 | 🇸🇮 |
| Spain | ES | Q29 | 🇪🇸 |
| Sweden | SE | Q34 | 🇸🇪 |
| Switzerland | CH | Q39 | 🇨🇭 |
| Ukraine | UA | Q212 | 🇺🇦 |
| United Kingdom | GB | Q145 | 🇬🇧 |
| Vatican City | VA | Q237 | 🇻🇦 |

### Oceania (14 countries/territories)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Australia | AU | Q408 | 🇦🇺 |
| Fiji | FJ | Q712 | 🇫🇯 |
| Kiribati | KI | Q710 | 🇰🇮 |
| Marshall Islands | MH | Q709 | 🇲🇭 |
| Micronesia | FM | Q702 | 🇫🇲 |
| Nauru | NR | Q697 | 🇳🇷 |
| New Zealand | NZ | Q664 | 🇳🇿 |
| Palau | PW | Q695 | 🇵🇼 |
| Papua New Guinea | PG | Q691 | 🇵🇬 |
| Samoa | WS | Q683 | 🇼🇸 |
| Solomon Islands | SB | Q685 | 🇸🇧 |
| Tonga | TO | Q678 | 🇹🇴 |
| Tuvalu | TV | Q672 | 🇹🇻 |
| Vanuatu | VU | Q686 | 🇻🇺 |

### Total: 197 countries and territories

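Because the country tables are plain Markdown, a script can read the code-to-QID mapping straight from this file. The following is a rough sketch under assumptions: `parse_country_rows` and the `ROW` pattern are hypothetical names, and the parser expects rows in the four-column layout used above.

```python
import re

# Matches data rows such as "| Netherlands | NL | Q55 | 🇳🇱 |".
# Header and separator rows fail the two-letter-code / QID checks.
ROW = re.compile(
    r"^\|\s*(?P<name>[^|]+?)\s*\|\s*(?P<code>[A-Z]{2})\s*\|\s*(?P<qid>Q\d+)\s*\|"
)


def parse_country_rows(markdown: str) -> dict[str, dict[str, str]]:
    """Map each country code to its name and Wikidata QID."""
    countries = {}
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            countries[match["code"]] = {
                "name": match["name"],
                "qid": match["qid"],
            }
    return countries


SAMPLE = """
| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Netherlands | NL | Q55 | 🇳🇱 |
| Congo (Brazzaville) | CG | Q971 | 🇨🇬 |
"""
print(parse_country_rows(SAMPLE))
```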

---

## Usage Examples

### Example 1: Extract All Dutch Museums

```sparql
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?coords ?isil ?viaf ?website
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q33506 .  # museum
  ?item wdt:P17 wd:Q55 .               # Netherlands

  OPTIONAL { ?item wdt:P625 ?coords . }
  OPTIONAL { ?item wdt:P791 ?isil . }
  OPTIONAL { ?item wdt:P214 ?viaf . }
  OPTIONAL { ?item wdt:P856 ?website . }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nl,en" .
  }
}
LIMIT 1000
```

### Example 2: Extract Brazilian Libraries with Collections

```sparql
SELECT DISTINCT ?item ?itemLabel ?city ?cityLabel ?collectionSize ?website
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q7075 .   # library
  ?item wdt:P17 wd:Q155 .              # Brazil

  OPTIONAL {
    ?item wdt:P131 ?city .
    ?city wdt:P31/wdt:P279? wd:Q515 .  # city
  }

  OPTIONAL { ?item wdt:P1436 ?collectionSize . }  # collection or exhibition size
  OPTIONAL { ?item wdt:P856 ?website . }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "pt,en" .
  }
}
ORDER BY DESC(?collectionSize)
LIMIT 500
```

### Example 3: Extract Japanese Archives with ISIL Codes

```sparql
SELECT DISTINCT ?item ?itemLabel ?isil ?city ?cityLabel ?inception
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q166118 . # archive
  ?item wdt:P17 wd:Q17 .               # Japan
  ?item wdt:P791 ?isil .               # MUST have ISIL code

  OPTIONAL {
    ?item wdt:P131 ?city .
  }

  OPTIONAL { ?item wdt:P571 ?inception . }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ja,en" .
  }
}
ORDER BY ?isil
LIMIT 1000
```

### Example 4: Extract All Heritage Institutions in Tunisia

```sparql
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?instTypeLabel ?coords ?website
WHERE {
  VALUES ?instTypeClass {
    wd:Q33506     # museum
    wd:Q7075      # library
    wd:Q166118    # archive
    wd:Q2668072   # art gallery
    wd:Q5282129   # cultural center
  }

  ?item wdt:P31/wdt:P279? ?instTypeClass .
  ?item wdt:P17 wd:Q948 .              # Tunisia
  ?item wdt:P31 ?instType .

  OPTIONAL { ?item wdt:P625 ?coords . }
  OPTIONAL { ?item wdt:P856 ?website . }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ar,fr,en" .
  }
}
LIMIT 1000
```

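Queries like the examples above come back in the standard SPARQL JSON results format, where each row is a dictionary of `{"type": ..., "value": ...}` objects and unbound `OPTIONAL` variables are simply absent. The sketch below flattens that structure into plain rows; `flatten_bindings` is a hypothetical helper name and the sample data is made up for illustration.

```python
def flatten_bindings(results: dict) -> list[dict]:
    """Turn a SPARQL JSON result into a list of {variable: value} rows."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Variables left unbound by OPTIONAL are missing; map them to None
        rows.append(
            {var: binding.get(var, {}).get("value") for var in variables}
        )
    return rows


# Hand-written sample in the SPARQL JSON results layout
SAMPLE_RESULT = {
    "head": {"vars": ["item", "itemLabel", "website"]},
    "results": {
        "bindings": [
            {
                "item": {
                    "type": "uri",
                    "value": "http://www.wikidata.org/entity/Q190804",
                },
                "itemLabel": {"type": "literal", "value": "Rijksmuseum"},
            }
        ]
    },
}
print(flatten_bindings(SAMPLE_RESULT))
```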

---

## Query Optimization Strategies

### 1. **Avoid Transitive Subclass Queries**

❌ **Slow** (can cause a 504 timeout):
```sparql
?item wdt:P31/wdt:P279* wd:Q33506 .  # AVOID wdt:P279* (unbounded)
```

✅ **Fast** (limited depth):
```sparql
?item wdt:P31/wdt:P279? wd:Q33506 .  # Use wdt:P279? (0 or 1 hop)
```

The trade-off is coverage: `wdt:P279?` matches only the class itself and its direct subclasses, so items typed with a deeper subclass are missed.

### 2. **Query Institution Types Separately**

Instead of one large query with all types, run separate queries for each type:

```python
institution_types = ["Q33506", "Q7075", "Q166118", "Q2668072"]
country_qid = "Q55"  # example: Netherlands

results_by_type = {}
for inst_type in institution_types:
    query = f"""
    SELECT DISTINCT ?item ?itemLabel ...
    WHERE {{
      ?item wdt:P31/wdt:P279? wd:{inst_type} .
      ?item wdt:P17 wd:{country_qid} .
      ...
    }}
    LIMIT 1000
    """
    # execute_sparql is the project's query helper (not shown here)
    results_by_type[inst_type] = execute_sparql(query)
```

### 3. **Use LIMIT and OFFSET for Pagination**

For countries with many institutions (e.g., France, Germany, UK), fetch the results in pages. Keep the `ORDER BY` clause so that the row order is the same on every request and pages do not overlap:

```sparql
SELECT DISTINCT ?item ?itemLabel ...
WHERE {
  ...
}
ORDER BY ?itemLabel
LIMIT 1000
OFFSET 0  # Change to 1000, 2000, 3000, etc. for pagination
```

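The paging loop itself could look roughly like the sketch below. It is an illustration under assumptions: `fetch_all_pages` is a hypothetical name, `run_query` stands for whatever function sends a query and returns a list of rows, and `base_query` is expected to end with its `ORDER BY` clause and to contain no `LIMIT` or `OFFSET` of its own.

```python
from typing import Callable, Iterator


def fetch_all_pages(
    base_query: str,
    run_query: Callable[[str], list],
    page_size: int = 1000,
    max_pages: int = 100,
) -> Iterator:
    """Yield rows page by page until a short page signals the end."""
    for page in range(max_pages):
        query = (
            f"{base_query}\n"
            f"LIMIT {page_size}\n"
            f"OFFSET {page * page_size}"
        )
        rows = run_query(query)
        yield from rows
        # A page shorter than page_size means there is nothing left
        if len(rows) < page_size:
            break


# Stand-in for a real endpoint call: serves slices of a fixed list
def fake_run_query(query: str) -> list:
    limit = int(query.split("LIMIT ")[1].split()[0])
    offset = int(query.split("OFFSET ")[1])
    return list(range(25))[offset:offset + limit]


rows = list(fetch_all_pages("SELECT ... ORDER BY ?itemLabel", fake_run_query, page_size=10))
print(len(rows))
```

The `max_pages` cap is a safety net against an endpoint that keeps returning full pages; when the result count is an exact multiple of `page_size`, one extra empty request is made before the loop stops.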
### 4. **Specify Language Priorities**

```sparql
SERVICE wikibase:label {
  bd:serviceParam wikibase:language "nl,en" .  # Dutch preferred, English fallback
}
```

Language codes by country:
- **Netherlands**: `"nl,en"`
- **Brazil**: `"pt,en"`
- **Japan**: `"ja,en"`
- **France**: `"fr,en"`
- **Germany**: `"de,en"`
- **China**: `"zh,en"`
- **Russia**: `"ru,en"`
- **Arabic-speaking countries**: `"ar,en"` or `"ar,fr,en"`

### 5. **Rate Limiting**

The Wikidata SPARQL endpoint enforces rate limits. Best practices:

- **Add delays between queries**: Wait 2-5 seconds between requests
- **Use custom User-Agent**: `"GLAM-Extractor/0.2.0 (Wikidata Global Extraction)"`
- **Cache results**: Store results in `data/wikidata/{country}/{timestamp}.json`
- **Batch by country**: Process countries in priority order (see `enrich_institutions_wikidata_sparql.py`)

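The delay between requests can be enforced with a small helper rather than scattered `sleep` calls. The sketch below is one possible shape, not the project's implementation: the `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time

# User-Agent string taken from the list above
HEADERS = {"User-Agent": "GLAM-Extractor/0.2.0 (Wikidata Global Extraction)"}


class Throttle:
    """Ensure at least `min_interval` seconds between successive calls."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=2.0)
throttle.wait()  # first call returns immediately; later calls pause as needed
```

Calling `throttle.wait()` immediately before each request keeps the spacing correct even when the processing between requests takes a variable amount of time.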
### 6. **Error Handling**

```python
import time

from SPARQLWrapper import SPARQLWrapper, SPARQLExceptions


def execute_sparql_with_retry(sparql, query, max_retries=3):
    """Run `query` on a configured SPARQLWrapper instance, retrying on failure."""
    for attempt in range(max_retries):
        try:
            sparql.setQuery(query)
            return sparql.query().convert()
        except SPARQLExceptions.EndPointInternalError:
            # Endpoint-side failure such as a timeout: the query is probably
            # too complex, so an unchanged retry may fail again
            print(f"Endpoint error on attempt {attempt + 1}; consider simplifying the query")
            time.sleep(5)
        except Exception as e:
            print(f"Error: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
    return None  # Every attempt ended in an endpoint error
```


---

## Storage Format

Save results to `data/wikidata/{country}/{timestamp}.json`:

```json
{
  "country_code": "NL",
  "country_name": "Netherlands",
  "country_qid": "Q55",
  "extraction_date": "2025-11-11T10:30:00Z",
  "total_institutions": 1247,
  "institution_types": {
    "museum": 843,
    "library": 302,
    "archive": 78,
    "gallery": 24
  },
  "institutions": [
    {
      "wikidata_qid": "Q190804",
      "name": "Rijksmuseum",
      "description": "National museum of the Netherlands",
      "institution_type": "museum",
      "coordinates": {
        "latitude": 52.36,
        "longitude": 4.885
      },
      "identifiers": {
        "ISIL": "NL-AsdRM",
        "VIAF": "131375374",
        "website": "https://www.rijksmuseum.nl"
      },
      "location": {
        "city": "Amsterdam",
        "region": "North Holland",
        "country": "Netherlands"
      },
      "founding_date": "1800-01-01",
      "collection_size": 1000000
    }
  ]
}
```

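The top-level summary fields can be derived from the institution list instead of being maintained by hand. The sketch below shows one way to do that; the helper names are hypothetical, and two details are assumptions rather than project conventions: `{country}` is taken to be the ISO code, and `{timestamp}` is rendered as a compact UTC stamp. The datetime passed in is assumed to be in UTC.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_country_record(code, name, qid, institutions, extracted_at):
    """Assemble the top-level JSON object for one country."""
    counts = Counter(inst["institution_type"] for inst in institutions)
    return {
        "country_code": code,
        "country_name": name,
        "country_qid": qid,
        "extraction_date": extracted_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "total_institutions": len(institutions),
        "institution_types": dict(counts),
        "institutions": institutions,
    }


def output_path(code, extracted_at):
    """Build data/wikidata/{country}/{timestamp}.json (format assumed)."""
    stamp = extracted_at.strftime("%Y%m%dT%H%M%SZ")
    return PurePosixPath("data/wikidata") / code / f"{stamp}.json"


when = datetime(2025, 11, 11, 10, 30, tzinfo=timezone.utc)
sample = [
    {"wikidata_qid": "Q190804", "name": "Rijksmuseum", "institution_type": "museum"},
]
record = build_country_record("NL", "Netherlands", "Q55", sample, when)
print(output_path("NL", when), record["total_institutions"])
```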

---

## Next Steps

1. **Create Python script** (`scripts/extract_global_wikidata.py`) that:
   - Loads country list from this document
   - Executes generic SPARQL query for each country
   - Saves results to `data/wikidata/{country}/{timestamp}.json`
   - Tracks progress and errors

2. **Prioritize countries** by:
   - Data quality (ISIL registry availability)
   - Institution count (larger datasets first)
   - Strategic importance (Netherlands, France, Germany, UK, US, Brazil, Japan)

3. **Convert to LinkML** instances:
   - Parse JSON results
   - Map to `HeritageCustodian` schema
   - Generate GHCIDs
   - Add provenance metadata (data_source: WIKIDATA, data_tier: TIER_3_CROWD_SOURCED)

---

**Version**: 1.0
**Last Updated**: 2025-11-11
**Maintained By**: GLAM Data Extraction Project