Wikidata SPARQL Queries for Global GLAMORCUBEPSXHF Extraction
This document provides SPARQL queries for extracting GLAMORCUBEPSXHF institutions worldwide from Wikidata: Galleries, Libraries, Archives, Museums, Official institutions, Research centers, Corporations, Universities, Botanical gardens/zoos, Education providers, Personal collections, collecting Societies, miXed-type institutions, Holy sites, and Features.
Table of Contents
- Generic SPARQL Query Template
- Institution Type Mappings
- Country Codes with Wikidata QIDs
- Usage Examples
- Query Optimization Strategies
Generic SPARQL Query Template
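This generic query extracts the core metadata (labels, type, location, identifiers, dates, organizational relationships, collection data, media) for heritage institutions in one country. Replace {COUNTRY_QID} with the country's Wikidata QID from the country tables below, and {LANGUAGES} with a comma-separated label-language fallback list such as "nl,en".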
```sparql
# Generic GLAMORCUBEPSXHF Institution Query
# Replace {COUNTRY_QID} with actual country QID (e.g., Q55 for Netherlands)
# Replace {LANGUAGES} with comma-separated language codes (e.g., "nl,en" for Dutch/English)
# Multi-valued properties (several P31, P131, P856 values, ...) produce one row per
# combination, so expect several rows per ?item and merge them downstream.
SELECT DISTINCT
  ?item ?itemLabel ?itemDescription ?itemAltLabel
  ?instType ?instTypeLabel
  ?coords ?latitude ?longitude
  ?streetAddress ?postalCode ?city ?cityLabel ?region ?regionLabel
  ?isil ?viaf ?wikidataQID ?website ?email ?phone
  ?inception ?dissolved ?foundingDate
  ?parent ?parentLabel ?partOf ?partOfLabel
  ?collectionSize ?collectionType
  ?image ?logo
WHERE {
  # =============================================================================
  # INSTITUTION TYPE FILTER (Museums, Libraries, Archives, Galleries, etc.)
  # =============================================================================
  VALUES ?instTypeClass {
    wd:Q33506    # museum
    wd:Q7075     # library
    wd:Q166118   # archive
    wd:Q2668072  # art gallery
    wd:Q5282129  # cultural center
    wd:Q3152824  # research center
    wd:Q294163   # public institution (official institutions)
    wd:Q3918     # university
    wd:Q167346   # botanical garden
    wd:Q43229    # organization (societies, corporations); very broad, expect non-heritage matches
    wd:Q16970    # church building (holy sites)
    wd:Q44539    # temple (holy sites)
    wd:Q32815    # mosque (holy sites)
    wd:Q34627    # synagogue (holy sites)
    wd:Q162378   # sanctuary (holy sites)
  }
  # Instance of a heritage institution type, or of a direct subclass of one
  ?item wdt:P31/wdt:P279? ?instTypeClass .

  # =============================================================================
  # COUNTRY FILTER
  # =============================================================================
  ?item wdt:P17 wd:{COUNTRY_QID} .  # country

  # =============================================================================
  # CAPTURE INSTITUTION TYPE
  # =============================================================================
  ?item wdt:P31 ?instType .

  # =============================================================================
  # GEOGRAPHIC DATA
  # =============================================================================
  # Coordinates (lat/lon)
  OPTIONAL {
    ?item wdt:P625 ?coords .
    BIND(geoLatitude(?coords) AS ?latitude)
    BIND(geoLongitude(?coords) AS ?longitude)
  }
  # Physical address components
  OPTIONAL { ?item wdt:P6375 ?streetAddress . }  # street address
  OPTIONAL { ?item wdt:P281 ?postalCode . }      # postal code
  # City/municipality
  OPTIONAL {
    ?item wdt:P131 ?city .                # located in administrative territory
    ?city wdt:P31/wdt:P279? wd:Q515 .     # city
  }
  # Region/province/state
  OPTIONAL {
    ?item wdt:P131 ?region .
    ?region wdt:P31/wdt:P279? wd:Q10864048 .  # first-level admin division
  }

  # =============================================================================
  # IDENTIFIERS
  # =============================================================================
  OPTIONAL { ?item wdt:P791 ?isil . }      # ISIL code
  OPTIONAL { ?item wdt:P214 ?viaf . }      # VIAF ID
  OPTIONAL { ?item wdt:P856 ?website . }   # official website
  OPTIONAL { ?item wdt:P968 ?email . }     # email address
  OPTIONAL { ?item wdt:P1329 ?phone . }    # phone number
  # Extract Wikidata QID from URI
  BIND(STRAFTER(STR(?item), "http://www.wikidata.org/entity/") AS ?wikidataQID)

  # =============================================================================
  # TEMPORAL DATA
  # =============================================================================
  OPTIONAL { ?item wdt:P571 ?inception . }      # inception date
  OPTIONAL { ?item wdt:P576 ?dissolved . }      # dissolved/abolished date
  OPTIONAL { ?item wdt:P1619 ?foundingDate . }  # date of official opening

  # =============================================================================
  # ORGANIZATIONAL RELATIONSHIPS
  # =============================================================================
  OPTIONAL { ?item wdt:P749 ?parent . }   # parent organization
  OPTIONAL { ?item wdt:P361 ?partOf . }   # part of (larger entity)

  # =============================================================================
  # COLLECTION METADATA
  # =============================================================================
  OPTIONAL { ?item wdt:P1436 ?collectionSize . }  # collection or exhibition size
  OPTIONAL { ?item wdt:P195 ?collectionType . }   # collection (P195): a collection this item itself belongs to

  # =============================================================================
  # MEDIA
  # =============================================================================
  OPTIONAL { ?item wdt:P18 ?image . }   # image
  OPTIONAL { ?item wdt:P154 ?logo . }   # logo

  # =============================================================================
  # LABELS AND DESCRIPTIONS (Multilingual)
  # =============================================================================
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "{LANGUAGES}" .
    ?item rdfs:label ?itemLabel .
    ?item schema:description ?itemDescription .
    ?item skos:altLabel ?itemAltLabel .
    ?instType rdfs:label ?instTypeLabel .
    ?city rdfs:label ?cityLabel .
    ?region rdfs:label ?regionLabel .
    ?parent rdfs:label ?parentLabel .
    ?partOf rdfs:label ?partOfLabel .
  }
}
ORDER BY ?itemLabel
LIMIT 1000
```
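A minimal Python sketch for filling the two placeholders, assuming the template is kept as a plain string. `fill_template` and the shortened `TEMPLATE` are illustrative names, not part of an existing script. It uses `str.replace` rather than `str.format`, because the query's own `{ }` graph-pattern braces would otherwise be parsed as format fields.

```python
def fill_template(template: str, country_qid: str, languages: str) -> str:
    """Substitute the {COUNTRY_QID} and {LANGUAGES} placeholders in a query template."""
    # Reject anything that is not a bare QID such as "Q55" before splicing it into SPARQL.
    if not (country_qid.startswith("Q") and country_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata QID: {country_qid!r}")
    # str.replace, not str.format: the query's own { } braces are not format fields.
    return template.replace("{COUNTRY_QID}", country_qid).replace("{LANGUAGES}", languages)


# Shortened stand-in for the full template above (illustrative only).
TEMPLATE = """SELECT ?item ?itemLabel WHERE {
  ?item wdt:P17 wd:{COUNTRY_QID} .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "{LANGUAGES}" . }
}"""

netherlands_query = fill_template(TEMPLATE, "Q55", "nl,en")
```

The QID check is a cheap guard against accidentally passing a country name or ISO code where a QID is expected.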
Institution Type Mappings
Wikidata QIDs for each GLAMORCUBEPSXHF category:
| GLAMORCUBEPSXHF Type | Code | Wikidata QID(s) | Wikidata Label | Notes |
|---|---|---|---|---|
| GALLERY | G | Q2668072 | art gallery | Commercial and exhibition galleries |
| LIBRARY | L | Q7075 | library | Public, academic, specialized libraries |
| ARCHIVE | A | Q166118 | archive | Government, corporate, community archives |
| MUSEUM | M | Q33506 | museum | Art, history, science, natural history museums |
| OFFICIAL_INSTITUTION | O | Q294163, Q5341295 | public institution, government agency | Official heritage agencies |
| RESEARCH_CENTER | R | Q3152824, Q31855 | research center, research institute | Heritage research institutes |
| CORPORATION | C | Q4830453, Q783794 | business, company | Corporate heritage collections |
| UNIVERSITY | U | Q3918 | university | Universities with heritage collections |
| BOTANICAL_ZOO | B | Q167346, Q43501 | botanical garden, zoo | Natural heritage sites |
| EDUCATION_PROVIDER | E | Q2385804, Q3914 | educational institution, school | Schools with heritage collections |
| PERSONAL_COLLECTION | P | Q23058176 | private collection | Personal collections (rare in Wikidata) |
| COLLECTING_SOCIETY | S | Q43229 | organization | Numismatic, philatelic, historical societies |
| HOLY_SITES | H | Q16970, Q44539, Q32815, Q34627, Q162378 | church, temple, mosque, synagogue, sanctuary | Religious heritage sites |
| FEATURES | F | Q4989906, Q860861, Q5003624, Q5003551 | monument, sculpture, statue, memorial | Physical landmarks with heritage significance |
| MIXED | X | (Multiple types) | - | Institutions with multiple types |
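To assemble the VALUES block from type codes instead of editing it by hand, something like the following could work. `TYPE_QIDS` copies only a subset of the table above (codes whose QIDs the generic query also uses), and `values_clause` is a hypothetical helper name.

```python
# Subset of the GLAMORCUBEPSXHF type table above: code -> Wikidata class QIDs.
TYPE_QIDS = {
    "G": ["Q2668072"],
    "L": ["Q7075"],
    "A": ["Q166118"],
    "M": ["Q33506"],
    "U": ["Q3918"],
    "H": ["Q16970", "Q44539", "Q32815", "Q34627", "Q162378"],
}


def values_clause(codes, var="?instTypeClass"):
    """Build a SPARQL VALUES block for the given type codes, keeping order and dropping duplicates."""
    qids = []
    for code in codes:
        for qid in TYPE_QIDS[code]:  # unknown codes raise KeyError on purpose
            if qid not in qids:
                qids.append(qid)
    return f"VALUES {var} {{ " + " ".join(f"wd:{q}" for q in qids) + " }"
```

For example, `values_clause(["M", "L"])` yields `VALUES ?instTypeClass { wd:Q33506 wd:Q7075 }`, which can replace the hand-written block when querying one or two types at a time.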
Refined Query by Institution Type
For targeted extraction, query each type separately:
Museums Only:
```sparql
?item wdt:P31/wdt:P279? wd:Q33506 .   # museum (including direct subclasses)
```
Libraries Only:
```sparql
?item wdt:P31/wdt:P279? wd:Q7075 .    # library (public, academic, national)
```
Archives Only:
```sparql
?item wdt:P31/wdt:P279? wd:Q166118 .  # archive
```
Holy Sites with Collections:
```sparql
VALUES ?holyType { wd:Q16970 wd:Q44539 wd:Q32815 wd:Q34627 wd:Q162378 }
?item wdt:P31 ?holyType .
# Keep only heritage-managing sites: a stated collection size (P1436),
# or at least one object whose "collection" (P195) points at the site
OPTIONAL { ?item wdt:P1436 ?collectionSize . }
FILTER(BOUND(?collectionSize) || EXISTS { ?object wdt:P195 ?item . })
```
Country Codes with Wikidata QIDs
List of 197 countries with ISO 3166-1 alpha-2 codes and Wikidata QIDs for use in SPARQL queries. Kosovo's XK is a user-assigned code rather than an official ISO 3166-1 code; dependent territories are not listed.
Africa (54 countries)
| Country | Code | Wikidata QID | Flag |
|---|---|---|---|
| Algeria | DZ | Q262 | 🇩🇿 |
| Angola | AO | Q916 | 🇦🇴 |
| Benin | BJ | Q962 | 🇧🇯 |
| Botswana | BW | Q963 | 🇧🇼 |
| Burkina Faso | BF | Q965 | 🇧🇫 |
| Burundi | BI | Q967 | 🇧🇮 |
| Cameroon | CM | Q1009 | 🇨🇲 |
| Cape Verde | CV | Q1011 | 🇨🇻 |
| Central African Republic | CF | Q929 | 🇨🇫 |
| Chad | TD | Q657 | 🇹🇩 |
| Comoros | KM | Q970 | 🇰🇲 |
| Congo (Brazzaville) | CG | Q971 | 🇨🇬 |
| Congo (Kinshasa) | CD | Q974 | 🇨🇩 |
| Djibouti | DJ | Q977 | 🇩🇯 |
| Egypt | EG | Q79 | 🇪🇬 |
| Equatorial Guinea | GQ | Q983 | 🇬🇶 |
| Eritrea | ER | Q986 | 🇪🇷 |
| Eswatini (Swaziland) | SZ | Q1050 | 🇸🇿 |
| Ethiopia | ET | Q115 | 🇪🇹 |
| Gabon | GA | Q1000 | 🇬🇦 |
| Gambia | GM | Q1005 | 🇬🇲 |
| Ghana | GH | Q117 | 🇬🇭 |
| Guinea | GN | Q1006 | 🇬🇳 |
| Guinea-Bissau | GW | Q1007 | 🇬🇼 |
| Ivory Coast | CI | Q1008 | 🇨🇮 |
| Kenya | KE | Q114 | 🇰🇪 |
| Lesotho | LS | Q1013 | 🇱🇸 |
| Liberia | LR | Q1014 | 🇱🇷 |
| Libya | LY | Q1016 | 🇱🇾 |
| Madagascar | MG | Q1019 | 🇲🇬 |
| Malawi | MW | Q1020 | 🇲🇼 |
| Mali | ML | Q912 | 🇲🇱 |
| Mauritania | MR | Q1025 | 🇲🇷 |
| Mauritius | MU | Q1027 | 🇲🇺 |
| Morocco | MA | Q1028 | 🇲🇦 |
| Mozambique | MZ | Q1029 | 🇲🇿 |
| Namibia | NA | Q1030 | 🇳🇦 |
| Niger | NE | Q1032 | 🇳🇪 |
| Nigeria | NG | Q1033 | 🇳🇬 |
| Rwanda | RW | Q1037 | 🇷🇼 |
| São Tomé and Príncipe | ST | Q1039 | 🇸🇹 |
| Senegal | SN | Q1041 | 🇸🇳 |
| Seychelles | SC | Q1042 | 🇸🇨 |
| Sierra Leone | SL | Q1044 | 🇸🇱 |
| Somalia | SO | Q1045 | 🇸🇴 |
| South Africa | ZA | Q258 | 🇿🇦 |
| South Sudan | SS | Q958 | 🇸🇸 |
| Sudan | SD | Q1049 | 🇸🇩 |
| Tanzania | TZ | Q924 | 🇹🇿 |
| Togo | TG | Q945 | 🇹🇬 |
| Tunisia | TN | Q948 | 🇹🇳 |
| Uganda | UG | Q1036 | 🇺🇬 |
| Zambia | ZM | Q953 | 🇿🇲 |
| Zimbabwe | ZW | Q954 | 🇿🇼 |
Americas (35 countries)
| Country | Code | Wikidata QID | Flag |
|---|---|---|---|
| Antigua and Barbuda | AG | Q781 | 🇦🇬 |
| Argentina | AR | Q414 | 🇦🇷 |
| Bahamas | BS | Q778 | 🇧🇸 |
| Barbados | BB | Q244 | 🇧🇧 |
| Belize | BZ | Q242 | 🇧🇿 |
| Bolivia | BO | Q750 | 🇧🇴 |
| Brazil | BR | Q155 | 🇧🇷 |
| Canada | CA | Q16 | 🇨🇦 |
| Chile | CL | Q298 | 🇨🇱 |
| Colombia | CO | Q739 | 🇨🇴 |
| Costa Rica | CR | Q800 | 🇨🇷 |
| Cuba | CU | Q241 | 🇨🇺 |
| Dominica | DM | Q784 | 🇩🇲 |
| Dominican Republic | DO | Q786 | 🇩🇴 |
| Ecuador | EC | Q736 | 🇪🇨 |
| El Salvador | SV | Q792 | 🇸🇻 |
| Grenada | GD | Q769 | 🇬🇩 |
| Guatemala | GT | Q774 | 🇬🇹 |
| Guyana | GY | Q734 | 🇬🇾 |
| Haiti | HT | Q790 | 🇭🇹 |
| Honduras | HN | Q783 | 🇭🇳 |
| Jamaica | JM | Q766 | 🇯🇲 |
| Mexico | MX | Q96 | 🇲🇽 |
| Nicaragua | NI | Q811 | 🇳🇮 |
| Panama | PA | Q804 | 🇵🇦 |
| Paraguay | PY | Q733 | 🇵🇾 |
| Peru | PE | Q419 | 🇵🇪 |
| Saint Kitts and Nevis | KN | Q763 | 🇰🇳 |
| Saint Lucia | LC | Q760 | 🇱🇨 |
| Saint Vincent and the Grenadines | VC | Q757 | 🇻🇨 |
| Suriname | SR | Q730 | 🇸🇷 |
| Trinidad and Tobago | TT | Q754 | 🇹🇹 |
| United States | US | Q30 | 🇺🇸 |
| Uruguay | UY | Q77 | 🇺🇾 |
| Venezuela | VE | Q717 | 🇻🇪 |
Asia (49 countries/territories)
| Country | Code | Wikidata QID | Flag |
|---|---|---|---|
| Afghanistan | AF | Q889 | 🇦🇫 |
| Armenia | AM | Q399 | 🇦🇲 |
| Azerbaijan | AZ | Q227 | 🇦🇿 |
| Bahrain | BH | Q398 | 🇧🇭 |
| Bangladesh | BD | Q902 | 🇧🇩 |
| Bhutan | BT | Q917 | 🇧🇹 |
| Brunei | BN | Q921 | 🇧🇳 |
| Cambodia | KH | Q424 | 🇰🇭 |
| China | CN | Q148 | 🇨🇳 |
| Cyprus | CY | Q229 | 🇨🇾 |
| East Timor (Timor-Leste) | TL | Q574 | 🇹🇱 |
| Georgia | GE | Q230 | 🇬🇪 |
| India | IN | Q668 | 🇮🇳 |
| Indonesia | ID | Q252 | 🇮🇩 |
| Iran | IR | Q794 | 🇮🇷 |
| Iraq | IQ | Q796 | 🇮🇶 |
| Israel | IL | Q801 | 🇮🇱 |
| Japan | JP | Q17 | 🇯🇵 |
| Jordan | JO | Q810 | 🇯🇴 |
| Kazakhstan | KZ | Q232 | 🇰🇿 |
| Kuwait | KW | Q817 | 🇰🇼 |
| Kyrgyzstan | KG | Q813 | 🇰🇬 |
| Laos | LA | Q819 | 🇱🇦 |
| Lebanon | LB | Q822 | 🇱🇧 |
| Malaysia | MY | Q833 | 🇲🇾 |
| Maldives | MV | Q826 | 🇲🇻 |
| Mongolia | MN | Q711 | 🇲🇳 |
| Myanmar (Burma) | MM | Q836 | 🇲🇲 |
| Nepal | NP | Q837 | 🇳🇵 |
| North Korea | KP | Q423 | 🇰🇵 |
| Oman | OM | Q842 | 🇴🇲 |
| Pakistan | PK | Q843 | 🇵🇰 |
| Palestine | PS | Q219060 | 🇵🇸 |
| Philippines | PH | Q928 | 🇵🇭 |
| Qatar | QA | Q846 | 🇶🇦 |
| Saudi Arabia | SA | Q851 | 🇸🇦 |
| Singapore | SG | Q334 | 🇸🇬 |
| South Korea | KR | Q884 | 🇰🇷 |
| Sri Lanka | LK | Q854 | 🇱🇰 |
| Syria | SY | Q858 | 🇸🇾 |
| Taiwan | TW | Q865 | 🇹🇼 |
| Tajikistan | TJ | Q863 | 🇹🇯 |
| Thailand | TH | Q869 | 🇹🇭 |
| Turkey | TR | Q43 | 🇹🇷 |
| Turkmenistan | TM | Q874 | 🇹🇲 |
| United Arab Emirates | AE | Q878 | 🇦🇪 |
| Uzbekistan | UZ | Q265 | 🇺🇿 |
| Vietnam | VN | Q881 | 🇻🇳 |
| Yemen | YE | Q805 | 🇾🇪 |
Europe (45 countries)
| Country | Code | Wikidata QID | Flag |
|---|---|---|---|
| Albania | AL | Q222 | 🇦🇱 |
| Andorra | AD | Q228 | 🇦🇩 |
| Austria | AT | Q40 | 🇦🇹 |
| Belarus | BY | Q184 | 🇧🇾 |
| Belgium | BE | Q31 | 🇧🇪 |
| Bosnia and Herzegovina | BA | Q225 | 🇧🇦 |
| Bulgaria | BG | Q219 | 🇧🇬 |
| Croatia | HR | Q224 | 🇭🇷 |
| Czech Republic | CZ | Q213 | 🇨🇿 |
| Denmark | DK | Q35 | 🇩🇰 |
| Estonia | EE | Q191 | 🇪🇪 |
| Finland | FI | Q33 | 🇫🇮 |
| France | FR | Q142 | 🇫🇷 |
| Germany | DE | Q183 | 🇩🇪 |
| Greece | GR | Q41 | 🇬🇷 |
| Hungary | HU | Q28 | 🇭🇺 |
| Iceland | IS | Q189 | 🇮🇸 |
| Ireland | IE | Q27 | 🇮🇪 |
| Italy | IT | Q38 | 🇮🇹 |
| Kosovo | XK | Q1246 | 🇽🇰 |
| Latvia | LV | Q211 | 🇱🇻 |
| Liechtenstein | LI | Q347 | 🇱🇮 |
| Lithuania | LT | Q37 | 🇱🇹 |
| Luxembourg | LU | Q32 | 🇱🇺 |
| Malta | MT | Q233 | 🇲🇹 |
| Moldova | MD | Q217 | 🇲🇩 |
| Monaco | MC | Q235 | 🇲🇨 |
| Montenegro | ME | Q236 | 🇲🇪 |
| Netherlands | NL | Q55 | 🇳🇱 |
| North Macedonia | MK | Q221 | 🇲🇰 |
| Norway | NO | Q20 | 🇳🇴 |
| Poland | PL | Q36 | 🇵🇱 |
| Portugal | PT | Q45 | 🇵🇹 |
| Romania | RO | Q218 | 🇷🇴 |
| Russia | RU | Q159 | 🇷🇺 |
| San Marino | SM | Q238 | 🇸🇲 |
| Serbia | RS | Q403 | 🇷🇸 |
| Slovakia | SK | Q214 | 🇸🇰 |
| Slovenia | SI | Q215 | 🇸🇮 |
| Spain | ES | Q29 | 🇪🇸 |
| Sweden | SE | Q34 | 🇸🇪 |
| Switzerland | CH | Q39 | 🇨🇭 |
| Ukraine | UA | Q212 | 🇺🇦 |
| United Kingdom | GB | Q145 | 🇬🇧 |
| Vatican City | VA | Q237 | 🇻🇦 |
Oceania (14 countries)
| Country | Code | Wikidata QID | Flag |
|---|---|---|---|
| Australia | AU | Q408 | 🇦🇺 |
| Fiji | FJ | Q712 | 🇫🇯 |
| Kiribati | KI | Q710 | 🇰🇮 |
| Marshall Islands | MH | Q709 | 🇲🇭 |
| Micronesia | FM | Q702 | 🇫🇲 |
| Nauru | NR | Q697 | 🇳🇷 |
| New Zealand | NZ | Q664 | 🇳🇿 |
| Palau | PW | Q695 | 🇵🇼 |
| Papua New Guinea | PG | Q691 | 🇵🇬 |
| Samoa | WS | Q683 | 🇼🇸 |
| Solomon Islands | SB | Q685 | 🇸🇧 |
| Tonga | TO | Q678 | 🇹🇴 |
| Tuvalu | TV | Q672 | 🇹🇻 |
| Vanuatu | VU | Q686 | 🇻🇺 |
Total: 197 countries (54 + 35 + 49 + 45 + 14)
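The Next Steps call for loading the country list from this document. A sketch of one way to do that, assuming the tables are read as Markdown text with one `| Country | Code | Wikidata QID | Flag |` row per line; `parse_country_table` is a hypothetical name.

```python
import re

# Matches data rows such as "| Netherlands | NL | Q55 | 🇳🇱 |".
# Header and separator rows fail the [A-Z]{2} / Q\d+ columns and are skipped.
ROW = re.compile(r"^\|\s*([^|]+?)\s*\|\s*([A-Z]{2})\s*\|\s*(Q\d+)\s*\|")


def parse_country_table(markdown: str):
    """Return (name, iso_code, qid) tuples for every country row found."""
    rows = []
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groups())
    return rows
```

Running it over the five regional tables should give 197 tuples; comparing that count with the total above is a quick sanity check after editing the tables.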
Usage Examples
Example 1: Extract All Dutch Museums
```sparql
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?coords ?isil ?viaf ?website
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q33506 .  # museum
  ?item wdt:P17 wd:Q55 .               # Netherlands
  OPTIONAL { ?item wdt:P625 ?coords . }
  OPTIONAL { ?item wdt:P791 ?isil . }
  OPTIONAL { ?item wdt:P214 ?viaf . }
  OPTIONAL { ?item wdt:P856 ?website . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nl,en" .
  }
}
LIMIT 1000
```
Example 2: Extract Brazilian Libraries with Collections
```sparql
SELECT DISTINCT ?item ?itemLabel ?city ?cityLabel ?collectionSize ?website
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q7075 .   # library
  ?item wdt:P17 wd:Q155 .              # Brazil
  OPTIONAL {
    ?item wdt:P131 ?city .
    ?city wdt:P31/wdt:P279? wd:Q515 .  # city
  }
  OPTIONAL { ?item wdt:P1436 ?collectionSize . }  # collection or exhibition size
  OPTIONAL { ?item wdt:P856 ?website . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "pt,en" .
  }
}
ORDER BY DESC(?collectionSize)
LIMIT 500
```
Example 3: Extract Japanese Archives with ISIL Codes
```sparql
SELECT DISTINCT ?item ?itemLabel ?isil ?city ?cityLabel ?inception
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q166118 .  # archive
  ?item wdt:P17 wd:Q17 .                # Japan
  ?item wdt:P791 ?isil .                # MUST have ISIL code
  OPTIONAL {
    ?item wdt:P131 ?city .
  }
  OPTIONAL { ?item wdt:P571 ?inception . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ja,en" .
  }
}
ORDER BY ?isil
LIMIT 1000
```
Example 4: Extract All Heritage Institutions in Tunisia
```sparql
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?instTypeLabel ?coords ?website
WHERE {
  VALUES ?instTypeClass {
    wd:Q33506    # museum
    wd:Q7075     # library
    wd:Q166118   # archive
    wd:Q2668072  # art gallery
    wd:Q5282129  # cultural center
  }
  ?item wdt:P31/wdt:P279? ?instTypeClass .
  ?item wdt:P17 wd:Q948 .  # Tunisia
  ?item wdt:P31 ?instType .
  OPTIONAL { ?item wdt:P625 ?coords . }
  OPTIONAL { ?item wdt:P856 ?website . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ar,fr,en" .
  }
}
LIMIT 1000
```
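To run any of these examples programmatically, the query is sent to the public endpoint `https://query.wikidata.org/sparql`. The sketch below only builds the HTTP request with the standard library and does not send it; `build_request` is a hypothetical helper, and the contact URL and e-mail in the User-Agent are placeholders to replace with real ones.

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder contact details: replace with the project's real URL or e-mail address.
USER_AGENT = "GLAM-Extractor/0.2.0 (https://example.org/glam-extractor; contact@example.org)"


def build_request(query: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request asking for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": USER_AGENT,
        },
    )


# Sending it would look like:
#   json.load(urllib.request.urlopen(build_request(query), timeout=90))
```

Very long queries (such as the full generic template) can run into URL length limits with GET; the endpoint also accepts POST with a form-encoded `query` parameter.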
Query Optimization Strategies
1. Avoid Transitive Subclass Queries
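❌ Slow (for broad classes such as museum, often exceeds the endpoint's 60-second query timeout):
```sparql
?item wdt:P31/wdt:P279* wd:Q33506 .  # AVOID wdt:P279* (unbounded depth)
```
✅ Fast (limited depth):
```sparql
?item wdt:P31/wdt:P279? wd:Q33506 .  # wdt:P279? = 0 or 1 subclass hop
```
The trade-off: `wdt:P279?` misses items whose type sits two or more subclass steps below the target class, so result counts will be lower than with `wdt:P279*`.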
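If one hop is too shallow but `wdt:P279*` times out, a middle ground is to chain several optional steps, which caps the depth explicitly. A small generator for such paths (the function name is hypothetical):

```python
def instance_of_path(max_depth: int) -> str:
    """Property path: instance of (P31), then 0..max_depth subclass-of (P279) hops."""
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")
    # Each "/wdt:P279?" adds one optional hop, so the depth is bounded by max_depth.
    return "wdt:P31" + "/wdt:P279?" * max_depth
```

`instance_of_path(2)` gives `wdt:P31/wdt:P279?/wdt:P279?`, reaching classes up to two subclass steps below the target. Whether a given depth stays under the timeout depends on the class and the country, so treat it as a parameter to tune.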
2. Query Institution Types Separately
Instead of one large query with all types, run separate queries for each type:
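```python
# Example values: the country to extract and its label languages
country_qid = "Q55"  # Netherlands
languages = "nl,en"
institution_types = ["Q33506", "Q7075", "Q166118", "Q2668072"]

for inst_type in institution_types:
    # Doubled braces {{ }} are literal SPARQL braces inside the f-string
    query = f"""
    SELECT DISTINCT ?item ?itemLabel
    WHERE {{
      ?item wdt:P31/wdt:P279? wd:{inst_type} .
      ?item wdt:P17 wd:{country_qid} .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{languages}" . }}
    }}
    LIMIT 1000
    """
    results = execute_sparql(query)  # execute_sparql: your query helper (not defined here)
```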
3. Use LIMIT and OFFSET for Pagination
For countries with many institutions (e.g., France, Germany, UK):
```sparql
SELECT DISTINCT ?item ?itemLabel ...
WHERE {
  ...
}
ORDER BY ?item  # unique, stable sort key: labels can tie, letting rows shift between pages
LIMIT 1000
OFFSET 0        # Change to 1000, 2000, 3000, etc. for subsequent pages
```
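A sketch of the OFFSET loop, assuming `fetch` is any function that runs one query and returns its list of result bindings (for example, a wrapper around the retry helper in section 6). `paginate` is a hypothetical name; it stops as soon as a page comes back shorter than the page size.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 1000


def paginate(base_query: str, fetch: Callable[[str], List[dict]],
             page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield result rows page by page until a short page signals the end.

    base_query must already end with a stable ORDER BY (e.g. ORDER BY ?item),
    otherwise rows can move between pages.
    """
    offset = 0
    while True:
        page = fetch(f"{base_query}\nLIMIT {page_size}\nOFFSET {offset}")
        yield from page
        if len(page) < page_size:  # last (possibly empty) page reached
            break
        offset += page_size
```

Because it is a generator, rows can be written to disk as they arrive instead of holding a whole country in memory.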
4. Specify Language Priorities
```sparql
SERVICE wikibase:label {
  bd:serviceParam wikibase:language "nl,en" .  # Dutch preferred, English fallback
}
```
Language codes by country:
- Netherlands: "nl,en"
- Brazil: "pt,en"
- Japan: "ja,en"
- France: "fr,en"
- Germany: "de,en"
- China: "zh,en"
- Russia: "ru,en"
- Arabic-speaking countries: "ar,en" or "ar,fr,en"
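The fallback chains could be kept in a lookup keyed by ISO code so the label-service line is generated per country. The mapping below covers only the countries listed above, plus Tunisia with the "ar,fr,en" chain used in Example 4; defaulting every other country to English is an assumption, not a rule. Both function names are hypothetical.

```python
# Language fallback chains from the list above, keyed by ISO 3166-1 alpha-2 code.
LABEL_LANGUAGES = {
    "NL": "nl,en",
    "BR": "pt,en",
    "JP": "ja,en",
    "FR": "fr,en",
    "DE": "de,en",
    "CN": "zh,en",
    "RU": "ru,en",
    "TN": "ar,fr,en",  # Arabic first, then French, then English
}


def label_languages(iso_code: str) -> str:
    """Return the wikibase:language value for a country, falling back to English."""
    return LABEL_LANGUAGES.get(iso_code.upper(), "en")


def label_service(iso_code: str) -> str:
    """Render a one-line label SERVICE clause for the country."""
    return ('SERVICE wikibase:label { bd:serviceParam wikibase:language "'
            + label_languages(iso_code) + '" . }')
```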
5. Rate Limiting
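The Wikidata Query Service enforces a 60-second timeout per query as well as per-client usage limits; clients that exceed them receive HTTP 429 responses. Best practices:
- Add delays between queries: wait 2-5 seconds between requests.
- Use a custom User-Agent, e.g. "GLAM-Extractor/0.2.0 (Wikidata Global Extraction)"; Wikimedia's User-Agent policy also asks for contact information (a URL or e-mail address).
- Cache results: store them in `data/wikidata/{country}/{timestamp}.json`.
- Batch by country: process countries in priority order (see `enrich_institutions_wikidata_sparql.py`).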
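Two of these practices sketched in Python: a throttle that enforces a minimum gap between requests, and a helper that builds the cache path. The names `Throttle` and `cache_path`, the 2-second default, and the compact UTC timestamp format (chosen to keep file names free of colons) are assumptions, not an existing implementation.

```python
import time
from datetime import datetime, timezone
from pathlib import Path


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous request

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def cache_path(country_code: str, when: datetime, root: str = "data/wikidata") -> Path:
    """Build data/wikidata/{country}/{timestamp}.json with a filesystem-safe UTC stamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(root) / country_code.upper() / f"{stamp}.json"
```

Usage: call `throttle.wait()` immediately before each request, and write each country's results to `cache_path(code, datetime.now(timezone.utc))`.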
6. Error Handling
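```python
import time

from SPARQLWrapper import JSON, SPARQLWrapper, SPARQLExceptions

# Usage: sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="GLAM-Extractor/0.2.0 (...)")


def execute_sparql_with_retry(sparql: SPARQLWrapper, query: str, max_retries: int = 3):
    """Run a query and return the parsed JSON result, retrying on failure."""
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)  # makes convert() return a dict
    for attempt in range(max_retries):
        try:
            return sparql.query().convert()
        except SPARQLExceptions.EndPointInternalError:
            # HTTP 500: on WDQS this usually means the query hit the 60 s timeout.
            # Retrying the same query rarely helps; simplify or split it instead.
            print(f"Endpoint error on attempt {attempt + 1}")
            time.sleep(5)
        except Exception as e:
            # Other failures (e.g. HTTP 429/503, network errors): retry with backoff
            print(f"Error: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
            else:
                raise
    return None  # reached only when the final attempt ended in an endpoint error
```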
Storage Format
Save results to data/wikidata/{country}/{timestamp}.json:
```json
{
  "country_code": "NL",
  "country_name": "Netherlands",
  "country_qid": "Q55",
  "extraction_date": "2025-11-11T10:30:00Z",
  "total_institutions": 1247,
  "institution_types": {
    "museum": 843,
    "library": 302,
    "archive": 78,
    "gallery": 24
  },
  "institutions": [
    {
      "wikidata_qid": "Q190804",
      "name": "Rijksmuseum",
      "description": "National museum of the Netherlands",
      "institution_type": "museum",
      "coordinates": {
        "latitude": 52.36,
        "longitude": 4.885
      },
      "identifiers": {
        "ISIL": "NL-AsdRM",
        "VIAF": "131375374",
        "website": "https://www.rijksmuseum.nl"
      },
      "location": {
        "city": "Amsterdam",
        "region": "North Holland",
        "country": "Netherlands"
      },
      "founding_date": "1800-01-01",
      "collection_size": 1000000
    }
  ]
}
```
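The endpoint returns SPARQL JSON results, typically with several rows per institution (one per combination of multi-valued properties), so rows need merging before they fit the record layout above. A partial sketch covering a few fields, assuming the variable names of the generic query; the function names are hypothetical. WDQS serialises coordinates as WKT `Point(longitude latitude)`, so the order has to be swapped.

```python
def parse_point(wkt: str) -> dict:
    """Parse a WKT 'Point(lon lat)' literal; note that longitude comes first."""
    lon, lat = wkt[wkt.index("(") + 1 : wkt.index(")")].split()
    return {"latitude": float(lat), "longitude": float(lon)}


def _value(row: dict, key: str):
    """Return the plain value of a SPARQL JSON binding, or None if unbound."""
    return row[key]["value"] if key in row else None


def _set_once(target: dict, key: str, val) -> None:
    """Store val unless it is missing or the key already holds a value."""
    if val is not None and target.get(key) is None:
        target[key] = val


def bindings_to_institutions(bindings: list) -> list:
    """Collapse SPARQL JSON rows into one record per QID, keeping first values seen."""
    records = {}  # dicts preserve insertion order, so output follows query order
    for row in bindings:
        qid = _value(row, "item").rsplit("/", 1)[-1]  # .../entity/Q190804 -> Q190804
        rec = records.setdefault(qid, {"wikidata_qid": qid, "identifiers": {}})
        _set_once(rec, "name", _value(row, "itemLabel"))
        _set_once(rec, "description", _value(row, "itemDescription"))
        _set_once(rec, "institution_type", _value(row, "instTypeLabel"))
        if _value(row, "coords"):
            _set_once(rec, "coordinates", parse_point(_value(row, "coords")))
        _set_once(rec["identifiers"], "ISIL", _value(row, "isil"))
        _set_once(rec["identifiers"], "VIAF", _value(row, "viaf"))
        _set_once(rec["identifiers"], "website", _value(row, "website"))
    return list(records.values())
```

Keeping only the first value per field is a simplification; a fuller converter would collect lists for genuinely multi-valued fields such as institution type or website.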
Next Steps
- Create a Python script (`scripts/extract_global_wikidata.py`) that:
  - Loads the country list from this document
  - Executes the generic SPARQL query for each country
  - Saves results to `data/wikidata/{country}/{timestamp}.json`
  - Tracks progress and errors
- Prioritize countries by:
  - Data quality (ISIL registry availability)
  - Institution count (larger datasets first)
  - Strategic importance (Netherlands, France, Germany, UK, US, Brazil, Japan)
- Convert to LinkML instances:
  - Parse JSON results
  - Map to the `HeritageCustodian` schema
  - Generate GHCIDs
  - Add provenance metadata (data_source: WIKIDATA, data_tier: TIER_3_CROWD_SOURCED)
Version: 1.0
Last Updated: 2025-11-11
Maintained By: GLAM Data Extraction Project