# Wikidata SPARQL Queries for Global GLAMORCUBEPSXHF Extraction

This document provides comprehensive SPARQL queries to extract ALL GLAMORCUBEPSXHF (Galleries, Libraries, Archives, Museums, Official institutions, Research centers, Corporations, Universities, Botanical gardens/zoos, Educational providers, Personal collections, Societies, Mixed (X), Holy sites, Features) institutions worldwide from Wikidata.

## Table of Contents

1. [Generic SPARQL Query Template](#generic-sparql-query-template)
2. [Institution Type Mappings](#institution-type-mappings)
3. [Country Codes with Wikidata QIDs](#country-codes-with-wikidata-qids)
4. [Usage Examples](#usage-examples)
5. [Query Optimization Strategies](#query-optimization-strategies)
6. [Storage Format](#storage-format)
7. [Next Steps](#next-steps)

---

## Generic SPARQL Query Template

This generic query extracts ALL metadata for heritage institutions in a specific country. Replace `{COUNTRY_QID}` with the country's Wikidata QID from the tables below and `{LANGUAGES}` with the preferred label languages.

```sparql
# Generic GLAMORCUBEPSXHF Institution Query
# Replace {COUNTRY_QID} with actual country QID (e.g., Q55 for Netherlands)
# Replace {LANGUAGES} with comma-separated language codes (e.g., "nl,en" for Dutch/English)

SELECT DISTINCT
  ?item ?itemLabel ?itemDescription ?itemAltLabel
  ?instType ?instTypeLabel
  ?coords ?latitude ?longitude
  ?streetAddress ?postalCode ?city ?cityLabel ?region ?regionLabel
  ?isil ?viaf ?wikidataQID ?website ?email ?phone
  ?inception ?dissolved ?foundingDate
  ?parent ?parentLabel ?partOf ?partOfLabel
  ?collectionSize ?collectionType
  ?image ?logo
WHERE {
  # =============================================================================
  # INSTITUTION TYPE FILTER (Museums, Libraries, Archives, Galleries, etc.)
  # =============================================================================
  VALUES ?instTypeClass {
    wd:Q33506     # museum
    wd:Q7075      # library
    wd:Q166118    # archive
    wd:Q2668072   # art gallery
    wd:Q5282129   # cultural center
    wd:Q3152824   # research center
    wd:Q294163    # public institution (official institutions)
    wd:Q3918      # university
    wd:Q167346    # botanical garden
    wd:Q43229     # organization (for societies, corporations)
    wd:Q16970     # church building (holy sites)
    wd:Q44539     # temple (holy sites)
    wd:Q32815     # mosque (holy sites)
    wd:Q34627     # synagogue (holy sites)
    wd:Q162378    # sanctuary (holy sites)
  }

  # Instance of a heritage institution type, or of one of its direct subclasses
  ?item wdt:P31/wdt:P279? ?instTypeClass .

  # =============================================================================
  # COUNTRY FILTER
  # =============================================================================
  ?item wdt:P17 wd:{COUNTRY_QID} .  # country

  # =============================================================================
  # CAPTURE INSTITUTION TYPE
  # =============================================================================
  ?item wdt:P31 ?instType .

  # =============================================================================
  # GEOGRAPHIC DATA
  # =============================================================================
  # Coordinates (lat/lon); geof:latitude and geof:longitude are the
  # Wikidata Query Service functions for unpacking a coordinate literal
  OPTIONAL {
    ?item wdt:P625 ?coords .
    BIND(geof:latitude(?coords) AS ?latitude)
    BIND(geof:longitude(?coords) AS ?longitude)
  }

  # Physical address components
  OPTIONAL { ?item wdt:P6375 ?streetAddress . }  # street address
  OPTIONAL { ?item wdt:P281 ?postalCode . }      # postal code

  # City/municipality
  OPTIONAL {
    ?item wdt:P131 ?city .                 # located in administrative territory
    ?city wdt:P31/wdt:P279? wd:Q515 .      # city
  }

  # Region/province/state
  OPTIONAL {
    ?item wdt:P131 ?region .
    ?region wdt:P31/wdt:P279? wd:Q10864048 .
    # first-level admin division
  }

  # =============================================================================
  # IDENTIFIERS
  # =============================================================================
  OPTIONAL { ?item wdt:P791 ?isil . }      # ISIL code
  OPTIONAL { ?item wdt:P214 ?viaf . }      # VIAF ID
  OPTIONAL { ?item wdt:P856 ?website . }   # official website
  OPTIONAL { ?item wdt:P968 ?email . }     # email address
  OPTIONAL { ?item wdt:P1329 ?phone . }    # phone number

  # Extract Wikidata QID from URI
  BIND(STRAFTER(STR(?item), "http://www.wikidata.org/entity/") AS ?wikidataQID)

  # =============================================================================
  # TEMPORAL DATA
  # =============================================================================
  OPTIONAL { ?item wdt:P571 ?inception . }      # inception date
  OPTIONAL { ?item wdt:P576 ?dissolved . }      # dissolved/abolished date
  OPTIONAL { ?item wdt:P1619 ?foundingDate . }  # date of official opening

  # =============================================================================
  # ORGANIZATIONAL RELATIONSHIPS
  # =============================================================================
  OPTIONAL { ?item wdt:P749 ?parent . }  # parent organization
  OPTIONAL { ?item wdt:P361 ?partOf . }  # part of (larger entity)

  # =============================================================================
  # COLLECTION METADATA
  # =============================================================================
  # P1436 is "collection or exhibition size" (P1301 is "number of elevators")
  OPTIONAL { ?item wdt:P1436 ?collectionSize . }  # number of items in collection
  # P195 ("collection") is normally set on collection items rather than on
  # institutions, so this value is often empty
  OPTIONAL { ?item wdt:P195 ?collectionType . }

  # =============================================================================
  # MEDIA
  # =============================================================================
  OPTIONAL { ?item wdt:P18 ?image . }  # image
  OPTIONAL { ?item wdt:P154 ?logo .
  } # logo

  # =============================================================================
  # LABELS AND DESCRIPTIONS (Multilingual)
  # =============================================================================
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "{LANGUAGES}" .
    ?item rdfs:label ?itemLabel .
    ?item schema:description ?itemDescription .
    ?item skos:altLabel ?itemAltLabel .
    ?instType rdfs:label ?instTypeLabel .
    ?city rdfs:label ?cityLabel .
    ?region rdfs:label ?regionLabel .
    ?parent rdfs:label ?parentLabel .
    ?partOf rdfs:label ?partOfLabel .
  }
}
ORDER BY ?itemLabel
LIMIT 1000
```

---

## Institution Type Mappings

Wikidata QIDs for each GLAMORCUBEPSXHF category:

| GLAMORCUBEPSXHF Type | Code | Wikidata QID(s) | Wikidata Label | Notes |
|---------------------|------|-----------------|----------------|-------|
| **GALLERY** | G | Q2668072 | art gallery | Commercial and exhibition galleries |
| **LIBRARY** | L | Q7075 | library | Public, academic, specialized libraries |
| **ARCHIVE** | A | Q166118 | archive | Government, corporate, community archives |
| **MUSEUM** | M | Q33506 | museum | Art, history, science, natural history museums |
| **OFFICIAL_INSTITUTION** | O | Q294163, Q5341295 | public institution, government agency | Official heritage agencies |
| **RESEARCH_CENTER** | R | Q3152824, Q31855 | research center, research institute | Heritage research institutes |
| **CORPORATION** | C | Q4830453, Q783794 | business, company | Corporate heritage collections |
| **UNIVERSITY** | U | Q3918 | university | Universities with heritage collections |
| **BOTANICAL_ZOO** | B | Q167346, Q43501 | botanical garden, zoo | Natural heritage sites |
| **EDUCATION_PROVIDER** | E | Q2385804, Q3914 | educational institution, school | Schools with heritage collections |
| **PERSONAL_COLLECTION** | P | Q23058176 | private collection | Personal collections (rare in Wikidata) |
| **COLLECTING_SOCIETY** | S | Q43229 | organization | Numismatic, philatelic, historical societies |
| **HOLY_SITES** | H | Q16970, Q44539, Q32815, Q34627, Q162378 | church, temple, mosque, synagogue, sanctuary | Religious heritage sites |
| **FEATURES** | F | Q4989906, Q860861, Q5003624, Q5003551 | monument, sculpture, statue, memorial | Physical landmarks with heritage significance |
| **MIXED** | X | (Multiple types) | - | Institutions with multiple types |

### Refined Query by Institution Type

For targeted extraction, query each type separately:

**Museums Only:**
```sparql
?item wdt:P31/wdt:P279? wd:Q33506 .  # museum (including subtypes)
```

**Libraries Only:**
```sparql
?item wdt:P31/wdt:P279? wd:Q7075 .  # library (public, academic, national)
```

**Archives Only:**
```sparql
?item wdt:P31/wdt:P279? wd:Q166118 .  # archive
```

**Holy Sites with Collections:**
```sparql
VALUES ?holyType { wd:Q16970 wd:Q44539 wd:Q32815 wd:Q34627 wd:Q162378 }
?item wdt:P31 ?holyType .
# Add collection-related properties to filter heritage-managing sites
# (P1436 = collection or exhibition size)
OPTIONAL { ?item wdt:P1436 ?collectionSize . }
FILTER(BOUND(?collectionSize) || EXISTS { ?item wdt:P195 ?collection . })
```

---

## Country Codes with Wikidata QIDs

The tables below list 197 countries and territories with their ISO 3166-1 alpha-2 codes and Wikidata QIDs for use in SPARQL queries.
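As a minimal sketch of how these codes can feed the generic template, the snippet below substitutes the `{COUNTRY_QID}` and `{LANGUAGES}` placeholders. The `COUNTRY_QIDS` dictionary and the `fill_template` function are hypothetical names introduced here for illustration; the dictionary holds only a small subset of the rows in the tables.

```python
import re

# Hypothetical lookup built from a few rows of the country tables;
# extend it with the remaining rows as needed.
COUNTRY_QIDS = {"NL": "Q55", "BR": "Q155", "JP": "Q17", "TN": "Q948"}


def fill_template(template: str, country_code: str, languages: str) -> str:
    """Substitute the {COUNTRY_QID} and {LANGUAGES} placeholders in a query template."""
    qid = COUNTRY_QIDS[country_code.upper()]  # raises KeyError for unknown codes
    # Guard against malformed identifiers before they reach the query text.
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a valid QID: {qid!r}")
    return template.replace("{COUNTRY_QID}", qid).replace("{LANGUAGES}", languages)


if __name__ == "__main__":
    snippet = (
        "?item wdt:P17 wd:{COUNTRY_QID} . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "{LANGUAGES}" . }'
    )
    print(fill_template(snippet, "nl", "nl,en"))
```

Plain string replacement is used instead of `str.format` because SPARQL bodies are full of literal braces that `format` would try to interpret.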

### Africa (54 countries)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Algeria | DZ | Q262 | 🇩🇿 |
| Angola | AO | Q916 | 🇦🇴 |
| Benin | BJ | Q962 | 🇧🇯 |
| Botswana | BW | Q963 | 🇧🇼 |
| Burkina Faso | BF | Q965 | 🇧🇫 |
| Burundi | BI | Q967 | 🇧🇮 |
| Cameroon | CM | Q1009 | 🇨🇲 |
| Cape Verde | CV | Q1011 | 🇨🇻 |
| Central African Republic | CF | Q929 | 🇨🇫 |
| Chad | TD | Q657 | 🇹🇩 |
| Comoros | KM | Q970 | 🇰🇲 |
| Congo (Brazzaville) | CG | Q971 | 🇨🇬 |
| Congo (Kinshasa) | CD | Q974 | 🇨🇩 |
| Djibouti | DJ | Q977 | 🇩🇯 |
| Egypt | EG | Q79 | 🇪🇬 |
| Equatorial Guinea | GQ | Q983 | 🇬🇶 |
| Eritrea | ER | Q986 | 🇪🇷 |
| Eswatini (Swaziland) | SZ | Q1050 | 🇸🇿 |
| Ethiopia | ET | Q115 | 🇪🇹 |
| Gabon | GA | Q1000 | 🇬🇦 |
| Gambia | GM | Q1005 | 🇬🇲 |
| Ghana | GH | Q117 | 🇬🇭 |
| Guinea | GN | Q1006 | 🇬🇳 |
| Guinea-Bissau | GW | Q1007 | 🇬🇼 |
| Ivory Coast | CI | Q1008 | 🇨🇮 |
| Kenya | KE | Q114 | 🇰🇪 |
| Lesotho | LS | Q1013 | 🇱🇸 |
| Liberia | LR | Q1014 | 🇱🇷 |
| Libya | LY | Q1016 | 🇱🇾 |
| Madagascar | MG | Q1019 | 🇲🇬 |
| Malawi | MW | Q1020 | 🇲🇼 |
| Mali | ML | Q912 | 🇲🇱 |
| Mauritania | MR | Q1025 | 🇲🇷 |
| Mauritius | MU | Q1027 | 🇲🇺 |
| Morocco | MA | Q1028 | 🇲🇦 |
| Mozambique | MZ | Q1029 | 🇲🇿 |
| Namibia | NA | Q1030 | 🇳🇦 |
| Niger | NE | Q1032 | 🇳🇪 |
| Nigeria | NG | Q1033 | 🇳🇬 |
| Rwanda | RW | Q1037 | 🇷🇼 |
| São Tomé and Príncipe | ST | Q1039 | 🇸🇹 |
| Senegal | SN | Q1041 | 🇸🇳 |
| Seychelles | SC | Q1042 | 🇸🇨 |
| Sierra Leone | SL | Q1044 | 🇸🇱 |
| Somalia | SO | Q1045 | 🇸🇴 |
| South Africa | ZA | Q258 | 🇿🇦 |
| South Sudan | SS | Q958 | 🇸🇸 |
| Sudan | SD | Q1049 | 🇸🇩 |
| Tanzania | TZ | Q924 | 🇹🇿 |
| Togo | TG | Q945 | 🇹🇬 |
| Tunisia | TN | Q948 | 🇹🇳 |
| Uganda | UG | Q1036 | 🇺🇬 |
| Zambia | ZM | Q953 | 🇿🇲 |
| Zimbabwe | ZW | Q954 | 🇿🇼 |

### Americas (35 countries)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Antigua and Barbuda | AG | Q781 | 🇦🇬 |
| Argentina | AR | Q414 | 🇦🇷 |
| Bahamas | BS | Q778 | 🇧🇸 |
| Barbados | BB | Q244 | 🇧🇧 |
| Belize | BZ | Q242 | 🇧🇿 |
| Bolivia | BO | Q750 | 🇧🇴 |
| Brazil | BR | Q155 | 🇧🇷 |
| Canada | CA | Q16 | 🇨🇦 |
| Chile | CL | Q298 | 🇨🇱 |
| Colombia | CO | Q739 | 🇨🇴 |
| Costa Rica | CR | Q800 | 🇨🇷 |
| Cuba | CU | Q241 | 🇨🇺 |
| Dominica | DM | Q784 | 🇩🇲 |
| Dominican Republic | DO | Q786 | 🇩🇴 |
| Ecuador | EC | Q736 | 🇪🇨 |
| El Salvador | SV | Q792 | 🇸🇻 |
| Grenada | GD | Q769 | 🇬🇩 |
| Guatemala | GT | Q774 | 🇬🇹 |
| Guyana | GY | Q734 | 🇬🇾 |
| Haiti | HT | Q790 | 🇭🇹 |
| Honduras | HN | Q783 | 🇭🇳 |
| Jamaica | JM | Q766 | 🇯🇲 |
| Mexico | MX | Q96 | 🇲🇽 |
| Nicaragua | NI | Q811 | 🇳🇮 |
| Panama | PA | Q804 | 🇵🇦 |
| Paraguay | PY | Q733 | 🇵🇾 |
| Peru | PE | Q419 | 🇵🇪 |
| Saint Kitts and Nevis | KN | Q763 | 🇰🇳 |
| Saint Lucia | LC | Q760 | 🇱🇨 |
| Saint Vincent and the Grenadines | VC | Q757 | 🇻🇨 |
| Suriname | SR | Q730 | 🇸🇷 |
| Trinidad and Tobago | TT | Q754 | 🇹🇹 |
| United States | US | Q30 | 🇺🇸 |
| Uruguay | UY | Q77 | 🇺🇾 |
| Venezuela | VE | Q717 | 🇻🇪 |

### Asia (49 countries/territories)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Afghanistan | AF | Q889 | 🇦🇫 |
| Armenia | AM | Q399 | 🇦🇲 |
| Azerbaijan | AZ | Q227 | 🇦🇿 |
| Bahrain | BH | Q398 | 🇧🇭 |
| Bangladesh | BD | Q902 | 🇧🇩 |
| Bhutan | BT | Q917 | 🇧🇹 |
| Brunei | BN | Q921 | 🇧🇳 |
| Cambodia | KH | Q424 | 🇰🇭 |
| China | CN | Q148 | 🇨🇳 |
| Cyprus | CY | Q229 | 🇨🇾 |
| East Timor (Timor-Leste) | TL | Q574 | 🇹🇱 |
| Georgia | GE | Q230 | 🇬🇪 |
| India | IN | Q668 | 🇮🇳 |
| Indonesia | ID | Q252 | 🇮🇩 |
| Iran | IR | Q794 | 🇮🇷 |
| Iraq | IQ | Q796 | 🇮🇶 |
| Israel | IL | Q801 | 🇮🇱 |
| Japan | JP | Q17 | 🇯🇵 |
| Jordan | JO | Q810 | 🇯🇴 |
| Kazakhstan | KZ | Q232 | 🇰🇿 |
| Kuwait | KW | Q817 | 🇰🇼 |
| Kyrgyzstan | KG | Q813 | 🇰🇬 |
| Laos | LA | Q819 | 🇱🇦 |
| Lebanon | LB | Q822 | 🇱🇧 |
| Malaysia | MY | Q833 | 🇲🇾 |
| Maldives | MV | Q826 | 🇲🇻 |
| Mongolia | MN | Q711 | 🇲🇳 |
| Myanmar (Burma) | MM | Q836 | 🇲🇲 |
| Nepal | NP | Q837 | 🇳🇵 |
| North Korea | KP | Q423 | 🇰🇵 |
| Oman | OM | Q842 | 🇴🇲 |
| Pakistan | PK | Q843 | 🇵🇰 |
| Palestine | PS | Q219060 | 🇵🇸 |
| Philippines | PH | Q928 | 🇵🇭 |
| Qatar | QA | Q846 | 🇶🇦 |
| Saudi Arabia | SA | Q851 | 🇸🇦 |
| Singapore | SG | Q334 | 🇸🇬 |
| South Korea | KR | Q884 | 🇰🇷 |
| Sri Lanka | LK | Q854 | 🇱🇰 |
| Syria | SY | Q858 | 🇸🇾 |
| Taiwan | TW | Q865 | 🇹🇼 |
| Tajikistan | TJ | Q863 | 🇹🇯 |
| Thailand | TH | Q869 | 🇹🇭 |
| Turkey | TR | Q43 | 🇹🇷 |
| Turkmenistan | TM | Q874 | 🇹🇲 |
| United Arab Emirates | AE | Q878 | 🇦🇪 |
| Uzbekistan | UZ | Q265 | 🇺🇿 |
| Vietnam | VN | Q881 | 🇻🇳 |
| Yemen | YE | Q805 | 🇾🇪 |

### Europe (45 countries/territories)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Albania | AL | Q222 | 🇦🇱 |
| Andorra | AD | Q228 | 🇦🇩 |
| Austria | AT | Q40 | 🇦🇹 |
| Belarus | BY | Q184 | 🇧🇾 |
| Belgium | BE | Q31 | 🇧🇪 |
| Bosnia and Herzegovina | BA | Q225 | 🇧🇦 |
| Bulgaria | BG | Q219 | 🇧🇬 |
| Croatia | HR | Q224 | 🇭🇷 |
| Czech Republic | CZ | Q213 | 🇨🇿 |
| Denmark | DK | Q35 | 🇩🇰 |
| Estonia | EE | Q191 | 🇪🇪 |
| Finland | FI | Q33 | 🇫🇮 |
| France | FR | Q142 | 🇫🇷 |
| Germany | DE | Q183 | 🇩🇪 |
| Greece | GR | Q41 | 🇬🇷 |
| Hungary | HU | Q28 | 🇭🇺 |
| Iceland | IS | Q189 | 🇮🇸 |
| Ireland | IE | Q27 | 🇮🇪 |
| Italy | IT | Q38 | 🇮🇹 |
| Kosovo | XK | Q1246 | 🇽🇰 |
| Latvia | LV | Q211 | 🇱🇻 |
| Liechtenstein | LI | Q347 | 🇱🇮 |
| Lithuania | LT | Q37 | 🇱🇹 |
| Luxembourg | LU | Q32 | 🇱🇺 |
| Malta | MT | Q233 | 🇲🇹 |
| Moldova | MD | Q217 | 🇲🇩 |
| Monaco | MC | Q235 | 🇲🇨 |
| Montenegro | ME | Q236 | 🇲🇪 |
| Netherlands | NL | Q55 | 🇳🇱 |
| North Macedonia | MK | Q221 | 🇲🇰 |
| Norway | NO | Q20 | 🇳🇴 |
| Poland | PL | Q36 | 🇵🇱 |
| Portugal | PT | Q45 | 🇵🇹 |
| Romania | RO | Q218 | 🇷🇴 |
| Russia | RU | Q159 | 🇷🇺 |
| San Marino | SM | Q238 | 🇸🇲 |
| Serbia | RS | Q403 | 🇷🇸 |
| Slovakia | SK | Q214 | 🇸🇰 |
| Slovenia | SI | Q215 | 🇸🇮 |
| Spain | ES | Q29 | 🇪🇸 |
| Sweden | SE | Q34 | 🇸🇪 |
| Switzerland | CH | Q39 | 🇨🇭 |
| Ukraine | UA | Q212 | 🇺🇦 |
| United Kingdom | GB | Q145 | 🇬🇧 |
| Vatican City | VA | Q237 | 🇻🇦 |

### Oceania (14 countries/territories)

| Country | Code | Wikidata QID | Flag |
|---------|------|--------------|------|
| Australia | AU | Q408 | 🇦🇺 |
| Fiji | FJ | Q712 | 🇫🇯 |
| Kiribati | KI | Q710 | 🇰🇮 |
| Marshall Islands | MH | Q709 | 🇲🇭 |
| Micronesia | FM | Q702 | 🇫🇲 |
| Nauru | NR | Q697 | 🇳🇷 |
| New Zealand | NZ | Q664 | 🇳🇿 |
| Palau | PW | Q695 | 🇵🇼 |
| Papua New Guinea | PG | Q691 | 🇵🇬 |
| Samoa | WS | Q683 | 🇼🇸 |
| Solomon Islands | SB | Q685 | 🇸🇧 |
| Tonga | TO | Q678 | 🇹🇴 |
| Tuvalu | TV | Q672 | 🇹🇻 |
| Vanuatu | VU | Q686 | 🇻🇺 |

### Total: 197 countries and territories

---

## Usage Examples

### Example 1: Extract All Dutch Museums

```sparql
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?coords ?isil ?viaf ?website
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q33506 .  # museum
  ?item wdt:P17 wd:Q55 .               # Netherlands

  OPTIONAL { ?item wdt:P625 ?coords . }
  OPTIONAL { ?item wdt:P791 ?isil . }
  OPTIONAL { ?item wdt:P214 ?viaf . }
  OPTIONAL { ?item wdt:P856 ?website . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" . }
}
LIMIT 1000
```

### Example 2: Extract Brazilian Libraries with Collections

```sparql
SELECT DISTINCT ?item ?itemLabel ?city ?cityLabel ?collectionSize ?website
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q7075 .  # library
  ?item wdt:P17 wd:Q155 .             # Brazil

  OPTIONAL {
    ?item wdt:P131 ?city .
    ?city wdt:P31/wdt:P279? wd:Q515 .  # city
  }
  OPTIONAL { ?item wdt:P1436 ?collectionSize . }  # collection or exhibition size
  OPTIONAL { ?item wdt:P856 ?website . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "pt,en" . }
}
ORDER BY DESC(?collectionSize)
LIMIT 500
```

### Example 3: Extract Japanese Archives with ISIL Codes

```sparql
SELECT DISTINCT ?item ?itemLabel ?isil ?city ?cityLabel ?inception
WHERE {
  ?item wdt:P31/wdt:P279? wd:Q166118 .  # archive
  ?item wdt:P17 wd:Q17 .                # Japan
  ?item wdt:P791 ?isil .                # MUST have ISIL code

  OPTIONAL { ?item wdt:P131 ?city .
  }
  OPTIONAL { ?item wdt:P571 ?inception . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "ja,en" . }
}
ORDER BY ?isil
LIMIT 1000
```

### Example 4: Extract All Heritage Institutions in Tunisia

```sparql
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?instTypeLabel ?coords ?website
WHERE {
  VALUES ?instTypeClass {
    wd:Q33506    # museum
    wd:Q7075     # library
    wd:Q166118   # archive
    wd:Q2668072  # art gallery
    wd:Q5282129  # cultural center
  }

  ?item wdt:P31/wdt:P279? ?instTypeClass .
  ?item wdt:P17 wd:Q948 .  # Tunisia
  ?item wdt:P31 ?instType .

  OPTIONAL { ?item wdt:P625 ?coords . }
  OPTIONAL { ?item wdt:P856 ?website . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "ar,fr,en" . }
}
LIMIT 1000
```

---

## Query Optimization Strategies

### 1. **Avoid Transitive Subclass Queries**

❌ **Slow** (causes 504 timeout):
```sparql
?item wdt:P31/wdt:P279* wd:Q33506 .  # AVOID wdt:P279* (unbounded)
```

✅ **Fast** (limited depth):
```sparql
?item wdt:P31/wdt:P279? wd:Q33506 .  # Use wdt:P279? (0 or 1 hop)
```

### 2. **Query Institution Types Separately**

Instead of one large query with all types, run separate queries for each type:

```python
institution_types = ["Q33506", "Q7075", "Q166118", "Q2668072"]

for inst_type in institution_types:
    query = f"""
    SELECT DISTINCT ?item ?itemLabel ...
    WHERE {{
      ?item wdt:P31/wdt:P279? wd:{inst_type} .
      ?item wdt:P17 wd:{{COUNTRY_QID}} .
      ...
    }}
    LIMIT 1000
    """
    results = execute_sparql(query)
```

### 3. **Use LIMIT and OFFSET for Pagination**

For countries with many institutions (e.g., France, Germany, UK):

```sparql
SELECT DISTINCT ?item ?itemLabel ...
WHERE {
  ...
}
ORDER BY ?itemLabel
LIMIT 1000
OFFSET 0  # Change to 1000, 2000, 3000, etc. for pagination
```

### 4. **Specify Language Priorities**

```sparql
SERVICE wikibase:label {
  bd:serviceParam wikibase:language "nl,en" .
  # Dutch preferred, English fallback
}
```

Language codes by country:
- **Netherlands**: `"nl,en"`
- **Brazil**: `"pt,en"`
- **Japan**: `"ja,en"`
- **France**: `"fr,en"`
- **Germany**: `"de,en"`
- **China**: `"zh,en"`
- **Russia**: `"ru,en"`
- **Arabic countries**: `"ar,en"` or `"ar,fr,en"`

### 5. **Rate Limiting**

The Wikidata SPARQL endpoint enforces rate limits. Best practices:

- **Add delays between queries**: Wait 2-5 seconds between requests
- **Use custom User-Agent**: `"GLAM-Extractor/0.2.0 (Wikidata Global Extraction)"`
- **Cache results**: Store results in `data/wikidata/{country}/{timestamp}.json`
- **Batch by country**: Process countries in priority order (see `enrich_institutions_wikidata_sparql.py`)

### 6. **Error Handling**

```python
import time
from SPARQLWrapper import SPARQLWrapper, SPARQLExceptions

def execute_sparql_with_retry(sparql, query, max_retries=3):
    """Run a query, retrying on failure; returns None if every attempt hits an endpoint error."""
    for attempt in range(max_retries):
        try:
            sparql.setQuery(query)
            return sparql.query().convert()
        except SPARQLExceptions.EndPointInternalError:
            # HTTP 500 from the endpoint, which is how query timeouts commonly
            # surface; the same query is retried, so simplify it if this persists
            print(f"Endpoint error on attempt {attempt+1}, retrying...")
            time.sleep(5)
        except Exception as e:
            # Any other failure, including other HTTP errors such as 504
            print(f"Error: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
    return None
```

---

## Storage Format

Save results to `data/wikidata/{country}/{timestamp}.json`:

```json
{
  "country_code": "NL",
  "country_name": "Netherlands",
  "country_qid": "Q55",
  "extraction_date": "2025-11-11T10:30:00Z",
  "total_institutions": 1247,
  "institution_types": {
    "museum": 843,
    "library": 302,
    "archive": 78,
    "gallery": 24
  },
  "institutions": [
    {
      "wikidata_qid": "Q190804",
      "name": "Rijksmuseum",
      "description": "National museum of the Netherlands",
      "institution_type": "museum",
      "coordinates": {
        "latitude": 52.36,
        "longitude": 4.885
      },
      "identifiers": {
        "ISIL": "NL-AsdRM",
        "VIAF": "131375374",
        "website": "https://www.rijksmuseum.nl"
      },
      "location": {
        "city": "Amsterdam",
        "region": "North Holland",
        "country":
          "Netherlands"
      },
      "founding_date": "1800-01-01",
      "collection_size": 1000000
    }
  ]
}
```

---

## Next Steps

1. **Create Python script** (`scripts/extract_global_wikidata.py`) that:
   - Loads country list from this document
   - Executes generic SPARQL query for each country
   - Saves results to `data/wikidata/{country}/{timestamp}.json`
   - Tracks progress and errors

2. **Prioritize countries** by:
   - Data quality (ISIL registry availability)
   - Institution count (larger datasets first)
   - Strategic importance (Netherlands, France, Germany, UK, US, Brazil, Japan)

3. **Convert to LinkML** instances:
   - Parse JSON results
   - Map to `HeritageCustodian` schema
   - Generate GHCIDs
   - Add provenance metadata (data_source: WIKIDATA, data_tier: TIER_3_CROWD_SOURCED)

---

**Version**: 1.0
**Last Updated**: 2025-11-11
**Maintained By**: GLAM Data Extraction Project
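
The save step listed under Next Steps could be sketched as follows. This is only an illustration under stated assumptions: the function name `save_country_results` is hypothetical, and the compact UTC timestamp format in the file name is an assumption, since the document specifies only the `data/wikidata/{country}/{timestamp}.json` pattern.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_country_results(base_dir, country_code, payload, when=None):
    """Write one country's results to <base_dir>/<COUNTRY>/<timestamp>.json."""
    when = when or datetime.now(timezone.utc)
    # Assumed timestamp format: compact UTC, safe for file names on all platforms.
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(base_dir) / country_code.upper()
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{stamp}.json"
    # ensure_ascii=False keeps non-Latin institution names readable in the file.
    out_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path


if __name__ == "__main__":
    example = {"country_code": "NL", "country_qid": "Q55", "institutions": []}
    print(save_country_results("data/wikidata", "nl", example))
```

Writing one timestamped file per run keeps earlier extractions intact, which makes it possible to compare results between runs.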