# ISIL Enrichment Strategy for Latin American Institutions

**Date**: 2025-11-06
**Dataset**: 304 Latin American institutions (Brazil: 97, Chile: 90, Mexico: 117)
**Current ISIL Coverage**: 0% (0/304 institutions)

## Research Findings

### ISIL System Architecture

**Decentralized Model**: ISIL operates through national registration agencies, not a global database.

**Key Authorities**:
- **Denmark**: Slots- og Kulturstyrelsen (international registration authority for ISO 15511)
- **USA**: Library of Congress (US national agency, uses MARC org codes)
- **Europe**: data.europa.eu, data.overheid.nl (Dutch ISIL registry - we have this!)
- **National**: Each country's national library or archive maintains its own registry

**Important Discovery**: No single global ISIL database exists. Each country maintains its own registry, so codes must be sourced country by country.

### Country-Specific Findings

#### Brazil (BR-)
- **Expected Agency**: Biblioteca Nacional do Brasil or IBICT
- **Status**: No public ISIL registry found via web search
- **Challenge**: Search results dominated by ISIL (terrorism) references, not library codes
- **Recommendation**: Direct contact with Biblioteca Nacional do Brasil required

#### Mexico (MX-)
- **Expected Agency**: Biblioteca Nacional de México (under UNAM)
- **Alternate Contact**: Asociación Mexicana de Bibliotecarios (AMBAC)
- **Status**: No public ISIL registry found
- **Search Strategy**: Try .gob.mx or .unam.mx domain-specific searches for "código ISIL"
- **Recommendation**: Contact BNM/UNAM directly

#### Chile (CL-)
- **Expected Agency**: Biblioteca Nacional de Chile or DIBAM (now Servicio Nacional del Patrimonio Cultural)
- **Status**: No public ISIL registry found
- **Recommendation**: Contact Biblioteca Nacional de Chile

## Alternative Enrichment Strategy

### Phase 1: Leverage Existing Identifiers (IMMEDIATE - This Week)

Since ISIL codes are unavailable, we'll enrich using identifiers we CAN access:

#### 1.1 Wikidata Enrichment

**What We Have**: Some institutions already have Wikidata IDs in the dataset.

**What We Can Get**:
- ISIL codes (Wikidata Property P791) - if they exist
- Additional identifiers (GND, VIAF, Library of Congress)
- Geographic coordinates
- Parent organizations
- Website URLs

**Action**: Run Wikidata SPARQL queries

```python
# Script: scripts/enrich_from_wikidata.py

from SPARQLWrapper import SPARQLWrapper, JSON


def query_wikidata_isil(country_qid):
    """
    Query Wikidata for GLAM institutions in a country, with ISIL codes where present

    Args:
        country_qid: Wikidata QID for country (Q155=Brazil, Q96=Mexico, Q298=Chile)

    Returns:
        Parsed SPARQL JSON results (dict)
    """
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

    # ISIL, VIAF, website and coordinates are OPTIONAL, so institutions
    # without an ISIL code are still returned (with ?isil unbound).
    query = f"""
    SELECT DISTINCT ?org ?orgLabel ?isil ?viaf ?website ?coords WHERE {{
      ?org wdt:P17 wd:{country_qid} .               # Country
      OPTIONAL {{ ?org wdt:P791 ?isil . }}          # ISIL code
      OPTIONAL {{ ?org wdt:P214 ?viaf . }}          # VIAF ID
      OPTIONAL {{ ?org wdt:P856 ?website . }}       # Official website
      OPTIONAL {{ ?org wdt:P625 ?coords . }}        # Coordinates

      # Filter for GLAM institutions
      VALUES ?type {{
        wd:Q7075       # Library
        wd:Q166118     # Archive
        wd:Q33506      # Museum
        wd:Q1007870    # Art gallery
      }}
      ?org wdt:P31/wdt:P279* ?type .

      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,pt,es". }}
    }}
    LIMIT 1000
    """

    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    return results
```

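The value returned by `query_wikidata_isil` uses the standard SPARQL JSON results layout, in which each row maps a variable name to an object with a `value` field, and unbound `OPTIONAL` variables are simply missing. A minimal sketch of flattening those rows and keeping the ones that carry an ISIL code could look like the following. The helper names `flatten_bindings` and `rows_with_isil` are illustrative assumptions, not part of the existing scripts, and the sample data is a hand-written placeholder rather than real Wikidata output.

```python
def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of plain {variable: value} dicts."""
    rows = []
    for binding in results.get("results", {}).get("bindings", []):
        # Unbound OPTIONAL variables are absent from the binding entirely.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def rows_with_isil(rows):
    """Keep only rows where the optional ?isil variable was bound."""
    return [row for row in rows if row.get("isil")]


# Placeholder input in the SPARQL JSON shape (hypothetical values).
sample = {
    "results": {
        "bindings": [
            {
                "orgLabel": {"type": "literal", "value": "Example Library"},
                "isil": {"type": "literal", "value": "XX-0001"},
            },
            {"orgLabel": {"type": "literal", "value": "Example Museum"}},
        ]
    }
}

for row in rows_with_isil(flatten_bindings(sample)):
    print(row["orgLabel"], row["isil"])
```

Keeping the flattening separate from the ISIL filter means the same rows can also feed the VIAF and website enrichment steps.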
**Expected Output**:
- Brazil: ~20-30 institutions with Wikidata records
- Mexico: ~15-25 institutions
- Chile: ~10-20 institutions
- ISIL codes: <10 total (realistic estimate)

#### 1.2 VIAF Enrichment

**What VIAF Provides**:
- Authority records for organizations
- ISIL codes (when available)
- Cross-references to national authority files

**Action**: For each institution with a VIAF ID, fetch the full VIAF record

```python
# Script: scripts/enrich_from_viaf.py

import requests
import xml.etree.ElementTree as ET


def fetch_viaf_record(viaf_id):
    """
    Fetch VIAF record and extract ISIL code if present

    Args:
        viaf_id: VIAF identifier (e.g., "123556639")

    Returns:
        dict with extracted data including ISIL if available,
        or None if the record could not be fetched
    """
    url = f"https://viaf.org/viaf/{viaf_id}/viaf.xml"
    # A timeout keeps a batch run from hanging on one slow request
    response = requests.get(url, timeout=30)

    if response.status_code == 200:
        root = ET.fromstring(response.content)

        # Look for ISIL in various VIAF fields
        # VIAF may include ISIL in organizational identifiers

        data = {
            'viaf_id': viaf_id,
            'isil': None,  # Extract if found
            'alt_names': [],
            'related_ids': {}
        }

        # Parse XML for ISIL references (extraction from `root` not yet implemented)
        # Format varies - may be in <ns1:sources> or <ns1:mainHeadings>

        return data

    return None
```

**Current Dataset Status**: Check how many institutions already have VIAF IDs. Note that `grep -c` counts matching lines, so the result is a rough proxy rather than an exact institution count.

```bash
grep -c "VIAF" data/instances/latin_american_institutions.yaml
```

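For a per-institution count rather than a line count, the records can be tallied after loading the YAML. The sketch below assumes each record carries an `identifiers` list whose entries have an `identifier_scheme` key, as in the identifier example later in this document; the function name `count_identifier_schemes` and the scheme label `"VIAF"` are assumptions for illustration.

```python
from collections import Counter


def count_identifier_schemes(institutions):
    """Count how many institutions carry at least one identifier per scheme."""
    counts = Counter()
    for inst in institutions:
        # `identifiers` may be missing or null on sparse records.
        schemes = {
            ident.get("identifier_scheme")
            for ident in (inst.get("identifiers") or [])
            if ident.get("identifier_scheme")
        }
        # Using a set means each institution counts once per scheme,
        # even if it lists the same scheme twice.
        counts.update(schemes)
    return counts
```

With PyYAML installed, the input would come from `yaml.safe_load` on `data/instances/latin_american_institutions.yaml`, and `count_identifier_schemes(institutions)["VIAF"]` would give the number of institutions with a VIAF identifier under these assumptions.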
#### 1.3 OpenStreetMap (OSM) Enrichment

**What OSM Provides**:
- Building coordinates (better than city-level)
- Opening hours, contact info
- Sometimes website URLs

**Action**: For institutions with addresses, query Nominatim/Overpass API

```python
# Script: scripts/enrich_from_osm.py

import requests


def search_osm_by_name_and_location(institution_name, city, country):
    """
    Search OpenStreetMap for institution details

    Returns:
        Overpass JSON response (or None on failure) containing:
        - Precise coordinates
        - OSM tags (amenity=library, tourism=museum, etc.)
        - Website URL (if tagged)
    """
    overpass_url = "https://overpass-api.de/api/interpreter"

    # Overpass QL query
    # Caveats: institution_name is interpolated as a case-insensitive regex
    # without escaping, `city` is not used yet, and `country` must match the
    # OSM `name` tag of the area, which is often in the local language.
    query = f"""
    [out:json];
    area["name"="{country}"]->.country;
    (
      node(area.country)["name"~"{institution_name}",i];
      way(area.country)["name"~"{institution_name}",i];
      relation(area.country)["name"~"{institution_name}",i];
    );
    out center;
    """
    # `out center` adds a center point to ways and relations, which
    # otherwise carry no coordinates of their own in the output.

    response = requests.post(overpass_url, data={'data': query}, timeout=120)

    if response.status_code == 200:
        return response.json()

    return None
```

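Overpass returns `lat`/`lon` directly on nodes, while ways and relations only carry a `center` point when the query ends with an `out center;` statement. A small sketch of pulling usable coordinates out of a response might look like this; the function name `extract_coordinates` is a hypothetical helper, not part of the existing scripts.

```python
def extract_coordinates(overpass_json):
    """Return (osm_type, osm_id, lat, lon) tuples for elements that have a position."""
    found = []
    for element in (overpass_json or {}).get("elements", []):
        if "lat" in element and "lon" in element:
            # Nodes carry their position directly.
            lat, lon = element["lat"], element["lon"]
        elif "center" in element:
            # Ways and relations returned with `out center`.
            lat, lon = element["center"]["lat"], element["center"]["lon"]
        else:
            # No geometry in the response for this element; skip it.
            continue
        found.append((element["type"], element["id"], lat, lon))
    return found
```

Passing `None` (a failed request) yields an empty list, so the caller can treat "no response" and "no match" the same way when deciding whether to keep the existing city-level coordinates.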
### Phase 2: Document the Gap (IMMEDIATE)

**Action**: Add provenance notes to all 304 institutions documenting ISIL absence

```yaml
# Example update to each institution record

provenance:
  data_source: CONVERSATION_NLP
  data_tier: TIER_4_INFERRED
  extraction_date: "2025-11-05T14:30:00Z"
  extraction_method: "AI agent comprehensive extraction"
  confidence_score: 0.85
  conversation_id: "..."
  notes: >-
    ISIL code not available. Brazil lacks publicly accessible ISIL registry
    as of 2025-11-06. National registration agency (Biblioteca Nacional do
    Brasil or IBICT) has not published ISIL directory. Alternative identifiers
    used: Wikidata QID, website URL. See docs/isil_enrichment_strategy.md
    for details.
```

**Script**: `scripts/add_isil_gap_notes.py`

```python
import yaml


def add_isil_gap_documentation(input_file, output_file):
    """
    Add standardized notes about ISIL unavailability to all institutions
    """
    with open(input_file, 'r', encoding='utf-8') as f:
        institutions = yaml.safe_load(f)

    gap_notes = {
        'BR': "ISIL code not available. Brazil lacks publicly accessible ISIL registry as of 2025-11-06.",
        'MX': "ISIL code not available. Mexico lacks publicly accessible ISIL registry as of 2025-11-06.",
        'CL': "ISIL code not available. Chile lacks publicly accessible ISIL registry as of 2025-11-06."
    }

    updated = 0
    for inst in institutions:
        # Guard against a missing or empty `locations` list
        locations = inst.get('locations') or [{}]
        country = locations[0].get('country', 'Unknown')

        if country in gap_notes and inst.get('provenance'):
            # `notes` may be absent or explicitly null
            existing_notes = inst['provenance'].get('notes') or ''

            # Append ISIL gap note if not already present
            if 'ISIL code not available' not in existing_notes:
                inst['provenance']['notes'] = (
                    existing_notes + '\n' + gap_notes[country]
                ).strip()
                updated += 1

    with open(output_file, 'w', encoding='utf-8') as f:
        yaml.dump(institutions, f, allow_unicode=True, sort_keys=False)

    print(f"Updated {updated} of {len(institutions)} institutions with ISIL gap documentation")
```

### Phase 3: National Agency Outreach (2-4 Weeks)

**Action**: Contact national libraries directly

#### Outreach Email Template

```
Subject: ISIL Registry Access Request - Global GLAM Dataset Research Project

Dear [National Library/Archive],

I am writing from the Global GLAM Dataset project, an open research initiative
documenting heritage institutions worldwide using LinkML schema and Linked Open Data.

We have extracted and geocoded data for [NUMBER] [COUNTRY] institutions from
archival research, including:
- [Institution examples]

To enhance data quality, we seek access to official ISIL (ISO 15511) codes.

Questions:
1. Is [INSTITUTION NAME] the designated ISIL national registration agency for [COUNTRY]?
2. Does a public ISIL registry or directory exist for [COUNTRY] institutions?
3. If available, can we access it for academic research purposes?
4. What is the process for obtaining ISIL codes for institutions lacking them?

Our dataset is published under CC-BY 4.0 and will credit all data sources.

Project repository: https://github.com/[your-repo]
Schema documentation: [link to LinkML schema]

Thank you for your assistance in advancing global cultural heritage data.

Best regards,
[Your Name]
Global GLAM Dataset Project
```

**Targets**:

**Brazil**:
- Biblioteca Nacional do Brasil (https://www.bn.gov.br/)
  - Email: Contact form on website
- IBICT (https://www.ibict.br/)
  - Email: ibict@ibict.br

**Mexico**:
- Biblioteca Nacional de México (https://www.bnm.unam.mx/)
  - Email: Through UNAM contact
- AMBAC (Asociación Mexicana de Bibliotecarios)

**Chile**:
- Biblioteca Nacional de Chile (https://www.bibliotecanacional.gob.cl/)
  - Email: Contact form on website
- Servicio Nacional del Patrimonio Cultural (https://www.patrimoniocultural.gob.cl/)

**Timeline**: Send emails by 2025-11-13, follow up after 2 weeks

### Phase 4: Generate Unofficial ISIL-Like Codes (Optional)

**Only if**: National agencies confirm no registry exists OR no response after 4 weeks

**Purpose**: Internal linking and future-proofing

**Format**: `{COUNTRY}-Unofficial-{CityCode}{InstitutionType}{Sequence}`

**Example**:
```yaml
# Museu de Arte do Rio, Brazil
identifiers:
  - identifier_scheme: "ISIL-Unofficial"
    identifier_value: "BR-Unofficial-RioM001"
    identifier_url: null
    notes: >-
      Unofficial ISIL-like code generated for internal dataset use.
      Format: BR (country) + Rio (city code) + M (museum) + 001 (sequence).
      Not an official ISO 15511 ISIL code. Created 2025-11-06 due to
      absence of public Brazilian ISIL registry.
```

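The format string above can be turned into a small generator. This is only a sketch under stated assumptions: the function name `make_unofficial_code` is hypothetical, and the three-digit zero-padded sequence is inferred from the `001` in the example rather than specified anywhere in the strategy.

```python
def make_unofficial_code(country, city_code, institution_type, sequence):
    """Build a code in the {COUNTRY}-Unofficial-{CityCode}{InstitutionType}{Sequence} format."""
    # Assumption: sequences are three digits, as in the "001" example.
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    return (
        f"{country.upper()}-Unofficial-"
        f"{city_code}{institution_type.upper()}{sequence:03d}"
    )


print(make_unofficial_code("BR", "Rio", "M", 1))  # BR-Unofficial-RioM001
```

Keeping the literal `Unofficial` segment inside the value itself means the code cannot be mistaken for a registered ISIL even if the `identifier_scheme` field is dropped in an export.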
**Provenance Tracking**:
```yaml
provenance:
  data_tier: TIER_4_INFERRED
  notes: >-
    Unofficial ISIL-like identifier created for internal use only.
    Official ISIL code unavailable as Brazil lacks public registry.
    Code is modeled on the ISIL country-prefix pattern but is NOT an
    officially registered ISO 15511 code.
```

**Warning**: Clearly document that these are NOT official ISIL codes.

### Phase 5: Publish Enriched Dataset (4-6 Weeks)

**Deliverables**:
1. Updated `latin_american_institutions.yaml` with:
   - Wikidata-sourced ISIL codes (if any found)
   - VIAF-sourced ISIL codes (if any found)
   - OSM-enriched coordinates and URLs
   - Documentation of ISIL gap in provenance notes

2. Report: `docs/latin_american_isil_findings.md`
   - Summary of enrichment results
   - National agency responses (if any)
   - Recommendations for future work

3. Statistics:
   - ISIL codes found: [number]/304
   - Wikidata matches: [number]/304
   - VIAF enrichments: [number]/304
   - OSM coordinate improvements: [number]/304

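The statistics lines above share one shape, a count over the 304-institution total. A minimal sketch of a formatter for them follows; the function name `coverage_line` and the added percentage are assumptions for illustration, and the only figure used is the current 0/304 ISIL coverage stated at the top of this document.

```python
def coverage_line(label, found, total=304):
    """Format one statistics line as 'label: found/total (pct%)'."""
    # Avoid division by zero if an empty dataset is ever passed in.
    pct = 100 * found / total if total else 0.0
    return f"{label}: {found}/{total} ({pct:.1f}%)"


print(coverage_line("ISIL codes found", 0))  # ISIL codes found: 0/304 (0.0%)
```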
## Implementation Checklist

**Week 1 (Nov 6-12)**:
- [x] Research ISIL availability (COMPLETED)
- [ ] Run Wikidata SPARQL queries for BR, MX, CL
- [ ] Extract ISIL codes from Wikidata results
- [ ] Count existing VIAF IDs in dataset
- [ ] Fetch VIAF records for institutions with VIAF IDs

**Week 2 (Nov 13-19)**:
- [ ] Send outreach emails to national libraries
- [ ] Run OSM enrichment for institutions with addresses
- [ ] Add ISIL gap documentation to all 304 provenance records
- [ ] Update PROGRESS.md with findings

**Week 3-4 (Nov 20-Dec 3)**:
- [ ] Process responses from national libraries (if any)
- [ ] Compile enrichment results
- [ ] Generate updated exports (JSON-LD, CSV, GeoJSON)
- [ ] Write final report

**Optional (if no registries found)**:
- [ ] Decide: Generate unofficial ISIL-like codes? (Discuss with stakeholders)
- [ ] If yes: Implement unofficial code generator
- [ ] Document clearly in schema and exports

## Expected Outcomes

### Realistic Scenario
- **ISIL codes found**: 5-15 (via Wikidata/VIAF, ~2-5% coverage)
- **Wikidata matches**: 50-80 institutions (~16-26%)
- **VIAF enrichments**: 30-50 institutions (~10-16%)
- **OSM improvements**: 100-150 institutions (~33-49%)
- **National agency responses**: 0-2 (low response rate expected)

### Success Metrics
- ✅ Document ISIL gap comprehensively
- ✅ Enrich with alternative authoritative identifiers
- ✅ Contact national agencies (attempt made)
- ✅ Provide clear provenance for all data
- ❓ Obtain official ISIL codes (low probability but worth trying)

## Lessons for Future Datasets

**Key Insights**:
1. ISIL is decentralized - no global database exists
2. Public registries vary by country (Netherlands: excellent, Brazil/Mexico/Chile: none found)
3. Alternative identifiers (Wikidata, VIAF) are more globally accessible
4. Direct outreach to national agencies is required for ISIL access

**Recommendations**:
- Prioritize Wikidata/VIAF as primary identifiers for global datasets
- Use ISIL when available (e.g., Netherlands, USA, some EU countries)
- Document identifier gaps transparently
- Consider unofficial codes only as last resort with clear labeling

---

**Status**: Strategy defined, implementation starting
**Next Update**: After Week 1 enrichment scripts complete
**Owner**: Global GLAM Dataset Project Team