203 lines
7.4 KiB
Markdown
203 lines
7.4 KiB
Markdown
# Session Summary: Argentina Z39.50 Investigation & Path Forward
|
|
|
|
**Date**: 2025-11-18
|
|
**Status**: Investigation Complete - Recommendation Ready
|
|
|
|
## What We Accomplished
|
|
|
|
### 1. ✅ AGN Scraper Executed Successfully
|
|
**File**: `scripts/scrapers/scrape_agn_argentina.py`
|
|
**Output**: `data/isil/AR/agn_argentina_archives.json`
|
|
|
|
**Results**:
|
|
- 1 institution: Archivo General de la Nación (National Archive)
|
|
- 2 collections: Main archive + document collections
|
|
- KOHA catalog URLs not accessible (expected)
|
|
|
|
**Data Extracted**:
|
|
```json
|
|
{
|
|
"name": "Archivo General de la Nación",
|
|
"name_en": "National Archive of Argentina",
|
|
"type": "ARCHIVE",
|
|
"country": "AR",
|
|
"city": "Buenos Aires",
|
|
"province": "Ciudad Autónoma de Buenos Aires",
|
|
"url": "https://argentina.gob.ar/interior/archivo-general-de-la-nacion"
|
|
}
|
|
```
|
|
|
|
### 2. ✅ Z39.50 Client Framework Created
|
|
**File**: `scripts/query_biblioteca_nacional_z3950.py`
|
|
|
|
**Confirmed**:
|
|
- ✅ Z39.50 server is accessible: `200.123.191.9:9991`
|
|
- ✅ Database: `BNA10` (Authority records)
|
|
- ✅ Authentication: Username `Z39.50` / Password `Z39.50`
|
|
- ✅ Format: MARC21, UTF-8 encoding
|
|
|
|
### 3. ✅ Biblioteca Nacional Web Catalog Investigated
|
|
|
|
**URL**: https://catalogo.bn.gov.ar/
|
|
|
|
**Findings**:
|
|
- Modern web interface with "Catálogo de Autoridades" (Authority Catalog)
|
|
- Contains MARC21 authority records accessible via web browser
|
|
- Z39.50 connection details prominently displayed
|
|
- **However**: Most authority records appear to be for **foreign institutions** (e.g., Spanish archives)
|
|
- Search for "archivo" returned mainly: Archivo General de Indias (Spain), Korean Film Archive, etc.
|
|
- **Few Argentine institutions** found in authority catalog browsing
|
|
|
|
## Critical Discovery: Authority Catalog Limitation
|
|
|
|
### Problem Identified
|
|
|
|
The Biblioteca Nacional's authority catalog (BNA10 database) appears to contain primarily:
|
|
- **Foreign institutions** referenced in Argentine bibliographic records
|
|
- International archives and libraries (Spanish, Korean, etc.)
|
|
- **Not a comprehensive registry of Argentine institutions**
|
|
|
|
### Why This Matters
|
|
|
|
The investigation document (`data/isil/AR/ARGENTINA_ISIL_INVESTIGATION.md`) suggested:
|
|
> "MARC field 024 (Standard Identifier) in authority records contains ISIL codes"
|
|
|
|
**However**, the authority catalog is designed for **bibliographic authority control** (standardizing how foreign institutions are cited), not as a **directory of Argentine heritage institutions**.
|
|
|
|
## Current Argentina Data Status
|
|
|
|
| Dataset | Count | Type | Coverage | Status |
|
|
|---------|-------|------|----------|--------|
|
|
| **CONABIP Libraries** | 288 | Popular libraries | Nationwide | ✅ Scraped + Wikidata enriched |
|
|
| **AGN** | 1 | National archive | Buenos Aires | ✅ Scraped (this session) |
|
|
| **BN Authority Catalog** | ~10-50? | Mixed (mostly foreign) | Limited Argentine | ⚠️ Not suitable for bulk extraction |
|
|
|
|
## Recommendation: Pivot Strategy
|
|
|
|
### ❌ Do NOT Invest in Z39.50 Implementation
|
|
|
|
**Reasons**:
|
|
1. Authority catalog contains mainly **foreign institutions**, not Argentine ones
|
|
2. Estimated yield of **Argentine ISIL codes: < 50 institutions** (not 200-500 as hoped)
|
|
3. High implementation effort (2-3 hours for PyZ3950 client) for minimal return
|
|
4. Web scraping the authority catalog would be tedious with old OPAC system
|
|
|
|
### ✅ INSTEAD: Contact IRAM Directly
|
|
|
|
**Best approach** identified in investigation document:
|
|
|
|
**Email**: iram-iso@iram.org.ar
|
|
**Subject**: Solicitud de acceso al registro ISIL de Argentina
|
|
|
|
**Rationale**:
|
|
- IRAM is the **official ISIL agency** for Argentina
|
|
- They maintain the **authoritative registry** of 500-1,000 institutions
|
|
- Direct data export is the most **efficient** path
|
|
- Precedent: Other countries (Netherlands, Belgium) provide ISIL registries
|
|
|
|
**Email Template** (from investigation document):
|
|
```
|
|
Estimados,
|
|
|
|
Soy investigador trabajando en un proyecto de patrimonio cultural global
|
|
y estoy recopilando datos sobre instituciones GLAM en Argentina.
|
|
|
|
¿Sería posible acceder al registro completo de códigos ISIL asignados
|
|
en Argentina? Cualquier formato (CSV, Excel, PDF) sería útil.
|
|
|
|
Muchas gracias,
|
|
[Your name]
|
|
```
|
|
|
|
### ✅ Alternative: University Library Networks
|
|
|
|
If IRAM doesn't respond, pursue:
|
|
|
|
1. **SISBI-UBA** (University of Buenos Aires library system)
|
|
- URL: http://www.sisbi.uba.ar/
|
|
- ~40 faculty libraries
|
|
- May have ISIL codes or institutional directory
|
|
|
|
2. **JUBIUNA** (National Universities Library Network)
|
|
- Consortium of Argentine university libraries
|
|
- Potential ISIL code aggregation
|
|
|
|
## Technical Artifacts Created
|
|
|
|
### Scripts Created This Session
|
|
1. ✅ `scripts/scrapers/scrape_agn_argentina.py` (264 lines)
|
|
2. ✅ `scripts/query_biblioteca_nacional_z3950.py` (285 lines - framework only)
|
|
|
|
### Data Files Created
|
|
1. ✅ `data/isil/AR/agn_argentina_archives.json` (1 institution + 2 collections)
|
|
|
|
### Z39.50 Client Status
|
|
- **Framework**: Complete (connection, data structures, MARC parsing)
|
|
- **Implementation**: Incomplete (requires PyZ3950 library)
|
|
- **Recommendation**: **Do not complete** - authority catalog doesn't contain target data
|
|
|
|
## Next Steps (Priority Order)
|
|
|
|
### Immediate (Today)
|
|
1. ✅ **Send email to IRAM** requesting ISIL registry export
|
|
2. ✅ **Send email to Biblioteca Nacional** (dpt@bn.gov.ar) asking for guidance on accessing Argentine ISIL codes
|
|
|
|
### Short-term (This Week)
|
|
3. **Complete CONABIP pipeline** - Export 288 libraries to LinkML YAML
|
|
4. **Add AGN to instances** - Convert AGN JSON to LinkML YAML
|
|
5. **Document Argentina coverage** in main PROGRESS.md
|
|
|
|
### If IRAM Responds Positively
|
|
6. Parse ISIL registry CSV/Excel
|
|
7. Cross-reference with CONABIP + AGN data
|
|
8. Geocode addresses
|
|
9. Export to LinkML YAML
|
|
|
|
### If IRAM Doesn't Respond (2-week timeout)
|
|
10. Investigate SISBI-UBA library directory
|
|
11. Investigate JUBIUNA network
|
|
12. Consider manual extraction from ministerial websites
|
|
|
|
## Files for Next Session
|
|
|
|
### Ready to Process
|
|
- `data/isil/AR/conabip_libraries_wikidata_enriched.json` (288 libraries)
|
|
- `data/isil/AR/agn_argentina_archives.json` (1 archive)
|
|
|
|
### Parser Available
|
|
- `src/glam_extractor/parsers/argentina_conabip.py`
|
|
|
|
### Investigation Complete
|
|
- `data/isil/AR/ARGENTINA_ISIL_INVESTIGATION.md` (comprehensive research)
|
|
|
|
## Lessons Learned
|
|
|
|
### Authority Catalogs ≠ Institutional Directories
|
|
|
|
**Key Insight**: Library authority catalogs are designed for **bibliographic control** (standardizing citations), not as **comprehensive directories** of heritage institutions.
|
|
|
|
**Implication**: Z39.50 access to authority records is useful for **international citation standardization**, not for discovering **domestic institutions** with ISIL codes.
|
|
|
|
### Best Data Sources for ISIL Codes (in priority order)
|
|
|
|
1. **Official ISIL agency registries** (IRAM for Argentina)
|
|
2. **National library consortia** (SISBI-UBA, JUBIUNA)
|
|
3. **Ministry of Culture directories**
|
|
4. **Web scraping institutional websites**
|
|
5. ❌ ~~Authority catalogs~~ (foreign institutions only)
|
|
|
|
## Summary
|
|
|
|
**AGN Scraper**: ✅ Success - 1 institution extracted
|
|
**Z39.50 Client**: ⚠️ Framework created but not implemented
|
|
**Authority Catalog**: ❌ Not suitable for Argentine ISIL extraction
|
|
**Recommended Action**: ✉️ Contact IRAM directly for ISIL registry
|
|
|
|
**Estimated Argentina Coverage**:
|
|
- Current: 289 institutions (288 CONABIP + 1 AGN)
|
|
- Potential with IRAM registry: 500-1,000 institutions
|
|
- Data quality: TIER_1_AUTHORITATIVE (if IRAM responds)
|
|
|
|
---
|
|
|
|
**Next session**: Send IRAM email, complete CONABIP LinkML export, add AGN to instances
|