glam/SESSION_SUMMARY_ARGENTINA_Z3950_INVESTIGATION.md
2025-11-19 23:25:22 +01:00

# Session Summary: Argentina Z39.50 Investigation & Path Forward
**Date**: 2025-11-18
**Status**: Investigation Complete - Recommendation Ready
## What We Accomplished
### 1. ✅ AGN Scraper Executed Successfully
**File**: `scripts/scrapers/scrape_agn_argentina.py`
**Output**: `data/isil/AR/agn_argentina_archives.json`
**Results**:
- 1 institution: Archivo General de la Nación (National Archive)
- 2 collections: Main archive + document collections
- Koha catalog URLs were not accessible (expected)
**Data Extracted**:
```json
{
"name": "Archivo General de la Nación",
"name_en": "National Archive of Argentina",
"type": "ARCHIVE",
"country": "AR",
"city": "Buenos Aires",
"province": "Ciudad Autónoma de Buenos Aires",
"url": "https://argentina.gob.ar/interior/archivo-general-de-la-nacion"
}
```
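A record of this shape can be sanity-checked before it is converted downstream. The following is a minimal sketch, assuming every field shown above is required; `REQUIRED_FIELDS` and `validate_institution` are hypothetical names and are not part of the scraper.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumption: these keys are mandatory. The real schema may differ.
REQUIRED_FIELDS = ("name", "type", "country", "city", "url")


def validate_institution(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every required key must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing field: {field}")
    # Country is expected to be a two-letter uppercase code such as "AR".
    country = record.get("country", "")
    if country and not (len(country) == 2 and country.isalpha() and country.isupper()):
        problems.append(f"unexpected country code: {country!r}")
    # Only http(s) URLs are accepted.
    url = record.get("url", "")
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append(f"unexpected URL scheme: {url!r}")
    return problems


agn_record = {
    "name": "Archivo General de la Nación",
    "name_en": "National Archive of Argentina",
    "type": "ARCHIVE",
    "country": "AR",
    "city": "Buenos Aires",
    "province": "Ciudad Autónoma de Buenos Aires",
    "url": "https://argentina.gob.ar/interior/archivo-general-de-la-nacion",
}
print(validate_institution(agn_record))  # prints []
```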
### 2. ✅ Z39.50 Client Framework Created
**File**: `scripts/query_biblioteca_nacional_z3950.py`
**Confirmed**:
- ✅ Z39.50 server is accessible: `200.123.191.9:9991`
- ✅ Database: `BNA10` (Authority records)
- ✅ Authentication: Username `Z39.50` / Password `Z39.50`
- ✅ Format: MARC21, UTF-8 encoding
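The accessibility check can be reproduced without any Z39.50 library. This is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and a successful TCP connection only shows that the port accepts connections, not that the Z39.50 handshake or the credentials work.

```python
import socket

# Connection details as listed above.
BNA_HOST = "200.123.191.9"
BNA_PORT = 9991


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `is_port_open(BNA_HOST, BNA_PORT)` performs the check against the Biblioteca Nacional server; the result depends on network conditions at the time of the call.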
### 3. ✅ Biblioteca Nacional Web Catalog Investigated
**URL**: https://catalogo.bn.gov.ar/
**Findings**:
- Modern web interface with "Catálogo de Autoridades" (Authority Catalog)
- Contains MARC21 authority records accessible via web browser
- Z39.50 connection details prominently displayed
- **However**: Most authority records appear to be for **foreign institutions** (e.g., Spanish archives)
- Search for "archivo" returned mainly: Archivo General de Indias (Spain), Korean Film Archive, etc.
- **Few Argentine institutions** found in authority catalog browsing
## Critical Discovery: Authority Catalog Limitation
### Problem Identified
The Biblioteca Nacional's authority catalog (BNA10 database) appears to contain primarily:
- **Foreign institutions** referenced in Argentine bibliographic records
- International archives and libraries (Spanish, Korean, etc.)
- **Not a comprehensive registry of Argentine institutions**
### Why This Matters
The investigation document (`data/isil/AR/ARGENTINA_ISIL_INVESTIGATION.md`) suggested:
> "MARC field 024 (Standard Identifier) in authority records contains ISIL codes"
**However**, the authority catalog is designed for **bibliographic authority control** (standardizing how foreign institutions are cited), not as a **directory of Argentine heritage institutions**.
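To make the quoted claim concrete, the sketch below shows how ISIL codes could be pulled from field 024 and why foreign records do not help. It assumes records have already been parsed into plain dictionaries (a hypothetical representation, not the output of any particular MARC library), that ISIL identifiers are marked with `isil` in subfield `$2`, and it uses a loose pattern that only approximates the ISIL syntax. The codes in the sample data are made up.

```python
import re

# Loose approximation of an Argentine ISIL: "AR-" plus up to 11 characters.
AR_ISIL = re.compile(r"^AR-[A-Za-z0-9:/-]{1,11}$")


def isil_codes(fields):
    """Collect $a values from 024 fields whose $2 source is 'isil'."""
    codes = []
    for field in fields:
        if field["tag"] != "024":
            continue
        subfields = field["subfields"]
        sources = [value.strip().lower() for code, value in subfields if code == "2"]
        if "isil" not in sources:
            continue
        codes.extend(value.strip() for code, value in subfields if code == "a")
    return codes


def argentine_isil_codes(fields):
    """Keep only codes that look like Argentine ISILs."""
    return [code for code in isil_codes(fields) if AR_ISIL.match(code)]


# Illustrative record for a foreign institution; the identifier is invented.
foreign_record = [
    {"tag": "110", "subfields": [("a", "Archivo General de Indias")]},
    {"tag": "024", "subfields": [("a", "ES-EXAMPLE1"), ("2", "isil")]},
]
print(isil_codes(foreign_record))            # prints ['ES-EXAMPLE1']
print(argentine_isil_codes(foreign_record))  # prints []
```

A catalog dominated by records like this one yields identifiers, but none with the `AR-` prefix.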
## Current Argentina Data Status
| Dataset | Count | Type | Coverage | Status |
|---------|-------|------|----------|--------|
| **CONABIP Libraries** | 288 | Popular libraries | Nationwide | ✅ Scraped + Wikidata enriched |
| **AGN** | 1 | National archive | Buenos Aires | ✅ Scraped (this session) |
| **BN Authority Catalog** | ~10-50? | Mixed (mostly foreign) | Limited Argentine | ⚠️ Not suitable for bulk extraction |
## Recommendation: Pivot Strategy
### ❌ Do NOT Invest in Z39.50 Implementation
**Reasons**:
1. Authority catalog contains mainly **foreign institutions**, not Argentine ones
2. Estimated yield of **Argentine ISIL codes: < 50 institutions** (not 200-500 as hoped)
3. High implementation effort (2-3 hours for PyZ3950 client) for minimal return
4. Web scraping the authority catalog would be tedious with the old OPAC system
### ✅ INSTEAD: Contact IRAM Directly
**Best approach** identified in investigation document:
**Email**: iram-iso@iram.org.ar
**Subject**: Solicitud de acceso al registro ISIL de Argentina
**Rationale**:
- IRAM is the **official ISIL agency** for Argentina
- They maintain the **authoritative registry** of 500-1,000 institutions
- Direct data export is the most **efficient** path
- Precedent: Other countries (Netherlands, Belgium) provide ISIL registries
**Email Template** (from investigation document):
```
Estimados,
Soy investigador trabajando en un proyecto de patrimonio cultural global
y estoy recopilando datos sobre instituciones GLAM en Argentina.
¿Sería posible acceder al registro completo de códigos ISIL asignados
en Argentina? Cualquier formato (CSV, Excel, PDF) sería útil.
Muchas gracias,
[Your name]
```
### ✅ Alternative: University Library Networks
If IRAM doesn't respond, pursue:
1. **SISBI-UBA** (University of Buenos Aires library system)
   - URL: http://www.sisbi.uba.ar/
   - ~40 faculty libraries
   - May have ISIL codes or institutional directory
2. **JUBIUNA** (National Universities Library Network)
   - Consortium of Argentine university libraries
   - Potential ISIL code aggregation
## Technical Artifacts Created
### Scripts Created This Session
1. `scripts/scrapers/scrape_agn_argentina.py` (264 lines)
2. `scripts/query_biblioteca_nacional_z3950.py` (285 lines - framework only)
### Data Files Created
1. `data/isil/AR/agn_argentina_archives.json` (1 institution + 2 collections)
### Z39.50 Client Status
- **Framework**: Complete (connection, data structures, MARC parsing)
- **Implementation**: Incomplete (requires PyZ3950 library)
- **Recommendation**: **Do not complete** - authority catalog doesn't contain target data
## Next Steps (Priority Order)
### Immediate (Today)
1. **Send email to IRAM** requesting ISIL registry export
2. **Send email to Biblioteca Nacional** (dpt@bn.gov.ar) asking for guidance on accessing Argentine ISIL codes
### Short-term (This Week)
3. **Complete CONABIP pipeline** - Export 288 libraries to LinkML YAML
4. **Add AGN to instances** - Convert AGN JSON to LinkML YAML
5. **Document Argentina coverage** in main PROGRESS.md
### If IRAM Responds Positively
6. Parse ISIL registry CSV/Excel
7. Cross-reference with CONABIP + AGN data
8. Geocode addresses
9. Export to LinkML YAML
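The parsing and cross-referencing steps above could look roughly like the following. This is a sketch under stated assumptions: the registry format is unknown until IRAM responds, so the `isil` and `name` column names are hypothetical, as are the helper names, and matching is done on accent-insensitive, case-insensitive names only. The ISIL code in the sample is made up.

```python
import csv
import io
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def load_registry(csv_text: str) -> list:
    """Parse registry CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def cross_reference(registry_rows, known_institutions):
    """Attach ISIL codes to known institutions; return (matched, unmatched rows)."""
    by_name = {normalize_name(inst["name"]): inst for inst in known_institutions}
    matched, unmatched = [], []
    for row in registry_rows:
        inst = by_name.get(normalize_name(row["name"]))
        if inst is None:
            unmatched.append(row)
        else:
            matched.append({**inst, "isil": row["isil"]})
    return matched, unmatched


# Illustrative input only; the codes and the second institution are invented.
sample_csv = "isil,name\nAR-EXAMPLE1,ARCHIVO GENERAL DE LA NACION\nAR-EXAMPLE2,Biblioteca Ejemplo\n"
known = [{"name": "Archivo General de la Nación", "type": "ARCHIVE"}]
matched, unmatched = cross_reference(load_registry(sample_csv), known)
```

Rows that end up in `unmatched` would be candidates for new institution records, while exact-name matching is likely too strict for real data and may need fuzzy matching or city-level disambiguation.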
### If IRAM Doesn't Respond (2-week timeout)
10. Investigate SISBI-UBA library directory
11. Investigate JUBIUNA network
12. Consider manual extraction from ministerial websites
## Files for Next Session
### Ready to Process
- `data/isil/AR/conabip_libraries_wikidata_enriched.json` (288 libraries)
- `data/isil/AR/agn_argentina_archives.json` (1 archive)
### Parser Available
- `src/glam_extractor/parsers/argentina_conabip.py`
### Investigation Complete
- `data/isil/AR/ARGENTINA_ISIL_INVESTIGATION.md` (comprehensive research)
## Lessons Learned
### Authority Catalogs ≠ Institutional Directories
**Key Insight**: Library authority catalogs are designed for **bibliographic control** (standardizing citations), not as **comprehensive directories** of heritage institutions.
**Implication**: Z39.50 access to authority records is useful for **international citation standardization**, not for discovering **domestic institutions** with ISIL codes.
### Best Data Sources for ISIL Codes (in priority order)
1. **Official ISIL agency registries** (IRAM for Argentina)
2. **National library consortia** (SISBI-UBA, JUBIUNA)
3. **Ministry of Culture directories**
4. **Web scraping institutional websites**
5. ~~Authority catalogs~~ (foreign institutions only)
## Summary
**AGN Scraper**: ✅ Success - 1 institution extracted
**Z39.50 Client**: ⚠️ Framework created, query implementation incomplete
**Authority Catalog**: ❌ Not suitable for Argentine ISIL extraction
**Recommended Action**: ✉️ Contact IRAM directly for ISIL registry
**Estimated Argentina Coverage**:
- Current: 289 institutions (288 CONABIP + 1 AGN)
- Potential with IRAM registry: 500-1,000 institutions
- Data quality: TIER_1_AUTHORITATIVE (if IRAM responds)
---
**Next session**: Send IRAM email, complete CONABIP LinkML export, add AGN to instances