Czech Archives Database Investigation
Date: 2025-11-19
Status: Investigation in progress
Summary
The Czech Republic has TWO separate heritage databases:
- ✅ Libraries - Already harvested (8,145 institutions from ADR database)
- 🔍 Archives - Need to harvest (estimated ~560 institutions from ARON portal)
Czech Archive Infrastructure
Primary Source: Archiválie na dosah (Archives Within Reach)
Portal URL: https://portal.nacr.cz/cro/pro-badatele/
Institution Database: https://portal.nacr.cz/aron/institution
Manager: Národní archiv (National Archives) + Ministry of Interior
What We Discovered
- "Archiválie na dosah" replaced the old "Archivní fondy a sbírky v ČR" database
- Contains 200,000+ archival collections (fonds and collections)
- Managed by Národní archiv through ARON portal (ARchiv ONline)
- Updated in real time by archivists across the Czech Republic
- Includes state archives, municipal archives, university archives, specialized archives, and private archives
Archive Categories in ARON
According to the portal (https://portal.nacr.cz/cro/pro-badatele/):
State Archives (Státní archivy):
- Národní archiv (National Archives)
- Moravský zemský archiv v Brně
- 7 regional state archives (Státní oblastní archivy)
- Archiv bezpečnostních složek (Security Forces Archive)
Municipal Archives (Archivy územních samosprávných celků):
- Archiv hlavního města Prahy (Prague City Archive)
- Archiv města Brna
- Archiv města Plzně
- Archiv města Ústí nad Labem
- Archiv města Ostravy
- Many district and municipal archives
University Archives (Archivy vysokých škol):
- Univerzita Karlova
- Masarykova univerzita
- Univerzita Palackého
- Ostravská univerzita
- ČVUT Praha
- VUT Brno
- And more...
Specialized Archives (Specializované archivy):
- Archiv Kanceláře prezidenta republiky
- Archiv Poslanecké sněmovny (Parliament Archive)
- Archiv Ministerstva zahraničních věcí
- Vojenský historický archiv (Military Historical Archive)
- Archiv Českého rozhlasu (Czech Radio Archive)
- Archiv Národního muzea
- Literární archiv Památníku národního písemnictví
- Archiv Národní galerie
- Archiv Národní knihovny ČR
- And many more...
Private Archives (Soukromé archivy):
- Archiv Židovského muzea
- Corporate archives (Škoda, Vítkovice, Plzeňský Prazdroj, etc.)
- Church archives (Biskupství brněnské)
Current Status
ARON Portal Observations:
- Shows 56 pages of institutions (10 per page)
- Estimated total: ~560 Czech archive institutions (56 pages × 10 per page; an upper bound if the last page is partial)
- Web interface at https://portal.nacr.cz/aron/institution
- Each institution has a UUID (e.g., `/aron/apu/000efd1e-099b-4e8c-ab8c-ec9e47662e7b`)
- Displays institution name, but details require clicking through
Data Access Challenges
No Obvious Bulk Download
❌ What we DON'T have yet:
- No public downloadable XML/CSV export like ADR library database
- No documented public API for institution list
- ARON API exists but appears to be for internal use (DA-COMM system)
- Czech open data portal (data.gov.cz) doesn't list archive institution registry
Possible Data Access Methods
Option 1: Contact Ministry of Interior - Archival Administration ⭐ RECOMMENDED
- Email: arch@mvcr.cz
- Request: Ask for downloadable export of Czech archive institution registry
- Precedent: National Library provided ADR database as open XML download
- License: Likely CC0 (public domain) like library database
- Format: Probably XML or CSV
Option 2: Web Scraping ARON Portal
- Portal: https://portal.nacr.cz/aron/institution
- 56 pages to scrape (paginated results)
- Extract: Institution name, UUID, link to detail page
- Then scrape each detail page for full metadata
- Cons: Time-consuming, fragile, may violate terms of service
Option 3: Check for API Documentation
- Found DA-COMM API docs at https://stands.nacr.cz/da-comm/viewapi/
- Appears to be for component viewing, not institution export
- May require authentication token
- Needs investigation: Does ARON have public API?
Option 4: Check Open Data Portal
- Search data.gov.cz for "archivy" or "archivní instituce"
- May be published as open data dataset
- Status: Preliminary search didn't find it, but worth deeper investigation
Technical Details
ARON System Architecture
ARON = ARchiv ONline (Archive Online)
- Modern web application (React-based frontend)
- RESTful API backend
- PostgreSQL database (inferred from similar Czech gov systems)
- Real-time updates by archivists
- Integrated with CAM (Centrální Archivní Modul - Central Archive Module)
Known ARON Endpoints
- `/aron/institution` - List of archive institutions
- `/aron/fund` - Archival fonds
- `/aron/finding-aid` - Finding aids (archival inventories)
- `/aron/originator` - Originators (organizations that created records)
- `/aron/apu/{uuid}` - Individual access point (entity detail)
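The `/aron/apu/{uuid}` pattern can be handled with a small helper. This is a minimal sketch, assuming the portal host cited in this report and the UUID format seen in the example link; the function names `apu_url` and `uuid_from_apu_url` are hypothetical and not part of any ARON API.

```python
import re
import uuid
from typing import Optional
from urllib.parse import urljoin, urlparse

BASE_URL = "https://portal.nacr.cz"  # portal host as cited in this report

# Matches the path of an entity detail page, e.g. /aron/apu/<uuid>
APU_PATH = re.compile(r"^/aron/apu/([0-9a-fA-F-]{36})/?$")


def apu_url(entity_uuid: str) -> str:
    """Build the detail-page URL for one entity; raises ValueError on a malformed UUID."""
    canonical = str(uuid.UUID(entity_uuid))
    return urljoin(BASE_URL, f"/aron/apu/{canonical}")


def uuid_from_apu_url(url: str) -> Optional[str]:
    """Return the canonical UUID from an /aron/apu/ URL or path, else None."""
    match = APU_PATH.match(urlparse(url).path)
    if match is None:
        return None
    try:
        return str(uuid.UUID(match.group(1)))
    except ValueError:
        return None
```

Validating through `uuid.UUID` keeps malformed identifiers out of the dataset before any request is made.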
Data We Need from Each Institution
From LinkML schema perspective, extract:
- Name (official institution name)
- Institution Type (ARCHIVE, MUSEUM, LIBRARY, etc.)
- Location (city, address if available)
- Identifiers (UUID from ARON, potential ISIL code)
- Website URL (if available)
- Description (mission, holdings summary)
- Collections (archival fonds managed by institution)
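The field list above could be captured in a small record type before mapping to the LinkML schema. This is a sketch only: the class name `ArchiveInstitution`, its attribute names, and the `aron_uuid` identifier key are assumptions chosen for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchiveInstitution:
    """One archive institution as harvested from ARON (hypothetical shape)."""
    name: str                                   # official institution name
    institution_type: str = "ARCHIVE"           # ARCHIVE, MUSEUM, LIBRARY, etc.
    city: Optional[str] = None
    address: Optional[str] = None
    identifiers: dict[str, str] = field(default_factory=dict)  # e.g. {"aron_uuid": "..."}
    website: Optional[str] = None
    description: Optional[str] = None
    collections: list[str] = field(default_factory=list)       # fonds held by the institution

    def missing_fields(self) -> list[str]:
        """List the optional fields that are still empty, in declaration order."""
        optional = ("city", "address", "identifiers", "website", "description", "collections")
        return [name for name in optional if not getattr(self, name)]
```

`missing_fields` gives a quick completeness check, which helps decide whether the list page is enough or the detail pages also need to be fetched.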
Comparison: Libraries vs Archives
| Feature | Libraries (ADR) | Archives (ARON) |
|---|---|---|
| Total Institutions | 8,145 | ~560 (estimated) |
| Data Source | National Library | National Archives + Min. Interior |
| Download Available | ✅ Yes (adr.xml.gz) | ❌ Not yet found |
| Format | MARC21 XML | Unknown (web portal only) |
| Coverage | 100% Czech libraries | All Czech archives + cultural institutions |
| ISIL Codes | Yes (siglas) | Unknown if archives have ISIL |
| Update Frequency | Periodic (last: 2025-08-01) | Real-time (online updates) |
| License | CC0 (public domain) | Unknown (likely open) |
Next Steps
Immediate Actions (Priority 1)
Email Národní archiv and Ministry of Interior ⭐

    To: arch@mvcr.cz
    Subject: Request for Czech Archive Institution Registry Export

    Dear Archival Administration,

    I am working on a global heritage institution database project
    (https://github.com/user/glam) and successfully integrated data from
    the National Library's ADR database (8,145 libraries).

    I would like to request a downloadable export of Czech archive
    institutions from the "Archiválie na dosah" (ARON) portal.

    Could you provide:
    1. Complete list of Czech archive institutions in XML/CSV format
    2. Metadata: institution name, type, location, identifiers, website
    3. License information (hoping for CC0 like ADR database)

    The ADR database was available at: https://aleph.nkp.cz/data/adr.xml.gz
    Is there a similar download for archive institutions?

    Thank you!

Check Czech Open Data Portal Thoroughly
- Search data.gov.cz for:
- "archivy"
- "archivní instituce"
- "ARON"
- "Národní archiv"
- "Archiválie na dosah"
- Filter by publisher: Národní archiv, Ministerstvo vnitra
Investigate ARON API
- Check browser network tab when using portal
- Look for API endpoints (likely `/api/institution` or similar)
- Check if API returns JSON data
- Test if API requires authentication
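The API checks above (does it return JSON, does it require authentication) can be scripted once a candidate URL has been seen in the network tab. The sketch below uses only the standard library; the labels returned by `classify_response` are made up for this example, and any path passed to `probe` (such as the guessed `/api/institution`) is unconfirmed.

```python
import json
import urllib.error
import urllib.request


def classify_response(status: int, content_type: str, body: bytes) -> str:
    """Reduce an HTTP response to a coarse label for the investigation log."""
    if status in (401, 403):
        return "auth_required"
    if status == 404:
        return "not_found"
    if status != 200:
        return "other"
    if "json" in content_type.lower():
        try:
            json.loads(body)
            return "json"
        except ValueError:  # covers JSONDecodeError and bad encodings
            return "invalid_json"
    return "html_or_other"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch one candidate URL and classify the result."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return classify_response(response.status, content_type, response.read())
    except urllib.error.HTTPError as error:
        return classify_response(error.code, error.headers.get("Content-Type", ""), b"")
```

Keeping the classification separate from the network call makes it possible to test the logic offline and to reuse it with responses captured from the browser.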
Secondary Actions (Priority 2)
Web Scraping as Last Resort
- Only if no official export is available
- Use `crawl4ai` or `playwright` to scrape institution list
- Respect rate limits (1 request per second)
- Parse HTML for institution names and UUIDs
- Follow links to detail pages for full metadata
Cross-Reference with ADR Database
- Check if 87 "archiv" mentions in library database are real archives
- Some may be archive libraries (KI-MU type institutions)
- Deduplicate if institutions appear in both databases
ISIL Code Investigation
- Check if Czech archives have ISIL codes
- Libraries use "siglas" (e.g., ABA000)
- Archives may use different identifier system
- Contact ISIL agency: https://isil.org
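For the scraping fallback listed under Secondary Actions, two pieces are independent of the tool used to fetch pages: pulling UUIDs out of rendered HTML and spacing out requests. The sketch below assumes that rendered list pages contain links of the form `/aron/apu/{uuid}`, as in the example earlier in this report; the real markup has not been inspected, and the function names are hypothetical.

```python
import re
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

# Assumed link shape, based on the example /aron/apu/<uuid> path in this report
APU_LINK = re.compile(
    r"/aron/apu/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_uuids(html: str) -> list[str]:
    """Return unique entity UUIDs found in the HTML, in first-seen order."""
    return list(dict.fromkeys(m.group(1).lower() for m in APU_LINK.finditer(html)))


def throttled(
    items: Iterable[T],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than one per min_interval seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

Wrapping the list of page or detail URLs in `throttled(...)` keeps the crawl at roughly one request per second regardless of whether pages are fetched with `playwright`, `crawl4ai`, or plain HTTP.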
Data Integration (Priority 3)
Create Czech Archive Parser
- Once data is available, create `scripts/parsers/parse_czech_archives.py`
- Similar structure to `parse_czech_isil.py`
- Map archive types to GLAM taxonomy (ARCHIVE, MUSEUM, etc.)
- Generate GHCIDs with CZ-* prefix
Merge Libraries + Archives
- Combine `czech_institutions.yaml` (libraries) with new archive data
- Deduplicate by name/location
- Create unified `czech_heritage_institutions.yaml`
- 8,145 libraries + ~560 archives = ~8,700 total Czech institutions
Enrich with Wikidata
- Query Wikidata for Czech archives
- Fuzzy match by name and city
- Add Q-numbers to identifiers
- Update GHCIDs with Q-numbers if needed
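Both the name/location deduplication and the fuzzy matching by name and city depend on comparing Czech names that may differ in case, spacing, or diacritics. A minimal sketch follows, assuming records are plain dictionaries with `name` and `city` keys (a hypothetical shape, not the project's YAML schema); any similarity threshold would need tuning on real data.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def dedup_key(name: str, city: str) -> tuple[str, str]:
    """Key used to treat two records as the same institution."""
    return (normalize(name), normalize(city))


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] on normalized names, for fuzzy matching."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def merge_unique(*sources: list[dict]) -> list[dict]:
    """Merge record lists, keeping the first record seen for each key."""
    merged: dict[tuple[str, str], dict] = {}
    for records in sources:
        for record in records:
            merged.setdefault(dedup_key(record["name"], record["city"]), record)
    return list(merged.values())
```

Exact keys catch trivial duplicates between the library and archive lists; `name_similarity` is meant for the looser comparison against Wikidata labels, where spelling variants are more likely.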
Expected Outcomes
After Archive Data Harvest
Czech Heritage Institution Dataset:
- Total Institutions: ~8,700 (8,145 libraries + 560 archives)
- Data Quality: TIER_1_AUTHORITATIVE (from national registries)
- Geographic Coverage: All 14 Czech regions
- Type Distribution:
- Libraries: 93.6% (8,145)
- Archives: 6.4% (~560)
- Museums: ~50 (from specialized archives)
- Galleries: ~20 (from specialized archives)
- Universities: ~10 (university archives)
- Mixed: varies
Contribution to Global Dataset
Czech Republic would become one of the most complete country datasets:
- 🇳🇱 Netherlands: ~1,400 institutions (current best)
- 🇨🇿 Czech Republic: ~8,700 institutions (if archives added) 🏆
- 🇧🇷 Brazil: ~3,000+ institutions (in progress)
- 🇦🇷 Argentina: ~2,000 institutions
- 🇦🇹 Austria: ~1,800 institutions
- 🇨🇦 Canada: ~800 institutions
References
Czech National Archives:
- Website: https://www.nacr.cz
- ARON Portal: https://portal.nacr.cz/cro
- Institution List: https://portal.nacr.cz/aron/institution
- Email: posta@nacr.cz
Ministry of Interior - Archival Administration:
- Email: arch@mvcr.cz
- Website: https://www.mvcr.cz (archival section)
Czech Open Data Portal:
- Website: https://data.gov.cz
- Search: https://data.gov.cz/datasets
Related Documentation:
- CZECH_ISIL_COMPLETE_REPORT.md - Library harvest report
- CZECH_ISIL_NEXT_STEPS.md - Quick start guide
- SESSION_SUMMARY_20251119_CZECH_COMPLETE.md - Session summary
Status: Awaiting response from arch@mvcr.cz
Next Session: Check email for response, investigate open data portal, or begin API investigation