# Danish Heritage Institution Data (Libraries + Archives)

## Overview

This directory contains complete datasets for Danish heritage institutions:

- ✅ **Libraries**: 568 institutions with ISIL codes (DK-XXXXXX format)
- ✅ **Archives**: 594 institutions WITHOUT ISIL codes (municipal + special collections)

**CRITICAL FINDING**: Danish ISIL codes (DK-*) are **ONLY for libraries**, NOT archives.
Danish archives use a completely separate system without international ISIL codes.

---

## Libraries Data

### Source

Data downloaded from VIP-basen (Virtuel Informationsplatform for biblioteker)

- URL: https://vip.dbc.dk/librarylists
- Operator: DBC DIGITAL A/S
- Date downloaded: 2025-11-19

## Files

### Public Libraries (Folkebiblioteker)

- **publiclibraries.csv** - Main public libraries (væsener): 109 institutions
- **publicbranches.csv** - All public library branches: 639 locations

### Research Libraries (Forskningsbiblioteker)

- **ffulibraries.csv** - Main research libraries (væsener): 459 institutions
- **ffubranches.csv** - All research library branches: 687 locations

## Data Structure

### Main Libraries Files (publiclibraries.csv, ffulibraries.csv)

Columns:

- Biblioteksnummer (Library Number) - This is the Danish ISIL/library identifier
- Navn (Name)
- Adresse (Address)
- Postnummer (Postal Code)
- By (City)
- Telefon (Telephone)
- Email
- Leder (Director/Leader)

### Branch Files (publicbranches.csv, ffubranches.csv)

Columns:

- Biblioteksnummer (Library Number)
- Væsensnavn (Parent institution name)
- Navn (Branch name)
- Adresse (Address)
- Postnummer (Postal Code)
- By (City)

## Coverage

- Total unique institutions: **568** (109 public + 459 research)
- Total service locations: **1,326** (639 public + 687 research)

This dataset covers:

- ✅ Public libraries (folkebiblioteker)
- ✅ Research libraries (forskningsbiblioteker) including:
  - National Library (Det Kgl. Bibliotek)
  - University libraries
  - Special libraries
  - Professional college libraries
  - Conservatory libraries
  - Various institutional libraries

## Data Encoding

- Format: CSV with semicolon (;) delimiter
- Character encoding: UTF-8 with BOM
- Fields enclosed in double quotes

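The encoding notes above translate fairly directly into Python's standard `csv` module. The following is a minimal sketch, not the project's actual loader: the helper names `read_library_csv` and `open_library_csv` are hypothetical, the sample row is made up for illustration, and it assumes each file starts with a header row naming the columns listed under Data Structure.

```python
import csv
import io


def read_library_csv(text_stream):
    """Return one dict per data row of a semicolon-delimited, double-quoted CSV.

    Assumes (not confirmed by the source) that the first row is a header.
    """
    reader = csv.DictReader(text_stream, delimiter=";", quotechar='"')
    return list(reader)


def open_library_csv(path):
    """Open a file for reading; "utf-8-sig" decodes UTF-8 and drops the BOM."""
    return open(path, encoding="utf-8-sig", newline="")


# In-memory demonstration with a made-up row (not taken from the real files).
sample_bytes = (
    '\ufeff"Biblioteksnummer";"Navn";"By"\r\n'
    '"710100";"Example Library";"København"\r\n'
).encode("utf-8")

# Decoding with "utf-8-sig" removes the BOM before the csv module sees the text.
sample_text = sample_bytes.decode("utf-8-sig")
sample_rows = read_library_csv(io.StringIO(sample_text, newline=""))
```

Decoding with plain `utf-8` would leave the BOM glued to the first column name, so a lookup on `Biblioteksnummer` would fail; `utf-8-sig` avoids that.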
## ISIL Code Format

Danish library numbers in this dataset are 6-digit numbers (e.g., "710100").

The Danish ISIL code is formed by prefixing the library number with `DK-` (e.g., `DK-710100`).

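As an illustration of the format described above, the conversion can be sketched as a small validating function. The name `to_isil` is hypothetical, and rejecting anything that is not exactly six digits is an assumption made here for safety, not a rule stated by the source.

```python
import re

# Exactly six ASCII digits, matching the format described in this README.
_LIBRARY_NUMBER = re.compile(r"[0-9]{6}")


def to_isil(library_number):
    """Return "DK-" + number for a 6-digit library number, else None."""
    number = str(library_number).strip()
    if not _LIBRARY_NUMBER.fullmatch(number):
        return None
    return f"DK-{number}"
```

For example, `to_isil("710100")` gives `"DK-710100"`, while a malformed value such as `"7101"` gives `None` so it can be reviewed instead of silently producing a bad code.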
## Next Steps for Data Processing

1. Parse CSV files with proper UTF-8-BOM encoding
2. Extract ISIL codes (convert library numbers to DK-XXXXXX format)
3. Geocode addresses using Nominatim
4. Map to LinkML HeritageCustodian schema:
   - institution_type: LIBRARY
   - data_source: CSV_REGISTRY
   - data_tier: TIER_1_AUTHORITATIVE
5. Cross-reference with international ISIL registry if needed
6. Export to RDF/JSON-LD

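Step 4 above could look roughly like the sketch below. Only `institution_type`, `data_source`, and `data_tier` (and their values) come from this README; the function name and the other output keys (`name`, `isil_code`, `street_address`, `postal_code`, `city`) are hypothetical placeholders, since the HeritageCustodian schema itself is not reproduced here.

```python
def _clean(value):
    """Strip whitespace; turn empty or missing values into None."""
    text = (value or "").strip()
    return text or None


def library_row_to_custodian(row):
    """Map one parsed library CSV row (a dict) to a schema-like record.

    Output keys other than institution_type, data_source and data_tier
    are illustrative assumptions, not the real schema slots.
    """
    number = _clean(row.get("Biblioteksnummer"))
    return {
        "name": _clean(row.get("Navn")),
        "isil_code": f"DK-{number}" if number else None,
        "street_address": _clean(row.get("Adresse")),
        "postal_code": _clean(row.get("Postnummer")),
        "city": _clean(row.get("By")),
        "institution_type": "LIBRARY",
        "data_source": "CSV_REGISTRY",
        "data_tier": "TIER_1_AUTHORITATIVE",
    }
```

Empty address fields become `None` rather than empty strings, so records without a physical location remain easy to identify before geocoding.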
## Notes on Library Data

- Some entries have empty address fields (branches without physical locations)
- Main library files include leadership information
- Branch files link to parent institutions via Væsensnavn
- Includes some German locations (Sydslesvig/South Schleswig Danish minority libraries)

---

## Archives Data

### Source

Data scraped from Arkiv.dk Municipal Archive Directory

- URL: https://arkiv.dk/arkiver
- Operator: Kulturministeriet (Danish Ministry of Culture)
- Date scraped: 2025-11-19
- Method: Browser automation (Playwright + JavaScript evaluation)

### Files

- **danish_archives_arkivdk.csv** - Complete archive list (594 institutions)
- **danish_archives_arkivdk.json** - Same data with extraction metadata

### Data Structure

Columns (CSV):

- municipality - Danish municipality name (e.g., "Albertslund Kommune") or "Specialsamlinger"
- archive_name - Archive institution name
- country - Always "DK"
- source - Always "Arkiv.dk"
- url - Always "https://arkiv.dk/arkiver"

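With the columns above, a per-municipality tally is a quick sanity check on the archive CSV. This is a sketch under two assumptions that the README does not state: the file has a header row with exactly these column names, and it uses the default comma delimiter. The function name is hypothetical, and "Example Lokalarkiv" in the sample is a made-up archive name.

```python
import csv
import io
from collections import Counter


def count_archives_by_municipality(text_stream):
    """Count rows per value of the "municipality" column."""
    reader = csv.DictReader(text_stream)  # assumes comma-delimited with header
    return Counter(row["municipality"] for row in reader)


# Small in-memory sample; "Example Lokalarkiv" is invented for illustration.
sample = io.StringIO(
    "municipality,archive_name,country,source,url\n"
    "Albertslund Kommune,Example Lokalarkiv,DK,Arkiv.dk,https://arkiv.dk/arkiver\n"
    "Specialsamlinger,Rigsarkivet,DK,Arkiv.dk,https://arkiv.dk/arkiver\n"
    "Specialsamlinger,Niels Bohr Arkivet,DK,Arkiv.dk,https://arkiv.dk/arkiver\n"
)
counts = count_archives_by_municipality(sample)
```

Run against the real file, the total of all counts should match the 594 institutions reported under Coverage, and the `Specialsamlinger` entry should match the 29 special collections.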
### Coverage

Total archives: **594 institutions**

Breakdown:

- **Municipal archives**: 565 local archives (lokalarkiver)
  - Organized by 98 Danish municipalities
  - Most municipalities have multiple local archives (one per parish/community)
  - ~11 municipalities do NOT contribute to Arkiv.dk (marked "Ingen")

- **Special collections**: 29 national/specialized archives
  - Includes **Rigsarkivet** (National Archives of Denmark)
  - Provincial archives
  - Subject-specific archives (sports, deaf history, prison history, etc.)
  - Museum archives
  - Religious archives (Freemason lodge, Catholic Historical Archive)

### Notable Archives

**National Archives:**

- Rigsarkivet (National Archives)

**Major Municipal Archives:**

- Copenhagen (Københavns Kommune): No local archive listed (uses Rigsarkivet)
- Aarhus Kommune: 22 local archives
- Esbjerg Kommune: 23 local archives
- Ringkøbing-Skjern Kommune: 22 local archives

**Special Collections Examples:**

- Arkivet ved Dansk Centralbibliotek for Sydslesvig (archive of the Danish minority in South Schleswig, Germany)
- Historisk Samling fra Besættelsestiden 1940-1945 (WWII Occupation Historical Collection)
- Niels Bohr Arkivet (Niels Bohr Archive)
- Katolsk Historisk Arkiv (Catholic Historical Archive)
- Friskolearkivet (Free School Archive)
- SIFA Idrætshistorisk Samling (Sports History Collection)

### IMPORTANT: No ISIL Codes for Danish Archives

**Danish archives do NOT have ISIL codes.**

Evidence:

1. Official Danish ISIL registry at slks.dk lists ONLY libraries
2. The ISIL page redirects to VIP-basen (library database)
3. Page title: "Biblioteker i Danmark med biblioteksnumre" (Libraries in Denmark with library numbers)
4. Arkiv.dk archive directory does NOT mention ISIL codes
5. International ISIL database has no Danish archive entries

This is different from many other countries where archives DO receive ISIL codes.

### Scraper Implementation

- **Script**: `/scripts/scrapers/scrape_danish_archives_playwright.py`
- **Version**: 2.0.0-playwright-js-eval

**Technical Details:**

- Uses Playwright headless browser automation
- JavaScript evaluation to extract data from collapsed panels (no clicking needed)
- Handles dynamic React/Vue.js content
- Processes municipalities with multiple local archives (newline-separated)
- Separates Specialsamlinger into individual archive records
- Runs in ~5 seconds (much faster than clicking approach)

**DOM Structure:**

- Municipality tabs: `<h4 class="panel-title"><a data-toggle="collapse" href="#collapseX">`
- Archive names: Inside `<div id="collapseX" role="tabpanel">`
- Special case: "Ingen" = Municipality does not contribute

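The post-processing described above (newline-separated names, the "Ingen" marker) can be sketched independently of the browser step. This is not the code from the scraper script; `panel_to_records` is a hypothetical helper that assumes the panel text has already been extracted as a plain string, and it fills the constant columns with the values listed under Data Structure.

```python
def panel_to_records(municipality, panel_text):
    """Split one panel's text into archive records, one per non-empty line.

    A line reading "Ingen" marks a municipality that does not contribute
    to Arkiv.dk, so it produces no record.
    """
    records = []
    for line in panel_text.splitlines():
        name = line.strip()
        if not name or name == "Ingen":
            continue
        records.append(
            {
                "municipality": municipality,
                "archive_name": name,
                "country": "DK",
                "source": "Arkiv.dk",
                "url": "https://arkiv.dk/arkiver",
            }
        )
    return records
```

The same function covers the Specialsamlinger panel: passing `"Specialsamlinger"` as the municipality yields one record per listed archive.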
### Data Quality

- **Completeness**: 100% coverage of Arkiv.dk directory (all 98 municipalities + special collections)
- **Accuracy**: Archive names extracted directly from official portal
- **Currency**: Data current as of 2025-11-19
- **Validation**: Verified against manual browser inspection

### Next Steps for Data Processing

1. Parse CSV/JSON files
2. Geocode municipality → city location (most archives don't have full addresses)
3. Map to LinkML HeritageCustodian schema:
   - institution_type: ARCHIVE
   - data_source: WEB_SCRAPING (Arkiv.dk)
   - data_tier: TIER_2_VERIFIED (official portal)
   - NO ISIL codes (field should be empty/null)
4. Generate GHCID identifiers (since no ISIL codes available)
5. Cross-reference with Rigsarkivet for national/provincial archive details
6. Export to RDF/JSON-LD

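The mapping in step 3 might be sketched as follows. As before, only `institution_type`, `data_source`, and `data_tier` are taken from this README; the function name and the keys `name`, `municipality`, and `isil_code` are hypothetical. GHCID generation is deliberately left out because its format is not described here.

```python
def archive_row_to_custodian(row):
    """Map one archive CSV row (a dict) to a schema-like record.

    Danish archives have no ISIL code, so that field is always None.
    """
    return {
        "name": row["archive_name"].strip(),
        "municipality": row["municipality"].strip(),
        "country": row.get("country", "DK"),
        "isil_code": None,  # explicitly null, per the note above
        "institution_type": "ARCHIVE",
        "data_source": "WEB_SCRAPING",
        "data_tier": "TIER_2_VERIFIED",
    }
```

Keeping `isil_code` present but null, instead of omitting the key, makes it easy to select every record that still needs a GHCID in step 4.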
### Notes on Archive Data

- **Municipality names**: All end with "Kommune" (except Specialsamlinger)
- **Local archives**: Often named "Lokalhistorisk Arkiv" or "Lokalarkiv"
- **Multiple archives per municipality**: Common for larger municipalities (Aarhus, Esbjerg, etc.)
- **Copenhagen**: Does NOT have a local archive listed (uses national archives)
- **Specialsamlinger**: Includes Rigsarkivet and 28 other specialized archives
- **No contact information**: Archive names only, no addresses/emails/phones (would need to scrape individual archive pages)

---

## Summary: Complete Denmark GLAM Dataset

**Total institutions extracted: 1,162**

| Type | Count | ISIL Codes? | Source |
|------|-------|-------------|--------|
| Libraries | 568 | ✅ Yes (DK-XXXXXX) | VIP-basen |
| Archives | 594 | ❌ No | Arkiv.dk |

**Data Status**: ✅ **COMPLETE** for Priority 1 country (Denmark)

Denmark now has comprehensive coverage of both libraries and archives, ready for:

- LinkML schema mapping
- GHCID identifier generation
- RDF export
- Integration into global GLAM dataset