glam/data/isil/denmark/README.md
2025-11-19 23:25:22 +01:00

Danish Heritage Institution Data (Libraries + Archives)

Overview

This directory contains complete datasets for Danish heritage institutions:

  • Libraries: 568 institutions with ISIL codes (DK-XXXXXX format)
  • Archives: 594 institutions WITHOUT ISIL codes (municipal + special collections)

CRITICAL FINDING: Danish ISIL codes (DK-*) are ONLY for libraries, NOT archives.
Danish archives use a completely separate system without international ISIL codes.


Libraries Data

Source

Data downloaded from VIP-basen (Virtuel Informationsplatform for biblioteker)

Files

Public Libraries (Folkebiblioteker)

  • publiclibraries.csv - Main public libraries (væsener): 109 institutions
  • publicbranches.csv - All public library branches: 639 locations

Research Libraries (Forskningsbiblioteker)

  • ffulibraries.csv - Main research libraries (væsener): 459 institutions
  • ffubranches.csv - All research library branches: 687 locations

Data Structure

Main Libraries Files (publiclibraries.csv, ffulibraries.csv)

Columns:

  • Biblioteksnummer (Library Number) - This is the Danish ISIL/library identifier
  • Navn (Name)
  • Adresse (Address)
  • Postnummer (Postal Code)
  • By (City)
  • Telefon (Telephone)
  • Email
  • Leder (Director/Leader)

Branch Files (publicbranches.csv, ffubranches.csv)

Columns:

  • Biblioteksnummer (Library Number)
  • Væsensnavn (Parent institution name)
  • Navn (Branch name)
  • Adresse (Address)
  • Postnummer (Postal Code)
  • By (City)

Coverage

Total unique institutions: 568 (109 public + 459 research)
Total service locations: 1,326 (639 public + 687 research)

This dataset covers:

  • Public libraries (folkebiblioteker)
  • Research libraries (forskningsbiblioteker) including:
    • National Library (Det Kgl. Bibliotek)
    • University libraries
    • Special libraries
    • Professional college libraries
    • Conservatory libraries
    • Various institutional libraries

Data Encoding

  • Format: CSV with semicolon (;) delimiter
  • Character encoding: UTF-8 with BOM
  • Fields enclosed in double quotes
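
A minimal sketch of reading one of these files with Python's standard library, assuming the header row uses the Danish column names listed under Data Structure. The `utf-8-sig` codec strips the BOM so that it does not end up glued to the first column name. `read_library_csv` is a hypothetical helper name, and the sample row is made up for illustration.

```python
import csv
import io

def read_library_csv(source):
    """Read a semicolon-delimited, double-quoted CSV into a list of dicts.

    `source` is any text-mode file object (hypothetical helper, not part of the repo).
    """
    reader = csv.DictReader(source, delimiter=";", quotechar='"')
    return [dict(row) for row in reader]

# Illustrative bytes only: a UTF-8 BOM, a header row, and one made-up record.
sample_bytes = (
    b'\xef\xbb\xbf"Biblioteksnummer";"Navn";"By"\r\n'
    b'"710100";"Eksempel Bibliotek";"K\xc3\xb8benhavn"\r\n'
)

# Decoding with utf-8-sig removes the BOM before the csv module sees the text.
rows = read_library_csv(io.StringIO(sample_bytes.decode("utf-8-sig"), newline=""))
print(rows[0]["Biblioteksnummer"], rows[0]["By"])
```

For a file on disk, the same helper can be fed `open(path, encoding="utf-8-sig", newline="")`.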

ISIL Code Format

Library numbers in this dataset are 6-digit numbers (e.g., "710100").

The corresponding Danish ISIL code is the prefix "DK-" followed by the library number (e.g., "710100" becomes "DK-710100").
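
The conversion can be sketched as a small validation function. This assumes every library number is exactly six ASCII digits, as stated above; `library_number_to_isil` is a hypothetical name, and rejecting anything else (rather than padding or truncating) is a design choice of this sketch.

```python
import re

def library_number_to_isil(number):
    """Return 'DK-' plus the 6-digit library number, or raise ValueError."""
    text = str(number).strip()
    # [0-9] rather than \d so that only ASCII digits are accepted.
    if not re.fullmatch(r"[0-9]{6}", text):
        raise ValueError(f"expected a 6-digit library number, got {number!r}")
    return f"DK-{text}"

print(library_number_to_isil("710100"))  # DK-710100
```

Failing loudly on unexpected values makes it easier to spot rows where the Biblioteksnummer column holds something other than a plain library number.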

Next Steps for Data Processing

  1. Parse CSV files with proper UTF-8-BOM encoding
  2. Extract ISIL codes (convert library numbers to DK-XXXXXX format)
  3. Geocode addresses using Nominatim
  4. Map to LinkML HeritageCustodian schema:
    • institution_type: LIBRARY
    • data_source: CSV_REGISTRY
    • data_tier: TIER_1_AUTHORITATIVE
  5. Cross-reference with international ISIL registry if needed
  6. Export to RDF/JSON-LD
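
Steps 2 and 4 might look roughly like the sketch below, which turns one parsed CSV row into a plain dictionary. Only `institution_type`, `data_source`, and `data_tier` (and their values) come from this README; the other keys (`name`, `isil`, `city`) are hypothetical placeholders for whatever slots the real HeritageCustodian schema defines. Geocoding and RDF export are left out.

```python
def library_row_to_record(row):
    """Map one parsed CSV row (dict keyed by Danish column names) to a plain dict.

    Keys other than institution_type, data_source and data_tier are
    illustrative placeholders, not confirmed schema slots.
    """
    number = row["Biblioteksnummer"].strip()
    return {
        "name": row["Navn"].strip(),
        "isil": f"DK-{number}",
        # Empty address fields occur in the data, so map them to None.
        "city": (row.get("By") or "").strip() or None,
        "institution_type": "LIBRARY",
        "data_source": "CSV_REGISTRY",
        "data_tier": "TIER_1_AUTHORITATIVE",
    }

# Made-up row for illustration.
example = library_row_to_record(
    {"Biblioteksnummer": "710100", "Navn": " Eksempel Bibliotek ", "By": ""}
)
print(example)
```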

Notes on Library Data

  • Some entries have empty address fields (branches without physical locations)
  • Main library files include leadership information
  • Branch files link to parent institutions via Væsensnavn
  • Includes some German locations (Sydslesvig/South Schleswig Danish minority libraries)
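
Because branch rows reference their parent only by name, linking the two files amounts to a lookup on Væsensnavn. The sketch below assumes the parent's Navn and the branch's Væsensnavn match exactly after trimming whitespace, which has not been verified against the real files. `group_branches_by_parent` is a hypothetical helper, and the rows are made up.

```python
from collections import defaultdict

def group_branches_by_parent(main_rows, branch_rows):
    """Group branch names under the main library whose Navn equals their Væsensnavn.

    Returns (grouped, orphans): grouped maps parent name -> list of branch names,
    orphans lists the branch rows whose parent name was not found.
    """
    parents = {row["Navn"].strip() for row in main_rows}
    grouped = defaultdict(list)
    orphans = []
    for branch in branch_rows:
        parent = branch["Væsensnavn"].strip()
        if parent in parents:
            grouped[parent].append(branch["Navn"].strip())
        else:
            orphans.append(branch)
    return dict(grouped), orphans

# Made-up rows for illustration.
main_rows = [{"Navn": "Eksempel Bibliotekerne"}]
branch_rows = [
    {"Væsensnavn": "Eksempel Bibliotekerne", "Navn": "Filial Nord"},
    {"Væsensnavn": "Ukendt Væsen", "Navn": "Filial Syd"},
]
grouped, orphans = group_branches_by_parent(main_rows, branch_rows)
print(grouped, len(orphans))
```

Returning the unmatched rows separately keeps naming mismatches visible instead of silently dropping branches.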

Archives Data

Source

Data scraped from Arkiv.dk Municipal Archive Directory

  • URL: https://arkiv.dk/arkiver
  • Operator: Kulturministeriet (Danish Ministry of Culture)
  • Date scraped: 2025-11-19
  • Method: Browser automation (Playwright + JavaScript evaluation)

Files

  • danish_archives_arkivdk.csv - Complete archive list (594 institutions)
  • danish_archives_arkivdk.json - Same data with extraction metadata

Data Structure

Columns (CSV):

  • municipality - Danish municipality name (e.g., "Albertslund Kommune") or "Specialsamlinger"
  • archive_name - Archive institution name
  • country - Always "DK"
  • source - Always "Arkiv.dk"
  • url - Always "https://arkiv.dk/arkiver"
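
A short sketch of loading this file and counting archives per municipality. It assumes a comma-delimited CSV whose header row matches the column names above; the delimiter is not documented here, so adjust it if the file differs. `count_archives_by_municipality` is a hypothetical helper, and the local archive name in the sample is made up.

```python
import csv
import io
from collections import Counter

def count_archives_by_municipality(source):
    """Count rows per `municipality` value in a text-mode CSV file object."""
    return Counter(row["municipality"] for row in csv.DictReader(source))

# Made-up sample in the documented column order (real archive names will differ).
sample = io.StringIO(
    "municipality,archive_name,country,source,url\n"
    "Albertslund Kommune,Eksempel Lokalarkiv,DK,Arkiv.dk,https://arkiv.dk/arkiver\n"
    "Specialsamlinger,Rigsarkivet,DK,Arkiv.dk,https://arkiv.dk/arkiver\n",
    newline="",
)
counts = count_archives_by_municipality(sample)
special = counts["Specialsamlinger"]
municipal = sum(counts.values()) - special
print(municipal, special)
```

Run against the full file, the two numbers can be compared with the municipal and special-collection totals documented under Coverage.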

Coverage

Total archives: 594 institutions

Breakdown:

  • Municipal archives: 565 local archives (lokalarkiver)
    • Organized by 98 Danish municipalities
    • Most municipalities have multiple local archives (one per parish/community)
    • ~11 municipalities do NOT contribute to Arkiv.dk (marked "Ingen")
  • Special collections: 29 national/specialized archives
    • Includes Rigsarkivet (National Archives of Denmark)
    • Provincial archives
    • Subject-specific archives (sports, deaf history, prison history, etc.)
    • Museum archives
    • Religious and fraternal archives (Freemason lodge, Catholic Historical Archive)

Notable Archives

National Archives:

  • Rigsarkivet (National Archives)

Major Municipal Archives:

  • Copenhagen (Københavns Kommune): No local archive listed (uses Rigsarkivet)
  • Aarhus Kommune: 22 local archives
  • Esbjerg Kommune: 23 local archives
  • Ringkøbing-Skjern Kommune: 22 local archives

Special Collections Examples:

  • Arkivet ved Dansk Centralbibliotek for Sydslesvig (archive of the Danish minority in South Schleswig, Germany)
  • Historisk Samling fra Besættelsestiden 1940-1945 (WWII Occupation Historical Collection)
  • Niels Bohr Arkivet (Niels Bohr Archive)
  • Katolsk Historisk Arkiv (Catholic Historical Archive)
  • Friskolearkivet (Free School Archive)
  • SIFA Idrætshistorisk Samling (Sports History Collection)

IMPORTANT: No ISIL Codes for Danish Archives

Danish archives do NOT have ISIL codes.

Evidence:

  1. Official Danish ISIL registry at slks.dk lists ONLY libraries
  2. The ISIL page redirects to VIP-basen (library database)
  3. Page title: "Biblioteker i Danmark med biblioteksnumre" (Libraries in Denmark with library numbers)
  4. Arkiv.dk archive directory does NOT mention ISIL codes
  5. International ISIL database has no Danish archive entries

This is different from many other countries where archives DO receive ISIL codes.

Scraper Implementation

Script: /scripts/scrapers/scrape_danish_archives_playwright.py
Version: 2.0.0-playwright-js-eval

Technical Details:

  • Uses Playwright headless browser automation
  • JavaScript evaluation to extract data from collapsed panels (no clicking needed)
  • Handles dynamic, JavaScript-rendered content
  • Processes municipalities with multiple local archives (newline-separated)
  • Separates Specialsamlinger into individual archive records
  • Runs in ~5 seconds (much faster than clicking approach)

DOM Structure:

  • Municipality tabs: <h4 class="panel-title"><a data-toggle="collapse" href="#collapseX">
  • Archive names: Inside <div id="collapseX" role="tabpanel">
  • Special case: "Ingen" = Municipality does not contribute
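
The approach can be sketched as follows. This is not the repository script: the selectors follow the DOM structure listed above, but `EXTRACT_JS`, `fetch_panels`, and `panels_to_records` are hypothetical names, the use of `textContent` is an assumption, and the panel text is assumed to arrive newline-separated as described under Technical Details. Playwright is imported lazily so that the record-building logic can run without it.

```python
# JavaScript evaluated in the page: pair each panel title with its panel's text.
# Reading the DOM directly means collapsed panels do not need to be clicked open.
EXTRACT_JS = """
() => Array.from(
  document.querySelectorAll('h4.panel-title a[data-toggle="collapse"]')
).map(a => {
  const panel = document.querySelector(a.getAttribute('href'));
  return {title: a.textContent.trim(), text: panel ? panel.textContent : ''};
})
""".strip()

def fetch_panels(url="https://arkiv.dk/arkiver"):
    """Load the page in headless Chromium and return a list of {title, text} dicts."""
    from playwright.sync_api import sync_playwright  # requires the playwright package

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        panels = page.evaluate(EXTRACT_JS)
        browser.close()
    return panels

def panels_to_records(panels):
    """Split each panel's text into one record per archive, skipping 'Ingen'."""
    records = []
    for panel in panels:
        municipality = panel["title"].strip()
        for line in panel["text"].splitlines():
            name = line.strip()
            if not name or name == "Ingen":
                continue  # blank line, or a municipality that does not contribute
            records.append({
                "municipality": municipality,
                "archive_name": name,
                "country": "DK",
                "source": "Arkiv.dk",
                "url": "https://arkiv.dk/arkiver",
            })
    return records

# Made-up panel data standing in for the result of fetch_panels().
demo = panels_to_records([
    {"title": "Eksempel Kommune", "text": "Arkiv A\nArkiv B\n"},
    {"title": "Anden Kommune", "text": "Ingen"},
])
print(len(demo))
```

Keeping the parsing in a pure function separate from the browser call lets it be tested on saved panel data without hitting the site.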

Data Quality

  • Completeness: 100% coverage of Arkiv.dk directory (all 98 municipalities + special collections)
  • Accuracy: Archive names extracted directly from official portal
  • Currency: Data current as of 2025-11-19
  • Validation: Verified against manual browser inspection

Next Steps for Data Processing

  1. Parse CSV/JSON files
  2. Geocode municipality → city location (most archives don't have full addresses)
  3. Map to LinkML HeritageCustodian schema:
    • institution_type: ARCHIVE
    • data_source: WEB_SCRAPING (Arkiv.dk)
    • data_tier: TIER_2_VERIFIED (official portal)
    • NO ISIL codes (field should be empty/null)
  4. Generate GHCID identifiers (since no ISIL codes available)
  5. Cross-reference with Rigsarkivet for national/provincial archive details
  6. Export to RDF/JSON-LD
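
Step 3 might look roughly like this. The `institution_type`, `data_source`, and `data_tier` values come from the list above; the remaining keys, including the `is_special_collection` flag, are hypothetical placeholders rather than confirmed schema slots. GHCID generation is omitted because its format is not described here.

```python
import json

def archive_row_to_record(row):
    """Map one archive CSV row to a plain dict with an explicitly null ISIL code."""
    municipality = row["municipality"].strip()
    return {
        "name": row["archive_name"].strip(),
        "municipality": municipality,
        "is_special_collection": municipality == "Specialsamlinger",
        "isil": None,  # Danish archives have no ISIL codes
        "institution_type": "ARCHIVE",
        "data_source": "WEB_SCRAPING",
        "data_tier": "TIER_2_VERIFIED",
    }

record = archive_row_to_record(
    {"municipality": "Specialsamlinger", "archive_name": "Rigsarkivet"}
)
# json.dumps renders None as null, matching the "empty/null" requirement.
print(json.dumps(record, ensure_ascii=False))
```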

Notes on Archive Data

  • Municipality names: All end with "Kommune" (except Specialsamlinger)
  • Local archives: Often named "Lokalhistorisk Arkiv" or "Lokalarkiv"
  • Multiple archives per municipality: Common for larger municipalities (Aarhus, Esbjerg, etc.)
  • Copenhagen: Does NOT have a local archive listed (uses national archives)
  • Specialsamlinger: Includes Rigsarkivet and 28 other specialized archives
  • No contact information: Archive names only, no addresses/emails/phones (would need to scrape individual archive pages)

Summary: Complete Denmark GLAM Dataset

Total institutions extracted: 1,162

| Type | Count | ISIL Codes? | Source |
|------|-------|-------------|--------|
| Libraries | 568 | Yes (DK-XXXXXX) | VIP-basen |
| Archives | 594 | No | Arkiv.dk |

Data Status: COMPLETE for Priority 1 country (Denmark)

Denmark now has comprehensive coverage of both libraries and archives, ready for:

  • LinkML schema mapping
  • GHCID identifier generation
  • RDF export
  • Integration into global GLAM dataset