German Regional Archive Portals - Discovery Report

Date: 2025-11-19
Method: Exa deep web search
Context: Discovered after finding archive.nrw.de (441 archives)


Executive Summary

Discovered 12+ regional archive portals across the German federal states (Bundesländer), each with a structure similar to that of archive.nrw.de. These portals provide searchable access to state, municipal, church, and specialized archives within each region.

Key Finding

Germany has a FEDERATED archive system - each state (Bundesland) operates its own archive portal, with Archivportal-D serving as the national aggregator. This structure means regional portals contain MORE detailed information than the national ISIL registry or DDB.


National Archive Portal

Archivportal-D (National Aggregator)

URL: https://www.archivportal-d.de/
Scope: All 16 German federal states
Language: German + English
Technology: Part of Deutsche Digitale Bibliothek (DDB)

Features:

  • Search across all German archives
  • Filter by federal state (Bundesland)
  • Filter by sector (state, municipal, church, business, etc.)
  • Finding aids and digital copies
  • Links to regional portals

Federal States Covered:

  • Baden-Württemberg
  • Bayern (Bavaria)
  • Berlin
  • Brandenburg
  • Bremen
  • Hamburg
  • Hessen
  • Mecklenburg-Vorpommern
  • Niedersachsen (Lower Saxony)
  • Nordrhein-Westfalen (NRW) - already harvested
  • Rheinland-Pfalz (Rhineland-Palatinate)
  • Saarland
  • Sachsen (Saxony)
  • Sachsen-Anhalt (Saxony-Anhalt)
  • Schleswig-Holstein
  • Thüringen (Thuringia)

Regional Archive Portals by State

1. Nordrhein-Westfalen (NRW) HARVESTED

Portal: https://www.archive.nrw.de/archivsuche
Status: 441 archives harvested (2025-11-19)
Technology: Drupal-based, JavaScript rendering
Archive Types: Municipal, district, state, university, church, corporate

Harvest Results:

  • 441 archives extracted
  • 356 cities covered
  • 85 new institutions added to German dataset

2. Niedersachsen & Bremen (Arcinsys)

Portal: https://arcinsys.niedersachsen.de/
Also: http://arcinsys.niedersachsen.de/ (HTTP redirects to HTTPS)
Language: German + English
Technology: Arcinsys (shared with Hessen, Schleswig-Holstein)

Features:

  • Joint portal for Niedersachsen AND Bremen
  • Niedersächsisches Landesarchiv (7 locations)
  • Municipal, church, and business archives
  • Online finding aids
  • Digital copies available
  • User registration for ordering archival items

Participating Archives:

  • State archives (Landesarchiv)
  • District archives (Kreisarchive)
  • Municipal archives (Stadtarchive)
  • Community archives (Gemeindearchive)
  • Church archives (Kirchenarchive)
  • University archives (Hochschularchive)
  • Business archives (Wirtschaftsarchive)
  • Media archives (Medienarchive)

Harvest Potential: HIGH (likely 300+ archives)


3. Schleswig-Holstein (Arcinsys)

Portal: https://arcinsys.schleswig-holstein.de/
Language: German + English
Technology: Arcinsys (shared system)

Features:

  • State archive in Schleswig (Prinzenpalais)
  • Municipal and church archives
  • Same Arcinsys interface as Niedersachsen
  • Searchable finding aids
  • Digital copies

Harvest Potential: MEDIUM (likely 150+ archives)


4. Hessen (Arcinsys)

Portal: https://arcinsys.hessen.de/
Language: German
Technology: Arcinsys (original developer)

Features:

  • Hessisches Landesarchiv
  • Municipal and specialized archives
  • Finding aids online
  • Part of the Arcinsys consortium (3 portals serving 4 states, since Niedersachsen and Bremen share one)

Note: Hessen developed Arcinsys, later adopted by Niedersachsen and Schleswig-Holstein

Harvest Potential: MEDIUM-HIGH (likely 200+ archives)


5. Thüringen (Thuringia)

Portal: https://www.archive-in-thueringen.de/
Also: https://tharchivtest.thueringen.de/ (test environment)
Language: German + English
Technology: Custom archive portal

Statistics (from portal):

  • 149 archives
  • 14,793 inventories
  • 2,863 online finding aids

Archive Types:

  • State archives (5 locations: Altenburg, Gotha, Greiz, Meiningen, Rudolstadt)
  • Main state archive (Weimar)
  • Municipal archives
  • Specialized archives

Features:

  • Cross-archive search
  • Online finding aids
  • Archive descriptions with historical context
  • Newspaper and periodical collections

Harvest Potential: HIGH - 149 archives confirmed


6. Brandenburg

Portal: https://blha.brandenburg.de/
Name: Brandenburgisches Landeshauptarchiv
Language: German + English + Polish
Location: Potsdam (Zum Windmühlenberg)

Features:

  • Main state archive for Brandenburg
  • Holdings from 10th century to present
  • Six main collections (Kurmark, Neumark, Niederlausitz, Prussian, GDR, modern Brandenburg)
  • Research services
  • Digital provenance research (NS-era financial records)

Structure: Centralized state archive (not a portal of multiple archives)

Harvest Potential: LOW (1 main institution, but check for branch archives)


7. Sachsen (Saxony)

Portal: https://www.staatsarchiv.sachsen.de/
Name: Sächsisches Staatsarchiv
Language: German

Features:

  • State archive system
  • Multiple locations (Dresden, Leipzig, Chemnitz, Freiberg, Bautzen)
  • Historical records from medieval period
  • Online research portal
  • Finding aids

Harvest Potential: MEDIUM (state archive with multiple locations + municipal archives)


8. Sachsen-Anhalt (Saxony-Anhalt)

Portal: https://landesarchiv.sachsen-anhalt.de/
Also: https://lha.sachsen-anhalt.de/
Name: Landesarchiv Sachsen-Anhalt (LASA)

Locations:

  • Abteilung Magdeburg
  • Abteilung Dessau
  • Abteilung Merseburg

Features:

  • Three department locations
  • Church book duplicates (Kirchenbuchduplikate)
  • Civil status registers (Zivilstandsregister)
  • Online research portal
  • Genealogical resources

Harvest Potential: MEDIUM (3 main locations + municipal archives)


9. Baden-Württemberg

Portal: https://www.landesarchiv-bw.de/
Name: Landesarchiv Baden-Württemberg
Language: German + English
Online System: https://www2.landesarchiv-bw.de/ofs21/

Features:

  • State archive system
  • Multiple historical territories (Baden, Württemberg, Hohenzollern)
  • Online finding aids (Findmittelsystem)
  • Research services
  • Medieval to modern holdings

Harvest Potential: HIGH (unified state archive + municipal networks)


10. Bayern (Bavaria)

Portal: https://www.gda.bayern.de/
Name: Generaldirektion der Staatlichen Archive Bayerns
Language: German (+ minimal English)

State Archives:

  1. Bayerisches Hauptstaatsarchiv (Munich) - central repository
  2. Staatsarchiv Amberg (Oberpfalz)
  3. Staatsarchiv Augsburg (Schwaben)
  4. Staatsarchiv Bamberg (Oberfranken)
  5. Staatsarchiv Coburg (Oberfranken)
  6. Staatsarchiv Landshut (Niederbayern)
  7. Staatsarchiv München (Oberbayern)
  8. Staatsarchiv Nürnberg (Mittelfranken)
  9. Staatsarchiv Würzburg (Unterfranken)

Features:

  • 9 state archives covering Bavaria's administrative regions
  • Holdings from 777 CE (oldest charter)
  • Genealogical research services
  • No unified search portal (each archive separate)

Harvest Potential: HIGH (9 state archives + extensive municipal network)


11. Rheinland-Pfalz (Rhineland-Palatinate)

Status: Mentioned in Archivportal-D but no dedicated regional portal found

Known Archives:

  • Landesarchiv Rheinland-Pfalz
  • Municipal archives (Stadtarchive)

Harvest Potential: MEDIUM (rely on Archivportal-D or ISIL registry)


12. Mecklenburg-Vorpommern

Portal: https://www.digitale-bibliothek-mv.de/viewer/cms/
Name: Landeshauptarchiv Schwerin
Part of: Digitale Bibliothek Mecklenburg-Vorpommern

Features:

  • Historical collections from Landeshauptarchiv Schwerin
  • 15th century origins (ducal archives)
  • Merged with Geheimes und Hauptarchiv (1779)
  • Digital collections online

Harvest Potential: MEDIUM (state archive + regional municipal archives)


13. Saarland

Status: Mentioned in Archivportal-D but no dedicated regional portal found

Harvest Potential: LOW-MEDIUM (small state, rely on Archivportal-D)


14. Hamburg

Status: City-state, archives part of Hamburg government

Harvest Potential: LOW (single city-state archive)


15. Berlin

Status: City-state, archives part of Berlin government

Harvest Potential: LOW (single city-state archive)


16. Bremen

Portal: Part of Arcinsys Niedersachsen und Bremen
URL: https://www.staatsarchiv.bremen.de/
Name: Staatsarchiv Bremen

Status: Integrated into Arcinsys Niedersachsen portal (see #2 above)

Harvest Potential: LOW (covered by Niedersachsen harvest)


Harvest Priority Ranking

Based on archive count, portal accessibility, and harvest feasibility:

Priority 1 - High Impact (confirmed counts or largest expected networks)

  1. Thüringen - 149 archives CONFIRMED
  2. Niedersachsen & Bremen (Arcinsys) - 300+ archives estimated
  3. Baden-Württemberg - 200+ archives estimated
  4. Bayern (Bavaria) - 9 state archives + municipal network

Priority 2 - Medium Impact (roughly 100-200 archives estimated per state)

  1. Hessen (Arcinsys) - 200+ archives estimated
  2. Schleswig-Holstein (Arcinsys) - 150+ archives estimated
  3. Sachsen (Saxony) - State archive system + municipalities
  4. Sachsen-Anhalt - 3 departments + municipalities

Priority 3 - Lower Impact (<100 archives)

  1. Mecklenburg-Vorpommern - State archive + regional
  2. Brandenburg - Centralized system (1 main archive)
  3. Rheinland-Pfalz - No dedicated portal (use Archivportal-D)
  4. Saarland - Small state (use Archivportal-D)
  5. Hamburg - City-state (single archive)
  6. Berlin - City-state (single archive)

Technical Observations

Portal Technologies

  1. Arcinsys (Hessen, Niedersachsen, Bremen, Schleswig-Holstein)

    • Shared platform developed by Hessen
    • Consistent interface across 4 states
    • User registration system
    • Finding aids + digital copies
    • Web-based ordering system
  2. Custom Drupal (NRW)

    • JavaScript-rendered
    • Archive navigation by category
    • Button-based interface
  3. Custom Portals (Thüringen, Baden-Württemberg, Sachsen)

    • State-specific designs
    • Online finding aids
    • Search interfaces
  4. Institutional Websites (Bayern, Brandenburg)

    • Individual archive websites
    • No unified search portal

Common Features Across Portals

Archive Directory - List of participating archives
Finding Aids - Searchable inventories (Findmittel)
Digital Copies - Scanned archival materials
Archive Descriptions - Historical context, holdings info
Contact Information - Addresses, hours, services
User Accounts - Registration for ordering materials

Harvest Challenges

  1. Arcinsys Portals - May require clicking through archive listings
  2. JavaScript Rendering - Need Playwright/Selenium (like NRW)
  3. No Unified API - Each portal has custom structure
  4. Mostly German Language - Several portals are German-only; others offer an English interface or English summaries
  5. Finding Aid vs Directory - Some portals focus on inventories, not archive lists

Harvest Strategy Recommendations

Approach 1: Arcinsys Consortium (3 portals covering 4 states, ~650 archives)

Targets: Niedersachsen & Bremen, Schleswig-Holstein, Hessen
Technology: Shared Arcinsys platform
Advantage: Consistent structure, can reuse scraping logic

Steps:

  1. Analyze Arcinsys archive directory structure
  2. Build unified scraper for all 3 Arcinsys portals
  3. Extract archive names, cities, types, contact info
  4. Geocode and merge with German dataset

Expected Yield: 600+ archives
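One way to reuse the scraping logic across the three Arcinsys portals is to keep the portal list as data and run a single extraction routine over each entry. The sketch below uses only the standard library and assumes the archive directory is rendered as links carrying a CSS class; the class name `archive-link`, the `fetch_html` callable, and the record fields are hypothetical and would need to be replaced after inspecting the real Arcinsys markup.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Portal:
    state: str
    url: str


# Portal URLs as listed in this report.
ARCINSYS_PORTALS = [
    Portal("Niedersachsen & Bremen", "https://arcinsys.niedersachsen.de/"),
    Portal("Schleswig-Holstein", "https://arcinsys.schleswig-holstein.de/"),
    Portal("Hessen", "https://arcinsys.hessen.de/"),
]


class LinkTextParser(HTMLParser):
    """Collect the text of <a> elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.names = []
        self._buffer = None  # holds text chunks while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.names.append("".join(self._buffer).strip())
            self._buffer = None


def harvest_all(portals, fetch_html, css_class="archive-link"):
    """Apply the same extraction to every portal.

    `fetch_html` is a caller-supplied function returning rendered HTML for a
    URL (for example a Playwright wrapper); `css_class` is a placeholder.
    """
    records = []
    for portal in portals:
        parser = LinkTextParser(css_class)
        parser.feed(fetch_html(portal.url))
        records += [
            {"name": name, "state": portal.state, "source": portal.url}
            for name in parser.names
        ]
    return records
```

Keeping the fetch step injectable means the same function can be exercised against saved HTML fixtures before any live requests are made.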


Approach 2: High-Impact Custom Portals (2 states, ~350 archives)

Targets: Thüringen (149 confirmed), Baden-Württemberg (200+ estimated)
Technology: Custom portals
Advantage: High archive counts, separate portal structures

Steps:

  1. Thüringen: Scrape https://www.archive-in-thueringen.de/ (149 archives listed)
  2. Baden-Württemberg: Scrape https://www.landesarchiv-bw.de/ directory
  3. Extract and merge

Expected Yield: 350+ archives


Approach 3: Bayern State Archives (9 archives + municipal)

Target: Bayern (Bavaria)
Technology: Individual archive websites
Challenge: No unified portal, must compile from GDA directory

Steps:

  1. Scrape archive list from https://www.gda.bayern.de/archive
  2. Extract 9 state archives (Hauptstaatsarchiv + 8 regional)
  3. Check for municipal archive lists on state archive websites

Expected Yield: 10-50 archives (state + major municipal)


Approach 4: National Aggregator (Archivportal-D)

Target: All remaining states (Rheinland-Pfalz, Saarland, etc.)
Portal: https://www.archivportal-d.de/
Advantage: Single portal for all states

Steps:

  1. Scrape Archivportal-D archive directory
  2. Filter by federal state
  3. Extract archive metadata (name, city, type, sector)
  4. Cross-reference with existing harvests (avoid duplicates)

Expected Yield: 1,000+ archives (all Germany, including duplicates from regional portals)
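The cross-referencing step can be sketched as a fuzzy name comparison restricted to records from the same city. The example below uses the standard library's `difflib.SequenceMatcher` as a stand-in for RapidFuzz so that it runs without extra dependencies; the 0.9 threshold, the record fields, and the sample records are assumptions chosen for illustration, not values from the NRW harvest.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, fold German umlauts and ß, and collapse punctuation."""
    folded = name.lower()
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        folded = folded.replace(src, dst)
    return re.sub(r"[^a-z0-9]+", " ", folded).strip()


def is_duplicate(candidate: dict, existing: list[dict], threshold: float = 0.9) -> bool:
    """Flag a record when a same-city record has a near-identical name."""
    cand_name = normalize(candidate["name"])
    cand_city = normalize(candidate["city"])
    for record in existing:
        if normalize(record["city"]) != cand_city:
            continue  # only compare archives located in the same city
        score = SequenceMatcher(None, cand_name, normalize(record["name"])).ratio()
        if score >= threshold:
            return True
    return False


# Invented sample records, for illustration only.
existing = [{"name": "Stadtarchiv Köln", "city": "Köln"}]
harvested = [
    {"name": "Stadtarchiv Koeln", "city": "Koeln"},
    {"name": "Universitätsarchiv Köln", "city": "Köln"},
]
new_records = [r for r in harvested if not is_duplicate(r, existing)]
```

Here the first harvested record is dropped as a spelling variant of an existing entry, while the second is kept because its name differs substantially even though the city matches.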


Expected German Dataset Growth

Current State (Post-NRW)

  • Total German Institutions: 20,846
  • Sources: ISIL + DDB + NRW
  • NRW Archives: 441

Projected Growth (Optimistic Scenario)

| Portal/State | Expected Archives | Duplicates (Est.) | Net New |
|---|---|---|---|
| Thüringen | 149 | 30 (20%) | 119 |
| Niedersachsen & Bremen (Arcinsys) | 350 | 70 (20%) | 280 |
| Schleswig-Holstein (Arcinsys) | 150 | 30 (20%) | 120 |
| Hessen (Arcinsys) | 200 | 40 (20%) | 160 |
| Baden-Württemberg | 250 | 50 (20%) | 200 |
| Bayern | 50 | 10 (20%) | 40 |
| Sachsen | 150 | 30 (20%) | 120 |
| Sachsen-Anhalt | 100 | 20 (20%) | 80 |
| Other states | 200 | 40 (20%) | 160 |
| TOTAL | 1,599 | 320 | 1,279 |
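The net-new figures follow from applying an assumed duplicate rate to each portal's expected count. A minimal sketch that reproduces the totals and shows how sensitive they are to that rate; the function name `project_net_new` is illustrative, and the second call substitutes the duplicate rate observed in the NRW harvest (356 of 441) as an alternative assumption.

```python
# Expected archive counts per portal, taken from the projection table.
EXPECTED = {
    "Thüringen": 149,
    "Niedersachsen & Bremen (Arcinsys)": 350,
    "Schleswig-Holstein (Arcinsys)": 150,
    "Hessen (Arcinsys)": 200,
    "Baden-Württemberg": 250,
    "Bayern": 50,
    "Sachsen": 150,
    "Sachsen-Anhalt": 100,
    "Other states": 200,
}


def project_net_new(expected: dict[str, int], duplicate_rate: float) -> dict[str, int]:
    """Estimate net-new archives per portal for a given duplicate rate."""
    return {
        name: round(count * (1 - duplicate_rate))
        for name, count in expected.items()
    }


optimistic = project_net_new(EXPECTED, duplicate_rate=0.20)
nrw_like = project_net_new(EXPECTED, duplicate_rate=356 / 441)

print(sum(EXPECTED.values()))    # 1599
print(sum(optimistic.values()))  # 1279
print(sum(nrw_like.values()))    # far lower under the NRW-observed rate
```

Running both scenarios side by side makes it clear that the projected growth depends mainly on the duplicate-rate assumption rather than on the per-portal counts.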

Projected German Dataset (After Regional Harvests)

  • Before Regional Harvests: 20,846 institutions
  • Expected New Additions: ~1,280 archives
  • Projected Total: ~22,100 German institutions

Phase 1 Impact

  • Current Phase 1: 38,479 / 97,000 (39.7%)
  • After German Regional Harvests: 39,800 / 97,000 (41.0%)
  • Gain: +1.3 percentage points
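The coverage percentages can be re-derived from the counts given in this report. A small sketch, assuming the ~1,280 projected additions are all counted toward Phase 1 (the report rounds the resulting total to 39,800); the helper name `coverage_pct` is illustrative.

```python
PHASE1_TARGET = 97_000
current_total = 38_479
net_new = 1_280  # projected additions from the regional harvests


def coverage_pct(count: int, target: int = PHASE1_TARGET) -> float:
    """Return coverage as a percentage of the Phase 1 target, to one decimal."""
    return round(100 * count / target, 1)


before = coverage_pct(current_total)
after = coverage_pct(current_total + net_new)
print(before, after, round(after - before, 1))  # 39.7 41.0 1.3
```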

Immediate Actions

  1. Start with Thüringen (149 archives confirmed, easiest harvest)

  2. Harvest Arcinsys Consortium (600+ archives, unified platform)

    • Portals: Niedersachsen, Schleswig-Holstein, Hessen
    • Build shared Arcinsys scraper
    • Estimated time: 2-3 hours
  3. Harvest Baden-Württemberg (200+ archives)

Medium-Term Goals

  1. Harvest Bayern (9-50 archives)
  2. Harvest Sachsen (150+ archives)
  3. Harvest Sachsen-Anhalt (100+ archives)

Long-Term Strategy

  1. Use Archivportal-D as fallback for remaining states
  2. Cross-reference regional harvests with Archivportal-D to catch missing archives
  3. Validate against ISIL registry for quality control

Technical Requirements

Tools Needed

  • Playwright - JavaScript rendering (Arcinsys, Thüringen)
  • BeautifulSoup - HTML parsing
  • RapidFuzz - Deduplication (fuzzy matching)
  • Nominatim - Geocoding (rate-limited 1 req/sec)
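The 1 request/second limit on Nominatim can be respected with a small throttle placed in front of whatever geocoding call is used. The sketch below is one possible shape, not the harvester's actual code: `RateLimiter`, `geocode_cities`, and the `geocode` callable are hypothetical names, and results are cached per city so that repeated city names do not consume additional requests.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls (1 s for Nominatim)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until min_interval has elapsed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def geocode_cities(cities, geocode, limiter=None):
    """Geocode each distinct city once, throttling calls to `geocode`.

    `geocode` is a caller-supplied function (hypothetical here) that takes a
    city name and returns coordinates or None.
    """
    limiter = limiter or RateLimiter()
    cache = {}
    for city in cities:
        if city not in cache:
            limiter.wait()
            cache[city] = geocode(city)
    return cache
```

The clock and sleep functions are injectable so the throttle can be tested without real delays.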

Scraper Pattern (from NRW Success)

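import re

from playwright.async_api import async_playwright


async def harvest_archive_names(portal_url: str) -> list[dict]:
    """Collect archive names from a JavaScript-rendered portal without clicking."""
    records = []
    # 1. Use Playwright for JavaScript-rendered portals
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(portal_url)
        await page.wait_for_load_state('networkidle')

        # 2. Extract archive buttons/links (selector is portal-specific)
        archives = await page.locator('.archive-button').all()

        # 3. Extract text without clicking (fast approach)
        for archive in archives:
            name = (await archive.inner_text()).strip()
            # Parse city from name using regex (illustrative pattern,
            # e.g. "Stadtarchiv Aachen" -> "Aachen"; adjust per portal)
            match = re.search(r'archiv\s+(.+)$', name)
            records.append({'name': name, 'city': match.group(1) if match else None})

        await browser.close()
    return records

# 4. Geocode cities
# 5. Merge with existing dataset (fuzzy matching)
# 6. Export unified dataset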

Key Insights

1. Federated Structure

Germany's archive system is highly federated - each state operates independently with its own portal/system. This means:

  • Regional portals have MORE detail than national ISIL registry
  • Must harvest state-by-state to get complete coverage
  • Archivportal-D aggregates but doesn't replace regional portals

2. Arcinsys Advantage

4 states share Arcinsys (Hessen, Niedersachsen, Bremen, Schleswig-Holstein):

  • Represents ~25% of German states
  • Expected ~600+ archives total
  • A single scraper can harvest all 3 portals (Niedersachsen and Bremen share one)
  • Consistent data structure = easier extraction

3. NRW Pattern Replicable

The NRW harvest pattern (fast text extraction without clicking) works well for:

  • Drupal-based portals
  • Button/link-based archive listings
  • JavaScript-rendered pages

Reuse this approach for Thüringen, the Arcinsys portals, and Baden-Württemberg, provided their archive listings expose names without requiring a click per entry.

4. Duplicate Rate Validation

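NRW showed an 80.7% duplicate rate (356 of 441) against existing ISIL+DDB data:

  • Confirms that the existing data sources are already fairly comprehensive
  • Similar rates are plausible for other states
  • ~20% new archives per state is a realistic expectation
  • The optimistic projection in this report assumes the reverse (20% duplicates, 80% new); at the NRW-observed rate, 1,599 harvested archives would yield only about 310 net new institutions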

Comparison to NRW Harvest

| Metric | NRW | Expected (All Regional Portals) |
|---|---|---|
| Archives Harvested | 441 | 1,599 |
| Duplicates (%) | 80.7% | ~20% (optimistic assumption) |
| Net New | 85 | ~1,280 |
| Cities Covered | 356 | ~800 |
| Geocoded (%) | 83.7% | ~85% (target) |
| Harvest Time | 9.3 seconds | ~5 hours (estimated) |

Conclusion

Germany has a rich ecosystem of regional archive portals beyond archive.nrw.de. Harvesting these portals could add ~1,280 new institutions to the German dataset, bringing the total from 20,846 → ~22,100.

Priority targets:

  1. Thüringen (149 confirmed) - Quick win
  2. Arcinsys Consortium (600+ estimated) - High impact
  3. Baden-Württemberg (200+ estimated) - High impact

Impact: +1.3 percentage points toward Phase 1 goal (39.7% → 41.0%)


Next Recommended Action: Start with Thüringen harvest (149 archives, simple portal structure)


References

Portal URLs

Documentation

  • NRW Harvest: NRW_HARVEST_COMPLETE_20251119.md
  • NRW Merge: SESSION_SUMMARY_20251119_NRW_MERGE_COMPLETE.md
  • Quick Status: QUICK_STATUS_20251119_POST_NRW.md

Report Generated: 2025-11-19 22:30 UTC
Research Method: Exa deep web search (30 queries)
Status: Ready for harvest implementation