German Regional Archive Portals - Discovery Report
Date: 2025-11-19
Method: Exa deep web search
Context: Discovered after finding archive.nrw.de (441 archives)
Executive Summary
Identified 12+ regional archive portals and state archive websites across the German federal states (Bundesländer). Several, notably the Arcinsys portals and Thüringen, are structured similarly to archive.nrw.de. These portals provide searchable access to state, municipal, church, and specialized archives within each region.
Key Finding
Germany has a FEDERATED archive system. Each state (Bundesland) runs its own archive administration, and most states operate their own archive portal, with Archivportal-D serving as the national aggregator. As a result, regional portals contain MORE detailed information than the national ISIL registry or DDB.
National Archive Portal
Archivportal-D (National Aggregator)
URL: https://www.archivportal-d.de/
Scope: All 16 German federal states
Language: German + English
Technology: Part of Deutsche Digitale Bibliothek (DDB)
Features:
- Search across all German archives
- Filter by federal state (Bundesland)
- Filter by sector (state, municipal, church, business, etc.)
- Finding aids and digital copies
- Links to regional portals
Federal States Covered:
- Baden-Württemberg
- Bayern (Bavaria)
- Berlin
- Brandenburg
- Bremen
- Hamburg
- Hessen
- Mecklenburg-Vorpommern
- Niedersachsen (Lower Saxony)
- Nordrhein-Westfalen (NRW) ✅ Already harvested
- Rheinland-Pfalz (Rhineland-Palatinate)
- Saarland
- Sachsen (Saxony)
- Sachsen-Anhalt (Saxony-Anhalt)
- Schleswig-Holstein
- Thüringen (Thuringia)
Regional Archive Portals by State
1. Nordrhein-Westfalen (NRW) ✅ HARVESTED
Portal: https://www.archive.nrw.de/archivsuche
Status: ✅ 441 archives harvested (2025-11-19)
Technology: Drupal-based, JavaScript rendering
Archive Types: Municipal, district, state, university, church, corporate
Harvest Results:
- 441 archives extracted
- 356 cities covered
- 85 new institutions added to German dataset
2. Niedersachsen & Bremen (Arcinsys)
Portal: https://arcinsys.niedersachsen.de/
Also: http://arcinsys.niedersachsen.de/ (HTTP redirects to HTTPS)
Language: German + English
Technology: Arcinsys (shared with Hessen, Schleswig-Holstein)
Features:
- Joint portal for Niedersachsen AND Bremen
- Niedersächsisches Landesarchiv (7 locations)
- Municipal, church, and business archives
- Online finding aids
- Digital copies available
- User registration for ordering archival items
Participating Archives:
- State archives (Landesarchiv)
- District archives (Kreisarchive)
- Municipal archives (Stadtarchive)
- Community archives (Gemeindearchive)
- Church archives (Kirchenarchive)
- University archives (Hochschularchive)
- Business archives (Wirtschaftsarchive)
- Media archives (Medienarchive)
Harvest Potential: HIGH (likely 300+ archives)
3. Schleswig-Holstein (Arcinsys)
Portal: https://arcinsys.schleswig-holstein.de/
Language: German + English
Technology: Arcinsys (shared system)
Features:
- State archive in Schleswig (Prinzenpalais)
- Municipal and church archives
- Same Arcinsys interface as Niedersachsen
- Searchable finding aids
- Digital copies
Harvest Potential: MEDIUM (likely 150+ archives)
4. Hessen (Arcinsys)
Portal: https://arcinsys.hessen.de/
Language: German
Technology: Arcinsys (original developer)
Features:
- Hessisches Landesarchiv
- Municipal and specialized archives
- Finding aids online
- Part of the Arcinsys consortium (Hessen, Niedersachsen/Bremen, Schleswig-Holstein)
Note: Hessen developed Arcinsys, later adopted by Niedersachsen and Schleswig-Holstein
Harvest Potential: MEDIUM-HIGH (likely 200+ archives)
5. Thüringen (Thuringia)
Portal: https://www.archive-in-thueringen.de/
Also: https://tharchivtest.thueringen.de/ (test environment)
Language: German + English
Technology: Custom archive portal
Statistics (from portal):
- 149 archives
- 14,793 inventories
- 2,863 online finding aids
Archive Types:
- State archives (5 locations: Altenburg, Gotha, Greiz, Meiningen, Rudolstadt)
- Main state archive (Weimar)
- Municipal archives
- Specialized archives
Features:
- Cross-archive search
- Online finding aids
- Archive descriptions with historical context
- Newspaper and periodical collections
Harvest Potential: HIGH - 149 archives confirmed
6. Brandenburg
Portal: https://blha.brandenburg.de/
Name: Brandenburgisches Landeshauptarchiv
Language: German + English + Polish
Location: Potsdam (Zum Windmühlenberg)
Features:
- Main state archive for Brandenburg
- Holdings from 10th century to present
- Six main collections (Kurmark, Neumark, Niederlausitz, Prussian, GDR, modern Brandenburg)
- Research services
- Digital provenance research (NS-era financial records)
Structure: Centralized state archive (not a portal of multiple archives)
Harvest Potential: LOW (1 main institution, but check for branch archives)
7. Sachsen (Saxony)
Portal: https://www.staatsarchiv.sachsen.de/
Name: Sächsisches Staatsarchiv
Language: German
Features:
- State archive system
- Multiple locations (Dresden, Leipzig, Chemnitz, Freiberg, Bautzen)
- Historical records from the medieval period onward
- Online research portal
- Finding aids
Harvest Potential: MEDIUM (state archive with multiple locations + municipal archives)
8. Sachsen-Anhalt (Saxony-Anhalt)
Portal: https://landesarchiv.sachsen-anhalt.de/
Also: https://lha.sachsen-anhalt.de/
Name: Landesarchiv Sachsen-Anhalt (LASA)
Locations:
- Abteilung Magdeburg
- Abteilung Dessau
- Abteilung Merseburg
Features:
- Three department locations
- Church book duplicates (Kirchenbuchduplikate)
- Civil status registers (Zivilstandsregister)
- Online research portal
- Genealogical resources
Harvest Potential: MEDIUM (3 main locations + municipal archives)
9. Baden-Württemberg
Portal: https://www.landesarchiv-bw.de/
Name: Landesarchiv Baden-Württemberg
Language: German + English
Online System: https://www2.landesarchiv-bw.de/ofs21/
Features:
- State archive system
- Multiple historical territories (Baden, Württemberg, Hohenzollern)
- Online finding aids (Findmittelsystem)
- Research services
- Medieval to modern holdings
Harvest Potential: HIGH (unified state archive + municipal networks)
10. Bayern (Bavaria)
Portal: https://www.gda.bayern.de/
Name: Generaldirektion der Staatlichen Archive Bayerns
Language: German (+ minimal English)
State Archives:
- Bayerisches Hauptstaatsarchiv (Munich) - central repository
- Staatsarchiv Amberg (Oberpfalz)
- Staatsarchiv Augsburg (Schwaben)
- Staatsarchiv Bamberg (Oberfranken)
- Staatsarchiv Coburg (Oberfranken)
- Staatsarchiv Landshut (Niederbayern)
- Staatsarchiv München (Oberbayern)
- Staatsarchiv Nürnberg (Mittelfranken)
- Staatsarchiv Würzburg (Unterfranken)
Features:
- 9 state archives covering Bavaria's administrative regions
- Holdings from 777 CE (oldest charter)
- Genealogical research services
- No unified search portal (each archive separate)
Harvest Potential: HIGH (9 state archives + extensive municipal network)
11. Rheinland-Pfalz (Rhineland-Palatinate)
Status: Mentioned in Archivportal-D but no dedicated regional portal found
Known Archives:
- Landesarchiv Rheinland-Pfalz
- Municipal archives (Stadtarchive)
Harvest Potential: MEDIUM (rely on Archivportal-D or ISIL registry)
12. Mecklenburg-Vorpommern
Portal: https://www.digitale-bibliothek-mv.de/viewer/cms/
Name: Landeshauptarchiv Schwerin
Part of: Digitale Bibliothek Mecklenburg-Vorpommern
Features:
- Historical collections from Landeshauptarchiv Schwerin
- 15th century origins (ducal archives)
- Merged with Geheimes und Hauptarchiv (1779)
- Digital collections online
Harvest Potential: MEDIUM (state archive + regional municipal archives)
13. Saarland
Status: Mentioned in Archivportal-D but no dedicated regional portal found
Harvest Potential: LOW-MEDIUM (small state, rely on Archivportal-D)
14. Hamburg
Status: City-state, archives part of Hamburg government
Harvest Potential: LOW (single city-state archive)
15. Berlin
Status: City-state, archives part of Berlin government
Harvest Potential: LOW (single city-state archive)
16. Bremen
Portal: Part of Arcinsys Niedersachsen und Bremen
URL: https://www.staatsarchiv.bremen.de/
Name: Staatsarchiv Bremen
Status: Integrated into Arcinsys Niedersachsen portal (see #2 above)
Harvest Potential: LOW (covered by Niedersachsen harvest)
Harvest Priority Ranking
Based on archive count, portal accessibility, and harvest feasibility:
Priority 1 - High Impact (300+ archives expected)
- Thüringen ⭐ - 149 archives CONFIRMED
- Niedersachsen & Bremen (Arcinsys) ⭐ - 300+ archives estimated
- Baden-Württemberg ⭐ - 200+ archives estimated
- Bayern (Bavaria) - 9 state archives + municipal network
Priority 2 - Medium Impact (100-200 archives)
- Hessen (Arcinsys) - 200+ archives estimated
- Schleswig-Holstein (Arcinsys) - 150+ archives estimated
- Sachsen (Saxony) - State archive system + municipalities
- Sachsen-Anhalt - 3 departments + municipalities
Priority 3 - Lower Impact (<100 archives)
- Mecklenburg-Vorpommern - State archive + regional
- Brandenburg - Centralized system (1 main archive)
- Rheinland-Pfalz - No dedicated portal (use Archivportal-D)
- Saarland - Small state (use Archivportal-D)
- Hamburg - City-state (single archive)
- Berlin - City-state (single archive)
Technical Observations
Portal Technologies
- Arcinsys (Hessen, Niedersachsen, Bremen, Schleswig-Holstein)
  - Shared platform developed by Hessen
  - Consistent interface across 4 states
  - User registration system
  - Finding aids + digital copies
  - Web-based ordering system
- Custom Drupal (NRW)
  - JavaScript-rendered
  - Archive navigation by category
  - Button-based interface
- Custom Portals (Thüringen, Baden-Württemberg, Sachsen)
  - State-specific designs
  - Online finding aids
  - Search interfaces
- Institutional Websites (Bayern, Brandenburg)
  - Individual archive websites
  - No unified search portal
Common Features Across Portals
✅ Archive Directory - List of participating archives
✅ Finding Aids - Searchable inventories (Findmittel)
✅ Digital Copies - Scanned archival materials
✅ Archive Descriptions - Historical context, holdings info
✅ Contact Information - Addresses, hours, services
✅ User Accounts - Registration for ordering materials
Harvest Challenges
- Arcinsys Portals - May require clicking through archive listings
- JavaScript Rendering - Need Playwright/Selenium (like NRW)
- No Unified API - Each portal has custom structure
- Predominantly German - Most portal content is German-only; where English is offered, it is mostly limited to summaries
- Finding Aid vs Directory - Some portals focus on inventories, not archive lists
Harvest Strategy Recommendations
Approach 1: Arcinsys Consortium (3 portals covering 4 states, ~650 archives)
Targets: Niedersachsen & Bremen, Schleswig-Holstein, Hessen
Technology: Shared Arcinsys platform
Advantage: Consistent structure, can reuse scraping logic
Steps:
- Analyze Arcinsys archive directory structure
- Build unified scraper for all 3 Arcinsys portals
- Extract archive names, cities, types, contact info
- Geocode and merge with German dataset
Expected Yield: 600+ archives
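Reusing one scraper pays off most in the "Extract archive names, cities, types, contact info" step, because all three portals can feed a single record shape. The sketch below is minimal and assumes that archive type can be guessed from keywords in the German name. `ArchiveRecord`, `classify_type`, `to_record` and the keyword table are all hypothetical. The keywords follow the participating-archive categories listed for Niedersachsen. The real Arcinsys directory markup still has to be analyzed first, as the first step says.

```python
from dataclasses import dataclass
from typing import Optional

# The three Arcinsys portals from this report; Bremen is served by the
# Niedersachsen portal.
ARCINSYS_PORTALS = {
    "Niedersachsen & Bremen": "https://arcinsys.niedersachsen.de/",
    "Schleswig-Holstein": "https://arcinsys.schleswig-holstein.de/",
    "Hessen": "https://arcinsys.hessen.de/",
}

# Hypothetical keyword table (German name fragment -> archive type).
# Checked in order; the first match wins, so church keywords precede
# "kreisarchiv" (a Kirchenkreisarchiv is a church archive).
TYPE_KEYWORDS = [
    ("kirchenkreisarchiv", "church"),
    ("kirchenarchiv", "church"),
    ("landesarchiv", "state"),
    ("staatsarchiv", "state"),
    ("kreisarchiv", "district"),
    ("stadtarchiv", "municipal"),
    ("gemeindearchiv", "community"),
    ("hochschularchiv", "university"),
    ("universitätsarchiv", "university"),
    ("wirtschaftsarchiv", "business"),
    ("medienarchiv", "media"),
]


@dataclass
class ArchiveRecord:
    """Common record shape shared by all three Arcinsys portals."""
    name: str
    source_portal: str
    city: Optional[str] = None
    archive_type: Optional[str] = None


def classify_type(name: str) -> Optional[str]:
    """Guess the archive type from keywords in its German name, or None."""
    lowered = name.lower()
    for keyword, archive_type in TYPE_KEYWORDS:
        if keyword in lowered:
            return archive_type
    return None


def to_record(name: str, portal: str) -> ArchiveRecord:
    """Build a record for a name scraped from one of the Arcinsys portals."""
    if portal not in ARCINSYS_PORTALS:
        raise ValueError(f"unknown Arcinsys portal: {portal}")
    return ArchiveRecord(name=name.strip(), source_portal=portal,
                         archive_type=classify_type(name))
```

Names that match no keyword stay untyped, so they can go to manual review instead of being forced into a category. The `city` field is left empty here for a later parsing and geocoding pass.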
Approach 2: High-Impact Custom Portals (2 states, ~350 archives)
Targets: Thüringen (149 confirmed), Baden-Württemberg (200+ estimated)
Technology: Custom portals
Advantage: High archive counts (but each portal has its own structure and needs a separate scraper)
Steps:
- Thüringen: Scrape https://www.archive-in-thueringen.de/ (149 archives listed)
- Baden-Württemberg: Scrape https://www.landesarchiv-bw.de/ directory
- Extract and merge
Expected Yield: 350+ archives
Approach 3: Bayern State Archives (9 archives + municipal)
Target: Bayern (Bavaria)
Technology: Individual archive websites
Challenge: No unified portal, must compile from GDA directory
Steps:
- Scrape archive list from https://www.gda.bayern.de/archive
- Extract 9 state archives (Hauptstaatsarchiv + 8 regional)
- Check for municipal archive lists on state archive websites
Expected Yield: 10-50 archives (state + major municipal)
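Bayern has no unified portal, so the first step amounts to pulling links out of an ordinary HTML page. The sketch below is minimal. It uses the standard-library `html.parser` in place of BeautifulSoup so that it runs without extra packages. It assumes the GDA page lists archives as `<a>` links whose text contains "Staatsarchiv", which also matches "Hauptstaatsarchiv". That markup assumption is unverified, and the names `StaatsarchivLinkParser` and `extract_state_archives` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class StaatsarchivLinkParser(HTMLParser):
    """Collect (link text, href) for links whose text names a state archive."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Collect text only while inside an <a> that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            # "staatsarchiv" also matches "Bayerisches Hauptstaatsarchiv"
            if "staatsarchiv" in text.lower():
                self.links.append((text, self._href))
            self._href = None


def extract_state_archives(html: str) -> List[Tuple[str, str]]:
    """Return (name, href) pairs for state-archive links in a directory page."""
    parser = StaatsarchivLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

To use it, fetch the HTML of https://www.gda.bayern.de/archive with any HTTP client and feed it to `extract_state_archives`. Relative `href` values can then be resolved with `urllib.parse.urljoin` before you visit each archive's site for the third step.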
Approach 4: National Aggregator (Archivportal-D)
Target: All remaining states (Rheinland-Pfalz, Saarland, etc.)
Portal: https://www.archivportal-d.de/
Advantage: Single portal for all states
Steps:
- Scrape Archivportal-D archive directory
- Filter by federal state
- Extract archive metadata (name, city, type, sector)
- Cross-reference with existing harvests (avoid duplicates)
Expected Yield: 1,000+ archives (all Germany, including duplicates from regional portals)
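Cross-referencing against existing harvests is the same fuzzy-matching problem as the NRW merge. The sketch below is minimal and uses the standard library's `difflib.SequenceMatcher` as a stand-in for RapidFuzz, so it runs without extra packages. The record keys (`name`, `city`), the same-city rule and the 0.9 similarity threshold are all assumptions. They should be tuned against known duplicates and were not taken from the NRW harvest.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, spell out umlauts and ß, strip other accents, collapse spaces."""
    text = unicodedata.normalize("NFC", text).lower().replace("ß", "ss")
    for umlaut, replacement in (("ä", "ae"), ("ö", "oe"), ("ü", "ue")):
        text = text.replace(umlaut, replacement)
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return " ".join(text.split())


def is_duplicate(candidate: dict, existing: list, threshold: float = 0.9) -> bool:
    """True if an existing record in the same city has a near-identical name."""
    name = normalize(candidate["name"])
    city = normalize(candidate["city"])
    for record in existing:
        if normalize(record["city"]) != city:
            continue
        if SequenceMatcher(None, name, normalize(record["name"])).ratio() >= threshold:
            return True
    return False


def new_records(harvested: list, existing: list) -> list:
    """Keep only harvested records with no fuzzy match in the existing dataset."""
    return [r for r in harvested if not is_duplicate(r, existing)]
```

Requiring an exact (normalized) city match before comparing names keeps "Kreisarchiv Wesel" and "Stadtarchiv Wesel" apart, since their similarity is about 0.76. It still catches spelling variants such as "Düsseldorf" vs. "Duesseldorf".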
Expected German Dataset Growth
Current State (Post-NRW)
- Total German Institutions: 20,846
- Sources: ISIL + DDB + NRW
- NRW Archives: 441
Projected Growth (Optimistic Scenario: assumes 20% duplicates, versus 80.7% observed for NRW)
| Portal/State | Expected Archives | Duplicates (Est.) | Net New |
|---|---|---|---|
| Thüringen | 149 | 30 (20%) | 119 |
| Niedersachsen & Bremen (Arcinsys) | 350 | 70 (20%) | 280 |
| Schleswig-Holstein (Arcinsys) | 150 | 30 (20%) | 120 |
| Hessen (Arcinsys) | 200 | 40 (20%) | 160 |
| Baden-Württemberg | 250 | 50 (20%) | 200 |
| Bayern | 50 | 10 (20%) | 40 |
| Sachsen | 150 | 30 (20%) | 120 |
| Sachsen-Anhalt | 100 | 20 (20%) | 80 |
| Other states | 200 | 40 (20%) | 160 |
| TOTAL | 1,599 | 320 | 1,279 |
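Each "Net New" cell is the expected count minus its rounded duplicate estimate. The table assumes a 20% duplicate rate, but the NRW harvest observed 80.7%, so it is worth recomputing the projection under both rates. In the small sketch below, the counts and Phase 1 figures are copied from this report, and the function names are illustrative only.

```python
# Expected archive counts copied from the projection table.
EXPECTED = {
    "Thüringen": 149,
    "Niedersachsen & Bremen (Arcinsys)": 350,
    "Schleswig-Holstein (Arcinsys)": 150,
    "Hessen (Arcinsys)": 200,
    "Baden-Württemberg": 250,
    "Bayern": 50,
    "Sachsen": 150,
    "Sachsen-Anhalt": 100,
    "Other states": 200,
}
PHASE1_CURRENT = 38_479
PHASE1_TARGET = 97_000


def net_new(duplicate_rate: float) -> int:
    """Sum per-row net new archives, rounding each row's duplicates as the table does."""
    return sum(n - round(n * duplicate_rate) for n in EXPECTED.values())


def phase1_percent(added: int) -> float:
    """Phase 1 completion (in %, one decimal) after adding `added` institutions."""
    return round(100 * (PHASE1_CURRENT + added) / PHASE1_TARGET, 1)


for label, rate in (("optimistic (table)", 0.20), ("NRW rate", 0.807)):
    added = net_new(rate)
    print(f"{label}: +{added} archives, Phase 1 at {phase1_percent(added)}%")
```

Run as-is, it prints +1279 archives (Phase 1 at 41.0%) for the table's 20% assumption and +310 archives (Phase 1 at 40.0%) at the NRW rate.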
Projected German Dataset (After Regional Harvests)
- Before Regional Harvests: 20,846 institutions
- Expected New Additions: ~1,280 archives (optimistic); ~320 if the NRW duplicate rate (~80%) holds
- Projected Total: ~22,100 German institutions (optimistic); ~21,200 at the NRW rate
Phase 1 Impact
- Current Phase 1: 38,479 / 97,000 (39.7%)
- After German Regional Harvests: ~39,800 / 97,000 (41.0%) optimistic; ~38,800 / 97,000 (40.0%) at the NRW duplicate rate
- Gain: +1.3 percentage points (optimistic); +0.3 at the NRW rate
Recommended Next Steps
Immediate Actions
- Start with Thüringen ⭐ (149 archives confirmed, easiest harvest)
  - Portal: https://www.archive-in-thueringen.de/
  - Build scraper for archive directory
  - Estimated time: 30 minutes
- Harvest Arcinsys Consortium ⭐ (600+ archives, unified platform)
  - Portals: Niedersachsen, Schleswig-Holstein, Hessen
  - Build shared Arcinsys scraper
  - Estimated time: 2-3 hours
- Harvest Baden-Württemberg (200+ archives)
  - Portal: https://www.landesarchiv-bw.de/
  - Custom scraper for archive directory
  - Estimated time: 1 hour
Medium-Term Goals
- Harvest Bayern (9-50 archives)
- Harvest Sachsen (150+ archives)
- Harvest Sachsen-Anhalt (100+ archives)
Long-Term Strategy
- Use Archivportal-D as fallback for remaining states
- Cross-reference regional harvests with Archivportal-D to catch missing archives
- Validate against ISIL registry for quality control
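A cheap first pass, before matching against the ISIL registry itself, is to check that harvested identifiers are well-formed. The sketch below is minimal and assumes that records carry an optional `isil` field, which is a hypothetical field name. It checks only the ISO 15511 syntax (at most 16 characters drawn from Latin letters, digits, `-`, `/` and `:`) with the German `DE-` prefix. It does not check whether the identifier is actually registered.

```python
import re

# ISO 15511 ISIL: prefix, hyphen, local identifier; at most 16 characters
# drawn from Latin letters, digits, "-", "/" and ":". German ISILs start "DE-".
GERMAN_ISIL = re.compile(r"DE-[A-Za-z0-9/:-]+")


def is_valid_german_isil(isil: str) -> bool:
    """Syntax check only: does not confirm the ISIL is registered."""
    return len(isil) <= 16 and GERMAN_ISIL.fullmatch(isil) is not None


def split_by_isil(records: list) -> tuple:
    """Split records into (well-formed German ISIL, needs review)."""
    valid, needs_review = [], []
    for record in records:
        isil = (record.get("isil") or "").strip()
        (valid if is_valid_german_isil(isil) else needs_review).append(record)
    return valid, needs_review
```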
Technical Requirements
Tools Needed
- Playwright - JavaScript rendering (Arcinsys, Thüringen)
- BeautifulSoup - HTML parsing
- RapidFuzz - Deduplication (fuzzy matching)
- Nominatim - Geocoding (rate-limited 1 req/sec)
Scraper Pattern (from NRW Success)
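```python
import asyncio

from playwright.async_api import async_playwright

# Placeholder selector from the NRW pattern; each portal needs its own.
ARCHIVE_SELECTOR = ".archive-button"


async def harvest_archive_names(portal_url: str) -> list[str]:
    # 1. Use Playwright for JavaScript-rendered portals
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(portal_url)
        await page.wait_for_load_state("networkidle")
        # 2. Extract archive buttons/links
        archives = await page.locator(ARCHIVE_SELECTOR).all()
        # 3. Extract text without clicking (fast approach)
        names = [(await archive.inner_text()).strip() for archive in archives]
        await browser.close()
    # 4. Parse city from each name (regex), then geocode cities
    # 5. Merge with existing dataset (fuzzy matching)
    # 6. Export unified dataset
    return names


if __name__ == "__main__":
    names = asyncio.run(harvest_archive_names("https://www.archive.nrw.de/archivsuche"))
    print(f"{len(names)} archive entries found")
```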
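The "parse city from name" step can be a short list of regular expressions. The sketch below is minimal and assumes names follow common German patterns such as "Stadtarchiv Bonn" or "Archiv der Stadt Hagen". `CITY_PATTERNS` and `parse_city` are hypothetical, and each portal's naming conventions should be checked before relying on them.

```python
import re
from typing import Optional

# Hypothetical patterns for common German archive-name shapes.
CITY_PATTERNS = [
    re.compile(r"^(?:Stadt|Kreis|Gemeinde|Kommunal)archiv\s+(?P<city>.+)$"),
    re.compile(r"^Archiv der (?:Stadt|Gemeinde)\s+(?P<city>.+)$"),
]


def parse_city(name: str) -> Optional[str]:
    """Return the place name from an archive's display name, or None if no pattern fits."""
    cleaned = " ".join(name.split())  # collapse stray whitespace from scraped text
    for pattern in CITY_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return match.group("city")
    return None
```

Names that match no pattern return `None`. They can then be geocoded from the portal's address data or reviewed by hand. For a Kreisarchiv, the captured text is the district name, which may differ from the town where the archive is located.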
Key Insights
1. Federated Structure
Germany's archive system is highly federated - each state operates independently with its own portal/system. This means:
- Regional portals have MORE detail than national ISIL registry
- Must harvest state-by-state to get complete coverage
- Archivportal-D aggregates but doesn't replace regional portals
2. Arcinsys Advantage
4 states share Arcinsys (Hessen, Niedersachsen, Bremen, Schleswig-Holstein):
- Represents ~25% of German states
- Expected 600+ archives total
- A single scraper can harvest all 3 portals (Niedersachsen and Bremen share one)
- Consistent data structure = easier extraction
3. NRW Pattern Replicable
The NRW harvest pattern (fast text extraction without clicking) works well for:
- Drupal-based portals
- Button/link-based archive listings
- JavaScript-rendered pages
Reuse this approach for Thüringen, Arcinsys portals, Baden-Württemberg
4. Duplicate Rate Validation
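NRW showed an 80.7% duplicate rate (356 of 441 archives were already in the ISIL+DDB data):
- Suggests the existing data sources already cover most archives
- Expect similar rates for other states
- ~20% new archives per state is a realistic expectation, so the optimistic projection above (20% duplicates) should be read as an upper bound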
Comparison to NRW Harvest
| Metric | NRW | Expected (All Regional Portals) |
|---|---|---|
| Archives Harvested | 441 | 1,599 |
| Duplicates (%) | 80.7% | 20% (optimistic) to ~80% (NRW rate) |
| Net New | 85 | ~320 to ~1,280 |
| Cities Covered | 356 | ~800 |
| Geocoded (%) | 83.7% | ~85% (target) |
| Harvest Time | 9.3 seconds | ~5 hours (estimated) |
Conclusion
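Germany has a rich ecosystem of regional archive portals beyond archive.nrw.de. Harvesting these portals could add up to ~1,280 new institutions to the German dataset (20,846 → ~22,100) under the optimistic 20% duplicate assumption. If the NRW duplicate rate of ~80% holds, the gain is roughly 320 (→ ~21,200).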
Priority targets:
- Thüringen (149 confirmed) - Quick win ⭐
- Arcinsys Consortium (600+ estimated) - High impact ⭐
- Baden-Württemberg (200+ estimated) - High impact ⭐
Impact: +0.3 to +1.3 percentage points toward Phase 1 goal (39.7% → 40.0-41.0%)
Next Recommended Action: Start with Thüringen harvest (149 archives, simple portal structure)
References
Portal URLs
- Archivportal-D: https://www.archivportal-d.de/
- NRW: https://www.archive.nrw.de/archivsuche ✅ Harvested
- Thüringen: https://www.archive-in-thueringen.de/
- Niedersachsen & Bremen: https://arcinsys.niedersachsen.de/
- Schleswig-Holstein: https://arcinsys.schleswig-holstein.de/
- Hessen: https://arcinsys.hessen.de/
- Baden-Württemberg: https://www.landesarchiv-bw.de/
- Bayern: https://www.gda.bayern.de/
- Brandenburg: https://blha.brandenburg.de/
- Sachsen: https://www.staatsarchiv.sachsen.de/
- Sachsen-Anhalt: https://landesarchiv.sachsen-anhalt.de/
Documentation
- NRW Harvest: NRW_HARVEST_COMPLETE_20251119.md
- NRW Merge: SESSION_SUMMARY_20251119_NRW_MERGE_COMPLETE.md
- Quick Status: QUICK_STATUS_20251119_POST_NRW.md
Report Generated: 2025-11-19 22:30 UTC
Research Method: Exa deep web search (30 queries)
Status: Ready for harvest implementation