# Unified GLAM Database - Quick Start

**Last Updated**: 2025-11-20

**Database Version**: 1.0.0 (Phase 1)

**Total Institutions**: 1,678 across 8 countries

---

## Quick Access

### Database Files

```bash
# JSON format (2.5 MB, complete)
/Users/kempersc/apps/glam/data/unified/glam_unified_database.json

# SQLite format (20 KB, partial due to overflow issue)
/Users/kempersc/apps/glam/data/unified/glam_unified_database.db
```

### Query Examples

#### Python (JSON)

```python
import json

# Load database (paths are relative to the repository root).
# Explicit UTF-8 so names such as "Alakylän kirjasto" decode correctly on every platform.
with open('data/unified/glam_unified_database.json', 'r', encoding='utf-8') as f:
    db = json.load(f)

# Get metadata
print(f"Total institutions: {db['metadata']['total_institutions']}")
print(f"Countries: {', '.join(db['metadata']['countries'])}")

# Find Finnish museums (.get avoids a KeyError on records without source_country)
finnish_museums = [
    inst for inst in db['institutions']
    if inst.get('source_country') == 'finland' and inst.get('institution_type') == 'MUSEUM'
]
print(f"Finnish museums: {len(finnish_museums)}")

# Get country statistics
for country, stats in db['country_stats'].items():
    print(f"{country}: {stats['total']} institutions ({stats['with_wikidata']} with Wikidata)")
```

#### SQLite (after fixing overflow)

```bash
# Count by country
sqlite3 data/unified/glam_unified_database.db \
  "SELECT country, COUNT(*) FROM institutions GROUP BY country ORDER BY COUNT(*) DESC;"

# Find institutions with Wikidata
sqlite3 data/unified/glam_unified_database.db \
  "SELECT name, country FROM institutions WHERE has_wikidata=1 LIMIT 10;"

# Search by institution type
sqlite3 data/unified/glam_unified_database.db \
  "SELECT name, city FROM institutions WHERE institution_type='MUSEUM';"
```

---

## Database Schema

### JSON Structure

```json
{
  "metadata": {
    "export_date": "2025-11-20T15:17:03+00:00",
    "total_institutions": 1678,
    "unique_ghcids": 565,
    "duplicates": 269,
    "countries": ["finland", "denmark", ...]
  },
  "country_stats": {
    "finland": {
      "total": 817,
      "with_ghcid": 817,
      "with_wikidata": 63,
      "with_website": 58,
      "by_type": {"LIBRARY": 789, "MUSEUM": 15, ...}
    }
  },
  "institutions": [
    {
      "id": "https://w3id.org/heritage/custodian/fi/...",
      "ghcid": "FI-A-A-L-ALKU-Q39176216",
      "ghcid_uuid": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Alakylän kirjasto",
      "institution_type": "LIBRARY",
      "country": "FI",
      "city": "Alavi",
      "has_wikidata": true,
      "has_website": false,
      "raw_record": "{...full LinkML record...}"
    }
  ]
}
```

### SQLite Schema

```sql
CREATE TABLE institutions (
    id TEXT PRIMARY KEY,
    ghcid TEXT,
    ghcid_uuid TEXT,
    ghcid_numeric INTEGER,  -- ⚠️ Overflow issue (values exceed SQLite's signed 64-bit INTEGER range)
    name TEXT NOT NULL,
    institution_type TEXT,
    country TEXT,
    city TEXT,
    source_country TEXT,
    data_source TEXT,
    data_tier TEXT,
    extraction_date TEXT,
    has_wikidata BOOLEAN,
    has_website BOOLEAN,
    raw_record TEXT  -- Full JSON record
);

CREATE TABLE metadata (
    key TEXT PRIMARY KEY,
    value TEXT
);
```

---

## Statistics at a Glance

### Overall

- **Total Institutions**: 1,678
- **Unique GHCIDs**: 565 (33.7%)
- **Wikidata Coverage**: 258 (15.4%)
- **Website Coverage**: 198 (11.8%)

### By Country

| Country | Institutions | GHCID Coverage | Wikidata Coverage | Data Tier |
|---------|--------------|----------------|-------------------|-----------|
| 🇫🇮 Finland | 817 | 100% | 7.7% | TIER_1 |
| 🇧🇪 Belgium | 421 | 0% | 0% | TIER_1 |
| 🇧🇾 Belarus | 167 | 0% | 3.0% | TIER_1 |
| 🇳🇱 Netherlands | 153 | 0% | 73.2% | TIER_1 |
| 🇨🇱 Chile | 90 | 0% | 78.9% | TIER_4 |
| 🇪🇬 Egypt | 29 | 58.6% | 24.1% | TIER_4 |

### By Institution Type

- Libraries: 1,478 (88.1%)
- Museums: 80 (4.8%)
- Archives: 73 (4.4%)
- Education Providers: 12 (0.7%)
- Official Institutions: 12 (0.7%)

---

## Known Limitations (Phase 1)

1. ⚠️ **Denmark excluded** (2,348 institutions) - parser error
2. ⚠️ **Canada excluded** (9,565 institutions) - nested dict error
3. ⚠️ **SQLite incomplete** - INTEGER overflow on `ghcid_numeric`
4. **269 GHCID duplicates** - need collision resolution
5. **Missing GHCIDs** - Belgium, Netherlands, Belarus, Chile

**Phase 2 will fix these issues and bring the total to 13,591 institutions.**

---

## Rebuilding the Database

To rebuild with updated country datasets:

```bash
# Run the unification script
python3 scripts/build_unified_database.py

# Output will be in:
# - data/unified/glam_unified_database.json
# - data/unified/glam_unified_database.db
```

To add a new country dataset:

1. Edit `scripts/build_unified_database.py`
2. Add the country to the `COUNTRY_DATASETS` dict with its dataset path
3. Run the script
4. Check `UNIFIED_DATABASE_REPORT.md` for results

---

## Documentation

- **Full Report**: `UNIFIED_DATABASE_REPORT.md` - Detailed statistics and analysis
- **Session Summary**: `SESSION_SUMMARY_20251120_FINLAND_UNIFIED.md` - What we did today
- **Finland Report**: `data/finland_isil/FINLAND_ISIL_HARVEST_REPORT.md` - Finnish dataset details
- **Main Progress**: `PROGRESS.md` - Overall project status

---

## Support

For questions or issues:

- Check `UNIFIED_DATABASE_REPORT.md` for detailed documentation
- Review `AGENTS.md` for extraction guidelines
- See `PROGRESS.md` for project history

**Version**: 1.0.0 (Phase 1)

**Next Update**: Phase 2 (Denmark + Canada integration)