Unified GLAM Database - Quick Start
Last Updated: 2025-11-20
Database Version: 1.0.0 (Phase 1)
Total Institutions: 1,678 across 8 countries
Quick Access
Database Files
# JSON format (2.5 MB, complete)
/Users/kempersc/apps/glam/data/unified/glam_unified_database.json
# SQLite format (20 KB, partial due to overflow issue)
/Users/kempersc/apps/glam/data/unified/glam_unified_database.db
Query Examples
Python (JSON)
import json
# Load database (UTF-8, since institution names contain non-ASCII characters)
with open('data/unified/glam_unified_database.json', 'r', encoding='utf-8') as f:
    db = json.load(f)
# Get metadata
print(f"Total institutions: {db['metadata']['total_institutions']}")
print(f"Countries: {', '.join(db['metadata']['countries'])}")
# Find Finnish museums
finnish_museums = [
    inst for inst in db['institutions']
    if inst['source_country'] == 'finland'
    and inst['institution_type'] == 'MUSEUM'
]
print(f"Finnish museums: {len(finnish_museums)}")
# Get country statistics
for country, stats in db['country_stats'].items():
    print(f"{country}: {stats['total']} institutions ({stats['with_wikidata']} with Wikidata)")
SQLite (after fixing overflow)
# Count by country
sqlite3 data/unified/glam_unified_database.db \
"SELECT country, COUNT(*) FROM institutions GROUP BY country ORDER BY COUNT(*) DESC;"
# Find institutions with Wikidata
sqlite3 data/unified/glam_unified_database.db \
"SELECT name, country FROM institutions WHERE has_wikidata=1 LIMIT 10;"
# Search by institution type
sqlite3 data/unified/glam_unified_database.db \
"SELECT name, city FROM institutions WHERE institution_type='MUSEUM';"
Database Schema
JSON Structure
{
  "metadata": {
    "export_date": "2025-11-20T15:17:03+00:00",
    "total_institutions": 1678,
    "unique_ghcids": 565,
    "duplicates": 269,
    "countries": ["finland", "denmark", ...]
  },
  "country_stats": {
    "finland": {
      "total": 817,
      "with_ghcid": 817,
      "with_wikidata": 63,
      "with_website": 58,
      "by_type": {"LIBRARY": 789, "MUSEUM": 15, ...}
    }
  },
  "institutions": [
    {
      "id": "https://w3id.org/heritage/custodian/fi/...",
      "ghcid": "FI-A-A-L-ALKU-Q39176216",
      "ghcid_uuid": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Alakylän kirjasto",
      "institution_type": "LIBRARY",
      "country": "FI",
      "city": "Alavi",
      "has_wikidata": true,
      "has_website": false,
      "raw_record": "{...full LinkML record...}"
    }
  ]
}
SQLite Schema
CREATE TABLE institutions (
    id TEXT PRIMARY KEY,
    ghcid TEXT,
    ghcid_uuid TEXT,
    ghcid_numeric INTEGER, -- ⚠️ Overflow issue
    name TEXT NOT NULL,
    institution_type TEXT,
    country TEXT,
    city TEXT,
    source_country TEXT,
    data_source TEXT,
    data_tier TEXT,
    extraction_date TEXT,
    has_wikidata BOOLEAN,
    has_website BOOLEAN,
    raw_record TEXT -- Full JSON record
);
CREATE TABLE metadata (
    key TEXT PRIMARY KEY,
    value TEXT
);
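The institutions table can also be queried from Python with the standard-library sqlite3 module. The following is a minimal sketch, not the project's own tooling: it builds a small in-memory table with a subset of the columns above and made-up sample rows, and the helper names count_by_country and find_by_type are hypothetical. Point sqlite3.connect at the real .db file once the overflow issue is resolved.

```python
import sqlite3

# In-memory stand-in for the institutions table (subset of columns, illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE institutions ("
    "id TEXT PRIMARY KEY, name TEXT NOT NULL, institution_type TEXT, "
    "country TEXT, city TEXT, has_wikidata BOOLEAN)"
)
sample_rows = [
    ("ex-1", "Example Library A", "LIBRARY", "FI", "City A", 1),
    ("ex-2", "Example Museum B", "MUSEUM", "FI", "City B", 0),
    ("ex-3", "Example Archive C", "ARCHIVE", "NL", "City C", 1),
]
conn.executemany("INSERT INTO institutions VALUES (?, ?, ?, ?, ?, ?)", sample_rows)


def count_by_country(connection):
    """Return (country, count) pairs, largest count first."""
    return connection.execute(
        "SELECT country, COUNT(*) FROM institutions "
        "GROUP BY country ORDER BY COUNT(*) DESC"
    ).fetchall()


def find_by_type(connection, institution_type):
    """Return (name, city) pairs for one institution type, using a bound parameter."""
    return connection.execute(
        "SELECT name, city FROM institutions WHERE institution_type = ?",
        (institution_type,),
    ).fetchall()


country_counts = count_by_country(conn)
museums = find_by_type(conn, "MUSEUM")
print(country_counts)
print(museums)
```

Binding the institution type as a parameter, rather than formatting it into the SQL string, keeps the query safe if the value ever comes from user input.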
Statistics at a Glance
Overall
- Total Institutions: 1,678
- Unique GHCIDs: 565 (33.7%)
- Wikidata Coverage: 258 (15.4%)
- Website Coverage: 198 (11.8%)
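A figure like the unique-GHCID count can be recomputed from the institutions list. The sketch below groups records by their ghcid field and reports both the number of distinct GHCIDs and the groups that share one. It assumes records without a GHCID carry a missing or null ghcid; the function name ghcid_summary and the sample identifiers are hypothetical, and the build script may count duplicates differently.

```python
from collections import defaultdict


def ghcid_summary(institutions):
    """Return (number of distinct GHCIDs, {ghcid: [ids]} for GHCIDs used more than once)."""
    by_ghcid = defaultdict(list)
    for inst in institutions:
        ghcid = inst.get("ghcid")
        if ghcid:  # skip records that have no GHCID assigned
            by_ghcid[ghcid].append(inst["id"])
    collisions = {g: ids for g, ids in by_ghcid.items() if len(ids) > 1}
    return len(by_ghcid), collisions


# Made-up records for illustration; real records come from db['institutions'].
sample = [
    {"id": "ex-1", "ghcid": "XX-EXAMPLE-1"},
    {"id": "ex-2", "ghcid": "XX-EXAMPLE-1"},
    {"id": "ex-3", "ghcid": "XX-EXAMPLE-2"},
    {"id": "ex-4", "ghcid": None},
]
unique_count, collisions = ghcid_summary(sample)
print(unique_count, collisions)
```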
By Country
| Country | Count | GHCID | Wikidata | Tier |
|---|---|---|---|---|
| 🇫🇮 Finland | 817 | 100% | 7.7% | TIER_1 |
| 🇧🇪 Belgium | 421 | 0% | 0% | TIER_1 |
| 🇧🇾 Belarus | 167 | 0% | 3.0% | TIER_1 |
| 🇳🇱 Netherlands | 153 | 0% | 73.2% | TIER_1 |
| 🇨🇱 Chile | 90 | 0% | 78.9% | TIER_4 |
| 🇪🇬 Egypt | 29 | 58.6% | 24.1% | TIER_4 |
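The percentage columns follow from the per-country entries in country_stats. As a worked example, the sketch below recomputes Finland's coverage from the counts shown in the JSON structure (817 total, 817 with GHCID, 63 with Wikidata). The helper name coverage_percent is hypothetical, and one-decimal rounding is assumed to match the table.

```python
def coverage_percent(part, total):
    """Share of `part` in `total` as a percentage with one decimal; 0.0 for an empty total."""
    return round(100 * part / total, 1) if total else 0.0


# Finland's entry as shown under "country_stats" in the JSON structure.
finland = {"total": 817, "with_ghcid": 817, "with_wikidata": 63, "with_website": 58}

ghcid_pct = coverage_percent(finland["with_ghcid"], finland["total"])
wikidata_pct = coverage_percent(finland["with_wikidata"], finland["total"])
print(f"GHCID: {ghcid_pct}%  Wikidata: {wikidata_pct}%")
```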
By Institution Type
- Libraries: 1,478 (88.1%)
- Museums: 80 (4.8%)
- Archives: 73 (4.4%)
- Education Providers: 12 (0.7%)
- Official Institutions: 12 (0.7%)
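A breakdown like the one above can be derived from the institution_type field of each record. This is a minimal sketch using collections.Counter on made-up records; the function name type_breakdown is hypothetical, and the build script may compute its statistics differently.

```python
from collections import Counter


def type_breakdown(institutions):
    """Return (type, count, percent) tuples, most common type first."""
    counts = Counter(inst["institution_type"] for inst in institutions)
    total = sum(counts.values())
    return [(t, n, round(100 * n / total, 1)) for t, n in counts.most_common()]


# Illustrative records; real ones come from db['institutions'].
sample = [
    {"institution_type": "LIBRARY"},
    {"institution_type": "LIBRARY"},
    {"institution_type": "LIBRARY"},
    {"institution_type": "MUSEUM"},
]
breakdown = type_breakdown(sample)
for inst_type, count, percent in breakdown:
    print(f"{inst_type}: {count} ({percent}%)")
```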
Known Limitations (Phase 1)
- ⚠️ Denmark excluded (2,348 institutions) - parser error
- ⚠️ Canada excluded (9,565 institutions) - nested dict error
- ⚠️ SQLite incomplete - INTEGER overflow on ghcid_numeric
- 🔍 269 GHCID duplicates - need collision resolution
- 📝 Missing GHCIDs - Belgium, Netherlands, Belarus, Chile
Phase 2 will fix these issues and bring total to 13,591 institutions.
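The INTEGER overflow on ghcid_numeric can be reproduced in isolation. SQLite stores INTEGER values as signed 64-bit numbers, so Python's sqlite3 module raises OverflowError when asked to bind anything larger than 2**63 - 1. The sketch below assumes ghcid_numeric can be an unsigned 64-bit value (this document does not state its range) and shows one possible workaround, storing the number as a decimal string in a TEXT column. It is not necessarily the fix planned for Phase 2.

```python
import sqlite3

SQLITE_MAX_INT = 2**63 - 1  # largest value an SQLite INTEGER can hold

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE as_integer (ghcid_numeric INTEGER)")
conn.execute("CREATE TABLE as_text (ghcid_numeric TEXT)")

# Hypothetical identifier at the top of the unsigned 64-bit range.
big_value = 2**64 - 1

try:
    conn.execute("INSERT INTO as_integer VALUES (?)", (big_value,))
    integer_insert_failed = False
except OverflowError:
    # The value does not fit into a signed 64-bit integer.
    integer_insert_failed = True

# Workaround sketch: keep the exact digits as text and convert back on read.
conn.execute("INSERT INTO as_text VALUES (?)", (str(big_value),))
stored = conn.execute("SELECT ghcid_numeric FROM as_text").fetchone()[0]
round_tripped = int(stored)

print(integer_insert_failed, round_tripped == big_value)
```

Storing the value as text preserves every digit but gives up numeric comparison and sorting in SQL, which may or may not matter for an identifier column.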
Rebuilding the Database
To rebuild with updated country datasets:
# Run the unification script
python3 scripts/build_unified_database.py
# Output will be in:
# - data/unified/glam_unified_database.json
# - data/unified/glam_unified_database.db
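After a rebuild it can be useful to sanity-check the JSON output. The sketch below compares the declared total in metadata against the length of the institutions list and against the sum of the per-country totals. It assumes those three numbers are meant to agree, which this document does not state explicitly; the function name check_database is hypothetical.

```python
import json
from pathlib import Path


def check_database(db):
    """Return a list of human-readable inconsistencies; an empty list means no problem found."""
    problems = []
    declared = db["metadata"]["total_institutions"]
    actual = len(db["institutions"])
    if declared != actual:
        problems.append(f"metadata says {declared} institutions, list has {actual}")
    stats_total = sum(stats["total"] for stats in db["country_stats"].values())
    if stats_total != actual:
        problems.append(f"country_stats totals sum to {stats_total}, list has {actual}")
    return problems


# Tiny illustrative database with a deliberate mismatch in the metadata.
sample_db = {
    "metadata": {"total_institutions": 3},
    "country_stats": {"finland": {"total": 2}},
    "institutions": [{"id": "ex-1"}, {"id": "ex-2"}],
}
sample_problems = check_database(sample_db)
print(sample_problems)

# Run against the real export only if it is present.
db_path = Path("data/unified/glam_unified_database.json")
if db_path.exists():
    with db_path.open(encoding="utf-8") as f:
        print(check_database(json.load(f)) or "OK")
```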
To add a new country dataset:
- Edit scripts/build_unified_database.py
- Add country to COUNTRY_DATASETS dict with path
- Run script
- Check UNIFIED_DATABASE_REPORT.md for results
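Before running the script with a new entry, a quick pre-flight check can catch a mistyped path. The sketch below assumes COUNTRY_DATASETS maps a country name to a file path; the actual structure in scripts/build_unified_database.py is not shown in this document, so the dict contents, the example path, and the helper name missing_datasets are all hypothetical.

```python
from pathlib import Path


def missing_datasets(country_datasets, base_dir="."):
    """Return the sorted country names whose dataset path does not exist under base_dir."""
    base = Path(base_dir)
    return sorted(
        country for country, path in country_datasets.items()
        if not (base / path).exists()
    )


# Hypothetical entry in the assumed {country: path} shape; the path is a placeholder.
example_datasets = {
    "exampleland": "data/exampleland/placeholder_dataset_that_does_not_exist.yaml",
}
missing = missing_datasets(example_datasets)
print(missing)
```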
Documentation
- Full Report: UNIFIED_DATABASE_REPORT.md - Detailed statistics and analysis
- Session Summary: SESSION_SUMMARY_20251120_FINLAND_UNIFIED.md - What we did today
- Finland Report: data/finland_isil/FINLAND_ISIL_HARVEST_REPORT.md - Finnish dataset details
- Main Progress: PROGRESS.md - Overall project status
Support
For questions or issues:
- Check UNIFIED_DATABASE_REPORT.md for detailed documentation
- Review AGENTS.md for extraction guidelines
- See PROGRESS.md for project history
Version: 1.0.0 (Phase 1)
Next Update: Phase 2 (Denmark + Canada integration)