glam/tests/fixtures/expected_extraction.json
kempersc e5a532a8bc Add comprehensive tests for NLP institution extraction and RDF partnership integration
- Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive).
- Added tests for extracted entities and result handling to validate the extraction process.
- Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format.
- Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns.
- Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.
2025-11-19 23:20:47 +01:00

[
  {
    "id": "rijksmuseum-amsterdam",
    "name": "Rijksmuseum",
    "alternative_names": [],
    "institution_type": "MUSEUM",
    "organization_status": "ACTIVE",
    "description": "Dutch national museum in Amsterdam with over 1 million objects",
    "identifiers": [
      {
        "identifier_scheme": "ISIL",
        "identifier_value": "NL-AsdRM"
      }
    ],
    "locations": [
      {
        "street_address": "Museumstraat 1",
        "city": "Amsterdam",
        "postal_code": "1071 XX",
        "country": "NL",
        "is_primary": true
      }
    ],
    "homepage": "https://www.rijksmuseum.nl/en/rijksstudio",
    "metadata_standards": ["SPECTRUM", "LIDO"],
    "collections": [
      {
        "collection_description": "Over 1 million objects including masterpieces by Rembrandt and Vermeer",
        "item_count": 1000000
      }
    ],
    "partnerships": [
      {
        "partner_name": "Museumvereniging",
        "partnership_type": "membership"
      }
    ],
    "provenance": {
      "data_source": "CONVERSATION_NLP",
      "data_tier": "TIER_4_INFERRED",
      "extraction_method": "NER + pattern matching",
      "confidence_score": 0.95,
      "conversation_id": "test-uuid-001"
    }
  },
  {
    "id": "nationaal-archief-den-haag",
    "name": "Nationaal Archief",
    "alternative_names": ["National Archive"],
    "institution_type": "ARCHIVE",
    "organization_status": "ACTIVE",
    "description": "National Archive of the Netherlands with documents dating back to the 9th century",
    "identifiers": [
      {
        "identifier_scheme": "ISIL",
        "identifier_value": "NL-HaNA"
      }
    ],
    "locations": [
      {
        "street_address": "Prins Willem-Alexanderhof 20",
        "city": "Den Haag",
        "postal_code": "2595 BE",
        "country": "NL",
        "is_primary": true
      }
    ],
    "homepage": "https://www.nationaalarchief.nl",
    "metadata_standards": ["EAD", "RIC_O"],
    "parent_organization": "Ministry of Education, Culture and Science",
    "partnerships": [
      {
        "partner_name": "Network of Regional Archives",
        "partner_id": "Netwerk Archieven",
        "partnership_type": "network_membership"
      }
    ],
    "provenance": {
      "data_source": "CONVERSATION_NLP",
      "data_tier": "TIER_4_INFERRED",
      "extraction_method": "NER + pattern matching",
      "confidence_score": 0.93,
      "conversation_id": "test-uuid-001"
    }
  }
]
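
A fixture like the one above is typically loaded by a test and sanity-checked before being compared with extractor output. The following is a minimal sketch of such a check, not the repository's actual test code: the `check_record` helper, the `REQUIRED_KEYS` set, the trimmed `SAMPLE` record, and the loose ISIL pattern are all assumptions introduced for illustration.

```python
import json
import re

# Hypothetical, trimmed-down record mirroring the shape of the fixture.
SAMPLE = """
[
  {
    "id": "rijksmuseum-amsterdam",
    "name": "Rijksmuseum",
    "institution_type": "MUSEUM",
    "identifiers": [
      {"identifier_scheme": "ISIL", "identifier_value": "NL-AsdRM"}
    ],
    "provenance": {"data_tier": "TIER_4_INFERRED", "confidence_score": 0.95}
  }
]
"""

# Assumed minimum set of keys; the real schema may require more or fewer.
REQUIRED_KEYS = {"id", "name", "institution_type", "identifiers", "provenance"}

# Loose approximation of an ISIL: short prefix, hyphen, local identifier.
# This is an illustrative pattern, not a full ISO 15511 validator.
ISIL_RE = re.compile(r"^[A-Za-z0-9]{1,4}-[A-Za-z0-9:/-]{1,11}$")


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    for ident in record.get("identifiers", []):
        value = ident.get("identifier_value", "")
        if ident.get("identifier_scheme") == "ISIL" and not ISIL_RE.match(value):
            problems.append(f"malformed ISIL: {value!r}")

    score = record.get("provenance", {}).get("confidence_score")
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("confidence_score must be a number in [0, 1]")

    return problems


records = json.loads(SAMPLE)
all_problems = {record.get("id"): check_record(record) for record in records}
```

In a real test the `SAMPLE` string would presumably be replaced by reading the fixture file from disk, and an empty problem list for every record would be the precondition for comparing against the extractor's output.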