glam/data/wikidata/GLAMORCUBEPSXHFN/query_generation_log.md
2025-11-19 23:25:22 +01:00

2025-11-13T16:11:49 - Botanical (B) Class Query Update

Action: Updated Botanical Garden & Zoo (B) class SPARQL query with latest curated vocabulary exclusions

Changes:

  • Updated exclusion list from 567 to 573 Q-numbers
  • New timestamp: 2025-11-13T16:11:49+00:00
  • All 17 base classes included (botanical garden, zoo, aquarium, herbarium, natural history museum, wildlife reserve, nature reserve, national park, protected area, biosphere reserve, safari park, wildlife sanctuary, marine park, conservation area, seed bank, living collection, natural monument)

Files Created:

  • B/queries/botanical_query_updated_20251113T161149.sparql (9.5K)
  • B/queries/botanical_query_updated_20251113T161149.yaml (1.7K)

Query Characteristics:

  • Base classes: 17
  • Exclusion filters: 12 chunks (up to 50 Q-numbers each: 11 full chunks plus a final chunk of 23)
  • Total Q-numbers excluded: 573
  • Languages: 42 (multilingual label service)
  • Status: Not executed (stored for future use)
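
The chunk count follows from the exclusion total: 573 Q-numbers split into chunks of at most 50 gives 11 full chunks and a final chunk of 23. A minimal Python sketch of that chunking, assuming each chunk is emitted as one `FILTER(?item NOT IN (...))` clause. The `?item` variable, the helper names, and the placeholder Q-numbers are assumptions for illustration and are not taken from the stored query:

```python
from typing import List

CHUNK_SIZE = 50  # chunk size stated in this log entry


def chunk_qids(qids: List[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split the exclusion list into consecutive chunks of at most `size` IDs."""
    return [qids[i:i + size] for i in range(0, len(qids), size)]


def filter_clause(chunk: List[str], var: str = "?item") -> str:
    """Render one chunk as a SPARQL FILTER ... NOT IN clause."""
    values = ", ".join(f"wd:{qid}" for qid in chunk)
    return f"FILTER({var} NOT IN ({values}))"


# Placeholder IDs standing in for the 573 curated Q-numbers (NOT the real list).
exclusions = [f"Q{n}" for n in range(1, 574)]
chunks = chunk_qids(exclusions)
filters = [filter_clause(c) for c in chunks]

print(len(chunks), len(chunks[-1]))  # 12 23
```

Splitting the list this way keeps each FILTER line short enough to review by eye; whether the stored query chunks for that reason is not stated in the log.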

Previous Version: botanical_query_updated_20251113T160823.sparql (567 exclusions)

Notes: Query is ready to run on the Wikidata Query Service when needed. The base classes cover nature preserve and wildlife conservation institution types.

2025-11-13T16:52:19 - Botanical (B) Class Query Logic Clarification

Action: Updated B-class query with clarified comments explaining traversal vs. result exclusion logic

Query Logic Clarification:

  • TRAVERSAL: Query INCLUDES curated Q-numbers when following subclass paths (wdt:P279+)
  • RESULTS: Query EXCLUDES curated Q-numbers from final result set (FILTER NOT IN)
  • PURPOSE: Discover NEW hyponyms that are children/descendants of curated items

Why This Matters: The FILTER statements at the end of the query do NOT prevent traversal through curated Q-numbers. They only exclude those Q-numbers from appearing in results. This is the CORRECT behavior for discovering new vocabulary items that are subclasses of already-known items.

Example:

  • Curated: Q473972 (protected area)
  • Query traverses THROUGH Q473972 to find its subclasses
  • New hyponym found: Q123456 (hypothetical "marine protected area")
  • Result: Q123456 appears in results, Q473972 does NOT
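
The same traversal-versus-results distinction can be sketched in Python on a toy subclass graph. This is an illustration of the logic only, not the stored SPARQL: Q473972 is taken from the example above, Q123456 is the log's hypothetical ID, and Q900001/Q900002 are made-up IDs added to show a curated item sitting in the middle of a path.

```python
from typing import Dict, List, Set

# Toy graph: parent class -> direct subclasses (the inverse of wdt:P279).
# Q900001 and Q900002 are invented IDs for illustration.
SUBCLASSES: Dict[str, List[str]] = {
    "Q473972": ["Q123456", "Q900001"],
    "Q900001": ["Q900002"],
}
# Curated vocabulary: excluded from results, but still traversed.
CURATED: Set[str] = {"Q473972", "Q900001"}


def descendants(root: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All classes reachable from `root` in one or more subclass steps (like wdt:P279+)."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def new_hyponyms(bases: Set[str], curated: Set[str]) -> Set[str]:
    """Traverse from every base class, then drop curated IDs from the result set only."""
    found: Set[str] = set()
    for base in bases:
        found |= descendants(base, SUBCLASSES)
    return found - curated


print(sorted(new_hyponyms({"Q473972"}, CURATED)))  # ['Q123456', 'Q900002']
```

Q900002 is reported even though its parent Q900001 is curated, because the exclusion is applied after the traversal rather than during it.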

Files Updated:

  • B/queries/botanical_query_updated_20251113T165219.sparql (9.6K)
  • B/queries/botanical_query_updated_20251113T165219.yaml (2.1K)

Previous Concern Addressed: User correctly noted that results should exclude curated Q-numbers while still finding their children. Query already implements this correctly; added comments to clarify.

2025-11-13T16:59:27 - Botanical (B) Class Query - Corrected Scope

Action: Final correction - query searches ONLY the 17 base classes, not all curated Q-numbers

User Clarification: The query should find hyponyms of ONLY these 17 base classes:

  • Q167346 (botanical garden), Q43501 (zoo), Q27686 (aquarium), etc.

It should NOT find hyponyms of curated items outside that base list:

  • Any curated Q-number that is NOT one of the 17 base classes
  • Q2322153 (safari park) and Q158675 (biosphere reserve) are in the base list and stay in scope; they are mentioned here only to illustrate where the scope boundary lies

Query Behavior:

  • Searches 17 specific base classes for subclass relationships
  • Excludes ALL 573 curated Q-numbers from results
  • Returns ONLY new hyponyms not in curated vocabulary
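  • Does NOT start traversal from curated items outside the base classes (wdt:P279+ still passes through any curated item that is itself a descendant of a base class)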
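
A simplified sketch of how such a query could be assembled in Python, to make the scope concrete: one UNION branch per base class, and a single exclusion filter over the curated list. The query shape, the `?item` variable, and the `build_query` helper are assumptions for illustration; the stored .sparql file also has chunked filters and a label service that are omitted here. Q900001 is a made-up ID standing in for a curated item that is not a base class.

```python
from typing import List


def build_query(base_classes: List[str], curated: List[str]) -> str:
    """Assemble a simplified hyponym-discovery query (illustrative shape only)."""
    # One UNION branch per base class: traversal starts only from these roots.
    branches = "\n    UNION\n".join(
        f"    {{ ?item wdt:P279+ wd:{qid} . }}" for qid in base_classes
    )
    # Curated IDs are removed from the results, never used as traversal roots.
    excluded = ", ".join(f"wd:{qid}" for qid in curated)
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        f"{branches}\n"
        f"    FILTER(?item NOT IN ({excluded}))\n"
        "}"
    )


# Two base classes quoted from this entry; the curated list is a shortened,
# partly hypothetical stand-in for the 573 exclusions.
query = build_query(["Q167346", "Q43501"], ["Q167346", "Q43501", "Q900001"])
print(query)
```

In this shape Q900001 appears only inside the FILTER, so its subclasses are found only if they are also descendants of one of the listed base classes.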

Files Created:

  • B/queries/botanical_query_updated_20251113T165927.sparql (9.8K)
  • B/queries/botanical_query_updated_20251113T165927.yaml (2.4K)

Previous Misunderstanding: Earlier documentation incorrectly suggested the query would find hyponyms of ALL curated Q-numbers. The query actually only searches the 17 base classes specified in the UNION clauses.

2025-11-13T17:14:50 - Botanical (B) Class Query - Fixed Missing LIMIT

Issue: New query returned 22,337 results (vs. 5,000 expected) because it lacked the LIMIT clause

Root Cause:

  • Original query (botanical_zoo_query_complete_20251113T130659.sparql): had LIMIT 5000
  • Updated queries: no LIMIT clause, so all matching results were returned

Solution:

  • Added LIMIT 10000 to cap results
  • Used original query's 27 base classes (not the 17 from earlier attempt)
  • Maintained 573 Q-number exclusions (up from 390 in original)

Base Classes (27 total from original):

  • Q167346 (botanical garden), Q43501 (zoo), Q2281788 (aquarium), Q7712619 (arboretum)
  • Q181916 (herbarium), Q1970365 (natural history museum), Q20268591 (wildlife reserve)
  • Q179049 (nature reserve), Q46169 (national park), Q473972 (protected area)
  • Q158454 (biosphere reserve), Q21164403/Q9480202 (safari parks)
  • Q8085554 (wildlife sanctuary), Q2616170 (marine reserve), Q936257 (conservation area)
  • Q1426613 (seed bank), Q4915239 (biorepository), Q2982911 (natural history collection)
  • Q1905347 (gene bank), Q864217 (biobank), Q2189151 (soilbank)
  • Q8508664 (herbaria), Q11489453 (culture collection), Q23790 (natural monument)
  • Q386426/Q526826 (natural heritage)

Expected Behavior:

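  • With 573 exclusions (vs. 390 in the original): Should match FEWER items than the original query for the same 27 base classes
  • LIMIT 10000: Caps the number of rows returned to prevent unbounded result sets; if more than 10,000 items match, the result is truncated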
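
To keep a regenerated query from silently losing its cap again, a small pre-save check could be added to the generation step. A minimal Python sketch, assuming LIMIT is always the last clause of the query (a query ending in OFFSET would need different handling); `ensure_limit` is a hypothetical helper, not part of the existing tooling:

```python
import re

# Matches a trailing "LIMIT <n>" at the very end of the query text.
LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)\s*$", re.IGNORECASE)


def ensure_limit(query: str, limit: int = 10000) -> str:
    """Return the query with trailing whitespace stripped, appending
    a LIMIT clause only if the query does not already end with one."""
    stripped = query.rstrip()
    if LIMIT_RE.search(stripped):
        return stripped
    return f"{stripped}\nLIMIT {limit}"


unbounded = "SELECT ?item WHERE { ?item wdt:P279+ wd:Q167346 . }"
capped = ensure_limit(unbounded)
print(capped.splitlines()[-1])  # LIMIT 10000
```

Note that a SPARQL query without ORDER BY returns rows in no guaranteed order, so a capped result set is an arbitrary subset of the matches rather than the "first" 10,000.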

Files Created:

  • B/queries/botanical_query_updated_20251113T171450.sparql (10.1K)
  • B/queries/botanical_query_updated_20251113T171450.yaml (2.9K)