Glossary & Semantic Workbench for Data Citizens

Key Takeaways

Establishing Boundaries and Seeding: Define clear domain boundaries first, then use AI to generate core terms from authoritative texts.
The Power of the SPO Structure: Break complex definitions into atomic Subject-Predicate-Object structures for clear, machine-auditable semantic models.
Streamlined SME Approval: The granular SPO structure streamlines SME sign-off, enabling detailed review of definitions one sentence at a time.
Operational Integration: A glossary workbench integrates approved terms into workflows, providing instant access to definitions for employees.
Controlling AI Hallucinations: To ensure consistency, organisations must enforce a specific semantic model for AI’s decision-making.
Resolving Terminology Conflicts: Polysemy, where one term carries several conflicting meanings, undermines clarity in data management; a centralised glossary and tracked semantic lineage help resolve it and support multilingual use.
Automated Knowledge Pipelines: Power users can build advanced pipelines between Claude and NotebookLM for efficient content creation.
Ethical Data Practices: Implementing ethical frameworks like FAIR and CARE promotes transparency and community involvement in data collection.

Webinar Details

Title: Glossary & Semantic Workbench for Data Citizens
Date: 2026-06-11
Presenter: Howard Diesel
Meetup Group: African Data Management Community
Write-up Author: Howard Diesel

What is the Importance of Defining Semantic Boundaries?

Developing a robust semantic model or business glossary necessitates strictly defined boundaries to prevent structural degradation. For instance, when establishing terminology for the Data Management Body of Knowledge (DMBOK), delineating the precise scope was critical. Following the establishment of these parameters, the “seeding” phase commences.

Organisations can utilise artificial intelligence platforms, such as Claude, to scan authoritative texts and generate a foundational list of terminology. This initial extraction might yield approximately 60 to 75 core terms, alongside their corresponding knowledge areas and source origins. This structured preliminary research provides a definitive baseline for cultivating an enterprise-grade semantic glossary.

Figure 1 The Architecture of Meaning

Figure 2 The Enterprise Data Glossary Blueprint from Fragmented Terms to Institutional Truth

Figure 3 Undefined Terms

How is the SPO Architecture used in Definitions?

Following terminology extraction, definitions must be constructed using a rigorous Subject-Predicate-Object (SPO) architecture. This methodology deconstructs complex conceptual definitions into atomic, interrelated statements. For example, the definition of “data” is articulated through individual SPO assertions: data is a representation, it represents facts, and it supports decision-making.

Assigning a verifiable source, such as a specific DMBOK chapter or the ISO/IEC 11179 standard, to each SPO statement makes every statement traceable. Furthermore, this granular framework enables hierarchical taxonomy mapping, for example classifying an automated agent as a kind of “actor”. Ultimately, these systematic definitions can be exported as Turtle files (.ttl) to construct an advanced semantic model or knowledge graph.
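
As a rough illustration of the SPO-to-Turtle step described above, the sketch below models each statement as a small record and writes it out as a triple plus a reified copy that carries its source. The namespace, the class and function names, and the bare "DMBOK" source labels are assumptions made for the example; the webinar does not specify how the workbench serialises its statements.

```python
import re
from dataclasses import dataclass

# Hypothetical namespace; the write-up does not name one.
BASE = "http://example.org/glossary#"


@dataclass(frozen=True)
class SpoStatement:
    subject: str    # governed term, e.g. "data"
    predicate: str  # verb phrase, e.g. "is a"
    obj: str        # governed term or noun phrase
    source: str     # citation label kept for traceability


def camel(label: str, upper_first: bool) -> str:
    """Convert 'decision-making' to 'DecisionMaking' (or 'decisionMaking')."""
    words = [w for w in re.split(r"[^A-Za-z0-9]+", label) if w]
    name = "".join(w[0].upper() + w[1:].lower() for w in words)
    return name if upper_first else name[:1].lower() + name[1:]


def to_turtle(statements) -> str:
    """Emit each triple, followed by a reified copy that records its source."""
    lines = [
        f"@prefix gl: <{BASE}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
    ]
    for st in statements:
        s = "gl:" + camel(st.subject, upper_first=True)
        p = "gl:" + camel(st.predicate, upper_first=False)
        o = "gl:" + camel(st.obj, upper_first=True)
        # Escape backslashes and quotes so the source is a valid Turtle string.
        source = st.source.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{s} {p} {o} .")
        lines.append("[] a rdf:Statement ;")
        lines.append(f"    rdf:subject {s} ;")
        lines.append(f"    rdf:predicate {p} ;")
        lines.append(f"    rdf:object {o} ;")
        lines.append(f'    dcterms:source "{source}" .')
        lines.append("")
    return "\n".join(lines)


# The three assertions for "data" quoted in the text; source labels are placeholders.
STATEMENTS = [
    SpoStatement("data", "is a", "representation", "DMBOK"),
    SpoStatement("data", "represents", "facts", "DMBOK"),
    SpoStatement("data", "supports", "decision-making", "DMBOK"),
]

if __name__ == "__main__":
    print(to_turtle(STATEMENTS))
```

Standard RDF reification is used here only because it keeps the example dependency-free. A production model would more likely map "is a" onto an existing property such as rdfs:subClassOf and use an RDF library for serialisation.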

Figure 4 Data Management Glossary: Terms

Figure 5 Write back to Excel

Figure 6 Foundational Concepts – Data Structure Concepts

Figure 7 “Decision Making” Knowledge Graph

How does the SPO Structure Facilitate Glossary Reviews?

A semantic glossary demands rigorous alignment with business operations, necessitating formal approval from Subject Matter Experts (SMEs). The SPO structure facilitates an efficient review process by allowing SMEs to evaluate definitions iteratively, one atomic sentence at a time. Utilising a specialised glossary workbench, reviewers can independently accept, reject, or amend specific statements, such as “a data asset produces value”.

This granular approach resolves interpretative disputes one statement at a time. Once the constituent SPO statements are authorised, they are synthesised into a cohesive “narrative” definition. The result is a naturally readable statement that remains strictly governed and entirely machine-auditable.
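
The review-then-synthesise flow described above could look something like the following sketch. The names (Verdict, ReviewItem, narrative) are illustrative, the rejected statement is invented purely for the example, and treating an amendment as an acceptance of the reviewer's own wording is an assumption rather than documented workbench behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ReviewItem:
    subject: str
    predicate: str
    obj: str
    verdict: Verdict = Verdict.PENDING

    def accept(self) -> None:
        self.verdict = Verdict.ACCEPTED

    def reject(self) -> None:
        self.verdict = Verdict.REJECTED

    def amend(self, predicate=None, obj=None) -> None:
        """Replace the wording; assumed here to count as the SME's acceptance."""
        if predicate is not None:
            self.predicate = predicate
        if obj is not None:
            self.obj = obj
        self.verdict = Verdict.ACCEPTED


def narrative(term: str, items) -> str:
    """Join the accepted statements about `term` into one readable sentence."""
    relevant = [i for i in items if i.subject == term]
    if any(i.verdict is Verdict.PENDING for i in relevant):
        raise ValueError("every statement must be reviewed before synthesis")
    clauses = [f"{i.predicate} {i.obj}" for i in relevant
               if i.verdict is Verdict.ACCEPTED]
    if not clauses:
        raise ValueError(f"no accepted statements for {term!r}")
    if len(clauses) == 1:
        body = clauses[0]
    else:
        body = ", ".join(clauses[:-1]) + " and " + clauses[-1]
    return f"{term} {body}."


def demo() -> str:
    items = [
        ReviewItem("Data", "is", "a representation"),
        ReviewItem("Data", "represents", "facts"),
        ReviewItem("Data", "guarantees", "accuracy"),  # hypothetical, to be rejected
        ReviewItem("Data", "supports", "decisions"),
    ]
    items[0].accept()
    items[1].accept()
    items[2].reject()
    items[3].amend(obj="decision-making")
    return narrative("Data", items)


if __name__ == "__main__":
    print(demo())
```

Because the narrative is generated only from accepted statements, every clause in the readable definition can be traced back to an individual sign-off.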

How does the Glossary Enhance Operational Connectivity?

An effective enterprise glossary must integrate directly with functional business architecture. The glossary workbench facilitates the complete terminology lifecycle, encompassing drafting, SME review, formal approval, and eventual deprecation. Beyond governance, the primary value of this system resides in its operational connectivity. Authorised definitions can be exported and embedded directly into enterprise applications.
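
The lifecycle named above (drafting, SME review, approval, deprecation) can be pictured as a small state machine. This is a minimal sketch: the allowed transitions, in particular sending a term back from review to draft, are assumptions and not a description of the actual workbench.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    SME_REVIEW = "sme_review"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


# Assumed transition rules; a real workbench may allow others.
ALLOWED = {
    Status.DRAFT: {Status.SME_REVIEW},
    Status.SME_REVIEW: {Status.APPROVED, Status.DRAFT},
    Status.APPROVED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


class Term:
    def __init__(self, name: str) -> None:
        self.name = name
        self.status = Status.DRAFT
        self.history = [Status.DRAFT]  # audit trail of every state visited

    def move_to(self, new_status: Status) -> None:
        """Apply a transition, refusing any that the rules do not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.name}: {self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status
        self.history.append(new_status)

    def is_exportable(self) -> bool:
        """Only approved terms should reach downstream applications."""
        return self.status is Status.APPROVED


if __name__ == "__main__":
    term = Term("AI model card")
    term.move_to(Status.SME_REVIEW)
    term.move_to(Status.APPROVED)
    print(term.name, term.status.value, term.is_exportable())
```

Keeping the rules in a single table makes the governance policy itself easy to review and change.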

For example, within a trade finance application architecture, complex terminology such as “AI model card” or “intelligent document processing” can be incorporated as hyperlinks. Consequently, business personnel encountering unfamiliar terminology can seamlessly access the formally governed definitions, thereby bridging the divide between data management theory and practical application.
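
One way to picture this embedding is a helper that gives every approved term a stable deep link, exports the list as CSV, and wraps occurrences of those terms in application text with a hyperlink. The base URL, the slug format, and the use of HTML anchors are all assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical location of the governed glossary.
BASE_URL = "https://glossary.example.org/terms/"

TERMS = ["AI model card", "intelligent document processing"]


def slug(term: str) -> str:
    """'AI model card' -> 'ai-model-card'."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def deep_link(term: str) -> str:
    return BASE_URL + slug(term)


def links_csv(terms) -> str:
    """Return a CSV document listing every term with its deep link."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "deep_link"])
    for term in sorted(terms, key=str.lower):
        writer.writerow([term, deep_link(term)])
    return buffer.getvalue()


def link_terms(text: str, terms) -> str:
    """Wrap each occurrence of an approved term in an HTML anchor."""
    if not terms:
        return text
    by_lower = {t.lower(): t for t in terms}
    # Longest terms first so a longer term wins over any term it contains.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def replace(match):
        canonical = by_lower[match.group(1).lower()]
        return f'<a href="{deep_link(canonical)}">{match.group(1)}</a>'

    return pattern.sub(replace, text)


if __name__ == "__main__":
    print(links_csv(TERMS))
    print(link_terms("Attach the AI model card before submission.", TERMS))
```

The sketch assumes the input text is already safe to embed in HTML; a real integration would escape it first and would restrict the term list to approved definitions.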

Figure 8 “Formalised Form” Knowledge Graph

Figure 9 Decision Notes

Figure 10 Business Architecture: “Why build it, and what it must do”

Figure 11 Download a CSV of Every Term and its Deep Links

Figure 12 Lexicon – SPO Workbench

Figure 13 ISO/IEC 11179 – Data Element

How does AI Impact Semantic Models in Enterprises?

The proliferation of Artificial Intelligence has elevated semantic models to a critical enterprise requirement. Unlike traditional deterministic software, AI operates non-deterministically: the same input can produce different outputs from one run to the next. Large Language Models also carry their own “latent ontology”, a generalised understanding of terminology derived from global training datasets.

Without a formalised semantic framework, AI models risk substituting enterprise business rules with their own assumptions. Therefore, it is imperative for organisations to overlay an aligned, structural ontology onto these systems. This ensures that the AI’s operations are strictly confined within the organisation’s deterministic boundaries and approved terminology.
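
One lightweight way to overlay approved terminology on a model is to place the governed definitions in front of every task. The sketch below only builds such a prompt and makes no API call. The function name and the instruction wording are assumptions, the two definitions paraphrase statements quoted earlier in this write-up, and a prompt of this kind narrows the model's behaviour rather than guaranteeing it.

```python
# Definitions paraphrased from SPO statements quoted in this write-up.
APPROVED_DEFINITIONS = {
    "data": "Data is a representation of facts that supports decision-making.",
    "data asset": "A data asset produces value.",
}


def build_grounding_prompt(definitions: dict, task: str) -> str:
    """Prefix a task with the approved definitions the model must respect."""
    if not definitions:
        raise ValueError("at least one approved definition is required")
    lines = [
        "Use the following approved definitions. Do not redefine these terms.",
        "",
    ]
    for term in sorted(definitions):
        lines.append(f"- {term}: {definitions[term]}")
    lines += [
        "",
        "If a term you need is not listed, say so instead of assuming a meaning.",
        "",
        f"Task: {task}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_grounding_prompt(APPROVED_DEFINITIONS, "Summarise the data asset policy."))
```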

Figure 14 The Mirroring Effect

Figure 15 The Minimal Ontology Principle

How can we Resolve Conflicting Terminology Definitions?

Enterprise glossary development requires the meticulous reconciliation of terminology across disparate departmental knowledge areas. A pervasive challenge is polysemy, where identical terms possess conflicting definitions. For example, the term “entity” is frequently misapplied; it can designate an instance in set theory, a table in database modelling, or a legal identifier within a party model. These contradictory interpretations precipitate substantial operational confusion.

To mitigate this, a centralised workbench enables the elevation of universally applicable terms to an upper ontology, thereby restricting sub-glossaries from generating redundant classifications. Tracking the semantic lineage of terminology is also vital for resolving cross-cultural and multilingual linguistic discrepancies.
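
A centralised view makes both problems mechanically detectable. The sketch below, with illustrative names and sample data, flags terms whose definitions differ between sub-glossaries and lists terms that are defined identically in several places, which are natural candidates for promotion to the upper ontology. The "entity" readings come from the example above; the shared "data" definition is invented for the demonstration.

```python
from collections import defaultdict

# Sub-glossary name -> {term: definition}. Sample data for illustration only.
SUB_GLOSSARIES = {
    "set theory": {"entity": "an instance"},
    "database modelling": {
        "entity": "a table",
        "data": "a representation of facts",
    },
    "party model": {
        "entity": "a legal identifier",
        "data": "a representation of facts",
    },
}


def _by_term(glossaries):
    """Regroup as term -> {sub-glossary: definition}, ignoring letter case."""
    seen = defaultdict(dict)
    for glossary_name, terms in glossaries.items():
        for term, definition in terms.items():
            seen[term.lower()][glossary_name] = definition
    return seen


def _distinct(definitions):
    return {d.strip().lower() for d in definitions}


def find_polysemes(glossaries):
    """Terms that carry more than one distinct definition."""
    return {
        term: uses
        for term, uses in _by_term(glossaries).items()
        if len(_distinct(uses.values())) > 1
    }


def upper_ontology_candidates(glossaries):
    """Terms defined identically in two or more sub-glossaries."""
    return sorted(
        term
        for term, uses in _by_term(glossaries).items()
        if len(uses) > 1 and len(_distinct(uses.values())) == 1
    )


if __name__ == "__main__":
    for term, uses in find_polysemes(SUB_GLOSSARIES).items():
        print(term, uses)
    print(upper_ontology_candidates(SUB_GLOSSARIES))
```

Exact string comparison is a deliberately crude test of sameness; in practice the comparison would be made on the approved SPO statements rather than on free text.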

Figure 16 What the Workbench is for

Figure 17 The Three Semantic Lineages of the Data “Entity”

Figure 18 Six Core Concepts

What is the Iterative Process of Semantic Modelling?

Developing a semantic model is an iterative, research-intensive procedure. Users can deploy artificial intelligence, such as Claude, to dissect comprehensive frameworks like the DMBOK. However, because the AI adheres to intellectual property constraints, it paraphrases the source rather than reproducing it verbatim. Consequently, glossary developers must rigorously validate these AI-generated definitions against the original source texts.

This extraction process requires close attention to linguistic structure: complex noun phrases must be isolated so that each statement keeps the strict SPO form. Through iterative taxonomy structuring, such as categorising text, sound, and images beneath a “formalised form” parent classification, developers gain a clearer understanding of the atomic relationships that connect business concepts.
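
The parent and child classifications mentioned in this write-up can be held in a simple "broader term" map. The sketch below is illustrative only: the function names are assumptions, and the map contains just the two relationships the text mentions.

```python
# Child term -> broader (parent) term, using the examples from this write-up.
BROADER = {
    "text": "formalised form",
    "sound": "formalised form",
    "image": "formalised form",
    "automated agent": "actor",
}


def ancestors(term: str, broader: dict) -> list:
    """Walk up the taxonomy from `term`, nearest parent first."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:
            # A loop would mean a term is ultimately broader than itself.
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def children(parent: str, broader: dict) -> list:
    """List the terms classified directly beneath `parent`."""
    return sorted(t for t, p in broader.items() if p == parent)


if __name__ == "__main__":
    print(children("formalised form", BROADER))
    print(ancestors("automated agent", BROADER))
```

The cycle check matters during iterative restructuring, when moving a term under a new parent can accidentally create a loop.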

Figure 19 Defining a Term with SPO

Figure 20 From Terms to Decision Notes

Figure 21 Data

Figure 22 Verifying the Output of NotebookLM

How can NotebookLM Enhance Semantic Knowledge Distribution?

Following the formal documentation of semantic definitions, organisations must effectively distribute this knowledge. NotebookLM can synthesise complex glossary structures into accessible formats, including explainer videos, audio overviews, and Standard Operating Procedures. By providing NotebookLM with “woven prose”—definitions engineered from governed nouns and differentiated reuse—the resulting outputs achieve exceptional clarity and machine auditability.

Furthermore, developers can engineer advanced integration pipelines between different AI models. By configuring custom “skills,” Claude can be programmed to simulate browser interactions, enabling it to autonomously coordinate with NotebookLM. This allows Claude to automatically dispatch refined semantic data into NotebookLM to generate derivative educational assets.

Figure 23 Using NotebookLM to generate Short Videos

Figure 24 Business Glossary Standard Operating Procedure

Figure 25 Boundary Topology Separates Enterprise-wide Truth from Domain-specific Vocabulary

Figure 26 “NotebookLM” Skill in Claude

Figure 27 Using the NotebookLM Skill in Claude

How can AI Improve Ethical Data Management Practices?

The final phase of semantic data management involves leveraging specialised AI models and adhering to robust ethical frameworks. Advanced AI variants, such as Claude’s “Fable” model, demonstrate exceptional utility for laborious analytical tasks and critical “red teaming” evaluations. Concurrently, strict ethical governance is paramount when executing foundational data collection or community surveys.

Data initiatives must comply with the FAIR principles, which require data to be findable, accessible, interoperable, and reusable. Additionally, the CARE principles (Collective benefit, Authority to control, Responsibility, Ethics) call for researchers to consult community leadership before collecting data, to prevent adverse social consequences. Integrating these ethical standards with AI-assisted research tools helps enterprise data management practices remain structurally robust and socially responsible.