Key Takeaways
- Shifting from narrative wikis to structured data dictionaries makes it possible to validate terminology in large glossaries such as the DAMA DMBOK.
- A six-level quality framework systematically validates business glossaries, from atomic terms through to operational data.
- Subject-Predicate-Object (SPO) triples make definitions explicit, reducing misinterpretation and eliminating complex jargon.
- Using “Genus” and “Differentia” ensures clarity by distinguishing a term’s category and unique characteristics.
- Zero-ambiguity prose translates rigid SPO models into understandable definitions, removing vagueness and passive voice.
- Incorporating real-world examples enhances clarity, as in the FCOIM data modelling methodology.
- SHACL and SPARQL enable direct validation of data terminology against operational data, enforcing constraints effectively.
- Dynamic knowledge graphs identify “terminology drift” by comparing baseline models with evolving real-world data.
- Separating lexical and ontic models clarifies overlapping domains, enhancing real-world ontology management and classification.
- Glossary maturation requires careful management, guiding users from vague definitions to precise data standards.
Webinar Details
Title: Glossary & Semantic Workbench for Data Professionals
Date: 2026-06-18
Presenter: Howard Diesel
Meetup Group: African Data Management Community
Write-up Author: Howard Diesel
How can we Validate Glossary Definitions Effectively?
Validating glossary definitions is critical when upgrading a data dictionary, such as the one accompanying the DAMA DMBOK version 3 release.
The DAMA dictionary upgrade involves over 2,400 key terms and dictionary elements. However, many raw definitions suffer from ambiguous language, such as using terms like “non-information”. To solve this, data professionals must implement validation frameworks to assess and improve the quality of their business glossaries.
Key Takeaways
- Assess terminology libraries for vague language before integration.
- Implement structured validation to improve dictionary elements.
FAQ
- Why do business glossaries need validation? Glossaries require validation to remove ambiguous descriptions and ensure organisational alignment on data terminology.
Figure 1 The Architecture of Meaning
Figure 2 5 Levels of Business Glossary Quality
What is a Robust Business Glossary Framework?
A robust business glossary requires a six-level quality framework ranging from atomic term structures to operational data alignment.
Validating a glossary involves breaking terms into triples consisting of a Subject, a Predicate, and an Object. This makes each term's genus and differentiating characteristics explicit and prevents duplicate concepts. The six levels of quality checks run from the structural syntax of individual terms, through intra-glossary completeness and semantic model consistency, to operational data validation. This strict framework helps guard against AI hallucinations and keeps business data stewards aligned.
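The progression through the levels can be read as a fail-fast pipeline: a term only moves to the next level once it passes the current one. The minimal Python sketch below assumes each level is a function that returns a list of issue strings. The term fields (`statements`, `genus`, `differentia`, `definition`) and the simplified checks are hypothetical placeholders, not the workbench's actual rules.

```python
from typing import Callable

# Hypothetical term record: a plain dict holding SPO statements, genus,
# differentia and a prose definition. Field names are illustrative only.
Term = dict
Check = Callable[[Term], list]


def level1_statements(term: Term) -> list:
    # Level 1: the term must be described by at least one SPO triple.
    return [] if term.get("statements") else ["no SPO statements"]


def level2_structure(term: Term) -> list:
    # Level 2: both a genus (parent term) and a differentia are required.
    return [f"missing {k}" for k in ("genus", "differentia") if not term.get(k)]


def level3_prose(term: Term) -> list:
    # Level 3: a prose definition must exist (a real check would compare it to the triples).
    return [] if term.get("definition") else ["no prose definition"]


# Levels 4-6 (local domain, enterprise model, operational data) would be
# appended here; they need the whole glossary or live data, not one term.
PIPELINE: list[tuple[str, Check]] = [
    ("L1 statements", level1_statements),
    ("L2 structure", level2_structure),
    ("L3 prose", level3_prose),
]


def run_pipeline(term: Term) -> tuple:
    """Run the levels in order and stop at the first level that reports issues."""
    for level, check in PIPELINE:
        issues = check(term)
        if issues:
            return level, issues
    return None, []


if __name__ == "__main__":
    customer = {"statements": [("Customer", "places", "Order")], "genus": "Party"}
    print(run_pipeline(customer))  # ('L2 structure', ['missing differentia'])
```

Stopping at the first failing level keeps steward worklists short. There is little point reviewing the prose of a term whose genus is still missing.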
Key Takeaways
- Break terms into Subject-Predicate-Object triples.
- Execute progressive quality checks from term prose to data evaluation.
FAQ
- What are the six levels of glossary quality? The six levels progress from atomic Subject-Predicate-Object statements (Level 1) to genus-differentia structure (Level 2), zero-ambiguity prose definitions (Level 3), intra-glossary consistency (Level 4), a multi-glossary enterprise semantic model (Level 5), and operational data validation (Level 6).
Figure 3 SPO Statements
Figure 4 SME Review: Definition
Figure 5 Quality Review: L1 – Statements
What is Level 1 Validation in Narratives?
Level 1 validation eliminates ambiguous narrative wikis by strictly enforcing Subject-Predicate-Object (SPO) frameworks and verifying lexical integrity.
Unstructured definitions create ambiguity through compound nouns and vague pronouns. Lexical integrity is maintained by checking words against external lexical resources such as WordNet and quarantining unverified words or fabricated jargon. Furthermore, treating every predicate phrase as a defined relationship enforces strict conformity and exposes objects that refer to undefined entities.
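The Level 1 checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. `LEXICON` is a small hypothetical set standing in for WordNet lookups, and `DEFINED_PREDICATES` stands in for the glossary's register of defined relationships. Flagging every multi-word noun as a compound is a deliberate simplification.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: in practice LEXICON might be backed by WordNet lookups
# and DEFINED_PREDICATES by the glossary's own register of relationships.
LEXICON = {"customer", "order", "party", "account", "places", "holds"}
DEFINED_PREDICATES = {"places", "holds"}
PRONOUNS = {"it", "they", "this", "that", "which", "these", "those"}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def level1_issues(t: Triple, defined_terms: set) -> list:
    """Return Level 1 findings for one Subject-Predicate-Object triple."""
    issues = []
    for role, value in (("subject", t.subject), ("object", t.obj)):
        words = value.lower().split()
        if any(w in PRONOUNS for w in words):
            issues.append(f"{role} '{value}' uses a pronoun")
        if len(words) > 1:
            # Simplification: treat any multi-word noun as a compound needing its own term.
            issues.append(f"{role} '{value}' is a compound noun; define it as a term")
    # Quarantine every word the external lexicon does not recognise.
    for word in f"{t.subject} {t.predicate} {t.obj}".lower().split():
        if word not in LEXICON and word not in PRONOUNS:
            issues.append(f"quarantine unverified word '{word}'")
    if t.predicate.lower() not in DEFINED_PREDICATES:
        issues.append(f"predicate '{t.predicate}' is not a defined relationship")
    if t.obj not in defined_terms:
        issues.append(f"object '{t.obj}' is not a defined entity")
    return issues


if __name__ == "__main__":
    terms = {"Customer", "Order"}
    print(level1_issues(Triple("Customer", "places", "Order"), terms))  # []
    for issue in level1_issues(Triple("It", "syncs", "Ledger"), terms):
        print(issue)
```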
Key Takeaways
- Replace narrative definitions with strict SPO triples.
- Quarantine fabricated verbs using external dictionaries like WordNet.
- Define predicates explicitly to establish exact relationship rules.
FAQ
- How does Level 1 glossary validation improve clarity? It removes ambiguous compound nouns and pronouns, replacing them with distinct Subject-Predicate-Object components.
Figure 6 Business Glossary & Semantic Model Quality Framework Mindmap
Figure 7 Business Glossary & Semantic Model Quality Management Slide Deck
Figure 8 Architecting Clarity
Figure 9 Achieving the Structural Blueprint
Figure 10 The 6-level Semantic Quality Architecture
Figure 11 Level 1 Enforces Atomic Subject-predicate-object Triples
What Defines Genus and Differentia in Validation?
Level 2 validation secures structural rigour by defining a term’s genus (parent) and differentia (distinguishing characteristic) to eliminate overlapping definitions.
If a data steward cannot differentiate between sibling terms, an overlap exists. Level 2 quality checks demand precise definitions for term life cycle states and sub-states. For attributes, adherence to the ISO/IEC 11179 metadata registry standard ensures clear data types, codes, and minimum and maximum constraints. For relationship terms, this phase validates the subject domain, the object range, and specific cardinality boundaries.
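Two of the Level 2 checks can be sketched in Python: detecting sibling terms that share a genus but cannot be told apart, and checking a relationship against its domain, range and cardinality. The sketch makes several assumptions. Comparing normalised differentia strings for equality is a crude stand-in for a steward's judgement or a semantic-similarity check. The class and field names (`TermDef`, `RelationshipSpec`, `domain_type`, `range_type`) are illustrative, not ISO/IEC 11179 element names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TermDef:
    name: str
    genus: str        # parent term
    differentia: str  # distinguishing characteristic


def sibling_overlaps(terms: list) -> list:
    """Pairs of sibling terms (same genus) whose differentia cannot tell them apart."""
    groups = defaultdict(list)
    for t in terms:
        # Normalise case and whitespace so trivial rewording still collides.
        key = (t.genus.lower(), " ".join(t.differentia.lower().split()))
        groups[key].append(t.name)
    return [
        (names[i], names[j])
        for names in groups.values()
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]


@dataclass(frozen=True)
class RelationshipSpec:
    predicate: str
    domain_type: str                # permitted subject type
    range_type: str                 # permitted object type
    min_card: int = 0
    max_card: Optional[int] = None  # None means unbounded


def relationship_issues(spec: RelationshipSpec, subject_type: str, object_types: list) -> list:
    """Check one subject's use of a relationship against domain, range and cardinality."""
    issues = []
    if subject_type != spec.domain_type:
        issues.append(f"subject type {subject_type} is outside domain {spec.domain_type}")
    for o in object_types:
        if o != spec.range_type:
            issues.append(f"object type {o} is outside range {spec.range_type}")
    n = len(object_types)
    upper = "*" if spec.max_card is None else spec.max_card
    if n < spec.min_card or (spec.max_card is not None and n > spec.max_card):
        issues.append(f"{n} values violate cardinality {spec.min_card}..{upper}")
    return issues
```

For example, "Customer" and "Client" both defined as a Party that "buys goods or services" would be reported as an overlap for a steward to resolve.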
Key Takeaways
- Use genus-differentia definitions to distinguish sibling terms and stop duplication.
- Apply ISO 11179 rules for consistent attribute data typing.
- Establish strict domain ranges and cardinality for relationships.
FAQ
- What is Genus-Differentia in data terminology? It is a structural definition method where “Genus” identifies the parent term, and “Differentia” states the distinguishing characteristic that separates it from its sibling terms.
Figure 12 Validation Cockpit
Figure 13 Level 2 Structures Terms using Genus-Differentia
How does Level 3 Ensure Zero-ambiguity Prose?
Level 3 transforms rigid Subject-Predicate-Object structures back into zero-ambiguity prose that business users can easily consume.
Strict semantic structures must eventually become readable text that aligns exactly with the underlying SPO statements. Using the Fully Communication Oriented Information Modelling (FCOIM) methodology, professionals attach concrete, real-world examples to every defined model. These examples keep the prose faithful to the model and prevent misinterpretation: when business stakeholders read the model, the examples clarify the intended meaning.
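The translation from verified triples back to prose can be sketched as a simple verbaliser. It renders each SPO triple as an active-voice sentence and insists on a concrete example. This is a minimal sketch under stated assumptions. The passive-voice regular expression and the a/an rule are rough English heuristics, and the sentence template is illustrative rather than the FCOIM verbalisation procedure itself.

```python
import re
from dataclasses import dataclass

# Crude heuristic for passive predicates such as "is placed by" or "was written by".
PASSIVE = re.compile(r"^(is|are|was|were|be|been)\s+\w+(ed|en)\s+by$", re.IGNORECASE)


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def with_article(noun: str) -> str:
    # Rough vowel rule; it gets cases like "an hour" or "a user" wrong.
    return ("an " if noun[:1].lower() in "aeiou" else "a ") + noun


def verbalise(triples: list, example: str) -> str:
    """Render verified SPO triples as active-voice sentences plus a concrete example."""
    if not triples:
        raise ValueError("no verified SPO statements to verbalise")
    if not example.strip():
        raise ValueError("every definition needs a concrete example")
    sentences = []
    for t in triples:
        if PASSIVE.match(t.predicate):
            raise ValueError(f"passive predicate '{t.predicate}'; restate it in active voice")
        sentence = f"{with_article(t.subject)} {t.predicate} {with_article(t.obj)}."
        sentences.append(sentence[0].upper() + sentence[1:])
    return " ".join(sentences) + " For example: " + example


if __name__ == "__main__":
    print(verbalise(
        [Triple("Customer", "places", "Order"), Triple("Order", "contains", "Order Line")],
        "Customer C-1001 places Order O-42, which contains two Order Lines.",
    ))
```

Because the prose is generated from the triples rather than written freehand, the definition cannot drift from its SPO statements without the triples changing too.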
Key Takeaways
- Draft definitions directly from verified SPO statements.
- Use the FCOIM methodology to attach real-world examples to semantic models.
- Eliminate pronouns and enforce an active voice format.
FAQ
- What is FCOIM modelling? Fully Communication Oriented Information Modelling (FCOIM, also written FCO-IM) is a fact-based data modelling method that derives models directly from clear terminology and concrete example facts expressed by domain experts.
Figure 14 Structural Validation Adapts Strictly to the Term Type
Figure 15 Level 3 Translates Structural Rigour into Zero-ambiguity Prose
What are Levels 4 and 5 in SHACL?
Levels 4 and 5 focus on localised domain governance and cross-glossary relationships using Shapes Constraint Language (SHACL).
Level 4 investigates localised domains for circular references, disjoint parents, and missing data stewardship. Level 5 then tackles sideways dependencies: the wicked problem of balancing overlapping domains. This is governed with SHACL, which plays a role for knowledge graphs similar to the one an XML Schema Definition (XSD) plays for XML documents, defining the rules for a namespace. SHACL node shapes strictly control how data is constructed, and can also be used to suggest new terminology from existing graphs.
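In practice these checks would be expressed as SHACL shapes or SPARQL queries over the glossary graph. The plain-Python approximation below is a sketch of the same logic that keeps the example self-contained. It assumes the hierarchy is a dict mapping each term to its parent terms, that disjointness is declared as pairs of parent names, and that stewardship is a term-to-steward dict. All of these structures are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def find_cycle(parents: dict) -> Optional[list]:
    """Depth-first search for a circular reference in a term -> parents hierarchy."""
    state = {}  # missing = unvisited, 1 = on the current path, 2 = finished
    path = []

    def visit(term):
        state[term] = 1
        path.append(term)
        for parent in parents.get(term, []):
            if state.get(parent) == 1:
                return path[path.index(parent):] + [parent]
            if parent not in state:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[term] = 2
        return None

    for term in parents:
        if term not in state:
            found = visit(term)
            if found:
                return found
    return None


def disjoint_parent_issues(parents: dict, disjoint: set) -> list:
    """Terms that inherit from two parents declared disjoint (e.g. Person and Organisation)."""
    issues = []
    for term, ps in parents.items():
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                if frozenset((a, b)) in disjoint:
                    issues.append(f"{term} has disjoint parents {a} and {b}")
    return issues


def missing_stewards(terms: list, stewards: dict) -> list:
    """Level 4: terms with no accountable data steward."""
    return sorted(t for t in terms if not stewards.get(t))


def cross_domain_overlaps(glossaries: dict) -> dict:
    """Level 5: labels defined in more than one domain glossary (sideways dependencies)."""
    seen = defaultdict(list)
    for domain, terms in glossaries.items():
        for label in terms:
            seen[label.lower()].append(domain)
    return {label: domains for label, domains in seen.items() if len(domains) > 1}
```

Each finding maps naturally onto an item in a steward's worklist. A detected cycle such as `['A', 'B', 'C', 'A']` shows exactly which parent links need to be broken.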
Key Takeaways
- Audit domains for circular references and unassigned stewardship.
- Utilise SHACL to define and enforce node shapes within graphs.
- Control cross-domain overlap to minimise duplicated definitions.
FAQ
- What is SHACL used for in data glossaries? SHACL (Shapes Constraint Language) establishes rules and node shapes to control namespace structures and validate knowledge graphs.
Figure 16 Validation Cockpit: Steward Worklist
Figure 17 Level 4 Validates the Localised Domain for Logical Consistency
Figure 18 Level 5 Engineers an Interoperable Enterprise Semantic Model
How does Level 6 Detect Terminology Drift?
Level 6 moves glossary governance directly into operational data records to detect terminology drift using dynamic knowledge graphs.
Semantic models are rigorously tested against actual structured and unstructured data, using SHACL for validation and SPARQL for query extraction. To identify terminology drift, organisations deploy a baseline rule-based knowledge graph alongside a dynamic graph built via LLM ingestion. The two graphs are compared continuously using non-deterministic, dynamic thresholds, so teams receive early warnings when definitions fall out of sync with actual operational data.
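One way to picture the baseline-versus-dynamic comparison is sketched below in Python. The comparison measures, per term, how far the term's predicate-object pairs in the LLM-built graph diverge from the baseline. It raises an alert when that distance exceeds a threshold derived from the term's own history. Several choices here are assumptions, not the method presented in the webinar: the Jaccard distance, the mean-plus-k-standard-deviations threshold, and the `k` and `floor` parameters.

```python
from statistics import mean, pstdev

# A graph is modelled here as a set of (subject, predicate, object) tuples.


def term_profile(graph: set, term: str) -> set:
    """Predicate-object pairs asserted about a term."""
    return {(p, o) for s, p, o in graph if s == term}


def jaccard_distance(a: set, b: set) -> float:
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def drift_alerts(baseline: set, dynamic: set, history: dict, k: float = 2.0, floor: float = 0.2) -> dict:
    """Flag terms whose usage in the dynamic graph departs from the baseline.

    The threshold for each term is mean + k * standard deviation of its past
    distances (never below `floor`), so it adapts as history accumulates.
    `history` is updated in place with the latest distance.
    """
    alerts = {}
    for term in sorted({s for s, _, _ in baseline}):
        d = jaccard_distance(term_profile(baseline, term), term_profile(dynamic, term))
        past = history.get(term, [])
        threshold = max(floor, mean(past) + k * pstdev(past)) if past else floor
        if d > threshold:
            alerts[term] = round(d, 3)
        history.setdefault(term, []).append(d)
    return alerts


if __name__ == "__main__":
    baseline = {("Customer", "places", "Order"), ("Customer", "holds", "Account")}
    dynamic = {("Customer", "places", "Order"), ("Customer", "subscribes_to", "Newsletter")}
    print(drift_alerts(baseline, dynamic, history={}))  # {'Customer': 0.667}
```

A per-term adaptive threshold avoids flagging terms that are naturally noisy in LLM-extracted data, while still catching a sudden change in how a stable term is used.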
Key Takeaways
- Validate semantic models against live operational data using SHACL constraints and SPARQL queries.
- Compare baseline knowledge graphs against LLM-driven dynamic graphs.
- Establish dynamic thresholds to detect and isolate terminology drift.
FAQ
- How do knowledge graphs detect terminology drift? A static, rule-based baseline graph is compared against a dynamic graph populated from live data, and discrepancies in terminology usage are flagged automatically.
Figure 19 Resolving Semantic Overload via Local NodeShape Architecture
Figure 20 Level 6 Compiles the Blueprint into Run-time SHACL Firewall
Figure 21 The Master Orchestrator Executes a Fail-Fast Semantic Pipeline
Figure 22 Stopping AI Drift: Dynamic Hallucination Detection
Figure 23 Engineered Quality Guarantees the Reliability of Enterprise AI
Figure 24 Shifting Left: Blocking Errors before Term Assembly
What is a Hybrid Model for Ontology Management?
Modern ontology management requires a hybrid model that separately governs lexical semantics and ontic assertions.
While earlier knowledge graph initiatives focused strictly on physical things rather than strings of text, advanced semantic modelling requires managing both. By shifting lexical language elements to a dedicated layer, the core ontology can cleanly govern ontic declarations, such as entity resolution. This hybrid T-box (terminological schema) structure enables shared domains to reuse terms consistently without creating tangled, circular dependencies.
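The separation of concerns can be illustrated with two small data structures. An ontic `Concept` identifies the real-world thing. A lexical `LexicalEntry` records a word in a given language and domain that denotes that concept. This is a minimal sketch. The class names and the `ex:` namespace are hypothetical, and a production model would more likely use a standard lexicon vocabulary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Ontic layer: the real-world thing, identified independently of any word."""
    iri: str


@dataclass(frozen=True)
class LexicalEntry:
    """Lexical layer: a label in a given language and domain that denotes a concept."""
    label: str
    language: str
    domain: str
    denotes: Concept


def labels_for(entries: list, concept: Concept) -> set:
    """All (label, language) pairs that denote one concept, e.g. translations."""
    return {(e.label, e.language) for e in entries if e.denotes == concept}


def homonym_conflicts(entries: list) -> dict:
    """Same label, language and domain pointing at more than one concept."""
    index = defaultdict(set)
    for e in entries:
        index[(e.label.lower(), e.language, e.domain)].add(e.denotes.iri)
    return {key: iris for key, iris in index.items() if len(iris) > 1}


if __name__ == "__main__":
    customer = Concept("ex:Customer")  # "ex:" is a placeholder namespace
    entries = [
        LexicalEntry("Customer", "en", "Sales", customer),
        LexicalEntry("Klant", "nl", "Sales", customer),
        LexicalEntry("Account", "en", "Finance", Concept("ex:LedgerAccount")),
        LexicalEntry("Account", "en", "Finance", Concept("ex:CustomerAccount")),
    ]
    print(sorted(labels_for(entries, customer)))  # [('Customer', 'en'), ('Klant', 'nl')]
    print(homonym_conflicts(entries))
```

Translations and synonyms attach to one concept without duplicating it. The same label meaning two things within one domain surfaces as an explicit conflict instead of silently overloading the ontology.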
Key Takeaways
- Split semantic models into distinct lexical (language) and ontic (real-world) layers.
- Map core facets dynamically to prevent unchecked domain overlap.
- Ensure the meaning of a defined word remains universally consistent downstream.
FAQ
- What is the difference between lexical and ontic models? The lexical model manages language, linguistics, and translations, while the ontic model manages declarations about real-world physical entities.
Figure 25 Semantics. Structure. Significance.
Figure 26 Semantium Protocols Mindmap
Figure 27 The Initial Promise of Knowledge Graphs Collided with the Messy Reality of Human Language
Figure 28 The True Bottleneck to Enterprise Adoption is Lateral Extension
Figure 29 Introducing a Separation of Concerns for Modern Semantic Architecture
Figure 30 Defining the Atomic Engine of Communication
How can we Improve our Data Dictionary Effectively?
A mature semantic dictionary bridges the gap between raw business definitions and strict data modelling while actively managing human behavioural changes.
Transforming a basic glossary into a Level 5 data dictionary introduces a rigour that can exceed that of traditional modelling tools. However, upgrading definitions requires empathetic change management, as users transitioning from legacy terminologies may resist strict semantic frameworks. By recording informal business language and using LLMs to map loose phrasing to standardised predicates, data architects can mature their environments iteratively.
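The mapping step can be sketched as follows. Each informal phrase is recorded together with its origin, a standard predicate is proposed where one fits, and phrases with no match are left for a steward. Here `difflib` string similarity is a deliberately simple stand-in for the LLM step. It catches inflections such as "placed" versus "places" but not true synonyms such as "buys". The predicate register and field names are hypothetical.

```python
import difflib
from dataclasses import dataclass
from typing import Optional

# Hypothetical register of standardised predicates.
STANDARD_PREDICATES = ["places", "holds", "owns", "manages"]


@dataclass(frozen=True)
class PredicateMapping:
    informal: str             # the phrase as the business actually said it
    predicate: Optional[str]  # proposed standard predicate, if any
    origin: str               # where the phrase was recorded (kept for change management)
    unmapped: bool            # True when a steward must choose or define a predicate


def map_phrase(phrase: str, origin: str, cutoff: float = 0.6) -> PredicateMapping:
    """Propose a standard predicate for an informal phrase, keeping its origin."""
    # String similarity stands in for an LLM; only the single best match is kept.
    matches = difflib.get_close_matches(phrase.lower().strip(), STANDARD_PREDICATES, n=1, cutoff=cutoff)
    if matches:
        return PredicateMapping(phrase, matches[0], origin, unmapped=False)
    return PredicateMapping(phrase, None, origin, unmapped=True)


if __name__ == "__main__":
    print(map_phrase("placed", "SME interview"))
    print(map_phrase("signs up for the newsletter", "support tickets"))
```

Keeping `origin` on every mapping lets a steward explain to the people who used a legacy phrase why it now maps to a standard predicate. This supports the gradual change management described above.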
Key Takeaways
- Acknowledge that dictionary maturity is a gradual, multi-year progression.
- Use AI to translate informal business phrasing into formalised predicates.
- Track term origins to facilitate organisational change management.
FAQ
- Can you generate a data model directly from a business glossary? Yes, when a glossary achieves Level 5 structural maturity using strict triples and predicate rules, it possesses the rigour needed to inform and validate data models.
Figure 31 Hierarchy
Figure 32 Validation Cockpit: Rewrites
Figure 33 Hierarchy: Focused
Figure 34 Local Graph: Data
If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.
Additionally, if you would like to watch the edited video on our YouTube channel, please click here.
If you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com).
Don’t forget to join our exciting LinkedIn and Meetup data communities so you don’t miss out!