Is Your Data GenAI Ready for Data Professionals with Paul Bolton

Executive Summary

This webinar addresses key challenges in preparing data for AI, focusing on semantic ambiguity and the risks of vague definitions. Howard Diesel introduces the Structured Definition Method, which clarifies concepts using subjects, predicates, and objects and aligns with ISO 11179 standards for multiple classifications. The pathway from definitions to models through iterative refinement with ArchiMate is explored, alongside the significance of industry ontologies and knowledge graphs. The webinar concludes by highlighting the implications of AI and LLM hallucinations on future semantic control.

Webinar Details

Title: Is Your Data GenAI Ready for Data Professionals with Paul Bolton
Date: 2026-02-19
Presenter: Howard Diesel & Paul Bolton
Meetup Group: African Data Management Community
Write-up Author: Howard Diesel

The Foundation—AI-Ready Data and Three Critical Barriers

The webinar begins, and Howard Diesel sets the context for the session within the larger AI-ready enterprise architecture course. AI-ready data requires four critical elements: proper business alignment, robust architecture, effective governance, and accurate corporate memory. The course flows from business architecture through information architecture to data architecture, technology implementation, and governance—all designed to work as an integrated system rather than isolated silos.

Three fundamental barriers prevent data from being activated in AI systems. First, discovery challenges mean teams cannot find the data they need, even though it exists somewhere in the organisation. Second, context gaps occur when meaning doesn’t travel with data—raw values become meaningless without understanding what they represent, how they were derived, and when they remain valid. Third, trust issues stem from quality concerns and unclear data provenance.

These barriers aren’t technical problems—they’re semantic ones. Howard emphasises that when organisations rush to implement AI without addressing how business meaning is captured and shared, they create friction that undermines even sophisticated technical implementations. The information architecture module focuses specifically on bridging business strategy to execution, ensuring seamless flow from strategic intent through business concepts to technical data structures. This foundational understanding sets the stage for tackling the semantic friction trap that plagues most organisations.

Figure 1 Information Architecture

Figure 2 ‘Is Your Data Gen-AI Ready?’

Figure 3 From Strategy to Execution: AI-Ready Enterprise Architecture and Governance

Figure 4 Training Outcomes

Figure 5 Course Agenda (4+1 Days)

Figure 6 Business Architecture and Strategy Alignment

Figure 7 Module Objectives

Understanding Semantic Ambiguity—The Language Problem

Human language contains inherent ambiguities that plague both interpersonal communication and AI systems. Howard identifies four critical types of ambiguity. Acronym ambiguity affects nearly everyone—“XL” could mean extra-large, forty in Roman numerals, or two Latin consonants. Syntactic ambiguity arises from sentence structure: “The chicken is ready to eat” could mean the chicken is prepared for consumption or that the chicken is hungry.

Lexical ambiguity stems from multiple word meanings—over eighty percent of common English words have multiple definitions. Terms like “record” could mean a document, a vinyl disc, or a database entry.

Semantic ambiguity occurs when stakeholders use identical terms but hold fundamentally different mental models. What sales calls a “customer” might be an “account holder” to finance or a “user” to IT.

Organisations operate through functional divisions, each developing its own terminology and data practices. Hans Hultgren identified this integration challenge: it’s not just technical but semantic. When different functions use different terms for the same concept or the same term for different concepts, data integration becomes enormously complex.

These ambiguities don’t just confuse humans—they cause AI hallucinations, incorrect analyses, and flawed recommendations. Howard stresses that addressing semantic ambiguity requires disciplined approaches to definition writing, moving beyond vague “hand-wavy” definitions that gesture toward meaning without providing precision.

Figure 8 ‘The Semantic Friction Trap’

Figure 9 Types of Ambiguity

Figure 10 Ambiguous Terminology leads to a lack of Consensus

The Dangers of Hand-Wavy Definitions and Premature Modelling

A critical problem in data architecture is “hand-wavy definitions”—vague descriptive paragraphs that bundle multiple concepts, omit critical details, and use ambiguous language, inviting multiple interpretations. Writing definitions as descriptive paragraphs rather than functional logic creates ambiguity, bundling, and gaps. Stakeholders nod in agreement while harbouring fundamentally different understandings.

Paul Bolton shares experiences re-engineering SAP databases where multiple fields labelled simply “item” required extensive investigation—purchase order line items versus sales order line items versus other types. Similarly, “movement type” codes lacked categorisation to indicate whether the stock left a pharmacy for patient care or for an inter-ward transfer. These challenges stem from jumping to conceptual modelling before understanding content and context.

Howard identifies a persistent pattern: data modellers read requirements, extract nouns and verbs, and immediately create entity-relationship diagrams. This premature abstraction embeds semantic ambiguity into data structures. When modellers create containers (tables, entities) before comprehending what those containers should hold, the resulting models reflect the modeller’s assumptions rather than business reality.

This rush-to-model approach causes “ROI leakage”—organisations invest in data strategies and products expecting returns, but value disappears when poor semantic foundations undermine initiatives. Analytics platforms deliver insights from inconsistent definitions, and compliance dashboards report violations under ambiguous policies. The solution: establish precise, agreed-upon definitions first, then let data structures flow naturally from those definitions.

Figure 11 The Cost of “Hand-Wavy” Definitions

Figure 12 ‘Why Conceptual Modelling Fails Early’

Figure 13 The Trap of Premature Abstraction

Figure 14 The “6 Friends” and SPO Framework

Dissecting a Failed Definition—The “It” and “And” Traps

Howard dissects a typical business definition riddled with problems: “A data management policy is a set of principles and guidelines used to ensure that our data is handled correctly across the university. It includes rules for security and quality, and it should be followed by everyone dealing with data to meet our regulatory needs.”

This definition contains six critical flaws. Pronoun poisoning: the pronoun ‘it’ leaves ambiguous whether it refers to the policy, the data, or the university. Machines cannot reliably parse pronouns, and even humans struggle with them. Bundling: ‘principles and guidelines’ are lumped together without distinguishing their different characteristics—are they governed identically or differently?

Vague terms: “handled correctly” provides no actionable guidance. A second bundling: “security and quality” conflates mandatory rules with advisory guidelines. Missing actors: “everyone” is too broad for accountability—who specifically must comply? Passive logic: “should be followed” lacks enforcement clarity and doesn’t specify consequences.

Tools like Grammarly flag these issues—passive voice, ambiguous pronouns, vague language. Yet business professionals continue to write this way, viewing the passive voice as more formal or professional. This convention directly conflicts with the semantic precision needed for both human understanding and AI interpretation.

When definitions contain such flaws, AI systems cannot reliably parse them, and humans interpret them differently based on roles and contexts. The exercise demonstrates how common business writing practices systematically undermine both human understanding and AI effectiveness, necessitating a fundamentally different approach.
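Checks like the ones Grammarly performs can be approximated programmatically. The sketch below is a minimal, illustrative definition linter for the “it” and “and” traps; the function name and the pattern lists are hypothetical and far from exhaustive, but they show how flawed definitions could be flagged automatically before they reach a glossary or an AI system.

```python
import re

# Hypothetical pattern lists -- illustrative, not exhaustive.
AMBIGUOUS_PRONOUNS = {"it", "they", "this", "that"}
VAGUE_TERMS = {"handled correctly", "appropriately", "everyone", "should be"}

def lint_definition(text: str) -> list[str]:
    """Flag the 'it' and 'and' traps in a candidate definition."""
    findings = []
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    # Pronoun poisoning: ambiguous referents that machines cannot parse.
    for pronoun in sorted(AMBIGUOUS_PRONOUNS & words):
        findings.append(f"pronoun poisoning: ambiguous '{pronoun}'")
    # Vague terms that provide no actionable guidance.
    for term in sorted(VAGUE_TERMS):
        if term in lowered:
            findings.append(f"vague term: '{term}'")
    # The 'and' trap: distinct concepts bundled into one statement.
    if " and " in lowered:
        findings.append("possible bundling: 'and' joins separate concepts")
    return findings

definition = ("A data management policy is a set of principles and guidelines "
              "used to ensure that our data is handled correctly. "
              "It should be followed by everyone.")
for finding in lint_definition(definition):
    print(finding)
```

A linter like this cannot judge meaning, but it cheaply surfaces the structural symptoms that the webinar identifies, so a human reviewer knows where to interrogate further.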

Figure 15 ‘The Narrative Trap – Why is it Bad?’

Figure 16 Anatomy of a Weak Definition

Figure 17 The “Before”: The Narrative Trap

The Structured Definition Method—Subject, Predicate, Object

The solution lies in the subject-predicate-object (SPO) framework combined with the “six friends” interrogation method. Every definitional statement should clearly identify what is being described (subject), what characteristic or relationship is being asserted (predicate), and what the subject relates to (object). This structure mirrors how knowledge graphs represent information.

The six friends framework interrogates every term through eight dimensions: classification (what is it?), hierarchy (what is its parent?), and the six friends themselves—why (purpose), who (actors), what (actions), when (timing), where (location), and how (method). Each question generates atomic statements following the SPO structure.

Howard demonstrates rewriting the flawed policy definition: “Data management policy is a governance document” (classification). “The data management policy enables data trust” (why). “The Data Governance Council authorises the data management policy” (who). “The data management policy defines data security rules” (what). “The data management policy provides data quality guidelines” (what—note the distinction between “defines rules” and “provides guidelines”). “The policy is reviewed annually” (when). “Internal audit verifies policy compliance” (how).

Each statement is independently verifiable. Stakeholders agree or disagree with individual assertions without rejecting the entire definition. Active voice is essential—it clearly identifies actors and actions, unlike passive voice, which obscures them. Active voice definitions translate directly into conceptual models and knowledge graphs. The SPO structure eliminates the “and” trap by requiring separate statements for distinct concepts, preventing bundling that can create ambiguity.

Figure 18 Active Voice: Driving Accountability

Figure 19 Passive Voice: The Root of Ambiguity

Figure 20 The Atomic Definition (Data Management Policy)

Figure 21 Find your 6 Friends

Figure 22 Find your 6 Friends pt.2

Figure 23 ‘The 6 Friends and SPO Framework’

Handling Multiple Classifications and ISO 11179 Standards

Real-world complexity emerges when stakeholders propose multiple classifications. William raises a manufacturing example: “front door” and “back door” serve different security and operational purposes. Should they have separate definitions or be treated as variations of one concept?

The structured method handles this through hierarchical classification with qualifiers. “Door” receives its own definition, establishing common characteristics. “Front door is a door” creates parent-child inheritance, adding front-door-specific attributes. “Back door is a door” does the same. The danger lies in bundling multiple purposes into a parent definition that doesn’t apply to all children. When a child concept doesn’t inherit all parent characteristics, the taxonomy is broken.
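The inheritance rule—every child must honestly inherit all parent characteristics—maps naturally onto class inheritance. This is a minimal sketch with hypothetical class and attribute names, not a modelling standard:

```python
from dataclasses import dataclass

@dataclass
class Door:
    """Parent definition: characteristics every door shares."""
    material: str
    lockable: bool

@dataclass
class FrontDoor(Door):
    """'Front door is a door': inherits all parent characteristics,
    then adds front-door-specific attributes."""
    has_doorbell: bool = True

@dataclass
class BackDoor(Door):
    """'Back door is a door': same inheritance, different qualifiers."""
    faces_loading_bay: bool = False

# Every child is usable wherever the parent is expected. If a child
# could not honestly carry a parent attribute, the taxonomy is broken
# and the offending attribute belongs on the child, not the parent.
front = FrontDoor(material="oak", lockable=True)
assert isinstance(front, Door)
```

The type checker enforces what the taxonomy asserts: bundling a purpose into `Door` that only applies to `FrontDoor` would force `BackDoor` to carry an attribute it cannot honour.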

ISO 11179 provides standard structures for naming and qualifying terms: class (general category), qualifier (specific type), core term (entity name), modifier (additional distinction), and type (format). For example: “Annual Financial Report Document” breaks down as class (Document), qualifier (Financial), core term (Report), modifier (Annual).
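The name components can be sketched as a small structure. This is a rough illustration of the webinar’s example only—the class, field, and method names are hypothetical, and real ISO 11179 naming conventions include representation terms and further detail not shown here:

```python
from dataclasses import dataclass

@dataclass
class DataElementName:
    """Name components loosely following ISO 11179 (type/representation
    term omitted in this sketch)."""
    modifier: str      # additional distinction, e.g. "Annual"
    qualifier: str     # specific type, e.g. "Financial"
    core_term: str     # entity name, e.g. "Report"
    object_class: str  # general category, e.g. "Document"

    def render(self) -> str:
        """Compose the full, unambiguous name."""
        return " ".join(
            [self.modifier, self.qualifier, self.core_term, self.object_class]
        )

report = DataElementName("Annual", "Financial", "Report", "Document")
print(report.render())  # Annual Financial Report Document
```

Forcing every term through such a structure is what turns a bare “item” into “Purchase Order Line Item”: the qualifier and class slots cannot be left empty.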

This structure resolves SAP’s “item” problem—instead of the generic “item,” proper naming yields “Purchase Order Line Item” and “Sales Order Line Item,” immediately clarifying their meanings. Howard demonstrates this by analysing different stakeholder definitions of the insurance term “premium”—some view it as an amount, others as a payment, still others as a charge.

These semantic differences must be surfaced and resolved. The structured method helps by presenting each perspective as separate statements that can be compared and harmonised, and by establishing shared definitions that accommodate legitimate perspectives while eliminating ambiguity.

Figure 24 Corrected Definition: Policy

Figure 25 Module 2: Information Architecture

Figure 26 Module Objectives

From Definitions to Models—Iterative Refinement with ArchiMate

The previous week, Paul Grobler (‘Is Your Data GenAI Ready for Data Citizens with Paul Grobler’) demonstrated how to use Archi, an open-source enterprise architecture modelling tool, to develop conceptual models iteratively. He showed three versions of a product and customer model, each refined through questioning based on the six friends framework.

Version one showed a simple hierarchy: Product → Product Category → Customer → Market Segment. Version two added relationships and attributes after systematic interrogation of each term. Version three, refined through discussion with Howard, revealed that Product Category relates directly to Market Segment (products are offered to market segments), not just through customers. This critical insight emerged from systematically asking the six friends questions.

The iterative process reveals the framework’s power. By asking “what, why, who, when, where, how” about each concept, gaps and ambiguities surface. The model evolves from initial assumptions to an accurate representation of business reality. Paul notes that even the verb “associates” proved too weak—”offers” more precisely describes how product categories relate to market segments.

An attendee observes that Archi enables object reuse: once a term is defined with its relationships, anyone pulling it into a model automatically includes those relationships, preventing inconsistency and “inventing stuff.”

The information architecture framework establishes business glossaries, semantic layers, conceptual models, and structured knowledge through taxonomies and ontologies. Clear ownership and stewardship are essential—definitions require accountable authors, reviewers, and approvers following governed processes, not endless consensus-seeking.

Industry Ontologies, Knowledge Graphs, and Temporal Complexity

Organisations must align with industry-standard ontologies like FIBO (Financial Industry Business Ontology) and ACORD (insurance). FIBO defines “policy” as a contract of insurance, distinguishing it from a quote and an application, and includes concepts such as insured party, insurer (carrier), policy coverage, and claim adjuster. Howard shows mapping exercises between internal terms and standard ontologies, enabling industry benchmarking, regulatory reporting, and partner integration.

An attendee raises questions about knowledge engineering. Howard confirms SPO statements are the triples that populate knowledge graphs—each subject-predicate-object statement becomes a node-edge-node relationship.

A taxonomy provides hierarchical classification (a car is a vehicle, a vehicle is an asset), while an ontology adds cross-hierarchy relationships (a car is manufactured by a company, purchased by a customer). Knowledge graphs implement ontologies computationally, enabling AI to reason about concepts.
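The distinction can be shown with the webinar’s own car example. Taxonomy edges are purely hierarchical (“is a”), while ontology edges add cross-hierarchy relationships; both share the same triple shape. The traversal function below is a minimal sketch of the kind of reasoning a knowledge graph makes possible:

```python
# Taxonomy: hierarchical "is a" edges only.
taxonomy = [
    ("car", "is a", "vehicle"),
    ("vehicle", "is a", "asset"),
]

# Ontology: the same hierarchy plus cross-hierarchy relationships.
ontology = taxonomy + [
    ("car", "manufactured by", "company"),
    ("car", "purchased by", "customer"),
]

def ancestors(term: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Walk 'is a' edges upward, collecting every parent classification."""
    found = []
    current = term
    while True:
        parents = [o for s, p, o in triples if s == current and p == "is a"]
        if not parents:
            return found
        current = parents[0]
        found.append(current)

print(ancestors("car", ontology))  # ['vehicle', 'asset']
```

Because a car is a vehicle and a vehicle is an asset, a query about “assets” can reach cars without anyone stating that relationship directly—this transitive inference is what “enabling AI to reason about concepts” means in practice.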

Another attendee raises a critical point about temporal complexity: definitions change over time. A “high-risk claim” exceeding $100,000 may later be redefined as exceeding $250,000. Historical claims analysed under old definitions yield different results than under new definitions, affecting trend analysis and compliance reporting.

Addressing this requires version management: capturing not just current definitions but historical versions with effective dates. Knowledge graphs must support temporal queries: “What did this term mean on this date?”

This complexity reinforces the importance of structured definitions. When definitions are vague paragraphs, detecting changes is impossible. When definitions are atomic SPO statements, version control becomes tractable—specific statements are added, modified, or deprecated with clear effective dates.
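A minimal sketch of such version management, using the “high-risk claim” example from the discussion (the class, thresholds, and dates are illustrative assumptions, not figures from the webinar):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DefinitionVersion:
    term: str
    statement: str               # one atomic SPO statement
    effective_from: date
    effective_to: Optional[date]  # None means still current

# Hypothetical version history for one term.
history = [
    DefinitionVersion("high-risk claim", "high-risk claim exceeds $100,000",
                      date(2020, 1, 1), date(2023, 12, 31)),
    DefinitionVersion("high-risk claim", "high-risk claim exceeds $250,000",
                      date(2024, 1, 1), None),
]

def meaning_on(term: str, on: date) -> Optional[str]:
    """Temporal query: what did this term mean on this date?"""
    for v in history:
        if (v.term == term and v.effective_from <= on
                and (v.effective_to is None or on <= v.effective_to)):
            return v.statement
    return None

print(meaning_on("high-risk claim", date(2022, 6, 1)))
print(meaning_on("high-risk claim", date(2025, 6, 1)))
```

With effective dates attached to atomic statements, a trend analysis can state explicitly which definition governed each historical record instead of silently mixing the two thresholds.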

Figure 27 Taxonomies

Figure 28 Structured Knowledge: Use Cases and Benefits of Implementing Taxonomies and Ontologies

Figure 29 ACORD Reference Architecture

Figure 30 Mapping to ACORD Model

Figure 31 Industry Comparisons

AI, LLM Hallucinations, and the Future of Semantic Control

An attendee poses a critical question: Should we ask AI for definitions of terms it uses, not just answers? Howard identifies three causes of LLM hallucination. First, user prompts use ambiguous terminology—the “it” and “and” traps. LLMs try to bridge gaps, but sometimes they infer incorrectly.

Second, LLMs infer new facts not present in their training data or ontologies. Without semantic controls, hallucinated facts are presented with the same confidence as verified information. Third, LLMs access poorly structured data where the semantic meaning is unclear.

Three solution approaches are emerging: Prompt preprocessing challenges users to clarify ambiguous prompts before sending to LLMs. Ontology checking validates new LLM-generated facts against established knowledge graphs—if a relationship doesn’t exist in the ontology, it’s flagged for human review. Knowledge graph routing directs LLMs to query knowledge graphs rather than SQL databases, leveraging explicit semantic relationships.

The “LLM as judge” pattern implements validation: one LLM generates content, another evaluates it for truthfulness or policy compliance before presenting to users. These approaches only work when underlying ontologies are built on precise, structured definitions.
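The ontology-checking approach can be sketched as a simple set-membership test: any LLM-generated triple not found in the established ontology is flagged for human review rather than presented as fact. The relationship names below are illustrative placeholders, and a real system would match against a full knowledge graph rather than a literal set:

```python
Triple = tuple[str, str, str]

# Hypothetical fragment of an established insurance ontology.
known_ontology: set[Triple] = {
    ("policy", "covers", "insured party"),
    ("insurer", "issues", "policy"),
    ("claim adjuster", "assesses", "claim"),
}

def validate(generated: list[Triple]) -> tuple[list[Triple], list[Triple]]:
    """Split LLM-generated facts into verified triples and candidates
    flagged for human review."""
    verified, flagged = [], []
    for triple in generated:
        (verified if triple in known_ontology else flagged).append(triple)
    return verified, flagged

generated = [
    ("insurer", "issues", "policy"),          # known relationship
    ("policy", "employs", "claim adjuster"),  # not in ontology -> flag
]
verified, flagged = validate(generated)
print("flagged for review:", flagged)
```

The gatekeeping is only as good as the ontology behind it—which is why the webinar insists that these controls depend on precise, structured definitions in the first place.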

The fundamental debate: Can AI learn business semantics from unstructured content, or will humans always need to provide explicit definitions? An attendee argues that humans must maintain semantic control because they have worldviews and contexts that AI can only glimpse. If humans don’t provide context, AI will construct its own understanding and eventually dictate how the world should be understood—a concerning flip. The battle line isn’t human versus AI but between disciplined semantic management and chaos.

Figure 32 Information Architecture versus Data Architecture

If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.

Additionally, if you would like to watch the edited video on our YouTube channel, please click here.

If you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com)

Don’t forget to join our exciting LinkedIn and Meetup data communities so you don’t miss out!
