Key Takeaways
- The MIDCOT Framework: A comprehensive approach to AI governance emphasising upstream training data alongside established industry standards.
- AI Model Autophagy: AI models can degrade over time by retraining on their own biased and misleading outputs.
- Three Layers of Bias Propagation: Errors permeate models on micro, meso, and macro levels, affecting prompting, architecture, and dissemination.
- Continuous Distributional Accuracy: The ALAGF framework and MIDCOT monitor ongoing data drift, assessing bias with the Bias Amplification Rate.
- Statistical Process Control (SPC): Organisations can set control limits tailored to their industry, alerting them to potential data failures.
- The AI Agent Paradox and Cost-Benefit: MIDCOT calculates costs for data quality improvements, enabling optimised budget planning and economic equilibrium.
- Corporate AI Deployment Failures: Corporate models often fail due to gaps between experts and engineers and inadequate data quality assessment.
- Model Size and Hybrid Architecture: A hybrid model combining large linguistic capabilities and small contextual accuracy offers optimal performance.
- Retrieval-Augmented Generation (RAG): Implementing RAG clearly defines context windows, minimising unverified assumptions and reducing model hallucinations effectively.
- The Imperative of “Human-in-the-Loop”: Organisations should prioritise “AI augmentation” with human oversight for complex decisions, not full automation.
Webinar Details
Title: Closing the AI Governance Gap: Monitoring Data Quality Drift with Dale Rutherford
Date: 2026-03-16
Presenter: Dale Rutherford
Meetup Group: DAMA SA Big Data
Write-up Author: Howard Diesel
What is the “MIDCOT” Framework in the Context of AI?
The webinar commenced with an introductory dialogue regarding the complexities of data management and AI governance. Howard Diesel recounted a previous interaction in which Dale Rutherford expressed reluctance to share specific insights for fear of contradicting him, which ultimately prompted a productive exchange of research. Consequently, Dale presented his “MIDCOT” framework, which offers a holistic, umbrella approach to the governance of Large Language Models (LLMs) and artificial intelligence.
Dale also contrasted two analytical perspectives: the view of an LLM as a judge and his own proposition of AI functioning as an auditor. Furthermore, he highlighted that his governance framework intersects with other established industry methodologies, notably the “conformed dimensions” developed by participant Dan Myers. This opening exchange established a collaborative environment for examining the future of AI reliability and structural data frameworks.
Figure 1 Governing Data Quality in AI Corpora
What is Autophagy in the Data Lifecycle?
A primary focus of the session was the introduction of AI model “autophagy,” a condition in which an AI model progressively consumes its own output, leading to irreversible decay. Once this cyclical degradation commences, mitigation strategies may cause the decay to plateau, but they cannot restore the model to its original state of integrity. To address this systemic vulnerability, Dale mapped the AI lifecycle against the traditional data lifecycle, identifying precise touchpoints where bias, misinformation, and errors propagate. He categorised these touchpoints into three structural layers.
The micro layer involves the human interface, where cognitive biases influence machine prompting. The meso layer encompasses the model’s architectural mechanics, including Retrieval-Augmented Generation (RAG) processes. Finally, the macro layer pertains to the dissemination of model outputs into public domains, such as research papers and social media, which are subsequently re-ingested as new training data. This recursive process amplifies bias and reinforces misinformation, necessitating upstream governance interventions to mitigate errors before they manifest in final outputs.
Figure 2 The Data Quality Gap in AI Systems
Figure 3 Conceptualising BME Propagation and Amplifications in LLMs
Figure 4 BME Framework
What is Human Bias in the ALAGF Framework?
The discussion then drew parallels between artificial intelligence dynamics and human cognitive biases, specifically comparing AI reinforcement behaviours to sociological “groupthink,” wherein contradictory perspectives are suppressed to maintain consensus. To address these data quality deficiencies in current AI governance, Dale presented the Adaptive Lifecycle Agentic Governance Framework (ALAGF), which incorporates the MIDCOT component. This framework uses the Bias Amplification Rate (BAR) to continuously measure whether systemic bias is increasing or decreasing across model retraining cycles, in contrast to traditional methodologies that evaluate only static snapshots.
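The write-up does not give a formula for the Bias Amplification Rate (BAR), so the following is only a minimal sketch of the idea of measuring direction of change rather than a static snapshot. It assumes BAR is the relative change in some bias score between consecutive retraining cycles; the function name, the score scale, and the sample values are all hypothetical.

```python
def bias_amplification_rate(bias_scores):
    """Relative change in a bias score between consecutive cycles.

    ASSUMPTION: this definition is illustrative; the webinar write-up
    does not specify how BAR is actually computed.
    """
    rates = []
    for previous, current in zip(bias_scores, bias_scores[1:]):
        if previous == 0:
            raise ValueError("previous bias score must be non-zero")
        rates.append((current - previous) / previous)
    return rates


# Hypothetical bias scores measured after each retraining cycle.
scores = [0.10, 0.12, 0.15, 0.15]
rates = bias_amplification_rate(scores)

# A positive rate means bias grew during that cycle; a negative rate means it shrank.
trend = ["amplifying" if r > 0 else "shrinking" if r < 0 else "stable" for r in rates]
```

A single snapshot of 0.15 says nothing about direction, whereas the sequence of rates shows whether each cycle made matters worse.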
Furthermore, MIDCOT explicitly targets the foundational stage of the AI lifecycle: the training corpora. By assessing whether new training iterations improve or degrade the model, MIDCOT shifts the evaluative focus from traditional record-level correctness to dynamic distributional accuracy over time. For these novel metrics to provide legitimate governance value, participants emphasised that they must strictly align with recognised industry standards and regulatory guidelines.
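To make the contrast between record-level correctness and distributional accuracy concrete, the sketch below compares the category mix of a baseline corpus sample with a later one. It uses the Jensen-Shannon divergence, a standard measure chosen here purely for illustration; the write-up does not say which statistic MIDCOT applies, and the category labels and sample sizes are invented.

```python
import math
from collections import Counter


def category_shares(labels, categories):
    """Share of the corpus sample falling in each category, in a fixed order."""
    if not labels:
        raise ValueError("corpus sample must not be empty")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical labels: every individual record may be "correct",
# yet the overall mix of the corpus has shifted between cycles.
CATEGORIES = ["human_authored", "synthetic"]
baseline = category_shares(["human_authored"] * 80 + ["synthetic"] * 20, CATEGORIES)
latest = category_shares(["human_authored"] * 50 + ["synthetic"] * 50, CATEGORIES)

drift = js_divergence(baseline, latest)
```

Tracking `drift` after each training iteration gives a time series that can be monitored, which a per-record validity check cannot provide.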
Figure 5 The Data Quality Gap in AI Systems
Figure 6 Governance Gap
Figure 7 ALAGF: Adaptive Lifecycle Agentic Governance Framework
Figure 8 Bridging Data Quality Frameworks and AI Governance
Figure 9 Alignment with Existing AI Governance Standards
What are the benefits of AI in SPC?
To proactively monitor data quality and pre-empt systemic failures, the framework advocates applying Statistical Process Control (SPC), a methodology derived from Six Sigma. By establishing statistical baselines, organisations can define explicit upper and lower control limits; if data quality metrics drift toward these parameters, the system generates proactive alerts before a critical breach occurs. Importantly, these control limits are scalable and must be calibrated to an organisation’s specific risk tolerance, which varies significantly between industries, such as toy manufacturing versus pharmaceuticals.
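The control-limit idea can be sketched in a few lines. The example below assumes classic Shewhart-style limits (baseline mean plus or minus a multiple of the standard deviation) with a narrower warning band inside the control band; the multipliers, the metric, and the sample values are illustrative assumptions rather than settings taken from the webinar.

```python
from statistics import mean, stdev


def control_limits(baseline, sigma_multiplier=3.0):
    """Lower and upper limits as mean +/- sigma_multiplier * standard deviation."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigma_multiplier * spread, centre + sigma_multiplier * spread


def classify(value, baseline, control_k=3.0, warning_k=2.0):
    """Return 'breach', 'warning', or 'in control' for a new metric reading."""
    lower_control, upper_control = control_limits(baseline, control_k)
    lower_warning, upper_warning = control_limits(baseline, warning_k)
    if value < lower_control or value > upper_control:
        return "breach"
    if value < lower_warning or value > upper_warning:
        return "warning"  # drifting toward a limit: alert before the breach
    return "in control"


# Hypothetical baseline of a data quality score (for example, completeness).
baseline_scores = [0.96, 0.97, 0.95, 0.96, 0.97, 0.96, 0.95, 0.96]

default_tolerance = classify(0.94, baseline_scores)
# A lower risk tolerance (for example, pharmaceuticals) tightens both bands.
strict_tolerance = classify(0.94, baseline_scores, control_k=2.0, warning_k=1.0)
```

The same reading of 0.94 is only a warning under the default limits but a breach under the stricter ones, which is how the limits can be calibrated to an organisation's risk tolerance.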
Dale subsequently addressed the “AI Agent Paradox”. While an AI model with 70% accuracy can be deployed relatively quickly, striving for absolute precision requires an exponential investment of time, capital, and resources. Recognising that flawlessly unbiased data is statistically unattainable, organisations must identify an economic equilibrium. The MIDCOT framework facilitates this cost-benefit analysis, enabling stakeholders to calculate the financial investment required to achieve the next incremental unit of data integrity.
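The equilibrium argument can be illustrated with a toy cost curve. The sketch below assumes a cost function that grows without bound as accuracy approaches 100%, and a fixed business value for each additional percentage point; the functional form, the scale, and the monetary figures are hypothetical and are not MIDCOT's actual cost model.

```python
def cost_to_reach(accuracy, scale=10_000.0):
    """ASSUMED cost curve: total spend needed to reach a given accuracy level."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1); perfect accuracy is unattainable")
    return scale / (1.0 - accuracy)


def marginal_cost(accuracy, step=0.01, scale=10_000.0):
    """Cost of the next increment of accuracy from the current level."""
    return cost_to_reach(accuracy + step, scale) - cost_to_reach(accuracy, scale)


def economic_equilibrium(start, value_per_step, step=0.01, scale=10_000.0):
    """Keep investing while the next increment costs no more than it is worth."""
    accuracy = start
    while accuracy + step < 1 and marginal_cost(accuracy, step, scale) <= value_per_step:
        accuracy = round(accuracy + step, 10)
    return accuracy


# Hypothetical: start at 70% accuracy; each extra point is worth 20,000 to the business.
stopping_point = economic_equilibrium(start=0.70, value_per_step=20_000.0)
```

Under these assumptions the first increment from 70% costs roughly 1,150, while the increment from 93% would cost about 23,800, so investment stops at 93% because the next step costs more than it returns.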
Figure 10 Modern Data Readiness Funnel for AI and Machine Learning
Figure 11 MIDCOT Monitoring Workflow
Figure 12 MIDCOT Framework Process Flow
Figure 13 Scenario: Detecting Corpus Contamination
Figure 14 The AI Agent Paradox
Figure 15 MIDCOT – Cost-Optimisation Framework
Figure 16 Governance Implications for Data Professionals
What are corporate AI failures and challenges?
Dale examined systemic issues in corporate data management. It was noted that financial institutions often utilise third-party data quality services but intentionally withhold corrective feedback to maintain their competitive edge. Concurrently, large corporations restrict external models from training on proprietary data due to security concerns; however, this lack of fresh training variation causes the isolated models to deteriorate over time. Dale expanded upon the primary catalysts for corporate AI deployment failures.
A significant operational gap exists between subject matter experts, such as finance professionals, and AI engineers. Business leaders frequently operate under the erroneous assumption that AI models are inherently accurate, failing to recognise that these systems are entirely dependent on the quality of their training corpora. Because organisations typically lack rigorous frameworks to measure baseline quality or monitor longitudinal decline, their AI deployments lack strategic direction and operate without an understanding of their origin, trajectory, or ultimate destination.
Figure 17 MIDCOT: DQ/DI Features, Benefits, and Cost-Efficiency
Figure 18 Anti-Autophagy Monitor
What is the Autophagy Model Simulation?
To quantitatively illustrate model decay, Dale presented an “autophagy” simulation. The results showed that if an AI model undergoes quarterly retraining cycles without substantive quality intervention, its functional integrity decays to zero within approximately four and a half years. Increasing the frequency to monthly retraining cycles accelerates this collapse, reducing the model’s viability to merely a year and a half. Furthermore, the simulation indicated that, in the absence of proper governance, algorithmic bias rapidly approaches 100%.
While structured interventions can decelerate this decay and partially mitigate bias, the damage inflicted by contaminated data pools cannot be fully reversed. Consequently, the imperative is to implement upstream governance at the level of training corpora. To facilitate this, Dale demonstrated the MIDCOT tool, which utilises twelve distinct diagnostic probes to establish baseline metrics. These probes function diagnostically to stress-test the model, identifying specific vulnerabilities and instances of drift within the training corpus, thereby enabling early remediation.
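The two reported timelines are consistent with one another: four and a half years of quarterly retraining and a year and a half of monthly retraining both amount to 18 cycles. The sketch below builds on that observation. It assumes, purely for illustration, that integrity falls by an equal amount each cycle and that an intervention prevents a fraction of each subsequent loss; the shape of the real simulation's decay curve is not described in the write-up, and all names here are hypothetical.

```python
def years_until_collapse(cycles_per_year, cycles_to_collapse=18):
    """Years until integrity reaches zero, given a fixed number of cycles."""
    return cycles_to_collapse / cycles_per_year


def simulate_integrity(cycles, cycles_to_collapse=18, mitigation=0.0, intervene_at=None):
    """Integrity after each retraining cycle under an ASSUMED linear decay.

    mitigation is the fraction of each cycle's loss that is prevented once
    the intervention starts. It slows or halts decay but never adds integrity back.
    """
    loss_per_cycle = 1.0 / cycles_to_collapse
    integrity = 1.0
    history = [integrity]
    for cycle in range(1, cycles + 1):
        loss = loss_per_cycle
        if intervene_at is not None and cycle >= intervene_at:
            loss *= 1.0 - mitigation
        integrity = max(0.0, integrity - loss)
        history.append(integrity)
    return history


quarterly_years = years_until_collapse(cycles_per_year=4)   # 4.5
monthly_years = years_until_collapse(cycles_per_year=12)    # 1.5

ungoverned = simulate_integrity(cycles=18)
# Fully effective intervention from cycle 10: decay plateaus at the level already reached.
governed = simulate_integrity(cycles=18, mitigation=1.0, intervene_at=10)
```

In the governed run, integrity stops falling once the intervention begins but stays at the level it had already dropped to, mirroring the point that damage from contaminated data can be contained but not reversed.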
Figure 19 Anti-Autophagy Monitor – Retraining Frequency – Annual
Figure 20 Anti-Autophagy Monitor – Retraining Frequency – Monthly
Figure 21 Anti-Autophagy Monitor – Retraining Frequency – Quarterly
Figure 22 MIDCOT Pipeline Overview
Figure 23 Probe Scorecard
Figure 24 Probe Interpretation
Figure 25 Intervention Cost Curves
What are the MIDCOT Demo Dimensions?
The MIDCOT framework extends beyond basic error detection by providing actionable financial metrics, allowing organisations to assign precise cost values—such as $120 per unit—to data quality improvements. This functionality optimises budgetary planning for mitigating data drift. Additionally, the system incorporates a Governance Ledger that comprehensively records all alerts and remediation activities, ensuring auditable oversight. The discourse then integrated Dan Myers’ concept of “conformed dimensions,” a structural framework designed to unify and standardise data quality terminology.
While conformed dimensions were originally conceptualised for static, structured environments, the panel concluded that the two methodologies are highly complementary. Conformed dimensions can establish foundational quality definitions, whereas MIDCOT provides the necessary temporal monitoring for dynamic AI systems. Dale also emphasised an emergent industry requisite: expanding these quality governance frameworks beyond structured databases to address the complex variables inherent in unstructured generative outputs, such as audio and video.
Figure 26 Governance Tier Distribution
What are Data Measurement Targets in Modelling?
An important technical clarification concerned MIDCOT’s scope: the tool analyses the incoming training data rather than the LLM’s final generated output. This distinction matters because continuous corruption of training corpora in LLMs leads to catastrophic structural collapse, unlike traditional databases, which simply experience gradual degradation. This prompted a debate about model scale. Smaller, localised models are generally easier to govern and maintain; however, their restricted training corpora can induce overly deterministic behaviour, repeatedly defaulting to narrow datasets rather than generating probabilistic insights.
Conversely, large external models benefit from massive data variation, which theoretically neutralises bias. Nevertheless, these expansive models are constrained by “context windows”—mechanisms that retrieve only fragmented segments of information in response to prompts, potentially omitting necessary context. Ultimately, a hybrid architecture was proposed, leveraging the advanced linguistic capabilities of massive external LLMs while utilising highly governed, localised corpora to ensure precise, domain-specific contextual accuracy.
What are Context Windows and Human-in-the-Loop?
In the concluding segment, the panel evaluated the strategic application of Retrieval-Augmented Generation (RAG). By implementing RAG, organisations can strictly define context windows, preventing the LLM from relying on unverified assumptions and significantly reducing the probability of model hallucination. Dale reiterated a fundamental constraint of artificial intelligence: these systems are purely algorithmic entities incapable of autonomous reasoning or making authentic value judgments. Consequently, he strongly advocated for “AI augmentation” rather than complete autonomous automation.
To ensure operational reliability, AI deployment architectures must incorporate a “human-in-the-loop” framework. The AI should process data within strictly defined parameters, but any anomalies or threshold breaches must be escalated to human operators for final evaluative judgment. The webinar concluded with participants expressing mutual appreciation for the rigorous exchange of conflicting viewpoints, underscoring the consensus that robust AI governance is a continuous, evolving discipline.
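The escalation rule described above can be sketched as a small routing function. The example assumes each automated result carries a monitored metric value and a confidence score; the field names, limits, and confidence threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "auto" or "escalate"
    reason: str


def route(value, confidence, lower_limit, upper_limit, min_confidence=0.9):
    """Let the system act only inside defined parameters; otherwise hand off to a person."""
    if value < lower_limit or value > upper_limit:
        return Decision("escalate", "threshold breach")
    if confidence < min_confidence:
        return Decision("escalate", "low confidence")
    return Decision("auto", "within defined parameters")


# Hypothetical readings checked against limits of 0.94 to 0.98.
routine = route(value=0.96, confidence=0.97, lower_limit=0.94, upper_limit=0.98)
anomaly = route(value=0.91, confidence=0.97, lower_limit=0.94, upper_limit=0.98)
uncertain = route(value=0.96, confidence=0.60, lower_limit=0.94, upper_limit=0.98)
```

Only the routine case is handled automatically; the other two are passed to a human operator together with the reason for escalation.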