Executive Summary
This webinar outlines key topics in quality management and machine learning observability. Howard Diesel begins with an introduction and a recap of the previous week’s discussions, followed by an exploration of the TQM framework and the importance of policy retention. The transition from TQM dimensions to practical implementation is examined, alongside the need for effective machine learning observability. The framework aims to bridge the gap between business language and technical metrics, addressing quality gates and production challenges. It highlights OpenTelemetry as a means to standardise AI observability and provides implementation steps with automated responses. The webinar also includes insights from the DataIQ Report on industry realities, emphasising accountability, risk, and governance, and culminates in a call to action.
Webinar Details
Title: Business Metrics & TQM for Data Citizens
Date: 2025-11-20
Presenter: Howard Diesel
Meetup Group: African Data Management Community
Write-up Author: Howard Diesel
Introduction & Last Week’s Recap
Howard opens the webinar with an in-depth recap of the previous session’s discussion on business metrics, setting the stage for a deeper understanding of performance measurement. He clearly outlines the relationship between the North Star metric and the business metric driver tree, illustrating how they lead to lagging indicators that reflect past performance. Furthermore, these lagging indicators are intricately linked to leading indicators, which specify the essential activities teams need to undertake to improve their results. By establishing these connections, Howard emphasises the importance of proactive engagement in driving business success.
Integrating Total Quality Management (TQM) with business metrics is essential for organisations to achieve a comprehensive quality approach. The emphasis on “total quality” rather than just “data quality” underscores the need for collaboration across disciplines, including data governance, records management, AI governance, cybersecurity, and corporate governance. This holistic strategy is critical, as AI teams often encounter significant challenges when functioning in isolation, from ensuring accurate records and maintaining high-quality data to addressing ethical considerations and human-in-the-loop requirements.
Ultimately, a unified approach to quality management can enhance organisational effectiveness and drive better outcomes across all facets of the business. AI teams often prioritise model development and production deployment, which can lead to significant challenges if quality is not managed properly. By integrating all quality disciplines under Total Quality Management (TQM), organisations can streamline their processes, better manage the complexities of deploying AI solutions, and improve the effectiveness of their AI initiatives, ultimately leading to more successful outcomes.
Figure 1 Implementing AI Quality
Figure 2 FNOL TQM Expectations
The TQM Framework & Policy Retention
Implementing Total Quality Management (TQM) in the insurance sector is crucial to improving policy retention rates. In the context of life insurance, companies prioritise policy retention as their primary metric, often referred to as their North Star, because the end of a policy—when it is cashed out—usually signifies the conclusion of the customer relationship. By focusing on maintaining these policies, insurers can foster long-term customer engagement and satisfaction, ultimately driving their overall business success. Therefore, understanding and improving policy retention through TQM strategies is essential for insurance companies seeking to cultivate lasting client relationships.
Several crucial factors, including operational efficiency and customer experience, influence the policy retention rate. To achieve optimal retention, these elements must be measured and optimised in tandem. The framework for evaluating these factors resembles a balanced scorecard hierarchy but is intentionally simplified so that it is easier to maintain. By integrating these approaches, organisations can effectively enhance their policy retention strategies. Straight-through processing is identified as a crucial lagging indicator for enhancing claimant satisfaction. Howard highlights that achieving higher customer satisfaction requires a multifaceted approach, with several improvements implemented simultaneously.
This process is not instantaneous; for instance, it may take up to six months to see the effects of initiatives such as automating the First Notice of Loss (FNOL). Leading indicators, including the completeness of claims submissions and the usability of chatbots, play an essential role in influencing these lagging metrics, demonstrating that sustained effort in these areas will ultimately yield positive outcomes. In conclusion, a comprehensive strategy focused on both leading and lagging indicators is vital for long-term improvements in customer satisfaction in the claims process.
Figure 3 Wang-Strong Framework
Figure 4 UC. FNOL: Data, Records, Development and Logs
Figure 5 Total Quality Management Diagram
From TQM Dimensions to Implementation
The Wang-Strong framework for implementing Total Quality Management (TQM) provides a comprehensive approach to understanding and improving quality by categorising it into various dimensions. For instance, when examining accuracy, the framework shows that evaluating quality at the dimension level alone is insufficient; one must delve deeper into specific concepts to gain a full understanding. This detailed analysis reveals complexities that extend beyond conventional data quality assessments, underscoring the importance of thoroughness in TQM practices.
Ultimately, the Wang-Strong framework underscores that achieving high-quality standards requires a multifaceted exploration of each element. When assessing the quality of AI systems, it is essential to examine three key elements: the quality of input data, the quality of output decisions, and the quality of the AI technique itself. For example, when generative AI produces new content, it is crucial to evaluate aspects such as toxicity and hallucinations in the generated material. By considering these factors, we can establish a more nuanced and comprehensive framework for quality assessment in AI applications.
The conversation Howard had with Dan Mayers from the DMBOK editorial board highlights the complex challenge of quantifying dimensions such as accuracy and completeness in data management. By breaking down these dimensions into specific components such as row and column populations, tables, and schemas, they have developed a Total Quality Management (TQM) framework. This framework distinguishes between the quality of input data, the quality of decision outcomes, and the effectiveness of AI techniques. Additionally, it proposes a classification of AI techniques, each with distinct quality rankings, thereby creating a robust quality management strategy. This approach effectively bridges the gap between business metrics and technical execution, enabling organisations to better align their data practices with their strategic goals.
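The three-way split into input data quality, decision quality, and technique quality can be sketched as a simple structure. This is a minimal illustration only; the field names and the weakest-link aggregation are assumptions for the sketch, not part of the framework itself:

```python
from dataclasses import dataclass

@dataclass
class AIQualityAssessment:
    """Illustrative TQM-for-AI record covering the three facets in the text."""
    input_data_quality: float   # e.g. completeness/accuracy of input data, 0..1
    decision_quality: float     # quality of the output decision, 0..1
    technique_quality: float    # quality ranking of the AI technique used, 0..1

    def overall(self) -> float:
        """Weakest-link view (an assumption): total quality is bounded by
        the worst of the three facets."""
        return min(self.input_data_quality,
                   self.decision_quality,
                   self.technique_quality)

assessment = AIQualityAssessment(0.95, 0.88, 0.90)
overall = assessment.overall()  # 0.88: the weakest facet drives the score
```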
Figure 6 TQM in AI
Figure 7 TQM in AI pt.2
Understanding Machine Learning Observability
The successful integration of total quality metrics into machine learning operations is crucial for enhancing overall performance. This chapter highlights the challenges of bridging the gap between establishing effective Total Quality Management (TQM) metrics and their practical application in ML operations.
The central question posed is, “I’ve defined my TQM, but how do I get it into MLOps?” Addressing this inquiry is essential for organisations striving to improve their machine learning processes by ensuring that quality considerations are not merely theoretical but actively implemented.
Observability is crucial for effective monitoring of models in production environments, particularly when they are no longer under direct tester supervision. This practice involves continuously observing models in action to identify and address potential issues. Howard highlights the significance of OTEL (OpenTelemetry), which serves as the standard for enabling models to submit traces, metrics, and logs to an observability platform that can take corrective action as necessary. Ultimately, this proactive approach ensures the reliability and performance of deployed models. The importance of continuous monitoring in chatbot management is underscored by real-world examples of chatbots being corrupted and disseminating harmful content due to a lack of oversight.
Implementing Total Quality Management (TQM) in production environments facilitates this monitoring, ensuring that chatbots function effectively and safely. For instance, if a chatbot experiences response latency, it can lead to user frustration and interaction abandonment, highlighting the critical need for timely responses. Some chatbots proactively mitigate this issue by using messages such as “let me think about it” and employing animated dots to manage user expectations during processing delays. Thus, an observability platform must be in place to maintain response times within the acceptable thresholds established by the TQM framework, ultimately enhancing user experience and trust.
Figure 8 TQM in AI pt.3
Figure 9 Codifying Quality for Training
Bridging the Gap—From Business Language to Technical Metrics
Effectively bridging the gap between business expectations and technical machine learning operations (MLOps) metrics poses a significant challenge. Business stakeholders often express their quality requirements in terms that are easy to understand, such as “99.9% uptime” or “no factual errors.” However, these qualitative demands must be translated into precise technical metrics, such as F1 scores, precision, recall, and confusion matrices, to ensure the objectives are measurable and achievable. Addressing this translation is crucial for aligning technical performance with business goals, ultimately leading to more successful project outcomes.
The translation of business requirements into technical specifications is essential for ensuring clarity and effectiveness in model development. For instance, when a business requirement emphasises that “the model must not exhibit bias against any demographic,” this is technically interpreted to mean that “statistical parity difference must be less than 0.05 on the validation set.” Similarly, a requirement for “customer-facing models to maintain high accuracy” translates into a need for the model’s accuracy to exceed 90% and for the F1 score to be greater than 0.88. By clearly defining these technical benchmarks, teams can better align their development efforts with organisational goals, ensuring that models are both fair and effective.
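The translations above can be captured as a machine-checkable threshold table. This is a hedged sketch: the threshold values come from the examples in the text, but the metric names and the gate function itself are illustrative:

```python
# Thresholds derived from the business-to-technical translations in the text.
# ("max", x) means the metric must stay below x; ("min", x) means above x.
THRESHOLDS = {
    "statistical_parity_difference": ("max", 0.05),  # "no demographic bias"
    "accuracy": ("min", 0.90),                       # "high accuracy"
    "f1_score": ("min", 0.88),
}

def passes_quality_gate(metrics: dict) -> tuple:
    """Check measured metrics against the business-derived thresholds.
    Returns (passed, list_of_violations); an unmeasured metric is a violation."""
    violations = []
    for name, (direction, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: not measured")
        elif direction == "max" and value >= limit:
            violations.append(f"{name}={value} (must be < {limit})")
        elif direction == "min" and value <= limit:
            violations.append(f"{name}={value} (must be > {limit})")
    return (not violations, violations)

ok, issues = passes_quality_gate(
    {"statistical_parity_difference": 0.03, "accuracy": 0.92, "f1_score": 0.89}
)  # ok is True: all three thresholds are satisfied
```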
MLflow is a Python library and framework that automates model evaluation. It provides a range of scoring options, such as correctness, accuracy, and ground truth, which enable automatic testing of models against predefined expectations. The framework emphasises the practice of “eval-driven development,” encouraging developers to establish success criteria before starting their work and to continuously measure performance throughout the model’s lifecycle. This proactive approach ensures that quality standards are upheld from the initial stages of development through to production deployment. Ultimately, MLflow enhances the reliability and integrity of machine learning processes by making evaluation a core component of development.
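Eval-driven development can be illustrated without any particular framework: the success criteria are written down first, and every candidate model is scored against them before it ships. The expectation names and thresholds below are assumptions for illustration, not MLflow’s actual scorer API:

```python
# Define expectations BEFORE building the model (eval-driven development).
# Each entry maps an expectation name to a pass/fail check over eval results.
EXPECTATIONS = [
    ("correctness", lambda r: r["correct"] / r["total"] >= 0.9),
    ("groundedness", lambda r: r["hallucinations"] == 0),
]

def evaluate(results: dict) -> dict:
    """Run every predefined expectation against a batch of evaluation results;
    the model only advances if every expectation passes."""
    return {name: check(results) for name, check in EXPECTATIONS}

report = evaluate({"correct": 46, "total": 50, "hallucinations": 0})
ship_it = all(report.values())  # True: 92% correct, no hallucinations
```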
Figure 10 Beyond Training
Figure 11 Beyond Training pt.2
Figure 12 OpenTelemetry
Quality Gates and Production Challenges
Quality gates play a crucial role in assessing when a model is prepared for deployment. They serve as essential checkpoints that go beyond merely attaining a satisfactory F1 score; comprehensive test cases and an ongoing evaluation-driven development process are necessary to ensure the model meets established thresholds. Ultimately, implementing rigorous quality gates ensures that models are not only effective but also reliable and robust before being deployed.
Transitioning from a training environment to production poses distinct challenges due to the unpredictability of real-world interactions. In a controlled training setting, all variables remain consistent, resulting in predictable outcomes; however, once a system, such as a chatbot, is deployed, its responses can vary widely based on spontaneous input. This variability makes it impossible to anticipate every potential interaction, thereby requiring a significant evolution in monitoring strategies to effectively address the dynamic nature of production environments. Understanding these differences is essential for successfully managing and optimising the deployment of automated systems in real-world applications.
Production-specific challenges can significantly affect the performance of deployed models, particularly due to data and concept drift, as well as operational failures. Data drift occurs when the distribution of incoming data differs from that in the training data, while concept drift refers to changes in the relationships between inputs and outputs over time. Additionally, operational failures can disrupt model functionality. These challenges are not identifiable in controlled training environments; instead, they arise in real-world applications, underscoring the need for continuous real-time monitoring of both deployed models and the data they process. By implementing robust monitoring strategies, organisations can proactively address these issues and maintain the effectiveness of their models over time.
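Data drift can be detected by comparing the live input distribution against the training distribution. One common statistic is the Population Stability Index (PSI); the sketch below is a minimal pure-Python version, and the 0.1 / 0.25 interpretation bands are a common rule of thumb rather than something stated in the webinar:

```python
import math

def psi(expected: list, actual: list, bins: int = 5) -> float:
    """Population Stability Index between a training-time distribution
    (expected) and a live distribution (actual).
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def frac(data):
        counts = [0] * bins
        for x in data:
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Replace empty bins with a small count to avoid log(0).
        return [(c or 0.5) / len(data) for c in counts]
    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
assert abs(psi(train, train)) < 1e-9   # identical distributions: no drift
assert psi(train, [0.9] * 10) > 0.25   # live data collapsed: clear drift
```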
The interactions among various AI components, such as chatbots and anomaly detectors, highlight the complexity of modern business processes. In this system, a chatbot first collects data, which is then assessed for fraud risk by an anomaly detector. This process may involve additional models, resembling agentic AI, which adds layers of sophistication. Each transition point in this system is critical, requiring diligent monitoring to identify when production behaviours deviate from established expectations. Timely detection can trigger automated responses that prevent negative customer impacts, such as the inadvertent sending of toxic messages. Thus, effective oversight of these handoffs is essential to maintain customer trust and safeguard against potential risks.
Figure 13 OpenTelemetry pt.2
Figure 14 Operationalising TQM
Figure 15 Automated Responses
Figure 16 Summary: the TQM-MLOps-OTEL Quality Framework
Figure 17 The AI Governance Paradox
Figure 18 The Global AI Leadership Confession
Figure 19 The Hidden Cost of Uncontrolled AI
OpenTelemetry—Standardising AI Observability
OpenTelemetry (OTEL) establishes a crucial standard for achieving consistent AI observability across diverse platforms, including Amazon, Azure, Google, and various model providers. By adhering to OTEL standards, most major AI platform providers enhance interoperability and transparency, making it easier for organisations to monitor and manage their AI systems effectively. This alignment not only streamlines observability processes but also empowers organisations to leverage their AI capabilities with greater confidence and efficiency. OTEL emphasises the importance of three core areas: metrics, traces, and logs, to ensure accountability and transparency in decision-making processes. Metrics include key performance indicators such as latency, accuracy, bias indicators, and quality to assess the system’s overall effectiveness.
Traces provide insight into execution paths, revealing how decisions are made and outlining the references to relevant policy documents or websites that inform responses. Logs document any missing information or errors encountered during operation. Together, these elements enable quality validation of the information sources and their presentation, ultimately enhancing the reliability of the AI’s outputs. The integration of OpenTelemetry (OTEL) is crucial for connecting diverse systems, such as cloud AI platforms, databases, and master data resolution services, into a cohesive observability framework. This unified platform enables users to gain valuable insights; for example, when an entity resolution request is made to a master data service, it provides telemetry data that reveals resolution accuracy.
Tools like Great Expectations, an open-source data quality platform, are essential for conducting real-time scoring on specific quality dimensions, ensuring that key metrics—including response time, latency, fairness across demographics, output quality (such as politeness and toxicity), and violation rates—are consistently monitored. Through an evaluation-driven approach, the observability platform not only defines expectations but also continuously assesses performance against these benchmarks, fostering improved system reliability and transparency.
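The three OTEL signal types described above can be pictured as one structured event per interaction. The field names and values here are purely illustrative and do not follow the OpenTelemetry semantic conventions:

```python
from dataclasses import dataclass, field

@dataclass
class ObservabilityEvent:
    """Illustrative per-interaction record combining the three OTEL signals."""
    metrics: dict = field(default_factory=dict)  # KPIs: latency, bias, quality
    trace: list = field(default_factory=list)    # execution path and sources consulted
    logs: list = field(default_factory=list)     # errors or missing information

event = ObservabilityEvent(
    metrics={"latency_ms": 320, "toxicity": 0.01},
    trace=["fnol_chatbot", "policy_document_lookup", "master_data_resolution"],
    logs=["claim_date missing from submission"],
)
# Quality validation can then inspect which sources informed the response
# and whether the metrics stayed within the TQM thresholds.
assert event.metrics["latency_ms"] < 500
```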
Figure 20 The Hidden Cost of Uncontrolled AI pt.2
Figure 21 AI Quality Integration
Figure 22 “Act Now!”
Implementation Steps & Automated Responses
Effective implementation guidance begins with clearly defining expectations, which can significantly enhance operational efficiency. For instance, setting a response-time expectation of “under 500 milliseconds” can be translated into an OTEL metric gauge named “model_latency_ms.” This setup not only supports performance tracking but also enables automated alerts: if latency exceeds 500 milliseconds for more than 5 minutes, an alert can be raised to prompt MLOps to initiate corrective action. By establishing such detailed metrics and alert systems, organisations can proactively manage their response times and ensure smoother operation.
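The latency rule can be sketched as a small alerting check: the 500 ms limit and the 5-minute window come from the text, while the sustained-breach logic itself is an illustrative stand-in for whatever the observability platform actually provides:

```python
LATENCY_LIMIT_MS = 500     # expectation from the text: under 500 milliseconds
BREACH_WINDOW_S = 5 * 60   # alert after 5 minutes of sustained breach

class LatencyAlert:
    """Raise an alert only when latency stays over the limit for the full window."""
    def __init__(self):
        self.breach_started = None  # timestamp of the first over-limit sample

    def record(self, timestamp_s: float, latency_ms: float) -> bool:
        if latency_ms <= LATENCY_LIMIT_MS:
            self.breach_started = None  # back under the limit: reset the window
            return False
        if self.breach_started is None:
            self.breach_started = timestamp_s
        # Alert once the breach has persisted for the whole window.
        return timestamp_s - self.breach_started >= BREACH_WINDOW_S

alert = LatencyAlert()
assert alert.record(0, 650) is False    # breach starts, no alert yet
assert alert.record(200, 700) is False  # still under 5 minutes of breach
assert alert.record(300, 720) is True   # 300 s of sustained breach: alert MLOps
```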
Maintaining fairness and output quality in machine learning operations is crucial for effective model performance. When bias violations are detected for more than five minutes, the observability platform activates and initiates necessary MLOps actions to address the issue. This includes automated monitoring and alerting systems that track model performance. When predefined thresholds are breached, the system can execute model rollbacks, trigger retraining pipelines, or escalate the situation for human review. Ultimately, these processes ensure that models uphold fairness and quality standards, reinforcing trust in automated decision-making systems.
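The automated responses can be pictured as a dispatch table from violation type to MLOps action. The action names (rollback, retrain, escalate) come from the text; which violation maps to which action is an assumption for illustration:

```python
# Illustrative mapping of detected violations to automated MLOps actions.
RESPONSES = {
    "latency_breach": "rollback",   # revert to the previous model version
    "drift_detected": "retrain",    # trigger the retraining pipeline
    "bias_violation": "escalate",   # fairness issues go to human review
}

def automated_response(violation: str) -> str:
    """Choose the MLOps action for a detected quality violation;
    anything unrecognised is escalated for human review by default."""
    return RESPONSES.get(violation, "escalate")

action = automated_response("drift_detected")  # "retrain"
```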
Effective implementation of models in a business context requires a structured approach that prioritises their impact on business objectives. This involves several key steps: first, prioritising models based on their potential business impact; second, defining metrics that bridge the gap between business expectations and technical performance; and third, integrating OpenTelemetry (OTEL) instrumentation at the model serving layer to ensure comprehensive monitoring. Additionally, it is crucial to establish automated response mechanisms such as rollback, retrain, and escalation protocols. Each interaction is meticulously logged by the observability platform, enabling human reviewers to validate automated quality assessments. By following these steps, organisations can enhance their model management processes and ensure alignment with business goals.
Observability platforms play a crucial role in ensuring the quality of interactions by validating both input and output. Howard highlights the importance of assessing whether a user is experiencing frustration or exhibiting toxic behaviour, and whether the system’s responses are appropriate. Access to the complete interaction history allows for human intervention in complex situations that exceed the model’s capabilities. Consequently, these platforms enhance the overall effectiveness and safety of automated systems, ensuring a more positive user experience.
Industry Reality Check—The DataIQ Report
The alarming statistics from a DataIQ global confessions report underscore significant shortcomings in AI governance among executives. A staggering 80% of leaders recognise the dangers associated with AI and acknowledge the potential for failures; however, 72% still approve AI-generated decisions without sufficient explanations. This lack of accountability is further illustrated by the fact that only 5% of companies can guarantee 100% traceability in their AI outputs, while merely 19% require explainability prior to production approval. Additionally, only 34% of executives express confidence that their AI could successfully pass a decision audit. These figures highlight an urgent need for improved oversight and transparency in AI technologies to mitigate risks.
Organisations are facing a critical challenge in deploying AI systems that lack transparency and accountability. This issue arises not from a deficiency in technical expertise—many organisations have successfully established effective observability and governance measures. Instead, the root problem lies in a fundamental misunderstanding of AI systems and the absence of robust governance frameworks. To address this challenge, it is essential for organisations to invest in a better understanding and the development of comprehensive governance strategies for their AI deployments.
The debate surrounding the deployment of AI technology highlights a troubling trend: the prioritisation of speed over safety. Many companies rush to implement AI models driven by performance metrics like F1 scores, often neglecting essential checks and balances. A significant portion of industry leaders, 56%, believe that accountability for AI failures should rest with the CIO and CDO, even though these roles do not usually reap the primary benefits of AI initiatives. It is argued that the risk should be transferred to the business functions that stand to gain the most from AI innovations. Furthermore, an alarming 80% of leaders acknowledge a lack of adequate governance structures to manage potential AI failures. Ultimately, addressing these issues is crucial to creating a safer, more responsible landscape for AI deployment.
Accountability, Risk, and Governance
Howard highlights the importance of accountability structures within organisations, particularly regarding the governance of emerging technologies. Audience contributions reveal that the Data Management Body of Knowledge (DMBOK) emphasises the need for the governing body to direct and oversee the ethical use of these technologies and to integrate risk management into enterprise management practices. This underscores the notion that accountability should not be the sole responsibility of Chief Information Officers (CIOs) or Chief Data Officers (CDOs), but rather should be embraced at the enterprise level. Ultimately, a collective approach to accountability enables organisations to navigate the complexities of technological advancements while upholding their ethical responsibilities.
Effective AI governance relies on properly integrating business expectations into machine learning operations and evaluation metrics. When AI governance, data governance, and MLOps teams align on their understanding of business goals, model deployment occurs only after thorough evaluation through established quality gates. This control framework not only enhances the reliability of AI applications but also builds organisational confidence in their AI initiatives. By prioritising these processes, organisations can achieve sustainable and high-quality AI deployments that meet their strategic objectives.
In machine learning, ensuring that models meet set expectations before advancing to the model registry or production is crucial for maintaining quality and trust. MLflow plays a significant role in this process by identifying when models have satisfied all necessary criteria, while quality gates effectively manage deployment authorisation. This proactive validation approach not only reduces the likelihood of issues arising post-deployment but also fosters greater trust among executives, moving away from a reactive problem-solving mindset. By prioritising thorough evaluation, organisations can achieve more reliable and effective machine learning outcomes.
Increasing the percentage of companies with traceable AI outputs is essential to enhancing accountability and transparency in decision-making within organisations. Currently, only 5% of companies can effectively demonstrate traceable AI outputs, limiting executives’ ability to provide the necessary explanations for their claims. By implementing proper guidance and robust governance structures, organisations can empower business stewards to examine decision-making processes and validate the AI traces. This proactive approach will help firms not only to ensure that ethical standards are met but also to enhance the effectiveness of their machine learning operations and data science practices. Ultimately, moving toward greater traceability in AI requires the right governance frameworks combined with proper operational execution.
Call to Action
To ensure the successful implementation of total quality in organisations, a comprehensive four-point action plan is essential. First, organisations must cultivate a deep understanding of business expectations regarding total quality, as this forms the foundation for defining appropriate metrics. Once defined, these metrics should be integrated into telemetry systems for all models, including those involving agents that interact with databases or master data systems. Additionally, it is crucial for organisations to establish automated response capabilities to promptly address model drift or other issues.
Finally, teams need robust control mechanisms that allow for instantaneous rollback or model switching in production environments to maintain operational integrity. By following this structured approach, organisations can enhance their quality management processes and improve overall performance.
Lacking essential operational capabilities can lead to significant consequences for organisations, including system outages and an inability to meet uptime commitments such as 99.9% availability.
These failures can result in ongoing operational issues that hinder overall productivity and customer satisfaction. Howard then encourages attendees to understand the connection between business requirements and technical implementations, such as telemetry and MLOps, fostering a clearer cause-and-effect relationship that is vital for effective management. By recognising this relationship, organisations can work to mitigate risks and improve operational performance.
- Executive Summary
- Introduction & Last Week's Recap
- The TQM Framework & Policy Retention
- From TQM Dimensions to Implementation
- Understanding Machine Learning Observability
- Bridging the Gap—From Business Language to Technical Metrics
- Quality Gates and Production Challenges
- OpenTelemetry—Standardising AI Observability
- Implementation Steps & Automated Responses
- Industry Reality Check—The DataIQ Report
- Accountability, Risk, and Governance
- Call to Action
If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.
Additionally, if you would like to watch the edited video on our YouTube channel, please click here.
If you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com).
Don’t forget to join our exciting LinkedIn and Meetup data communities so you don’t miss out!