Webinar Details
Title: How to Prepare for the CDMP® Data Quality Specialist Exam
Date: 23 August 2023
Presenter: Howard Diesel
Meetup Group:
Write-up Author: Howard Diesel
Data Quality and Business Strategies
During the presentation, Howard emphasised the importance of accurately mapping dimensions and understanding different perspectives. He also highlighted the need to understand management operations and statistical approaches thoroughly, and recommended Excel for exam calculations. Data quality activities were discussed, starting with defining high-quality data and developing a business case or strategy to identify business needs, requirements, and data quality expectations. No further details were provided on tools and templates.
Figure 1 CDMP DQ (Data Quality) Exam Structure and Module breakdown
Figure 2 Data Quality Activities
Importance of Business Case and Data Quality Program in Project Management
Make a business case that outlines the benefits and value to justify pursuing projects. To address data quality problems, first understand the cost. The CIO’s views are crucial. Financial metrics like cash flow and return on investment are important. A business case must address critical questions about the data quality problem, the organisation’s strengths and weaknesses, and the resources and training needed.
Figure 3 DQ Business Case: Strategy
Figure 4 Business Case Definition
Figure 5 Business Case Methodology
Figure 6 Business Case Overview
Figure 7 Business Case Cost/Benefit
The Role of Data Stewards in Defining Data Quality Expectations and Connecting Data Lineage to Business Processes
As a data steward, you are responsible for establishing data quality expectations and identifying issues that may arise in business operations. It is important to connect various elements, such as BI reports, application processes, and data lineage, to ensure accurate and reliable reporting. Tracing data lineage back to its source is crucial to this role.
To support your work, it is essential to clearly understand the data lineage from the report back to the dataset and table. The success of a data quality program can be measured by its return on investment, which should exceed the cost of quality improvement. If you’re interested in exploring data quality in greater depth, we recommend checking out Tom Redman’s book “People and Data,” which delves into the concept of “Friday afternoon measurement.”
Figure 8 Critical Business Case Questions
The Importance of Data Quality for Cost Impact on Business Reports
As part of his job, Howard works with various business departments to analyse and measure important data elements from a sample of records. These attributes, such as customer name, size, colour, and transaction amount, relate to retail transactions and are displayed on screen. To determine the quality of each record, Howard evaluates it and assigns a score out of 100, while business personnel score each record 1 for perfect and 0 for imperfect.
To quantify the cost of poor quality, Howard applies the “rule of 10,” under which defective data can cost up to ten times as much as perfect data. He calculates the cost impact of mistakes on reports, which can be significant when many records are imperfect. Based on these findings, the results can be extrapolated to estimate the cost impact for the entire dataset.
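As a rough sketch of this estimate, the scoring and extrapolation might look as follows. All record counts, scores, and unit costs below are illustrative assumptions, not figures from the webinar:

```python
# Hypothetical sketch of the "rule of 10" cost-impact estimate described
# above: business users score each sampled record 1 (perfect) or 0
# (imperfect); a defective record is assumed to cost 10x a perfect one.

SAMPLE_SCORES = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # 1 = perfect, 0 = imperfect
COST_PER_RECORD = 5.0       # assumed processing cost of a perfect record
TOTAL_RECORDS = 100_000     # full dataset size to extrapolate to

def cost_impact(scores, unit_cost, total_records):
    defect_rate = scores.count(0) / len(scores)
    # Rule of 10: a defective record costs 10x the unit cost,
    # i.e. 9x the unit cost in *extra* cost per defect.
    extra_cost_per_defect = 9 * unit_cost
    return defect_rate * total_records * extra_cost_per_defect

print(f"Estimated extra cost: {cost_impact(SAMPLE_SCORES, COST_PER_RECORD, TOTAL_RECORDS):,.2f}")
```

With three imperfect records in a sample of ten, the 30% defect rate is extrapolated across the whole dataset, which is how a small business-scored sample can translate into a large cost figure.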
Using Force Field Analysis and SWOT Analysis for Business Impact
During the presentation, Howard recommended using the “rule of 10” to estimate the impact of various factors on the business. He supported this approach by referring to credible papers found online. Howard also introduced force field analysis as a tool to understand the organisation’s position, similar to a SWOT analysis but with a numeric allocation to quantify forces.
The driving and restraining forces were discussed, including strengths, opportunities, external threats, weaknesses, and capabilities. Examples of technology failures and their impact on the business were provided. Howard explained how restraining forces push the problem down while driving forces alleviate it. He also referenced the Ishikawa diagram as a model for ordering and positioning challenges, highlighting external issues, internal threats and opportunities, and the role of data and technology. The potential benefits of using AI for quality enhancement were mentioned, and the concept of calculating benefits based on uplift from a baseline was explained.
Figure 9 DQ Initial Assessment with Cost Impact
Figure 10 Change Assessment Force-Field Analysis
High-quality data, improvement cycles, business impact, SWOT analysis, and TOWS analysis
Our calculations show that a significant amount of money has been saved, which is a great benefit. The Juran Trilogy emphasises the importance of using high-quality data and continuously improving processes to achieve desired outcomes. To address our chronic waste of 33, we plan to improve using the Pareto principle to select the next target. In the business case, we prioritise data quality over data science.
The fishbone diagram shows how incorrect technology can negatively affect agility. Through TOWS analysis, we can identify how to leverage our strengths to capitalise on opportunities, such as expanding internationally or launching new products. Our strengths, brand, and reputation can also help us address threats, such as negative online reviews and declining revenue. To determine potential risks, we evaluate our weaknesses and threats.
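Selecting the next improvement target from chronic waste can be sketched as a simple Pareto ranking: rank issue categories by cost and keep the "vital few" that account for most of it. The issue categories and costs below are hypothetical:

```python
# Hypothetical sketch: rank data-quality issue categories by their share of
# chronic waste and pick the "vital few" that account for ~80% of the cost
# (Pareto principle) as the next improvement targets.

issue_costs = {              # assumed annual cost of each issue category
    "missing customer IDs": 40_000,
    "duplicate records": 25_000,
    "invalid addresses": 15_000,
    "stale reference data": 12_000,
    "formatting errors": 8_000,
}

def vital_few(costs, threshold=0.8):
    total = sum(costs.values())
    selected, running = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        selected.append(name)
        running += cost
        if running / total >= threshold:
            break
    return selected

print(vital_few(issue_costs))
```

Lowering the threshold narrows the focus to fewer categories, which is useful when the improvement budget only stretches to one or two initiatives per cycle.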
Figure 11 Example: Business Impact of Data Science
Figure 12 Issues Caused by Lack of Leadership
Threats and Opportunities arising from DQ Challenges
To mitigate weaknesses and drive business growth, data architects should act as agents of transformation and utilise their strengths to seize opportunities and address potential dangers posed by emerging technologies and data quality issues.
One approach is to use Excel spreadsheets to quantify the opportunities and threats the business faces, applying a plus-one multiplier to opportunities and strengths, yielding positive values, and a minus-one multiplier to threats and weaknesses, yielding negative values. Overall, the force field analysis indicates a favourable market position, with a net score of seven.
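A minimal sketch of this scoring, mirroring the Excel approach with its plus-one and minus-one multipliers, might look like this. The forces and weights below are invented for illustration and happen to net to seven, as in the example above:

```python
# Hypothetical sketch of force-field scoring: driving forces
# (strengths/opportunities) get a +1 multiplier, restraining forces
# (weaknesses/threats) get -1; the net score indicates market position.

driving = {          # assumed driving forces and their weights
    "strong brand reputation": 5,
    "international expansion opportunity": 4,
    "skilled data team": 3,
}
restraining = {      # assumed restraining forces and their weights
    "legacy system design issues": 3,
    "negative online reviews": 2,
}

def ffa_score(driving, restraining):
    return (+1) * sum(driving.values()) + (-1) * sum(restraining.values())

score = ffa_score(driving, restraining)
print(f"Net FFA score: {score}")  # positive => favourable position
```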
Figure 13 SWOT Analysis and TOWS Analysis
Figure 14 Change Assessment Force-Field Analysis (FFA) for ChatGPT
Figure 15 Scoring the FFA (Template)
Establishing a Common Language and Building a Business Case
Establishing common terms and definitions is important to avoid quality issues before selecting Critical Data Elements (CDEs). The six friends classification (why, who, what, when, where, and how) can be used to construct a business definition. For instance, a learner is a person or group who seeks to enhance their professional capabilities by using learning resources. Defining what a CDE means requires time and agreement from multiple stakeholders.
By analysing the people in an organisation, the transition from regular employees to data managers and, eventually, data provocateurs or champions can be determined. The growth path for data management entails moving from regular employees to data managers to data champions. The DQ superhero team includes a data entrepreneur, a data steward, and other team members. When making decisions in the business case, it is important to consider and work through various factors.
Figure 16 FFA: ChatGPT Modelware Systems
Figure 17 Establish a Common Language Approach
Figure 18 The Work Team
Understanding Business Case and Data Quality Dimensions
The topic of the business case is intricate and not specific to any one data quality area, and it is often emphasised in Master-level questions. During the discussion, one participant inquired whether anyone had created a business case for data quality. Henry clarified that they utilised the 1-10 theory to represent the business case visually and focused on analysing the benefits and costs.
The 1-10 theory helps incorporate data, and tools like Information Steward can display values in the dataset. This section primarily highlights the justification for the improvement, the cost analysis, and the benefits. The DMBOK recognises that there is no single definition of dimensions and that thought leaders have diverse perspectives. This statement alludes to Dan Myers’ idea of conformed dimensions.
Figure 19 DQ and Data Team “Superheroes”
Different Perspectives on Data Quality Dimensions
When evaluating data quality, it is important to consider completeness both at the column level and across interdependent fields in a table. The process of populating tables and schemas is crucial to achieving data completeness. Frameworks for assessing data quality, such as the Strong-Wang framework, categorise dimensions into intrinsic factors, like accuracy, objectivity, believability, and reputation, as well as contextual, representational, and accessibility dimensions.
Howard notes that Tom Redman takes a different approach, categorising dimensions into data model, data values, and data representation, while Larry English distinguishes between inherent and pragmatic dimensions of data quality. DAMA UK provides definitions for these dimensions, but it is important to note that there are multiple dimensions to consider. When evaluating data quality, key factors include completeness, accuracy, consistency, integrity, and timeliness.
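Completeness at both levels, per column and across interdependent fields, can be sketched as follows. The records and the dependency rule (a shipped order must have an order date) are hypothetical:

```python
# Hypothetical sketch: completeness checked both per column and across
# interdependent fields (here: if "ship_date" is set, "order_date" must be).

records = [
    {"order_id": 1, "order_date": "2023-08-01", "ship_date": "2023-08-03"},
    {"order_id": 2, "order_date": None,         "ship_date": "2023-08-05"},
    {"order_id": 3, "order_date": "2023-08-02", "ship_date": None},
]

def column_completeness(records, column):
    filled = sum(1 for r in records if r.get(column) is not None)
    return filled / len(records)

def dependency_violations(records, if_filled, then_required):
    # Rows where the dependent field is set but the required field is missing.
    return [r["order_id"] for r in records
            if r.get(if_filled) is not None and r.get(then_required) is None]

print(column_completeness(records, "order_date"))          # 2 of 3 filled
print(dependency_violations(records, "ship_date", "order_date"))
```

The second check is the interdependence case from the paragraph above: a column can look reasonably complete on its own while still breaking a cross-field rule.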
Figure 20 Data Quality Dimensions: 11%
Figure 21 Strong-Wang Framework
Figure 22 Tom Redman: Data Model, Data Values and Representation
Figure 23 Larry English, Inherent and Pragmatic
Figure 24 Data Quality Dimensions
Importance of Dimensions in Data Quality
Businesses can use a set of dimensions to select the key elements that meet their current needs. According to Dan Myers, data accuracy concerns both agreement with the real world and the precision of data values. It is important to consider the interconnectivity among different dimensions of data quality. ISO 25000 is a product quality standard that includes various levels of data product quality. Ensuring data quality is fit for purpose is crucial and involves considering different perspectives and passing all necessary checks for the chosen dimensions.
Figure 25 Data Quality Dimension Focus
Figure 26 Data Quality Dimension Relationships
Figure 27 ISO 25000 Data Product DQ Dimension, Quality of Data Product
Figure 28 DQ Fit for Purpose: 12%
Understanding the Importance of Data Quality for Specific Purposes
When conducting a study, it’s important to determine the specific features and population to be measured, as in the case of measuring diabetes. This is where a cohort comes in handy, as it helps identify the specific patient population to include in the study. By defining the purpose, the data collection requirements and measurement value set can be determined, which ensures that data quality expectations are met. The cohort serves as a system of records, similar to a master data situation, providing necessary information.
It’s crucial to avoid overtraining or overengineering data in machine learning, as this can lead to biased results and poor data quality. Instead, data that’s fit for purpose should follow the Goldilocks principle – neither too much nor too little. Poor data quality can have negative impacts, including incorrect decision-making, regulatory submission issues, revenue loss, increased costs, and reputational damage. Data quality management is critical in addressing these issues and ensuring data reliability. Working with data requires recognising its value and trustworthiness.
Figure 29 Fit for Purposes in Populations Analytics
Figure 30 Issues Caused by Fixing Issues
Figure 31 How Poor Data Quality Impacts Organisations & Individuals
Understanding and Managing Data Quality
It is a common misconception that data is always accurate, which can make it difficult to maintain data quality. Since no organisational business processes are perfect, data quality must be continuously aligned with these processes. It is crucial to convince the CIO that data quality is an ongoing program, not a one-time project. To effectively manage data quality, updating the culture and adopting a quality mindset is essential.
A comprehensive methodology, such as the ISO 8000 data quality framework, can be adopted to manage data quality end to end. The impact of data architecture, integration, operations, security, and resources on data quality should be understood. Eleven common causes of data quality problems include leadership gaps and system design issues. Prevention, correction, automation, and statistical approaches can improve data quality. To understand and analyse data quality, one must become familiar with data profiling tools and their purpose.
Figure 32 DQ Introduction: 10%
Figure 33 DQ Management: 11%
Figure 34 DQ Management Detailed Structure
Figure 35 ISO 8000 Parts
Figure 36 DQ Operations: 11%, Common Causes of DQ Issues
Figure 37 DQ Techniques
Figure 38 DQ Statistical Approaches: 11%
Figure 39 DQ Tools: 11%
Understanding the Importance of Data Profiling and Data Quality Operations
Data quality operations begin with profiling, which involves statistically assessing existing data structures to create effective data quality rules and identify problem areas. Data querying tools can be used to identify issues such as null counts and their locations in the database. Readiness is crucial for successful data quality initiatives, which involve convincing people to shift from an application-centric to a data-centric mindset and assessing the actual state of data.
Both culture and technical readiness play important roles in implementation. Examples of potential exam questions may include analysing the validity of sand composition, understanding census data, assessing data age and currency, addressing field overloading issues, and measuring stakeholder accountability. Continuous improvement and goal setting are essential for data quality operations to ensure the benefits outweigh the costs.
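The null-count profiling described at the start of this section, identifying nulls and where they occur, can be sketched as follows. The rows and columns are invented for illustration:

```python
# Hypothetical sketch of basic profiling: count nulls per column and record
# where they occur, as a data querying tool would report them.

rows = [
    {"customer": "Acme", "size": "L",  "colour": None,   "amount": 120.0},
    {"customer": None,   "size": "M",  "colour": "red",  "amount": None},
    {"customer": "Beta", "size": None, "colour": "blue", "amount": 75.5},
]

def profile_nulls(rows):
    report = {}
    for col in rows[0].keys():
        null_rows = [i for i, r in enumerate(rows) if r.get(col) is None]
        report[col] = {"null_count": len(null_rows), "row_indices": null_rows}
    return report

for col, stats in profile_nulls(rows).items():
    print(col, stats)
```

A report like this is the raw material for the data quality rules mentioned above: columns with high null counts are candidates for completeness rules, and the row locations help trace problems back to their source.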
Figure 40 Readiness Assessment
Figure 41 DQ Questions: Application
Figure 42 DQ Specialist Exam
Summary
- The CDMP exam emphasises the importance of data quality in business strategies.
- Data stewards ensure quality and connect data lineage to processes.
- Key topics included establishing a common language, data profiling, data quality operations, accuracy, and CDMP exam preparation.
Contents
- Data Quality and Business Strategies
- Importance of Business Case and Data Quality Program in Project Management
- The Role of Data Stewards in Defining Data Quality Expectations and Connecting Data Lineage to Business Processes
- The Importance of Data Quality for Cost Impact on Business Reports
- Using Force Field Analysis and SWOT Analysis for Business Impact
- High-quality data, improvement cycles, business impact, SWOT analysis, and TOWS analysis
- Threats and Opportunities arising from DQ Challenges
- Establishing a Common Language and Building a Business Case
- Understanding Business Case and Data Quality Dimensions
- Different Perspectives on Data Quality Dimensions
- Importance of Dimensions in Data Quality
- Understanding the Importance of Data Quality for Specific Purposes
- Understanding and Managing Data Quality
- Understanding the Importance of Data Profiling and Data Quality Operations
If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.
Additionally, if you would like to watch the edited video on our YouTube channel, please click here.
If you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com)
Don’t forget to join our exciting LinkedIn and Meetup data communities so you don’t miss out!