Data Warehousing, BI, Big Data & Data Science for Data-Driven Executives

Executive Summary

This webinar explores why effective Data Management is crucial for business success in today’s competitive landscape, particularly in the context of artificial intelligence (AI) and data science. Howard Diesel explores the shift from AI to data science and the critical role that robust data flow, quality, and governance play in AI model evaluation and deployment.

As organisations face challenges related to Data Quality and digitalisation, the integration of business knowledge with data insights emerges as a vital necessity. Trustworthy data underpins AI adoption, emphasising the need for a strategic approach that redefines Data Management as a core business discipline. Ultimately, the webinar stresses the collaboration between technology and personnel as essential for fostering an environment where Data Management contributes to sustainable organisational success.

Webinar Details

Title: Data Warehousing, BI, Big Data & Data Science for Data-Driven Executives

Date: 06 February 2025

Presenter: Howard Diesel

Meetup Group: African Data Management Community

Write-up Author: Howard Diesel

Contents

Data Management in Business and the Role of AI

The Shift from AI to Data Science

Data Management and Data Flow in AI

The Challenges of Data Quality in AI Model Evaluation and Deployment

The Challenges and Insights in Data Management and Digitalization

The Intersection of Data and Business Knowledge

The Importance of Data Management in AI Development

Data Management in AI

The Role of Technology and People in Data Management

The Evolution of Data Management in Business

The Need for Trustworthy Data in AI Adoption and Management

Redefining Data Management as a Business Discipline

The Importance of Data Management and Business Knowingness in Organizational Success

Data Management in Business and the Role of AI

Howard Diesel opens the webinar and shares his strong belief in the potential of AI, emphasising its usefulness in daily tasks and acknowledging the value of various large language models. However, he highlights a significant challenge in convincing businesses of the critical importance of Data Management as a discipline.

Figure 1 Data Warehousing, BI, Big Data & Data Science

Figure 2 Data Management = Most Important Business Discipline

The Shift from AI to Data Science

Francesco Puppini presented the previous webinar, which Howard mentioned earlier. Francesco spoke about the critical need for effective Data Management practices within businesses, particularly as they adopt AI and data science technologies. A key point discussed was the "garbage in, garbage out" principle, which highlights the consequences of poor Data Quality on AI outcomes.

Many organisations invest heavily in AI without establishing proper data frameworks, leading to inconsistencies and trust issues with results. The concept of "shifting left" was introduced, advocating for early investment in data infrastructure and understanding of data collection processes to minimise costs and enhance efficiency. Successful case studies can bolster the argument for proactive Data Management, illustrating its foundational role in driving organisational success and ultimately saving money in the long run.

Data Management and Data Flow in AI

Howard moves on to the importance of leveraging thought leaders in the AI field, specifically highlighting Andrew Ng's campaign for data-centric AI. Andrew Ng emphasises that many practitioners prioritise model development over Data Management, which can lead to significant issues during model evaluation and deployment. Howard also shares a reference to a Google-developed concept called "data cascades", which illustrates how problems in data collection, labelling, analysis, and cleansing can cascade into later stages of AI development. Addressing data-related issues early in the process prevents challenges during model work, which is the argument for prioritising Data Management in AI projects.

Many challenges, especially in descriptive analytics, arise from insufficiently defined issues, such as declining sales, revenue drops, and decreased customer engagement. This is why a clear problem statement is so important in data analytics. It also shows that effective Data Management extends beyond IT to involve business-driven practices, particularly in data stewardship and governance. Additionally, proper management of reference and Master Data is crucial and should be led by the business rather than IT.

Figure 3 Data Cascades in High-Stakes AI

Figure 4 Forbes Article on Andrew Ng

Figure 5 Challenges with AI & Data Cascades

Figure 6 Data Work and Model work present in the Data Cascade Diagram

The Challenges of Data Quality in AI Model Evaluation and Deployment

Google's work highlights significant challenges in AI model evaluation and deployment, emphasising the risk of models being abandoned after deployment. Machine learning (ML) deployment tends to take longer than Business Intelligence (BI) deployment because it requires ensuring low error rates and correct decision-making, whereas BI follows a more straightforward approach that often relies on user acceptance testing of data products.

The extended timeline that data cascades cause can escalate costs and effort, particularly when models must be revisited or discarded. Howard notes Andrew Ng's point that the effectiveness and responsibility of AI decisions depend heavily on high-quality, accurate, and complete data sets. Howard also reiterates that poor Data Quality can lead to disastrous outcomes. Data is therefore described as the essential "food" for AI: it fuels training and decision-making, with accuracy and completeness both being vital dimensions of Data Quality.

Howard moves on to the critical importance of Data Quality and governance in ensuring that data accurately reflects the real world. One concern is the reliability of data, especially when it is acquired from external sources or not regularly updated. This can lead to biases, such as focusing predominantly on high-net-worth individuals and neglecting other ethnic groups.

Lack of completeness creates significant challenges and affects the utility of artificial intelligence, which relies heavily on the quality of its underlying data. The key takeaway is that unless data accuracy and comprehensiveness are prioritised, efforts may be futile and result in wasted time. Stakeholders are urged to recognise the necessity of addressing these data issues to enhance decision-making.
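The bias concern described above can be made measurable by comparing how often each group appears in a data set with how often it appears in the population the data is meant to reflect. The following is a minimal sketch of that comparison, not a method presented in the webinar. The function name `representation_gaps`, the segment labels, and all the shares are hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Compare each group's share of the sample with its expected share.

    Returns {group: sample_share - reference_share}. A negative value
    means the group is under-represented in the sample.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical segment labels for ten records in a training set.
sample = ["high_net_worth"] * 7 + ["middle_income"] * 2 + ["low_income"] * 1

# Hypothetical share of each segment in the real customer base.
reference = {"high_net_worth": 0.2, "middle_income": 0.5, "low_income": 0.3}

gaps = representation_gaps(sample, reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%} versus reference")
```

A large positive gap for one segment alongside negative gaps for the others is the kind of skew that would need to be raised with the business before a model is trained on the data.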

Figure 7 "Conclusion"

The Challenges and Insights in Data Management and Digitalization

A common issue in Data Management projects is the lack of understanding between data engineers and the domain experts behind the data sources. Howard recounts a situation where a €5 million digitisation project utilised 19 disparate data sources, which posed challenges for effective Data Management. Despite suggestions to consolidate the data for better efficiency, resistance stemmed from the differing perspectives of those creating data systems and those managing data infrastructure.

Howard emphasises the importance of roles such as data architects in creating a cohesive enterprise Data Model. Data scientists also need a broader understanding of data complexities, as they often focus on quickly loading data into models without recognising the underlying integration challenges.

The Intersection of Data and Business Knowledge

The gap between data knowledge and business understanding is most visible in engineering teams that feel overwhelmed because they lack access to business insights. Despite their technical skills, these teams struggle to connect data initiatives with business objectives, often overlooking the importance of conceptual and logical modelling in understanding and communicating business needs.

Howard emphasises that many professionals only recognise the initial stages of the data life cycle, neglecting the planning and design phases essential for effective Data Management. This gap points to the necessity of modelling to gain deep insights into one's business, suggesting that if a model doesn't make sense in plain language, it likely won't make sense in data terms.

The Importance of Data Management in AI Development

Recent observations from Google Engineering highlight the critical yet often undervalued role of data in AI development. Howard notes that, according to these observations, 92% of machine learning models encounter data issues, many of which are avoidable with effective Data Management. He also points to a broader trend in the conference landscape, where presentations lacking "AI" in their title struggle for acceptance.

Howard shares that AI developers spend 80% of their time on data preparation, a task ideally suited for data engineers. Questions arise about whether data engineers are being excluded or if they are providing data that isn't optimally structured for AI applications. Additionally, Howard mentions the webinar by Francesco Puppini again, which emphasises the importance of focusing on the "last mile" of business intelligence (BI). Francesco felt that the last mile was where the transition from the data warehouse to the final BI model occurred. He suggested that this phase is crucial for delivering value from analytics.

Figure 8 "Observations from Google"

Figure 9 Extract from Forbes Article

Data Management in AI

Data Management for AI centres on four key areas: Data Quality, Metadata Management, Master Data, and Data Governance. Data Quality involves identifying and resolving issues through automated rules and cleaning processes, while Metadata Management addresses the labelling, tagging, classification, and lineage of data. For Master Data, identifying and automating reference data sources is essential.

Howard adds that using ontologies to standardise data across various sources, particularly in clinical trials, has facilitated reliable data integration. Data Governance, in turn, is crucial for managing risks and biases and for ensuring the responsible handling of data. Together, these elements form a streamlined approach to Data Management that promotes automation and efficiency.
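As an illustration of what "automated rules" for Data Quality can look like in practice, the sketch below runs a small set of rules over some records and reports the share of records that pass each one. It is a minimal example under stated assumptions rather than a tool discussed in the webinar. The records, field names, rule names, and the `run_rules` helper are all hypothetical.

```python
from typing import Callable

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"customer_id": "C001", "country": "ZA", "age": 34},
    {"customer_id": "C002", "country": None, "age": 51},
    {"customer_id": "C003", "country": "KE", "age": -4},
    {"customer_id": None, "country": "NG", "age": 28},
]

# Each rule maps a readable name to a predicate that returns True on a pass.
# The first two rules check completeness; the third checks validity.
rules: dict[str, Callable[[dict], bool]] = {
    "customer_id is present": lambda r: r["customer_id"] is not None,
    "country is present": lambda r: r["country"] is not None,
    "age is between 0 and 120": lambda r: r["age"] is not None and 0 <= r["age"] <= 120,
}


def run_rules(rows, rule_set):
    """Return, for each rule, the share of rows that pass it (0.0 to 1.0)."""
    if not rows:
        return {}
    return {
        name: sum(1 for row in rows if check(row)) / len(rows)
        for name, check in rule_set.items()
    }


pass_rates = run_rules(records, rules)
for name, rate in pass_rates.items():
    print(f"{name}: {rate:.0%} of records pass")
```

Keeping the rules as named entries in one place means business data stewards can review and extend them without reading the code that executes them.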

Figure 10 Intelligent Data Management

The Role of Technology and People in Data Management

Intelligent Data Management signifies a shift away from relying solely on manual, people-driven Data Governance, largely because the volume of data can no longer be managed efficiently without suitable technologies. While advancements in AI and automation enhance productivity and data classification, it is essential to recognise the significant role of people and policies in this process. Technology alone is not enough; effective Data Management must be supported by human involvement and established processes to ensure accuracy and compliance. A collaborative approach that integrates technology with human expertise is therefore crucial for successful Data Governance.

The Evolution of Data Management in Business

Howard reflects on past challenges in establishing effective Data Management governance, noting difficulties in gaining business ownership and commitment. While some divisions have maintained good programs, there is a growing need for data literacy training and skills development in tools like Power BI and Tableau, as data has become essential for business success.

Howard emphasises the importance of integrating Data Management principles into everyday tasks without overwhelming employees. He draws a parallel with the adoption of technologies such as GIS, which people often use without being aware of it. Howard further discusses the DCAM framework, which distinguishes between Data Governance and oversight. He cautions against the perception that Data Governance is overly burdensome and highlights the need to clarify its benefits to encourage adoption.

The Need for Trustworthy Data in AI Adoption and Management

AI is recognised as a powerful business discipline that is reshaping operational value through continuous learning and innovation. However, to maximise its effectiveness, responsible decision-making relies on high-quality, unbiased data, often referred to as trustworthy data. This requires a shift in Data Management practices to focus on delivering reliable data products tailored to business needs.

As organisations increasingly integrate large language models (LLMs) into their operations, they must ensure these models are populated with their specific data and insights. Consequently, effective Data Management becomes critical, as simply utilising the latest data is insufficient; a structured approach is essential to manufacture trustworthy data. Ultimately, Data Management transcends mere technology capability, encompassing planning, design, and consolidation efforts to ensure Data Quality for successful AI model implementation.

An attendee emphasises the importance of Data Quality in business with an analogy: retailers refuse to put subpar products such as meat or vegetables on their shelves, yet businesses accept flawed data, such as fraud data. The attendee then expresses concern over the history of failed AI and machine learning initiatives, clarifying that the issue lies not in the models themselves but in the quality of the data being fed into them. The goal is to identify and address the root causes of AI failures to enhance performance rather than dismiss AI altogether.

Figure 11 Data Management = Most Important Business Discipline

Redefining Data Management as a Business Discipline

To elevate Data Management as a recognised business discipline, it is crucial to demonstrate its sustainable business value through quantifiable metrics within Data Management programs. The challenge lies in effectively engaging stakeholders, particularly those in AI and business intelligence, to recognise and appreciate the role of data in driving economic value and innovation.

By starting with the right data, organisations can leverage data science to identify opportunities for business innovation, facilitating a continuous cycle of value creation. It is essential for data executives to reframe the conversation around Data Management, emphasising its critical contribution to informed decision-making and overall business success.

Figure 12 6+1 Attributes of Data Management Discipline

Figure 13 Value Creation "Flywheel"

Figure 14 Data Executive: Call to Action

The Importance of Data Management and Business Knowingness in Organizational Success

Howard emphasises the importance of a clear Data Management framework that distinguishes data responsibilities from IT, enabling business stakeholders to engage effectively with Data Management. He highlights the necessity for business users to take ownership of Data Quality and relevance, ensuring alignment with business needs.

Howard shares an example from his past experience in sell-side research, where business-driven decisions on data sources and transformations led to better outcomes. IT must align itself to support these initiatives. Lastly, Howard notes that a challenge from a business executive underscores the need for data professionals to enhance their business and financial acumen to foster effective communication and collaboration.

If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.

Additionally, if you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com).

Don’t forget to join our LinkedIn and Meetup data communities so you don’t miss out!
