Metadata Management for Data Executives

Executive Summary

This webinar highlights key concepts related to Metadata and Data Management, emphasising the distinctions between Business Glossaries, Data Catalogues, and data dictionaries. It explores the role of Metadata in facilitating effective Data Management and outlines the importance of implementing a Unified Metamodel.

Drew Kennedy addresses the significance of Metadata Governance, its life cycle, and the business case for improving Data Quality and compliance in analytics. The challenges organisations face in managing data, the necessity for Metadata programs, and the impact of strategic business changes on Data Management practices are also examined.

Webinar Details

Title: Metadata Management for Data Executives

Date: 31 October 2024

Presenter: Drew Kennedy

Meetup Group: African Data Management Community

Write-up Author: Howard Diesel

Contents

Understanding the Differences Between Business Glossary, Data Catalogue and Data Dictionary

Business Glossary and Data Management

Implementation and Functionality of a Unified Meta Model

The Role of Metadata in Data Management

Managing Data: Metadata and Data Lineage

Business Case for Metadata in Data Management

Data Compliance and Challenges in Business Analytics

Metadata Programs in Organisations

Business Case for Data Management and Quality Improvement

Business Strategies and Organisational Change

The Importance of Metadata Governance

Understanding the Metadata Life Cycle

Building a Business Case for Metadata

Data Management and Metadata in Business

Understanding the Differences Between Business Glossary, Data Catalogue and Data Dictionary

Drew Kennedy opens the webinar by distinguishing between a data dictionary, a Data Catalogue, and a Business Glossary. The data dictionary holds the physical details of databases, including table names, attributes, field lengths, and rules. These details trace back to the logical model, which in practice often has to be recovered by working backwards from the database schema.

In contrast, the Data Catalogue provides a comprehensive overview of data assets within an environment, encompassing both structured and unstructured data. It includes details such as the data's origin, freshness, quality, and ownership, resembling a library card system that helps users identify and utilise data effectively for analysis and data products.

Figure 1 "What's the difference between: A Business Glossary, A Data Catalogue, and A Data Dictionary"

Business Glossary and Data Management

A Business Glossary is a crucial resource for understanding industry-specific terminology and acronyms, which often have varying meanings within the same sector. It provides clear definitions of terms, such as "propensity," and their application across different areas, like short-term and life insurance.

The glossary details essential metrics—like customer lifetime value—and outlines business rules, such as the requirement for a customer to make their first payment. Additionally, it connects business terminology to a Data Catalogue, which identifies the physical databases housing customer data and describes the relationships between various data sources, ensuring clarity and consistency in the management of customer records across large organisations.

Implementation and Functionality of a Unified Meta Model

Howard outlines the development of a unified Metadata model through a data dictionary and a Data Catalogue integrated with a Business Glossary. The data dictionary includes key elements such as catalogue ID, server, database schema, table, column names, and data types, linking to a Business Glossary that defines terms, abbreviations, classifications, ownership, and statuses.

The proposed model aims to streamline documentation processes for onboarding and training by centralising vital business entity information, thereby reducing redundancy and improving accessibility. The initiative is supported by positive feedback on its potential to eliminate unnecessary complexity and enhance Data Management efficiency.
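The link between the data dictionary and the Business Glossary described above can be sketched in a few lines of Python. This is a minimal illustration, not the model shown in the webinar: the field names follow the elements listed in the text, while the class names, the helper function, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlossaryTerm:
    """A Business Glossary entry (fields follow the write-up's list)."""
    term: str
    definition: str
    abbreviation: Optional[str] = None
    classification: Optional[str] = None
    owner: Optional[str] = None
    status: str = "Proposed"


@dataclass
class DictionaryColumn:
    """A data dictionary entry describing one physical column."""
    catalogue_id: str
    server: str
    database: str
    schema: str
    table: str
    column: str
    data_type: str
    glossary_term: Optional[str] = None  # name of the linked GlossaryTerm


def columns_for_term(term: str, columns: list[DictionaryColumn]) -> list[str]:
    """Return the fully qualified name of every column linked to a term."""
    return [
        f"{c.server}.{c.database}.{c.schema}.{c.table}.{c.column}"
        for c in columns
        if c.glossary_term == term
    ]


# Hypothetical sample data, for illustration only.
clv = GlossaryTerm(
    term="Customer Lifetime Value",
    definition="Illustrative definition: value of a customer over the relationship.",
    abbreviation="CLV",
    owner="Head of Marketing",
)
columns = [
    DictionaryColumn("C-001", "srv01", "crm", "dbo", "customer", "clv_amount",
                     "decimal(18,2)", glossary_term=clv.term),
    DictionaryColumn("C-002", "srv01", "crm", "dbo", "customer", "first_name",
                     "varchar(50)"),
]

print(columns_for_term("Customer Lifetime Value", columns))
```

Keeping the glossary link on the column record means one business definition can point to many physical locations, which is the redundancy reduction the unified model is aiming for.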

Figure 2 Unified Metamodel

Figure 3 Data Catalogue

Figure 4 Business Glossary

Figure 5 Business Glossary Continued

The Role of Metadata in Data Management

Over the past four weeks, the series has covered various roles in Data Management, including the data manager, the Metadata Management team, Data Stewards, and data executives. Drew notes that he has drawn parallels between Data Management and historical references, notably Hercules, now referred to as Atlas, to illustrate the complexity of juggling multiple data responsibilities. Last week, he likened Metadata to a Rubik's Cube, emphasising its intricate yet harmonious nature.

Drew recalls his earlier exploration of the origins of Metadata, tracing it back to the 3rd century BC, when Callimachus created the Pinakes at the Library of Alexandria to classify scrolls by subject and author. Metadata is therefore not a novel concept but one that has evolved over centuries. This historical context, he notes, helps us navigate modern data challenges by recalling how libraries facilitated information retrieval.

Figure 6 Metadata Management Webinar Series

Figure 7 Previously used Visuals to explore Metadata Management

Figure 8 "Origins of Metadata"

Managing Data: Metadata and Data Lineage

The discussion centres on the concepts of Metadata, data lineage, and data provenance in the context of Data Management. Metadata is essential for finding, understanding, and managing data, while data lineage refers to the tracking of a data item's journey from its source to its destination, including any transformations it undergoes.

In contrast, data provenance, derived from the French term "provenir", meaning "to come from", details the origin and historical context of data, including its ownership and changes over time. In summary, lineage explains how data arrived at its current state, whereas provenance reveals where it originally came from, offering clarity on both the data's journey and its roots.
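To make the distinction concrete, the sketch below stores lineage as a simple mapping from each dataset to its upstream sources and the transformation applied, then walks it backwards. The dataset names and transformations are hypothetical, and finding the root sources covers only the "where it came from" part of provenance; ownership and change history would need additional records.

```python
# Each dataset maps to the upstream datasets it was derived from, together
# with the transformation applied on that hop. All names are hypothetical.
LINEAGE = {
    "report.capital_adequacy": [("warehouse.exposures", "aggregate by counterparty")],
    "warehouse.exposures": [
        ("staging.loans", "currency conversion"),
        ("staging.collateral", "join on loan_id"),
    ],
    "staging.loans": [],
    "staging.collateral": [],
}


def trace_lineage(dataset, lineage):
    """Return every (source, target, transformation) hop upstream of dataset."""
    hops, seen, stack = [], set(), [dataset]
    while stack:
        target = stack.pop()
        if target in seen:
            continue  # guard against revisiting shared or cyclic sources
        seen.add(target)
        for source, transformation in lineage.get(target, []):
            hops.append((source, target, transformation))
            stack.append(source)
    return hops


def origins(dataset, lineage):
    """Return the datasets with no upstream source: where the data began."""
    reachable = {dataset} | {source for source, _, _ in trace_lineage(dataset, lineage)}
    return sorted(d for d in reachable if not lineage.get(d))


for source, target, transformation in trace_lineage("report.capital_adequacy", LINEAGE):
    print(f"{source} -> {target} ({transformation})")
print("Origins:", origins("report.capital_adequacy", LINEAGE))
```

The list of hops answers the lineage question (how the data reached its current state), while the origins answer one part of the provenance question.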

The discussion then turns to the place of Metadata within Data Management and Governance initiatives, emphasising that investment in Metadata is rarely justified on its own. Instead, Metadata usually supports larger projects such as Data Modelling tools, Data Quality initiatives, or governance frameworks that require data ownership and stewardship.

The growing significance of Data Catalogues and glossaries is highlighted as essential for understanding data assets, particularly within government settings in Saudi Arabia, where the need for comprehensive Metadata drives development efforts. By simplifying the conversation around Metadata to these terms, organisations can facilitate better comprehension and engagement without delving into complex details.

Figure 9 Definition of Metadata

Figure 10 Definition of Data Lineage and Data Provenance

Figure 11 "Show me the money"

Figure 12 Metadata and the Data Executive

Business Case for Metadata in Data Management

Effective Metadata Management plays a crucial role in enhancing Data Quality and consistency, which is essential for data integration and accuracy in an era of overwhelming data volume. A robust Metadata Catalogue simplifies data discovery, allowing users to independently locate and utilise data while reducing search time.

It’s important for organisations, such as banks, to establish a solid primary domain, like customer information, and gradually expand their glossary and catalogue to include subdomains rather than attempting a comprehensive overhaul. This approach not only enhances trust in data assets by linking them to quality scores but also helps address challenges related to the various ways customers are identified across different sectors. Moreover, maintaining accurate and organised Metadata can lead to cost reductions by minimising duplication, mitigating data breach risks, and supporting advanced analytics for better decision-making and forecasting.

Figure 13 Generic Business Cases for Metadata

Figure 14 Generic Business Cases for Metadata Continued

Data Compliance and Challenges in Business Analytics

Regulatory compliance requirements, such as BCBS 239 and PPR 13, emphasise the need for effective Data Management, including Data Lineage and Governance, to enhance decision-making and ensure compliance while reducing costs. The head of Data Analytics, or individuals in similar roles, often face challenges related to high data costs, accessibility issues, data sensitivity, and quality concerns. They are frequently tasked with answering critical questions about data availability, origins, accuracy, and compliance, highlighting the necessity for a robust Metadata strategy. Addressing these real-world inquiries is essential for organisations to manage their data effectively and minimise expenses.

Figure 15 Generic Business Cases for Metadata Continued

Figure 16 The Challenges faced by the Head of Data Analytics

Metadata Programs in Organisations

A robust Metadata program is essential for managing data effectively, as it helps to identify and reduce redundancy and obsolescence in large data sets. Many organisations accumulate multiple copies of data due to unsatisfactory initial loads, leading to increased storage costs that exceed expectations, challenging the notion that "storage is cheap."

Implementing an organised Metadata Catalogue can enhance data discoverability, providing users with clear insights into the data's origins, quality, and usage permissions. This prevents Data Stewards from becoming overwhelmed with requests and helps them manage Data Quality issues more efficiently. However, securing funding and support from primary stakeholders is crucial for the success of such Metadata initiatives.

To effectively address the significant monthly costs associated with cloud storage and backup, it is crucial for the Chief Financial Officer and Chief Information Officer to engage in discussions about implementing a Metadata solution. The head of data and analytics must secure their buy-in, as these senior executives are directly impacted by profit and loss considerations. Additionally, a diverse range of stakeholders, including actuaries, data analysts, Data Scientists, and product teams, all require access to data for various purposes, from modelling to customer promotions. While quantifying opportunity costs can be challenging, even minor improvements in sales growth or profitability—such as an increase from 2% to 2.1%—can translate into significant financial benefits for large organisations.
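As a rough illustration of how a small improvement scales, the sketch below works through the 2% to 2.1% example as a change in profit margin. The revenue figure is hypothetical and does not come from the webinar; rates are expressed in basis points (1% = 100 bps) so the arithmetic stays exact.

```python
def margin_uplift(revenue: int, margin_before_bps: int, margin_after_bps: int) -> float:
    """Extra profit from a margin improvement, with margins in basis points."""
    return revenue * (margin_after_bps - margin_before_bps) / 10_000


# Hypothetical organisation with 10 billion Rand in annual revenue.
extra_profit = margin_uplift(10_000_000_000, 200, 210)
print(f"Extra profit: {extra_profit:,.0f} Rand")
```

On these assumed figures, a tenth of a percentage point is worth 10 million Rand a year, which is the scale of benefit a Metadata business case would be compared against.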

Figure 17 "The Solution" to Challenges faced by the Head of Data Analytics

Figure 18 Support & Sponsorship from Primary Stakeholders for Metadata

Figure 19 Support & Sponsorship from Primary Stakeholders for Metadata Continued

Business Case for Data Management and Quality Improvement

The business case for investing in a Data Catalogue and Metadata tool is strengthened by the high costs associated with Data Engineers and scientists, who often spend up to 80% of their time searching for and manipulating data. By applying Activity-Based Costing to model the total cost of these roles—including salaries, overheads, and the time lost in data retrieval—organisations can quantify opportunity costs for data analysts and actuaries. Such an analysis reveals the significant value of streamlining data access, which not only enhances efficiency but also minimises risks related to incorrect reporting and regulatory compliance. Establishing clear data usage guidelines and ensuring Data Quality fosters trust among stakeholders, ultimately leading to stronger support for funding these initiatives and integrating them into the budget.
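A simplified version of this costing can be sketched as follows. It is an illustration under stated assumptions rather than a full Activity-Based Costing model: the headcount, the fully loaded cost per person, and the reduced search share after introducing a catalogue are all hypothetical, and only the 80% starting point comes from the text above.

```python
def time_lost_cost(headcount: int, loaded_cost_per_person: int, pct_time_searching: int) -> float:
    """Annual cost of time spent finding and preparing data.

    loaded_cost_per_person is salary plus overheads; pct_time_searching is a
    whole-number percentage of working time.
    """
    return headcount * loaded_cost_per_person * pct_time_searching / 100


# Hypothetical team: 20 data engineers and scientists at 1.2 million Rand each.
before = time_lost_cost(20, 1_200_000, 80)  # 80% of time, as cited in the text
after = time_lost_cost(20, 1_200_000, 60)   # assumed share once a catalogue exists
saving = before - after

print(f"Cost of search time today: {before:,.0f} Rand")
print(f"Cost with a catalogue:     {after:,.0f} Rand")
print(f"Opportunity cost released: {saving:,.0f} Rand")
```

The same function can be rerun per role (data analysts, actuaries) with their own cost and time figures to build up the opportunity cost across the organisation.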

Business Strategies and Organisational Change

To reduce costs effectively, the head of data and analytics should build a solid business case illustrating the potential savings from eliminating data duplication and obsolescence. For example, removing obsolete data that makes up 30% of stored volume could cut client storage costs from 100 million Rand to 70 million Rand, a cash saving large enough to cover the expenses of a Metadata program. Addressing the productivity losses across the organisation and securing sponsorship from management are also crucial.
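The storage example can be checked with a few lines of arithmetic. The sketch assumes storage cost scales linearly with the volume of data held, and the programme cost is a hypothetical figure included only to show the comparison; the 100 million Rand and 30% values come from the example above.

```python
def storage_cost_after(current_cost: int, pct_removed: int) -> float:
    """Storage cost once pct_removed percent of stored data is eliminated.

    Assumes cost is proportional to stored volume.
    """
    return current_cost * (100 - pct_removed) / 100


current_cost = 100_000_000       # Rand, from the example
new_cost = storage_cost_after(current_cost, 30)
saving = current_cost - new_cost

programme_cost = 12_000_000      # hypothetical Metadata program cost in Rand
print(f"New storage cost: {new_cost:,.0f} Rand")
print(f"Saving:           {saving:,.0f} Rand")
print(f"Saving covers the program: {saving >= programme_cost}")
```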

A comprehensive business strategy should include assessing organisational readiness for change, emphasising collaboration between business and IT teams, and establishing a joint effort in managing Metadata. By aligning projects with business needs and exhibiting an enterprise governance strategy, the return on investment can be effectively measured. Ultimately, this requires commitment from senior management and consistent engagement from all stakeholders involved in the Data Management initiative.

Figure 20 Support & Sponsorship from Primary Stakeholders for Metadata Continued

Figure 21 Organisation and Culture Change

The Importance of Metadata Governance

Effective governance of Metadata is essential, requiring a clear understanding of the Metadata life cycle and the establishment of governance processes tailored to specific requirements. Many organisations already have Data Quality initiatives linked to Data Governance, alongside systems managing master and reference data, as well as security architecture.

It is crucial to integrate governance for Metadata, focusing on process controls, standards, guidelines, and metrics. Key steps include defining standards, managing changes to Metadata, and fostering collaboration between Metadata teams and business stakeholders. Additionally, promotional activities and training development are necessary to ensure understanding and effective management of business terms and definitions as they evolve through their status changes.

Figure 22 Governance

Figure 23 Process Controls

Understanding the Metadata Life Cycle

The Metadata life cycle begins with the proposal of a term, which undergoes evaluation and testing before receiving approval from the Data Steward or business owner to go live. Any subsequent changes must follow a formal change control process. It is crucial to associate terms with related concepts, ensuring proper governance in areas such as categorisation, data retention, and privacy requirements.
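The life cycle described above can be modelled as a small state machine. This is a sketch only: the status names, the allowed transitions, and the role strings are assumptions chosen to mirror the description (proposal, evaluation, approval by a Data Steward or business owner, go-live, then formal change control), not a workflow taken from the webinar or from any particular tool.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    IN_REVIEW = "in review"            # evaluation and testing
    APPROVED = "approved"
    LIVE = "live"
    CHANGE_REQUESTED = "change requested"


# Assumed transitions: a live term can only change via a change request,
# which sends it back through review.
ALLOWED = {
    Status.PROPOSED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.PROPOSED},
    Status.APPROVED: {Status.LIVE},
    Status.LIVE: {Status.CHANGE_REQUESTED},
    Status.CHANGE_REQUESTED: {Status.IN_REVIEW},
}

APPROVER_ROLES = {"data steward", "business owner"}


def advance(current: Status, target: Status, actor_role: str) -> Status:
    """Move a term to a new status, enforcing the workflow and approval rule."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    if target is Status.APPROVED and actor_role not in APPROVER_ROLES:
        raise PermissionError("Only a Data Steward or business owner can approve")
    return target


status = Status.PROPOSED
status = advance(status, Status.IN_REVIEW, "analyst")
status = advance(status, Status.APPROVED, "data steward")
status = advance(status, Status.LIVE, "data steward")
print(status.value)
```

Encoding the transitions as data makes the change control rule explicit: there is no direct path from a live term to an edited one without passing through review again.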

Establishing standards, including sector-specific ones like ISO, is essential for data sharing through APIs and other interfaces. Additionally, measuring the impact of Metadata involves assessing how much time users spend searching for data, the completeness of the information provided, and understanding user engagement to enhance usage through training and promotion. Ultimately, a robust Metadata program should be built on a well-researched business case and aligned with broader business and data strategies.
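One of the metrics mentioned above, completeness, is straightforward to compute once the required fields are agreed. The sketch below assumes a hypothetical set of required fields and two made-up glossary entries; the actual fields an organisation measures would come from its own standards.

```python
# Hypothetical list of fields every glossary entry is expected to carry.
REQUIRED_FIELDS = ("definition", "owner", "classification")


def completeness(entries: list[dict]) -> float:
    """Share of required fields that are filled in across all entries (0 to 1)."""
    if not entries:
        return 0.0
    filled = sum(1 for entry in entries for name in REQUIRED_FIELDS if entry.get(name))
    return filled / (len(entries) * len(REQUIRED_FIELDS))


# Made-up entries, for illustration only.
entries = [
    {"term": "Customer", "definition": "Illustrative definition", "owner": "Sales",
     "classification": "Internal"},
    {"term": "Propensity", "definition": "Illustrative definition", "owner": "",
     "classification": None},
]

print(f"Completeness: {completeness(entries):.0%}")
```

Tracking this figure over time, alongside search time and usage, gives the Metadata team a simple trend to report to sponsors.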

Building a Business Case for Metadata

Drew focuses on the key aspects of building a robust Data Governance framework, including standards, metrics, controls, and organisational change, with a particular emphasis on developing a business case for Metadata. A notable example is a project for a regulatory body in which different departments were unaware of the data already being collected from reporting entities, leading to complaints about excessive regulatory burdens.

Through the creation of a manual Data Catalogue, the project revealed significant insights, showcasing the vast amounts of data available and enhancing awareness among stakeholders. This has resulted in a powerful visualisation of data across the organisation, demonstrating how previously overlooked data assets can create new opportunities for various business applications, ultimately illustrating that data is a valuable, reusable resource that increases in worth as it is utilised.

A discussion then highlights the frustrations surrounding data access and ownership, emphasising that for too long, individuals have faced bureaucratic hurdles to retrieve their own data, often requiring permission from unidentified data owners. This political complexity often hinders important initiatives like Data Quality profiling.

The importance of Metadata is underscored, as it is essential for efficient Data Management and collaboration. Drew encourages attendees to foster communication and marketing efforts to gain buy-in for data tools that can ultimately benefit users.

Figure 24 Metadata Standards and Metrics

Figure 25 The Business and Data Strategy

Figure 26 Data Catalogue 2.0

Figure 27 Closing Slide

Data Management and Metadata in Business

The closing discussion covers the challenges of understanding Metadata and data lineage tools as part of Data Management fundamentals. Drew emphasises the importance of tracking the flow of data, including its origins and transformations, while noting that some software tools provide not just the path but also the actual values of attributes. This capability allows Data Cataloguers to engage with business stakeholders for clarity on data meaning, effectively closing the information loop.

Additionally, Drew underscores the critical need for compliance, especially in contexts like BCBS 239, where the CFO must confirm that no illegal transformations have occurred during data submission to central banks. Concerns are also raised about access issues when handling sensitive information, such as HR and salary data, underscoring the necessity for meticulous detail in Data Management practices.

The discussion highlights the implementation of BCBS 239 regulations, which arose from concerns that financial institutions were manipulating data to enhance capital adequacy during the original Basel Accord. The regulations mandated a shift from superficial, presentation-based Metadata to comprehensive documentation of data lineage, emphasising the actual flow and transformation of data. A key aspect was that the CFO needed to attest to the integrity of the data movement.

The webinar then closes with an attendee sharing their early exposure to Metadata tools, specifically a demonstration by British Telecom that showcased the ability to track data values and changes throughout the process. This experience underscored the importance of complete traceability of data, a core objective of BCBS 239.

If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.

Additionally, if you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com)

Don't forget to join our LinkedIn and Meetup data communities so that you don't miss out!
