Metadata Management for Data Professionals

Executive Summary

This webinar outlines the critical role of Metadata Management in enhancing data professionalism and organisational efficiency. Drew Kennedy emphasises the importance of Metadata in Data Management and knowledge retention while addressing ethical considerations surrounding data use. Key topics of the webinar include the differences between data provenance and data lineage, the significance of Data Quality, the various types of Metadata, and the three Metadata architectures (centralised, distributed, and hybrid). Additionally, Drew highlights the importance of adhering to ISO guidelines, implementing robust Metadata strategies, and building comprehensive Metadata repositories. The webinar underscores the necessity of fostering data literacy and developing structured business definitions to optimise Metadata's impact across organisations.

Webinar Details

Title: Metadata Management for Data Professionals

Date: 24 October 2024

Presenter: Drew Kennedy

Meetup Group: African Data Management Community

Write-up Author: Howard Diesel

Contents

Metadata Management and the Data Professional

Data Management and Metadata: A Comprehensive Guide

Metadata and its Importance in Data Management

Data Management, Knowledge Retention, and Intellectual Property in Organisations

Data Ethics and Intellectual Property Law

Understanding the Basics and Implications of Data Metadata

Data Preservation and Legal Hold on Data

Understanding the Difference between Data Provenance and Data Lineage

Data Quality and Verifiability in the Digital Age

Metadata in Business

Understanding the Guidelines of ISO 11179 in Data Management

Registry Standards and Data Modelling

Types of Content Metadata

Various Locations of Metadata in Different Systems

Importance of Metadata Architecture

Different Types of Metadata Architectures

Centralised, Distributed, and Hybrid Models

Metadata Strategy and Requirements

Requirements for Building a Metadata Repository

Implementation of Metamodels in Systems

Strategy for Metadata Management in Organisations

Data Literacy in Organisations

The Process of Business Definition Development

Metadata Management and the Data Professional

Drew Kennedy opens the webinar and receives a couple of questions from Howard. The questions focus on the technical and professional aspects of Metadata, in anticipation of a comprehensive presentation. Howard also asks Drew about intellectual property and its implications for Metadata controls, and encourages him to address these inquiries either throughout the session or at the end. The presentation aims to cover extensive material within an hour, although it is acknowledged that some topics might extend into a follow-up discussion the following week in ‘Metadata Management for Data Executives’.

Figure 1 Metadata Management Webinar Series Table of Content

Data Management and Metadata: A Comprehensive Guide

Drew shares that the webinar will explore essential concepts of Metadata, including types and sources of Metadata, architecture, strategy, requirements, and Metamodels. It builds on last week's session, ‘Metadata Management for Data Citizens,’ which discussed the roles of data managers, data citizens, and data stewards. He reflects on the historical origins of Metadata, tracing them back to the Library of Alexandria and Callimachus's cataloguing efforts in the 3rd century BC, which laid foundational principles still relevant today. The session highlights how effective Metadata provides context and structure, enabling users to find, understand, and utilise data efficiently. Drew notes that aligning strategy and requirements can enhance a Metadata program, which he likens to a complex Rubik's Cube where all components must come together to achieve clarity and functionality.

Figure 2 "Hercules - the Data Manager for Metadata"

Figure 3 Origins of Metadata

Figure 4 "It's easy to be a Data Steward"

Figure 5 Metadata Definition

Figure 6 Metadata is like a Rubik's Cube

Metadata and its Importance in Data Management

To ensure high-quality Metadata, it is essential to design a robust Metadata program that clearly defines the specific types of Metadata to be captured, along with standards for completeness, accuracy, and correctness. Ongoing Metadata Governance is crucial for monitoring quality and addressing any shortcuts taken in data capture. Applying standard Data Quality dimensions and profiling the accessible Metadata repositories can help identify issues. Properly managed Metadata is vital, as it is a central resource for understanding data context and origins. Neglecting Metadata Quality can have serious repercussions in any industry, in the same way that a retailer cannot afford to sell expired products. A well-maintained Metadata portal is akin to an accurate catalogue for online shopping: essential for functionality and user trust.
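
As a rough illustration of profiling a Metadata repository against one Data Quality dimension, the sketch below scores completeness. The required attributes, the entry layout, and the sample entries are all assumptions made for this example; they do not come from the webinar.

```python
# Assumed list of attributes every catalogue entry must populate.
REQUIRED = ("definition", "owner", "data_type")


def completeness(entries: list[dict[str, str]]) -> float:
    """Share of required Metadata attributes that are actually populated."""
    if not entries:
        return 0.0
    filled = sum(
        1
        for entry in entries
        for attribute in REQUIRED
        if entry.get(attribute, "").strip()
    )
    return filled / (len(entries) * len(REQUIRED))


# Hypothetical entries: the second one is missing its definition.
entries = [
    {"name": "net_sales", "definition": "Sales after returns.",
     "owner": "Finance", "data_type": "decimal"},
    {"name": "cust_nm", "definition": "",
     "owner": "Sales", "data_type": "string"},
]

print(f"Metadata completeness: {completeness(entries):.0%}")
```

A score like this could be tracked over time by Metadata Governance, alongside similar checks for accuracy and correctness.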

Figure 7 Questions from Sessions One and Two

Data Management, Knowledge Retention, and Intellectual Property in Organisations

Drew is questioned on the importance of a well-maintained Glossary and Data Catalogue in enhancing organisational knowledge and Data Management maturity. He shares that organisations can reduce misinterpretations and ensure consistency by accurately capturing business terminology, calculations, and Data Management policies, especially amid high staff turnover. This approach preserves critical knowledge and minimises errors in data usage and understanding. The conversation further touches on the complexities of claiming intellectual property concerning Metadata when similar data products are developed by different entities using shared frameworks and layouts.

The discussion focuses on intellectual property (IP), specifically whether data models are protected in the way that architects hold rights to their designs. Drew questions whether creators automatically own their data products or need to pursue patents, which can be time-consuming and often ineffective. He emphasises that capturing Metadata about how data products are constructed is crucial for protection. However, the consensus is that, given the collaborative nature of the industry and the transfer of knowledge among professionals, enforcing IP claims can be challenging and often not worth the effort, as innovation typically builds on existing ideas. Keeping sensitive data secure is advisable, but asserting ownership over general models may be impossible.

Data Ethics and Intellectual Property Law

Drew highlights the intersection of data ethics, copyright, and intellectual property law, particularly concerning the protection of Metadata, such as data schemas and location information. He then emphasises the complexity of legally safeguarding this type of information, especially with the expansion of big data usage and reuse. As the debate around balancing privacy protection and intellectual property rights intensifies, technological advancements may pave the way for resolving contractual challenges and establishing new dispute resolution mechanisms.

Understanding the Basics and Implications of Data Metadata

Data and Metadata differ in that data is the raw information collected and stored, while Metadata is the descriptive information about that data. Drew offers the example of a customer record: "Kennedy" is the data held under the field "Name," and the field itself is part of the Metadata. Types of Metadata include business, operational, technical, social, and provenance Metadata, with a growing focus on data lineage and regulatory compliance. Drew then highlights that while the profession has historically emphasised technical aspects, provenance and other Metadata types are increasingly significant, especially in relation to legal requirements, in fields like information management and content regulation.
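
The customer record example can be sketched in a few lines of Python. The FieldMetadata class, its attributes, and the "crm" system name are hypothetical and chosen only to show which part is data and which part is Metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Descriptive information about one field (hypothetical attributes)."""
    name: str           # the field label, e.g. "Name"
    data_type: str      # technical Metadata
    definition: str     # business Metadata
    source_system: str  # provenance Metadata


# Metadata: describes the field, not any particular customer.
name_field = FieldMetadata(
    name="Name",
    data_type="string",
    definition="Family name of the customer as supplied at registration.",
    source_system="crm",  # assumed system name, for illustration only
)

# Data: the raw value stored under that field for one customer record.
customer_record = {name_field.name: "Kennedy"}

for field_name, value in customer_record.items():
    print(f"data={value!r} is stored under metadata field={field_name!r}")
```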

Figure 8 Essential Concepts

Figure 9 Data vs. Metadata

Figure 10 "Four Key Differences Between Data and Metadata"

Figure 11 Types and Categories of Metadata

Data Preservation and Legal Hold on Data

A discussion is then held on the concepts of provenance and preservation Metadata in relation to Data Management and legal holds. Provenance is compared to data lineage, emphasising the origins and history of data, much like tracing the source of grapes in winemaking. When a legal hold is placed on data due to legal queries, related documents cannot be destroyed or archived, overriding standard retention schedules. It's crucial to tag or classify this data in the Metadata repository to prevent tampering and ensure compliance, allowing legal teams to access necessary information during e-discovery. Establishing a flag for preservation Metadata within the Data Catalogue is highlighted as a vital practice for effective Data Governance.
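
A minimal sketch of how a preservation flag in a Data Catalogue might override a retention schedule is shown below. The CatalogueEntry schema, the may_dispose function, and the sample assets are assumptions for illustration, not a description of any particular catalogue product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogueEntry:
    """One asset in a Data Catalogue (hypothetical, simplified schema)."""
    asset_id: str
    retention_expires: date   # from the standard retention schedule
    legal_hold: bool = False  # preservation flag set by the legal team


def may_dispose(entry: CatalogueEntry, today: date) -> bool:
    """Return True only if the asset may be destroyed or archived.

    A legal hold overrides the retention schedule: held assets are never
    eligible, even after their retention period has expired.
    """
    if entry.legal_hold:
        return False
    return today >= entry.retention_expires


invoice = CatalogueEntry("invoice-2019-001", retention_expires=date(2024, 1, 1))
contract = CatalogueEntry("contract-2018-044", date(2023, 6, 30), legal_hold=True)

today = date(2024, 10, 24)
print(may_dispose(invoice, today))   # retention expired, no hold
print(may_dispose(contract, today))  # retention expired, but under legal hold
```

Keeping the flag in the Metadata rather than in each source system gives legal teams one place to check during e-discovery.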

Understanding the Difference between Data Provenance and Data Lineage

Drew addresses the debate regarding the distinction between data provenance and data lineage, particularly in the context of utilising AI and synthetic data. Data provenance refers to tracking the original source of data, including its historical context and authenticity, while data lineage focuses on the movement and transformation of data across various systems. In essence, provenance helps us understand where data originates and confirms its authenticity, whereas lineage examines how data changes and flows through processes. Separating the two perspectives improves the management of data, with provenance often associated with business context and lineage linked to operational or technical Metadata. Notably, the term "provenance" originates from the art world, where it was used to verify the authenticity of paintings without physical examination, thus preserving their quality.
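
One way to picture the distinction is to model the two as separate structures: a single provenance record that describes the origin, and an ordered list of lineage steps that describes the movement. The class names, fields, and sample systems below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where a dataset originated and who vouches for it."""
    original_source: str
    collected_by: str
    synthetic: bool  # True if generated rather than observed


@dataclass(frozen=True)
class LineageStep:
    """One movement or transformation between two systems."""
    from_system: str
    to_system: str
    transformation: str


provenance = Provenance(
    original_source="point-of-sale terminals",
    collected_by="retail operations",
    synthetic=False,
)

lineage = [
    LineageStep("pos", "staging", "load raw transactions"),
    LineageStep("staging", "warehouse", "deduplicate and convert currency"),
    LineageStep("warehouse", "sales_dashboard", "aggregate by week"),
]


def trace(steps: list[LineageStep]) -> str:
    """Render the path the data travelled, in order."""
    path = [steps[0].from_system] + [step.to_system for step in steps]
    return " -> ".join(path)


print(provenance.original_source)  # answers "where did this come from?"
print(trace(lineage))              # answers "how did it get here?"
```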

Data Quality and Verifiability in the Digital Age

The reliability of data is increasingly tied to the credibility of the organisations providing it, leading to the introduction of quality metrics that evaluate the trustworthiness of these sources. This is particularly relevant in various environments, including governmental, where rankings of data producers—such as individuals or ministries—impact the overall quality assessment of data sets. Provenance plays a crucial role in establishing this trust, as seen in frameworks like BUSI, which addresses the management of Master Data provenance. Furthermore, innovative technologies like blockchain are being utilised to track the history and integrity of assets, providing a clear pathway of ownership and data handling.

Metadata in Business

Drew then explores the importance of various types of Metadata in managing data effectively. Business Metadata serves as a comprehensive resource for understanding data concepts, calculations, and business rules, while technical Metadata provides detailed insights into data structures and storage. Operational Metadata tracks data movements, quality scores, and error logs. Additionally, social collaborative Metadata encompasses user-generated content like comments and tags, which is becoming increasingly significant in data democratisation. The discussion highlights the necessity of accurate Metadata, particularly in areas such as product and financial master data, to ensure Data Quality and consistency across organisations, with examples like reference data illustrating its practical application.

Figure 12 Types of Metadata

Figure 13 Examples of Metadata

Figure 14 Other Types of Metadata

Understanding the Guidelines of ISO 11179 in Data Management

The ISO standard 11179 provides a structured framework for defining, describing, and managing Metadata, ensuring consistency, accuracy, and interoperability of data across systems. It emphasises the importance of clear definitions for data elements, which consist of various attributes, relationships, and usage contexts. Each data definition must be unique, distinguishable, singular, and concise, using commonly understood abbreviations. This often leads to extensive collaboration among data stewards and owners to establish precise definitions, as illustrated by the complexities of terms like "sale," where different variations (e.g., gross sales, net sales) require detailed clarification. These standards also encompass the lifecycle of data elements, from creation to retirement, addressing essential quality requirements.
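
Some of these definition rules lend themselves to simple automated checks that stewards could run before a review. The sketch below is a heuristic illustration only: the word limit, the abbreviation list, and the sample glossary are assumptions, and it does not implement the ISO 11179 standard itself.

```python
import re

MAX_WORDS = 30                       # assumed threshold for "concise"
KNOWN_ABBREVIATIONS = {"VAT", "ID"}  # assumed list agreed by the stewards


def check_definitions(glossary: dict[str, str]) -> dict[str, list[str]]:
    """Return the issues found per term; terms with no issues are omitted."""
    issues: dict[str, list[str]] = {}
    seen: dict[str, str] = {}  # normalised definition -> first term using it

    for term, definition in glossary.items():
        problems = []
        normalised = " ".join(definition.lower().split())

        if normalised in seen:
            problems.append(f"not unique: same text as {seen[normalised]!r}")
        else:
            seen[normalised] = term

        if len(definition.split()) > MAX_WORDS:
            problems.append("not concise")

        # Treat any run of two or more capitals as an abbreviation.
        for abbreviation in re.findall(r"\b[A-Z]{2,}\b", definition):
            if abbreviation not in KNOWN_ABBREVIATIONS:
                problems.append(f"unexplained abbreviation: {abbreviation}")

        if problems:
            issues[term] = problems
    return issues


# Hypothetical glossary: "sale" duplicates "gross sale", "margin" uses COGS.
glossary = {
    "gross sale": "Total value of a sale before returns and discounts.",
    "net sale": "Total value of a sale after returns and discounts, excluding VAT.",
    "sale": "Total value of a sale before returns and discounts.",
    "margin": "Net sale less COGS.",
}

for term, problems in check_definitions(glossary).items():
    print(term, problems)
```

Checks like these only flag candidates; deciding what "sale" should actually mean still requires the collaboration between stewards and owners described above.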

Figure 15 ISO 11179 - Metadata Registry Standards

Figure 16 Metadata Registry Naming Guidelines

Registry Standards and Data Modelling

Key registry standards for Metadata include the Dublin Core, applicable to diverse resources such as text, images, audio, and video. Drew also points to related ISO/IEC standards, such as ISO/IEC 19770 for IT asset management and ISO/IEC 27001 for information security, and mentions XML for data interchange, RDF for data modelling, and further standards, including XMP and MODS. Additionally, the FAIR principles (Findable, Accessible, Interoperable, and Reusable) emphasise the importance of unique IDs for Metadata elements and their alignment with data standards. When implementing these standards, it's crucial to adhere to agreed-upon internal and external standards, especially concerning legal compliance and data exchange formats.

Figure 17 Other Registry Standards

Figure 18 More Registry Standards

Figure 19 Applying Metadata Standards

Types of Content Metadata

The accessibility of data relies on consistent formatting and effective use of Metadata, which can be categorised into several types. Descriptive Metadata helps identify resources for easy retrieval—covering aspects like type, author, and subject—while structural Metadata outlines the relationships within a resource, such as the number of pages and chapters. Administrative Metadata manages the lifecycle of resources, detailing updates, versioning, and archival conditions. Additionally, bibliographic Metadata assists library cataloguing, and preservation Metadata outlines conservation rules based on storage conditions. Effective content Metadata must ensure searchability and accessibility, often drawing from existing patterns to address user queries comprehensively, highlighting the need for thorough documentation and attention to detail in Metadata Management.

Figure 20 Metadata for Unstructured Data

Figure 21 Metadata for Unstructured Data Two

Various Locations of Metadata in Different Systems

Metadata can be found in various locations within an organisation, including the application Metadata repository associated with your Enterprise Learning Platform (ELP) system, glossaries, business intelligence (BI) tooling, and ETL tooling. Configuration Management Databases (CMDBs) hold significant Metadata about physical assets and technical operations, while Data Dictionaries provide technical Metadata. Integration tools, APIs, database management systems, data modelling and mapping tools, event messaging tools, reference data repositories, service registries, cloud platforms, spatial data, digital graphic data, project management (PR) tools, and business rules are also valuable sources of Metadata. Additionally, data lakes and other areas might contain further Metadata, underscoring its diverse availability across systems.

Figure 22 Sources of Metadata

Importance of Metadata Architecture

Effective Metadata architecture is crucial for successful implementation, as a lack of proper design can lead to significant issues. Key components to consider include the Metadata life cycle—drafting, loading, validating, accessing, and retiring. Essential activities involve creating, sourcing (understanding origins and authorship), storage, integration, delivery, and usage management. Additionally, it's important to focus on data flow control, ensuring trustworthiness, availability, security, and overall management throughout the process.

Figure 23 Metadata Architecture Layers

Different Types of Metadata Architectures

There are three main types of Metadata architecture: centralised, distributed, and hybrid. In a centralised architecture, a Metadata portal collects Metadata from various tools (like BI tools, modelling tools, and ETL) into a single repository, allowing for easy access and potential updates back to the original tools. The distributed architecture offers a simpler model with a portal that directly interrogates data sources, ensuring access to the most current data but lacking a central repository. The hybrid model combines both approaches with a portal and a repository but functions as a one-way system, where the repository is updated from the source but not vice versa. Understanding these architectures is essential for effective Metadata Management, especially when dealing with rapidly changing data environments like data lakes.
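
The practical difference between the centralised and distributed models can be sketched as follows. All class names and the sample source are hypothetical, and real Metadata tools harvest and query through their own connectors rather than in-memory dictionaries.

```python
from typing import Optional


class Source:
    """A tool that owns some Metadata (for example an ETL or modelling tool)."""

    def __init__(self, name: str, metadata: dict[str, str]):
        self.name = name
        self.metadata = metadata


class CentralisedPortal:
    """Answers queries from a repository filled by periodic harvesting."""

    def __init__(self, sources: list[Source]):
        self.sources = sources
        self.repository: dict[str, str] = {}

    def harvest(self) -> None:
        # Copy the current Metadata of every source into the repository.
        for source in self.sources:
            self.repository.update(source.metadata)

    def lookup(self, key: str) -> Optional[str]:
        return self.repository.get(key)  # may be stale until the next harvest


class DistributedPortal:
    """Holds no repository; interrogates the sources on every query."""

    def __init__(self, sources: list[Source]):
        self.sources = sources

    def lookup(self, key: str) -> Optional[str]:
        for source in self.sources:
            if key in source.metadata:
                return source.metadata[key]
        return None


etl = Source("etl", {"orders.amount": "decimal(10,2)"})
central = CentralisedPortal([etl])
central.harvest()
distributed = DistributedPortal([etl])

etl.metadata["orders.amount"] = "decimal(12,2)"  # the source changes

print(central.lookup("orders.amount"))      # harvested copy, now outdated
print(distributed.lookup("orders.amount"))  # live value from the source
```

A hybrid portal, as described above, would keep a repository that is refreshed from the sources in one direction only; it is left out here to keep the sketch short.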

Figure 24 Types of Metadata Architecture

Figure 25 Types of Metadata Architecture Two

Centralised, Distributed, and Hybrid Models

Centralised data systems offer high availability and speed, unaffected by third-party systems; however, they require complex processes to replicate changes from the source data, and delays in regular harvesting can lead to outdated information and increased costs. In contrast, distributed systems provide current and valid data directly from the source, resulting in simpler maintenance and minimal development intervention. However, they lack user-defined Metadata, and query quality depends on the source system's availability. Hybrid systems combine the advantages of both, ensuring up-to-date Metadata and good query performance, yet face challenges such as the unavailability of source terms and limitations in user-defined Metadata, which could significantly add value.

A discussion then clarifies the distinctions between the centralised, hybrid, and bidirectional models shown in the presentation. An error in the slide displays the hybrid model as centralised, and the arrows for the bidirectional model are not drawn correctly. The need for a unified metamodel linking data from ETL processes to BI tools and applications is emphasised as part of the overall context diagram deliverables. The slides will be changed to ensure clarity and accuracy.

Figure 26 Types of Metadata Architecture Pros and Cons

Metadata Strategy and Requirements

An effective Metadata strategy must align closely with Metadata requirements, as developing a strategy without understanding these requirements can lead to significant challenges. It is essential to have a supporting strategy for Data Governance, Data Management, and overall data strategy to successfully manage organisational data and transition to a desired future state. Conducting a Metadata maturity assessment can provide a framework for enhancing Metadata Management, identifying key drivers, and recognising potential obstacles while defining the future architecture of enterprise Metadata. Emphasising this integration can lead to more effective implementation phases. The topic will be discussed further in the following week's session.

Figure 27 Metadata Strategy

Requirements for Building a Metadata Repository

To effectively develop a Metadata repository, several key requirements must be considered. These include identifying data creators, stewards, and users and establishing functional requirements for update frequency, version history, and access rights. Proper integration of Metadata from various sources is essential, ensuring unique naming conventions and clear rules for updates. Collaboration is critical to prevent overwriting changes and ensure data integrity. Maintaining high-quality Metadata is also paramount, as working with inaccurate data is detrimental. Security measures should limit the exposure of sensitive Metadata, protecting confidential information from unauthorised access.
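
Two of these functional requirements, version history and protection against overwritten changes, can be illustrated with a simple version check. The RepositoryEntry class and the StaleUpdateError exception are hypothetical names for this sketch; a real repository would typically enforce the same rule in its database or API layer.

```python
class StaleUpdateError(Exception):
    """Raised when an edit was based on an outdated version."""


class RepositoryEntry:
    """A glossary entry that keeps its version history."""

    def __init__(self, name: str, definition: str):
        self.name = name
        self.definition = definition
        self.version = 1
        self.history = [(1, definition)]

    def update(self, new_definition: str, based_on_version: int) -> None:
        """Apply an edit only if the editor saw the latest version."""
        if based_on_version != self.version:
            raise StaleUpdateError(
                f"{self.name}: edit based on v{based_on_version}, "
                f"current is v{self.version}"
            )
        self.version += 1
        self.definition = new_definition
        self.history.append((self.version, new_definition))


entry = RepositoryEntry("net sale", "Sales after returns.")

# First steward edits version 1 successfully.
entry.update("Sales after returns and discounts.", based_on_version=1)

# Second steward also started from version 1, so the edit is rejected
# instead of silently overwriting the first steward's change.
try:
    entry.update("Sales after tax.", based_on_version=1)
except StaleUpdateError as error:
    print(error)
```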

Figure 28 Metadata Requirements

Figure 29 Understand Metadata Functional Requirements

Implementation of Metamodels in Systems

In developing a metamodel, it's essential to establish both a high-level conceptual model that outlines the relationships among systems and a more detailed lower-level metamodel focused on attributes, elements, and processes. This model should encompass both logical and physical architecture, including a business and technical glossary. The logical data model features entities and attributes, while the physical data model includes data stores, files or tables, and fields or columns. Additionally, a business cluster aligned with technical Metadata highlights business value, code sets, and the codified domain, complemented by a clear representation of the system and application architecture.
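
A small slice of such a metamodel, linking a business glossary term to its logical attribute and on to the physical columns that implement it, might look like the sketch below. The class names, fields, and sample systems are assumptions for illustration; a full metamodel would cover many more entity types.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalColumn:
    """Physical layer: data store, table, and column."""
    data_store: str
    table: str
    column: str


@dataclass
class LogicalAttribute:
    """Logical layer: an attribute of an entity."""
    entity: str
    attribute: str
    columns: list[PhysicalColumn] = field(default_factory=list)


@dataclass
class BusinessTerm:
    """Business glossary layer."""
    name: str
    definition: str
    attributes: list[LogicalAttribute] = field(default_factory=list)


def physical_locations(term: BusinessTerm) -> list[str]:
    """List every physical column that implements a business term."""
    return [
        f"{col.data_store}.{col.table}.{col.column}"
        for attr in term.attributes
        for col in attr.columns
    ]


term = BusinessTerm(
    name="Customer Name",
    definition="Family name of the customer.",
    attributes=[
        LogicalAttribute(
            entity="Customer",
            attribute="name",
            columns=[
                PhysicalColumn("crm_db", "customer", "cust_nm"),
                PhysicalColumn("warehouse", "dim_customer", "customer_name"),
            ],
        )
    ],
)

print(physical_locations(term))
```

Being able to walk from a glossary term down to its columns is what lets a repository answer impact questions such as "where is this concept stored?".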

Figure 30 Create the Metamodel

Strategy for Metadata Management in Organisations

To effectively implement Metadata Management in an organisation, teams should start by articulating the value of Metadata through cost savings and efficiency gains, particularly in environments utilising data lakes where locating and verifying data can be resource-intensive. Quick wins may include identifying and addressing flaws in data products, reducing duplication, and ensuring quality compliance through established Metadata repositories. Aligning Metadata initiatives with business objectives, such as precision retail or customer centricity, can foster engagement from business stakeholders and lead to actionable Data Governance. Rather than adopting a broad, indiscriminate approach, focusing on specific data domains can help highlight inefficiencies, such as unnecessary data redundancy, ultimately justifying the investment in a Metadata program.

Figure 31 Example Metadata Repository Metamodel

Figure 32 "Other Activities and Outputs"

Data Literacy in Organisations

A discussion starts on the critical importance of understanding Metadata and its role in achieving a common understanding of data among stakeholders, particularly around key performance indicators (KPIs). The challenge lies in differing interpretations, which can be illustrated through a lack of clarity in metrics. Maintaining a structured Data Catalogue is essential, as it serves as a centralised repository for Metadata, yet it requires careful management to avoid chaos. Highlighting the value of Metadata, especially in managing critical data elements within specific domains, can help make a compelling case to executives about the need for improved Metadata Management, ultimately leading to a clear roadmap and potential ROI.

The discussion then goes on to emphasise the crucial role of data literacy within organisations, highlighting the importance of a common understanding of data and effective communication among team members. Drew proposes using a business glossary and Data Catalogue as tools for induction and training, suggesting that these could replace outdated terminologies and specification documents. He references Steve Hoberman's approach of creating subject areas and conceptual models to facilitate onboarding by clearly defining core business concepts. This method has proven effective in introducing new employees to organisational data.

The Process of Business Definition Development

To establish a Metadata practice from scratch, it's essential for data stewards to capture business terms, as they possess the necessary understanding of these terms' meanings and usage nuances. While the Data Governance team can provide templates and assistance, they typically lack the in-depth knowledge of specific business concepts like "ranging" or "assortment planning." A successful approach includes building a community of subject matter experts (SMEs) who can collaboratively define terms, leveraging dictionary definitions and classification elements. Effective change management strategies are crucial for raising awareness about the business glossary and integrating it into organisational processes.

Figure 33 "Find your Six Friends"

Figure 34 Word Relations

Figure 35 Definitions - Find your Six Friends

Figure 36 Highlight the Six Friends

If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.

Additionally, if you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com)

Don’t forget to join our exciting LinkedIn and Meetup data communities so that you don’t miss out!
