Data Works: When Precision Matters with Marco Wobben

Executive Summary

Communication has evolved from verbal to visual to electronic methods. Technology has had a significant impact on society. Effect-oriented modelling assesses the impact, while factory-oriented modelling is an approach to creating models. It is important to use proper grammar to express facts and visual aids to enhance information. In business, shared language and conceptual modelling are essential. CaseTalk is a valuable tool for creating and implementing conceptual models. AI text generation can be impressive but often inaccurate. It is important to train AI models using information models and to ask the right questions to understand IT products. Communication and understanding assumptions are crucial in expertise. Engaging with data users and using their language is essential. ChatGPT has many uses in the workplace, but challenges and misconceptions exist about using AI in writing. AI has limitations in addressing data quality and legal analysis. Connecting law and system design can be challenging. Data Vault Builder and natural language models can aid in data analysis. Context is critical in data analysis. Understanding risks and levels of accuracy is vital in data science and AI. Reasonableness is essential in assessing balance sheet statements. Responsible AI and data management are essential. Quality data products are critical for decision-making. Data modelling in AI can be challenging, but automation and standards can help with man-machine interaction. Verification is crucial when automating AI.

Webinar Details

Title: Data Works: When Precision Matters with Marco Wobben

Date: 31 May 2023

Presenter: Marco Wobben

Meetup Group: DAMA SA User Group

Write-up Author: Howard Diesel

Contents

Evolution of Communication: From Verbal to Visual to Electronic

The Impact of Technology on Society

Effect-oriented modelling and Factory-oriented modelling

The Importance of Grammar in Expressing Facts

The Power of Visualising Information Grammar

Importance of Shared Language and Conceptual Modelling in Business

Building a Conceptual Model and Implementing Data Vault

Data Modelling and Automated Information Mapping

Impressive but Inaccurate Examples of AI Text Generation

The Use of AI in Writing

Training AI Models Using Information Models

The Importance of Asking the Right Questions in Understanding IT Products

Importance of Asking the Right Questions

The Importance of Understanding Assumptions and Communication in Expertise

Importance of Engaging with Data Users and Communicating in their Language

Use of ChatGPT in the Workplace

Challenges and Misconceptions of Using AI in Writing

The Limitations and Potential of AI in Addressing Data Quality and Legal Analysis

The Challenges of Connecting Law and System Design

Implementing DataVault Builder and exploring natural language models for Data Analysis

The Importance of Context in Data Analysis

The Challenges of AI in the Business World

Importance of Understanding Risks and Levels of Accuracy in Data Science and AI

The importance of reasonableness in assessing balance sheet statements and driving interaction

The Importance of Responsible AI and Data Management

Importance of Quality Data Products in Decision Making

Challenges of Data Modelling in the AI Era

Importance of Automation and Standards in Man-Machine Interaction

Automating AI and the Importance of Verification

Evolution of Communication: From Verbal to Visual to Electronic

Marco Wobben opened by discussing communication between business and IT, emphasising how hard it is for the two to understand each other. He showed a picture of his youngest child with blue ink smeared on his hands. The child was trying to express his thoughts but could not yet do so clearly, which illustrates that a visual representation alone may not be enough for effective communication.

Then, Marco presented an ancient picture that used written language, items, and pictures to communicate facts. This emphasises the importance of communicating effectively. He also talked about the evolution of communication methods from verbal to manual to mechanical methods, citing examples such as vacuum tubing and mailrooms.

Marco referred to the current era of electronic information spread, highlighting its fast-paced nature. He shared the historical context of a machine developed during World War II to calculate ammunition trajectory, showcasing technological advances. It is fascinating to see the historical development of technology.

Figure 1 Evolution of Communication

The Impact of Technology on Society

In the early days, the cost of hardware used for calculations was steep, starting at half a million dollars. However, with technological advancements, hardware has become smaller and more powerful in processing capabilities. One prime example is the iPhone, which is significantly smaller than the first hardware but has more switches, allowing for faster calculations.

Technology offers numerous advantages, including eliminating repetitive tasks, promoting equity in society, and making knowledge more accessible. However, it also has its downsides, such as causing people to become lazy due to automation and leading to social disconnection.

The decreasing hardware cost and faster spread of information have positive and negative effects. If technology is not implemented correctly, there can be concerns regarding data security and quality and negative impacts on human life.

Balancing speed against potential risks is essential, because faster is not always better. Planning and knowing where one wants to go are critical to success.

Effect-oriented modelling and Factory-oriented modelling

The focus of Effect-Oriented Modelling is to ensure that the effects are sustained throughout the implementation process. This method addresses the challenge of aligning the system implementation with the actual needs of business users, highlighting the importance of using the language of the business to capture communication.

Figure 3 Ineffective communication

The objective of Factory-Oriented Modelling as an approach is to capture and produce the necessary artefacts for production based on the language and communication of business users. The emphasis is not solely on technology or specific frameworks but on formalising and expressing human communication in a standardised way. This approach allows for stating facts, modelling them, and generating artefacts for system design and implementation.

Figure 4 Effect-oriented modelling and Factory-oriented modelling

The Importance of Grammar in Expressing Facts

Marco breaks down information and how it is communicated. He starts by noting that “Marco” and “Howard” are both “speakers” at a Meetup. He then adds detail: “Marco” presented at the DAMA South Africa Meetup on April 24th, 2023. These details enrich and clarify the facts he presents. Marco emphasises the importance of clear communication through the use of rich information grammar, which here means identifying the speaker's name as “Marco,” the Meetup as “DAMA South Africa,” and the date as “April 24th, 2023.” Different languages may express the same fact in different words, but the grammatical structure is what keeps the communication clear.
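As a minimal sketch of this idea, the Python below treats a fact type as a sentence template with named roles and renders individual fact statements from it. The class name FactType, the template wording, and the role names are illustrative assumptions, not CaseTalk's notation.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class FactType:
    """A sentence template whose placeholders are the roles of the fact."""
    template: str

    @property
    def roles(self) -> list[str]:
        # Collect placeholder names in the order they appear in the template.
        return [name for _, name, _, _ in Formatter().parse(self.template) if name]

    def verbalise(self, **values: str) -> str:
        # Refuse incomplete facts: every role must be filled.
        missing = [role for role in self.roles if role not in values]
        if missing:
            raise ValueError(f"missing roles: {missing}")
        return self.template.format(**values)


# Hypothetical fact type based on the Meetup example in the text.
presentation = FactType("Speaker {speaker} presented at the {meetup} Meetup on {date}.")

sentence = presentation.verbalise(
    speaker="Marco", meetup="DAMA South Africa", date="April 24th, 2023"
)
print(sentence)
```

Keeping the template separate from the values is the point of the sketch: the same grammar can verbalise any number of concrete facts, and a fact with a missing role is rejected rather than silently stored.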

Figure 5 The Importance of Grammar in Expressing Facts

The Power of Visualising Information Grammar

Visual aids can improve comprehension of information grammar by offering a clear and concise representation of the information model. The structure and animation must correspond with the rules and constraints of the information grammar, and determining those constraints requires interviewing the business audience. Algorithms and generators, such as a tailored example named “Batman,” can then generate artefacts from the fact model: logical diagrams, database scripts, and JSON schema files. This technique allows for precise and detailed data models in intricate business domains. It supports high-quality data and streamlines governance by tracing the origin of the information.
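To make the generator step concrete, here is a hedged sketch that derives a JSON Schema fragment from a small model description. The input dictionary layout, the function name to_json_schema, and the example entity are assumptions for illustration; they do not reproduce the generators shown in the webinar.

```python
import json

# Hypothetical model: one entity, its roles, and which roles are mandatory.
model = {
    "entity": "Presentation",
    "roles": {
        "speaker": {"type": "string", "mandatory": True},
        "meetup": {"type": "string", "mandatory": True},
        "date": {"type": "string", "mandatory": False},
    },
}


def to_json_schema(fact_model: dict) -> dict:
    """Translate the model into a JSON Schema object definition."""
    roles = fact_model["roles"]
    return {
        "title": fact_model["entity"],
        "type": "object",
        "properties": {name: {"type": spec["type"]} for name, spec in roles.items()},
        # Mandatory roles become required properties.
        "required": [name for name, spec in roles.items() if spec["mandatory"]],
        "additionalProperties": False,
    }


schema = to_json_schema(model)
print(json.dumps(schema, indent=2))
```

Because the schema is derived mechanically from the model, every property in the output can be traced back to a role that a business user confirmed.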

Figure 6 The Power of Visualising Information Grammar

Importance of Shared Language and Conceptual Modelling in Business

When everyone in an organisation speaks the same language, they can work together more effectively. This is especially important when creating IT systems, where collaboration between business experts and IT professionals is crucial. Establishing a common language across the company helps ensure consistency and alignment.

Creating conceptual models is important when starting new projects or revamping existing systems. Remco Broekmans's presentations explain how Ensemble Logical Modelling and the Core Business Concept form can be beneficial. The Core Business Concept form categorises important concepts like events, people, and places, giving teams a starting point for discussion. It also includes related concepts, making it easier for everyone to understand the company's terminology.

Figure 7 Importance of Shared Language and Conceptual Modelling in Business

Building a Conceptual Model and Implementing Data Vault

Specific examples and precise language are necessary to create a comprehensive information model. This involves transforming abstract concepts into a more accurate and practical structure.

Turning this model into an artefact requires following established algorithms that have been tried and tested over time. Data Vault is a mechanical approach to storing data in one central location, allowing for historical data representation from multiple sources and resilience to change.

Implementing Data Vault requires precise definitions and enrichment of the initial raw data vault to create a business vault that aligns with the Core Business Concepts. Beginning with an information model can aid in generating Data Vault and other related artefacts.

Figure 8 Building a Conceptual Model and Implementing Data Vault

Data Modelling and Automated Information Mapping

During the data modelling process, hubs (coloured blue) are identified as common concepts, such as customers, branch locations, product order lines, and orders. These hubs have business keys to identify stored data and are interconnected through links (coloured green) to establish relationships, like customer-to-branch location and branch location-to-order. The attributes of these entities are grouped as satellites (coloured yellow). Some data can be redundant and stored as reference data. The information model can be colour-coded to represent the different components visually: blue for hubs, green for links, and yellow for satellites. AI is being discussed as a potential tool for automating data mapping and modelling processes.
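The hub, link, and satellite structure described above can be sketched in a few lines of Python. The class names, business keys, and attribute names below are hypothetical placeholders; only the three component kinds, their colours, and the example relationships come from the text.

```python
from dataclasses import dataclass

# Colour coding used in the information model.
COLOURS = {"hub": "blue", "link": "green", "satellite": "yellow"}


@dataclass(frozen=True)
class Hub:
    name: str
    business_key: str  # identifies the stored data


@dataclass(frozen=True)
class Link:
    name: str
    hubs: tuple[str, ...]  # names of the hubs this link relates


@dataclass(frozen=True)
class Satellite:
    name: str
    parent: str  # the hub or link whose attributes it groups
    attributes: tuple[str, ...]


hubs = [
    Hub("Customer", "customer_number"),
    Hub("BranchLocation", "branch_code"),
    Hub("Order", "order_number"),
]
links = [
    Link("CustomerBranchLocation", ("Customer", "BranchLocation")),
    Link("BranchLocationOrder", ("BranchLocation", "Order")),
]
satellites = [Satellite("CustomerDetails", "Customer", ("name", "city"))]


def dangling_references(hubs, links, satellites):
    """List every link or satellite that points at an undefined component."""
    hub_names = {hub.name for hub in hubs}
    parents = hub_names | {link.name for link in links}
    problems = [
        f"{link.name} -> {hub}"
        for link in links
        for hub in link.hubs
        if hub not in hub_names
    ]
    problems += [
        f"{sat.name} -> {sat.parent}" for sat in satellites if sat.parent not in parents
    ]
    return problems


print(dangling_references(hubs, links, satellites))
```

A check like this is deliberately mechanical: it can confirm that the structure is consistent, but not that the hubs are the right business concepts, which still requires the domain experts.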

Impressive but Inaccurate Examples of AI Text Generation

Although AI text generation can be impressive, it can also be highly inaccurate. Marco suggests gaining experience with AI before developing a toolchain around it, so that one knows how far AI-generated text can be trusted. For instance, Marco requested a business scenario containing specific terms, but the generated text had grammatical errors and made-up information.

Despite the inaccuracies, the generated text can still be a starting point for discussions with domain experts. Furthermore, AI text generation can also write code, but accuracy is crucial, particularly in complex projects with multiple systems and tables.

Figure 9 Impressive but Inaccurate Examples of AI Text Generation

The Use of AI in Writing

During a discussion about AI in writing, Marco points out the need for a more comprehensive class diagram and criticises the absence of a preferred attribute. The conversation then shifts to another perspective on AI's role in writing, using a small model as the example. ChatGPT is consulted to gather comments, definitions, and sample data with which to test assumptions. Marco emphasises AI's ability to generate readable text from “hallucinations” and the importance of incorporating errors. Teaching AI, especially ChatGPT, requires providing data, structure, and grammar. Concerns are raised about training the AI on proprietary company jargon that may not be accessible.

Figure 10 Notes on the Use of AI in Writing

Training AI Models Using Information Models

Starting with the effect-oriented model is a great way to train AI models effectively. The example of “Mary Blight,” who resides in Amsterdam, shows how grammar is used in the model. To generate training data for the AI model, you can map overlapping features like names and cities. The discussion then shifts from factory-oriented modelling as an approach to artefacts and then to Data Vault. In the information model, facts tie everything together and make it possible to check the accuracy of the AI-generated output. Verifying the end product and understanding its significance to the business are vital. The presenter shows 27 dice forming a supposedly perfect cube, emphasising the need for interpretation and precision in data analysis. Consolidating data in a warehouse is beneficial, but extracting precise meaning is still necessary. Using algorithms and dashboards can result in inaccuracies if the data isn't interpreted correctly.

Figure 11 Training AI Models Using Information Models

The Importance of Asking the Right Questions in Understanding IT Products

In ‘The Hitchhiker's Guide to the Galaxy,’ the metaphor of not comprehending the meaning of "42" reflects the confusion surrounding IT products. It's important to question whether we asked the right questions, even if we find answers. Rather than providing answers, it's recommended that we flip our perspective and ask the questions from the answers. Remco Broekmans comments that using “stupid drawings” helps businesses better understand complex concepts. Similarly, an inaccurate AI-generated sentence can help stimulate conversation and break the tension around a possible misunderstanding. Communication and discussion are crucial when it comes to comprehending IT products. Overwhelming people with information can impede instant feedback and alignment. It's necessary to test assumptions to ensure accuracy, as people tend to seek confirmation rather than challenge their own thought processes.

Figure 12 The Importance of Asking the Right Questions

Importance of Asking the Right Questions

Marco's approach of tailoring questions based on the answers given by business people highlights the importance of formulating the right questions. The community is encouraged to ask relevant questions about Marco's discussion. Marco stresses the significance of asking the right questions repeatedly. When conversing with experts from various fields, it can be difficult to ask precise questions. Language barriers can further exacerbate this challenge. In the Dutch language, there are intricate grammar rules that are challenging to explain, analogous to the difficulty of asking the right questions in a different field. Asking "silly" questions and presenting incorrect examples can aid in conveying complicated knowledge.

The Importance of Understanding Assumptions and Communication in Expertise

When one becomes an expert in a field too quickly, one may not fully realise the extent of their knowledge during an interview. It's important to recognise that there can be a considerable gap between what is heard and what is understood in any conversation. Understanding what is being said is based on assumptions and communication. Making mistakes and learning from incorrect examples is necessary for gaining true expertise. It's crucial to reverse-engineer assumptions and structures when dealing with complex systems such as data warehouses or data lakes. While having faith in AI technology is important, it's also essential to question assumptions to avoid false beliefs. When AI fails, it can provide valuable insights and help us better understand our assumptions.

Importance of Engaging with Data Users and Communicating in their Language

During the discussion, Marco gives an example involving data from 2017 that turned out to be off by one month. Engaging with business and data users is important for uncovering errors like this, as they are the ones using the data; simply transferring data from one location to another may not reveal such issues. Data is an important element, but it is only part of the story. Effective collaboration requires communicating in the language of data users. Likewise, working with AI requires learning to communicate in its language. When choosing AI tools, it is important to understand the core problem or goal.

Use of ChatGPT in the Workplace

A webinar participant believes it is crucial to do research before asking questions in a company setting. They have developed a product called Case Up, which enables users to outsource basic interview questions to ChatGPT. The participant describes this as a “good cop, bad cop” arrangement, with human analysts and modellers playing the good cop and ChatGPT the bad cop. Using AI like ChatGPT in the workplace can enhance critical thinking and engagement. However, individuals may misuse ChatGPT to appear knowledgeable in areas where they lack expertise. Ensuring that sound assumptions are used to construct arguments or models is essential.

Challenges and Misconceptions of Using AI in Writing

People rely heavily on ChatGPT without considering its limitations, assuming that its vast knowledge means it never makes mistakes. A webinar participant expresses his frustration that this assumption causes errors to pile up. He gives the example of diversity in the Norwegian language: because there is no single standard version, usage presents a challenge for data modelling and requires an understanding of the region and individual writing styles. AI cannot comprehend the person behind the text and relies solely on the data provided. Marco agrees, adding that even without AI, issues exist with experts and data quality within organisations.

The Limitations and Potential of AI in Addressing Data Quality and Legal Analysis

It may not be wise to rely solely on AI to address problems with data quality, engineering, and communication. These issues often result from human error or disorganisation. While AI can improve human capabilities, it cannot replace human involvement entirely.

In the legal field, there is often ambiguity and inconsistency in legal articles, which are written to accommodate different stakeholders. In the Netherlands, efforts are underway to connect legal articles to government data. However, this process of readjustment is expected to take several years.

The Challenges of Connecting Law and System Design

When laws change, it can be unclear where to start adjusting the multiple systems affected. The process of linking data to laws and regulations is complex and often unclear. ChatGPT cannot truly understand the text it processes or engage in thoughtful analysis or self-correction, although AI systems that combine logical reasoning with textual processing are under development. Current AI technology is limited to probabilistic predictions. Even professional writers can make mistakes in AI-generated text due to language barriers. Collaboration between writers and automation tools like CaseTalk software can simplify the process of linking language to system design and automate the creation of templates.

Implementing DataVault Builder and exploring natural language models for Data Analysis

During the discussion, Marco explained how easy it is to use DataVault Builder as a source system for input and output. The tool includes transformation and generator features that allow the creation of a database similar to a Data Vault model. However, Remco's template provides Core Business Concepts but lacks the statements that express facts.

Recently, Azure has introduced the ability to apply natural language models to structured and unstructured data sets. This application can assist in identifying statements that represent facts within organisational documents. Implementing a constraint is necessary to ensure the model is robust and capable of handling various conditions.

While data science and AI systems are helpful for data modelling, Marco is cautious about relying solely on them.

The Importance of Context in Data Analysis

To ensure accurate analysis, it's essential to understand the context of data. Technical data alone isn't enough; knowing the business context is crucial. Non-structured data also requires context to make sense of it. Failure to consider context can lead to errors and delays. Experts working with data provide the necessary context for proper analysis. It's important to consider consistency, quality, and accuracy when moving from exploration to production, because processing data without considering context can result in incorrect conclusions.

The Challenges of AI in the Business World

Marco feels that the potential benefits of AI are often exaggerated, while the amount of effort required is overlooked. There is a significant amount of data debt that has accumulated over time, with origins and meanings that remain unknown. Unfortunately, AI tends to compound this problem rather than resolve it. Over the last two decades, there has been a shift in roles, with data modelling and data warehousing responsibilities increasingly falling on software engineers. The challenge is to adopt AI responsibly. The creators of AI are unlikely to take accountability, often blaming faulty assumptions instead.

Importance of Understanding Risks and Levels of Accuracy in Data Science and AI

Ronald Domhoff, the Chief Data Officer at the Ministry of Safety and Justice in the Netherlands, developed the four quadrants approach to data science, which recognises that not all data needs to be equally rigid and accurate.

In fraud detection scenarios, data quality can vary, where precise data from banks may be merged with unclear data from other sources.

Data products are rated based on accuracy, with levels ranging from F for uncertain data to A for completely accurate data needed for reporting to the European Union. Marco offers this as an example.

However, Marco suggests that data science systems deployed should meet a minimum accuracy level of "C" to guarantee that all the necessary components are in place.
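A rating scheme like this lends itself to a simple deployment gate. The sketch below is illustrative only: the source names just the end points F and A and the minimum of "C", so the intermediate levels, the enum name AccuracyLevel, and the function may_deploy are assumptions.

```python
from enum import IntEnum


class AccuracyLevel(IntEnum):
    """Accuracy ratings, ordered so that a higher value means more accurate."""
    F = 0  # uncertain data
    E = 1  # assumed intermediate level
    D = 2  # assumed intermediate level
    C = 3  # minimum suggested for deployed data science systems
    B = 4  # assumed intermediate level
    A = 5  # completely accurate, e.g. for regulatory reporting


MINIMUM_FOR_DEPLOYMENT = AccuracyLevel.C


def may_deploy(level: AccuracyLevel) -> bool:
    """Return True when a data product meets the minimum accuracy level."""
    return level >= MINIMUM_FOR_DEPLOYMENT


for level in AccuracyLevel:
    print(level.name, may_deploy(level))
```

Using an ordered enum keeps the rule in one place, so raising the minimum later is a one-line change rather than a search through every pipeline.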

The inclusion of AI in data science demands understanding what is being produced and why, as well as accountability and the transfer of products.

The importance of reasonableness in assessing balance sheet statements and driving interaction

Howard has proposed a quality dimension called Reasonableness, which can be used to evaluate balance sheet statements. The value of a specific line item on its own does not reveal whether it is accurate, but AI can be employed to assess whether it is reasonable. This technique serves as a warning of potential issues, although further analysis by a human is still necessary. In the event of any uncertainties, analysts at the Central Bank would reach out to the reporting entity for clarification. The Reasonableness technique has proved helpful in driving necessary interaction and providing valuable insights. The discussion also covered the business case for Marco and the process for obtaining the required budget and sponsorship. It's worth noting that awareness around data is changing, and there's been a shift from software engineering to data warehousing.
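The Reasonableness check described in this section can be illustrated with a deliberately simple stand-in. The sketch below uses a plain median rule rather than AI; the function name, the 25% tolerance, and the sample figures are all hypothetical. It only flags a value for human follow-up and makes no claim about accuracy.

```python
from statistics import median


def is_reasonable(value: float, history: list[float], tolerance: float = 0.25) -> bool:
    """Return True when value lies within tolerance (a fraction) of the historical median."""
    if not history:
        raise ValueError("history must not be empty")
    typical = median(history)
    if typical == 0:
        # Avoid a relative comparison against zero.
        return value == 0
    return abs(value - typical) <= tolerance * abs(typical)


# Hypothetical previous values reported for one balance sheet line item.
history = [100.0, 104.0, 98.0, 101.0]

for reported in (110.0, 180.0):
    verdict = "reasonable" if is_reasonable(reported, history) else "ask the reporting entity"
    print(reported, verdict)
```

The output of such a check is a prompt for conversation, not a verdict: a flagged value still needs an analyst to contact the reporting entity and find out what happened.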

The Importance of Responsible AI and Data Management

Marco notes that it is now widely recognised that data can't be treated as just information - it needs to be accurate and handled responsibly. Unfortunately, the Netherlands has experienced major data mishandling scandals in recent years. The Dutch National Bank has also faced similar issues, resulting in millions of Euros in fines due to data discrepancies. To address these issues, multiple data modellers have been employed.

Government bodies are leading in addressing data debt to comply with the law and protect their careers. However, the communication gap between domain experts in business and in technology still needs to be bridged.

Municipalities in the Netherlands are seeking to redo projects to ensure accuracy in their systems, particularly those providing advice to citizens. Meanwhile, commercial organisations require motivation from their board of directors to prioritise data understanding and management. Although several commercial attempts have been made to showcase prototypes and augmented reality dashboards, the associated costs often pose challenges.

Importance of Quality Data Products in Decision Making

When striving to achieve a high-quality data product, it is essential to be thorough and present a compelling argument to decision-makers. Unfortunately, despite understanding its significance, decision-makers often prioritise speed and efficiency over quality.

To address this issue, it is crucial to have open discussions about the required level of quality and potential consequences of subpar data products. The reputation and credibility of corporations and governments are on the line, making it necessary to convey the financial benefits of investing in quality data products.

However, resistance to data governance can arise, so executives need to participate actively and hear examples of negative outcomes. Using data models and well-defined concepts can help minimise problems and differences in interpretation.

All stakeholders, including executives, IT professionals, and modellers, must prioritise the quality of data products. Drawing examples from the finance sector can effectively illustrate the importance of quality in decision-making.

Challenges of Data Modelling in the AI Era

The speaker emphasises the importance of providing upfront information to avoid misunderstandings. Data is omnipresent in business and has become a significant part of IT. However, there is a lack of proper communication and of an organisation-wide approach towards data. The speaker points out the need for a solid programme to deal with data.

Although large language models can generate data models, the accuracy of these models is uncertain. Marco highlights the issue of verification and the lack of clarity on who can verify the accuracy of these models. The success of AI algorithms depends on human-machine interaction and verification. The discussion stresses the critical role of verification in AI.

Importance of Automation and Standards in Man-Machine Interaction

In the production and deployment of code, it's crucial to have effective interaction between humans and machines. One way to achieve this is by defining and enforcing standards, such as data governance, to guide human developers. To ensure the quality of the code, verification and validation processes like peer reviews and multiple review cycles are necessary. While automation can help enforce standards and controls, complete automation cannot replace the need for human involvement. Factory-oriented modelling can demonstrate the significance of automation and standards. However, challenges arise in verifying and enforcing even the most straightforward rules using AI.

Automating AI and the Importance of Verification

Using a strict algorithm rather than a statistical one is important to automate AI effectively. The algorithm should be able to compare new data with previously collected data, which raises concerns about the accuracy and potential biases in the training data. In the upcoming years, universities will prioritise establishing ethical and responsible standards for AI development. The question of what data is being fed into AI is intriguing and not frequently discussed. Verification is essential in guaranteeing the accuracy of AI. Although AI can assist in resolving technical problems, it does not provide direct answers. Therefore, clear expectations are vital when utilising AI.

If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.

Additionally, if you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com).

Don’t forget to join our exciting LinkedIn and Meetup data communities not to miss out!
