How to Evaluate a Data Vault Warehouse Automation Tool & Why Willibald Will Help

Executive Summary

This write-up covers how to evaluate data warehouse automation tools using a realistic web-shop example. Topics include evaluating data tools in a web-shop setting; the importance of delivery dates and association partners in online sales; the gardening-association and POS data models; critical considerations when testing data warehouse automation tools; challenges in modelling and handling relationships in Data Vault; perspectives on Data Vault modelling and automation; automation tools and data integration; common data management challenges and their solutions; the importance of a data warehouse; feedback on the “Insert Statement” presentation; insights on implementing data modelling and technical patterns; implementing automation tools and data integration in software development; and the benefits of data modelling and of selecting the right tools for creating a data warehouse.

Please don’t forget to join the Meetup group to avoid missing webinars.

 

Webinar Details

Title: How to Evaluate a Data Vault Warehouse Automation Tool & Why Willibald Will Help

Date: 07-July-23

Presenter: Michael Müller

Meetup Group: INs & OUTs of Data Modelling

Write-up Author: Howard Diesel

Contents

Executive Summary

Webinar Details

Real-life examples and evaluation of data tools in a web shop setting

Importance of Delivery Date and Association Partners in Online Sales - Notes

Association of Gardening Association and POS Data Model

Critical Considerations for Data Warehouse Automation Tool Testing

Challenges in Modelling and Handling Relationships in Data Vault

Perspectives on Data Vault Modelling and Automation

Discussion on Automation Tools and Data Integration

Overview of Data Management Challenges and Solutions

Importance of a Data Warehouse and Related Topics

“Insert Statement” and Presentation Feedback

Insights on implementing data modelling and technical patterns

Implementation of automation tool and data integration in software development

Benefits to Data Modelling and Selection of Tools for Creating a Data Warehouse

Real-life examples and evaluation of data tools in a web shop setting

In this session, Michael Müller shares his expertise in evaluating data warehouse automation tools and his involvement with the German Data Vault group. To simplify the selection process, he conducted a test based on data modelling.

Michael emphasises the importance of drawing individual conclusions when evaluating data tools, as each company has unique needs and situations. Different expectations from automation tools make it necessary to evaluate them based on real-life examples. Regardless of the techniques used, the primary goal is consistency and capability.

To help users assess whether the tools can handle their specific situations, an invented web shop called “Willibald” was used as an example, incorporating different data challenges. The evaluation involved 70 participants and four tools (DataVaultBuilder, Vaultspeed, DBT, and DBT with Data Vault), and the results and descriptions can be downloaded from the web portal.

The data of the example web shop is accessible in a GitHub repository under a Creative Commons licence, allowing users to use and modify it with proper credit. The web shop example features a traditional family-run company, with a senior and a junior boss both contributing to the data requirements.

Figure 1 Real-life examples and evaluation of data tools in a web shop setting

Importance of Delivery Date and Association Partners in Online Sales - Notes

The senior boss of the shop tends to voice peculiar opinions and make unexpected moves. The company's primary sales channel is its online shop. Timely delivery is crucial for customers who need plants exactly at planting time: if the plants arrive too early, customers have to care for them until planting time; if they arrive too late, customers are left with gaps in their gardens. Shipping partners are therefore crucial, particularly for special transport items.

The data model follows a standardised sales model, with the customer, product category, and order position as critical components. However, an error in the model involves associating the customer and finance partner with an association partner, which refers to NGOs where gardening enthusiasts congregate to receive discounts.

The association head is saved as a contact person in the data model. Still, the mistake lies in marking the contact person as a separate entity without considering other customers in the association. Association partners are significant in marketing, and the company intends to visit them to increase sales by hosting a small party while giving 2% of the day's overall revenue.


Figure 2 Output data

Association of Gardening Association and POS Data Model

The partnership between the Gardening Association and Willibald is mutually beneficial, allowing Willibald to avoid wasting seasonal products.

The POS data offers a comprehensive data model that combines order and position details in a single file. Orders can be linked to an association that differs from those recorded in the customer's data. Each position in the data includes the order ID, customer ID, and association partner ID.

Testing the data model focuses on how effectively it solves problems rather than comparing patterns. The objective is to ensure that the Data Mart is correctly integrated with the data for the automation tool to be effective.
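The combined order/position file described above can be normalised into an orders table and a positions table before loading. The sketch below uses illustrative column names, not the actual Willibald schema:

```python
import csv
import io

# A combined POS extract: every row repeats the order header
# alongside one position (column names are illustrative).
POS_EXTRACT = """order_id,customer_id,association_partner_id,position_no,product,quantity
1001,C01,A7,1,rose,3
1001,C01,A7,2,tulip bulbs,20
1002,C02,,1,apple tree,1
"""

def split_orders_and_positions(raw):
    """Split the single file into an orders table and a positions table."""
    orders, positions = {}, []
    for row in csv.DictReader(io.StringIO(raw)):
        # Order-level columns go into the orders table once per order_id.
        orders[row["order_id"]] = {
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "association_partner_id": row["association_partner_id"] or None,
        }
        # Position-level columns keep their foreign key to the order.
        positions.append({
            "order_id": row["order_id"],
            "position_no": int(row["position_no"]),
            "product": row["product"],
            "quantity": int(row["quantity"]),
        })
    return list(orders.values()), positions

orders, positions = split_orders_and_positions(POS_EXTRACT)
```

Note that the association partner ID carried on each position may differ from the one recorded in the customer's data, which is precisely what the test cases probe.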


Figure 3 Output Data 2

Critical Considerations for Data Warehouse Automation Tool Testing

Our data warehouse automation tool now incorporates the Data Mart, which poses a challenge due to the large volume of products and revenue involved. To address this challenge, the tool is designed to provide information on the number of open orders, completed orders, and any deviations between requested and actual delivery dates.

Partnerships can be established in two ways - directly through the order or through the customer, adding a role-playing dimension to the tool.

To ensure the accuracy of the data, testing criteria include verifying the raw data from the source table and establishing hierarchical links.
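One way to verify that raw source data survived the load unchanged is an order-independent fingerprint computed on both sides. A minimal sketch, with hypothetical column names:

```python
import hashlib

def table_fingerprint(rows, columns):
    """Order-independent fingerprint: (row count, XOR of per-row hashes).
    Equal fingerprints strongly suggest the same multiset of rows."""
    digest = 0
    for row in rows:
        payload = "|".join(str(row[c]) for c in columns)
        digest ^= int(hashlib.sha256(payload.encode()).hexdigest(), 16)
    return len(rows), digest

# The same rows in a different physical order must yield the same fingerprint.
source = [{"order_id": 1001, "qty": 3}, {"order_id": 1002, "qty": 1}]
target = [{"order_id": 1002, "qty": 1}, {"order_id": 1001, "qty": 3}]
match = table_fingerprint(source, ["order_id", "qty"]) == \
        table_fingerprint(target, ["order_id", "qty"])
```

The row count guards against duplicates, since XOR alone would cancel out a row that appears an even number of times.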

With three consecutive deliveries on three Fridays, starting on 21 March, the categories of data will also change, requiring meticulous testing.

In scenarios where weak keys represent gaps in validity periods and carry no business meaning, a test case for a multi-active satellite is particularly important.
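A multi-active satellite keeps several rows active for the same hub key and load date, distinguished by a subsequence number; hashing the whole sorted set makes any change within the group detectable. A hypothetical sketch (the names are illustrative, not any specific tool's convention):

```python
import hashlib

def load_multi_active(hub_key, records, load_date):
    """Build multi-active satellite rows: one row per record, all sharing
    the hub key and load date, distinguished by a subsequence number.
    A hashdiff over the whole sorted set detects any change in the group."""
    ordered = sorted(records)
    set_hash = hashlib.md5("|".join(ordered).encode()).hexdigest()
    return [
        {"hub_key": hub_key, "load_date": load_date,
         "subseq": i + 1, "value": value, "set_hashdiff": set_hash}
        for i, value in enumerate(ordered)
    ]

# Three delivery dates active at once for a single delivery key.
rows = load_multi_active("DELIV-42",
                         ["2023-03-10", "2023-03-17", "2023-03-24"],
                         "2023-03-01")
```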

Figure 4 Critical Considerations for Data Warehouse Automation Tool Testing

Challenges in Modelling and Handling Relationships in Data Vault

It is crucial to store business objects, particularly when facing technicalities that need to be covered by the business key or have multiple representations over time or in different languages.

Evaluators must observe how relationship handling is conducted. This includes test cases for identifying relationships between orders and positions.

It is important to handle scenarios of changing driving keys, especially when the customer on an order changes over time. Correctly managing relationship changes is essential to prevent issues in the association-partner dimension and avoid creating duplicate keys.
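The driving-key pattern can be sketched as follows: when the driving key acquires a new counterpart, the previously active link row is end-dated so that no two relationships are active for the same driving key at once. All column names here are illustrative:

```python
def apply_driving_key_change(link_rows, driving_key, new_partner, load_date):
    """End-date any active link row for the driving key whose counterpart
    differs, then insert the new active row. This prevents two active
    relationships for the same driving key ("double keys")."""
    for row in link_rows:
        if row["order_id"] == driving_key and row["end_date"] is None:
            if row["partner_id"] == new_partner:
                return link_rows  # relationship unchanged, nothing to do
            row["end_date"] = load_date  # close the superseded relationship
    link_rows.append({"order_id": driving_key, "partner_id": new_partner,
                      "start_date": load_date, "end_date": None})
    return link_rows

# Order 1001 moves from association partner A7 to A9.
links = [{"order_id": 1001, "partner_id": "A7",
          "start_date": "2023-01-01", "end_date": None}]
links = apply_driving_key_change(links, 1001, "A9", "2023-02-01")
```

Re-applying the same change is a no-op, which is the idempotency an automation tool's loading pattern should guarantee.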

Various methods of modelling and managing many-to-many relationships without their own keys should be explored. Future needs for the key in handling deliveries should be taken into consideration.

It is worth considering if different modelling approaches are recommended for different frameworks or if the focus is solely on testing automation.

Figure 5 Test cases in the data model, ID relationship and driving keys

Perspectives on Data Vault Modelling and Automation

Michael is concerned about whether the automation can meet their requirements and accurately manage data. Assessing one's situation and selecting the appropriate tool to support their needs effectively is recommended.

Howard is curious whether various data vault modelling methods offer distinct ways to address challenges. Remco agrees with Michael and clarifies that there are diverse viewpoints on data vault modelling. It's essential for tools to provide solutions that cater to each user's specific requirements.

Remco has created an Ensemble Logical Modelling (ELM) template to handle data modelling from his perspective. Howard asks whether different approaches propose different models to solve issues.

Michael responds that while most people have their preferred approach, they may suggest alternative models if necessary.


Figure 6 Test cases in the data model, m:n table without own key

Discussion on Automation Tools and Data Integration

During a recent conversation with a tool vendor, Michael explored a different approach to automation. Although the vendor's preferred method differed from Michael's suggestion, they confirmed they could handle it.

Michael raised concerns about whether a later process could link to the approach they discussed, stating that having a hub is crucial for successful integration. The conversation focused on automation tools and their solutions rather than the correct approach to data tables. Michael recommended using ELM templates to understand the context of integration better. He also emphasised the importance of early integration, highlighting the need to separate tables containing multiple business objects. For example, he suggested splitting the order table into two separate tables in the stage for better integration.

He stressed the significance of integrating data based on business keys in Data Vault. To test early integration, he highlighted the ability to split a file into different tables without altering the data.
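Splitting a staged table without altering the data can be tested mechanically: rejoining the parts on the shared business key must reproduce every original row. A minimal sketch with hypothetical column names:

```python
def split_is_lossless(original_rows, part_a_cols, part_b_cols, key):
    """Check that splitting a staged table into two business-object tables
    loses no data: rejoining the parts on the shared key must reproduce
    every original row exactly (assumes one row per key)."""
    part_a = {r[key]: {c: r[c] for c in part_a_cols} for r in original_rows}
    part_b = {r[key]: {c: r[c] for c in part_b_cols} for r in original_rows}
    rejoined = [{**part_a[k], **part_b[k]} for k in part_a]
    return sorted(rejoined, key=lambda r: r[key]) == \
           sorted(original_rows, key=lambda r: r[key])

# An order table split into order data and delivery-address data.
stage = [{"order_id": 1, "customer": "C01", "delivery_addr": "Elm St"},
         {"order_id": 2, "customer": "C02", "delivery_addr": "Oak Ave"}]
ok = split_is_lossless(stage,
                       ["order_id", "customer"],
                       ["order_id", "delivery_addr"],
                       "order_id")
```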

Figure 7 Test cases in the data model, Integration of the order

Overview of Data Management Challenges and Solutions

When it comes to Willibald, it's crucial to have a reference table and a historical reference table that can be loaded separately for convenience. Duplicate data in test cases must be appropriately handled without causing issues during loading. To prevent interruptions during the loading process in the middle of the night, error messages should be implemented.
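Handling duplicates without interrupting the nightly load can be sketched as routing them to an error table instead of aborting; the business key and columns below are illustrative:

```python
def dedupe_for_load(rows, business_key):
    """Keep the first row per business key; route duplicates to an error
    table instead of aborting the nightly load."""
    seen, clean, errors = set(), [], []
    for row in rows:
        k = row[business_key]
        if k in seen:
            errors.append({**row, "error": f"duplicate business key {k!r}"})
        else:
            seen.add(k)
            clean.append(row)
    return clean, errors

batch = [{"customer_id": "C01", "name": "Alt"},
         {"customer_id": "C02", "name": "Neu"},
         {"customer_id": "C01", "name": "Alt (dup)"}]
clean, errors = dedupe_for_load(batch, "customer_id")
```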

Accuracy tests should be conducted on rules that don't involve changes to the customer's business key data. Customer data deletion must be handled carefully, mainly if it affects KPI calculations. Deletion of orders and delivery addresses without customer data must be addressed.

Identifying and resolving both intentional and unintentional data quality issues is crucial. All dimensions should be consolidated, and results presented in a table format.

Business rules should be standardised, including association partner linkage and customer acquisition through credit card numbers.

Incorporating lineage is necessary to keep track of data flow from stage to Data Mart.

Overarching functions should be implemented, such as data lineage orchestration and process step scheduling.

Error handling is crucial throughout the entire data management process.

Figure 8 Test cases in the data

Importance of a Data Warehouse and Related Topics

Using a data warehouse enables efficient error handling and message sending for resolving issues. Without error handling, the responsible department would have to oversee the process manually and make phone calls in case of errors. The Data Mart feature of the warehouse can readily load and display errors, promoting transparency and aiding error correction. The combination of error handling and the data warehouse creates a smooth, seamless process.

The deployment and integration of the warehouse are of particular interest, entailing the choice of database, deployment option (on-premises or as a software-as-a-service solution), and entity translation. The conversation includes a repository link that provides comprehensive details in both German and English. Test cases, web-shop inserts, and CSV files are available for loading into a database, along with supplementary descriptions.

The web shop and roadshow cases involve addressing issues such as duplicate and missing business keys and potential problems with SQL queries.

Figure 9 Overarching Functions

“Insert Statement” and Presentation Feedback

The contestants needed to demonstrate the ability to integrate supplementary information via inserts. Various tools were used during the meeting for problem-solving and for examining many-to-many tables. Concise PDFs were provided to clarify how the "inserts" were resolved, and within the next fortnight the presentations will be supplemented with videos. The vendors' solutions are exhibited on a separate website. The Data Vault User Group website offers English explanations of the test cases, including a comprehensive overview of the tests and their respective details.

Figure 10 Translation of Entities

Insights on implementing data modelling and technical patterns

During a conversation, Corné shares his interest in the outcomes of a project and highlights the necessity of focusing more on implementing data modelling in the industry.

Howard asks about Corné's experience implementing these concepts in practical scenarios. Corné acknowledges that there are disputes and discussions surrounding certain aspects of data modelling, particularly in the modelling effort of structuring a data vault. In response, Michael stresses the significance of comprehending technical patterns and their application in resolving modelling issues. He also advocates for a diverse approach that allows flexibility and adjustment if necessary.


Figure 11 Insights on implementing data modelling and technical patterns

Implementation of automation tool and data integration in software development

During a heated discussion, Gauchet and Michael debated implementing a specific tool in a SaaS environment; the automation tool proved a lifesaver for all parties involved. Gauchet stressed the importance of integration for larger systems, such as an ERP or an Azure warehouse system, while Michael highlighted the importance of data modelling and of linking source data with business objects. The automation tool conveniently generates the DDL and is compatible with various systems, though system support may vary. Remco noted that most tools now utilise information from the business glossary and the model in the automation process: previously, tools required manual data vault creation, but integration is now a must-have. This is good news for practitioners like Michael.

Benefits to Data Modelling and Selection of Tools for Creating a Data Warehouse

According to Michael Müller, it is essential to assess the needs and capabilities of each department before choosing tools for the data warehouse.

Conducting pre-evaluations of tools can significantly reduce the time taken and simplify the selection process.

With efficient decision-making, the tool selection process can be shortened from six months to two or three months.

If you want to receive the recording, kindly contact Debbie (social@modelwaresystems.com)

Don’t forget to join our LinkedIn and Meetup data communities so you don’t miss out!
