Data Engine Thinking with Dirk Lerner
Executive Summary
This webinar offers a comprehensive exploration of modern data management across several critical themes. Dirk Lerner highlights the innovative approaches in his book, ‘Data Engine Thinking’, which emphasises the necessity for flexible and automated data solutions in rapidly changing business environments. The webinar covers strategic visions for data management, the alignment of the data landscape and governance with business operations, and the importance of data immutability and data logistics processes.
The webinar shares key insights into the roles of data integration, data modelling, and data governance in shaping effective data architectures and warehousing solutions. Additionally, Dirk addresses technology impartiality in data solutions and considers the impact of artificial intelligence on the future of data warehousing, ultimately offering a roadmap for businesses navigating the complexities of data management in the digital age.
Webinar Details
Title: Data Engine Thinking with Dirk Lerner
URL: https://youtu.be/SicQ4BqmAyM
Date: 02/07/2025
Presenter: Dirk Lerner
Meetup Group: DAMA SA User Group Meeting
Write-up Author: Howard Diesel
Contents
Data Engine Thinking with Dirk Lerner
The Journey to a Flexible and Automated Data Solution: The Narrative of FastChangeCo
Rethinking Data Management: A Conversation on the Challenges and Solutions
Having a Strategic Vision for Data Management
Data Management and Strategy Alignment in Business
Data Landscape and Strategy in Business Technology
Data Immutability and Data Logistics Processes in Data Solutions
Data Integration and Governance in Business Operations
Data Modelling and Data Governance in Data Quadrants
Managing and Interpreting Data in Timelines
Data Solution Architectures and the Role of Information Models
Data Warehousing and ‘Data Engine Thinking’
Data Solutions and Technology Impartiality
The Role and Future of AI in Data Warehousing
Data Engine Thinking with Dirk Lerner
Howard Diesel opens the webinar and introduces Dirk Lerner. Dirk expresses gratitude to the attendees and shares the challenges of authoring, including the risk of spotting oversights in the work right before publication. He also notes the comprehensive nature of his book, which runs to nearly 700 pages and features 350 graphics.
Figure 1 DAMA South Africa Webinar: 'Data Engine Thinking'
The Journey to a Flexible and Automated Data Solution: The Narrative of FastChangeCo.
In his recently published book, ‘Data Engine Thinking’, Dirk discusses the fictitious company FastChangeCo, founded in 1888. The company serves as a case study in his presentations and training sessions.
FastChangeCo, a global enterprise with numerous departments and thousands of employees, was an early adopter of data warehousing, launching its first data warehouse in the 1990s. However, its third data warehouse is now nearing the end of its lifecycle, presenting challenges in extensibility and adaptability.
The company faces the critical task of developing a flexible, automated data solution to address current data management issues. Dirk welcomes the audience to join him on this journey toward enhancing data solutions and invites them to discuss his background further at the end of the session.
Figure 2 "FastChangeCo"
Rethinking Data Management: A Conversation on the Challenges and Solutions
Michael Müller joins FastChangeCo and identifies significant issues arising from the emergence of silos within the organisation, driven primarily by the business's high expectations for rapid data results and the limitations of the existing data warehouse technology. He realises that the data is underutilised and lacks meaningful insights, leading to inconsistencies in reports and data interpretations.
To address these challenges and maintain competitiveness, Michael proposes a comprehensive data initiative aimed at fundamentally rethinking the company’s approach to data warehousing. He assembles a team to develop a final, effective data solution, prioritising vision and strategy over mere technology improvement.
Figure 3 Michael Müller
Figure 4 Michael and the Team
Having a Strategic Vision for Data Management
The team recognised the importance of operating as a centralised data management centre to effectively bundle, coordinate, and implement use cases emerging within FastChangeCo. Their primary objective is to identify and leverage synergies while breaking down the silos that arose from individual teams working in isolation due to a lack of speed and coordination.
They emphasised the need to design more business-driven processes, focusing on capturing business requirements rather than prioritising technology or source systems at the outset. This approach aims to ensure that solutions address actual business needs, allowing the FastChangeCo team to resolve issues as they arise and to involve business stakeholders actively from the beginning, an improvement on previous practices.
The project vision emphasises the importance of developing a robust strategy centred around leveraging data models derived from business models, rather than starting with physical data models from source systems. Additionally, this approach aims to enhance knowledge transfer, understanding of the information landscape, and ultimately improve data quality within the organisation.
The strategy advocates for embracing agile methodologies, focusing on adaptability to change and navigating uncertainties in the data world, rather than strictly adhering to traditional agile practices. Furthermore, Dirk shares that a key lesson learned from previous data warehouse implementations is the need for inherently scalable solutions that can accommodate varying sizes and complexities, enabling reuse across different applications. This is referred to as the "Resurrection of Data with Meaning," highlighting its goal of revitalising data's role in driving business value.
Data Management and Strategy Alignment in Business
Dirk highlights the challenge of departmental silos in strategy development, particularly in data management, where individual departments may create divergent strategies, such as cybersecurity or AI, that do not align with a unified business goal. He emphasises the importance of aligning the Data Management Centre of Excellence's strategy with the overall company strategy, ensuring that all solutions developed are scalable and consistent with the overarching vision. Additionally, the necessity of revisiting this alignment during the development process is underscored, alongside a remark on the role of a well-structured information model and knowledge graph in testing this alignment effectively.
Data Landscape and Strategy in Business Technology
After establishing a vision and strategy, the team at FastChangeCo recognised the need to learn from past mistakes, particularly their struggles with three unscalable data warehouses. They realised that previous projects had overly emphasised technology at the expense of business needs. To align better with their business strategy, they focused on designing a new data landscape, represented through colour-coded domains that illustrate their high-level goals rather than a final solution.
Throughout this process, they consistently checked their design decisions against their vision and strategy to ensure they remained on track and avoided reverting to old habits. As the data was initially unsorted, they planned to categorise it into different domains, adding any missing elements before ultimately delivering it to various presentation areas.
Dirk then discusses the need for innovation and research within existing data frameworks, particularly those housed in data warehouses. There is a recognised need to create areas for research and to integrate new data, enabling the development of prototypes and innovative solutions. The focus is on establishing a solution architecture that remains technology-independent while leveraging insights from data quadrants. Furthermore, this involves designing different architectural solutions tailored to various needs, such as real-time data integration, culminating in a reference architecture that aligns with overarching strategies and visions, ultimately allowing for adaptable information models across diverse technologies.
Michael emphasised the need for a versatile data solution that can adapt to future technologies, similar to how cloud systems emerged a decade ago. Additionally, Dirk shares that Roelant Vos, a contributor to the book, is working to unify the concepts of the data landscape, data quadrant model, and solution architecture.
Continuing, the team debated whether to maintain existing terminology, such as "haps," or to adopt more conventional terms, like "layers" and "areas." They decided to retain multiple perspectives within their architecture, recognising that different terminology helps convey ideas to various audiences, such as management and other departments. Dirk notes that the visual representation aims to clarify these concepts collectively, encapsulating the key themes discussed in the book.
Figure 5 The Data Landscape
Figure 6 Creating Solution Architecture
Data Immutability and Data Logistics Processes in Data Solutions
In the data solution architecture, it is crucial to maintain data immutability, starting from the staging area known as the data log hub, where data is stored in an unordered format similar to a database log. This approach ensures that all types of data are welcomed without bias, promoting error-free design. Additionally, as data moves to the integration layer, key principles are established for data logistics processes, referred to as modules, which must be atomic, target-centric, and independent.
This independence is crucial in preventing previous issues where repeating load processes led to inconsistent results. These processes must be auditable, scalable, failure-tolerant, and inherently bi-temporal, maintaining both business and technical timelines for every data load. Overall, this framework emphasises the importance of handling complex data solutions effectively.
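To make the module principles above concrete, here is a minimal, hypothetical Python sketch; it is not taken from the book or its Git repository, and names such as CustomerPriceRecord and load_customer_prices are invented. It illustrates a single target-centric, insert-only module whose records carry both timelines, and which can be re-run without producing inconsistent results.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# One hypothetical data logistics "module": atomic (one target, one unit
# of work), independent (no ordering dependency on other modules), and
# bi-temporal (every record carries a business and a technical timeline).

@dataclass(frozen=True)  # frozen: loaded records are immutable
class CustomerPriceRecord:
    customer_id: str
    price: float
    business_effective_at: datetime  # business timeline: when it was true in reality
    loaded_at: datetime              # technical timeline: when we recorded it

def load_customer_prices(source_rows: list, target: dict) -> None:
    """Target-centric load from staging rows into one integration target.

    Deterministic and repeatable: the natural key plus the business
    timestamp identifies each record, so re-running the load cannot
    create duplicates or inconsistent results.
    """
    loaded_at = datetime.now(timezone.utc)
    for row in source_rows:
        record = CustomerPriceRecord(
            customer_id=row["customer_id"],
            price=row["price"],
            business_effective_at=row["effective_at"],
            loaded_at=loaded_at,
        )
        key = (record.customer_id, record.business_effective_at)
        # Insert-only: existing history is never updated or deleted.
        target.setdefault(key, record)
```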
Data Integration and Governance in Business Operations
Dirk discusses the importance of managing data complexity through a structured approach involving technical and standardised business timelines, which facilitates automation in data solutions. He emphasises the significance of a well-defined presentation layer for delivering integrated and derived data to users while allowing for additional data in specific cases.
‘Data Engine Thinking,’ according to Dirk, aims to address these complexities early in the process, supporting their methodology with resources from a dedicated Git repository that provides templates and SQL examples. Dirk then touches upon the concept of sense-making in achieving common goals, highlighting the foundational role of a data quadrant model in organising incoming raw data under strong governance.
The data management approach can be divided into four quadrants, each with varying levels of governance and development styles. In Data Quadrant 1, strict processes govern data integration and modelling, resulting in well-defined business rules and multiple versions of truth that may be department-dependent. Data Quadrant 2 features a more relaxed governance framework while maintaining a strong development style, allowing for flexibility in handling data.
In Quadrant 4, referred to as the research area or sandboxes, development becomes loose, with users able to utilise Python and access data from Quadrant 1 or their own data without predefined models, fostering innovation without governance constraints. Additionally, the staging area permits the incorporation of unsorted data, such as Excel files, with no development requirements, highlighting the technology-independent and informal nature of this phase.
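As a discussion aid, the quadrant descriptions above can be condensed into a simple lookup structure. The sketch below is only a hypothetical summary of what was described in the session (Quadrant 3 was not detailed), not a definition taken from the book.

```python
# Hypothetical summary of the data quadrant model as described above;
# Quadrant 3 was not covered in the session, so it is omitted here.
DATA_QUADRANTS = {
    "Q1": {"governance": "strict", "development": "strict",
           "notes": "Governed integration and modelling; well-defined business rules."},
    "Q2": {"governance": "relaxed", "development": "strong",
           "notes": "Flexibility in handling data under a lighter framework."},
    "Q4": {"governance": "none", "development": "loose",
           "notes": "Research area / sandboxes; Python against Q1 data or own data."},
    "staging": {"governance": "informal", "development": "none",
                "notes": "Unsorted data such as Excel files; technology-independent."},
}

def governance_for(quadrant: str) -> str:
    """Look up the governance level, e.g. governance_for('Q1') -> 'strict'."""
    return DATA_QUADRANTS[quadrant]["governance"]
```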
Data Modelling and Data Governance in Data Quadrants
The data quadrant model facilitates communication about governance levels and the rationale behind varying degrees of data modelling across different quadrants. For instance, strong data modelling is applied in Quadrant 1, while Quadrant 4 sees minimal modelling.
This framework helps explain the necessity for distinct technologies and skill sets in each quadrant, addressing the mismatch between roles, such as placing a creative researcher in a governance-heavy environment or a data engineer in an innovative setting without guidance. Additionally, the concept of "sense-making" is crucial, ensuring that team members grasp the importance of data modelling and apply consistent business terminology as outlined in the information model, thereby reinforcing a structured approach to data management.
Dirk emphasises the importance of precise information modelling in data management, particularly through fact-oriented modelling. While striving for clarity, the introduction of business rules can create inconsistencies in definitions across departments. For example, terms like "customer" may have varied meanings internally versus externally, leading to different interpretations.
Data Governance within a Data Management Centre of Excellence is crucial for maintaining consistent terminology, but flexibility may be necessary for specific use cases. Additionally, it's acknowledged that multiple timelines, both technical and business-oriented, are essential for comprehensive data representation, including contract start dates and changes. This complexity indicates a need for further discussion on establishing and managing these timelines effectively.
Managing and Interpreting Data in Timelines
The approach to managing two timelines focuses on distinguishing between event dates and true time stamps. Event dates are fixed moments, akin to receiving a time-stamped document, whereas a business timeline reflects changes in reality, such as fluctuating prices for products like blue jeans. When a price change occurs, it alters the business timeline, necessitating updates based on immutable data.
To maintain clarity and efficiency in data presentation, a state timeline is incorporated alongside standard columns, allowing for seamless automation and historical accuracy. Adjustments to these timelines can be made easily by altering the mappings between the immutable data set and the data presentation layer, ensuring that different use cases are catered to without compromising the integrity of the technical history.
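Dirk's blue-jeans example can be sketched in a few lines. The code below is a hypothetical illustration (the rows and column names are invented): a late price correction is recorded insert-only on the technical timeline, and a state timeline for the presentation layer is derived from the latest knowledge without touching the immutable history.

```python
from datetime import date

# Hypothetical bi-temporal rows for a pair of blue jeans.
# business_from: when the price was true in reality (business timeline)
# recorded_at:   when we learned about it (technical timeline, immutable)
rows = [
    {"sku": "JEANS-01", "price": 79.90, "business_from": date(2025, 1, 1), "recorded_at": date(2025, 1, 1)},
    {"sku": "JEANS-01", "price": 89.90, "business_from": date(2025, 3, 1), "recorded_at": date(2025, 3, 1)},
    # Late correction: in April we learn the March price was wrong.
    # Earlier rows are never updated; we only insert.
    {"sku": "JEANS-01", "price": 84.90, "business_from": date(2025, 3, 1), "recorded_at": date(2025, 4, 10)},
]

def state_timeline(rows):
    """Derive a presentation-layer state timeline (valid_from/valid_to)
    from the latest technical knowledge for each business date."""
    latest = {}
    for r in sorted(rows, key=lambda r: r["recorded_at"]):
        latest[(r["sku"], r["business_from"])] = r  # later knowledge wins
    states = sorted(latest.values(), key=lambda r: r["business_from"])
    return [
        {"sku": r["sku"], "price": r["price"], "valid_from": r["business_from"],
         "valid_to": states[i + 1]["business_from"] if i + 1 < len(states) else None}
        for i, r in enumerate(states)
    ]

for state in state_timeline(rows):
    print(state)
# March shows as 84.90 (latest knowledge), while the superseded 89.90
# record remains untouched in the immutable technical history.
```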
Data Solution Architectures and the Role of Information Models
The team has completed various data solution architectures and selected a reference architecture for their Minimum Viable Product (MVP) to validate their concepts and assumptions. They emphasised the importance of starting with an information model, specifically an abstracted FCO-IM data model, rather than jumping straight to technology implementation. Additionally, the team has chosen the Fast Change Core as their state-of-the-art information model. With advancements in automation, they are now deriving metadata from this information model to streamline the creation of data models, loading patterns, and modules. Their current task focuses on developing the information model.
In the development of their Minimum Viable Product (MVP), the team, led by business expert Philomena, mapped source systems to their information model, recognising that they could extract essential metadata and create necessary components using simple scripts. They acknowledged the value of design metadata derived from data models, which enables fast adaptation to changing technologies without losing intellectual property.
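The "simple scripts" idea can be illustrated with a short, hypothetical sketch; the metadata structure and template below are invented, not taken from the book. Design metadata derived from the information model drives the generation of target DDL, so a change of technology means swapping templates rather than rewriting the solution.

```python
# Hypothetical design metadata, as might be derived from the information
# model (entity and column names invented for illustration).
entity = {
    "name": "customer",
    "columns": [("customer_id", "VARCHAR(20)"), ("customer_name", "VARCHAR(100)")],
    "timelines": [("business_effective_at", "TIMESTAMP"), ("loaded_at", "TIMESTAMP")],
}

DDL_TEMPLATE = "CREATE TABLE {name} (\n{columns}\n);"

def render_ddl(entity: dict) -> str:
    """Generate target DDL from design metadata; a different platform would
    reuse the same metadata with a platform-specific template."""
    cols = entity["columns"] + entity["timelines"]
    body = ",\n".join(f"    {name} {dtype}" for name, dtype in cols)
    return DDL_TEMPLATE.format(name=entity["name"], columns=body)

print(render_ddl(entity))
```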
Through testing and reloading data, they discovered that their MVP laid the foundation for a future-proof data solution, which they believe will effectively automate and manage data challenges. Ultimately, they see this MVP as the culmination of their vision, positioning the "resurrection" as the last data solution they will ever need to build.
Figure 7 Return to the Information Model
Figure 8 "Resurrection is Now the Foundation"
Data Warehousing and ‘Data Engine Thinking’
Dirk shares that he and Roelant have combined their 50 years of experience in the data business to create a comprehensive book that explores effective data solutions and engineering practices. Drawing on their extensive backgrounds, starting with Dirk's beginnings at SUSE Linux in 2000 and their in-depth exploration of DB2 and data warehousing, they share insights on what works and what doesn't in building flexible and scalable data solutions. Additionally, ‘Data Engine Thinking’ spans 700 pages, making it a substantial resource for those in the field.
The ebook and softcover versions of the publication are currently available exclusively on Amazon. Additionally, customers can reserve a limited-edition hardcover, with only 404 copies produced, by visiting the authors' website at dataenginethinking.com under the "Orders" section. This special edition includes two notebooks, a storage bag, and other items.
A question arose about the relationship between the data landscape and traditional data models, such as subject area and enterprise data models. Dirk clarified that the data landscape is not intended to be a data model but rather serves to illustrate the process of how data is integrated, managed, and utilised within a new data solution, as this understanding has historically been unclear in the context of data warehousing.
The discussion emphasises the importance of effectively conveying the technology-independent nature of the data landscape to stakeholders, including sponsors and controlling departments, to enhance understanding of its necessity. Dirk and the Attendees explored the concept of integrating innovation and research components, often overlooked in the past, by allowing for a sandbox-like environment.
The conversation then shifted to the relevance of the buzzword "data lakehouse", highlighting that technology should be considered later in the process. Dirk shared his belief that it is crucial to avoid biases tied to specific technologies and instead adopt a flexible approach that incorporates various databases, such as relational, key-value, document, and temporal databases, based on the specific needs of data integration and governance.
Figure 9 'Data Engine Thinking'
Figure 10 Getting in Contact
Data Solutions and Technology Impartiality
Dirk emphasises the importance of a business-driven approach to implementing data solutions, encouraging organisations to define their goals and vision before selecting appropriate technologies. He highlights that rather than relying on a single database, businesses should utilise a mix of technologies tailored to their specific needs, including options like Microsoft, Databricks, and Snowflake, depending on their requirements. Lastly, the core philosophy remains consistent across different platforms, with variations in loading patterns, data definition language (DDL), and file structures, depending on the chosen technology. The conversation also suggests revisiting these concepts from different perspectives in future discussions.
The Role and Future of AI in Data Warehousing
In discussions with customers about the necessity of a data warehouse in the age of AI, Dirk emphasises the importance of clarifying the distinct roles each plays in data management and decision-making. ‘Data Engine Thinking’ analyses various factors, such as memory availability, technology used, and the frequency of data requests, to determine the optimal physical implementation, including the use of third normal form versus flat tables.
Environmental metadata supports this analysis, and AI significantly contributes to enhancing these decisions. When customers question the need for a data warehouse, it's essential to shift the focus to their business requirements and objectives: for instance, if they only need to generate a PDF report, a direct query from the operational system may suffice, making a data warehouse unnecessary.
The necessity of a data warehouse remains crucial for companies striving to thrive in competitive environments, despite some use cases where it might seem dispensable. While AI can process large data sets, it often falls short in areas requiring comprehensive financial analysis, such as calculating profitability after accounting for all associated costs.
Historical precedents, such as the integration of Hadoop within data warehouses, suggest that AI will likely serve as an integrated interface within existing data solutions rather than as a replacement. As organisations continue to develop data warehouses, the expectation is that AI will enhance user interaction with data systems, ultimately reinforcing the demand for robust data solutions to address complex business challenges.
Dirk then highlights the integral role of data modelling in the realm of artificial intelligence, emphasising that while large language models have the potential to enhance decision-making processes, they still rely heavily on well-structured data models. Additionally, he notes that AI cannot fully replace the need for human involvement in data modelling, as having a "human in the loop" ensures accurate interpretation and direction in the modelling process.
References to the European AI Act and similar regulations in the US further underscore the significance of human input. Ultimately, effective data modelling is crucial for addressing business requirements and generating valuable insights, reinforcing the notion that, despite advancements in technology, the collaboration between humans and AI remains essential for solving complex challenges.
If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.
Additionally, if you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com)
Don’t forget to join our exciting LinkedIn and Meetup data communities so you don’t miss out!