Modernising Data and Forecast Management with SDMX and Semantic Standards with Dr. Daan Steenkamp & Aidan Horn

Executive Summary

This webinar highlights the significance of the Statistical Data and Metadata eXchange (SDMX) framework in enhancing Data Management and economic analysis. The application of SDMX facilitates effective data tracking, understanding of economic concepts, and establishment of data relationships, thereby improving reporting strategies.

Webinar Details

Title: Modernising Data and Forecast Management with SDMX and Semantic Standards with Dr. Daan Steenkamp & Aidan Horn
Date: 23/06/2025
Presenter: Dr. Daan Steenkamp & Aidan Horn
Meetup Group: DAMA SA User Group
Write-up Author: Howard Diesel

Understanding the Application of SDMX in Data Management and Economic Analysis

Howard Diesel opens the webinar and introduces Dr. Daan Steenkamp and Aidan Horn to the audience. Daan shares that their presentation will focus on the SDMX (Statistical Data and Metadata eXchange) standard and its significance in Data Management and reporting, especially within the context of economic and financial data. Furthermore, Daan highlights the introduction of semantic standards, the creation of a data glossary using the SDMX standard, and the automation of Data Management workflows.

The presentation, Daan shares, will provide a summary of SDMX, its importance, and an exploration of what a Data Catalogue entails, aiming to share valuable insights applicable to the broader Data Management community. Aidan is on hand for technical queries. Although the team’s background is in economics and data science, the presenters emphasise that their examples are relevant to a wider audience.

SDMX is an open-source standard promoted by various multilateral institutions, including statistical offices such as Eurostat, as well as organisations like the ECB, IMF, and World Bank. It facilitates modern Data Management in compliance with international best practices, enabling a range of use cases from basic to complex data projects.

Emphasising semantic data modelling, SDMX ensures that the meaning and context of data are preserved, which is crucial for effective data sharing between teams and institutions. This standard supports the creation of data dictionaries and taxonomies, enhancing data governance by providing clarity on data sensitivity, origin, and ownership, while facilitating auditing, automating processes, and documenting data.

Figure 1 Modernising Data and Forecast Management with SDMX and Semantic Standards

Figure 2 Presentation Outline

Figure 3 “Why SDMX?”

Figure 4 Why SDMX Pt.2

Application of SDMX in Data Management

SDMX is applicable to various datasets, and its strength lies in handling time series data with lengthy historical records. While cross-sectional data can also be accommodated, most current use cases focus on time series formats. Additionally, SDMX excels in contexts where data is regularly produced and where tracking lineage and transformations is crucial, making it particularly useful within banking, finance, and statistical agencies. Furthermore, there have been small-scale projects involving mixed data types, combining numeric time series with associated text.

Data Tracking and Understanding Definitions in Economic Concepts

In the field of economics, precise definitions of concepts like Gross Domestic Product (GDP) are crucial, as these definitions can vary by source and may change over time. For instance, GDP can be presented in either current or constant prices, and may utilise different base years, leading to variations in reported figures from institutions such as the Reserve Bank or national statistical agencies.
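The effect of the price basis and base year can be illustrated with a small sketch. All figures below are hypothetical, and re-referencing a deflator by simple rescaling is a simplification of how statistical agencies actually rebase national accounts. The point is only that the same nominal series yields different constant-price levels depending on the base year, while growth rates are unchanged.

```python
def to_constant_prices(nominal, deflator):
    """Deflate nominal values; the deflator is an index equal to 100 in its base year."""
    return {year: nominal[year] / deflator[year] * 100 for year in nominal}


def rereference(deflator, new_base_year):
    """Rescale a price index so that it equals 100 in new_base_year."""
    base = deflator[new_base_year]
    return {year: value / base * 100 for year, value in deflator.items()}


# Hypothetical nominal GDP (current prices) and a deflator with base year 2015.
nominal_gdp = {2010: 2000.0, 2015: 3000.0, 2020: 4000.0}
deflator_2015 = {2010: 80.0, 2015: 100.0, 2020: 125.0}

real_2015 = to_constant_prices(nominal_gdp, deflator_2015)
real_2010 = to_constant_prices(nominal_gdp, rereference(deflator_2015, 2010))

for year in nominal_gdp:
    print(year, round(real_2015[year], 1), round(real_2010[year], 1))
```

Two sources quoting "real GDP" for the same year can therefore report different numbers without either being wrong, which is why the price basis and base year need to travel with the data as Metadata.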

Understanding the relationships between various economic indicators and the methodologies used for data transformations is essential for accurate interpretation. Moreover, the quality and provenance of data, including proprietary or sensitive internal series, must be carefully tracked. With a focus on the potential for data revisions, especially in official statistics, it is crucial to document changes in data structure and methodology. Tools like SDMX provide the necessary flexibility to manage this information effectively.

Figure 5 Definitional Consistency of Economic Concepts

Data Relationships and Reporting Strategies

The SDMX information model provides a Metadata-rich framework for managing data, specifically designed for statistical agencies and institutions requiring data sharing across diverse entities. It offers flexibility to accommodate various data types while incorporating features for reporting data structures, allowing for easy construction of data glossaries and auditing of data creation processes.

Users can define Metadata structures that describe datasets, attach important information such as data providers and provision agreements, and create categorisations to facilitate comparisons across different data sources. Overall, SDMX promotes standardisation within teams or institutions while incorporating governance and process considerations essential to effective Data Management.

In a discussion regarding data structure flexibility, Aidan highlighted the importance of how datasets can be structured differently based on their intended use, particularly through the lens of Metadata. He pointed out that in SDMX, Metadata is primarily defined by dimensions and attributes, forming a foundational Metadata table comprised of various concepts.
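As a rough illustration of Aidan's point, a data structure can be thought of as a list of dimension concepts, which together identify a series, plus a list of attribute concepts, which describe it. The sketch below is not the SDMX object model itself; the class names, the structure identifier, and the codes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    id: str
    name: str


@dataclass
class DataStructure:
    id: str
    dimensions: list[Concept]  # together these identify a series
    attributes: list[Concept]  # these describe a series or an observation

    def series_key(self, **values: str) -> str:
        """Join one code per dimension, in dimension order, into a dotted key."""
        missing = [d.id for d in self.dimensions if d.id not in values]
        if missing:
            raise ValueError(f"missing dimension values: {missing}")
        return ".".join(values[d.id] for d in self.dimensions)


# Hypothetical structure for a national accounts dataset.
dsd = DataStructure(
    id="NATIONAL_ACCOUNTS",
    dimensions=[
        Concept("FREQ", "Frequency"),
        Concept("REF_AREA", "Reference area"),
        Concept("INDICATOR", "Economic indicator"),
    ],
    attributes=[
        Concept("UNIT_MEASURE", "Unit of measure"),
        Concept("OBS_STATUS", "Observation status"),
    ],
)

key = dsd.series_key(FREQ="Q", REF_AREA="ZA", INDICATOR="GDP")
print(key)
```

Choosing which concepts are dimensions and which are attributes is the design decision that lets the same data be structured differently for different uses.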

This structure can vary across different agencies, as exemplified by Eurostat’s use of the D-Cat API, which enables the exchange of Metadata and time series information with other countries, such as Greece. The aim is for agencies to adopt a common set of Metadata schemes in the future to facilitate easier data transfer. Additionally, Don’s inquiry about the relationship between domestic activity and international trade was noted, emphasising the role of category links in these connections.

Daan then discusses the connection between time series data related to domestic activity and international trade or asset prices. He highlights the importance of categorisation. Additionally, Daan suggests that while a category scheme could serve as a method of linking these series, more complex relationships might emerge through the use of knowledge maps or graphs, allowing for diverse views of economic concepts from various sources.

This flexibility in data representation facilitates tailored reporting at different organisational levels, such as institution-wide glossaries versus team-specific needs. The conversation aims to clarify these connections further, with input from Aidan, a subject area specialist.

Figure 6 “SDMX Information Model”

The Applications of SDMX Data Management

The SDMX framework enhances the interpretability and exchangeability of data by allowing the attachment of various Metadata types, such as data structure details and provenance information. This approach enables the creation of reporting taxonomies and data glossaries while facilitating both bottom-up and top-down processes.

For institutions, particularly large ones, it simplifies the creation of a cohesive Data Management system by ensuring that all datasets are structured for modern management practices. Additionally, SDMX enables efficient data dissemination in multiple formats, as it facilitates machine readability, supporting the automation of quality assurance and sharing processes. This flexibility is beneficial for large organisations with rigorous governance requirements and for startups that need a balance of control and adaptability.

Figure 7 SDMX Formalises Processes for Validating

Figure 8 SDMX Enables Automated Reporting and Dissemination

The Challenges of Data Registries in Institutions

The benefits of SDMX include rapid modernisation of Data Management through efficient data ingestion, where Metadata is attached to the data as it is processed. This enables automated data ingestion and discovery, allowing users with appropriate clearance to access available datasets.

SDMX facilitates the automation of downstream processes, such as forecasting and reporting, and supports the creation of APIs for querying both data and Metadata. Its interoperability is particularly valuable, as it allows users to leverage various analytical tools—such as Excel, R, or Python—to analyse data, future-proofing the Data Management approach against evolving software landscapes.
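To give a sense of what querying data and Metadata through an API can look like, the sketch below builds request URLs that follow the path pattern of the SDMX RESTful web services specification (a data query addressed by dataflow and dotted series key, and a structure query addressed by agency, resource, and version). The base URL, dataflow, agency, and codes are placeholders, and a given service may support only part of the specification, so this should be read as an assumption-laden example rather than the interface of any particular platform.

```python
from urllib.parse import urlencode


def sdmx_data_url(base_url, flow, key, start_period=None, end_period=None):
    """Build a data query: {base}/data/{flow}/{key}?startPeriod=...&endPeriod=..."""
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    url = f"{base_url.rstrip('/')}/data/{flow}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


def sdmx_structure_url(base_url, agency, resource, version="latest"):
    """Build a Metadata query for a data structure definition."""
    return f"{base_url.rstrip('/')}/datastructure/{agency}/{resource}/{version}"


# Placeholder endpoint and identifiers, for illustration only.
BASE = "https://example.org/sdmx/rest"
data_url = sdmx_data_url(BASE, "NATIONAL_ACCOUNTS", "Q.ZA.GDP", start_period="2015-Q1")
structure_url = sdmx_structure_url(BASE, "EXAMPLE_AGENCY", "NATIONAL_ACCOUNTS")
print(data_url)
print(structure_url)
```

Because the request is just a URL, the same query can be issued from Excel, R, Python, or any other tool that can make an HTTP request.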

Aidan highlights the utility of LLMs in processing data that is formatted for machine readability, particularly within SDMX datasets. Additionally, he shares his experience of using an LLM, Gemini, to summarise 5,000 economic concepts into the 300 most significant indicators for a project. While LLMs are primarily adept at handling textual data rather than databases, the conversation underscores their potential for enhancing data ingestion and structuring processes in economic analysis.

A modern Data Management framework enables various use cases by incorporating structured data, allowing for the integration of LLMs and tools like RAG to enhance understanding of data visualisations. Establishing a Data Catalogue is crucial, especially for large institutions, as it provides an inventory of data assets amid the challenges posed by diverse Data Management systems.

A well-structured Data Catalogue helps institutions comprehend their data, ensures proper governance, and enhances the discoverability and understanding of data assets; yet, many institutions often create these catalogues late in the process. The SDMX approach offers a significant advantage as it allows for the seamless creation of a Data Catalogue directly from managed data.

Figure 9 SDMX Enables Automation and Interoperability

Figure 10 Why a Data Catalogue?

Figure 11 “What is a Data Catalogue?”

The Importance of Data Capital and Management in AI Initiatives

A Data Catalogue is essential for making data discoverable across different teams within an organisation. It allows users to quickly locate data, understand its origins, and engage in follow-up discussions regarding any uncertainties. Additionally, key aspects include identifying data ownership, tracking changes over time, and ensuring data is managed effectively.

This maturity in Data Management is crucial for implementing AI initiatives, as it enables access to modern tools and facilitates data sharing between teams and institutions, thereby breaking down silos. Furthermore, a centralised approach can lead to cost savings by reducing redundancy in data storage and fostering a common vocabulary for data interpretation, which is vital for accurate data usage. Ultimately, effective Data Management lays the groundwork for monetising data and creating valuable data products.

Figure 12 “Why a Data Catalogue?”

The Implementation of Data Registries in Institutional Settings

A modern Data Management framework is essential for enabling diverse use cases by incorporating structured data, particularly with the integration of Large Language Models (LLMs) and tools like retrieval-augmented generation (RAG) to improve data visualisation comprehension. Establishing a Data Catalogue is crucial for organisations, particularly large institutions, as it fosters transparency, accountability, and trust in data usage while allowing for effective governance and discoverability of data assets.

Daan shares that a Data Catalogue enhances collaboration across teams by making data easily accessible, tracking ownership and changes, and reducing redundancies, which ultimately supports AI initiatives and data monetisation. Furthermore, adopting a standardised approach, such as the SDMX framework, can alleviate silo issues by promoting a common vocabulary and understanding of data definitions across departments, thereby facilitating smoother transitions in terminology and processes as institutions advance their Data Management capabilities.

Figure 13 Examples of Glossary Output One

Figure 14 Examples of Glossary Output Two

Definition Management in Organisations

Daan highlights the complexity of managing data definitions across different departments, particularly when multiple versions of a concept are involved, such as in the case of a churn metric requested by the CEO. Marketing and sales often have conflicting definitions due to their distinct processes, which can lead to prolonged disputes over a single data point.

This discrepancy emphasises the need for separate fields to capture varying definitions accurately and reflects the importance of training across the organisation to promote a common understanding of Data Management. Ultimately, the conversation underscores the necessity of recognising and addressing these differences to facilitate better data integration and usage.

Daan then emphasises the importance of clear definitions and adaptable Data Management in organisational contexts, particularly when presenting metrics to executives, such as churn rates across various departments. By implementing methods like SDMX, organisations can maintain historical data versions, allowing for effective comparisons even when regulatory definitions change.

This flexibility enables regulated entities to navigate shifts in reporting requirements while fostering innovation in regulatory practices. Additionally, annual financial reports illustrate how mergers and acquisitions can alter definitions, underscoring the need for adaptability in data interpretation and presentation.
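The idea of keeping several dated definitions of one concept side by side, rather than forcing a single definition, can be sketched as follows. This is not how SDMX stores versions internally; the class, the owners, and the definition texts are hypothetical and serve only to show the lookup logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DefinitionVersion:
    concept_id: str
    owner: str        # team or regulator responsible for this definition
    version: str
    valid_from: date
    text: str


def definition_as_of(versions, concept_id, owner, on):
    """Return the latest definition for this owner that was in force on the given date."""
    candidates = [
        v for v in versions
        if v.concept_id == concept_id and v.owner == owner and v.valid_from <= on
    ]
    return max(candidates, key=lambda v: v.valid_from, default=None)


# Hypothetical churn definitions held by two departments.
versions = [
    DefinitionVersion("CHURN", "Marketing", "1.0", date(2023, 1, 1),
                      "Customers with no campaign engagement in 90 days"),
    DefinitionVersion("CHURN", "Sales", "1.0", date(2023, 1, 1),
                      "Customers who cancelled a contract in the period"),
    DefinitionVersion("CHURN", "Sales", "2.0", date(2025, 1, 1),
                      "Customers who cancelled or did not renew a contract"),
]

old = definition_as_of(versions, "CHURN", "Sales", date(2024, 6, 1))
new = definition_as_of(versions, "CHURN", "Sales", date(2025, 6, 1))
print(old.version, new.version)
```

Keeping the owner and validity date on each definition means a figure reported under an old definition can still be compared with one reported under the new definition, because both remain retrievable.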

Implementation and Benefits of SDMX in Small Teams and Institutions

The implementation of SDMX (Statistical Data and Metadata eXchange) by a small core team of three to four members demonstrates that significant benefits can be achieved in automating workflows and utilising public domain data, even within a limited budget. This approach enhances data discoverability and accessibility, enabling programmatic access to data that can be read by machines. Users can filter and visualise data, apply seasonal adjustments, and explore Metadata, fostering collaboration and enabling staff to leverage both internal and external data sources through a centralised platform. Ultimately, this facilitates the creation of glossaries and supports more effective data sharing across teams, making the process even more efficient for smaller institutions.

EconData offers a Microsoft-approved Excel add-in and an API with R and Python packages for programmatic data access, with informative video links in the accompanying slide pack. Aidan has been instrumental in providing valuable Metadata for public domain data, which is often inadequately described by providers, with detailed structures available on the website. For those without an economics background, visual representations, such as the National Treasury’s debt-to-GDP projections, illustrate the importance of tracking data revisions year-on-year, which is crucial for quality assurance. The system allows integration of both third-party and proprietary data in a model-ready format, facilitating automated reporting and analytics, including forecasts and summaries, while supporting diverse tools such as Excel and programming languages for efficient Data Management and downstream processes.
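Tracking revisions between data vintages comes down to comparing the values published for the same period at different points in time. The sketch below uses invented debt-to-GDP projections, not actual National Treasury figures, and the vintage labels and function name are assumptions.

```python
# Each vintage maps a fiscal year to the value published in that release.
# All numbers are invented for illustration.
vintages = {
    "2023 release": {"2023/24": 73.6, "2024/25": 75.0},
    "2024 release": {"2023/24": 74.1, "2024/25": 75.3, "2025/26": 75.5},
}


def revisions(older, newer):
    """Change in each period's value between two vintages (newer minus older)."""
    return {
        period: round(newer[period] - older[period], 2)
        for period in sorted(older)
        if period in newer
    }


revised = revisions(vintages["2023 release"], vintages["2024 release"])
for period, change in revised.items():
    print(f"{period}: revised by {change:+.1f} percentage points")
```

Periods that appear in only one vintage are skipped, so newly added projection years do not show up as revisions.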

Figure 15 EconData for Data and Forecast Automation

Figure 16 Mature your Data Management Today

Figure 17 Econometric Data Services for Automating Workflows

Figure 18 Self Service Data Discovery and Quality Assurance

Figure 19 Excel Add-in, API and R & Python Packages

Figure 20 Public Domain Metadata

Figure 21 Assess Historical Data Vintages

Figure 22 Assess Historical Data Revisions

Figure 23 Harness your Institution’s Data

Better Understanding Data Management with EconData

The implementation of economic data for clients can involve a web platform that enables data stewards to load and validate forecasts. The process can range from manual data loading to comprehensive back-end validation, depending on the desired level of automation and the maturity of the Data Management system. Additionally, an integral part is the use of a knowledge graph, which visually depicts the relationships between various datasets and their corresponding Metadata, facilitating data discovery and documentation.

By linking data through specific identifiers or relationships with data providers, organisations can create a robust data glossary that clarifies data origins and attributes, applicable across different contexts, such as financial and economic data. This flexible, open-source approach enables the customisation of data structures and workflows, thereby enhancing overall Data Management capabilities.
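A knowledge graph of this kind can be approximated, for illustration, as a set of labelled links between series, providers, and categories. The sketch below is a minimal in-memory version; the node identifiers and relation names are hypothetical and do not reflect the actual registry schema.

```python
from collections import defaultdict


class MetadataGraph:
    """Tiny labelled graph: each link is stored in both directions."""

    def __init__(self):
        self._edges = defaultdict(set)

    def link(self, source, relation, target):
        self._edges[source].add((relation, target))
        self._edges[target].add((f"inverse:{relation}", source))

    def neighbours(self, node, relation=None):
        """Nodes linked to `node`, optionally restricted to one relation."""
        edges = self._edges.get(node, set())
        return sorted(t for r, t in edges if relation is None or r == relation)


graph = MetadataGraph()
graph.link("series:GDP_REAL", "provided_by", "provider:STATS_AGENCY")
graph.link("series:EXPORTS", "provided_by", "provider:STATS_AGENCY")
graph.link("series:GDP_REAL", "categorised_as", "category:DOMESTIC_ACTIVITY")
graph.link("series:EXPORTS", "categorised_as", "category:INTERNATIONAL_TRADE")

# Which series does this provider supply?
supplied = graph.neighbours("provider:STATS_AGENCY", "inverse:provided_by")
print(supplied)
```

Storing the inverse link alongside each relation is what allows the same graph to answer both "where does this series come from?" and "what does this provider supply?", which is the basis of a glossary that clarifies data origins.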

Figure 24 EconData Implementation and Manual Validation

Figure 25 EconData Registry and Knowledge Graphs

Figure 26 EconData Registry Data Structures

Figure 27 EconData Registry Schema

Versatility and Functionality of SDMX in Data Management

Daan outlines the capabilities of the SDMX framework in managing and navigating economic data, particularly through a public example available on the organisation’s website. Additionally, he highlights how users can explore data by various groupings, such as cross-domain concepts or specific providers like the Reserve Bank’s quarterly bulletin.

The framework enables granular definitions and relationships within the data, accommodating diverse data types while standardising information across institutions. Key features include examining data flow frequency, attributes, units of measure, and detailed series information, all of which contribute to enhanced interpretability and flexibility for data managers in creating and utilising schemas.

Figure 28 Example of Glossary Output One

Figure 29 Examples of Glossary Output Two

Figure 30 Examples of Glossary Output Three

Figure 31 Examples of Glossary Output Four

The Importance of Data Management and Automation in Financial Analysis

Daan emphasises the importance of data accessibility and usability from an analyst’s perspective. Analysts prioritise having comprehensive data in a manageable and understandable format, which facilitates their work, whether in Excel or through coding. He highlights the need for tools that enable efficiency and sophistication in analysing data, allowing small teams to automate processes and leverage both open-source and proprietary solutions. EconData offers free access to basic economic data, and the team uses the platform to manage foreign exchange data and to automate forecasting processes using various models.

To effectively manage and utilise economic data from various sources for modelling, it is essential to have automated processes in place for forecasting and reporting. Sophisticated models enable the characterisation of forecast uncertainty and the tracking of forecast errors, allowing analysts to understand the origins of errors and learn from them.

By comparing forecasts against authoritative sources, such as the National Treasury, the IMF, and the World Bank’s data, analysts can assess performance and explore critical questions, including the reasons behind fiscal slippage and the accuracy of public forecasts over time. Additionally, building dashboards with open-source tools facilitates the summarisation of forecast errors, allowing for effective data analysis and decision-making, especially in uncertain contexts like public finance in South Africa.
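Forecast error tracking of this kind typically rests on a few summary statistics computed over the periods where both a forecast and an outcome exist. The sketch below uses invented numbers and hypothetical forecaster labels, and adopts the convention error = forecast minus actual (some practitioners use the opposite sign).

```python
import math


def forecast_errors(forecasts, actuals):
    """Error per period, using the convention error = forecast - actual."""
    return {p: forecasts[p] - actuals[p] for p in sorted(forecasts) if p in actuals}


def summarise(errors):
    """Bias (mean error), mean absolute error, and root mean squared error."""
    values = list(errors.values())
    if not values:
        raise ValueError("no overlapping periods to evaluate")
    n = len(values)
    return {
        "bias": sum(values) / n,
        "mae": sum(abs(e) for e in values) / n,
        "rmse": math.sqrt(sum(e * e for e in values) / n),
    }


# Invented growth outcomes and forecasts, in percent.
actuals = {2021: 4.0, 2022: 2.0, 2023: 0.0}
forecasters = {
    "own model": {2021: 3.0, 2022: 2.0, 2023: 1.0},
    "official": {2021: 5.0, 2022: 3.0, 2023: 1.0, 2024: 2.0},
}

summary = {
    name: summarise(forecast_errors(forecast, actuals))
    for name, forecast in forecasters.items()
}
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

A non-zero bias indicates systematic over- or under-prediction, whereas MAE and RMSE measure the typical size of errors; reporting them together helps separate a forecaster that is consistently optimistic from one that is simply imprecise.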

Figure 32 Examples of Glossary Output Five

Figure 33 Become a Super-Analyst

Figure 34 EconData for Workflow Automation

Figure 35 “Scheduled FX Forecasts”

Figure 36 Report Automation

Figure 37 Enabling Detailed Driver Analysis

Figure 38 Enabling Comparisons to Official Forecasts

Figure 39 Automated Forecast Error Tracking of Your Forecasts

Figure 40 Automated Forecast Error Tracking of Official Forecasts

Figure 41 Tracking Judgement/Model Performance

Application of AI and Data Analysis in Decision Readiness and Data Migration

The importance of enabling decision-ready analytics in the context of AI is emphasised through practical examples. Daan shares that interactive dashboards can provide valuable insights when applied correctly, allowing users to track market share and changes over time using public domain data.

Tools like Excel can enhance forecasting capabilities without requiring users to alter their workflows; for instance, users can select a model from a drop-down menu and input values to analyse potential outcomes. Additionally, these approaches empower analysts to perform more sophisticated tasks seamlessly, fostering a deeper understanding of the data and its implications.

An attendee suggests that while SDMX is often considered suitable only for small datasets due to its use of JSON as an intermediary, it could be leveraged as a more agnostic framework for organising larger datasets, like those in SQL databases. Another attendee comments on the potential value of the tool for data migrations, especially in the context of mergers and acquisitions. They point out that many decisions in such scenarios are made without a clear understanding of data costs, and that using this tool could streamline due diligence processes, making them faster and more effective.
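One way to picture the attendee's suggestion is to treat the dimensions of a data structure as the columns of a relational table, so that the semantic model organises data held in SQL rather than in JSON files. The sketch below uses Python's built-in sqlite3 module; the table layout, dimension list, and values are assumptions for illustration, not a prescribed mapping.

```python
import sqlite3

# Hypothetical dimension list taken from a data structure definition.
DIMENSIONS = ["FREQ", "REF_AREA", "INDICATOR"]


def create_observation_table(conn, dimensions):
    """One column per dimension, plus the time period and the observed value."""
    # Identifiers come from a trusted, hard-coded list, never from user input.
    columns = ", ".join(f'"{d}" TEXT NOT NULL' for d in dimensions)
    key = ", ".join(f'"{d}"' for d in dimensions)
    conn.execute(
        f'CREATE TABLE observations ({columns}, '
        f'"TIME_PERIOD" TEXT NOT NULL, "OBS_VALUE" REAL, '
        f'PRIMARY KEY ({key}, "TIME_PERIOD"))'
    )


conn = sqlite3.connect(":memory:")
create_observation_table(conn, DIMENSIONS)

# Invented observations.
rows = [
    ("Q", "ZA", "GDP", "2024-Q1", 100.0),
    ("Q", "ZA", "GDP", "2024-Q2", 101.5),
    ("Q", "ZA", "EXPORTS", "2024-Q1", 30.0),
]
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?)", rows)

result = conn.execute(
    'SELECT "TIME_PERIOD", "OBS_VALUE" FROM observations '
    'WHERE "INDICATOR" = ? ORDER BY "TIME_PERIOD"',
    ("GDP",),
).fetchall()
print(result)
```

Making the dimensions plus the time period the primary key mirrors the way a series key and period identify a single observation, so duplicate observations are rejected by the database itself.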

Figure 42 Decision-Ready Analytics

Figure 43 Enabling Automated Dashboards for Benchmarking

Figure 44 Interactive Models for Strategic Decisions

Data Management and Collaboration in South Africa

Attendees were encouraged to reach out for further guidance on accessing economic data via CODERA products, particularly through Excel add-ins and APIs. The platform offers a free trial, allowing users to explore public domain data, making it especially suitable for expert users. Additionally, there are opportunities for collaboration, and CODERA is willing to provide demonstrations for institutions seeking to modernise their Data Management practices.

Data Management and Pipeline Implementation in South Africa and the UK

Daan discussed the management of South African data, emphasising the importance of distinguishing between data handling and the overall approach. The team works with both public domain and private client data, implementing data pipelines to enhance compliance with regulatory standards, particularly for forecasting and reporting purposes.

There is potential for setting up operations in the UK, contingent upon specific details and current client needs. Additionally, Daan shares that while they do have international clients and manage various datasets, the focus remains on enhancing Data Management capabilities, especially in a proprietary context.

If you would like to join the discussion, please visit our community platform, the Data Professional Expedition.

Additionally, if you would like to watch the edited video, please visit our YouTube channel.

If you would like to be a guest speaker on a future webinar, kindly contact Debbie (social@modelwaresystems.com)

Don’t forget to join our LinkedIn and Meetup data communities so that you do not miss out!
