Data Product Design – the new frontier designing Data Products at ANZ
Louis Jasek talks about how we’re starting to treat data as a product and the lessons we've learnt designing data products.
Our journey to designing Data Products
For over a decade I’ve been working in different roles on data projects across multiple industries. I started my career working on a large data warehouse implementation at an Australian telco, and have since worked on data warehouses, big data platforms and more recently cloud data platforms.
Currently, at ANZ, I’m part of the leadership driving the change towards treating data as a product. Over the last 12 months, we’ve been helping teams across the bank design and build data products. Through this work we’ve learnt about effective data product design. We’ve recognised that the skills and experience we’ve built over the last decade still apply, alongside new concepts we are only now beginning to uncover.
What is a Data Product at ANZ?
At ANZ, a data product is data that has been described using our data product specification document, undergone verification, and ultimately published to our enterprise catalogue. It includes the data and all the information someone needs to use it.
The teams who manage and understand the data are responsible for sharing it. To maximise re-use, we encourage them to share it across our hybrid multi-cloud environment rather than tying it to a single platform or technology.
We categorise our data products into two broad categories – source-aligned and consumer-aligned.
A source-aligned data product is built by a team on behalf of the data owner and is tied to the original source application or business process. This data tends to look close to how it appears in the source system, with some light curation applied.
Consumer-aligned data products are designed to meet more specific needs of users and are built by combining other data products. This data tends to be modelled to align more closely with common business terms and concepts, and to be relatively agnostic of the source system.
Figure 1: Shows a consumer-aligned data product created from source-aligned data products.
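The two categories above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, the field names and the example products (`accounts_raw`, `crm_contacts`, `customer_view`) are hypothetical and are not taken from ANZ's data product specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Alignment(Enum):
    SOURCE = "source-aligned"
    CONSUMER = "consumer-aligned"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical, heavily simplified description of a data product."""
    name: str
    alignment: Alignment
    source_system: str | None = None        # expected for source-aligned products
    inputs: tuple[DataProduct, ...] = ()    # expected for consumer-aligned products

    def __post_init__(self) -> None:
        # A source-aligned product is tied to one source application or process.
        if self.alignment is Alignment.SOURCE and not self.source_system:
            raise ValueError(f"{self.name}: source-aligned products need a source system")
        # A consumer-aligned product is built by combining other data products.
        if self.alignment is Alignment.CONSUMER and not self.inputs:
            raise ValueError(f"{self.name}: consumer-aligned products need input products")


# Illustrative names only.
accounts = DataProduct("accounts_raw", Alignment.SOURCE, source_system="core-banking")
contacts = DataProduct("crm_contacts", Alignment.SOURCE, source_system="crm")
customer_view = DataProduct("customer_view", Alignment.CONSUMER, inputs=(accounts, contacts))

input_names = [product.name for product in customer_view.inputs]
```

Encoding the distinction as a validation rule is one possible design choice; in practice the same check could equally live in a catalogue or a review step.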
The similarities between Data Product Design and Data Modelling
When I first started my career, my mentors taught me the principles of good data modelling – the process of creating a representation or specification of a system’s data and its relationships. I devoured textbooks written by gurus like Ralph Kimball, Bill Inmon and Graeme Simsion, and applied the concepts I learnt at work. I designed some poor data models, but improved over time, learning that data modelling is an art that takes practice.
Now that we are designing data products, we draw heavily on this data modelling experience. Most of the fundamental data modelling concepts we learnt back then still apply. We’re answering the same questions (just replace ‘data product’ with ‘table or view’): How many data products should I have? How should they relate to each other? What attributes should my data product include? Should my data products follow a business domain model or resemble the source system?
The answer remains the same – ‘it depends.’ You create a data model by weighing up different factors like who is using the data, how they are using it, the business process in which the data was created or captured, the standards to follow, and the technology available to store it in, etc. Designing your data products involves the same considerations, and there is no one-size-fits-all answer.
A new consideration – designing for responsibility
We are, however, seeing one big change to our design approach. When a team publishes their data product to our enterprise catalogue, they need to list the people responsible for it – the data owners, data product steward, key contacts etc. This means that as we design data products, we now also ask the question: ‘who should be responsible for the data during its lifecycle?’ For example, we consider who is responsible for a particular data attribute and ensure the source of truth resides authoritatively in their data product, not ours. If a user wants something in one of our data products, we first think about whether we or someone else should be responsible for it. Counter-intuitively, this sometimes means doing less for users, not more.
I was recently part of a talented ANZ team that designed some customer complaints data products. Better understanding our customers’ complaints is important to improve customer experiences. The cross-functional team consisted of people across our Salesforce Centre of Excellence (COE), and our data and analytics teams in Australia Retail and Group Risk. Together we designed a series of data products, where each team was responsible for different parts of the data through its lifecycle. Designing this way, we made sure the cross-functional team validated each team’s contribution to the whole:
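One way to make the "who should be responsible?" question concrete is to check that no attribute is claimed as authoritative by more than one data product. The sketch below is a minimal illustration, not a description of ANZ's tooling: the `ProductSpec` fields, the role placeholders and the attribute names are all assumptions made for the example.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductSpec:
    """Hypothetical slice of a data product specification."""
    name: str
    data_owner: str          # role placeholder, not a real person
    steward: str             # role placeholder, not a real person
    authoritative_attributes: frozenset[str] = frozenset()


def contested_attributes(specs: list[ProductSpec]) -> dict[str, list[str]]:
    """Return attributes that more than one product claims as its source of truth."""
    claims: dict[str, list[str]] = defaultdict(list)
    for spec in specs:
        for attribute in spec.authoritative_attributes:
            claims[attribute].append(spec.name)
    return {attr: sorted(names) for attr, names in claims.items() if len(names) > 1}


# Illustrative draft design: the complaints product also tries to own customer_segment.
customers = ProductSpec(
    "customers", "customer data owner", "customer data steward",
    frozenset({"customer_id", "customer_segment"}),
)
complaints_draft = ProductSpec(
    "complaints", "complaints data owner", "complaints data steward",
    frozenset({"complaint_id", "complaint_status", "customer_segment"}),
)

contested = contested_attributes([customers, complaints_draft])
# contested == {"customer_segment": ["complaints", "customers"]}
```

In this made-up case the likely resolution is the "doing less" option: the complaints product references `customer_segment` from the customers product rather than owning its own copy.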
Salesforce COE: Responsible for the source-aligned data products from Salesforce to make the complaints data available. The data products are well described and in a format that is easy to use.
Retail Data and Analytics Team: Responsible for adding more business context, business rules and commonly used complaints metrics, then repackaging as new consumer-aligned data products.
Group Risk Data and Analytics Team: Responsible for using the data products to calculate their own risk metrics to use in their risk models. They will also create their own data products consisting of these metrics.
Figure 2: A simplified depiction of the responsibilities of each team and how they build upon each other’s work.
During this design work, we realised that clarifying who is responsible for what had to be a key consideration. In my experience, this hasn’t normally been as big a concern during data design.
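The chain of responsibilities described above can be sketched as a small lineage lookup that answers "which teams stand behind this data product?". The product names and the strictly linear wiring below are assumptions for illustration; only the three team names come from the description above.

```python
from collections import deque

# Hypothetical catalogue entries: product name -> responsible team and input products.
CATALOGUE = {
    "salesforce_complaints": {
        "team": "Salesforce COE",
        "inputs": [],
    },
    "retail_complaints_metrics": {
        "team": "Retail Data and Analytics Team",
        "inputs": ["salesforce_complaints"],
    },
    "group_risk_metrics": {
        "team": "Group Risk Data and Analytics Team",
        "inputs": ["retail_complaints_metrics"],
    },
}


def responsible_teams(product: str, catalogue: dict) -> list[str]:
    """Walk upstream from a product and list responsible teams, nearest first."""
    teams: list[str] = []
    seen: set[str] = set()
    queue = deque([product])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # guard against shared or repeated inputs
        seen.add(name)
        entry = catalogue[name]
        if entry["team"] not in teams:
            teams.append(entry["team"])
        queue.extend(entry["inputs"])
    return teams


chain = responsible_teams("group_risk_metrics", CATALOGUE)
```

With this kind of lookup, a user of the risk metrics can see that a question about raw complaint records belongs with the team at the end of the chain, not with the team that published the metrics.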
The benefits of clear responsibilities in Data Product Design
Reflecting on my experience building data platforms, I’ve seen projects fail when the people building and running the data aren’t communicating with those using and creating it.
When the teams building data platforms ignore user needs, they build complex, all-encompassing data models. Users then struggle to use the data as intended and revert to their silos and old methods. Similarly, when these teams are disconnected from the business processes and source systems that create the data, they fail to improve data quality. Issues with the data get fixed in the data platform rather than at source, which is unsustainable. Throughout my career I have seen increased duplication, higher costs and lower-quality data, all because of this type of behaviour.
Designing with clear responsibilities in mind, as we did for the customer complaints data products, addresses these problems. We ensure the right people maintain the data they are responsible for, fostering communication and collaboration between all teams involved. Users know who to contact for changes, and data quality issues are addressed by the appropriate people. This approach creates a positive feedback loop, with everyone working together to improve the data. It may be harder to set up in the short term, but we believe it will make a significant, positive difference to our data environment in the long term.
Final thoughts
Our work at ANZ has shown us that as we continue to navigate the ever-changing landscape of data, one thing remains clear: the principles of data modelling are timeless and as important as ever.
However, it’s crucial to be open to change and embrace new techniques and approaches, such as taking into consideration who is responsible for the data during the design process. I believe this is more than a trend: it is a natural evolution of the data practitioner’s toolkit and essential for unlocking your organisation’s full data potential.
If you’re a data practitioner wanting to learn more about designing data products, I’d recommend brushing up on your data modelling fundamentals. Books like The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball or Data Modeling Essentials by Graeme Simsion have been a great help to me over my career.
If you’re leading a data team, we encourage you to invest in your team’s data modelling skills. I also encourage you to try some newer concepts, like those we outline in this article, to see how they work in your organisation.
Louis Jasek is the Data Solutions and Engineering Lead at ANZ. He is an AWS Certified Cloud Practitioner and a Google Cloud Certified Cloud Digital Leader. He’s also an accomplished fly fisherman.
This article contains general information only – it does not take into account your personal needs, financial circumstances and objectives, it does not constitute any offer or inducement to acquire products and services or is not an endorsement of any products and services. Any opinions or views expressed in the article may not necessarily be the opinions or views of the ANZ Group, and to the maximum extent permitted by law, the ANZ Group makes no representation and gives no warranty as to the accuracy, currency or completeness of any information contained.