Effective Metadata Management in Five Steps

by Grant
October 27, 2020
Metadata Value is Driven by the Growth of Data

The past decade marked the beginning of explosive growth in data creation and data capture. This era of data was ushered in by three developments: the proliferation of cloud computing, which led to extremely cheap storage; digitally connected IoT sensors that generate rich event streams; and advancements in open source products that made distributed processing both ubiquitous and easy to use. If the previous ten years changed the way the world creates data, then the next ten years will change the way in which we use data, and nothing will be as instrumental in allowing companies to use their data effectively as metadata management.

Metadata is Now Imperative

There are several key factors in the broader industry landscape that continue to push metadata management as an imperative in the data strategy for many organizations. Regulation is slowly starting to catch up with both the value and the risk that come with using data. We’ve already seen the GDPR and CCPA place strict requirements on how sensitive data is stored and with whom it can be shared; proper data governance is no longer something that only large enterprises need in order to meet external compliance requirements. In addition, simply having a lot of data is not a strategic advantage by itself. Companies are spending more time curating and publishing data sets internally (and externally) to enable more reuse of data products. Data excellence is now measured by how reliable your data is, how quickly your users can understand and leverage data assets, and how easy it is for others to replicate and verify results.

As it pertains to metadata management, all of these elements can be boiled down to a single question: how well do you really know your data? At its core, this is what metadata management is all about - understanding your data. Billions of dollars will be spent each year to help companies manage their data better. That’s billions of dollars spent not on a company’s core focus but on an ancillary product, an enabler: metadata.

What needs to be done to manage metadata well?

There are five things every organization needs to do in order to excel at metadata management:

1. Enable self-service knowledge

This is all about allowing your teammates to understand the data quickly and efficiently without having to take time from their colleagues. There are numerous studies showing the impact interruptions have on productivity, and simply eliminating 50-80% of those interruptions will provide significant value to your organization. The real value of self-service knowledge is that the knowledge transferred is repeatable. Human-to-human interactions are rarely repeatable, so it is easy for someone to neglect to mention the importance of a specific field or to forget why exactly there are six date fields in a given table. Self-service metadata access is not about replacing all interactions but rather those that are frequent and repeatable.

To understand the importance of self-service for metadata management you need to put yourself into the shoes of a data user.

Let’s consider a scenario where you are new to a team and you are tasked with developing a model or creating a dashboard. The very first questions you will ask are: what data is available? How do I access it? What do these values mean? How often is this data null? Where does it come from? The list goes on and on. The overwhelming majority of these questions are repetitive - the same question is asked by many users for many different types of data assets - and the questions are not strategic to your company’s goals. By capturing this information once and allowing all subsequent users to reuse that documented knowledge, your team saves tremendous amounts of time for your data experts, who would otherwise need to spend hours and hours getting their teammates up to speed.

2. Surface unknown insights

One of the biggest challenges when interacting with data is uncovering the unknown. You cannot understand your data if you do not know that it exists.

There is serious risk to a company when analytical users do not know that a data set exists and set out to recreate it, or when data engineers are unaware that the pipeline they are working on replicates another ELT process. This is the risk of duplication, and it is one of the greatest risks for three reasons: the incremental value generated for the organization is close to zero, the time spent by the analyst or engineer is wasted, and the output can cause confusion about how to interpret the results if they do not perfectly align with the existing data asset.

A good metadata management tool will provide asset recommendations to users with a healthy dose of serendipity. Many organizations have hundreds, if not thousands or tens of thousands, of tables, schemas, and files that are used in daily processing. When making recommendations to users, the metadata management tool should take into account past user interactions with data, proximity of other users, and progression in data usage patterns. Surfacing this level of information not only helps to prevent the risk of duplication but also provides a structured way for new users to identify what they should learn next.
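One way to picture recommendations based on past user interactions is a simple co-usage score. The sketch below is only an illustration, not how any particular tool works: the `usage` mapping, the user names and the `recommend_assets` function are all hypothetical, and it ignores the proximity and progression signals mentioned above.

```python
from collections import Counter


def recommend_assets(usage, user, limit=3):
    """Suggest assets the user has not touched yet, based on co-usage.

    `usage` maps a user name to the set of data assets that user has
    worked with (hypothetical input, e.g. derived from query logs).
    """
    seen = usage.get(user, set())
    scores = Counter()
    for other, assets in usage.items():
        if other == user:
            continue
        # Users who share more assets with us count as more similar.
        overlap = len(seen & assets)
        if overlap == 0:
            continue
        for asset in assets - seen:
            scores[asset] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [asset for asset, _score in ranked[:limit]]


# Hypothetical usage history for four teammates.
usage = {
    "ana": {"orders", "customers"},
    "ben": {"orders", "customers", "refunds"},
    "cho": {"orders", "shipments"},
    "dev": {"web_logs"},
}

print(recommend_assets(usage, "ana"))  # ['refunds', 'shipments']
```

A new user with no history gets no suggestions from this approach, which is one reason a real tool would likely combine several signals.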

3. Integrate into your ecosystem

In order to have self-service metadata access, there first must be some metadata for your team to access. Given that manually capturing metadata is tedious and error prone, and will cause your team to burn out quickly on metadata management, the only effective way to capture your metadata is with a tool that automates a large portion of the capture. There are generally two approaches to this: A) the tool reads your existing metadata where it currently resides, in your databases, or B) the tool provides APIs that enable systems to send it their metadata. Some tools, such as Tree Schema, are able to do both.
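To make approach A concrete, here is a minimal sketch of reading metadata directly from where it resides, using SQLite from the Python standard library as a stand-in for your database. The `harvest_metadata` function and the example tables are hypothetical; this is not Tree Schema's implementation, and other databases expose their schemas differently (for example through `information_schema`).

```python
import sqlite3


def harvest_metadata(conn):
    """Return {table: [column descriptions]} read from SQLite's own catalog."""
    harvested = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Table names come from sqlite_master itself, not from user input.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        harvested[table] = [
            {"column": name, "type": col_type, "not_null": bool(notnull)}
            for _cid, name, col_type, notnull, _default, _pk in columns
        ]
    return harvested


# Hypothetical schema used only to demonstrate the harvest.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, signup_date TEXT)"
)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

db_metadata = harvest_metadata(conn)
for table, columns in db_metadata.items():
    print(table, [c["column"] for c in columns])
```

Because the harvest reads the database's own catalog, it can be rerun on a schedule so the captured metadata stays in step with schema changes.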

When looking for the tool that works for your team, keep in mind which integration method will work best for you. Will the tool support all of the different types of data that you have (e.g. SQL vs. NoSQL)? Will the tool be able to evolve and grow with your data ecosystem? The tool that you choose will not be able to make your data culture thrive by itself, but the wrong tool can certainly become a barrier that prevents your data culture from flourishing.

4. Ensure quality with governance

Metadata is only valuable if it is high quality and if it is fresh. With regard to metadata management, data governance is all about ensuring that your metadata management tool is kept up to date and that the documentation is detailed enough to truly enable self-service access.

Governance is most powerful when coupled with data stewards - individuals who are directly responsible for the quality, documentation and overall ownership of the data. There will always be times when your data users need to talk to a person to ask a unique question or to gain additional clarity about the data; this is where data stewards come in. Data stewards are the front-line support group that enables your data users to gain the deeper knowledge that does not yet exist in your metadata management system.

One of the ways that data governance creates value for an organization is by enabling compliance with data protection and data privacy laws. A common example here is that data stewards are often tasked with ensuring that data assets are properly tagged as personally identifiable information (PII) or non-public information (NPI). When an external organization or governing body requests a data audit, the organization can then systematically generate the required report within minutes, saving the countless hours or days of work it would take to manually capture the required metadata at the last moment.
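The audit scenario can be sketched in a few lines once tags live alongside the metadata. Everything below is a hypothetical illustration: the `ColumnRecord` structure, the steward names and the `audit_report` function are assumptions, not the data model of any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRecord:
    """One documented column, with the tags a data steward has applied."""
    table: str
    column: str
    steward: str
    tags: frozenset = frozenset()


def audit_report(records, tag):
    """List every (table, column, steward) carrying the given tag, sorted."""
    return sorted(
        (r.table, r.column, r.steward) for r in records if tag in r.tags
    )


# Hypothetical catalog entries maintained by two stewards.
catalog_records = [
    ColumnRecord("customers", "email", "ana", frozenset({"PII"})),
    ColumnRecord("customers", "signup_date", "ana"),
    ColumnRecord("accounts", "balance", "ben", frozenset({"NPI"})),
    ColumnRecord("customers", "phone", "ana", frozenset({"PII"})),
]

pii_report = audit_report(catalog_records, "PII")
for table, column, steward in pii_report:
    print(f"{table}.{column} (steward: {steward})")
```

The report is only as complete as the tagging, which is why the stewardship described above matters more than the query itself.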

5. Remember that metadata management is not a core function of your business

Since metadata management does not drive your business - it only helps the users in your data ecosystem to use data more effectively - you will want to find the balance between cost, time and capabilities that works best for you. Finding the right tradeoff can be difficult, but there are ways to measure the value that a metadata management tool brings to your organization. Perhaps the most naive, yet effective, approach is to estimate how much time each of your data users spends researching data or being interrupted by others, and then to consider what that wasted time costs.
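As a rough worked example of this naive estimate, the sketch below multiplies out the time lost and applies the 50-80% reduction range mentioned earlier in this article. All of the input figures (20 data users, 4 hours per week, 75 per hour, 48 working weeks) are made-up assumptions for illustration; substitute your own.

```python
def annual_cost_of_lost_time(users, hours_per_week, hourly_cost, weeks_per_year=48):
    """Yearly cost of time spent researching data or being interrupted."""
    return users * hours_per_week * hourly_cost * weeks_per_year


def estimated_savings(annual_cost, reduction_pct):
    """Portion of that cost recovered if reduction_pct percent is eliminated."""
    return annual_cost * reduction_pct / 100


# Hypothetical inputs: 20 data users, 4 hours lost per week, 75 per hour.
lost = annual_cost_of_lost_time(users=20, hours_per_week=4, hourly_cost=75)
low = estimated_savings(lost, 50)
high = estimated_savings(lost, 80)

print(f"Cost of lost time per year: {lost:,}")               # 288,000
print(f"Estimated savings range: {low:,.0f} to {high:,.0f}")  # 144,000 to 230,400
```

Comparing that range against the yearly cost of a tool gives a first-pass view of the tradeoff.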

In reality, the value that a metadata management tool brings to an organization is often much higher than this basic approach suggests. Long-term value from metadata generally comes from how it helps to cultivate your data culture. Your data culture defines how your teams handle data, document it, share knowledge about it, and even how efficiently they are able to build high quality data products. When your metadata is readily available and you have thorough documentation in place, it helps to reinforce your data culture, which in turn helps your company sustain a long-term data advantage.

Closing Thoughts

Your team’s culture and architecture will have a tremendous impact on the way in which you interpret these five principles. Achieving excellence with metadata management takes a combination of people, processes and technology all working cohesively, and the output from this combination is always different for each company. Whether you are a startup of one or a Fortune 500 company, you can unlock more value from your data by instituting these five tenets.
