The difference between a Data Catalogue and a Data Glossary
Data Governance is full of jargon and terminology that can mean different things to different people. It’s all very subjective, and this is usually because of the culture within a particular organisation.
The way the various terms are applied within organisations can vary their meaning. And that’s ok – but you should also be wary of it.
This is something I recently discussed in my ‘What is Data Custodianship?’ blog. Even more recently, I’ve noticed a lot of confusion around what a Data Catalogue is and how it differs from a Data/Business Glossary.
It’s important to make sure you fully understand what the terminology means within the context of the organisation you are working with, so that there are no crossed wires. Don’t make assumptions about the meanings of particular terms – and if you are ever in doubt, then ask.
However, there are some distinct differences between these two things, and I am going to do my best to clear them up for you.
What is a Data Catalogue?
A Data Catalogue is considered a core component of modern data management.
Very simply, a Data Catalogue uses metadata (data that describes or summarises data) to create a searchable, detailed inventory of all of an organisation’s data assets. It is designed to help the data professionals within that organisation quickly find the data they need, whatever they need it for. It’s basically a tool to help you find that needle in your data haystack.
Data Catalogues can evolve with an organisation: over time, the metadata within a Data Catalogue can be enriched and updated to support better data discovery and governance.
A Data Catalogue provides context to enable data analysts, data scientists, data stewards, and other business data consumers to find and understand a relevant dataset for the purpose of extracting business value. Data Catalogues can also support those individuals in acting on that data to realise its true value.
Functions of a Data Catalogue:
‘Dataset Searching’ – supports keyword searches, and can also let a user check how frequently the datasets in the search results are used
‘Dataset Evaluation’ – lets you preview datasets to make sure you’re getting the data you need to analyse (for instance, by previewing the data in question, checking data quality and user ratings, etc.), which saves you potentially downloading the wrong data
‘Data Access’ – a Data Catalogue can support the journey from searching for a dataset to gaining access to it
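To make the idea of a searchable inventory of metadata a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any particular catalogue product works: the class names, the fields (including the 0 to 1 quality score) and the example datasets are all hypothetical, and the usage counter simply counts how often a dataset is returned by a search.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Metadata describing one data asset (all field names are illustrative)."""
    name: str
    description: str
    location: str                 # where the data physically lives
    owner: str
    tags: list[str] = field(default_factory=list)
    quality_score: float = 0.0    # hypothetical 0-1 data quality rating


class DataCatalogue:
    """A toy searchable inventory of data assets."""

    def __init__(self) -> None:
        self._entries: list[CatalogueEntry] = []
        self._times_returned: dict[str, int] = {}

    def register(self, entry: CatalogueEntry) -> None:
        self._entries.append(entry)
        self._times_returned[entry.name] = 0

    def search(self, keyword: str) -> list[CatalogueEntry]:
        """Dataset searching: match the keyword against name, description and tags."""
        needle = keyword.lower()
        matches = [
            e for e in self._entries
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]
        for e in matches:
            self._times_returned[e.name] += 1
        return matches

    def times_returned(self, name: str) -> int:
        """A crude stand-in for 'how frequently is this dataset used'."""
        return self._times_returned.get(name, 0)


# Example usage with made-up datasets.
catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    name="crm_customers",
    description="One row per customer account",
    location="warehouse.crm.customers",
    owner="Sales Operations",
    tags=["crm"],
    quality_score=0.9,
))
catalogue.register(CatalogueEntry(
    name="web_orders",
    description="Orders placed through the website",
    location="warehouse.web.orders",
    owner="E-commerce",
    tags=["sales"],
    quality_score=0.7,
))

results = catalogue.search("customer")
print([(e.name, e.location, e.quality_score) for e in results])
```

Searching covers the first function, looking at the description and quality score before you download anything is a simple form of evaluation, and the location field is what gets you from search to access.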
What is a Data or Business Glossary?
A Data Glossary is an exhaustive list of all terms used across the company with definitions. It comes back to what I said at the start… organisations use lots of jargon and terminology which can mean different things to different people.
A Data Glossary defines the terminology that an organisation uses when discussing its processes and its data. Its purpose is to define the business/data terms within the organisation.
The Data Glossary is designed to keep everyone on the same page and using a common vernacular to ensure clarity and consistency between departments. Think of it as a dictionary for a particular organisation (but not a data dictionary – that is another thing entirely – you can read more about what that is and how it differs from a data glossary here).
A Data Glossary does not define the data like a Data Catalogue does. The Data Glossary defines the terms we use when discussing the data and who owns that data. A Data Catalogue contains more technical metadata to help you locate your data.
A Data Glossary and a Data Catalogue are two different things, although they can be linked to provide extra value. Both have their place and can be very useful to organisations when implementing data governance.
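If it helps to see the distinction side by side, here is a small Python sketch. Again, this is just an illustration under my own assumptions: the term, its definition, the owner and the dataset names are all invented, and a real glossary tool will look quite different. The glossary holds business terms, definitions and owners; the catalogue (reduced here to a plain dictionary of dataset names and locations) holds the technical metadata; and a list of related datasets is one possible way of linking the two.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A business term, its agreed definition and who owns it (illustrative)."""
    term: str
    definition: str
    owner: str
    related_datasets: list[str] = field(default_factory=list)  # link to the catalogue


# The glossary: business language, keyed by the lower-cased term.
glossary = {
    "active customer": GlossaryTerm(
        term="Active Customer",
        definition="A customer who has placed at least one order in the last 12 months",
        owner="Head of Sales",
        related_datasets=["crm_customers"],
    ),
}

# A stand-in for the catalogue: technical metadata (dataset name -> location).
catalogue_locations = {
    "crm_customers": "warehouse.crm.customers",
    "web_orders": "warehouse.web.orders",
}


def define(term: str) -> str:
    """Glossary question: what do we mean by this term?"""
    return glossary[term.strip().lower()].definition


def locations_for(term: str) -> list[str]:
    """Linked question: where does the data behind this term live?"""
    entry = glossary[term.strip().lower()]
    return [
        catalogue_locations[name]
        for name in entry.related_datasets
        if name in catalogue_locations
    ]


print(define("Active Customer"))
print(locations_for("Active Customer"))
```

The first function answers a glossary question and never touches the technical metadata; the second only works because the glossary term has been linked to entries in the catalogue.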
Don’t forget, if you have any questions you’d like covered in future videos or blogs, please leave a comment below.