Continuing the Hybrid Platforms Trends Series on Cyber Resilience, we are going to look at what Data Observability is and why it’s so important as we move into 2024 and beyond.
Data growth is becoming a massive challenge. The Rubrik Zero Labs report suggests that "a typical organisation’s data volume will triple in the next five years and require 545 BETB (back-end terabyte) to secure if growth rates hold steady." This not only brings cost and capacity issues, but also raises the question of what we should keep, secure and protect, and how.
A typical organisation likely contains enough sensitive data to max out any and all financial penalties. As attackers increasingly rely on double-extortion or triple-extortion (without encryption) to simplify ransomware attacks, it is going to be more important than ever to understand what data we have, so that we can inform decisions and investments.
What Is Data Observability?
There are two fields of data observability: those that focus on the management and health of your production data, and those that focus on leveraging the insights that can be uncovered in backup data. For the first type, IBM defines Data Observability as "the practice of monitoring, managing and maintaining data in a way that ensures its quality, availability and reliability across various processes, systems and pipelines within an organisation". Data protection vendors would likely refer to the second type as data discovery or sensitive data monitoring: looking at your backup data to find exposure risk and provide visibility into vulnerable data. If your interest is just in understanding exposure, leveraging the tooling built into data protection suites is a good route to take. If you need a data strategy to underpin wider initiatives, then managing the actual data is a better approach.
For this article, we are going to focus on the general concept of having a deep understanding of your data, as data visibility creates decision-making capabilities that can start to mitigate challenges around capacity, relevance, and costs. I will do a future session on data observability as a function for data teams, where it is used for business intelligence, developing machine learning models or driving innovation.
Let’s have a look at some of the reasons why data observability is so important in the context of your cyber resilience strategy.
Sensitive Data Classification
Estimates suggest that 1 in every 38 files contains sensitive data, so how do you decide which data to keep, wrap extra security around, delete or anonymise? Most organisations likely have a good handle on the core business data that is known to contain sensitive records, such as HR systems, CRM systems and core data repositories, but what about the rest? Data is copied, manipulated, moved, and transformed into many different formats as people attempt to extract the value they need from it to complete a specific function. In this process, it can be very hard to keep track of that sensitive information and be assured it always remains behind the defined levels of protection. We only need to look at incidents like the freedom of information request to the Northern Irish police to see how easy it can be for data to escape.
A strong data observability practice can first identify all data and then ensure it is classified, allowing for the orchestration of the future state. With this level of visibility, we can start to make informed decisions on the future of all data: what needs to be kept and folded into a data lakehouse for business intelligence, what needs to be archived for compliance reasons, and most importantly, what can be deleted so it’s no longer a risk.
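To make the "identify, then classify" idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the two patterns (email addresses and payment card numbers), the label names and the `classify` function are all assumptions chosen for the example, and a real classifier would cover far more data types and file formats.

```python
import re

# Hypothetical, deliberately simple patterns. Real tools use far richer rules.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def classify(text: str) -> set[str]:
    """Return the set of (assumed) sensitivity labels found in `text`."""
    labels = set()
    if EMAIL.search(text):
        labels.add("email")
    # The Luhn check cuts down false positives from arbitrary long numbers.
    if any(luhn_valid(m.group()) for m in CARD.finditer(text)):
        labels.add("payment_card")
    return labels


if __name__ == "__main__":
    # Illustrative in-memory "files"; in practice these would be read from storage.
    files = {
        "notes.txt": "Meeting moved to Tuesday.",
        "export.csv": "jane@example.com,4111 1111 1111 1111",
    }
    for name, content in files.items():
        print(name, sorted(classify(content)))
```

Files that come back with no labels become candidates for the keep, archive or delete conversation, while labelled files are the ones that need to stay behind the defined levels of protection.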
Understanding Exposure – the Blast Radius
Triple-extortion ransomware attacks, where associates (such as partners or customers) are threatened with data leakage, are becoming very common. Even simpler double-extortion tactics (steal and threaten to release) rely on one factor: knowing the value of the data that was stolen. If the attacker knows what they have stolen and you don’t, how do you make an informed decision on how to interact with the bad guys? If you have a strong data observability practice, you can get a clear picture of which files and applications were affected. Couple this with forensic analysis and you will know which systems had data exfiltrated and whether the data on those systems was of concern. If it turns out to be sensitive PI data or core IP, then how you deal with the ransomware demands will be very different.
Additionally, with strong data observability you can remediate faster. With critical data access intelligence you will know what was accessed and when, allowing for data-driven decisions on what to recover and in what order. Reducing the recovery time can also reduce the pressure to pay, whilst decreasing reputational impact.
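As a rough illustration of "what to recover and in what order", the sketch below sorts affected systems by a simple priority rule. Everything in it is an assumption made for the example: the `AffectedSystem` fields, the system names and the ordering rule itself (business-critical first, then systems holding sensitive data, then the most recently accessed). Your own recovery priorities will depend on your business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectedSystem:
    # Hypothetical attributes that an observability tool might surface.
    name: str
    business_critical: bool
    holds_sensitive_data: bool
    days_since_last_access: int


def recovery_order(systems: list[AffectedSystem]) -> list[AffectedSystem]:
    """Sort systems so the highest recovery priority comes first.

    Assumed rule: business-critical systems first, then systems holding
    sensitive data, then whichever was accessed most recently.
    """
    return sorted(
        systems,
        key=lambda s: (
            not s.business_critical,      # False sorts before True
            not s.holds_sensitive_data,
            s.days_since_last_access,
        ),
    )


if __name__ == "__main__":
    affected = [
        AffectedSystem("archive-share", False, False, 400),
        AffectedSystem("hr-database", True, True, 1),
        AffectedSystem("marketing-wiki", False, False, 3),
        AffectedSystem("order-system", True, False, 0),
    ]
    for system in recovery_order(affected):
        print(system.name)
```

The point is less the specific rule and more that the inputs (criticality, sensitivity, access recency) only exist if you have visibility of your data before the incident.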
User Access
Another problem with this explosion of data and proliferation of use cases is managing access. With credential-stealing attacks so prevalent, it becomes critical to ensure a least-privilege approach is adopted, something that is increasingly difficult as unstructured data explodes. A strong data observability practice will reduce exposure risk by ensuring only qualified users have access to sensitive data, whilst alerting compliance teams to data that appears in repositories where it should not be. To be effective, these areas also need to include people and processes; technology can assist but will not be the final solution.
Summary
Join me in part three as we explore how cyber recovery is not the same as disaster recovery.