Do you want a successful recovery test, or a successful recovery?
In the fourth of our five articles on building and maintaining digital systems that are resilient to adverse events, we focus on testing for, and recovering from, cyber events. Most companies spend a lot of effort, money, and time developing a resilient operational environment. And for good reason. IT professionals know all too well that unplanned downtime can damage reputations and cost money – lots of it!
But no matter how robust the architecture, how diligent the management processes, or how skilled and dedicated the organisation, sooner or later there will be a disruption, or the serious threat of one. How we prepare to react and recover when such a disruption or threat occurs can make the difference between thriving and merely surviving as a company.
Scheduled tests are typically not representative
“No problem,” you might say. “We test our recovery plan annually.” That’s very good, but how and what do you test? Many companies spend weeks preparing for an annual test to ensure everything is technically in order. Support staff, vendors and others are all on board and ready to help. The big day comes, management declares a “disaster” (usually on a weekend during daytime hours), and the test begins.
Results are usually good. Technical staff are happy, management is relieved, and the board and the regulators are updated with the results of another successful test. But scheduled tests are not representative of what actually happens during an adverse cyber incident: a real incident arrives without weeks of preparation, without everyone standing by, and rarely on a convenient weekend afternoon.
How to prepare for real-world cyber events
Cyber recovery and cyber recovery testing are quite different from other types of disaster recovery and operational recovery testing. Preparation is arguably more important than execution: without proper preparation, successful execution is nearly impossible unless you get lucky. And if you are not lucky, the damage can be significant.
What are the best practices for preparing for real-world cyber events? One answer is to test for much more than technical recovery and restoration processes. For example, how many of these questions can you answer without time-consuming research and investigation?
- Where and how do we find the most recent clean copy of our critical data?
- How do we activate our cyber-recovery-specific environment (if we have one)?
- How do we ensure our restoration environment is clean?
- How do we make sure our recovery environment does not get reinfected?
- How do we test our management and executive processes – for example, who is authorised for the “make or break” decisions during the recovery?
- How do we operate our business after recovery of critical operations?
- How do we communicate with our employees, customers, competitors, regulators, and the media?
That is not an exhaustive list of questions. But, perhaps most importantly, let’s also ask: How do we sustain the business with only our most critical services and systems operable? Do we even KNOW what our most critical services are, and which IT resources are required to support them?
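One way to start answering that last question is to write the dependencies down in a form that can be checked. The following is a minimal Python sketch, not a definitive method: the service names, the `DEPENDENCIES` map, and the `required_resources` function are all hypothetical and exist only to illustrate the idea of walking from each critical service to everything it needs.

```python
# Hypothetical inventory: each service lists the services or IT resources
# it depends on. All names are illustrative assumptions, not real systems.
DEPENDENCIES = {
    "payments": ["payments-db", "auth"],
    "auth": ["directory"],
    "intranet": ["cms-db"],
}

# Assumption: the business has agreed that only these services are critical.
CRITICAL_SERVICES = ["payments"]


def required_resources(services, dependencies):
    """Return every service and resource needed to run the given services."""
    needed = set()
    stack = list(services)
    while stack:
        item = stack.pop()
        if item in needed:
            continue  # already visited; also guards against circular dependencies
        needed.add(item)
        stack.extend(dependencies.get(item, []))
    return needed


minimum_footprint = required_resources(CRITICAL_SERVICES, DEPENDENCIES)
print(sorted(minimum_footprint))
```

Anything that does not appear in the result, such as the hypothetical intranet here, is a candidate for later restoration. In practice the hard part is keeping the inventory accurate, which is a question for the business as much as for IT.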
Regardless of the type of incident, the keys to successful recovery from a real event are preparation and practice of a specialised nature.
Preparation means more than having a written and validated recovery plan. It also means keeping our intended recovery environment and support personnel in a perpetual state of readiness. Changes to the production environment are reflected in the recovery environment; changes to support team roles or personnel are documented, and new people are trained in their roles.
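Keeping the recovery environment in step with production is easier to verify when the comparison is automated. The sketch below is one simple way to do that in Python, assuming each environment can be exported as a plain mapping of component name to version. The `find_drift` function and the sample inventories are hypothetical.

```python
def find_drift(production, recovery):
    """Compare two {component: version} inventories and report differences.

    Returns {component: (production_version, recovery_version)} for every
    component whose versions differ; a missing component is reported as None.
    """
    drift = {}
    for name in sorted(set(production) | set(recovery)):
        prod_version = production.get(name)
        rec_version = recovery.get(name)
        if prod_version != rec_version:
            drift[name] = (prod_version, rec_version)
    return drift


# Illustrative inventories only; real data would come from your own tooling.
production = {"app-server": "4.2", "database": "15.3", "message-queue": "3.1"}
recovery = {"app-server": "4.2", "database": "15.1"}

for name, (prod, rec) in find_drift(production, recovery).items():
    print(f"{name}: production={prod} recovery={rec}")
```

An empty result means the two inventories match. Any entry is a change that reached production without being reflected in the recovery environment.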
Spur of the moment practice
Practice means we rehearse and perform recovery actions periodically, but not just under carefully contrived, artificial conditions. Practice your recovery plan without involvement of some of your key people, and without allowing updates to the recovery environment ahead of time. Practice on the spur of the moment – without management declaring a “disaster” – and without prior scheduling. Maybe you want to wait to do this until you’re confident of your recovery readiness, but certainly you can run a detailed paper exercise without forewarning. In all cases, try to answer the hardest questions.
With a heritage of more than 50 years in backup and recovery, Kyndryl has helped hundreds of clients plan and execute recovery tests, and recover from real disruptions. We offer security and resiliency services to work with you to prepare for, practice, and even participate in actual recoveries.