How to Build Resilient Cloud Infrastructure for Modern Businesses

Yossi Levi
6 min read · Jun 29, 2022

In this age of cloud, where data is accessible from anywhere, the need to secure and build resilient cloud infrastructure is greater than ever before.

Many organizations lack a clear understanding of what cloud resiliency is, how to apply it in their business, and even its importance. But cloud resiliency is critical in this digital age to secure data.

Cloud resiliency essentially refers to the process by which a business anticipates possible disruptions to its technology services and plans how to withstand them. It involves planning for business continuity, given its critical role in keeping business data and functions secure online.

Cloud resiliency is an important part of technology infrastructure planning. Depending on the type of business, different solutions are necessary to ensure cloud data remains secure and accessible.

So, how do you build cloud resiliency in your business’ IT infrastructure?

The Right Approach to Building Cloud Resiliency

When it comes to building cloud resilience for your business, you need to ensure that you are employing the correct cloud resiliency strategy right from the beginning.

One of the best approaches to building cloud resiliency entails:

  • Understanding the IT resiliency requirements, cloud applications, legacy applications, data dependencies, and target cloud workloads.
  • Identifying the risks and processes for mitigating the risks.
  • Identifying the linkages, interdependencies and synchronization points between legacy and cloud environments.
  • Architecting, implementing, executing, and sustaining the technology required for resiliency, and developing procedures to integrate that technology into your infrastructure.
  • Creating validation and testing plans and developing a roadmap for future transitions.

Steps to Build Cloud Resiliency in Your Business’ Infrastructure

Even though building cloud resiliency might seem like a daunting task, planning and implementing a resilient cloud infrastructure is one of the most critical tasks for businesses today.

Building cloud resiliency ensures that you are planning not only for business continuity in the event of technology failure, but also for how the technology systems will recover quickly and without data loss.

Here is a step-by-step framework that organizations can implement in their business continuity plan to architect and incorporate a highly resilient cloud infrastructure:

Step 1: Assessment and Evaluation

It is difficult for most organizations to keep their entire cloud environments always available. A more realistic approach would be to link cloud resiliency to the business value. This helps create a structured approach that enables the organizations to identify the key business requirements in the purview of resiliency and create realistic resiliency objectives and metrics.

For example, identify and document the correct tiers, determine the recovery point objective (RPO) and recovery time objective (RTO) for each tier, and decide which business functions map to each tier. This approach will not only act as a pre-assessment for your business’s cloud resiliency needs, but will also provide you with the groundwork before evaluating different cloud features and the associated risks.

Once you understand that, you can then develop a resilient cloud strategy that meets your business’s resiliency requirements. This stage may include the following steps:

  • Gathering the resiliency requirements for cloud workloads.
  • Identifying interdependencies and linkages between legacy systems and cloud environments.
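To make the tiering idea from this step concrete, here is a minimal sketch in Python. The tier names, the RPO and RTO values, and the business functions are all hypothetical placeholders; real figures have to come from your own business impact assessment.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime


# Hypothetical tiers and objectives; real values come from the business.
TIERS = {
    "tier-1": Tier("tier-1", rpo=timedelta(minutes=5), rto=timedelta(minutes=15)),
    "tier-2": Tier("tier-2", rpo=timedelta(hours=1), rto=timedelta(hours=4)),
    "tier-3": Tier("tier-3", rpo=timedelta(hours=24), rto=timedelta(hours=48)),
}

# Hypothetical mapping of business functions to tiers.
FUNCTION_TIER = {
    "payments": "tier-1",
    "customer-portal": "tier-2",
    "reporting": "tier-3",
}


def meets_rpo(function: str, backup_interval: timedelta) -> bool:
    """A backup interval longer than the RPO can lose more data than allowed."""
    return backup_interval <= TIERS[FUNCTION_TIER[function]].rpo


def unmet_objectives(backup_intervals: dict[str, timedelta]) -> list[str]:
    """Return the functions whose backup interval exceeds their tier's RPO."""
    return sorted(
        name
        for name, interval in backup_intervals.items()
        if not meets_rpo(name, interval)
    )
```

Calling `unmet_objectives` with the backup schedule you actually run gives a quick list of business functions whose current protection does not match the tier they were assigned to.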

Step 2: Planning and Designing

Cloud infrastructure design should be fully integrated, including the server dependencies and abstraction levels. The many layers of abstraction in the cloud can make dependencies hard to identify, but this can be addressed with management tools that handle cloud provisioning, automation, backup and replication, and monitoring and reporting.

These tools will provide an understanding of application-level interdependencies and workload distribution between cloud and legacy systems.

Only after understanding the functional and non-functional requirements of cloud resiliency can organizations move into the planning and designing phase, which includes:

  • Identifying the resiliency requirements for different cloud workload groups.
  • Understanding the recovery time and recovery point objectives.
  • Applying the best practices for cloud resiliency when deciding on cloud infrastructures and delivery models.
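Once interdependencies between cloud and legacy systems are known, they can be written down as a graph and queried. The sketch below is one possible way to do that with Python's standard `graphlib` module; the workload names and their dependencies are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workloads: each key depends on the services in its set.
DEPENDENCIES = {
    "web-frontend": {"order-api"},
    "order-api": {"cloud-database", "legacy-erp"},
    "cloud-database": set(),
    "legacy-erp": set(),
}


def recovery_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order workloads so each one is restored after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def impacted_by(failed: str, dependencies: dict[str, set[str]]) -> set[str]:
    """Find every workload that directly or indirectly depends on `failed`."""
    impacted: set[str] = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for workload, needs in dependencies.items():
            if current in needs and workload not in impacted:
                impacted.add(workload)
                frontier.append(workload)
    return impacted
```

Here `recovery_order(DEPENDENCIES)` yields a restore sequence in which the database and the legacy ERP come before the API that needs them, and `impacted_by("legacy-erp", DEPENDENCIES)` shows which cloud workloads a legacy outage would take down with it.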

Step 3: Implementation and Testing

Cloud resiliency testing isn’t simply a matter of activating a cloud infrastructure, dumping data to the cloud, and running tests. Organizations should also account for IP address management and jurisdictional boundaries over data, and keep a trail of evidence showing that the tests were performed and that they actually produced the expected results.

During implementation and testing, keep the end goals in mind so that tests are realistic and their results assure you that your organization can run the business in an alternate environment in case of a major service disruption.

With that information, you can create coherent cloud resiliency plans and procedures, which may include:

  • Provisioning the target environment and ingraining resilience by automating virtual or physical servers.
  • Building fault tolerance and high availability into the cloud environment and configuring and testing it for resilience.
  • Retaining the evidence of the tests and auditing the outcomes for reporting purposes.
  • Creating validation and testing plans that may include planned failures to test components in real time and ascertain the resiliency of services.
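Retaining evidence of a test can be as simple as writing a structured record for each drill. The following is a rough sketch, not a prescribed format: the function names and record fields are assumptions, and the recovery time is passed in as a number rather than measured against a real environment.

```python
import hashlib
import json
from datetime import datetime, timezone


def record_drill(service: str, rto_seconds: float, measured_recovery_seconds: float) -> dict:
    """Build an evidence record for one planned-failure test."""
    record = {
        "service": service,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "rto_seconds": rto_seconds,
        "measured_recovery_seconds": measured_recovery_seconds,
        "passed": measured_recovery_seconds <= rto_seconds,
    }
    # Digest of the record, so later edits to the stored evidence are noticeable.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record


def verify_evidence(record: dict) -> bool:
    """Recompute the digest over every field except the stored digest itself."""
    body = {key: value for key, value in record.items() if key != "sha256"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() == record.get("sha256")
```

A bare digest only catches accidental or careless changes, since anyone who can edit the record can also recompute the hash. For audit purposes the digests would need to be stored separately from the records, or the records signed.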

Step 4: Manage and Sustain

Cloud is agile because of its speed and flexibility. However, this comes with a high degree of volatility. Control processes, robust management, and governance are needed to keep the resilient functions synchronized with production.

Another important thing to mention here is reporting and monitoring, which is an ongoing and operational process that executives need to develop gradually and continuously.

Monitoring and reporting may include:

  • Designing and updating the framework for monitoring and risk reporting.
  • Developing a resilient transitional roadmap for sustaining cloud resiliency.
  • Creating and maintaining the appropriate checks and balances for:
      • Cloud vendor stability and reputation
      • Application and process readiness
      • Ease and mobility of data migration
      • Governance, risk and compliance
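Keeping the resilient environment synchronized with production, as this step calls for, usually means detecting drift between the two. As a minimal sketch, assuming each environment's settings can be exported as a flat dictionary (the setting names in the usage example are made up):

```python
def config_drift(production: dict, recovery: dict) -> dict:
    """Return settings whose values differ between the two environments."""
    drift = {}
    for key in sorted(production.keys() | recovery.keys()):
        prod_value = production.get(key, "<missing>")
        dr_value = recovery.get(key, "<missing>")
        if prod_value != dr_value:
            drift[key] = {"production": prod_value, "recovery": dr_value}
    return drift


# Hypothetical exported settings for the two environments.
production = {"app_version": "4.2.1", "db_replicas": 3, "tls_min": "1.2"}
recovery = {"app_version": "4.1.0", "db_replicas": 3}

for setting, values in config_drift(production, recovery).items():
    print(setting, values)
```

Running a comparison like this on a schedule and including the result in regular reporting is one way to notice that the recovery site has fallen behind before a real failover exposes it.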

Tips and Measures to Strengthen Cloud Resiliency

Some tips to keep in mind and options you can use to strengthen your business’ cloud resiliency include:

1. Using Immutable Backups (WORM)

WORM stands for “write once, read many.” Immutable, or WORM, volumes are repositories that cannot be altered once they are written. This means that even if an attacker were able to compromise your cloud infrastructure and gain access to your data, they would still not be able to alter the copies held on those volumes.

Immutable backups are target backup volumes that prevent modifications like editing, overwriting, and deletion for a time defined by the user. Such immutable backup repositories are used to archive sensitive data, prevent malicious encryption, protect against ransomware, and meet compliance, security, and cyber insurance requirements.

Deploying WORM volumes does not require major infrastructural changes and has a low operational cost.
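The behavior described above, where writes succeed once and modifications are refused until a user-defined period has passed, can be illustrated with a toy in-memory model. This is only a sketch of the semantics; the class and method names are hypothetical, and real WORM volumes enforce the lock in the storage layer rather than in application code.

```python
from datetime import datetime, timedelta, timezone


class ImmutableStore:
    """In-memory toy model of a write-once volume with a retention period."""

    def __init__(self, retention: timedelta):
        self._retention = retention
        self._objects: dict[str, tuple[bytes, datetime]] = {}

    def write(self, key: str, data: bytes, now: datetime) -> None:
        # Write once: an existing object can never be overwritten.
        if key in self._objects:
            raise PermissionError(f"{key} already written; overwrite refused")
        self._objects[key] = (data, now + self._retention)

    def read(self, key: str) -> bytes:
        # Read many: reads are always allowed.
        return self._objects[key][0]

    def delete(self, key: str, now: datetime) -> None:
        # Deletion is refused until the retention period has expired.
        _, locked_until = self._objects[key]
        if now < locked_until:
            raise PermissionError(f"{key} is locked until {locked_until.isoformat()}")
        del self._objects[key]


store = ImmutableStore(retention=timedelta(days=30))
written_at = datetime(2022, 6, 29, tzinfo=timezone.utc)
store.write("backup-001", b"nightly snapshot", written_at)
```

With this setup, a second `write` to `backup-001`, or a `delete` attempted within 30 days of `written_at`, raises `PermissionError`, which is the property that keeps a backup usable after the primary systems have been compromised.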

2. Air-Gapped Backups

Air-gapped backup to the cloud is the next frontier in enterprise data protection. It is often a standard compliance requirement for financial services companies that need to meet regulatory requirements around audit trails and legal discovery.

An air-gapped backup is a backup that is not connected to any device that connects to the internet. An air gap is achieved by isolating the backup systems from any network connections, and it often involves placing them in a location distant from the primary systems.

Cloud air-gapped backups aim to create a gap between your cloud assets and anything that connects them to the internet. This makes it harder for attackers to gain access to your data: where the isolated copy is held in a physical facility, they would have to break into that facility to reach it.

Air-gapped backups also provide businesses with extra protection against ransomware attacks. If you’re infected with ransomware and locked out of your computer, having a secondary offline backup means that you can still recover your files without paying any ransom money!

3. AES 256-bit Encryption and SSL Tunneling

Modern cloud vendors provide various encryption levels for data at rest and in transit. AES with a 256-bit key is the largest key size the AES standard defines and is widely used for encrypting data at rest, while SSL tunneling (in current practice, its successor TLS) is used for encrypting data in transit.

With this level of encryption, it is practically impossible for attackers to read stolen data or recover your key by brute force, since exhaustively searching a 256-bit key space would take an infeasible amount of time.
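A quick back-of-the-envelope calculation shows the scale involved. The sketch below assumes an attacker who can test 10^18 keys per second, a deliberately generous figure chosen for illustration, and who finds the key after searching half of the key space on average.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def expected_brute_force_years(key_bits: int, guesses_per_second: float) -> float:
    """Expected years to find a key by exhaustive search (half the key space on average)."""
    keyspace = 2 ** key_bits
    return (keyspace / 2) / guesses_per_second / SECONDS_PER_YEAR


# Assumed attacker speed: 10**18 guesses per second (hypothetical, generous).
years = expected_brute_force_years(256, 1e18)
print(f"{years:.2e} years")
```

Under these assumptions the result is on the order of 10^51 years. In practice, this means the realistic risks lie in how keys are stored and handled, not in the cipher being brute-forced.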

Conclusion

Cloud resiliency begins with strategically aligning your business resiliency objectives with planning and execution that strengthens your disaster recovery (DR) program.

Organizations must plan to handle known vulnerabilities and unknown threats. Complementary resilience measures like disaster recovery as a service (DRaaS) woven into your DR plan are also a plus.

Importantly, remember that cloud resiliency is an ongoing exercise: it must be revisited and hardened regularly to produce lasting results.

Yossi Levi has been a senior content writer at Stonefly for 10+ years, with expertise in blogging and writing creative copy about cyber security, cloud storage, and related topics.