Disaster Recovery with IBM Spectrum Scale & Active File Management (AFM-DR) in AWS

by | Nov 27, 2018

Spectrum Scale is IBM's clustered filesystem, which provides extreme scalability, high performance and reliability for HPC workloads. It is widely used by major financial, weather forecasting and genomic research organizations across the world.
IBM Spectrum Scale Active File Management (AFM) offers filesystem caching and synchronization across multiple Spectrum Scale clusters located in different geographic locations. It also facilitates data migration from non-Spectrum Scale filesystems to a Spectrum Scale filesystem. It masks WAN latency by transferring data asynchronously across the network. AFM can use either NFS or the native Spectrum Scale NSD protocol for the data transfers.
As an IBM Business Partner, The ATS Group provides end-to-end IBM Spectrum Scale solution architecture, implementation and managed support services to our financial and genomic research customers.
One of our customers uses a medium-scale IBM Spectrum Scale SAS Grid cluster to store data for their economic, financial and strategic consulting services. They are currently investigating options for replicating their production data, stored in a Spectrum Scale filesystem, to a different geographic location as part of their Disaster Recovery (DR) strategy. They want to use their DR site primarily as a data repository, without any compute or application servers. They also want to leverage a cloud-based solution to minimize their capital and operating expenses.
At The ATS Group, we work with our customers to clearly understand their exact business needs and always provide innovative, cost-effective solutions. So, naturally, our customer asked us to do a Proof of Concept implementing an IBM Spectrum Scale AFM-DR solution between an on-prem IBM Spectrum Scale cluster and an off-prem cluster in the Amazon AWS cloud. We are aware that IBM does not yet support AFM-DR as a cloud-based DR solution, but support from IBM is expected very soon.
This document briefly describes the results of the AFM-DR Proof of Concept completed by The ATS Group.

Key Features of IBM Spectrum Scale AFM-DR

AFM-DR is an AFM-based, fileset-level disaster recovery solution between two IBM Spectrum Scale clusters. Keep in mind that AFM-DR does not replicate a filesystem as a whole: replication has to be set up for each independent fileset within the filesystem, and dependent filesets are not supported. The independent filesets in the two IBM Spectrum Scale clusters have a one-to-one, active-passive relationship, with the fileset in one cluster acting as the primary (production) and the fileset in the other cluster acting as the secondary (DR). Because AFM-DR is server- and storage-agnostic, the primary and secondary clusters can use different types of servers and storage, independent of each other.
AFM-DR replicates file data and metadata, including ACLs and all extended attributes, between the primary and secondary clusters. However, it does not replicate user, group and fileset quotas, or the number of data and metadata copies configured for the filesystem.
You can optionally enable Recovery Point Objective (RPO) based snapshots for data synchronization in AFM-DR. While applications can read from and write to the primary fileset, the data in the secondary cluster is read-only until you perform a manual fail-over in a DR situation. You can fail back to the primary after the production environment has been restored.
A detailed discussion of AFM-DR features and implementation is out of the scope of this document; readers are advised to refer to the IBM Knowledge Center.
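The pairing rules above can be summarized in a small Python model. It is purely illustrative and is not a Spectrum Scale interface: the `Fileset` class, the `pair_filesets()` function and all cluster, filesystem and fileset names are hypothetical. The model only encodes the constraints stated in this section: independent filesets only, two different clusters, and a one-to-one pairing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fileset:
    cluster: str
    filesystem: str
    name: str
    independent: bool  # True if the fileset has its own inode space


def pair_filesets(pairs, primary, secondary):
    """Record a one-to-one AFM-DR pairing in `pairs` (primary -> secondary).

    Raises ValueError if a fileset is dependent, if both filesets live in the
    same cluster, or if either fileset already belongs to another pairing.
    """
    for fs in (primary, secondary):
        if not fs.independent:
            raise ValueError(f"{fs.name}: AFM-DR needs an independent fileset")
    if primary.cluster == secondary.cluster:
        raise ValueError("primary and secondary must be in different clusters")
    in_use = set(pairs) | set(pairs.values())
    if primary in in_use or secondary in in_use:
        raise ValueError("a fileset can belong to only one AFM-DR pairing")
    pairs[primary] = secondary
    return pairs
```

For example, `pair_filesets({}, Fileset("ats", "gpfs_ats", "afm_pri", True), Fileset("aws", "gpfs_aws", "afm_sec", True))` records one pairing, while passing a fileset with `independent=False` is rejected.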
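The primary/secondary behaviour described above can be sketched as a tiny state model. This is a hypothetical illustration (the `AfmDrPair` class is not an IBM API), and it deliberately ignores the data resynchronization that a real fail-back involves. It only captures which side accepts writes before and after a manual fail-over and fail-back.

```python
class AfmDrPair:
    """Toy model of AFM-DR roles; not an IBM interface."""

    def __init__(self):
        self.writer = "primary"      # site that applications currently write to
        self.primary_site_up = True  # whether the production site is available

    def is_writable(self, site):
        # Only the acting primary accepts writes; the other side is read-only.
        return site == self.writer

    def failover(self):
        """Manual step in a DR situation: promote the secondary."""
        if self.writer != "primary":
            raise RuntimeError("already failed over")
        self.writer = "secondary"

    def failback(self):
        """Return production to the primary once its site is restored."""
        if self.writer != "secondary":
            raise RuntimeError("nothing to fail back from")
        if not self.primary_site_up:
            raise RuntimeError("restore the primary site first")
        self.writer = "primary"
```

Note that `failover()` does not require an outage: as in this Proof of Concept, a fail-over can be triggered manually to test the process.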

AFM and AFM-DR Use Cases

IBM Spectrum Scale offers synchronous replication to replicate data between two locations as part of a Disaster Recovery solution, but synchronous replication cannot mask or tolerate WAN latency and is limited in the distance between sites. Spectrum Scale AFM overcomes this limitation by transferring data asynchronously between Spectrum Scale clusters separated by large distances. In addition to full data replication, AFM can also serve as a data caching mechanism between two clusters. AFM offers multiple data caching options for different business use cases, such as:

  • partially caching enterprise data between Spectrum Scale clusters located at a head office and branch offices.
  • data migration from a non-Spectrum Scale filesystem to an IBM Spectrum Scale filesystem.
  • consolidating data from different sources into a central IBM Spectrum Scale filesystem.
  • AFM-DR, which provides specific features for using AFM as a Disaster Recovery solution, with RPO-based snapshots and fail-over and fail-back capabilities.
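To illustrate why asynchronous transfer masks WAN latency, here is a toy Python model, assuming a single gateway thread and a fixed simulated round-trip time. It is not how AFM is implemented internally; the `AsyncReplicator` class and its parameters are hypothetical. Application writes complete at local speed, while queued changes reach the remote site later. Anything still queued when a disaster strikes is the data at risk, which is why an asynchronous DR design is described in terms of a Recovery Point Objective.

```python
import queue
import threading
import time


class AsyncReplicator:
    """Toy model: local writes return immediately, while a background
    gateway thread ships queued changes to the remote site and pays the
    (simulated) WAN latency on its own."""

    def __init__(self, wan_latency_s):
        self.wan_latency_s = wan_latency_s
        self.local = {}
        self.remote = {}
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, path, data):
        self.local[path] = data          # completes at local speed
        self._pending.put((path, data))  # replicated later, asynchronously

    def _drain(self):
        while True:
            path, data = self._pending.get()
            time.sleep(self.wan_latency_s)  # simulated WAN round trip
            self.remote[path] = data
            self._pending.task_done()

    def flush(self):
        """Block until every queued change has reached the remote site."""
        self._pending.join()


if __name__ == "__main__":
    r = AsyncReplicator(wan_latency_s=0.1)
    start = time.monotonic()
    for i in range(5):
        r.write(f"file{i}", i)
    print(f"5 writes took {time.monotonic() - start:.4f}s")  # far below 5 x 0.1s
    r.flush()
    print(r.remote == r.local)  # True once the queue has drained
```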

AFM-DR Requirements and Limitations

Please note that all customers are advised to contact IBM and obtain approval before implementing AFM-DR as a disaster recovery solution.

  • AFM-DR can be configured only for independent filesets.
  • Does not replicate user, group and fileset quotas.
  • Can only be configured between two IBM Spectrum Scale clusters.
  • Needs one or more IBM Spectrum Scale nodes defined as AFM gateways. For better performance, IBM recommends that you do not use protocol nodes or NSD servers as AFM gateway nodes.
  • Does not support FPO-enabled clusters.
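The requirements above can be turned into a simple pre-flight checklist. The sketch below is a hypothetical planning aid, not an IBM tool: `ClusterPlan` and `check_afm_dr_plan()` are illustrative names, and it assumes the gateway nodes that matter are on the primary cluster, which pushes data to the secondary. Quotas are always reported as a reminder because they are not replicated.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterPlan:
    """Hypothetical summary of one cluster in a planned AFM-DR pair."""
    name: str
    is_spectrum_scale: bool = True
    fpo_enabled: bool = False
    gateway_nodes: set = field(default_factory=set)
    protocol_nodes: set = field(default_factory=set)
    nsd_servers: set = field(default_factory=set)


def check_afm_dr_plan(primary, secondary, fileset_is_independent):
    """Return (errors, warnings) for the requirements listed above."""
    errors, warnings = [], []
    if not fileset_is_independent:
        errors.append("AFM-DR can be configured only for independent filesets")
    for c in (primary, secondary):
        if not c.is_spectrum_scale:
            errors.append(f"{c.name}: both sides must be IBM Spectrum Scale clusters")
        if c.fpo_enabled:
            errors.append(f"{c.name}: FPO-enabled clusters are not supported")
    # Assumption: gateways are checked on the primary, which pushes the data.
    if not primary.gateway_nodes:
        errors.append(f"{primary.name}: define at least one AFM gateway node")
    shared = primary.gateway_nodes & (primary.protocol_nodes | primary.nsd_servers)
    if shared:
        warnings.append(f"{primary.name}: gateways also used as protocol/NSD nodes: "
                        + ", ".join(sorted(shared)))
    warnings.append("quotas are not replicated; configure them separately on the secondary")
    return errors, warnings
```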

Proof of Concept Environment

On-Prem ATS Spectrum Scale Cluster

  • 2 x IBM Power-8 S822 servers
  • 1 Gbps network
  • IBM V7000 Storage

AWS – Spectrum Scale Cluster

Proof of Concept Objectives

  • Set up an on-prem ATS Spectrum Scale cluster on IBM Power-8 LPARs located in the ATS Innovation Center.
  • Set up an off-prem IBM Spectrum Scale cluster on AWS EC2 instances.
  • Create test IBM Spectrum Scale filesystems with independent filesets on both clusters.
  • Configure AFM-DR with the on-prem cluster as Primary and the AWS IBM Spectrum Scale cluster as Secondary.
  • Test data replication between the clusters.
  • Test the AFM-DR fail-over and fail-back processes.

AFM-DR Configuration

This section describes the process we followed to configure AFM-DR between the two clusters.
Both the ATS and AWS Spectrum Scale clusters were configured following the standard IBM Spectrum Scale configuration documented in the IBM Knowledge Center. Basic Spectrum Scale cluster configuration and the NSD and filesystem creation process are out of the scope of this document and are not described here.
The ATS and AWS cluster details are shown in the screenshots given below:
ATS GPFS Cluster (On-Prem):

AWS GPFS Cluster (Cloud):

ATS GPFS Version:

AWS GPFS Version:

ATS GPFS Cluster NSDs:

We had to create a custom /var/mmfs/etc/nsddevices file so that IBM Spectrum Scale would properly detect AWS EBS volumes as NSDs:

Creating Custom NSD Discovery File:
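As a sketch of what such a file has to produce: mmdevdiscover invokes the /var/mmfs/etc/nsddevices user exit, which prints one "deviceName deviceType" line per disk, with the name relative to /dev. The nsddevices.sample file shipped in /usr/lpp/mmfs/samples documents this format, and also how the exit's return code controls whether GPFS continues with its own disk discovery. The Python function below only builds those lines from the contents of /proc/partitions. It assumes that EBS data volumes appear as whole disks named xvd*, that xvda is the root volume, and that the "generic" device type applies; adjust the pattern for instances that expose EBS volumes as NVMe devices.

```python
import re


def nsddevices_lines(partitions_text, pattern=r"xvd[a-z]+", exclude=("xvda",)):
    """Build 'deviceName deviceType' lines for an nsddevices user exit.

    Assumptions (adjust for your instances): EBS data volumes show up as
    whole disks matching `pattern`, names in `exclude` (the root volume)
    are skipped, and every selected disk is reported as 'generic'.
    """
    regex = re.compile(pattern)
    lines = []
    for row in partitions_text.splitlines():
        cols = row.split()
        # Data rows have 4 columns: major, minor, #blocks, name.
        if len(cols) != 4 or not cols[0].isdigit():
            continue
        name = cols[3]
        # fullmatch skips partitions such as xvdf1 and unrelated devices.
        if regex.fullmatch(name) and name not in exclude:
            lines.append(f"{name} generic")
    return lines
```

On a cluster node, `print("\n".join(nsddevices_lines(open("/proc/partitions").read())))` shows which devices would be reported before the result is wired into the user exit.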


ATS GPFS Cluster Filesystem:

AWS Filesystem:

Creating New Independent Filesets for AFM DR

Keep in mind that AFM-DR is not supported on dependent filesets, so you need to create independent filesets in order to configure AFM-DR. The screenshots below show the commands used to create independent filesets in the ATS and AWS Spectrum Scale cluster filesystems.
Creating a Primary fileset in ATS Cluster:

Linking Primary Fileset:

Check the AFM Primary Fileset Information:

Creating Secondary filesets in AWS Cluster:

Linking Secondary Fileset:
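The create, link and check steps above can be sketched as command lines built in Python. The flags follow IBM's documented `mmcrfileset`, `mmlinkfileset` and `mmlsfileset` syntax (`--inode-space new` for an independent fileset, `-p afmMode=...` and `-p afmTarget=...` for AFM attributes, `-J` for the junction path), but they should be checked against the man pages for your release; the `afmPrimaryId` attribute spelling in particular is an assumption. All filesystem, fileset, path and host names are hypothetical, and the helpers only return argument lists rather than running anything.

```python
import shlex


def primary_fileset_commands(fs, fileset, junction, nfs_host, secondary_path):
    """Command lines for the primary (on-prem) side; names are hypothetical."""
    target = f"nfs://{nfs_host}{secondary_path}"
    return [
        # Independent fileset (its own inode space) in AFM-DR primary mode,
        # replicating to the secondary's NFS export.
        ["mmcrfileset", fs, fileset, "--inode-space", "new",
         "-p", "afmMode=primary", "-p", f"afmTarget={target}"],
        ["mmlinkfileset", fs, fileset, "-J", junction],
        # Show the fileset's AFM attributes (expected to include its primary ID).
        ["mmlsfileset", fs, fileset, "-L", "--afm"],
    ]


def secondary_fileset_commands(fs, fileset, junction, primary_id):
    """Command lines for the secondary (AWS) side; verify attribute names."""
    return [
        ["mmcrfileset", fs, fileset, "--inode-space", "new",
         "-p", "afmMode=secondary", "-p", f"afmPrimaryId={primary_id}"],
        ["mmlinkfileset", fs, fileset, "-J", junction],
    ]


if __name__ == "__main__":
    for argv in primary_fileset_commands("gpfs_ats", "afm_pri",
                                         "/gpfs/gpfs_ats/afm_pri",
                                         "aws-nfs-1", "/gpfs/gpfs_aws/afm_sec"):
        print(shlex.join(argv))
    for argv in secondary_fileset_commands("gpfs_aws", "afm_sec",
                                           "/gpfs/gpfs_aws/afm_sec",
                                           "<primary-id>"):
        print(shlex.join(argv))
```

Keeping the commands as argument lists makes them easy to review first and then pass to `subprocess.run(argv, check=True)` on the appropriate cluster.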

Enabling NFS on AWS Secondary Cluster:

Starting NFS:
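On the AWS side, the secondary fileset has to be exported over NFS to the primary cluster's AFM gateway nodes. A minimal sketch of an /etc/exports entry for the kernel NFS server follows. The path, client address and fsid value are hypothetical, and the option choices are assumptions to check against IBM's AFM guidance: no_root_squash because the gateway accesses the export as root, and an explicit fsid because kernel NFS exports of GPFS paths need a stable filesystem ID. After editing /etc/exports, `exportfs -ra` re-reads it.

```python
def exports_line(path, client, fsid):
    """Format one /etc/exports entry for the secondary fileset.

    Assumed options: rw for replication writes, no_root_squash because the
    AFM gateway accesses the export as root, and an explicit fsid because
    kernel NFS exports of GPFS paths need a stable filesystem ID.
    """
    options = ["rw", "sync", "no_root_squash", "no_subtree_check", f"fsid={fsid}"]
    return f"{path} {client}({','.join(options)})"


if __name__ == "__main__":
    # Hypothetical junction path and primary-side gateway address.
    print(exports_line("/gpfs/gpfs_aws/afm_sec", "10.0.1.25", 101))
```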

Now let’s create some test data in the primary ATS cluster:
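One convenient way to generate test data is a small Python script that writes files of a known size and records their SHA-256 checksums, so the same data can later be verified on the secondary. The target directory, file count and file size are assumptions; the content is derived deterministically from a seed so that a run is reproducible.

```python
import hashlib
import os


def create_test_files(root, count=10, size=1024 * 1024, seed=b"afm-dr-poc"):
    """Write `count` files of `size` bytes under `root` and return a
    {file name: SHA-256 hex digest} manifest for later verification."""
    os.makedirs(root, exist_ok=True)
    manifest = {}
    for i in range(count):
        # Deterministic pseudo-content derived from the seed and file index.
        block = hashlib.sha256(seed + str(i).encode()).digest()
        data = (block * (size // len(block) + 1))[:size]
        name = f"testfile_{i:04d}.dat"
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest
```

For example, `create_test_files("/gpfs/gpfs_ats/afm_pri/testdata")` (a hypothetical path inside the primary fileset) creates ten 1 MiB files and returns their checksums.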

Check AFM-DR status on Primary:

Make sure AFM-DR mounted the Secondary fileset using NFS:

Check the AFM status on the Primary again after a few minutes and make sure it is Active:
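Rather than re-checking by hand, the status check can be wrapped in a polling loop. The sketch below is generic: `get_state` is any callable that returns the fileset's AFM state string, for example a hypothetical wrapper around `mmafmctl <filesystem> getstate -j <fileset>`. Parsing that command's output is left out of the sketch, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_state(get_state, wanted="Active", timeout_s=600, interval_s=30):
    """Poll get_state() until it returns `wanted`; raise TimeoutError otherwise.

    get_state is any zero-argument callable returning the AFM state string of
    the fileset (for example a hypothetical wrapper around
    `mmafmctl <filesystem> getstate -j <fileset>`).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == wanted:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state is still {state!r} after {timeout_s}s")
        time.sleep(interval_s)
```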

Make sure the data is being copied from Primary to Secondary.
ATS – Primary cluster:

AWS – Secondary Cluster:
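The comparison between the two clusters can be made more rigorous than listing directories: compute a checksum manifest on each side and compare them. This is a minimal sketch with illustrative function names. Point it at the test data directory rather than the fileset root, since AFM may keep internal directories such as .ptrash in the fileset, and copy one manifest to the other host (for example as JSON) before comparing.

```python
import hashlib
import os


def tree_manifest(root):
    """Map each file's path, relative to `root`, to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files are not loaded at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def compare_manifests(primary, secondary):
    """Return (missing_on_secondary, content_mismatches) as sorted path lists."""
    missing = sorted(set(primary) - set(secondary))
    mismatched = sorted(p for p in primary
                        if p in secondary and primary[p] != secondary[p])
    return missing, mismatched
```

Files still in flight will show up as missing until replication catches up, so repeat the comparison after the queue has drained.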

Verify that the data in the Secondary (AWS) cluster is read-only until a manual failover is initiated in a DR situation:
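The read-only check can also be scripted. The sketch below tries to create, and then remove, a uniquely named probe file. The exact error code an AFM-DR secondary returns is an assumption here, so the sketch treats EROFS, EPERM and EACCES as "not writable" and re-raises anything else, such as a missing directory.

```python
import errno
import os
import uuid


def is_writable(directory):
    """Return True if a probe file can be created (and removed) in directory.

    Assumption: a read-only AFM-DR secondary rejects the write with EROFS,
    EPERM or EACCES; any other error is re-raised rather than reported as
    read-only.
    """
    probe = os.path.join(directory, f".write_probe_{uuid.uuid4().hex}")
    try:
        fd = os.open(probe, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError as exc:
        if exc.errno in (errno.EROFS, errno.EPERM, errno.EACCES):
            return False
        raise
    os.close(fd)
    os.remove(probe)  # leave no trace in the fileset
    return True
```

Run it against the secondary junction path before the fail-over (expect False) and again afterwards (expect True).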

Now perform a manual failover from Primary to Secondary, as you would in a DR situation:
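The fail-over itself is run on the secondary cluster with `mmafmctl ... failoverToSecondary`. A sketch of the command line follows. The optional `--restore` flag (roll back to the last RPO snapshot instead of keeping the most recently replicated data) reflects IBM's documented syntax as far as is known here and should be confirmed with `man mmafmctl`. Filesystem and fileset names are hypothetical.

```python
import shlex


def failover_command(fs, fileset, restore_from_rpo_snapshot=False):
    """Command line that promotes the AFM-DR secondary fileset (run on AWS)."""
    argv = ["mmafmctl", fs, "failoverToSecondary", "-j", fileset]
    if restore_from_rpo_snapshot:
        # Assumed option: roll back to the last RPO snapshot rather than
        # keeping the most recently replicated data.
        argv.append("--restore")
    return argv


if __name__ == "__main__":
    print(shlex.join(failover_command("gpfs_aws", "afm_sec")))
```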

Make sure the DR filesystem is writable for applications:

Summary and Conclusion

AFM-DR presents a unique solution for customers who want to replicate their business-critical data to an off-site location as part of their DR strategy. Since AFM-DR is a server- and storage-agnostic solution, customers have many options to choose from for their DR environment, based on their IT budget and business requirements. As many enterprises leverage cloud-based solutions to minimize capital and operating expenses, cloud-based data replication solutions are in high demand. Although this cloud-based AFM-DR solution is not formally supported by IBM at present, IBM Spectrum Scale development teams are working to support it very soon. The Proof of Concept completed by The ATS Group shows that AFM-DR can be implemented successfully between on-prem and AWS Spectrum Scale clusters as a DR solution, allowing customers to leverage cloud-based services. Keep in mind that, although basic data replication with AFM-DR works, many limitations need to be addressed before implementing this solution in a production environment.

About ATS

As new tech emerges offering business advantages, enterprises need support and expertise that will enable them to reap the benefits. Based near Philadelphia, the ATS Group offers agile services aligned with modern IT innovations, providing a critical competitive edge. For almost 20 years, our consultants have worked together to provide independent and objective technical advice, creative infrastructure consulting and managed support services for organizations of all sizes. Our specialists help clients store, protect and manage their data, while optimizing performance and efficiency. The ATS Group specializes in server and storage system integration, containerized workloads, high performance computing (HPC), software-defined infrastructure, DevOps, data protection and storage management, cloud consulting, infrastructure performance management and real-time monitoring for cloud, on-premises and hybrid solutions. The ATS Group supports solutions from today's top IT vendors including IBM, VMware, Oracle, AWS, Microsoft, Cisco, Lenovo, Pure Storage and Red Hat. www.theatsgroup.com