August 2, 2018

Point of View: IBM Spectrum Scale 5.0 and Above – New Features and Functionality

By Prasad Surampudi | Sr. Systems Engineer (The ATS Group)

Up until a few years ago, only performance, reliability, and ease of administration were considered major factors when selecting a clustered file system for enterprise data storage. But as cloud-based systems and services gain more attention, companies are looking for a complete data management solution that can leverage cloud services to cost-effectively scale and support the explosive growth of data.

Today’s clustered file systems must be highly scalable across different server architectures, whether they reside on-premises, off-premises, or both. They must also be highly available, with minimal downtime during maintenance, and should scale across multiple tiers of storage. The file system should be able to archive legacy and less frequently accessed data to cost-effective storage, and it should protect, secure, and encrypt data against unauthorized access while providing fine-grained access control through ACLs.

Apart from the above, the file system should provide consistent throughput and IOPS across a wide variety of data, from small files to the very large data sets used by big-data and analytics applications. It should also support multiple data-access protocols, such as POSIX, NFS, CIFS, and Object.

IBM Spectrum Scale is a clustered file system that meets all of the above requirements. Spectrum Scale is a highly scalable, secure, high-performance file system for large-scale enterprise data storage. It is widely used across the world in financial services, healthcare, weather forecasting, genomics, and many other industries.

Spectrum Scale has a development history of more than twenty years, dating back to 1998. Earlier versions were known as General Parallel File System (GPFS); IBM rebranded GPFS as Spectrum Scale starting with version 4.1.1.

IBM has added many new features and functions with the goal of delivering a complete software-defined storage solution rather than just a clustered file system shared across several nodes. In 2017, IBM released Spectrum Scale 5.0 after a significant development effort to meet the performance and reliability requirements set forth by the US Department of Energy’s CORAL supercomputing project.

The purpose of this document is to briefly discuss some of the exciting new features and functions of Spectrum Scale 5.0 and to understand how they can be leveraged to meet today’s demanding business requirements.

New Features and Functionality

Let’s look at some of the new features of Spectrum Scale 5.0. They have been categorized based on the Spectrum Scale function group.

Core GPFS Functionality Changes

Variable Sub-Block Size
Earlier versions of Spectrum Scale (GPFS) used a fixed 32 sub-blocks per file system block, regardless of the block size. With Spectrum Scale 5.0 and above, the number of sub-blocks depends on the file system block size chosen.

File System Block Size         Number of Sub-blocks    Sub-block Size
64KiB / 128KiB / 256KiB        32                      2KiB / 4KiB / 8KiB
512KiB / 1MiB / 2MiB / 4MiB    64 / 128 / 256 / 512    8KiB
8MiB / 16MiB / 32MiB           512 / 1024 / 2048       16KiB

Starting with Version 5.0, if no block size is specified, Spectrum Scale file systems are created with a default file system block size of 4MiB and a sub-block size of 8KiB.
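As a minimal sketch of how this looks in practice (the device name fs1, stanza file path, and mount point are illustrative, not from this article):

    # Create the NSDs, then a file system with a 4MiB block size;
    # on 5.0 and above this yields 512 sub-blocks of 8KiB each.
    mmcrnsd -F /tmp/nsd.stanza
    mmcrfs fs1 -F /tmp/nsd.stanza -B 4M -T /gpfs/fs1

    # Verify the block size and the sub-blocks per full block
    mmlsfs fs1 -B
    mmlsfs fs1 --subblocks-per-full-block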

NSD Server Priority
The preferred (primary) NSD server of an NSD can now be changed dynamically, without unmounting the file system.
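For example, a hedged sketch of promoting a different server to primary for a single NSD (NSD and server names are illustrative):

    # /tmp/nsd_change.stanza reorders the server list; the first server
    # listed becomes the preferred (primary) NSD server for nsd1:
    #   %nsd: nsd=nsd1 servers=serverB,serverA

    # With 5.0 this change no longer requires unmounting the file system
    mmchnsd -F /tmp/nsd_change.stanza
    mmlsnsd -d nsd1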

File System Rebalancing
With version 5.0, Spectrum Scale uses a lenient round-robin algorithm, which makes rebalancing much faster than the strict round-robin method used in earlier versions.

File System Integrity Check
If a file system integrity check with the mmfsck command has been running for a long time, another instance of mmfsck can be launched with the --stats-report option to display the current status from all nodes running the check.
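A sketch of how the two instances might be used together (device name illustrative; confirm the exact option spelling against the mmfsck documentation for your code level):

    # Session 1: long-running, read-only integrity check
    mmfsck fs1 -n -v

    # Session 2: query the progress of the running check on all nodes
    mmfsck fs1 --stats-report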

Cluster Health
Spectrum Scale cluster health check commands have been enhanced with options to verify the health of file systems and of SMB and NFS nodes.
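For instance, the mmhealth command can be used along these lines (component names follow the mmhealth conventions; verify the options at your level):

    # Cluster-wide health summary, then per-component detail
    mmhealth cluster show
    mmhealth node show FILESYSTEM
    mmhealth node show SMB
    mmhealth node show NFS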

IBM Support
The mmcallhome command has a new --pmr option, which can be used to specify an existing PMR number for data upload.
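A hedged example of uploading collected data against an existing PMR (the file path and PMR number are placeholders; verify the exact mmcallhome syntax for your release):

    # Upload a gpfs.snap archive and associate it with an existing PMR
    mmcallhome run SendFile --file /tmp/gpfs.snap.out.tar.gz --pmr 12345,678,901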

Installation Toolkit

The Spectrum Scale installation toolkit was introduced with version 4.1, and many enhancements were made in Version 5.0. The toolkit now supports deploying protocol nodes in a cluster that uses IBM Elastic Storage Server (ESS), and it also supports configuring Call Home and File Audit Logging. Deployment of Ubuntu 16.04 LTS nodes as part of the cluster is also supported.

Encryption and Compression

The file compression feature was added in Spectrum Scale 4.2 and has been enhanced in Spectrum Scale 5.0 to optimize read performance. Local Read-Only Cache (LROC) can now be used for storing compressed files. Spectrum Scale 5.0 also simplifies IBM Security Key Lifecycle Manager (SKLM) configuration for file encryption.

Protocol Support

Starting with Spectrum Scale 4.1.1, data in IBM Spectrum Scale can be accessed using a variety of protocols, such as NFS, CIFS, and Object. The packaged Samba version has been upgraded to 4.0. Spectrum Scale 5.0 also supports the option to use a Unix primary group in AD, and NFS exports can now be modified dynamically without impacting connected clients. CES protocol node functionality is now supported on Ubuntu 16.04 and above.
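As an illustration of the dynamic NFS export changes (paths and client subnets are hypothetical; confirm the mmnfs options against the documentation):

    # Create an NFS export, then add a client definition on the fly;
    # already-connected clients are not disrupted.
    mmnfs export add /gpfs/fs1/projects --client "10.10.1.0/24(Access_Type=RW)"
    mmnfs export change /gpfs/fs1/projects --nfsadd "10.10.2.0/24(Access_Type=RO)"
    mmnfs export list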

File Audit Logging

Spectrum Scale File Audit Logging records all file operations, such as create, delete, and modify, in a central place. These logs can be used to track user access to the file system.

AFM

Files in AFM and AFM-DR filesets can now be compressed. Spectrum Scale 5.0 also improves load balancing across AFM gateways. Information Lifecycle Management (ILM) for snapshots is now supported for AFM and AFM-DR filesets, and AFM and AFM-DR filesets can be managed from the IBM Spectrum Scale GUI.

Transparent Cloud Tiering (TCT)

TCT now supports remotely mounted file systems: clients can access tiered files on a remotely mounted file system.

Big-Data Analytics

Spectrum Scale Big-Data Analytics is now certified with Hortonworks Data Platform 2.6 on both POWER8 and x86 platforms, and it is also certified with Ambari 2.5 for rapid deployment.

Spectrum Scale GUI Changes

The Spectrum Scale GUI was introduced in version 4.1, and IBM made significant upgrades to it in IBM Spectrum Scale 5.0. Call Home, monitoring of remote clusters, file system creation, and integration of Transparent Cloud Tiering are some of the significant features added in version 5.0.

Use Cases

With the enhancements included in version 5.0, Spectrum Scale has truly become an enterprise class file system for the modern cloud era. Let’s see how we can leverage some of the new features and functions.

File System Block Size

File system block size is a critical parameter that needs to be considered for optimal performance before a file system is created and loaded with large amounts of data. The wide range of file system block sizes and sub-block sizes offered by Spectrum Scale makes it possible to store files of different sizes in a single file system and still get good throughput and IOPS for various sequential and random workloads.

While a larger block size helps improve throughput performance, having a variable sub-block size and number of sub-blocks enables you to minimize file system fragmentation and use the storage effectively.

But keep in mind that only new file systems created with Spectrum Scale 5.0 and above can take advantage of the variable sub-block size enhancement.

NSD Server Priority Change

Today’s businesses expect their servers, storage, file systems, and applications to run with minimal downtime. Dynamically changing the NSD server priority for each NSD without unmounting the file system on all NSD servers helps minimize downtime in several scenarios, such as NSD server retirement.

File System Rebalancing

Network Shared Disks (NSDs) must be added or removed to expand or shrink a Spectrum Scale file system, and after an NSD is added or removed, the data needs to be rebalanced across all NSDs for optimal performance. Most of the time, system administrators let Spectrum Scale rebalance in the background rather than forcing a rebalance at the time the NSD is added or removed, and file system performance is not optimal until the NSDs are balanced. With Spectrum Scale 5.0, rebalancing runs faster because it uses lenient round-robin instead of strict round-robin.
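A minimal sketch of growing and then rebalancing a file system (device and stanza file names are illustrative):

    # Add new NSDs to the file system, then force a rebalance;
    # -b rebalances existing data across all disks, and with 5.0
    # it uses the faster lenient round-robin placement.
    mmadddisk fs1 -F /tmp/new_nsds.stanza
    mmrestripefs fs1 -b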

Protocol Support

Within IT organizations, it’s fairly standard for applications to run on a variety of platforms such as AIX, Linux, and/or Windows. Spectrum Scale support for industry-standard protocols like CIFS, NFS, and Object allows users on all of these platforms to access stored data in the most efficient way.

Protocol support enables businesses to consolidate all of their enterprise data into a global namespace with unified file and object access, avoiding multiple copies of data. With Spectrum Scale 5.0, customers can now configure servers running Ubuntu 16.04 as protocol nodes in addition to Red Hat Enterprise Linux.
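For example, the same directory can be exposed over both SMB and NFS (share name and paths are hypothetical; verify the mmsmb and mmnfs syntax for your release):

    # SMB share for Windows clients
    mmsmb export add projects /gpfs/fs1/projects

    # NFS export of the same data for AIX/Linux clients
    mmnfs export add /gpfs/fs1/projects --client "10.10.1.0/24(Access_Type=RW)"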

Call Home

Configuring the Call Home feature in Spectrum Scale 5.0 enables IBM to detect cluster and file system issues proactively and automatically send logs and other required data for timely resolution. This helps customers minimize downtime and improves the reliability of the cluster.

Installation Toolkit

With the installation toolkit, clusters can be configured and deployed seamlessly by defining the cluster topology in a more intuitive way. The toolkit performs all necessary pre-checks to make sure the required package dependencies are met, automatically installs the Spectrum Scale RPMs, creates the cluster, and configures Protocols, Call Home, File Audit Logging, and more. It simplifies the installation process and eliminates many manual configuration tasks.
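A typical toolkit flow looks roughly like this (host names and the installer node IP are illustrative; the spectrumscale command is run from the directory where the self-extracting package placed the toolkit):

    # Designate the installer node, describe the cluster, then install
    ./spectrumscale setup -s 10.0.0.10
    ./spectrumscale node add gpfs01 -a     # admin node
    ./spectrumscale node add gpfs02 -n     # NSD server
    ./spectrumscale node add ces01 -p      # protocol node
    ./spectrumscale callhome enable
    ./spectrumscale install                # install packages, create cluster
    ./spectrumscale deploy                 # deploy protocols on CES nodes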

Complex tasks such as balancing NSD servers, installing and configuring Kafka message brokers for File Audit Logging, and setting up Spectrum Scale AD/LDAP authentication are also much simpler with the installation toolkit.

Compression

With explosive rates of data growth, organizations are always looking for ways to reduce storage costs.

Spectrum Scale file compression, introduced in Version 4.2, addresses this need by compressing legacy and less frequently used data. File compression is driven by Spectrum Scale ILM policies and typically provides a compression ratio of about 2:1, and up to 5:1 in some cases. Compression not only reduces the amount of storage required, but also improves effective I/O bandwidth across the network and reduces cache (pagepool) consumption.

With Spectrum Scale 5.0, file compression supports the zlib and lz4 libraries. zlib is primarily intended for cold data, whereas lz4 is intended for active data and favors read-access speed over space savings.
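A sketch of both approaches (the policy text and paths are illustrative; verify the rule syntax in the ILM documentation):

    # Policy-driven: compress files not accessed for 180 days with zlib.
    # /tmp/compress.pol might contain a rule along these lines:
    #   RULE 'cold' MIGRATE COMPRESS('z')
    #     WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '180' DAYS
    mmapplypolicy fs1 -P /tmp/compress.pol -I yes

    # Per-file: favor read speed on active data with lz4
    mmchattr --compression lz4 /gpfs/fs1/data/active.dat
    mmlsattr -L /gpfs/fs1/data/active.dat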

Though regular file compression and Object compression use the same underlying technology, keep in mind that Object compression is available in CES environments, whereas file compression is available in non-CES environments.

Encryption

File encryption was introduced by IBM in Spectrum Scale version 4.1 and is available in the Spectrum Scale Advanced and Data Management editions only.

Data is encrypted at rest, and only data is encrypted, not metadata. Keep in mind that Spectrum Scale encryption protects against storage device misuse and attacks by unprivileged users, but not against deliberate malicious acts by cluster administrators.

Spectrum Scale 5.0 enables encryption of files stored on local read-only cache (LROC) disks and simplifies the SKLM configuration.
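Encryption is configured through policy rules that bind files to keys served by a remote key manager such as SKLM; a heavily hedged sketch follows (rule, key, and RKM names are illustrative):

    # /tmp/enc.pol might contain rules along these lines:
    #   RULE 'encRule' ENCRYPTION 'E1' IS
    #       ALGO 'DEFAULTNISTSP800131A'
    #       KEYS('KEY1:RKM_1')
    #   RULE 'encryptAll' SET ENCRYPTION 'E1' WHERE NAME LIKE '%'

    # Apply the encryption policy to the file system
    mmchpolicy fs1 /tmp/enc.pol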

Spectrum Scale encryption can be leveraged wherever organizations need to store PII, business-critical, or other confidential data. It can also be used by organizations that are required to meet federal and other security and compliance standards such as GDPR.

Spectrum Scale encryption is also certified as Federal Information Processing Standard (FIPS) compliant.

File Audit Logging

File Audit Logging was introduced in Spectrum Scale 5.0. File Audit Logging addresses the need to track the access of files for auditing purposes. It’s not an easy task to track individual file access in large scale clustered file systems with petabytes of data and billions of files that are accessed by hundreds of applications and thousands of users. Spectrum Scale File Audit Logging is designed to be highly scalable as the file system grows.

File Audit Logging does not need to be installed and configured on each and every node in the cluster, as some operating systems’ audit logging facilities require. It only needs to be configured on a minimum of three quorum nodes in the Spectrum Scale cluster and can be scaled out to other nodes as required.
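A hedged sketch of enabling auditing for one file system (device and fileset names are illustrative; the audit packages and message queue must already be in place):

    # Enable file audit logging; events land in a dedicated fileset
    mmaudit fs1 enable --log-fileset auditLog

    # Confirm which file systems are being audited
    mmaudit all list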

File Audit Logging also supports tracking of file access using NFS, CIFS and Object Protocols.

Addressing GDPR requirements using Spectrum Scale

IBM Spectrum Scale allows organizations to avoid multiple data islands, as it provides a single namespace for both structured and unstructured data. This provides a single point of control for protecting and managing all data that is subject to GDPR compliance.

Spectrum Scale Encryption helps to secure personal data while at rest to meet GDPR security requirements.

IBM Spectrum Scale supports industry-standard Microsoft AD and LDAP directory server authentication along with a rich set of ACLs to comply with GDPR Right of Access policies.

Active File Management (AFM)

Active File Management can be used to transfer and cache data over a WAN between two Spectrum Scale clusters: a home cluster that stores all the data, and a cache cluster that can cache either all of the home cluster’s data or only a subset of it. AFM can also be implemented as a disaster recovery solution with AFM-DR.
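A sketch of creating a cache fileset against a home export (cluster, host, and path names are hypothetical; verify the AFM attributes for your release):

    # Single-writer cache fileset backed by an NFS export at the home site
    mmcrfileset fs1 cache1 --inode-space new \
        -p afmTarget=nfs://homenode/gpfs/fshome/data1 \
        -p afmMode=single-writer
    mmlinkfileset fs1 cache1 -J /gpfs/fs1/cache1

    # Check the cache state and queued operations
    mmafmctl fs1 getstate -j cache1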

With Spectrum Scale 5.0, storing compressed data with Spectrum Scale file compression is supported in these filesets. Load-balancing improvements and ILM support for AFM and AFM-DR fileset snapshots have also been added.

Transparent Cloud Tiering (TCT)

As the name implies, Transparent Cloud Tiering is another way to reduce high-performance storage costs by transparently migrating aged data to less expensive cloud storage, making more room to ingest new data into the high-performance tier. Spectrum Scale ILM policies can be used to scan the file system metadata, identify files that have not been accessed for months or years, and tier them to cloud storage. Since only file data gets migrated, not metadata, the migration is transparent to users and applications. The data is pulled back from cloud storage to local file system storage when users or applications access it.
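A hedged sketch of driving tiering by hand, assuming TCT has already been configured with a cloud account and container (paths are illustrative):

    # Push a cold file to the cloud tier; a stub remains in the file system
    mmcloudgateway files migrate /gpfs/fs1/archive/2016/results.dat

    # Check its state, then recall it to local storage on demand
    mmcloudgateway files list /gpfs/fs1/archive/2016/results.dat
    mmcloudgateway files recall /gpfs/fs1/archive/2016/results.dat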

Spectrum Scale Transparent Cloud Tiering was introduced in Version 4.2 and is available with the Data Management edition only.

With Spectrum Scale 5.0, TCT supports file systems mounted from remote clusters. TCT-enabled filesets can now use different containers, and multiple cloud accounts and containers are also supported.

Big-Data Analytics

As companies started leveraging social media feeds and other unstructured data for business analytics, today’s file systems need to support both structured and unstructured data under a single namespace. Spectrum Scale introduced the File Placement Optimizer (FPO) architecture and a Hadoop plug-in starting with Version 4.1 to support big-data applications and frameworks. Later versions of Spectrum Scale enhanced Hadoop Distributed File System (HDFS) transparency. Spectrum Scale is certified on Hortonworks Data Platform (HDP) 2.6.5, provided by Hortonworks, a major big-data framework distributor.

Spectrum Scale also supports Ambari for easy and quick deployment in large-scale Hadoop clusters.

Spectrum Scale GUI Changes

Ease of deployment and administration is one of the major requirements of large clustered file systems that are deployed across hundreds of servers. The Spectrum Scale GUI, introduced in version 4.1, simplifies the installation, configuration, and administration of large-scale clusters. IBM has made many significant enhancements to the Spectrum Scale GUI to make it more intuitive for routine system administration and monitoring tasks.

The Spectrum Scale GUI can now monitor cluster performance, cluster health, individual node health, and SMB, NFS, and Object protocol health, among other enhancements. In certain cases, routine actions can be applied to fix errors with a simple click from the Spectrum Scale GUI.

The Spectrum Scale 5.0 GUI now supports monitoring of remote clusters and Transparent Cloud Tiering, and it also provides IBM Call Home support for cluster, node, and file system issues.

Summary

IBM Spectrum Scale continues to add and align features into a complete data management solution. This solution meets the market demand for a highly scalable solution across different server architectures, on-premises or in the cloud. It supports new-era big data and artificial intelligence workloads along with traditional applications while ensuring security, reliability, and high performance. The IBM Spectrum Scale solution also exhibits stability, maturity, and trustworthiness built over more than twenty years of development.

The Power of a Storage Platform: IBM FlashSystem 9100

This post originally appeared on IBM developerWorks and was written by Eric Stouffer.


Published on July 10, 2018 | Today, IBM Storage has announced the newest member of the IBM FlashSystem family – IBM FlashSystem 9100. Even at first glance, it has already impressed the experts:

“Today, enterprises are motivated and – if they want to optimize their results – required to be data driven within a multi-cloud deployment model,” said Mark Peters, Principal Analyst & Practice Director at ESG. “With its FlashSystem 9100, IBM is making it easier to optimize the value of data in terms of speed, availability, reliability, and resilience. At the same time, IBM has loaded its FlashSystem 9100 with sophisticated software for modern data protection, enhancing users’ ability to reuse secondary data, and supporting Docker and Kubernetes container environments.”

And when you take a closer look, FlashSystem 9100 continues to impress.

For decades, IBM has offered a range of high performance, ultra-low latency storage solutions.[1] Now, FlashSystem 9100 combines the performance of flash and the Non-Volatile Memory Express (NVMe) protocol with the reliability and innovation of IBM FlashCore technology and the rich feature set of IBM Spectrum Virtualize in one powerful new storage platform for your data driven multi-cloud enterprise.

The NVMe-optimized all-flash arrays are offered in two basic models – FlashSystem 9110 and FlashSystem 9150. The compact, 2U enclosures feature dual array controllers, dual power supplies, redundant cooling, and full hot swap capabilities. Both models have two Intel Skylake CPUs per array controller with the FlashSystem 9110 offering eight cores per CPU, while the 9150 model comes with 14 cores/CPU for higher throughput and performance. Essentially, two rack units of space can provide the performance and efficiency of over a terabyte of memory and up to 2 petabytes of effective storage – all moving at NVMe speeds to tackle even the most demanding real-time analytics or artificial intelligence (AI) applications and workloads.

A key innovation involves the transformation of IBM FlashCore technology into the industry standard 2.5” small form factor (SFF) FlashCore Modules (FCM). Twenty-four FCMs with NVMe interfaces can form the basis of the storage array. Remember that FlashCore technology refers to the IBM innovations that enable FlashSystem solutions to deliver consistent microsecond latency, extreme reliability, and a wide range of operational and cost efficiencies. FlashCore engineering includes the high-performance architecture and advanced flash management features such as Variable Stripe RAID technology, IBM-engineered error correction codes, and proprietary garbage collection algorithms.

Flexibility is built into the FlashSystem 9100 architecture. You can choose IBM FCMs offering 4.8 TB, 9.6 TB, and 19.2 TB 3D TLC capacity points, or you can opt for industry standard NVMe SFF drives. This means that effective capacities can grow into the 2-petabyte range in a single 2U enclosure, depending on the data set characteristics. In an industry standard 42U rack, the FlashSystem 9100 delivers the ability to cluster, scale out, or scale up capacity and performance up to 32 petabytes and up to 10 million IOPS. Plus, IBM FlashSystem 9100 arrays come ready to support NVMe over Fabrics, once you are ready to take this step, so that the solutions can extend their extreme low latency across entire storage area networks.

“The value and insights we gain from our data provide important advantages to our business. This means that high performance storage has always been a crucial part of our IT infrastructure. We have intended to move toward NVMe-based storage, but needed to keep deployment risks low,” said Tim Conley of The ATS Group and Galileo Performance Explorer. “The new IBM FlashSystem 9100 enables us to take advantage of NVMe technology and performance with minimal risk, thanks to the maturity and stability of the IBM FlashSystem platform. This new system is a big step forward for us.”

FlashCore technology is certainly impressive, but FlashSystem 9100 also provides the software-defined, modern data protection, and multi-cloud capabilities of several members of the IBM Spectrum Storage family. Chief among these is IBM Spectrum Virtualize, the system foundation that provides a broad set of enterprise-class data services – such as dynamic tiering, replication, FlashCopy management, data mobility, transparent cloud tiering, and high performance data-at-rest encryption, among many others. The arrays also leverage innovative new Data Reduction Pools (DRP) that incorporate deduplication and hardware-accelerated compression technology, plus SCSI UNMAP support and all the thin provisioning and data efficiency features you’d expect from IBM Spectrum Virtualize-based storage to potentially reduce your CAPEX and OPEX. Additionally, all these benefits extend to over 440 heterogeneous storage arrays from multiple vendors.

[Image: data efficiency from IBM Spectrum Virtualize-based storage]

IBM FlashSystem 9100 DRP technology can lead to substantial capital and operational cost savings. Many workloads should be able to realize 5:1 data reduction, or higher, which over a three-year period can potentially save your IT department nearly a million dollars in CAPEX and an astonishing five and a half million dollars in OPEX.[2]

Altogether, members of the IBM Spectrum Storage family – including IBM Spectrum Protect Plus Multi-Cloud starter for FlashSystem 9100, IBM Spectrum Copy Data Management Multi-Cloud starter for FlashSystem 9100, IBM Spectrum Virtualize for Public Cloud Multi-Cloud starter for FlashSystem 9100, and IBM Spectrum Connect – as well as IBM Storage Insights, come as standard components with every flexible, modern, agile, and blazingly fast IBM FlashSystem 9100 solution, while additional IBM software such as IBM Cloud Private or IBM Spectrum Access can easily be added either at the time of purchase or as a future upgrade.

The combination of FlashCore technology and IBM Spectrum Storage functionality has already drawn the attention and enthusiasm of IBM Business Partners.

“Our customers want the speed of NVMe storage, but that’s just the beginning. Our solutions must also provide the full range of cloud, container, and data protection capabilities,” said Lief Morin, CEO of IBM business partner Key Information Systems, A Converge Company. “That’s why this latest announcement from IBM is so important. It changes the landscape for enterprise data storage. Now, in one platform we can offer our customers everything they need to modernize and transform their storage infrastructure, their cloud and their data driven business.”

AI-enhanced, cloud-based system insights platform

And don’t think that a new storage system from IBM left the building without a full dose of Artificial Intelligence (AI). FlashSystem 9100 solutions come with Storage Insights, IBM’s enterprise grade, AI-enhanced, cloud-based system insights platform designed to help our customers better understand trends in storage capacity and performance and expedite resolution when support is required. The Storage Insights platform provides proactive best practices and uses AI-based analytics to help identify potential issues before they become problems. And if support is ever needed, Storage Insights helps speed resolution by simplifying opening tickets; automating log uploads to IBM; and providing configuration, capacity, and performance information to IBM technicians.

From the technology, engineering, and innovation perspectives, the new FlashSystem 9100 arrays have plenty to offer. But of course, it’s the view from the board room that matters most. For the data-driven, multi-cloud enterprise searching for information technology to simply accelerate their agility and competitive advantages, rather than constrain them, IBM FlashSystem 9100 is the solution.

[1] IBM System White Paper: Transforming real-time insight into reality, September 2017, TSW03555-USEN-00

[2] Metrics based on IBM research, using calculations made with preliminary pricing, a 75% discount on hardware, and a 30% discount on system maintenance (street pricing). Without data reduction: one FlashSystem 9150 controller, fully populated with 24 9.6 TB NVMe FCMs; four expansions, each fully populated with 24 7.68 TB SSDs; full software licenses and 24×7 support over 3 years; management cost per TB of $2,520 per year, according to Gartner; total capacity: 967.68 TB; total cost: CAPEX $1.17M + OPEX $7.35M. With data reduction: one FlashSystem 9150 controller, fully populated with 24 9.6 TB NVMe FCMs; no need for expansions; full software licenses and 24×7 support over 3 years; management cost per TB of $2,520 per year, according to Gartner; total capacity needed after data reduction technologies are applied: 230.4 TB; total cost: CAPEX $337K + OPEX $1.75M. Savings over the 3-year period: 75%.