The ATS Blog

IBM Follows New Recipe to Deliver Performance Gains with POWER9 Servers
August 24, 2018

By Blake Basom | Sr. Systems Engineer (The ATS Group)
IBM has been producing servers based on their line of Power processors for close to 30 years. These servers continue to improve with each new generation, and POWER9 is no exception. The difference this time derives from the ways in which IBM achieved those performance gains.
With POWER9, IBM continues to separate itself from the competition, and the new servers offer a host of improvements over their POWER8 predecessors. In fact, IBM claims that POWER9 servers offer the following advantages over x86 competitors:

  • 5x max I/O bandwidth vs. x86
  • 2x high performance cores vs. x86
  • 6x more RAM supported vs. x86
  • 8x more memory bandwidth vs. x86

In this document, I will delve into some of the more significant features and changes of the POWER9 servers and offer my opinion on the benefits. It is intended as an overview of POWER9’s new and enhanced features, and while it will be fairly detailed, it is not meant to be a deep dive into any specific technology. The features are grouped into the categories of Processor, Memory, and I/O, but first, here is an overview of the new server line.

POWER9 Server Line

IBM began their launch of POWER9 servers with the AC922 server in late 2017. This server is designed specifically for compute-heavy Artificial Intelligence (AI) and cognitive workloads, rather than for general computing. This system was the first to embed PCIe 4.0, Nvidia NVLink, and OpenCAPI technologies. As a result, IBM claims that the AC922 enables data to move 9.5 times faster than on PCIe 3.0 based x86 systems.
Servers designed for more general workloads began to launch in early 2018. IBM started with the “Scale Out” versions: one- and two-socket rack-mountable servers in sizes from 1U to 4U. The S914, S922, and S924 PowerVM based servers are in the traditional mold, supporting AIX, IBM i, and Linux workloads. The L922 server is a Linux-only model, while the H922 and H924 servers have been optimized for SAP HANA. Also offered are the LC921 and LC922 models, which are processor and storage dense servers designed for Linux clusters.
Finalizing the POWER9 server line, IBM announced the larger and more powerful “Scale Up” models in August 2018. These enterprise servers offer increased computing capability, along with enhanced security and availability, and simplified cloud management. The 4-socket E950 offers up to 48 processor cores and up to 16 TB of memory in a 4U package. Last, but far from least, the E980 represents the top of the POWER9 server line, offering up to 192 processor cores and 64 TB of memory.

Processor

The most obvious place to start when looking at the POWER9 servers is the POWER9 processor itself. In years past, performance improvements were achieved in part by improving the fabrication process to reduce transistor sizes, allowing the clock to run faster. While the fabrication process continues to improve and transistor sizes continue to shrink, server manufacturers are no longer greatly increasing clock speeds, so performance improvements must be achieved through other means. In the case of the POWER9 processor, the gains come primarily from improved processor pipeline efficiency, increased data flow between components, and faster access from external sources.
The POWER9 processor is fabricated with a highly advanced 14nm finFET Silicon-On-Insulator lithography process (using a 17-layer metal stack), an improvement over the 22nm process that was used for POWER8. This allowed IBM to pack a total of 8 billion transistors into each chip, compared to 4.2 billion in POWER8. Clock speeds run up to 4 GHz, which is similar to POWER8.
The POWER9 chip is a more modular design, and performance was improved by shortening the pipeline, improving fixed-point and floating-point operations, and improving instruction management. These changes allow more instructions to be completed per clock cycle, leading to performance improvements without raising the clock speed. Increasing the amount of on-chip memory (particularly L3 cache) helps as well, and on-chip switching bandwidth of over 7 TB/s allows data to move in and out of the processor cores at 256 GB/s in the SMT8 model.
You may be thinking, “SMT8 model? Aren’t all POWER9 chips SMT8, as the POWER8 chips were?” Actually, no. IBM is producing two main variants of the POWER9 processor – the PowerVM based general purpose servers will use a full SMT8 processor (which allows up to 8 threads per core), while certain non-PowerVM based Linux models will use SMT4 versions of the POWER9 processors (which only allow up to 4 threads per core). This may sound like a step backward, but it is a result of IBM listening to its customers and partners. Basically, IBM learned that a segment of the Linux market desired the reduced SMT version, which allowed more cores to be packaged in a single chip.  In fact, the SMT4 versions will allow up to 24 cores per die, while the SMT8 models only offer up to 12 cores.
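One way to see why the SMT4 variant is not really a step backward is to multiply out the numbers. The short Python sketch below uses only the core and thread counts quoted above; the dictionary layout and the function name are my own, chosen purely for illustration.

```python
# Maximum cores per die and threads per core, as quoted in this article.
VARIANTS = {
    "SMT4": {"max_cores": 24, "threads_per_core": 4},
    "SMT8": {"max_cores": 12, "threads_per_core": 8},
}

def max_hardware_threads(variant: str) -> int:
    """Return the maximum number of hardware threads per die for a variant."""
    spec = VARIANTS[variant]
    return spec["max_cores"] * spec["threads_per_core"]

for name in VARIANTS:
    print(name, max_hardware_threads(name))
# SMT4 96
# SMT8 96
```

Both variants top out at the same 96 hardware threads per die. The difference lies in how those threads are grouped into cores, which matters for workloads (and licenses) that are counted per core.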
Speaking of SMT4 vs. SMT8, what is the best multi-threading mode in which to run the new processors? When IBM introduced SMT8 with POWER8 processors, there were some performance problems initially. The problems weren’t necessarily severe, but running in SMT8 mode didn’t always deliver much additional processing power, and in some cases IBM recommended running POWER8 servers in SMT4 mode. This issue has seemingly been fixed in POWER9, with SMT8 being the preferred mode for most applications, offering a distinct performance boost over running in SMT4 mode (under most circumstances).
IBM also introduced Workload Optimized Frequency with POWER9, where the processor can dynamically change clock speeds based on the running workload. This allows for energy savings when the workload is low, with the ability to quickly ramp up when needed. This feature can be controlled through processor mode settings and can be changed without a reboot.
All of that sounds nice, but what does it really mean? How much faster are the POWER9 processors? Well, of course it varies by server model and workload, but in general you can expect a 30-50% improvement over comparable POWER8 models, along with 20-30% improvement in price/performance ratio (more bang for your buck).
Note that when migrating workloads from POWER8 to POWER9, you will likely want to reduce the number of virtual CPUs (VCPUs), which may improve performance while also reducing software licensing costs. Each case will be unique, so testing a specific workload with different numbers of VCPUs will reveal the optimal allocation. Likewise, running tests in both SMT4 mode and SMT8 mode will show which threading mode is best.
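As a rough illustration of how the results of such tests might be compared, here is a minimal Python sketch. Everything in it is an assumption on my part: the function name, the 5% tolerance, and especially the throughput figures, which are invented placeholders and not measurements from any real server. The idea is simply to pick the smallest VCPU allocation whose throughput is close to the best observed.

```python
from typing import Dict, Tuple

# A configuration is (number of VCPUs, SMT mode).
Config = Tuple[int, int]

def pick_allocation(results: Dict[Config, float], tolerance: float = 0.05) -> Config:
    """Pick the configuration with the fewest VCPUs whose throughput is
    within `tolerance` of the best result (ties go to higher throughput)."""
    best = max(results.values())
    good_enough = [cfg for cfg, tput in results.items()
                   if tput >= best * (1 - tolerance)]
    return min(good_enough, key=lambda cfg: (cfg[0], -results[cfg]))

# Placeholder numbers for illustration only (e.g. transactions per second).
measurements = {
    (8, 4): 900.0,
    (8, 8): 1000.0,
    (6, 4): 820.0,
    (6, 8): 970.0,
    (4, 8): 800.0,
}

print(pick_allocation(measurements))
# (6, 8)
```

With these made-up figures, six VCPUs in SMT8 mode come within 5% of the best result, so the sketch prefers that allocation over eight VCPUs. Your own measurements and your own tolerance for lost throughput will drive a different answer.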

Memory

The POWER9 servers use top-of-the-line DDR4 memory (some of the later POWER8 models used this as well). The SMT4 processor models allow for direct attached memory DIMMs, while the SMT8 versions allow more memory to be attached, via buffers. The SMT4 models offer up to 120 GB/s of sustained memory bandwidth, while the SMT8 models offer up to 230 GB/s of sustained bandwidth with theoretical peaks of 340 GB/s. Memory capacity varies by model, up to 4 TB for Scale Out models and up to 64 TB for Scale Up models.

I/O

Most modern servers are not self-contained, meaning they are connected to external devices for storage, networking, and increasingly for hardware acceleration devices. With the blazing speeds of current processors and memory, the computing bottleneck has shifted to Input/Output devices. IBM has spent a lot of effort in this area with the POWER9 servers, offering a number of options to improve the speed and bandwidth to attached devices.
The latest edition of PCIe (Gen4) is available in POWER9 servers, offering up to twice the bandwidth of PCIe Gen3 (note that Gen3 adapters will work in Gen4 slots, albeit with the Gen3 bandwidth). The 48 lanes of PCIe Gen4 add up to 192 GB/s of duplex bandwidth to attached devices. In addition to traditional PCIe adapters for network and SAN connectivity, some PCIe Gen4 slots are enabled for CAPI 2.0 devices such as ASICs and FPGAs. CAPI 2.0 using PCIe Gen4 offers 4x the bandwidth of CAPI 1.0 on POWER8.
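For readers who want to see where the 192 GB/s figure comes from, the arithmetic can be sketched in a few lines of Python. PCIe Gen4 signals at 16 GT/s per lane with 128b/130b encoding. The function name and parameters below are my own, and the sketch ignores protocol overhead beyond line encoding, so treat it as a back-of-the-envelope check, not a throughput prediction.

```python
def pcie_duplex_gb_per_s(lanes: int, gt_per_s: float = 16.0,
                         encoding: float = 1.0) -> float:
    """Aggregate bandwidth in GB/s, counting both directions.

    gt_per_s -- raw signalling rate per lane (16 GT/s for PCIe Gen4)
    encoding -- payload fraction after line encoding (1.0 = ignore it)
    """
    per_lane_one_way = gt_per_s * encoding / 8  # bits -> bytes
    return lanes * per_lane_one_way * 2         # two directions

raw = pcie_duplex_gb_per_s(48)
effective = pcie_duplex_gb_per_s(48, encoding=128 / 130)

print(f"raw: {raw:.0f} GB/s, after 128b/130b encoding: {effective:.1f} GB/s")
# raw: 192 GB/s, after 128b/130b encoding: 189.0 GB/s
```

The headline number is the raw signalling rate (2 GB/s per lane in each direction); the payload rate after line encoding is slightly lower.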
Additional connectivity is provided by a 25 Gb/s Common Link: 48 lanes provide up to 300 GB/s of bandwidth for devices attached via NVLink 2.0 or OpenCAPI 3.0 (not available on PowerVM based servers). NVLink can be used for high-speed GPU attachment, while OpenCAPI is an upcoming open hardware standard, supported by a consortium of industry heavyweights, that will be used to connect components like high-speed network and SAN adapters, as well as additional memory and GPU accelerators.
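The 300 GB/s figure can be checked the same way. The sketch below assumes, as I read it, that the number counts both directions of the link and ignores any encoding or protocol overhead; the variable names are mine.

```python
LANES = 48
GBIT_PER_S_PER_LANE = 25  # signalling rate per lane, per direction

# Convert gigabits to gigabytes, then count both directions.
one_way_gb_per_s = LANES * GBIT_PER_S_PER_LANE / 8
duplex_gb_per_s = one_way_gb_per_s * 2

print(one_way_gb_per_s, duplex_gb_per_s)
# 150.0 300.0
```

In other words, 150 GB/s flows in each direction when all 48 lanes are in use.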
POWER9 provides support for the next generation of SR-IOV Ethernet adapters, with increased port speeds of 10Gb, 25Gb, 40Gb, and 100Gb. Additional enhancements allow more virtual functions (VFs) per port (the target is 60 VFs per port and 120 VFs per adapter for 100Gb adapters), as well as vNIC and vNIC failover support for Linux.
Server I/O performance is also improved by the on-chip acceleration capabilities of the POWER9 processors themselves, which speed up the common but intensive tasks of compression/decompression and encryption/decryption.
Some POWER9 servers also support internal Non-Volatile Memory Express (NVMe) devices. These bootable disks are meant primarily for operating systems, offering high-speed access with low latency, and are intended for read-mostly use.

Conclusion

When you put it all together, it is clear that IBM put an emphasis on overall server performance with their line of POWER9 servers, rather than just trying to crank out the fastest processor that they could. By focusing on I/O enhancements, and partnering with great companies across the industry, they have achieved some impressive results.  But they didn’t forget about the processor either – the POWER9 processor improved upon an already industry leading standard.  From general purpose Scale Out servers, all the way up to the enterprise class Scale Up servers, IBM has provided a robust line of servers to meet the UNIX computing needs of users around the globe. And with certain models customized for specific technologies, users can expect optimized performance for their specific needs. As a longtime user, administrator, and consultant for IBM Power servers, I think that POWER9 represents another impressive step forward for IBM, offering endless possibilities for world class computing.



Written by Cindy Hollenbach