IBM z Systems Performance Guide
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.
BigInsights DFSMSdss FICON* IMS RACF* System z10* zEnterprise*
BlueMix DFSMShsm GDPS* Language Environment* Rational* Tivoli* z/OS*
CICS* DFSORT HyperSwap MQSeries* Redbooks* UrbanCode zSecure
COGNOS* DS6000* IBM* Parallel Sysplex* REXX WebSphere* z Systems
DB2* DS8000* IBM (logo)* PartnerWorld* SmartCloud* z13 z/VM*
DFSMSdfp
* Registered trademarks of IBM Corporation
The following are trademarks or registered trademarks of other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
Java and all Java based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
OpenStack is a trademark of OpenStack LLC. The OpenStack trademark policy is available on the OpenStack website.
TEALEAF is a registered trademark of Tealeaf, an IBM Company.
Windows Server and the Windows logo are trademarks of the Microsoft group of countries.
Worklight is a trademark or registered trademark of Worklight, an IBM Company.
UNIX is a registered trademark of The Open Group in the United States and other countries.
* Other product and service names might be trademarks of IBM or other companies.
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any
user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload
processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have
achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to
change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the
performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g., zIIPs, zAAPs, and IFLs) ("SEs"). IBM
authorizes customers to use IBM SE only to execute the processing of Eligible Workloads of specific Programs expressly authorized by IBM as specified in the “Authorized Use Table for IBM
Machines” provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html (“AUT”). No other workload processing is authorized for execution on an SE. IBM offers SE
at a lower price than General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the
AUT.
Introduction / Motivation
Hardware/Software co-optimization is increasingly important to performance
– Performance gains from technology scaling have ended
– Hardware performance gains are coming from design
• Micro-architectural innovation (and complexity)
• New instructions and architected features
– Coding practices and software exploitation needed to get the full value of the hardware
More efficient code helps everybody
– Increases value of software
• Extract the maximum useful work from the hardware
– Increases value of z Systems platform
• Solutions delivered more cost-effectively
– Decreases effective cost for end user
Goal of this session: Motivate you to make performance a priority
– Can only scratch the surface in 45 minutes
– Highlight a few high-leverage areas
– Point you to resources available to assist with optimization
Compilers
Compilers on z Systems
IBM continues to invest in the compiler portfolio on z:
– Increased focus on application program performance in recent years
– Continued advancements in languages and operating systems
• Java / JIT, C/C++, COBOL, PL/I, Linux, z/OS
Enterprise COBOL for z/OS V5.2
• Leverages SIMD instructions to improve processing of certain COBOL statements
• Increased use of DFP instructions for Packed Decimal data
• Supports COBOL 2002 language features: SORT and table SORT statements
• Allows applications to access new z/OS JSON services
• Up to 14% reduction in CPU time*
Enterprise PL/I for z/OS V4.5
• Critical business language: committed to invest in leading-edge technology
• Shipped a new release every year since 1999
• Fully supports z/Architecture, including z13 & z13s processors
• Provides full support for JSON (Parse, Generate, and Validate)
• Up to 17% reduction in CPU time*
z/OS V2.2 XL C/C++
• Optional feature of z/OS 2.2
• Provides system programming capabilities with Metal C option
• Fully supports z/Architecture, including z13 & z13s processors
• Ships with high-performance math libraries tuned for z13
• Up to 24% increase in throughput*
XL C/C++ for Linux on z Systems V1.2
• New compiler based on Clang and IBM optimization technology
• Fully supports z/Architecture, including z13 & z13s processors
• Provides easy migration of C/C++ applications to System z
• Up to 14% increase in performance over GCC*
* The performance improvements are based on internal IBM lab measurements. Performance results for specific applications will vary, depending on the source code, the compiler options specified, and other factors.
© 2016 IBM Corporation
IBM z Systems 2016 NY NaSPA Chapter
Cache design
On-chip shared L3
– Shared by all cores on the CP chip
– Now also the sharing point for I-L2 and D-L2
– Re-entrant code
– Any LE-based compiler generated code
– Dynamic run-time code
Problematic Examples
– True self-modifying code
[Figure: z13 On-chip Cache Hierarchy. Local/private caches per core (I and D L1, I and D L2) and shared caches (L3)]
…But updates from multiple cores => lines bounce around among caches
– Depending on locations of cores, added access latency can be troublesome
– Need to manage well to get good performance
True sharing – real-time sharing among multiple SW threads / processes
– Atomic updates, Software locks
– Higher nWay (concurrent SW threads), more frequent access => more care needed
– If contested in real-time, can lead to “hot-cache-line” situations
False sharing – structures / elements in same cache line
– Can be avoided by separating structures into different cache lines
Cache hit location             Latency (no queuing)   Intervention overhead (if a core owns the line exclusive)
L1                             4                      NA
L2                             ~10                    NA
L3 (on-chip)                   35+                    40+
L3 (on-node)                   180+                   20+
L3 (off-drawer, far column)    700+                   20+
[Figure: Cache topology and latencies for z13. Labels: processor node (N), CP chips (8 cores; L1, L2, L3), SC chips (L4+NIC), XBus, SBus, inter-node topology]
Optimizer
Other Resources
Like this stuff? There’s lots more available:
Microprocessor Optimization Primer
– Available in the LinuxONE community on IBM developerWorks
• https://www.ibm.com/developerworks/community/groups/community/lozopensource
Thank you!