CMG - Unix Server Sizing
Performance Professionals
The Computer Measurement Group, commonly called CMG, is a not for profit, worldwide organization of data processing professionals committed to the
measurement and management of computer systems. CMG members are primarily concerned with performance evaluation of existing systems to maximize
performance (eg. response time, throughput, etc.) and with capacity management where planned enhancements to existing systems or the design of new
systems are evaluated to find the necessary resources required to provide adequate performance at a reasonable cost.
This paper was originally published in the Proceedings of the Computer Measurement Group’s 2000 International Conference.
Copyright 2000 by The Computer Measurement Group, Inc. All Rights Reserved. Published by The Computer Measurement Group, Inc. (CMG), a non-profit
Illinois membership corporation. Permission to reprint in whole or in any part may be granted for educational and scientific purposes upon written application to
the Editor, CMG Headquarters, 151 Fries Mill Road, Suite 104, Turnersville , NJ 08012.
BY DOWNLOADING THIS PUBLICATION, YOU ACKNOWLEDGE THAT YOU HAVE READ, UNDERSTOOD AND AGREE TO BE BOUND BY THE
FOLLOWING TERMS AND CONDITIONS:
License: CMG hereby grants you a nonexclusive, nontransferable right to download this publication from the CMG Web site for personal use on a single
computer owned, leased or otherwise controlled by you. In the event that the computer becomes dysfunctional, such that you are unable to access the
publication, you may transfer the publication to another single computer, provided that it is removed from the computer from which it is transferred and its use
on the replacement computer otherwise complies with the terms of this Copyright Notice and License.
Copyright: No part of this publication or electronic file may be reproduced or transmitted in any form to anyone else, including transmittal by e-mail, by file
transfer protocol (FTP), or by being made part of a network-accessible system, without the prior written permission of CMG. You may not merge, adapt,
translate, modify, rent, lease, sell, sublicense, assign or otherwise transfer the publication, or remove any proprietary notice or label appearing on the
publication.
Disclaimer; Limitation of Liability: The ideas and concepts set forth in this publication are solely those of the respective authors, and not of CMG, and CMG
does not endorse, approve, guarantee or otherwise certify any such ideas or concepts in any application or usage. CMG assumes no responsibility or liability
in connection with the use or misuse of the publication or electronic file. CMG makes no warranty or representation that the electronic file will be free from
errors, viruses, worms or other elements or codes that manifest contaminating or destructive properties, and it expressly disclaims liability arising from such
errors, elements or codes.
General: CMG reserves the right to terminate this Agreement immediately upon discovery of violation of any of its terms.
Learn the basics and latest aspects of IT Service Management at CMG's Annual Conference - www.cmg.org/conference
Buy the Latest Conference Proceedings and Find Latest Computer Performance Management 'How To' for All Platforms at www.cmg.org
Bob Chaney
Delta Technology, Inc.
Join over 14,000 peers - subscribe to free CMG publication, MeasureIT(tm), at www.cmg.org/subscribe
also be applied to uniprocessor SPEC ratings.

From the unix vendors we have the MHZ rating of the CPU, which gives a good uniprocessor metric but does little in the way of providing the relative capacities of different SMP configurations. The MHZ rating is the CPU’s clock speed, and ...

While only representing one system capacity metric (CPU clock speed), MHZ can still be a useful relative sizing measurement and certainly is no more risky than the aforementioned tpmC benchmark. The tpmC benchmark and CPU MHZ ratings discussed in this section will be used later in the paper to create SMP scaling tables for various unix systems.

Geometric Scaleup

$$\mathrm{ECPU} = \frac{1 - \Phi^{P}}{f}$$

Amdahl’s Scaleup

$$\mathrm{ECPU} = \frac{P}{1 + f(P-1)}$$

Quadratic Scaleup

$$\mathrm{ECPU} = P - fP(P-1)$$

Where:
P = number of physical CPUs
f = seriality factor
Φ = 1 - f, the value that reproduces the geometric column of Table 1

The geometric algorithm models CPU-CPU communications as a “token-passing” method, where each CPU communicates with the one next to it.

Amdahl’s algorithm assumes a “broadcast method” of CPU-CPU communication, where one CPU broadcasts information to all other CPUs in the complex.

Seriality Factor

The most subjective part of all three algorithms is the seriality/overhead factor (denoted in the above calculations as “f”). This factor, not easily measured or estimated in commercial systems, represents the percentage of the workload that is serial in nature and will not benefit linearly as CPUs are added to the configuration. In other words, it is the portion that cannot be parallelized. Using a 3% seriality factor would produce the following scaleup chart.
[Chart 1: Productive CPUs (y-axis) versus Physical CPUs (x-axis, 1 to 32) at a 3% seriality factor]
The x-axis represents the number of physical CPUs in a server while the y-axis is the number of effective CPUs, which takes into account the increasing SMP overhead experienced as physical CPUs are added. The quadratic algorithm actually projects negative capacity growth at greater than 18 CPUs, assuming a loss of productive CPUs to overhead functions at that level and beyond.

It turns out the 3% seriality factor provides a very good approximation of CISC (Complex Instruction Set Computing) systems like OS/390. Table 1 shows the data for 1 - 10 CPUs included in the chart above. The data in this table was created by applying the three scaling algorithms, then averaging the three scaleup factors.
No. CPUs   Geometric Scaleup   Amdahl Scaleup   Quadratic Scaleup   Average
1          1.000               1.000            1.000               1.000
2          1.970               1.942            1.940               1.951
3          2.911               2.830            2.820               2.854
4          3.824               3.670            3.640               3.711
5          4.709               4.464            4.400               4.524
6          5.568               5.217            5.100               5.295
Table 1
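The three scaleup formulas and the averaging step can be sketched in a few lines of Python. This is a minimal illustration rather than the author's tooling: the function names are hypothetical, and the geometric form assumes Φ = 1 - f, which is the reading that reproduces the geometric column of Table 1.

```python
# Minimal sketch of the three SMP scaleup formulas (names are illustrative).

def geometric(p, f):
    """Geometric scaleup, assuming phi = 1 - f: (1 - phi**p) / f."""
    phi = 1.0 - f
    return (1.0 - phi ** p) / f

def amdahl(p, f):
    """Amdahl scaleup: p / (1 + f * (p - 1))."""
    return p / (1.0 + f * (p - 1))

def quadratic(p, f):
    """Quadratic scaleup: p - f * p * (p - 1)."""
    return p - f * p * (p - 1)

def scaleup_row(p, f):
    """Return (geometric, amdahl, quadratic, average) effective CPUs."""
    g, a, q = geometric(p, f), amdahl(p, f), quadratic(p, f)
    return g, a, q, (g + a + q) / 3.0

f = 0.03  # 3% seriality factor used for Table 1
for p in range(1, 7):
    g, a, q, avg = scaleup_row(p, f)
    print(f"{p:2d}  {g:6.3f}  {a:6.3f}  {q:6.3f}  {avg:6.3f}")

# CPU count (1 to 32) at which the quadratic curve is highest for this f.
quadratic_peak = max(range(1, 33), key=lambda p: quadratic(p, f))
print("quadratic curve peaks at", quadratic_peak, "CPUs")
```

Changing `f` to another value regenerates the whole table, which is how the 2% table later in the paper can be produced from the same functions.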
By using the factors from Table 1 and the MIPS ratings from Cheryl Watson’s CPU chart [Watson 1999], we can determine which SMP curve best fits the 9672-xx6 family of CMOS machines. Table 2 shows how each SMP scaling table matches with the CW MIPS rating. The uniprocessor MIPS are identical and are taken directly from the CW CPU chart. Each configuration thereafter is estimated by the appropriate scaling factor.
# OF CPUs   MODEL      GEOMETRIC   AMDAHL   QUADRATIC   AVERAGE   CW MIPS RATING
1           9672-T16   128.5       128.5    128.5       128.5     128.5
2           9672-T26   253.1       249.5    249.3       250.6     243.2
3           9672-R36   374.1       363.7    362.4       366.7     351.5
4           9672-R46   491.3       471.6    467.7       476.9     454.2
5           9672-R56   605.1       573.7    565.4       581.4     549.4
6           9672-R66   715.4       670.4    655.4       680.4     638.8
7           9672-R76   822.5       762.3    737.6       774.1     720.1
8           9672-R86   926.3       849.6    812.1       862.7     795.1
9           9672-R96   1027.0      932.7    878.9       946.2     865.0
10          9672-Rx6   1124.7      1011.8   938.1       1024.9    929.3
Table 2
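One way to check which curve tracks the published ratings is to scale the uniprocessor MIPS by each formula and compare the result with the CW MIPS column. The sketch below uses the ratings from Table 2; the mean absolute error is an assumption chosen for illustration, since the paper does not name a goodness-of-fit measure, and all identifiers are hypothetical.

```python
# Sketch: compare each 3% scaling curve with the CW MIPS ratings in Table 2.
CW_MIPS = [128.5, 243.2, 351.5, 454.2, 549.4, 638.8, 720.1, 795.1, 865.0, 929.3]
UNI_MIPS = CW_MIPS[0]  # uniprocessor rating (9672-T16)
F = 0.03               # CISC baseline seriality factor

MODELS = {
    "geometric": lambda p: (1.0 - (1.0 - F) ** p) / F,
    "amdahl": lambda p: p / (1.0 + F * (p - 1)),
    "quadratic": lambda p: p - F * p * (p - 1),
}

def mean_abs_error(scale):
    """Average absolute gap, in MIPS, between a curve and the CW ratings."""
    gaps = [abs(UNI_MIPS * scale(p) - mips)
            for p, mips in enumerate(CW_MIPS, start=1)]
    return sum(gaps) / len(gaps)

errors = {name: mean_abs_error(fn) for name, fn in MODELS.items()}
best_fit = min(errors, key=errors.get)

for name, err in errors.items():
    print(f"{name:10s} mean abs error = {err:6.1f} MIPS")
print("closest curve:", best_fit)
```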
We find the quadratic model most closely approximates the published MIPS rating, and can see a close relationship between the calculated quadratic scaling model and the CW MIPS ratings in Chart 2 below.
[Chart 2: Productive MIPS (y-axis, 0 to 1000) for the QUADRATIC estimate and the CW MIPS RATING, by model from 9672-T16 through 9672-Rx6]
We can make the two curves almost identical by applying a more exact seriality factor of 3.075%, but we’ll use 3% as the baseline seriality factor for CISC systems.

The purpose of this paper, however, is to estimate SMP capacity for RISC (Reduced Instruction Set Computer) systems. The primary difference between RISC and CISC systems is how the CPU works. RISC CPUs take short, simple instructions and do them very quickly, issuing many more instructions than a CISC CPU. CISC systems use longer, more complex instructions. As a result, CPU features like out-of-order instruction execution or parallel execution are more difficult to implement on CISC machines [Patterson 1980]. Therefore it is feasible and probable that RISC systems, by virtue of their short, simple instructions, will have a lower seriality factor than CISC systems. The ideal RISC CPU will complete an instruction in one CPU clock cycle, allowing for higher degrees of parallelization and lower SMP overhead at high physical CPU numbers. For the above reasons, a 2% seriality factor will be used to estimate the SMP capacity of unix systems. Chart 3 shows the 2% chart.
[Chart 3: Effective CPUs (y-axis) versus Number of Physical CPUs (x-axis, 1 to 32) for the Geometric Scaleup, Amdahl Scaleup, Quadratic Scaleup and Average curves at a 2% seriality factor]
No. CPUs   Geometric Scaleup   Amdahl Scaleup   Quadratic Scaleup   Average
1          1.000               1.000            1.000               1.000
2          1.980               1.961            1.960               1.967
3          2.940               2.885            2.880               2.902
4          3.882               3.774            3.760               3.805
5          4.804               4.630            4.600               4.678
6          5.708               5.455            5.400               5.521
Table 3
The first step is to estimate the uniprocessor TPM by dividing the tpmC benchmark by the appropriate geometric factor. For the HP V2250, the calculation would be:

... while at the same time assumes the benchmark was established with the lowest possible seriality level. Given the objective of the benchmark effort, namely to produce the highest possible ... appear only in the geometric column (at the appropriate CPU number, in this case 16), with all other scaling factors returning lower TPM estimates. The complete scaling tables for each HP server appear next.
HP 9000 V2250
CPUs   Geometric   Amdahl   Quadratic   Average
1      3774        3774     3774        3774
2      7472        7400     7397        7423
3      11097       10886    10869       10950
4      14648       14241    14190       14360
5      18129       17471    17360       17653
6      21541       20585    20379       20835
7      24884       23586    23247       23906
8      28160       26483    25964       26869
9      31370       29280    28530       29727
10     34517       31982    30945       32481
11     37600       34593    33210       35134
12     40622       37120    35323       37688
13     43583       39564    37285       40144
14     46486       41931    39097       42505
15     49330       44225    40757       44771
16     52117       46447    42267       46944
Table 4
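The construction of Table 4 can be sketched as follows: divide the benchmark result by the geometric factor at the benchmarked CPU count to get a uniprocessor TPM, then multiply that figure by each scaling factor. The 52,117 tpmC at 16 CPUs is read from the geometric column of Table 4, the 2% seriality factor is the one chosen for unix systems, and the function names are illustrative.

```python
# Sketch: rebuild rows of the V2250 table from one benchmark result.
F = 0.02               # RISC seriality factor used for Table 3
BENCH_TPMC = 52117     # V2250 result, taken from the 16-CPU geometric cell
BENCH_CPUS = 16

def geometric(p, f=F):
    return (1.0 - (1.0 - f) ** p) / f

def amdahl(p, f=F):
    return p / (1.0 + f * (p - 1))

def quadratic(p, f=F):
    return p - f * p * (p - 1)

# Step 1: uniprocessor TPM = benchmark tpmC / geometric factor at 16 CPUs.
uni_tpm = BENCH_TPMC / geometric(BENCH_CPUS)

def tpm_row(p):
    """Estimated TPM at p CPUs: [geometric, amdahl, quadratic, average]."""
    factors = [geometric(p), amdahl(p), quadratic(p)]
    factors.append(sum(factors) / 3.0)
    return [round(uni_tpm * s) for s in factors]

print("uniprocessor TPM:", round(uni_tpm))
for p in (1, 2, 8, 16):
    print(p, tpm_row(p))
```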
HP N4000
CPUs   Geometric   Amdahl   Quadratic   Average
1      6608        6608     6608        6608
2      13084       12957    12952       12997
3      19430       19062    19031       19174
4      25650       24936    24846       25144
5      31745       30593    30397       30911
6      37718       36044    35683       36482
7      43571       41300    40705       41859
8      49308       46372    45463       47048
Table 5
HP 9000 V2500
CPUs   Geometric   Amdahl   Quadratic   Average
2      8486        8403     8400        8430
3      12601       12362    12343       12435
4      16635       16172    16114       16307
5      20588       19841    19714       20048
6      24462       23376    23142       23660
7      28258       26785    26399       27148
The last table points out the huge difference in estimated TPMs at higher CPU configurations. Depending on how well both the application(s) and operating system scale, the V2500 could generate a high of 102,023 tpmC or a low of 52,113 estimated TPM. It also indicates that extreme caution should be taken when using any SMP scaling tables. Of particular note is the difference between the N4000 and the V2500 at 8 CPUs. Referring to the TPC web site we see that both systems used PA-RISC 8500 440MHZ CPUs, with the V2500 having 32 CPUs and the N4000 having 8 CPUs. If the scaling factors are relatively close, then why does the V2500 at 8 CPUs have 31,979 estimated TPMs while the N4000 actually generated 49,308?

We have to scratch a little beneath the surface to better understand the tpmC benchmark. While both systems were running HP-UX 11.0, they were on different Extension Packs. The N benchmark used Sybase Adaptive Server 11.9.3 while the V used version 12.0. The N benchmark server had 16GB of memory to the V’s 32GB, so there was more memory per CPU on the N (16/8 for the N, 32/32 for the V), and both servers were fed by 14 client machines (HP C3000). One could argue that more clients should have been used for the 32-CPU V benchmark than were used for the 8-CPU N. All this proves is that relying on the TPC benchmark requires caution and understanding.
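The V2500 figures quoted above follow from the same scaling arithmetic, as this short sketch shows. It assumes the 2% seriality factor and derives the V2500 uniprocessor TPM from the 32-CPU benchmark result; the variable names are illustrative.

```python
# Sketch: reproduce the V2500 estimates discussed above (2% seriality).
F = 0.02

def geometric(p, f=F):
    return (1.0 - (1.0 - f) ** p) / f

def quadratic(p, f=F):
    return p - f * p * (p - 1)

V2500_TPMC, V2500_CPUS = 102023, 32   # published 32-CPU result
N4000_TPMC, N4000_CPUS = 49308, 8     # published 8-CPU result

v2500_uni = V2500_TPMC / geometric(V2500_CPUS)
v2500_at_8 = v2500_uni * geometric(8)      # best-case estimate at 8 CPUs
v2500_low_32 = v2500_uni * quadratic(32)   # worst-case estimate at 32 CPUs

# Memory per CPU in GB, from the benchmark configurations cited in the text.
memory_per_cpu = {"N4000": 16 / 8, "V2500": 32 / 32}

print("V2500 estimated TPM at 8 CPUs:", round(v2500_at_8))
print("N4000 measured tpmC at 8 CPUs:", N4000_TPMC)
print("V2500 quadratic estimate at 32 CPUs:", round(v2500_low_32))
print("GB of memory per CPU:", memory_per_cpu)
```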
... provides us with a CPU MHZ SMP scaling table. In each case, the uniprocessor CPU MHZ is provided by the computer vendor. We simply multiply the uniprocessor MHZ by the SMP scaling factor to arrive at ... MHZ CPU clock cycle and the SMP scaling factors from Table 3 produces the 440 MHZ SMP SCALING table.

Once again we see that at 32 CPUs, the V2500 could rate at 10,475 estimated total CPU MHZ or 5,350 estimated total CPU MHZ, depending on the scaling curve. Another way to use the scaling factors in Table 3 is to choose one method and apply the scaling factors to a number of different CPU MHZ ratings. In the following table we’ll use the AVERAGE scaling factor column to create the CPU MHZ TABLE USING AVERAGE SCALING FACTORS.
CPUs   100 MHZ   180 MHZ   200 MHZ   240 MHZ   400 MHZ   440 MHZ   550 MHZ
1      100       180       200       240       400       440       550
2      197       354       393       472       787       865       1082
3      290       522       580       696       1161      1277      1596
4      381       685       761       913       1522      1674      2093
5      468       842       936       1123      1871      2058      2573
6      552       994       1104      1325      2208      2429      3036
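The MHZ table is a straightforward product of clock speed and scaling factor, which the sketch below illustrates. It assumes the 2% seriality factor and the average of the three curves; the helper names are hypothetical.

```python
# Sketch: build CPU MHZ rows from the average 2% scaling factor.
F = 0.02
CLOCKS_MHZ = [100, 180, 200, 240, 400, 440, 550]

def geometric(p, f=F):
    return (1.0 - (1.0 - f) ** p) / f

def amdahl(p, f=F):
    return p / (1.0 + f * (p - 1))

def quadratic(p, f=F):
    return p - f * p * (p - 1)

def average_factor(p, f=F):
    """Mean of the geometric, Amdahl and quadratic scaleup factors."""
    return (geometric(p, f) + amdahl(p, f) + quadratic(p, f)) / 3.0

def mhz_row(p):
    """Estimated total CPU MHZ at p CPUs for each uniprocessor clock speed."""
    return [round(mhz * average_factor(p)) for mhz in CLOCKS_MHZ]

for p in range(1, 7):
    print(p, mhz_row(p))

# Range for a 32-CPU, 440 MHZ server: geometric (high) and quadratic (low).
high_32 = round(440 * geometric(32))
low_32 = round(440 * quadratic(32))
print("32 CPUs at 440 MHZ:", high_32, "to", low_32)
```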
The above table provides a quick way to see the relative capacity of various CPU configurations using the same scaling factor and CPU metric.

Practical Uses

Whatever tables you use, estimated TPM or CPU MHZ can be used to compare different RISC system configurations between the same or different vendors’ models. They can also be used to quantify not only the success of an upgrade but also the relative accuracy of the scaling factor. For example, by using the previous MHZ table we can estimate the utilization results of a CPU upgrade. If a V2500 containing 16 440 MHZ CPUs is experiencing peak CPU utilization of 99% with a single workload, and the server is upgraded to 32 CPUs, we can calculate the new peak utilization as:

$$\text{New peak util.} = \frac{\text{Base peak util.}}{\text{32-CPU factor} \,/\, \text{16-CPU factor}} + \text{latent demand}$$

The factors are taken from Table 3, and for calculation purposes the latent demand is estimated at 10%.
For each of the scaling possibilities the calculation is:
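The worked figures are not reproduced here, so the following is a minimal sketch of the formula above rather than the paper's own numbers. It assumes the 2% seriality factor, and it assumes the 10% latent demand is added as percentage points of utilization; both the names and that interpretation are illustrative.

```python
# Sketch: estimate post-upgrade peak utilization for a 16 -> 32 CPU upgrade.
F = 0.02
BASE_PEAK_UTIL = 99.0   # percent, measured on the 16-CPU server
LATENT_DEMAND = 10.0    # assumption: added as percentage points

MODELS = {
    "geometric": lambda p: (1.0 - (1.0 - F) ** p) / F,
    "amdahl": lambda p: p / (1.0 + F * (p - 1)),
    "quadratic": lambda p: p - F * p * (p - 1),
}

def new_peak_util(scale, old_cpus=16, new_cpus=32):
    """Base peak util. / (new factor / old factor) + latent demand."""
    capacity_ratio = scale(new_cpus) / scale(old_cpus)
    return BASE_PEAK_UTIL / capacity_ratio + LATENT_DEMAND

estimates = {name: new_peak_util(fn) for name, fn in MODELS.items()}
for name, util in estimates.items():
    print(f"{name:10s} estimated new peak utilization: {util:5.1f}%")
```

Under these assumptions the quadratic estimate comes out above the original 99%, which is the "no improvement" case.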
By measuring the peak CPU utilization after the upgrade we can tell which scaling curve would most apply to this server/workload combination by how close the actual comes to the estimate. As we see, if the workload scales close to the quadratic model, we would see no improvement at all! If the actual utilization is much higher/lower than these estimates then we’ll need to revisit the 2% seriality factor used to create Table 3. If the actual post-upgrade utilization is much lower (better) than these estimates, then a lower seriality factor should be used. If much higher, then the workload has more serial instructions and a higher seriality factor would be in store. Applying this technique is meant to be a constant iterative effort, once again with caution taken at every step. However, given the wide range of possibilities, it should be possible to construct an appropriate SMP scaling table for your workload/server configurations.

Summary

The SMP scaling tables created in this paper can be a quick-sizing tool for estimating the capacity of unix systems. However, once again we must caution against blindly relying on the metrics. The calculations allow for customized tables for each company’s configuration, with the stipulation that other proof of the concept be applied, e.g., verifying the pre and post upgrade utilization against the tables. Using the tables for management presentations has the advantage of simplicity and ease of use. The table construction process and the concept of SMP overhead can be easily explained and readily charted for any management presentation. For a more detailed explanation of the underlying concepts summarized in this paper, please refer to the references noted below.

References

[TPC 1999] TPC Benchmark C Standard Specification Rev. 3.5, October 25, 1999, Pg. 71

[Gunther 1998] Gunther, Neil J., The Practical Performance Analyst, McGraw-Hill, 1998

[Watson 1999] Cheryl Watson’s CPU Chart, Watson Walker, Inc., December 1999

[Patterson 1980] D.A. Patterson, D.R. Ditzel, “The Case for the Reduced Instruction Set Computer”, Computer Architecture News, October 1980

www.tpc.org, web site of the Transaction Processing Performance Council

www.spec.org, web site of the Standard Performance Evaluation Corporation