
HNAS Performance Data Collection & Analysis

January 2013

Gokula Rangarajan
File Services Competency Center,
Performance Measurement Group, Technical Operations

HDS employees and TrueNorth™ Partners only. NDA required for customers.
Technical Operations
The Hitachi Storage Advantage !
© 2011 Hitachi Data Systems
Agenda

Objective & expectations

HNAS performance data collection – Introduction

Prerequisites

What to collect?

Commands and tools

How to collect?

Analysis

Summary

Goal & expectations
• The goal of this presentation is to explain the HNAS performance
information collection procedures and to provide analysis guidance.
• Performance troubleshooting is hard. It takes a lot of time and
practice/experience.
• HNAS PIR analysis is complex and I am not aware of anyone
who can analyze the entire contents within a PIR. We will
always need extra hands.
• Eliminate the potential/known issues first.
• Before performing in depth PIR analysis, eliminate any storage
system side issues first.
• The information in this presentation is for educational purposes
only. Let the GSC do the expert review of all the available
information.
• Not all of the data presented in this deck is from the same PIR.
HNAS performance data collection - Introduction

•There are a couple of ways to collect performance information from an HNAS
system.
•One can run the RUSC script (preinstalled in the SMU CLI), use the GUI
performance graphs (current and some historical data), or collect a Performance
Information Report (PIR for short).
•While RUSC and the performance graphs can be used to understand the performance
status of a system, they cannot be used to troubleshoot a performance issue. They
will, however, meet most customer/POC requirements in terms of performance
monitoring.
•This is where the PIR comes into play. When combined with additional storage
system performance data, it is possible to identify bottlenecks and troubleshoot a
performance issue using a PIR.
•Remember that the statistics reported by PIR/PFM alone do not tell the whole
story. One has to correlate the data collected from these commands with the
hardware platform, hardware configuration, type of workload etc.
•Also remember to collect the information below from all the cluster nodes.
Prerequisites

• See the big picture.
• Know the customer infrastructure and the issue in detail.
• Entire head vs. specific FS, all vs. few clients, all vs. specific protocol, all vs.
specific client OS, all vs. specific protocol version, all vs. specific end
switches etc.
• Do SAN/storage analysis: bad drive, rebuild in progress, cable failure, CTL failure,
switch failure etc.
• Study recent changes: sudden vs. recurring issue, consistent vs. time-specific,
one-off vs. reproducible etc.
• Collect as much configuration-specific information as possible.

What information should you collect?

Below are some of the key performance-related questions the GSC typically asks:

1) Provide a description of the symptom / problem [application, latency, performance, symptoms, are all
applications / clients affected, what is expected performance / behavior etc.]:
2) Is the problem reproducible?
3) What have you done thus far to troubleshoot this issue?
4) Is this client specific? Or across multiple clients?
5) Which OS is the client(s) running? [e.g. SW version and patch level ]
6) When was the problem first noticed?
7) What changes have been made to the infrastructure?
8) What protocols are being used (NFS, CIFS, iSCSI, FTP)? If NFS, what mount options are being used? If
iSCSI which initiator is being used?
9) Is Anti-Virus configured on the Bluearc / HNAS ?
10) What is the network topology between the affected client and BlueArc server?
11) Please provide port configuration showing flow-control and port counters for the ingress and egress
ports involved for each switch / router involved.
12) Can you gather a network trace from the client? Please provide a description of the associated IP's
and what was happening during the trace.
13) Is there any additional relevant information we should be aware of?
14) After all PIRs have completed, please gather the system diagnostics and, if FS related, the storage
diagnostics.
RUSC
•The RUSC (Ruby Statistics Client) command reports aggregated HNAS
performance information for a given window at a high level.
•It queries the stats server on the HNAS (the same way the GUI does) and reports
the data.
•The only prerequisite is Ruby, which is preinstalled on the SMU. The script can be
copied over to any client system.
•Stats are organized into groups: basic, protocol, disk etc.
•It can output in CSV format:
•./rusc --period 5 --header-every 20 --timestamp --csv --groups basic,protocol,disk 192.0.2.2 >filename.csv
•The above command options will capture information related to ops/sec,
Network Megabits/sec, FC Megabits/sec, protocol level ops/sec, disk latency
etc.
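Since stats should be collected from every cluster node, the rusc invocation above can be wrapped in a small loop. This is a sketch only: the node IPs are placeholders for your own admin/node addresses, and it assumes rusc is in the current directory as on the slide.

```shell
#!/bin/sh
# Sketch: run rusc against each cluster node in parallel and write one
# CSV per node. The IPs below are placeholders -- substitute the
# addresses of your own cluster nodes.
NODES="192.0.2.2 192.0.2.3"

for node in $NODES; do
    ./rusc --period 5 --header-every 20 --timestamp --csv \
        --groups basic,protocol,disk "$node" > "rusc-${node}.csv" &
done
wait    # keep collecting on all nodes until interrupted
```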

What information to look for in RUSC
command output?

•Ops: Overall ops/sec at the HNAS


•MMB Ld: Load on the MMB (Mercury Motherboard)
•MFB Ld: Load on the MFB (Mercury FPGA Board)
•Eth Rx, Eth Tx: The amount of data in megabits per second received from or transmitted to the
network for write or read operations. Divide by 8 for MB/sec.
•Rx (write workload) compared to Tx (read workload) indicates the type of activity and the
related workload.
•In the above example, the peak network in was about ~850MB/s compared to a peak network
out of ~6MB/s. This indicates that the workload is write intensive. This data can also be used to
identify network or node limitations.
•For example, a 3090 without Performance Accelerator can only sustain 850MB/s of writes and
reaches system limitations at that rate.
•Ops iSCSI/NFS/CIFS: The number of iSCSI/NFS/CIFS operations per second (ops/sec).
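The divide-by-8 rule above can be checked with a one-liner; the 6,800 Mbit/s input is an arbitrary illustrative value, not a figure from the PIR.

```shell
# Convert a megabits-per-second counter (as reported by rusc) to MB/s.
# 6800 Mbit/s is a made-up example value.
awk 'BEGIN { mbit = 6800; printf "%.0f MB/s\n", mbit / 8 }'
# -> 850 MB/s
```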
•FC Rx, FC Tx: The amount of data in megabits per second transferred between the HNAS and
the backend storage system for read or write operations. Divide by 8 for MB/sec.
•Rx (reads from the storage system) compared to Tx (writes to the storage system) indicates the
type of activity and the related workload.
•In the above example, the peak FC out was about ~953MB/s compared to no FC in. This
indicates that the workload to the storage system is all writes. This data can also be used to
identify FC or storage limitations.
•FC Tx can be higher than Eth Rx because of the Superflush phenomenon.
•Remember the Tachyon limitations from the previous presentations? Be aware of the peak write
and read rates (MB/s) per HNAS FC port. For example, a 3090-G1 without Performance
Accelerator can only sustain 160MB/s of writes per FC port and reaches system limitations at
that rate.
•Dsk Rd Lt, Dsk Wr Lt, Dsk St Lt: The disk latency in milliseconds, broken down by reads,
single buffer writes and stripe writes (Superflush). More details in the PIR section.
Performance Information Report (PIR)
•The PIR (Performance Information Report) command gathers detailed performance statistics from
the server over a specified period, and then sends them out via email (can also be extracted using
SSC).
•A file system of interest should be specified on the command line, so that the performance
statistics will relate to the cluster node that is hosting that file system.
•Use the cn or pn prefix to the command to force the PIR to run on a particular node, or make sure
to migrate the Admin EVS to the correct node (the one hosting the FS in question).
•PIR collects several files, including performance statistics, the event log, configuration information etc.
•"pir -f fsname -r [email protected] -s subject 10" will capture performance statistics over a 10
minute window on a specific file system and email them out.
•If the FS in question resides on node 3, use "cn 3 pir -f fsname -r [email protected] -s subject 10".
•If SMTP is not configured/available, capture the PIR from the SMU CLI and save it to the SMU, then
extract the file from the SMU and email it out. From the SMU CLI, run: ssc -u supervisor -p
supervisor 192.0.2.x "pir -f fsname --to-ssc 10"
•When troubleshooting an issue that occurs across two cluster nodes, capture a PIR from both
nodes. Open two SMU CLI sessions. In session 1, run: ssc -u supervisor -p supervisor 192.0.2.x "pir
-f fsname --to-ssc 10", replacing 192.0.2.x with the private IP of node 1. In session 2, run: ssc -u
supervisor -p supervisor 192.0.2.y "pir -f fsname --to-ssc 10", replacing 192.0.2.y with the private IP
of node 2.
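The two-session procedure above can also be scripted so that both captures cover the same window. A minimal sketch, assuming the ssc syntax shown on this slide; the private IPs are placeholders.

```shell
#!/bin/sh
# Sketch: capture a 10-minute PIR from both cluster nodes at once,
# instead of using two interactive SMU CLI sessions. The IPs are
# placeholders for the private addresses of node 1 and node 2.
NODE1=192.0.2.1
NODE2=192.0.2.2

ssc -u supervisor -p supervisor "$NODE1" "pir -f fsname --to-ssc 10" &
ssc -u supervisor -p supervisor "$NODE2" "pir -f fsname --to-ssc 10" &
wait    # both PIRs run over the same 10-minute window
```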
What files are included in PIR?
•PIR collects several files, including performance statistics, the event log, configuration
information etc. The set varies between the Titan and Mercury platforms and changes (with
more files added) in each major release. Below are some of the files on the Mercury
(3080/3090) platform.
•loggedstatistics.csv
•A comma-separated list of various, mainly performance-related, metrics gathered over the
PIR duration (and some over a 24-hour period). The first file to look at for an overview
of what is happening.
•command-output.txt
•The second file to examine, containing the most granular performance data.
•showall.txt
•The third file to look at. A collection of diagnostic and configuration summary command line
outputs.
•old-statistics.csv
•Historical samples from prior to the PIR sampling period.
•old-command-output.txt
•Historical samples from prior to the PIR sampling period.
What files are included in PIR?
•eventlog.txt
•A high-level list of events that occurred on the node, spanning reboots. Each
entry is tagged with an Event ID and a severity level.
•trouble.txt
•The output from the trouble fault reporters (performance reporters are not
included)
•vlsishowall.txt
•Contains the output from the vlsishowall command – includes the combined
output of nim-info, fsm-info and sim-info.
•scsidiags.txt
•A point in time summary of attached SCSI devices by port

•registry.tgz
•Machine-readable configuration files (not directly editable)
•current_dblog.txt
•In-depth log of what was occurring on the Mercury.
What information to look for in PIR? –
loggedstatistics.csv
• Operations received per second: ops/sec
received by this physical node.
• Forwarded ops/sec: ops/sec forwarded by other
nodes to this physical node.
• Ops/sec: ops/sec processed on this physical
node (own ops/sec and forwarded by other
nodes)
• Operations forwarded per second: ops/sec
forwarded by this node to other physical nodes.
• SMB: SMB (typically by Windows 2003 and
below) ops/sec
• SMB2: SMB2 (typically by Windows 2008 and
above) ops/sec
• NFS/iSCSI/FTP: Similar ops/sec data broken
down by different protocols.

What information to look for in PIR? –
loggedstatistics.csv
• MMB Load: Processor busy rate on the MMB. Will
typically be busy for CIFS, iSCSI, FTP, NFSv4, and
dedupe type workloads. >90% busy is a concern.
• MFB load: Busy rate of the busiest FPGA on the MFB.
>90% busy is a concern.

• FC Rx, Tx: The amount of data in megabits per second
transferred between the HNAS and the backend storage
system for read or write operations. Divide by 8 for
MB/sec.
• Rx (reads from the storage system) compared to Tx
(writes to the storage system) indicates the type of
activity and the related workload.
• In the above example, the peak FC out was about
~1GB/s compared to ~450MB/s FC in. This indicates
that the workload to the storage system is about 70%
writes and 30% reads. This data can also be used to
identify FC or storage limitations.
• Remember the Tachyon limitations from the previous
presentations? Be aware of the peak write and read
rates (MB/s) per HNAS FC port.
• For example, a 3090-G2 without Performance
Accelerator can only sustain 360MB/s of writes or
400MB/s of reads per FC port and reaches system
limitations at that rate. In this case we are handling
about 1.45GB/s of data at the FC layer, pretty close
to node limitations.
What information to look for in PIR? –
loggedstatistics.csv
• Disk read latency (ms): Average read latency across all read
requests, measured from the time a request leaves the HNAS
until it returns.
• Disk write latency: Average latency of single buffer
(non-Superflushed) writes.
• Disk stripe write latency: Average latency of multi buffer
(Superflushed) writes.
• The latency of Superflushed writes will always be higher
(because of the bigger I/O sizes). More details later.
• Eth Rx, Eth Tx: The amount of data in megabits per second
received from or transmitted to the network for write or read
operations. Divide by 8 for MB/sec.
• Rx (write workload) compared to Tx (read workload)
indicates the type of activity and the related workload.
• In the above example, the peak network in was about
~810MB/s compared to a peak network out of ~6MB/s. This
indicates that the workload is write intensive. This data can
also be used to identify network or node limitations.
• For example, a 3090 without Performance Accelerator can
only sustain 850MB/s of writes, and we almost reached
system limitations at that rate. Consider using Performance
Accelerator in such scenarios.
• HSSI: Cluster interconnect data.
• NVRAM waited allocs: The number of times NVRAM memory
was not available when it was needed. A good indicator of
storage contention.
• In this case we were already at node limitations, so this is
expected.
What information to look for in PIR? –
loggedstatistics.csv
• nibrx_seeq_rx_bytes_last_second: Bytes of data
received over the last second on this interface.
• nibrx_seeq_tx_bytes_last_second: Bytes of data
transmitted over the last second on this
interface.
• nibrx_seeq_rx_bytes_last_second_max: Highest
number of bytes of data received on this
interface (since the PIR start time).
• In a 3090, the 6x1GbE interfaces are identified
here as 0-5. The 2x10GbE interfaces are
identified here as 6 and 7.
• The data here shows that both 10GbE
interfaces were equally loaded and balanced.
One port peaked at about 510MB/s and the
other port at about 460MB/s.

What information to look for in PIR? –
loggedstatistics.csv

• Bossock fibers: Number of active processes on the FS
layer. Defaults up to 384 (varies by software release) and
can be extended to a maximum of 512. The GSC is the
deciding authority.
• NFS write/read length average: Average NFS write and
read I/O size. From 8.1, HNAS supports a maximum of
64KB for NFSv3. Bigger transfer sizes like 32/64KB result
in more MB/s.
• Busy_clocks_last_second_percentage: Busy rate of the
various FPGAs.
• Si_Busy_clocks_last_second_percentage: A good
indicator for recommending Performance Accelerator.
Systems with values over 90% will benefit from it.
• Obj_root_onode_cache_page: The root onode is the
topmost block of an object. The more we cache it, the
better the performance. TFS might help in <90% hit
environments.

What information to look for in PIR? –
loggedstatistics.csv
• FC read/write ops/sec: ops/sec at the FC layer; the overall
IOPS that the storage system will see from the HNAS.
• Current disk requests: Overall number of disk requests across
the entire node (includes both reads and writes).
• Current disk reads: Overall disk read requests.
• Current disk writes: Overall disk write requests.
• Device current disk reads: Number of disk read requests on a
particular SD (LUN) (max = 32).
• Device current disk writes: Number of disk write requests on a
particular SD (LUN) (max = 32).
• Queued disk reads/writes: Overall number of queued disk
requests. For reads, values of <100 per SD are acceptable;
>100 per SD results in higher read latency.
• For writes, it's tricky. Drive characteristics and RAID type play
a major role. In general, a few hundred queued writes per SD
are acceptable; thousands are a definite concern.
• More queued writes result in more NVRAM waited allocs and
thus write smoothing.
• Si_tach_bytes_read_last_sec: Bytes of data received from the
storage system over the last second on this FC interface.
• Si_tach_bytes_written_last_sec: Bytes of data written to the
storage system over the last second on this FC interface.
• Tach 0 1 2 3 refers to the 4 FC ports in a 3090/3080.
• Remember the per port FC limits? Use that knowledge to see if
we are reaching per port limits in the PIR.
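The <100-queued-reads-per-SD rule of thumb lends itself to a quick filter. A sketch; the sd,queued_reads pairs below are made-up sample values standing in for numbers pulled from the PIR.

```shell
# Flag SDs whose queued read count exceeds the ~100-per-SD rule of
# thumb. The "sd,queued_reads" pairs are made-up sample data.
printf '%s\n' 'sd0,12' 'sd1,240' 'sd2,95' 'sd3,310' |
awk -F, '$2 > 100 { printf "%s: %d queued reads (high)\n", $1, $2 }'
# -> sd1: 240 queued reads (high)
# -> sd3: 310 queued reads (high)
```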
What information to look for in PIR? –
command-output.txt
•Command-output is a text file that contains more granular performance information.
•Tools like Notepad++ and WordPad work well.
•There's a table of contents at the end of the file. Use the TOC to look for specific items.
•It displays the CLI commands that it used internally to collect the output. Some of
those commands can be run manually without the DEV password.
•A lot of the data in this file can be understood only by Developers and maybe
Engineering, so don't feel bad.
•Depending on the software release version, the data is arranged in this order:
•Trouble
•NIM software
•NIM stats
•FSM software
•FSM stats
•FSM protocols
•FSM Protocol errors
•FSM vlsi
•FSM per-file-system info
•Sim software
•Sim storage
•Cluster stats

What information to look for in PIR? –
command-output.txt

•The output from the trouble fault reporters (performance specific) is displayed in this section.
•It highlights the counters/issues that stand out from the others.
•Any counters with much higher or lower values than normal are highlighted here.
•For example, it says the "read-ahead" tuning is disabled, but we are issuing several read requests. In such occasions, turning on read-ahead
could help.
•Another example is read/modify/write. For example, an application writes 32KB files but later updates a part of the file (like 4KB or 8KB)
very frequently. This creates read/modify/writes and affects write performance.
•Another example is the write and read size. It says that most of the client writes are about 16KB, but it considers >30KB to be a good value.
•Some of the FPGAs in this example are about 90% busy.
What information to look for in PIR? –
command-output.txt

•This section displays the Ethernet statistics broken down by individual ports.
•It contains both throughput information and Ethernet related issues.
•Look for jabbers, collisions, CRC errors, and packets dropped counters for
potential hardware and network issues.

What information to look for in PIR? –
command-output.txt

•This section displays the TCP statistics broken down by active aggregates.
•It contains both packet information and TCP related issues.
•Keep an eye on RSTs and RETXs.
•Retransmissions (RETX), even at 1%, can create performance issues in
sequential workload environments.
•Run netstat -st on the client side and see who's creating trouble.

What information to look for in PIR? –
command-output.txt

•This section displays the NFS performance statistics broken down by different NFS operations at the network layer.
•The samples here refer to the ops across the whole PIR capture window (like 10 minutes).
•It reports the ops for different NFS operations, plus average and peak latency.
•A good indicator of what kind of NFS workload the HNAS is handling.
•In this example, it is handling a lot of NFS metadata operations.
•The response times observed here are what the clients should expect to see as well.
•Look for similar information for the CIFS protocol as well.
What information to look for in PIR? –
command-output.txt

•This section displays the amount of data transferred by the NFS clients.
•The samples here refer to the ops across the whole PIR capture window (like 10 minutes).
•It reports the ops for different NFS operations, the amount of read data, write data, average I/O size and maximum
I/O size.
•A good indicator of how much data the HNAS is handling during a given window.
•Look for similar information for the CIFS protocol as well.
•In this example, it received (write workload) about 24GB of data from the clients and transferred (read workload)
about 11GB of data back to the clients.
•The max= values should be 64KB (at least 32KB). Check the NFS mount options and make sure the clients are using
the correct NFS mount options.
•Average write/read size is application dependent. Smaller read or write sizes (like 4KB) make the workload random
at the HNAS, resulting in more ops/sec but less MB/s.
•Larger I/O sizes (like 32KB) make the workload mostly sequential (assuming the FS is not fragmented and the
storage is well configured) at the HNAS, resulting in fewer ops/sec but more MB/s.
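On a Linux client, the transfer sizes discussed above correspond to the rsize/wsize NFS mount options. A minimal sketch; the server name, export path and mount point are placeholders, not values from this deck.

```shell
# Sketch: mount an HNAS NFS export with 64KB transfer sizes.
# "hnas-evs", "/export" and "/mnt/hnas" are placeholder names.
mount -t nfs -o vers=3,rsize=65536,wsize=65536 hnas-evs:/export /mnt/hnas

# Verify what the client actually negotiated:
grep /mnt/hnas /proc/mounts
```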

What information to look for in PIR? –
command-output.txt

•This section displays the per connection (per client) TCP statistics, for up to 50 connections.
•Identify rogue client systems from this section. One or a few clients may be performing many more operations
than the others and affecting overall performance. This is the only place (or command) to identify such clients.
•If you are working on a POC with performance tests, make sure all the clients send a uniform workload to the HNAS
for optimal performance.
•This section also displays the connections with the highest error counts, like duplicate ACKs, retransmitted packets
etc. Check the netstat -st command output on those client systems and troubleshoot the issue.

What information to look for in PIR? –
command-output.txt

•This section displays the network statistics at the software/FPGA layer.
•A good place to identify jumbo packets and any other possible network issues.

What information to look for in PIR? –
command-output.txt

•MAC_PAUSE is a good indicator of flow control issues.
•Make sure the switches have flow control enabled for both Tx and Rx.
•There is no specific setting on the HNAS to turn off flow control; it simply
acknowledges flow control requests.
•"ethtool -A ethxx rx on tx on" will enable flow control on Linux clients.
•Some switches that have 1GbE ports for client connectivity and 10GbE
ports for HNAS connectivity are known to create flow control issues.

What information to look for in PIR? –
command-output.txt

•LOTS of good statistics in this section.
•Gives statistics for the entire system (hardware, software, network, FS,
protocol, storage etc.).

What information to look for in PIR? –
command-output.txt

•This section displays information about all the active open files.
•Handy for finding out how well the HNAS is being used.

What information to look for in PIR? –
command-output.txt
•This section displays detailed information about all the NFS operations.
•A good indicator of what type of NFS workloads the HNAS is handling during a
given window.
•This PIR was captured for 3 minutes. In this example, the workload is metadata
heavy. At the same time it is also disk intensive.
•Overall writes+reads+creates+removes = 4,572,101 operations over 3 minutes,
or ~25K disk intensive ops/sec. (Whew..). 25K is the potential disk ops/sec in
this example, but not the actuals.
•The writes would be Superflushed, resulting in fewer write ops/sec.
•Look for similar information for the other protocols as well.
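The ~25K figure above is simply the total operation count divided by the capture window:

```shell
# 4,572,101 disk-intensive operations over a 3-minute (180 s) PIR window.
awk 'BEGIN { printf "%d ops/sec\n", 4572101 / 180 }'
# -> 25400 ops/sec
```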

What information to look for in PIR? –
command-output.txt

•This section displays detailed per-FS performance statistics: the amount
of read/write ops/sec, read/write MB/s etc.
•A good way of finding the busiest and least busy file systems, uniform
vs. skewed workloads etc.
•This PIR was captured for 3 minutes. In this example, the average load
was 85K ops/sec and is read intensive.
•Notice that the read ops/sec is 72K but the throughput is only 263MB/s
(meaning small read I/O sizes), while the write ops/sec is just 13K but
the write throughput is 123MB/s (meaning large write I/O sizes).
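The small-vs-large I/O size conclusion above falls out of dividing throughput by ops/sec, using the figures from this example:

```shell
# Average I/O size = throughput / ops/sec (figures from the example above).
awk 'BEGIN {
    printf "read:  %.1f KB\n", 263 * 1024 / 72000   # 263 MB/s at 72K ops/sec
    printf "write: %.1f KB\n", 123 * 1024 / 13000   # 123 MB/s at 13K ops/sec
}'
# -> read:  3.7 KB
# -> write: 9.7 KB
```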

What information to look for in PIR? –
command-output.txt

•This section displays detailed information about the load on the different hardware components
(MMB CPU, MFB FPGAs etc).
•Values over 90% indicate a very busy system.

What information to look for in PIR? –
command-output.txt

•This section displays detailed information about NVRAM and related
counters like NVRAM waited allocs, write smoothing, checkpoints etc.
•NVRAM waited allocs: The number of times NVRAM memory was not
available when it was needed. A good indicator of storage contention.

What information to look for in PIR? –
command-output.txt

•This section displays detailed information about root onode cache hits,
misses, read/modify/write operations etc.
•The root onode is the topmost block of an object. The more the HNAS
caches it, the better the performance.
•TFS might help on low cache hit systems (<90%).
•Read/modify/write occurs when an application writes, for example, 32KB
files but later updates a part of the file (like 4KB or 8KB) frequently. This
creates read/modify/writes and affects write performance.
•Good examples of such use cases are VMware, iSCSI etc.

What information to look for in PIR? –
command-output.txt

•This section is a good indicator of possible free space fragmentation.
•The Free Space Allocator (FSA) is responsible for allocating free blocks.
It has to find the free block information from the disks (from the
bitmap). The more it has to search for free space (cache misses), the
slower the performance.
•Blocks searched vs. allocated is a good indicator of fragmentation. If
searched is 2x/3x higher than allocated, we are slowing down writes
(and thus performance).
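The searched-vs-allocated rule of thumb above can be expressed as a quick check. The two counter values here are made-up examples, not figures from the PIR.

```shell
# Flag likely free space fragmentation when FSA "blocks searched"
# exceeds "blocks allocated" by more than 2x. Counter values are
# made-up sample data.
awk 'BEGIN {
    searched = 930000; allocated = 310000
    ratio = searched / allocated
    printf "searched/allocated = %.1fx -> %s\n", ratio,
        (ratio > 2 ? "possible fragmentation" : "ok")
}'
# -> searched/allocated = 3.0x -> possible fragmentation
```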

What information to look for in PIR? –
command-output.txt

•This section provides detailed information about overall storage ops and the average and peak
response times of the entire system (including all FSs and SDs).
•FS-specific and per-SD storage statistics are available in the PIR as well.
•Drill down into the FS or SD specifics for accurate information.
•This PIR was captured over 3 minutes. During the 3 minute window, the average read response
time was 13ms, single buffer write response time was 45ms and multi buffer write response time
was 54ms. In general, these are good numbers.
•Samples are the storage ops/sec and a good indicator of read vs. write intensive workloads from
the HNAS to the storage system.

What information to look for in PIR? –
command-output.txt

•This section provides detailed information about storage ops/sec, average and peak response
times of a given system drive.
•This PIR was captured over 3 minutes. During the 3 minute window, the average read response
time was 13ms, single buffer write response time was 43ms and multi buffer write response time
was 58ms. In general, these are good numbers.
•Samples are the storage ops/sec and a good indicator of read vs. write intensive workloads from
the HNAS to the storage system.
•Good information for future sizing as well. For example, 243,377 ops over 3 minutes = 1,352
read ops/sec for this SD. This is equivalent (or pretty close) to 1,352 LUN read IOPS at the
storage system.
•But the writes are split into single and multi buffer writes (Superflush). Watch them carefully.
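The per-SD sizing arithmetic above, as a one-liner:

```shell
# 243,377 read ops on one SD over a 3-minute (180 s) PIR window.
awk 'BEGIN { printf "%d read ops/sec\n", 243377 / 180 }'
# -> 1352 read ops/sec
```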
What information to look for in PIR? –
command-output.txt

• A good indicator of how well the HNAS host ports (hports) and storage
target ports (tports) are balanced.
• Use sdpath accordingly (set or rebalance).

What information to look for in PIR? –
command-output.txt

• A good indicator of how well the Superflush feature is working.
•>75% is excellent.
•~50% is decent.
•<50% is when we start to see slight performance issues.
•<25% is when much of the storage workload turns random.
•<10% is the worst case.
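The thresholds above can be written down as a small classifier; the interior boundaries (e.g. treating 50-75% as "decent") are my reading of the slide's approximate bands, and the rates passed in are illustrative.

```shell
# Classify a Superflush success rate (percent) per the bands above.
# The exact band edges are an interpretation of the slide's "~" values.
classify_superflush() {
    rate=$1
    if   [ "$rate" -gt 75 ]; then echo "excellent"
    elif [ "$rate" -ge 50 ]; then echo "decent"
    elif [ "$rate" -ge 25 ]; then echo "slight performance issues"
    elif [ "$rate" -ge 10 ]; then echo "mostly random storage workload"
    else echo "worst case"
    fi
}

classify_superflush 80   # -> excellent
classify_superflush 40   # -> slight performance issues
```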

What information to look for in PIR? –
command-output.txt

• This section provides detailed per-SD disk request information: queued,
current and maximum values.
• Most performance issues occur due to poor storage layout.
• Queued disk reads/writes: Overall number of queued disk requests. For
reads, values of <100 per SD are fine; >100 per SD results in higher read
latency.
• For writes, it's tricky. Drive characteristics and RAID type play a major
role. In general, a few hundred queued writes per SD are acceptable;
thousands are a concern.
• More queued writes result in more NVRAM waited allocs and thus write
smoothing.

What are the nominal LUN response times?
(from storage system perspective)
• Response time varies with the cache hit rate, RAID type, disk characteristics, thread count per disk
(command tags) etc. at the storage system.
• The tables below can be used as a reference for different drive types and RAID types in a 0% cache
hit scenario with a 100% random workload and 8KB I/O size.
• But be aware that when Superflush is in play, the average write I/O will be >256KB, which results in
higher response times between the HNAS and the storage system. The clients will NOT see higher
response times in such scenarios.

What information to look for in PIR? –
showall.txt
•The showall.txt file includes a few diagnostics and complete configuration summary
command line outputs.
•A few key things to review are:
•Hardware model and revision (Gen1, Gen2)
•Software version
•Number of nodes, EVSs
•Installed license keys
•Network configuration and advanced IP configuration
•System drive, host path, target path, LUN mapping details
•Queue depth settings
•System drive group, storage pool details
•Number of FSs and where they reside (SP, EVS etc.)
•And many more
•Use the information from this file to understand how the HNAS is configured.
•While the PIR identifies possible issues, correlate that information with the
configuration summary and see if the issue is due to a misconfigured system.

Summary
• As you can see, HNAS PIR analysis can be complex and time consuming. I
have only covered some of the key areas that are of highest concern.
• There are more sections within the PIR that need additional analysis. Let
the GSC do the expert review of all the available information.
• But the areas that I have highlighted so far will eliminate quite a few issues.
• Always combine PIR analysis with storage system performance data analysis
as well.
• Be aware of the hardware platform and hardware configuration limitations.
• Practice with multiple PIRs and see if you can spot any issues. The more you
review, the higher your confidence.
• The attached PIR was captured during a SPECsfs test with a target load of
95K NFS ops/sec (the busiest it could get). It will be a good learning exercise
for you.

Thank You

