
IBM Netezza

Release 7.2.1.1

IBM Netezza System Administrator’s Guide

IBM
Note
Before using this information and the product it supports, read the information in “Notices” on page D-1.

Revised: December 4, 2015


This edition applies to IBM Netezza Release 7.2.1.1 and to all subsequent releases until otherwise indicated in new
editions.
© Copyright IBM Corporation 2001, 2015.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents

Figures . . . . . . . . . . . . . . . . . . . . . ix
Tables . . . . . . . . . . . . . . . . . . . . . . xi
Electronic emission notices . . . . . . . . . . . . xiii
Regulatory and compliance . . . . . . . . . . . . xvii
About this publication . . . . . . . . . . . . . . xix
Chapter 1. Administration overview . . . . . . . . 1-1
Chapter 2. Netezza client software installation . . . 2-1
Chapter 3. Netezza administration interfaces . . . . 3-1
Chapter 4. Manage Netezza HA systems . . . . . . 4-1
Chapter 5. Manage the Netezza hardware . . . . . . 5-1
Chapter 6. About self-encrypting drives . . . . . . 6-1
Chapter 7. Manage the Netezza server . . . . . . . 7-1
Chapter 8. Event rules . . . . . . . . . . . . . . 8-1
Chapter 9. About the callhome service . . . . . . . 9-1
Chapter 10. Enhanced cryptography support . . . . 10-1
Chapter 11. Security and access control . . . . . . 11-1
Chapter 12. Manage user content on the Netezza appliance . . 12-1
Chapter 13. Database backup and restore . . . . . 13-1
Chapter 14. History data collection . . . . . . . . 14-1
Chapter 15. Workload management . . . . . . . . 15-1
Chapter 16. Netezza statistics . . . . . . . . . . 16-1
Chapter 17. System Health Check tool . . . . . . . 17-1
Appendix A. Netezza CLI . . . . . . . . . . . . . A-1
Appendix B. Linux host administration reference . . B-1
Appendix C. Netezza user and system views . . . . C-1
Notices . . . . . . . . . . . . . . . . . . . . . D-1
Index . . . . . . . . . . . . . . . . . . . . . . X-1

Figures

3-1. NzAdmin main system window . . . . . 3-14
3-2. NzAdmin hyperlink support . . . . . . 3-15
5-1. Sample nzhw show output . . . . . . 5-3
5-2. Sample nzhw show output for Netezza C1000 . . 5-4
5-3. IBM PureData System for Analytics N200x system components and locations . . 5-6
5-4. IBM Netezza system components and locations . . 5-7
5-5. IBM PureData System for Analytics N3001-001 components and locations . . 5-8
5-6. SPUs, disks, data slices, and data partitions . . 5-13
5-7. Netezza C1000 SPU and storage representation . . 5-14
5-8. Model N3001-001 storage architecture overview . . 5-16
5-9. Model N3001-001 storage architecture overview in one-host mode . . 5-17
5-10. Balanced and unbalanced disk topologies . . 5-18
5-11. IBM Netezza 1000-6 and N1001-005 and larger PDUs and circuit breakers . . 5-33
5-12. IBM Netezza 1000-3 and IBM PureData System for Analytics N1001-002 PDUs and circuit breakers . . 5-34
8-1. Alerts Window . . . . . . . . . . 8-37
12-1. Record Distribution window . . . . . 12-9
12-2. Table Skew window . . . . . . . . 12-12
12-3. Organizing tables with CBTs . . . . . 12-13
13-1. Database backups timeline . . . . . . 13-18
15-1. GRA usage sharing . . . . . . . . 15-13
15-2. Several plans in a group share the group's resources . . 15-14
15-3. Effects of the admin user on GRA . . . 15-16
15-4. Resource Allocation Performance window . . 15-19
15-5. Resource Allocation Performance History window . . 15-20
15-6. Resource Allocation Performance graph . . 15-21
15-7. SQB queuing and priority . . . . . . 15-24
15-8. GRA and priority . . . . . . . . . 15-26

Tables

2-1. Netezza supported platforms . . . . . 2-1
2-2. Environment variables . . . . . . . 2-8
2-3. Netezza port numbers for database access . . 2-10
3-1. Command summary . . . . . . . . 3-1
3-2. nzsql command parameters . . . . . 3-5
3-3. The nzsql slash commands . . . . . . 3-10
3-4. Status indicators . . . . . . . . . 3-15
3-5. Automatic refresh . . . . . . . . . 3-17
4-1. HA tasks and commands (Old design and new design) . . 4-2
4-2. Cluster management scripts . . . . . 4-4
4-3. HA IP addresses . . . . . . . . . 4-15
5-1. Key Netezza hardware components to monitor . . 5-1
5-2. Hardware description types . . . . . 5-4
5-3. Hardware roles . . . . . . . . . . 5-8
5-4. Hardware states . . . . . . . . . 5-10
5-5. Data slice status . . . . . . . . . 5-27
5-6. System states and transactions . . . . 5-31
6-1. nzkey check output samples . . . . . 6-10
7-1. Netezza software revision numbering . . 7-2
7-2. Common system states . . . . . . . 7-3
7-3. System states reference . . . . . . . 7-4
7-4. Netezza processes . . . . . . . . . 7-8
7-5. Error categories . . . . . . . . . 7-11
7-6. Configuration settings for short query bias (SQB) . . 7-19
7-7. Configuration settings for the updating virtual tables . . 7-20
7-8. Configuration settings for plan history . . 7-20
7-9. Configuration settings for downtime event logging . . 7-20
7-10. Configuration settings for backup . . . 7-21
8-1. Template event rules . . . . . . . . 8-1
8-2. Netezza template event rules . . . . . 8-3
8-3. Event types . . . . . . . . . . . 8-7
8-4. Event argument expression syntax . . . 8-12
8-5. Notification substitution tags . . . . . 8-13
8-6. Notification syntax . . . . . . . . 8-13
8-7. System state changes . . . . . . . . 8-18
8-8. HardwareServiceRequested event rule . . 8-19
8-9. HardwareNeedsAttention event rule . . . 8-20
8-10. HardwarePathDown event rule . . . . 8-21
8-11. HardwareRestarted event rule . . . . . 8-22
8-12. DiskSpace event rules . . . . . . . 8-23
8-13. Threshold and states . . . . . . . . 8-23
8-14. RunAwayQuery event rule . . . . . . 8-25
8-15. SCSIPredictiveFailure event rule . . . . 8-26
8-16. RegenFault event rule . . . . . . . 8-27
8-17. SCSIDiskError event rule . . . . . . 8-27
8-18. ThermalFault event rule . . . . . . . 8-29
8-19. SysHeatThreshold event rule . . . . . 8-29
8-20. The histCaptureEvent rule . . . . . . 8-30
8-21. The histLoadEvent rule . . . . . . . 8-31
8-22. The spuCore event rule . . . . . . . 8-33
8-23. VoltageFault event rule . . . . . . . 8-33
8-24. TransactionLimitEvent rule . . . . . 8-34
9-1. IBM Support servers for call home PMR access . . 9-8
9-2. Callhome event rules . . . . . . . . 9-12
9-3. Callhome event conditions and severities . . 9-13
10-1. Troubleshooting nzstart errors . . . . . 10-7
11-1. Administrator privileges . . . . . . 11-10
11-2. Object privileges . . . . . . . . . 11-11
11-3. Slash commands to display privileges . . 11-14
11-4. Privileges by object . . . . . . . . 11-15
11-5. Indirect object privileges . . . . . . 11-17
11-6. Netezza supported platforms for Kerberos authentication . . 11-23
11-7. Authentication-related commands . . . 11-27
11-8. Client connection-related commands . . 11-34
11-9. User and group settings . . . . . . . 11-34
11-10. Public views . . . . . . . . . . 11-40
11-11. System views . . . . . . . . . . 11-41
12-1. Data type disk usage . . . . . . . . 12-4
12-2. Table skew . . . . . . . . . . . 12-11
12-3. Database information . . . . . . . 12-15
12-4. Generate statistics syntax . . . . . . 12-16
12-5. Automatic Statistics . . . . . . . . 12-17
12-6. The cbts_needing_groom input options . . 12-22
12-7. The 64th read/write transaction queueing . . 12-27
12-8. The _v_qrystat view . . . . . . . . 12-30
12-9. The _v_qryhist view . . . . . . . . 12-31
13-1. Choose a backup and restore method . . 13-1
13-2. Backup and restore commands and content . . 13-2
13-3. Retaining specials . . . . . . . . . 13-6
13-4. nzbackup command parameters . . . . 13-12
13-5. Environment settings . . . . . . . . 13-14
13-6. Backup history source . . . . . . . 13-20
13-7. Backup and Restore Behavior . . . . . 13-21
13-8. The nzrestore command options . . . . 13-23
13-9. Environment settings . . . . . . . . 13-27
13-10. Backup history target . . . . . . . 13-32
13-11. Restore history source . . . . . . . 13-34
13-12. NetBackup policy settings . . . . . . 13-35
14-1. History loading settings . . . . . . . 14-8
14-2. $v_hist_column_access_stats . . . . . 14-15
14-3. $v_hist_incomplete_queries . . . . . . 14-16
14-4. $v_hist_log_events . . . . . . . . . 14-17
14-5. $v_hist_queries, $v_hist_successful_queries, and $v_hist_unsuccessful_queries . . 14-17
14-6. $v_hist_table_access_stats View . . . . 14-19
14-7. $hist_column_access_n . . . . . . . 14-19
14-8. $hist_failed_authentication_n . . . . . 14-20
14-9. $hist_log_entry_2 . . . . . . . . . 14-21
14-10. $hist_nps_n . . . . . . . . . . . 14-21
14-11. $hist_plan_epilog_n . . . . . . . . 14-22
14-12. $hist_plan_prolog_n . . . . . . . . 14-23
14-13. $hist_query_epilog_n . . . . . . . . 14-24
14-14. $hist_query_overflow_n . . . . . . . 14-25
14-15. $hist_query_prolog_n . . . . . . . . 14-25
14-16. $hist_service_n . . . . . . . . . . 14-27
14-17. $hist_session_epilog_n . . . . . . . 14-28
14-18. $hist_session_prolog_n . . . . . . . 14-28
14-19. $hist_state_change_n . . . . . . . . 14-30
14-20. $hist_table_access_n . . . . . . . . 14-30
14-21. $hist_version . . . . . . . . . . 14-31
15-1. Workload management feature summary . . 15-1
15-2. Assign resources to active resource groups . . 15-11
15-3. Example of resource groups . . . . . 15-12
15-4. Example of a distribution of resources within resource groups . . 15-26
16-1. Netezza Groups and Tables . . . . . . 16-1
16-2. Database Table . . . . . . . . . . 16-2
16-3. DBMS Group . . . . . . . . . . 16-2
16-4. Host CPU Table . . . . . . . . . 16-3
16-5. Host File System Table . . . . . . . 16-3
16-6. Host Interfaces Table . . . . . . . . 16-4
16-7. Host Management Channel Table . . . 16-5
16-8. Host Network Table . . . . . . . . 16-5
16-9. Host Table . . . . . . . . . . . 16-6
16-10. Hardware Management Channel Table . . 16-7
16-11. Per Table Data Slice Table . . . . . . 16-8
16-12. Query Table . . . . . . . . . . . 16-8
16-13. Query History Table . . . . . . . . 16-9
16-14. SPU Partition Table . . . . . . . . 16-10
16-15. SPU Table . . . . . . . . . . . 16-10
16-16. System Group . . . . . . . . . . 16-11
16-17. Table Table . . . . . . . . . . . 16-11
17-1. Health check tool versions and the corresponding supported NPS versions . . 17-1
17-2. The nzhealthcheck input options . . . . 17-9
A-1. Command-line summary . . . . . . . A-1
A-2. The nzcallhome input options . . . . . A-5
A-3. The nzconfigcrypto input options . . . . A-8
A-4. The nzconfigcrypto input options . . . . A-9
A-5. The nzds input options . . . . . . . A-11
A-6. The nzds options . . . . . . . . . A-12
A-7. The nzevent input options . . . . . . A-14
A-8. The nzevent options . . . . . . . . A-15
A-9. The nzhealthcheck input options . . . . A-18
A-10. The nzhistcleanupdb input options . . . A-20
A-11. The nzhistcreatedb input options . . . . A-21
A-12. The nzhistcreatedb output messages . . . A-23
A-13. The nzhostbackup input options . . . . A-25
A-14. The nzhostrestore input options . . . . A-27
A-15. The nzhostrestore options . . . . . . A-27
A-16. The nzhw input options . . . . . . . A-29
A-17. The nzhw options . . . . . . . . . A-32
A-18. The nzkey input options . . . . . . . A-35
A-19. The nzkey input options . . . . . . . A-36
A-20. The nzkeydb input options . . . . . . A-37
A-21. The nzkeydb input options . . . . . . A-38
A-22. The nzkeybackup input options . . . . A-39
A-23. The nzkeyrestore input options . . . . A-40
A-24. The nzkmip input options . . . . . . A-41
A-25. The nzkmip input options . . . . . . A-41
A-26. The nzpassword input options . . . . A-43
A-27. The nzpassword options . . . . . . . A-44
A-28. The nzreclaim input options . . . . . A-45
A-29. The nzreclaim options . . . . . . . A-46
A-30. The nzrev input options . . . . . . . A-48
A-31. The nzsession input options . . . . . A-49
A-32. The nzsession options . . . . . . . A-50
A-33. Session information . . . . . . . . A-52
A-34. The nzspupart input options . . . . . A-54
A-35. The nzspupart options . . . . . . . A-54
A-36. The nzstart inputs . . . . . . . . . A-57
A-37. The nzstate inputs . . . . . . . . . A-58
A-38. The nzstate options . . . . . . . . A-58
A-39. The nzstats inputs . . . . . . . . . A-60
A-40. The nzstats options . . . . . . . . A-61
A-41. The nzstop inputs . . . . . . . . . A-64
A-42. The nzstop options . . . . . . . . A-64
A-43. The nzsystem inputs . . . . . . . . A-65
A-44. The nzsystem options . . . . . . . A-65
A-45. The nzzonemapformat input options . . . A-68
A-46. Diagnostic commands . . . . . . . A-69
A-47. The nzconvertsyscase input options . . . A-71
A-48. The nzdumpschema inputs . . . . . . A-72
A-49. The nzlogmerge options . . . . . . . A-73
C-1. User views . . . . . . . . . . . C-1
C-2. System views . . . . . . . . . . C-2


Electronic emission notices
When you attach a monitor to the equipment, you must use the designated
monitor cable and any interference suppression devices that are supplied with the
monitor.

Federal Communications Commission (FCC) Statement

This equipment was tested and found to comply with the limits for a Class A
digital device, according to Part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against harmful interference when the equipment is
operated in a commercial environment. This equipment generates, uses, and can
radiate radio frequency energy and, if not installed and used in accordance with
the instruction manual, might cause harmful interference to radio communications.
Operation of this equipment in a residential area is likely to cause harmful
interference, in which case the user is required to correct the interference at their
own expense.

Properly shielded and grounded cables and connectors must be used to meet FCC
emission limits. IBM® is not responsible for any radio or television interference
caused by using other than recommended cables and connectors or by
unauthorized changes or modifications to this equipment. Unauthorized changes
or modifications might void the authority of the user to operate the equipment.

This device complies with Part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) this device might not cause harmful interference, and
(2) this device must accept any interference received, including interference that
might cause undesired operation.

Industry Canada Class A Emission Compliance Statement

This Class A digital apparatus complies with Canadian ICES-003.

Avis de conformité à la réglementation d'Industrie Canada

Cet appareil numérique de la classe A est conforme à la norme NMB-003 du Canada.

Australia and New Zealand Class A Statement

This product is a Class A product. In a domestic environment, this product might
cause radio interference, in which case the user might be required to take adequate
measures.

European Union EMC Directive Conformance Statement

This product is in conformity with the protection requirements of EU Council
Directive 2004/108/EC on the approximation of the laws of the Member States
relating to electromagnetic compatibility. IBM cannot accept responsibility for any
failure to satisfy the protection requirements resulting from a nonrecommended
modification of the product, including the fitting of non-IBM option cards.

This product is an EN 55022 Class A product. In a domestic environment, this
product might cause radio interference in which case the user might be required to
take adequate measures.

Responsible manufacturer:

International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900

European Community contact:

IBM Technical Regulations, Department M456
IBM-Allee 1, 71137 Ehningen, Germany
Telephone: +49 7032 15-2937
Email: [email protected]

Germany Class A Statement

Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A EU-Richtlinie
zur Elektromagnetischen Verträglichkeit

Dieses Produkt entspricht den Schutzanforderungen der EU-Richtlinie 2004/108/EG
zur Angleichung der Rechtsvorschriften über die elektromagnetische Verträglichkeit
in den EU-Mitgliedsstaaten und hält die Grenzwerte der EN 55022 Klasse A ein.

Um dieses sicherzustellen, sind die Geräte wie in den Handbüchern beschrieben zu
installieren und zu betreiben. Des Weiteren dürfen auch nur von der IBM
empfohlene Kabel angeschlossen werden. IBM übernimmt keine Verantwortung für
die Einhaltung der Schutzanforderungen, wenn das Produkt ohne Zustimmung der
IBM verändert bzw. wenn Erweiterungskomponenten von Fremdherstellern ohne
Empfehlung der IBM gesteckt/eingebaut werden.

EN 55022 Klasse A Geräte müssen mit folgendem Warnhinweis versehen werden:
“Warnung: Dieses ist eine Einrichtung der Klasse A. Diese Einrichtung kann im
Wohnbereich Funk-Störungen verursachen; in diesem Fall kann vom Betreiber
verlangt werden, angemessene Maßnahmen zu ergreifen und dafür
aufzukommen.”

Deutschland: Einhaltung des Gesetzes über die elektromagnetische
Verträglichkeit von Geräten

Dieses Produkt entspricht dem “Gesetz über die elektromagnetische Verträglichkeit
von Geräten (EMVG)”. Dies ist die Umsetzung der EU-Richtlinie 2004/108/EG in
der Bundesrepublik Deutschland.

Zulassungsbescheinigung laut dem Deutschen Gesetz über die elektromagnetische
Verträglichkeit von Geräten (EMVG) (bzw. der EMC EG Richtlinie 2004/108/EG)
für Geräte der Klasse A

Dieses Gerät ist berechtigt, in Übereinstimmung mit dem Deutschen EMVG das
EG-Konformitätszeichen - CE - zu führen.

Verantwortlich für die Einhaltung der EMV Vorschriften ist der Hersteller:

International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900

Der verantwortliche Ansprechpartner des Herstellers in der EU ist:

IBM Deutschland
Technical Regulations, Department M456
IBM-Allee 1, 71137 Ehningen, Germany
Telephone: +49 7032 15-2937
Email: [email protected]

Generelle Informationen: Das Gerät erfüllt die Schutzanforderungen nach EN 55024
und EN 55022 Klasse A.

Japan VCCI Class A Statement

This product is a Class A product based on the standard of the Voluntary Control
Council for Interference (VCCI). If this equipment is used in a domestic
environment, radio interference might occur, in which case the user might be
required to take corrective actions.

Japan Electronics and Information Technology Industries Association (JEITA)
Statement

Japan Electronics and Information Technology Industries Association (JEITA)
Confirmed Harmonics Guidelines (products less than or equal to 20 A per phase)

Japan Electronics and Information Technology Industries Association (JEITA)
Statement

Japan Electronics and Information Technology Industries Association (JEITA)
Confirmed Harmonics Guidelines (products greater than 20 A per phase)

Korea Communications Commission (KCC) Statement

This is electromagnetic compatibility equipment for business use (Type A). Sellers
and users should note that this equipment is intended for use in areas other than the home.

Russia Electromagnetic Interference (EMI) Class A Statement

People's Republic of China Class A Electronic Emission Statement

Taiwan Class A Compliance Statement

Regulatory and compliance
Regulatory Notices

Install the NPS® system in a restricted-access location. Ensure that only people who
are trained to operate or service the equipment have physical access to it.
Install each AC power outlet near the NPS rack that plugs into it, and keep the
outlet freely accessible.

Provide approved circuit breakers on all power sources.

The IBM PureData® System for Analytics appliance requires a readily accessible
power cutoff. This can be a Unit Emergency Power Off (UEPO) switch, a circuit
breaker, or the complete removal of power from the equipment by disconnecting the
Appliance Coupler (line cord) from all rack PDUs.

CAUTION:
Disconnecting power from the appliance without first stopping the NPS
software and high availability processes might result in data loss and increased
service time to restart the appliance. For all non-emergency situations, follow the
documented power-down procedures in the IBM Netezza System Administrator’s
Guide to ensure that the software and databases are stopped correctly, in order, to
avoid data loss or file corruption.

The product might be powered by redundant power sources. Disconnect ALL power
sources before servicing.

High leakage current. Earth connection essential before connecting supply. Courant
de fuite élevé. Raccordement à la terre indispensable avant le raccordement au
réseau.

Homologation Statement

This product may not be certified in your country for connection by any means
whatsoever to interfaces of public telecommunications networks. Further
certification may be required by law prior to making any such connection. Contact
an IBM representative or reseller if you have any questions.

About this publication
The IBM Netezza® data warehouse appliance is a high performance, integrated
database appliance that provides unparalleled performance, extensive scaling, high
reliability, and ease of use. The Netezza appliance uses a unique architecture that
combines current trends in processor, network, and software technologies to
deliver a high performance system for large enterprise customers.

These topics are written for system administrators and database administrators. In
some customer environments, these roles can be the responsibility of one person or
several administrators.

You should be familiar with Netezza concepts and user interfaces, as described in
the IBM Netezza Getting Started Tips. You should also be comfortable using
command-line interfaces, Linux operating system utilities, and windows-based
administration interfaces, and with installing software on client systems to access
the Netezza appliance.

If you need help


If you are having trouble using the IBM Netezza appliance, follow these steps:
1. Try the action again, carefully following the instructions for that task in the
documentation.
2. Go to the IBM Support Portal at: http://www.ibm.com/support. Log in using
your IBM ID and password. You can search the Support Portal for solutions. To
submit a support request, click the Service Requests & PMRs tab.
3. If you have an active service contract maintenance agreement with IBM, you
can contact customer support teams by telephone. For individual countries,
visit the Technical Support section of the IBM Directory of worldwide contacts
(http://www.ibm.com/support/customercare/sas/f/handbook/contacts.html).

How to send your comments


You are encouraged to send any questions, comments, or suggestions about the
IBM Netezza documentation. Send an email to [email protected]
and include the following information:
v The name and version of the manual that you are using
v Any comments that you have about the manual
v Your name, address, and phone number

We appreciate your suggestions.

Chapter 1. Administration overview
This section provides an introduction and overview to the tasks involved in
administering an IBM Netezza data warehouse appliance.

Administrator’s roles
IBM Netezza administration tasks typically fall into two categories:
System administration
Managing the hardware, configuration settings, system status, access, disk
space, usage, upgrades, and other tasks
Database administration
Managing the user databases and their content, loading data, backing up
data, restoring data, controlling access to data and permissions

In some customer environments, one person acts as both the system administrator
and the database administrator, doing these tasks as needed. In other environments,
multiple people might share these responsibilities, or they might own specific tasks
or responsibilities. You can develop the administrative model that works best for
your environment.

In addition to the administrator roles, there are also database user roles. A database
user is someone who has access to one or more databases and has permission to
run queries on the data that is stored within those databases. In general, database
users have access permissions to one or more user databases, or to one or more
schemas within databases, and they have permission to do certain types of tasks
and to create or manage certain types of objects within those databases.

Administration tasks
The administration tasks generally fall into these categories:
v Service level planning
v Deploying and installing Netezza clients
v Managing a Netezza system
v Managing system notifications and events
v Managing Netezza users and groups
v Managing databases
v Loading data (described in the IBM Netezza Data Loading Guide)
v Backing up and restoring databases
v Collecting and evaluating history data
v Workload management

Initial system setup and information


A factory-configured and installed IBM Netezza system includes the following
components:
v A Netezza data warehouse appliance with preinstalled Netezza software

v A preconfigured Linux operating system (with Netezza modifications) on one or
both system hosts. Netezza high-availability (HA) models have two hosts, while
non-HA models have one host.
v Several preconfigured Linux users and groups, which must not be modified or
deleted.
– The nz user is the default Netezza system administrator account. The Linux
user is named nz with a default password of nz. The Netezza software runs
as this user, and you can access the system by using a command shell or
remote access software as the nz user.
– Netezza HA systems also require a Linux user (hacluster) and two Linux
groups (hacluster and haclient) that are added automatically to the host
during the Heartbeat RPM installation.
v A Netezza database user named admin (with a default password of password).
The admin user is the database super-user, and has full access to all system
functions and objects. You cannot delete the admin user. You use the admin
account to start creating user databases and other database user groups and
accounts to which you can assign appropriate permissions and access; a
connection example follows this list.
v A preconfigured database group named public. All database users are
automatically placed in the group public and therefore inherit all of its
privileges. The group public has default access privileges to selected system
views, such as lists of available databases, tables, and views. You cannot delete
the group public.
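
For example, after the initial setup you can verify these preconfigured accounts by
logging in to the host as the nz Linux user and connecting to the system database as
the admin database user. The following commands are a minimal sketch: netezza-host
is a placeholder for your host name, and the passwords shown are the factory
defaults, which you should change.

ssh nz@netezza-host                     # log in to the host as the nz Linux user (default password nz)
nzsql -d system -u admin -pw password   # connect to the system database as the admin database user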

Netezza Support and Sales representatives work with you to install and initially
configure the Netezza appliance in your customer environment. Typically, the initial
rollout consists of installing the system in your data center and then configuring the
system host name and IP address to connect the system to your network and make
it accessible to users. They also work with you on initial studies of system usage
and query performance, and might recommend other configuration settings or
administration practices to improve the performance of, and access to, the Netezza
appliance for your users.
Related concepts:
“Linux users and groups required for HA” on page 4-17

Netezza software directories


The IBM Netezza software is installed in several directories on the Netezza host as
follows:
v The /nz directory is the Netezza host software installation directory.
v The /export/home/nz directory is a home directory for the nz user.
v The Linux operating system boot directories.

Host software directory


The IBM Netezza host installation directory contains the following software
directories and files.
/nz The root of the Netezza software installation tree. On a production host,
the default software installation directory is /nz. If you are a Linux user
who is connected to the Netezza host, include /nz/kit/bin and
/nz/kit/bin/adm in your PATH, as shown in the example after this list.
/nz/data->
A link to the current data directory.

/nz/kit->
A link to the current kit of executable files. The kit link points to the
current software revision in use.
/nz/data.<ver>/
System catalog and other host-side database files.
/nz/kit.<rev>/
The set of optimized executable files and support files that are needed to
run the product. The <rev> represents the revision of the software.
/nz/tmp/
Netezza temporary files.
/nzscratch
A location for Netezza internal files. This location is not mirrored. The
/nzscratch/tmp directory is the default temporary files directory, which is
specified by the NZ_TMP_DIR variable. It holds files that are created and
used by the transaction manager and other processes. The contents of
NZ_TMP_DIR are deleted when the Netezza software starts and when the
Netezza system restarts. Do not store large files in /nzscratch or its
subdirectories; if /nzscratch runs out of space, Netezza processes can fail.
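
For example, a Linux user on the host can add the Netezza command directories to
the command search path. This is a minimal sketch that assumes a bash shell and
the default /nz installation directory:

export PATH=$PATH:/nz/kit/bin:/nz/kit/bin/adm   # add the Netezza CLI directories to PATH
nzrev                                           # verify the CLI resolves by displaying the software revision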

The /nz directory

The /nz directory is the top-level directory that contains the Netezza software
installation kits, data, and important information for the system and database. As a
best practice, use caution when you are viewing files in this directory or its
subfolders because unintended changes can impact the operation of the Netezza
system or cause data loss. Never delete or modify files or folders in the /nz
directory unless directed to do so by Netezza Support or an IBM representative.
Do not store large files, unrelated files, or backups in the /nz directory.

The system manager monitors the size of the /nz directory. If the /nz directory
reaches a configured usage percentage, the system manager stops the Netezza
software and logs a message in the sysmgr.log file. The default threshold is 95%,
which is specified by the value of the
sysmgr.hostFileSystemUsageThresholdToStopSystem registry setting. Do not
change the value of the registry setting unless directed to do so by Netezza
Support.

A sample sysmgr.log file message for a case where the /nz directory has reached
the configured 95% capacity threshold follows.
Error: File system /nz usage exceeded 95 threshold on rack1.host1 System will
be stopped

If the Netezza software stops and this message is in the sysmgr.log file, contact
Netezza Support for assistance to carefully review the contents of the /nz directory
and to delete appropriate files. When the /nz directory usage falls below the
configured threshold, you can start the Netezza software.
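
For example, a quick way to check the current usage is the standard Linux df
command; the following sketch (prompts and output are illustrative) checks the
/nz file system and then starts the software after space is freed:
[nz@nzhost1 ~]$ df -h /nz    # check how full the /nz file system is
[nz@nzhost1 ~]$ nzstart      # start the Netezza software when usage is below the threshold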

The data directory

The /nz/data directory contains the following subdirectories:


data.<ver>/base
Contains system tables, catalog information, and subdirectories for the
databases. Each database that you create has its own subdirectory whose
name matches the database object ID value. For example, base/1/ is the
system database, base/2/ is the master_db database, and base/nnn is a user
database, where nnn is the object ID of the database.
data.<ver>/cache
Contains copies of compiled code that were dynamically generated on the
host, cross-compiled to run on the SPUs, then downloaded to the SPUs for
execution. The copies are saved to eliminate extra steps and duplicate
work for similar queries.
data.<ver>/config
Contains configuration files such as:
callHome.txt
The callhome attachment file.
sendMail.cfg
A file that contains the configuration parameters for the sendmail
program.
system.cfg
The system configuration file, which contains settings that control
the system.

If the Netezza system uses options such as LDAP or Kerberos authentication or
other applications, this directory might also contain additional files.
data.<ver>/plans
Contains copies of the most recent execution plans for reference. The
system stores the execution plan for each query in a separate file with a
.pln extension. Each plan file includes the following information:
v The original SQL that was submitted.
v The plan itself, describing how the various tables and columns are to be
accessed, when joins, sorts, and aggregations are performed, and other
processes.
v Whether the system was able to reuse a cached (already compiled) version of
the code.
The system also generates a separate C program (.cpp file) to process
each snippet of each plan. The system compares this code against files in
/nz/data/cache to determine whether the compilation step can be
skipped.

The kit directory

The kit directory contains the following subdirectories:


kit.<rev>/
Top-level directory for the release <rev> (for example, kit.6.0).
kit.<rev>/bin/
All user-level CLI programs.
kit.<rev>/bin/adm
Internal CLI programs.
kit.<rev>/log/<pgm name>/
Component log files, one subdirectory per component that contains a file
per day of log information up to seven days. The information in the logs
includes when the process started, when the process exited or completed,
and any error conditions.
kit.<rev>/sbin
Internal host and utility programs that are not intended to be run directly
by users. These programs are not prefixed with nz (for example, clientmgr).
kit.<rev>/share/
Postgres-specific files.
kit.<rev>/sys/
System configuration files, startup.cfg, and some subdirectories (init,
include, strings).
kit.<rev>/sys/init/
Files that are used for system initialization.

nz user home directory


The host software runs under a preconfigured Linux user named nz. The home
directory for the nz user is /export/home/nz. The default shell configuration file, in
addition to standard UNIX specifications, adds /nz/kit/bin to the PATH
environment variable so that user nz can automatically locate CLI commands.
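
If you create additional Linux accounts on the host, a minimal sketch of the
equivalent setup (assuming a bash shell) is to append the kit directories to the
PATH in that user's shell configuration file:
# ~/.bashrc (example for a Linux user other than nz)
export PATH=$PATH:/nz/kit/bin:/nz/kit/bin/adm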

Linux boot directories


To ensure that the system starts the IBM Netezza software when it boots, Netezza
places some entries in the init.d directory, which is a standard system facility for
starting applications. Never modify the Linux operating system boot directories or
files unless you are directed to by Netezza Support or documented Netezza
procedures. Changes to these files can impact the operation of the host.

External network connections


During the on-site installation of the IBM Netezza system, Netezza installation
engineers work with you to configure your system by using the site survey
information that is prepared for your environment. The initial setup process
includes steps to configure the external network connections (that is, the host name
and IP address information) of your Netezza system.

CAUTION:
If you need to change the host name or IP address information, do not use the
general Linux procedures to change this information. Contact Netezza Support
for assistance to ensure that the changes are made by using Netezza procedures
and are propagated to the high availability configuration and related services.

Domain Name Service (DNS) Updates


The IBM Netezza server uses a domain name service (DNS) server to provide
name resolution to devices such as S-Blades within the system. This allows SPUs to
have a DNS name (such as spu0103) and an IP address.

To change the DNS settings for your system, use the nzresolv service to manage
the DNS updates. The nzresolv service updates the resolv.conf information
stored on the Netezza host; for highly available Netezza systems, the nzresolv
service updates the information stored on both hosts. (You can log in to either host
to do the DNS updates.) You must be able to log in as the root user to update the
resolv.conf information; any Linux user such as nz can display the DNS
information by using the show option.

Note: Do not manually edit the /etc/resolv.conf* files, even as the root user. Use
the nzresolv service to update the files and to ensure that the information is
maintained correctly on the hosts.

The Netezza system manages the DNS services as needed during actions such as
host failovers from the master host to the standby host. Never manually restart the
nzresolv service unless directed to do so by Netezza Support for troubleshooting.
A restart can cause loss of contact with the localhost DNS service, and
communication issues between the host and the system hardware components. Do
not use any of the nzresolv subcommands other than update, status, or show
unless directed to do so by Netezza Support.

Displaying the DNS information


About this task

To display the current DNS information for the system, do the following steps:

Procedure
1. Log in to the active host as a Linux user such as nz.
2. Enter the following command:
[nz@nzhost1 ~]$ service nzresolv show

Example

Sample output follows:


search yourcompany.com
nameserver 1.2.3.4
nameserver 1.2.5.6

Changing DNS information


About this task

You update the DNS information by using the nzresolv service. You can change
the DNS information by using a text editor, and read the DNS information from a
file or enter it on the command line. Any changes that you make take effect
immediately (and on both hosts, for HA systems). The DNS server uses the
changes for the subsequent DNS lookup requests.

Updating DNS information with a text editor:


About this task

To change the DNS information, do the following steps:

Procedure
1. Log in to either host as root.
2. Enter the following command:
[root@nzhost1 ~]# service nzresolv update

Note: If you use the service command to edit the DNS information, you must
use vi as the text editor tool, as shown in these examples. However, if you
prefer a different text editor, you can set the $EDITOR environment
variable and use the /etc/init.d/nzresolv update command to edit the files
by using your editor of choice.
3. Review the system DNS information as shown in the sample file.

# !!! All lines starting ’# !!!’ will be removed.
# !!!
search yourcompany.com
nameserver 1.2.3.4
nameserver 1.2.5.6
4. Enter, delete, or change the DNS information as required. When you are
finished, save your changes and exit (or exit without saving) by using one
of the following commands:
v :wq to save the changes and exit.
v :q to exit the file (if you made no changes).
v :q! to exit without saving any changes that you made in the file.

CAUTION:
Use caution before you change the DNS information; incorrect changes can
affect the operation of the IBM Netezza system. Review any changes with
the DNS administrator at your site to ensure that the changes are correct.

Overwriting DNS information with a text file:


About this task

To change the DNS information by reading the information from an existing text
file, do the following steps:

Procedure
1. Log in to either host as root.
2. Create a text file with your DNS information. Make your text file similar to the
following format:
search yourcompany.com
nameserver 1.2.3.4
nameserver 1.2.5.6
3. Enter the following command, where file is the fully qualified path name to
the text file:
[root@nzhost1 ~]# service nzresolv update file

Appending DNS information from the command prompt:


About this task

To change the DNS information by entering the information from the command
prompt, do the following steps:

Procedure
1. Log in to either host as root.
2. Enter the following command (note the dash character at the end of the
command):
[root@nzhost1 ~]# service nzresolv update -
The command prompt proceeds to a new line where you can enter the DNS
information. Enter the complete DNS information because the text that you
type replaces the existing information in the resolv.conf file.
3. After you finish typing the DNS information, type one of the following
commands:
v Control-D to save the information that you entered and exit the editor.
v Control-C to exit without saving any changes.

Displaying DNS service status
You can display the status of the nzresolv service on your IBM Netezza hosts.

About this task

To display the current status of the Netezza nzresolv service, do the following
steps:

Procedure
1. Log in to the active host as a Linux user such as nz.
2. Enter the following command:
[nz@nzhost1 ~]$ service nzresolv status

Example

Sample output follows:


Configured for local resolv.conf

If you log in to the standby host of the Netezza system and run the command, the
status message is Configured for upstream resolv.conf.

Remote access
IBM Netezza systems are typically installed in a data center, which is often highly
secured from user access and sometimes in a geographically separate location.
Thus, you might need to set up remote access to Netezza so that your users can
connect to the system through the corporate network. Common ways to remotely
log on to another system through a shell (Telnet, rlogin, or rsh) do not encrypt data
that is sent over the connection between the client and the server. Consequently,
the type of remote access you choose depends upon the security considerations at
your site. Telnet is the least secure and SSH (Secure Shell) is the most secure.

If you allow remote access through Telnet, rlogin, or rsh, you can more easily
manage this access through the xinetd daemon (Extended Internet Services). The
xinetd daemon starts programs that provide Internet services. This daemon uses a
configuration file, /etc/xinetd.conf, to specify services to start. Use this file to
enable or disable remote access services according to the policy at your site.
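
For example, on many Red Hat systems each service also has a file in the
/etc/xinetd.d directory; the following sketch, which assumes that a telnet
service file exists on your host, shows a service that is disabled with the
disable directive:
# /etc/xinetd.d/telnet (illustrative excerpt)
service telnet
{
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
        disable         = yes
}
After you edit a service file, restart the xinetd service (for example,
service xinetd restart) so that the change takes effect.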

If you use SSH, it does not use xinetd, but rather its own configuration files. For
more information, see the Red Hat documentation.

Administration interfaces
IBM Netezza offers several interfaces that you can use to perform the
various system and database management tasks:
v Netezza commands (nz* commands) are installed in the /nz/kit/bin directory
on the Netezza host. For many of the nz* commands, you must be able to log on
to the Netezza system to access and run those commands. In most cases, users
log in as the default nz user account, but you can create other Linux user
accounts on your system. Some commands require you to specify a database
user account, password, and database to ensure that you have permissions to do
the task.
v The Netezza CLI client kits package a subset of the nz* commands that can be
run from Windows and UNIX client systems. The client commands might also

1-8 IBM Netezza System Administrator’s Guide


require you to specify a database user account, password, and database to
ensure that you have database administrative and object permissions to do the
task.
v The SQL commands support administration tasks and queries within a SQL
database session. You can run the SQL commands from the Netezza nzsql
command interpreter or through SQL APIs such as ODBC, JDBC, and the OLE
DB Provider. You must have a database user account to run the SQL commands
with appropriate permissions for the queries and tasks that you perform.
v The NzAdmin tool is a Netezza interface that runs on Windows client
workstations to manage Netezza systems.
v Netezza Performance Portal. The Netezza Performance Portal is a web browser
client that provides detailed monitoring capabilities for your Netezza systems.
You can use the portal to answer questions about system usage, workload,
capacity planning, and overall query performance.

The nz* commands are installed and available on the Netezza system, but it is
more common for users to install Netezza client applications on client
workstations. Netezza supports various Windows and UNIX client operating
systems. Chapter 2, “Netezza client software installation,” on page 2-1 describes
the Netezza clients and how to install them. Chapter 3, “Netezza administration
interfaces,” on page 3-1 describes how to get started by using the administration
interfaces.

The client interfaces provide you with different ways to do similar tasks. While
most users tend to use the nz* commands or SQL commands for tasks, you can
use any combination of the client interfaces, depending on the task or your
workstation environment, or interface preferences.
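
For example, a quick way to list the databases on a system is the nzsql
interpreter; the following sketch is illustrative (the host, account, and
password are placeholders, and \l is the nzsql internal slash command that
lists the databases that you can access):
nzsql -host nzhost -u admin -pw password -d system -c "\l"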
Related concepts:
Chapter 2, “Netezza client software installation,” on page 2-1
This section describes how to install the Netezza CLI clients and the NzAdmin
tool.

Other Netezza documentation


The IBM Netezza documentation set contains other documents that can help you
in your day-to-day use of the Netezza system and features:
v IBM Netezza Database User’s Guide describes the Netezza SQL commands and
how to use them to create queries and how to create and manage database
objects
v IBM Netezza Data Loading Guide describes how to load data into a Netezza
system
v IBM Netezza ODBC, JDBC, OLE DB, and .NET Installation and Configuration Guide
describes how to configure data connectivity clients to connect to your Netezza
system and run queries through the supported drivers
v IBM Netezza Advanced Security Administrator's Guide describes how to manage
multi-level security, audit logging and history, and authentication within the
Netezza database
v IBM Netezza Getting Started Tips provides a high-level overview of Netezza
appliances and concepts for the new user, plus an overview of the
documentation set
v IBM Netezza Software Upgrade Guide describes how to upgrade the Netezza
software
v IBM Netezza Release Notes describes new features and changes in a Netezza
software release, and a summary of known issues and fixes for
customer-reported issues

There are several Netezza documents that offer more specialized information about
features or tasks. For more information, see IBM Netezza Getting Started Tips.

Chapter 2. Netezza client software installation
This section describes how to install the Netezza CLI clients and the NzAdmin
tool.

In most cases, the only applications that IBM Netezza administrators or users must
install are the client applications to access the Netezza system. Netezza provides
client software that runs on various systems such as Windows, Linux, Solaris,
AIX®, and HP-UX systems.

The instructions to install and use the Netezza Performance Portal are in the IBM
Netezza Performance Portal User's Guide, which is available with the software kit for
that interface.

This section does not describe how to install the Netezza system software or how
to upgrade the Netezza host software. Typically, Netezza Support works with you
for any situations that might require software reinstallations, and the steps to
upgrade a Netezza system are described in the IBM Netezza Software Upgrade Guide.

If your users or their business reporting applications access the Netezza system
through ODBC, JDBC, or OLE-DB Provider APIs, see the IBM Netezza ODBC,
JDBC, OLE DB, and .NET Installation and Configuration Guide for detailed
instructions on the installation and setup of these data connectivity clients.
Related concepts:
“Administration interfaces” on page 1-8

Client software packages


If you have access to IBM Passport Advantage® or the IBM Fix Central downloads
area, you can obtain the Netezza client software. You must have support accounts
with permission to download the IBM Netezza software from these locations.

To access Passport Advantage, go to http://www.ibm.com/software/howtobuy/passportadvantage/pao_customers.htm.

To access Fix Central, go to http://www.ibm.com/support/fixcentral.

The following table lists the supported operating systems and revisions for the
Netezza CLI clients.
Table 2-1. Netezza supported platforms
Operating system                                                32-bit         64-bit
Windows
Windows 2008, Vista, 7, 8                                       Intel / AMD    Intel / AMD
Windows Server 2012, 2012 R2                                    N/A            Intel / AMD
Linux
Red Hat Enterprise Linux 5.2, 5.3, 5.5, 5.9; and 6 through 6.5  Intel / AMD    Intel / AMD
Red Hat Enterprise Linux 6.2+                                   N/A            PowerPC®
Red Hat Enterprise Linux 7.1                                    N/A            POWER8® LE mode
SUSE Linux Enterprise Server 11                                 Intel / AMD    Intel / AMD
SUSE Linux Enterprise Server 10 and 11, Red Hat Enterprise      IBM System z®  IBM System z
Linux 5.x, and Red Hat Enterprise Linux 6.1
Ubuntu Server 15.04                                             Intel / AMD    Intel / AMD
UNIX
IBM AIX 6.1 with 5.0.2.1 C++ runtime libraries, 7.1             N/A            PowerPC
HP-UX 11i versions 1.6, 2 (B.11.22 and B.11.23), and 3          Itanium        Itanium
Oracle Solaris 10, 11                                           SPARC          SPARC
Oracle Solaris 10                                               x86            x86

The Netezza client kits are designed to run on the proprietary hardware
architecture for the vendor. For example, the AIX, HP-UX, and Solaris clients are
intended for the proprietary RISC architecture. The Linux client is intended for
RedHat or SUSE on the 32-bit Intel architecture.

Note: Typically, the Netezza clients also support the update releases for each of the
OS versions listed in the table, unless the OS vendor introduced architecture
changes in the update.

Install the Netezza CLI client on a Linux/UNIX system


The IBM Netezza UNIX clients contain a tar file of the client software for a
platform and an unpack script. You use the unpack script to install the client nz
commands and their necessary files to the UNIX client system. “Client software
packages” on page 2-1 lists the supported UNIX client operating systems.

Installing on Linux/UNIX Clients


The topic describes how to install the IBM Netezza UNIX client packages on 32-bit
and 64-bit operating system workstations.

About this task

If you are installing the clients on 64-bit operating systems, there are some
additional steps to install a second, 64-bit client package. The IBM Netezza clients
are 32-bit operating system executables and they require 32-bit libraries that are not
provided with the clients. If the libraries are not already installed on your system,
you must obtain and install the libraries using your operating system update
process.

Procedure
1. Obtain the client package (nz-platformclient-version.archive) from the IBM
Fix Central site and download it to the client system. Use or create a new,
empty directory to reduce any confusion with other files or directories. There
are several client packages available for different common operating system
types, as described in “Client software packages” on page 2-1. Make sure that
you obtain the correct client package. These instructions use the Linux client
package as an example of the procedure.
2. Log in to the workstation as the root user or a superuser account.
3. Change to the directory where you saved the client package, then uncompress
and extract the contents.
For the Linux client, use the gunzip command to uncompress the client package,
and then use a command such as tar xzf nz-linuxclient-version.tar.gz to
extract the package. To extract the other UNIX packages, such as AIX, you
might need to run other commands, such as uncompress to uncompress the
archive. The unpack process for the Linux package creates a
linux directory, a linux64 directory, a webadmin directory, and a
datadirect.package.tar.z file. Ignore the webadmin directory, which contains
the deprecated Web Admin interface client.
4. For all clients (either 32-bit or 64-bit operating system clients), change to the
linux directory and run the unpack command: ./unpack.

Note: On an HP-UX 11i client, /bin/sh might not be available. You can use the
command form sh ./unpack to unpack the client.
The unpack command checks the client system to ensure that it supports the
CLI package and prompts you for an installation location. The default is
/usr/local/nz for Linux, but you can install the CLI tools to any location on
the client. The program prompts you to create the directory if it does not
already exist. Sample command output follows:
------------------------------------------------------------------
IBM Netezza -- NPS Linux Client 7.1
(C) Copyright IBM Corp. 2002, 2013 All Rights Reserved.
------------------------------------------------------------------
Validating package checksum ... ok
Where should the NPS Linux Client be unpacked? [/usr/local/nz]
Directory ’/usr/local/nz’ does not exist; create it (y/n)? [y] Enter
0% 25% 50% 75% 100%
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Unpacking complete.
5. If your client has a 64-bit operating system, change to the linux64 directory
and run the unpack command to install the additional 64-bit files: ./unpack.
The unpack command prompts you for an installation location. The default is
/usr/local/nz for Linux, but you should use the same location that you used
for the 32-bit CLI files in the previous step. Sample command output follows:
------------------------------------------------------------------
IBM Netezza -- NPS Linux Client 7.1
(C) Copyright IBM Corp. 2002, 2013 All Rights Reserved.
------------------------------------------------------------------
Validating package checksum ... ok
Where should the NPS Linux Client be unpacked? [/usr/local/nz]
Installing in an existing directory. Changing permissions to
overwrite existing files...
0% 25% 50% 75% 100%
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Unpacking complete.

Results

The client installation steps are complete, and the Netezza CLI commands are
installed to your specified destination directory. The NPS commands are located in
the bin directory where you unpacked the NPS clients. If you are using a 64-bit
operating system on your workstation, note that there is a 64-bit nzodbcsql
command in the bin64 directory for testing the SQL command connections.

What to do next

Test to make sure that you can run the client commands. Change to the bin
subdirectory of the client installation directory (for example, /usr/local/nz/bin).
Run a sample command such as the nzds command to verify that the command
succeeds or to identify any errors.
./nzds -host nzhost -u user -pw password

The command displays a list of the data slices on the target NPS system. If the
command runs without error, your client system has the required libraries and
packages to support the Netezza clients. If the command fails with a library or
other error, the client may require some additional libraries or shared objects.

For example, on a Red Hat Enterprise Linux 64-bit client system, you could see an
error similar to the following:
[root@myrhsystem bin]# ./nzds
-bash: ./nzds: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

For example, on a SUSE 10/11 64-bit client system, you could see an error similar
to the following:
mylinux:/usr/local/nz/bin # ./nzds
./nzds: error while loading shared libraries: libssl.so.4: cannot open shared
object file: No such file or directory

mylinux:/usr/local/nz/bin # ldd nzds


linux-gate.so.1 => (0xffffe000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0xf76f1000)
libdl.so.2 => /lib/libdl.so.2 (0xf76ec000)
libssl.so.4 => not found
libcrypto.so.4 => not found
libm.so.6 => /lib/libm.so.6 (0xf76c4000)
libc.so.6 => /lib/libc.so.6 (0xf7582000)
/lib/ld-linux.so.2 (0xf773f000)

These errors indicate that the client is missing 32-bit library files that are required
to run the NPS clients. Identify the packages that provide the library and obtain
those packages. You may need assistance from your local workstation IT
administrators to obtain the operating system packages for your workstation.

To identify and obtain the required Red Hat packages, you could use a process
similar to the following.
v Use the yum provides command and specify the file name to see which package
provides the file that could not be found (ld-linux.so.2 in this example).
yum provides ld-linux.so.2
Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use
subscription-manager to register.
RHEL64 | 3.9 kB 00:00 ...
glibc-2.12-1.107.el6.i686 : The GNU libc libraries
Repo : RHEL64
Matched from:
Other : ld-linux.so.2
In this example, the missing package is glibc-2.12-1.107.el6.i686.
v In some cases, the NPS command could report an error for a missing libssl file.
You can use the yum provides command to obtain more information about the
packages that contain the library, and if any of the files already exist on your
workstation.

yum provides */libssl*
Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use
subscription-manager to register.
nss-3.14.0.0-12.el6.x86_64 : Network Security Services
Repo : RHEL64
Matched from:
Filename : /usr/lib64/libssl3.so
openssl-devel-1.0.0-27.el6.x86_64 : Files for
: development of applications which will use OpenSSL
Repo : RHEL64
Matched from:
Filename : /usr/lib64/pkgconfig/libssl.pc
Filename : /usr/lib64/libssl.so
To resolve the problem, you may need to obtain and install the package
nss-3.14.0.0-12.el6.x86_64 or you might be able to create a symbolic link if
the library already exists on your system. Use caution when creating symbolic
links or changing the library files. You should consult with your IT department
to ensure that you can obtain the needed packages, or that changes to the
symbolic links will not impact the operation of other applications on your
workstation.

Based on the missing libraries and packages, use the following steps to obtain the
Red Hat packages.
v Mount the Red Hat distribution DVD or ISO file to the client system. Insert the
DVD into the DVD drive.
v Open a terminal window and log in as root.
v Run the following commands:
[root@myrhsystem]# mkdir /mnt/cdrom
[root@myrhsystem]# mount -o ro /dev/cdrom /mnt/cdrom
v Create the text file server.repo in the /etc/yum.repos.d directory.

Note: To use gedit, run the command gedit /etc/yum.repos.d/server.repo.
Add the following text to the file, where baseurl is the mount point and the
RHEL distribution. In the example, the mount point is cdrom and the RHEL
distribution is Workstation, but it could be a server or the ISO file.
[server]
name=server
baseurl=file:///mnt/cdrom/Workstation
enabled=1
v Run the command: yum clean all
v Run the command to import related public keys: rpm --import
/mnt/cdrom/*GPG*
v Run the following command to install the required libraries: yum install
<package-name> where <package-name> is the file that contains the libraries that
you require for the NPS command operations.

To identify and obtain the required SUSE packages, you could use a process
similar to the following.
v Log in to the SUSE system as root or a superuser.
v If the test NPS command failed with the error that libssl.so.4 or
libcrypto.so.4 or both could not be found, you might be able to resolve the
issue by adding a symbolic link to the missing file from the NPS client
installation directory (for example, /usr/local/nz/lib). Use the ls /lib/libssl*
command to list the available libraries in the standard OS directories. You could
then create symbolic links to one of your existing libssl.so and libcrypto.so
files by using commands similar to the following:

mylinux:/usr/local/nz/lib # ln -s /usr/lib/libssl.so.0.9.8 /lib/libssl.so.4
mylinux:/usr/local/nz/lib # ln -s /usr/lib/libcrypto.so.0.9.8 /lib/libcrypto.so.4
v If you are missing other types of files or libraries, use the zypper wp command
and specify the file name to see which package provides it. An example follows.
zypper wp ld-linux.so.2
Loading repository data...
Reading installed packages...
S | Name | Type | Version | Arch | Repository
--+-------------+---------+----------+--------+---------------------------------
i | glibc-32bit | package | 2.9-13.2 | x86_64 | SUSE-Linux-Enterprise-Desktop-11
In this example, the missing package is glibc-32bit.

If the error indicates that you are missing other libraries or packages, use the
following steps to obtain the SUSE packages.
v Open a terminal window and log in as root.
v Run the yast command to open the YaST interface.
v On the YaST Control Center, select Software and go to the software repositories
to configure and enable a DVD, a server, or an ISO file as a repository source.
Select the appropriate source for your SUSE environment. Consult with your IT
department about the policies for package updates in your environment.
v On the Software tab, go to Software Management and search for the required
package or library such as glibc-32bit in this example.
v Click Accept to install the required package.
v Exit YaST by clicking Quit.

Path for Netezza CLI client commands


You can run most of the CLI commands from the IBM Netezza client systems,
except for nzstart and nzstop which run only on the host Netezza system.

To run the CLI commands on Solaris, you must include /usr/local/lib in your
environment variable LD_LIBRARY_PATH. Additionally, to use the ODBC driver on
Linux, Solaris, or HP-UX, you must include /usr/local/nz/lib, or the directory
path to nz/lib where you installed the Netezza CLI tools.
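
For example, a minimal shell setup on a UNIX client might look like the
following (bash or ksh syntax; adjust the paths if you installed the clients in
a different location):
export PATH=$PATH:/usr/local/nz/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib:/usr/local/nz/lib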
Related reference:
“Command locations” on page 3-3

Removing the CLI clients from UNIX systems


About this task

To remove the client CLI kits from a UNIX system, complete the following steps:

Procedure
1. Change to the directory where you installed the clients. For example,
/usr/local/nz.
2. Delete the nz commands manually.

Install the Netezza tools on a Windows client


The IBM Netezza Client Components for Windows contains the Windows nzsetup.exe
command, which installs the IBM Netezza Windows client tools. The installation
program installs the NzAdmin tool, several nz command-line executable files and
libraries, online help files, and Netezza guides in PDF format.

Installation requirements
The installation package requires a computer system that runs a supported
Windows operating system such as Windows Vista (32-bit), 2008 (32-bit and 64-bit),
Windows 7 (32-bit and 64-bit), and Windows Server 2012 (64-bit). The client system
must also have either a CD/DVD drive or a network connection.

If you are using or viewing object names that use UTF-8 encoded characters, your
Windows client systems require the Microsoft universal font to display the
characters within the NzAdmin tool. The Arial Unicode MS font is installed by
default on some Windows systems, but you might have to run a manual
installation for other Windows platforms such as 2003 or others. For more
information, see the Microsoft support topic at http://office.microsoft.com/en-us/
help/hp052558401033.aspx.

Installing the Netezza tools


About this task

To install the IBM Netezza tools on Windows, complete the following steps:

Procedure
1. Insert the IBM Netezza Client Components for Windows DVD in your media drive
and go to the admin directory. If you downloaded the client package
(nzsetup.exe) to a directory on your client system, change to that directory.
2. Double-click or run nzsetup.exe.
This program is a standard installation program that consists of a series of
steps in which you select and enter information that is used to configure the
installation. You can cancel the installation at any time.

Results

The installation program displays a license agreement, which you must accept to
install the client tools. You can also specify the following information:
Destination folder
You can use the default installation folder or specify an alternative
location. The default folder is C:\Program Files\IBM Netezza Tools. If you
choose a different folder, the installation program creates the folder if one
does not exist.
Setup type
Select the type of installation: typical, minimal, or custom.
Typical
Install the nzadmin program, the help file, the documentation, and
the console utilities, including the loader.
Minimal
Install the nzadmin program and help files.
Custom
Displays a screen where you can select to install any combination
of the administration application, console applications, or
documentation.

After you complete the selections and review the installation options, the client
installer creates the Netezza Tools folder, which has several subfolders. You cannot
change the subfolder names or locations.

Bin Executable files and support files
Doc Copies of the Netezza user guides and an Acrobat Index to search the doc
set
Help Application help files
jre Java™ runtime environment files for the Netezza tools
sys Application string files
Uninstall Netezza Tools
Files to remove Netezza tools from the client system

The installation program displays a dialog when it completes, and on some
systems, it can prompt you to restart the system before you use the application.

The installer stores copies of the software licenses in the installation directory,
which is usually C:\Program Files\IBM Netezza Tools (unless you specified a
different location).

The installation program adds the Netezza commands to the Windows Start >
Programs menu. The program group is IBM Netezza and it has the suboptions
IBM Netezza Administrator and Documentation. The IBM Netezza Administrator
command starts the NzAdmin tool. The Documentation command lists the PDFs of
the installed documentation.

To use the commands in the bin directory, you must open a Windows
command-line prompt (a DOS prompt).

Environment variables
The following table lists the operating system environment variables that the
installation tool adds for the IBM Netezza console applications.
Table 2-2. Environment variables
Variable Operation Setting
PATH append <installation directory>\bin
NZ_DIR set Installation directory (for example C:\Program
Files\IBM Netezza Tools)
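
To confirm the variables after installation, you can display them from a
Windows command prompt; the output shown assumes the default installation
folder:
C:\> echo %NZ_DIR%
C:\Program Files\IBM Netezza Tools
C:\> echo %PATH%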

Removing the IBM Netezza tools


About this task

You can remove or uninstall the Windows tools by using the Windows Add or
Remove Programs interface in the Control Panel. The uninstallation program
removes all folders, files, menu commands, and environment variables. The
registry entries that are created by other IBM Netezza applications, however, are
not removed.

To remove the IBM Netezza tools from a Windows client, complete the following
steps:

Procedure
1. Click Start > Control Panel > Uninstall. The menu options can vary with each
Windows operating system type.

2. Select IBM Netezza Tools, then click Uninstall/Change. The Uninstall IBM
Netezza Tools window appears.
3. Click Uninstall. The removal usually completes in a few minutes. Wait for the
removal to complete.
4. Using the File Explorer, check the installation location, which is usually
C:\Program Files\IBM Netezza Tools. If the Windows client was the only
installed Netezza software, you can delete the IBM Netezza Tools folder to
completely remove the application.

Clients and Unicode characters


If you create object names that use characters outside the 7-bit-ASCII character
range, the nzsql command, the ODBC, JDBC, and OLE-DB drivers, NzAdmin tool,
and the IBM Netezza Performance Portal interface all support the entering and
display of those characters. On Windows systems, users must ensure that they
have appropriate fonts that are loaded to support their character sets of choice.

IBM Netezza commands that display object names such as nzload, nzbackup, and
nzsession can also display non-ASCII characters, but they must operate on a
UTF-8 terminal or DOS window to display characters correctly.

For UNIX clients, make sure that the terminal window in which you run these nz
commands uses a UTF-8 locale. The output in the terminal window might not
align correctly.

Typically, Windows clients require two setup steps.

This procedure is a general recommendation that is based on common practices. If
you encounter any difficulty with Windows client setup, see Microsoft Support to
obtain the setup steps for your specific platform and fonts.
1. Set the command prompt to use an appropriate True Type font that contains
the required glyphs. To select a font:
a. Select Start > Programs > Accessories.
b. Right-click Command Prompt and then select Properties from the menu.
c. Select the Font tab. In the Font list, the True Type fixed width font or fonts
are controlled by the registry setting
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
On a standard US system, the font is Lucida Console (which does not
contain UTF-8 mapped glyphs for Kanji). On a Japanese system, the font is
MS Gothic, which contains those glyphs.
2. In a DOS command prompt window, change the code page to UTF-8 by
entering the chcp 65001 command.

As an alternative to these DOS setup steps, the input/output from the DOS clients
can be piped from/to nzconvert and converted to a local code page, such as 932
for Japanese.

On a Windows system, the fonts that you use for your display must meet the
Microsoft requirements as outlined on the Support site at http://
support.microsoft.com/default.aspx?scid=kb;EN-US;Q247815.

Client timeout controls
In some customer environments where users connect over VPNs to the IBM
Netezza appliance, users might encounter issues where active SQL sessions
timeout because of VPN/TCP connection settings in the customer environment.
For these environments, Netezza adds TCP KEEPALIVE packet support with the
following new settings in the /nz/data/postgresql.conf file:
tcp_keepidle
The number of seconds between keepalive messages that are sent on an
otherwise idle database connection. A value of 0 uses the system default
(7200 seconds). The idle timeout can be used to keep connections from
being closed by your network controls (such as firewalls, agents, or
network management software) that have idle session timeout settings. In
general, your tcp_keepidle setting should be less than the network idle
timeout settings. If the setting is longer than the network timeout, the
network could close the idle database connections.
tcp_keepinterval
The number of seconds to wait for a keepalive response before
retransmitting the message. A value of 0 uses the system default (75
seconds).
tcp_keepcount
The number of retransmission attempts that must occur before the
connection is considered dead. A value of 0 uses the system default (9
attempts).

After you define (or modify) these settings in the postgresql.conf file, you must
restart the Netezza software to apply the changes.
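
The following sketch shows what such a section of the postgresql.conf file
might look like; the values are examples only, and you should choose values
that fit the timeout policies of your network:
# TCP keepalive settings (example values)
tcp_keepidle = 900        # send a keepalive after 15 minutes of idle time
tcp_keepinterval = 30     # wait 30 seconds for a response before retransmitting
tcp_keepcount = 5         # consider the connection dead after 5 failed probes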

Netezza port numbers


The IBM Netezza system uses the following port numbers and environment
variables for the CLI commands and the NzAdmin tool. The following table lists
the default ports and corresponding environment variables:
Table 2-3. Netezza port numbers for database access
Port Environment variable Description
5480 NZ_DBMS_PORT The postgres port for the nzsql command,
NzAdmin tool, ODBC, and JDBC.
5481 NZ_CLIENT_MGR_PORT The port for the CLI and NzAdmin tool
messaging.
5482 NZ_LOAD_MGR_PORT Before release 3.1, this port handled loads. As of
release 3.1, this port is not required.
5483 NZ_BNR_MGR_PORT The port for the nzbackup and nzrestore
commands.

Netezza personnel, if granted access for remote service, use port 22 for SSH, and
ports 20 and 21 for FTP.

Changing the default port numbers
About this task

For security or port conflict reasons, you can change one or more default port
numbers for the IBM Netezza database access.

Important: Be careful when you are changing the port numbers for the Netezza
database access. Errors can severely affect the operation of the Netezza system. If
you are not familiar with editing resource shell files or changing environment
variables, contact Netezza Support for assistance.

Before you begin, make sure that you choose a port number that is not already in
use. To check the port number, you can review the /etc/services file to see
whether the port number is specified for another process. You can also use the
netstat | grep port command to see whether the designated port is in use.

To change the default port numbers for your Netezza system, complete the
following steps:

Procedure
1. Log in to the Netezza host as the nz user.
2. Change to the /nz/kit/sys/init directory.
3. Create a backup of the current nzinitrc.sh file:
[nz@nzhost init]$ cp nzinitrc.sh nzinitrc.sh.backup
4. Review the nzinitrc.sh file to see whether the Netezza port or ports that are
listed in Table 2-3 on page 2-10 that you want to change are present in the file.
For example, you might find a section that looks similar to the following, or
you might find that these variables are defined separately within the
nzinitrc.sh file.
# Application Port Numbers
# ------------------------

# To change the application-level port numbers, uncomment the following lines,
# and then change the numbers to their new values. Note that these new values
# will need to be set on clients as well.

# NZ_DBMS_PORT=5480; export NZ_DBMS_PORT
# NZ_CLIENT_MGR_PORT=5481; export NZ_CLIENT_MGR_PORT
# NZ_LOAD_MGR_PORT=5482; export NZ_LOAD_MGR_PORT
# NZ_BNR_MGR_PORT=5483; export NZ_BNR_MGR_PORT
# NZ_RECLAIM_MGR_PORT=5484; export NZ_RECLAIM_MGR_PORT
If you do not find your variable or variables in the file, you can edit the file to
define each variable and its new port definition. To define a variable in the
nzinitrc.sh file, use the format NZ_DBMS_PORT=value; export NZ_DBMS_PORT.

Tip: You can append the contents of the nzinitrc.sh.sample file to the
nzinitrc.sh file to create an editable section of variable definitions. You must
be able to log in to the Netezza host as the root user; then, change to the
/nz/kit/sys/init directory and run the following command:
[nz@nzhost init]$ cat nzinitrc.sh.backup nzinitrc.sh.sample > nzinitrc.sh

5. Using a text editor, edit the nzinitrc.sh file. For each port that you want to
change, remove the comment symbol (#) from the definition line and specify
the new port number. For example, to change the NZ_DBMS_PORT variable
value to 5486:
NZ_DBMS_PORT=5486; export NZ_DBMS_PORT
# NZ_CLIENT_MGR_PORT=5481; export NZ_CLIENT_MGR_PORT
# NZ_LOAD_MGR_PORT=5482; export NZ_LOAD_MGR_PORT
# NZ_BNR_MGR_PORT=5483; export NZ_BNR_MGR_PORT
# NZ_RECLAIM_MGR_PORT=5484; export NZ_RECLAIM_MGR_PORT
6. Review your changes carefully to make sure that they are correct and save the
file.
If you change the default port numbers, some of the Netezza CLI commands
might no longer work. For example, if you change the NZ_DBMS_PORT or
NZ_CLIENT_MGR_PORT value, commands such as nzds, nzstate, and others
can fail because they expect the default port value. To avoid this problem, copy
the custom port variable definitions in the nzinitrc.sh file to the
/export/home/nz/.bashrc file. You can edit the .bashrc file by using any text
editor.
7. To place the new port value or values into effect, stop and start the Netezza
server by using the following commands:
[nz@nzhost init]$ nzstop
[nz@nzhost init]$ nzstart

Non-default NPS port numbers for clients


If your IBM Netezza system uses non-default port numbers, your client users must
specify the port number when they connect by using commands such as nzsql,
nzload, or by using clients such as NzAdmin. For example, if you change the
NZ_DBMS_PORT number from the default of 5480, your client users must specify
the new port value, otherwise their commands return an error that they cannot
connect to the database server at port 5480.

Some Netezza commands such as nzsql and nzload have a -port option that
allows the user to specify the DB access port. In addition, users can create local
definitions of the environment variables to specify the new port number.
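
For instance, a user who connects to a system that listens on port 5486 might
run a command like the following; the host, account, and database names are
placeholders:
nzsql -host nzhost -port 5486 -u user -pw password -d mydb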

For example, on Windows clients, users can create an NZ_DBMS_PORT user
environment variable in the System Properties > Environment Variables dialog to
specify the non-default port of the Netezza system. For clients such as NzAdmin,
the environment variable is the only way to specify a non-default database port for
a target Netezza system. For many systems, the variable name and value take
effect immediately and are used the next time that you start NzAdmin. When you
start NzAdmin and connect to a system, if you receive an error that you cannot
connect to the Netezza database and the reported port number is incorrect, check
the variable name and value to confirm that they are correct. You might have to
restart the client system for the variable to take effect.

For a Linux system, you can define a session-level variable by using a command
similar to the following format:
$ NZ_DBMS_PORT=5486; export NZ_DBMS_PORT

For the instructions to define environment variables on your Windows, Linux, or
UNIX client, see the operating system documentation for your client.

If a client user connects to multiple Netezza hosts that each use different port
numbers, those users might have to use the -port option on the commands as an
override, or change the environment variable value on the client before they
connect to each Netezza host.

Encrypted passwords
Database user accounts must be authenticated during access requests to the IBM
Netezza database. For user accounts that use local authentication, Netezza stores
the password in encrypted form in the system catalog. For more information about
encrypting passwords on the host and the client, see the IBM Netezza Advanced
Security Administrator's Guide.

Local authentication requires a password for every account. If you use LDAP
authentication, a password is optional. During LDAP authentication, Netezza uses
the services of an LDAP server in your environment to validate and verify Netezza
database users.
v When you are using the Netezza CLI commands, the clear-text password must
be entered on the command line. You can set the environment variable
NZ_PASSWORD to avoid typing the password on the command line, but the
variable is stored in clear text with the other environment variables.
v To avoid displaying the password on the command line, in scripts, or in the
environment variables, you can use the nzpassword command to create a locally
stored encrypted password.

You cannot use stored passwords with ODBC or JDBC.

The nzpassword command syntax is:


nzpassword add -u user -pw password -host hostname

Where:
v The user name is the Netezza database user name in the Netezza system catalog.
If you do not specify the user name on the command line, the nzpassword
command uses the environment variable NZ_USER.
v The password is the Netezza database user password in the Netezza system
catalog or the password that is specified in the environment variable
NZ_PASSWORD. If you do not supply a password on the command line or in the
environment variable, the system prompts you for a password.
v The host name is the Netezza host. If you do not specify the host name on the
command line, the nzpassword command uses the environment variable NZ_HOST.
You can create encrypted passwords for any number of user name/host pairs.
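
For example, an illustrative sequence that caches a password and then connects
without supplying it again (the names are placeholders) follows:
nzpassword add -u admin -pw password -host nzhost
nzsql -host nzhost -u admin -d system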

When you use the nzpassword add command to cache the password, quotation
marks are not required around the user name or password values. You need to
qualify the user name or password with a surrounding pair of single and double
quotation marks (for example, '"Bob"') only if the value is case-sensitive. If
you specify quoted or unquoted names or passwords in nzpassword or other nz
commands, you must use the same quoting style in all cases.

If you qualify a user name that is not case-sensitive with quotation marks (for
example '"netezza"'), the command might still complete successfully, but it might
not work in all command cases.

After you type the nzpassword command, the system sends the encrypted
password to the Netezza host where it is compared against the user
name/password in the system catalog.
v If the information matches, the Netezza stores the encrypted information in a
local password cache, and displays no additional message.
– On Linux and Solaris, the password cache is the file .nzpassword in the user
home directory. The system creates this file without access permissions to
other users, and refuses to accept a password cache whose permission allows
other users access.
– On Windows, the password cache is stored in the registry.
v If the information does not match, Netezza displays a message that indicates
that the authentication request failed. Netezza also logs all verification attempts.
v If the database administrator changed a user password in the system catalog, the
existing nzpasswords are invalid.
Related concepts:
“Logon authentication” on page 11-18

Stored passwords
If client users use the nzpassword command to store database user passwords on a
client system, they can supply only a database user name and host on the
command line. Users can also continue to enter a password on the command line
if displaying clear-text passwords is not a concern for security.

If you supply a password on the command line, it takes precedence over the
environment variable NZ_PASSWORD. If the environment variable is not set, the
system checks the locally stored password file. If there is no password in this file
and you are using the nzsql command, the system prompts you for a password,
otherwise the authentication request fails.

In all cases, whether you use the -pw option on the command line, the
NZ_PASSWORD environment variable, or the locally stored password that was
cached through the nzpassword command, IBM Netezza compares the password
against the entry in the system catalog for local authentication or against the
LDAP or Kerberos account definition. The authentication protocol is the same,
and Netezza never sends clear-text passwords over the network.

In release 6.0.x, the encryption that is used for locally encrypted passwords
changed. In previous releases, Netezza used the Blowfish encryption routines;
release 6.0 now uses the Advanced Encryption Standard AES-256 standard. When
you cache a password by using a release 6.0 client, the password is saved in
AES-256 format unless there is an existing password file in Blowfish format. In that
case, new stored passwords are saved in Blowfish format.

If you upgrade to a release 6.0.x or later client, the client can support passwords in
either the Blowfish format or the AES-256 format. If you want to convert your
existing password file to the AES-256 encryption format, you can use the
nzpassword resetkey command to update the file. If you want to convert your
password file from the AES-256 format to the Blowfish format, use the nzpassword
resetkey -none command.
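
For example, the conversions use the commands exactly as described above; no
additional options are assumed:
nzpassword resetkey          # convert the local password cache to AES-256 format
nzpassword resetkey -none    # convert the local password cache back to Blowfish format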

Important: Older clients, such as those for release 5.0.x and those clients earlier
than release 4.6.6, do not support AES-256 format passwords. If your password file
is in AES-256 format, the older client commands prompt for a password, which can
cause automated scripts to hang. Also, if you use an older client to add a cached
password to or delete a cached password from an AES-256 format file, you can
corrupt the AES-256 password file and lose the cached passwords. If you typically
run multiple releases of Netezza clients, use the Blowfish format for your cached
passwords.

Chapter 3. Netezza administration interfaces
This section provides a high-level description of the IBM Netezza administration
interfaces, such as the command-line interface, the NzAdmin, and the SQL
commands. This section describes how to access and use these interfaces.

For information about the Netezza Performance Portal, see the IBM Netezza
Performance Portal User's Guide, which is available with the software kit for that
interface.

In general, the Netezza CLI commands are used most often for the various
administration tasks. Many of the tasks can also be performed by using SQL
commands or the interactive interfaces. Throughout this publication, the primary
task descriptions use the CLI commands and reference other ways to do the same
task.

Netezza CLI overview


You can use the IBM Netezza command-line interface (CLI) to issue commands to
manage Netezza software, hardware, and databases.

You can use Netezza CLI commands (also called nz commands) to monitor and
manage a Netezza system. Most nz* commands are issued on the Netezza host
system. Some are included with the Netezza client kits, and some are available in
optional support toolkits and other packages. This publication describes the host
and client nz commands.

Commands and locations


Table 3-1 lists the commonly used nz commands for tasks on the Netezza system,
and their availability on the Netezza host or in the Netezza client kits.

Note: When investigating problems, Netezza support personnel might ask you to
issue other internal nz commands that are not listed.
Table 3-1. Command summary
Host or Client Kit Availability
Netezza Linux Solaris HP AIX Windows
Command Description Host Client Client Client Client Client
nzbackup Backs up an existing v
database.
nzcontents Displays the revision v
and build number of
all the executable files,
plus the checksum of
Netezza binaries.
nzconvert Converts character v v v v v v
encodings for loading
with the nzload
command or external
tables.

nzds Manages and displays v v v v v v
information about the
data slices on the
system.
nzevent Displays and manages v v v v v v
event rules.
nzhistcleanupdb Deletes old history v
information from a
history database.
nzhistcreatedb Creates a history v
database with all its
tables, views, and
objects for history
collection and
reporting.
nzhostbackup Backs up the host v
information, including
users and groups.
nzhostrestore Restores the host v
information.
nzhw Displays information v v v v v v
about the hardware
components of a
system.
nzload Loads data into v v v v v v
database files.
nzodbcsql A client command on v v v v
Netezza UNIX clients
that tests ODBC
connectivity.
nzpassword Stores a local copy of v v v v v v
the user's password.
nzreclaim Uses the SQL GROOM v v v v v
TABLE command to
reclaim disk space from
user tables, and also to
reorganize the tables.
nzrestore Restores the contents of v
a database backup.
nzrev Displays the current v v v v v v
software revision for
any Netezza software
release.

3-2 IBM Netezza System Administrator’s Guide


Table 3-1. Command summary (continued)
Host or Client Kit Availability
Netezza Linux Solaris HP AIX Windows
Command Description Host Client Client Client Client Client
nzsession Shows a list of current v v v v v v
system sessions (load,
client, and sql).
Supports filtering by
session type or user,
which you can use to
abort sessions, and
change the current job
list for a queued
session job.
nzspupart Displays information v v v v v v
about the SPU
partitions, including
status information and
the disks that support
the partition.
nzsql Invokes the SQL v v v v v v
command interpreter.
nzstart Starts the system. v
nzstate Displays the current v v v v v v
system state or waits
for a specific system
state to occur before it
returns.
nzstats Displays system level v v v v v v
statistics.
nzstop Stops the system. v
nzsystem Changes the system v v v v v v
state or displays the
current system
information.

Command locations
The following table shows the default location of each CLI command and in which
of the host and client kits they are available:

Command Type                           Default Location
Netezza host                           /nz/kit/bin
Linux, Solaris, HP, or AIX client      /usr/local/nz/bin
Windows client                         C:\Program Files\Netezza Tools\Bin

Add the appropriate bin directory to your search path to simplify command
invocation.
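
For example, on a Linux or UNIX client you might add a line such as the following to your shell startup file (the exact file depends on your shell; this is a sketch that uses standard bash syntax):
export PATH=/usr/local/nz/bin:$PATH
On the Netezza host, substitute /nz/kit/bin for the client directory.
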
Related concepts:
“Path for Netezza CLI client commands” on page 2-6

Command syntax
All IBM Netezza CLI commands have the following top-level syntax options:

-h Displays help. You can also enter -help.
-rev Displays the software revision level. You can also enter -V.
-hc Displays help for the subcommand (if the command has subcommands).

For many Netezza CLI commands you can specify a timeout. This time is the
amount of time the system waits before it abandons the execution of the command.
If you specify a timeout without a value, the system waits 300 seconds. The
maximum timeout value is 100 million seconds.

Issuing commands
To issue an nz command, you must have access to the IBM Netezza system (either
directly on the Netezza KVM or through a remote shell connection) or you must
install the Netezza client kit on your workstation. If you are accessing the Netezza
system directly, you must be able to log in by using a Linux account (such as nz).

While some of the nz commands can operate and display information without
additional access requirements, some commands and operations require that you
specify a Netezza database user account and password. The account might also
require appropriate access and administrative permissions to display information
or process a command.

The following are several examples.


v To display the state of a Netezza system by using a Windows client command,
run the following command:
C:\Program Files\Netezza Tools\Bin>nzstate show -host mynps -u user
-pw passwd
System state is 'Online'.
v To display the valid Netezza system states by using a Windows client command,
run the following command:
C:\Program Files\Netezza Tools\Bin>nzstate listStates

State Symbol Description


------------ ------------------------------------------------------------
initialized used by a system component when first starting
paused already running queries will complete but new ones are queued
pausedNow like paused, except running queries are aborted
offline no queries are queued, only maintenance is allowed
offlineNow like offline, except user jobs are stopped immediately
online system is running normally
stopped system software is not running
down system was not able to initialize successfully

Note: In this example, you did not have to specify a host, user, or password.
The command displayed information that was available on the local Windows
client.
v To back up a Netezza database (you must run the command while logged in to
the Netezza system, as this is not supported from a client):
[nz@npshost ~]$ nzbackup -dir /home/user/backups -u user -pw
password -db db1
Backup of database db1 to backupset 20090116125409 completed
successfully.

Identifiers in commands
When you use the IBM Netezza commands and specify identifiers for users,
passwords, database names, and other objects, you can pass normal identifiers that
are unquoted on the Linux command line. The Netezza server performs the
appropriate case-conversion for the identifier.

However, if you use delimited identifiers, the supported way to pass them on the
Linux command line is to use the following syntax:
'\'Identifier\''

The syntax is single quotation mark, backslash, single quotation mark, identifier,
backslash, single quotation mark, single quotation mark. This syntax protects the
quotation marks so that the identifier remains quoted in the Netezza system.
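
For example, to connect with the nzsql command to a database that was created with the delimited name "MyDB" (a hypothetical name used here for illustration), you could enter:
nzsql -d '\'MyDB\''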

SQL command overview


IBM Netezza database users, if permitted, can perform some administrative tasks
by using SQL commands while they are logged in through SQL sessions. For
example, users can do the following tasks:
v Manage Netezza users and groups, access permissions, and authentication
v Manage database objects (create, alter, or drop objects, for example)
v Display and manage session settings
v Manage history configurations

Throughout this publication, SQL commands are shown in uppercase (for example,
CREATE USER) to stand out as SQL commands. The commands are not
case-sensitive and can be entered by using any letter casing. Users must have
Netezza database accounts and applicable object or administrative permissions to
do tasks. For detailed information about the SQL commands and how to use them
to do various administrative tasks, see the IBM Netezza Database User’s Guide.

The nzsql command


The nzsql command invokes a SQL command interpreter on the IBM Netezza host
or on a IBM Netezza client system. You can use this SQL command interpreter to
create database objects, run queries, and manage the database.

To run the nzsql command, enter:


nzsql [options] [security options] [dbname [user] [password]]

The following table describes the nzsql command parameters. For more
information about the command parameters and how to use the command, see the
IBM Netezza Database User’s Guide.
Table 3-2. nzsql command parameters
Parameters Description
-a Echo all input from a script.
-A Use unaligned table output mode. This is equivalent to specifying
-P format=unaligned.
-c <query> Run only a single query (or slash command) and exit.
-d <dbname> or -D <dbname>
    Specify the name of the database to which to connect. If you do not
    specify this parameter, the nzsql command uses the value specified
    for the NZ_DATABASE environment variable (if it is specified) or
    prompts you for a password (if it is not).

-schema <schemaname> Specify the name of the schema to which to connect. This option
is used for Netezza Release 7.0.3 and later systems that are
configured to use multiple schemas. If the system does not
support multiple schemas, this parameter is ignored. If you do not
specify this parameter, the nzsql command uses the value
specified for the NZ_SCHEMA environment variable (if it is specified)
or a schema that matches the database account name (if it is not
and if enable_user_schema is set to TRUE), or the default schema
for the database (otherwise).
-e Echo queries that are sent to the server.
-E Display queries generated by internal commands.
-f <file name> Run queries from a file, then exit.
-F <string> Set the field separator. The default is a vertical bar (|). This is
equivalent to specifying -P fieldsep=<string>.
-h Display help for the nzsql command.
-H Set the table output mode to HTML. This is equivalent to
specifying -P format=html.
-host <host> Specify the hostname of the database server.
-l List available databases, then exit.
-n Disable readline mode. This is required when input uses a
double-byte character set such as Japanese, Chinese, or Korean.
-o <file name> Send query output to the specified file or, if a vertical bar (|) is
specified instead of a file name, to a pipe.
-O <file name> Send query output and any error messages to the specified file or,
if a vertical bar (|) is specified instead of a file name, to a pipe.
-P opt[=val] Set the printing option represented by opt to the value
represented by val.
-port <port> Specify the database server port.
-pw <password> Specify the password of the database user. If you do not specify
this parameter, the nzsql command uses the value specified for
the NZ_PASSWORD environment variable (if it is specified) or
prompts you to enter a password (if it is not).
-q Run quietly, that is, without issuing messages. Only the query
output is returned.
-r Suppress the row count that otherwise is displayed at the end of
the query output.
-R <string> Set the record separator. The default is the newline character. This
is equivalent to specifying -P recordsep=<string>.
-s Use single-step mode, which requires that each query be
confirmed.
-S Use single-line mode, which causes a newline character to
terminate a query.
-t Print rows only. This is equivalent to specifying -P tuples_only.
-time Print the time that is taken by queries.
-T <text> Set the HTML table tag options such as width and border. This is
equivalent to specifying -P tableattr=<text>.

-u <username> or -U <username>
    Specifies the database user name. If you do not specify this
    parameter, the nzsql command uses the value specified for the
    NZ_USER environment variable (if it is specified) or prompts you to
    enter a user name (if it is not).
-v <name>=<value> Set the specified session variable to the specified value. Specify
this parameter once for each variable to be set, for example:
nzsql -v HISTSIZE=600 -v USER=user1 -v PASSWORD=password
-V Display version information and exit.
-w Do not require a password for the database user. The password is
supplied by other mechanisms (Kerberos, for example).
-W <password> Specify the password of the database user. (Same as -pw.)
-x Expand table output. This is equivalent to specifying -P expanded.
-X Do not read the startup file (~/.nzsqlrc).
-securityLevel Specify the security level (secured or unsecured) for a client
<level> connection to the Netezza system. This option does not apply
when you are logged in to the Netezza system and running the
command.
preferredUnSecured
You prefer an unsecured connection, but you will accept
a secured connection if the system is configured to use
only secured connections. This is the default.
preferredSecured
You prefer a secured connection, but you will accept an
unsecured connection if the system is configured to use
only unsecured connections.
onlyUnSecured
You require an unsecured connection. If the system is
configured to use only secured connections, the
connection attempt is rejected.
onlySecured
You require a secured connection. If the system is
configured to use only unsecured connections or has a
release level that is earlier than 4.5, the connection
attempt is rejected.
-caCertFile <path> Specify the path to the root CA certificate file on the client system.
This option is used by Netezza clients that use peer authentication
to verify the Netezza host system. The default value is NULL,
which skips the peer authentication process.
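
For example, the following command (with placeholder values) starts an interactive session that connects to the database mydb as the user myuser:
nzsql -d mydb -u myuser -pw mypassword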

Within the nzsql command interpreter, enter the following slash commands for
help about or to run a command:
\h List all SQL commands.
\h <command>
Display help about the specified SQL command.
\? List and display help about all slash commands.

\g Run a query. This has the same effect as terminating the query with a
semicolon.
\q Quit
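
For example, entering \h CREATE USER at the nzsql prompt displays help for that SQL command, and \dt lists the tables in the current database.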

nzsql behavior differences on UNIX and Windows clients

Starting in NPS release 7.2.1, the nzsql command is included as part of the
Windows client kit. In a Windows environment, note that there are some
behavioral differences when users press the Enter key or the Control-C key
sequence than in a UNIX nzsql command line environment. The Windows
command prompt environment does not support many of the common UNIX
command formats and options. However, if your Windows client is using a Linux
environment like cygwin or others, the nzsql.exe command could support more of
the UNIX-only command line options noted in the documentation.

In a UNIX environment, if you are typing a multiline SQL query into the nzsql
command line shell, the Enter key acts as a newline character to accept input for
the query until you type the semi-colon character and press Enter. The shell
prompt also changes from => to -> for the subsequent lines of the input.
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Enter)
MYDB.SCH(USER)-> ; (press Enter)

COUNT
-------
1274
(1 row)

In a UNIX environment, if you press Control-C, the entire query is cancelled and
you return to the command prompt:
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Control-C)
MYDB.SCH(USER)=>

In a Windows client environment, if you are typing a multiline SQL query into the
nzsql command line shell, the Enter key acts similarly as a newline character to
accept input for the query until you type the semi-colon character and press Enter.
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Enter)
MYDB.SCH(USER)-> ; (press Enter)

COUNT
-------
1274
(1 row)

However, in a Windows environment, the Control-C or Control-Break key
sequences do not cancel the multiline query; instead, they cancel only that line of
the query input:
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Control-C)
MYDB.SCH(USER)-> ; (press Enter)

COUNT
-------
100000
(1 row)

The Control-C (or Control-Break) cancelled the WHERE clause on the third input
line, and thus the query results were larger without the restriction. On a single
input line (where the prompt is =>), Control-C cancels the query and you
return to the nzsql command prompt.
MYDB.SCH(USER)=> select count(*) from ne_part (press Control-C)
MYDB.SCH(USER)=>

nzsql requires the more command on Windows

When you run the nzsql command on a Windows client, you could see the error
more not recognized as an internal or external command. This error occurs
because nzsql uses the more command to process the query results. The error
indicates that the nzsql command could not locate the more command on your
Windows client.

To correct the problem, add the more.com command executable to your client
system's PATH environment variable. Each Windows OS version has a slightly
different way to modify the environment variables, so refer to your Windows
documentation for specific instructions. On a Windows 7 system, you could use a
process similar to the following:
v Click Start, and then type environment in the search field. In the search results,
click Edit the system environment variables. The System Properties dialog
opens and displays the Advanced tab.
v Click Environment variables. The Environment Variables dialog opens.
v In the System variables list, select the Path variable and click Edit. The Edit
System Variable dialog opens.
v Place the cursor at the end of the Variable value field. You can click anywhere in
the field and then press End to get to the end of the field.
v Append the value C:\Windows\System32; to the end of the Path field. Make
sure that you use a semi-colon character and type a space character at the end of
the string. If your system has the more.com file in a directory other than
C:\Windows\System32, use the pathname that is applicable on your client.
v Click OK in the Edit System Variable dialog, then click OK in the Environment
Variables dialog, then click OK in the System Properties dialog.

After you make this change, the nzsql command should run without displaying
the more not recognized error.
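
To confirm that the command can be found after you update the Path value, you can open a new command prompt window and run a check such as the following (where is a standard Windows utility):
where more
The output should list the location of more.com, for example C:\Windows\System32\more.com.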

The nzsql session history


On UNIX clients, the IBM Netezza system stores the history of your nzsql session
in the file $HOME/.nzsql_history. In interactive sessions, you can also use the
up-arrow key to display the commands that you ran.

On Windows clients, you can use the up-arrow key to display the commands that
ran previously.

By default, an nzsql batch session continues even if the system encounters errors.
You can control this behavior with the ON_ERROR_STOP variable, for example:
nzsql -v ON_ERROR_STOP=

You do not have to supply a value; defining it is sufficient.

You can also toggle batch processing with a SQL script. For example:
\set ON_ERROR_STOP
\unset ON_ERROR_STOP

You can use the $HOME/.nzsqlrc file to store values, such as the ON_ERROR_STOP,
and have it apply to all future nzsql sessions and all scripts.
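
For example, to run a SQL script and stop at the first error, you might combine the -v and -f options as follows (the database, user, password, and file names are placeholders):
nzsql -v ON_ERROR_STOP= -f load_checks.sql -d mydb -u myuser -pw mypassword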

Displaying information about databases and objects


Use nzsql slash commands to display information about databases and objects.

The following table describes the slash commands that display information about
objects or privileges within the database, or within the schema if the system
supports multiple schemas.
Table 3-3. The nzsql slash commands
Command Description
\d <object> Describe the named object such as a table, view, or
sequence
\da[+] List user-defined aggregates. Specify + for more detailed
information.
\df[+] List user-defined functions. Specify + for more detailed
information.
\de List temp tables.
\dg List groups (both user and resource groups) except
_ADMIN_.
\dG List user groups and their members.
\dGr List resource groups to which at least one user has been
assigned, including _ADMIN_, and the users assigned to
them.
\di List indexes.
\dm List materialized views
\ds List sequences.
\dt List tables.
\dv List views.
\dx List external tables.
\dy List synonyms.
\dSi List system indexes.
\dSs List system sequences.
\dSt List system tables.
\dSv List system views.
\dMi List system management indexes.
\dMs List system management sequences.
\dMt List system management tables.
\dMv List system management views.
\dp <user> List the privileges that were granted to a user either
directly or by membership in a user group.

\dpg <group> List the privileges that were granted to a user group.
\dpu <user> List the privileges that were granted to a user directly
and not by membership in a user group.
\du List users.
\dU List users who are members of at least one user group
and the groups of which each is a member.

Suppressing row count information


You can use the nzsql -r option or the NO_ROWCOUNT session variable to suppress
the row count information that displays at the end of the query output. A sample
query that displays the row count follows:
mydb.myschema(myuser)=> select count(*) from nation;
COUNT
-------
25
(1 row)

Note: Starting in Release 7.0.3, the nzsql environment prompt has changed. As
shown in the example command, the prompt now shows the database and schema
(mydb.myschema) to which you are connected. For systems that do not support
multiple schemas, there is only one schema that matches the name of the user who
created the database. For systems that support multiple schemas within a database,
the schema name will match the current schema for the connection.

To suppress the row count information, you can use the nzsql -r command when
you start the SQL command-line session. When you run a query, the output does
not show a row count:
mydb.myschema(myuser)=> select count(*) from nation;
COUNT
-------
25

You can use the NO_ROWCOUNT session variable to toggle the display of the row
count information within a session, as follows:
mydb.myschema(myuser)=> select count(*) from nation;
COUNT
-------
25
(1 row)

mydb.myschema(myuser)=> \set NO_ROWCOUNT

mydb.myschema(myuser)=> select count(*) from nation;


COUNT
-------
25

mydb.myschema(myuser)=> \unset NO_ROWCOUNT

mydb.myschema(myuser)=> select count(*) from nation;


COUNT
-------
25
(1 row)

NzAdmin overview
NzAdmin is a tool that you can use to manage an IBM Netezza system, obtain
information about the hardware, and manage various aspects of the user
databases, tables, and other objects. Because the NzAdmin tool runs on a Windows
client system, to use it you must first install the IBM Netezza Windows client
application as described in “Install the Netezza tools on a Windows client” on
page 2-6. You must have a Netezza database account and applicable object or
administrative privileges to do tasks.

Starting the NzAdmin tool


There are several ways to start an NzAdmin session:
v Click Start > Programs > IBM Netezza > IBM Netezza Administrator.
v Create a shortcut on the desktop.
v Run the nzadmin.exe file by using the Run window.

v Run the nzadmin.exe file from a command window. To bypass the login dialog,
enter the following login information (see the example that follows this list):
– -host or /host and the name or IP address of the Netezza host.
– -user or /user and a Netezza user name. The name you specify can be
delimited. A delimited user name is contained in quotation marks.
– -pw or /pw and the password of the specified user. To specify that a saved
password is to be used, enter -pw without entering a password string.
You can specify these parameters in any order, but you must separate them by
spaces or commas. If you specify:
– All three parameters, NzAdmin bypasses the login dialog and connects you
to the host that you specify.
– Less than three parameters, NzAdmin displays the login dialog and prompts
you to complete the remaining fields.
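
For example, the following command (with placeholder values) starts NzAdmin from a command window and bypasses the login dialog:
nzadmin.exe -host mynps -user admin -pw password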

When you log in to the NzAdmin tool you must specify the name of the host, your
user name, and your password. The drop-down list in the host field displays the
host addresses or names that you specified in the past. If you choose to save the
password on the local system, when you log in again, you need to enter only the
host and user names.

After you log in, the NzAdmin tool checks whether its version level matches that
of the IBM Netezza system. If not, the NzAdmin tool displays a warning message
and disables certain commands. This causes operations involving event rules,
statistics, and system hardware to be unavailable.

Displaying system components and databases


The NzAdmin tool displays information in two panes:
Navigation pane
The pane on the left is the navigation pane. It displays a tree view of the
components that comprise the selected view type (System or Database).
Content pane
The pane on the right is the content pane. It displays information about or
the contents of the item selected in the tree view in the navigation pane.

At the top of the navigation pane there are tabs that you can use to select the view
type:
System
The navigation pane displays components related to system hardware such
as SPA units, SPU units, and data slices.
Database
The navigation pane displays components related to database processing
such as databases, users, groups, and database sessions.

In the status bar at the bottom of the window, the NzAdmin tool displays your
user name and the duration (days, hours, and minutes) of the current NzAdmin
session or, if the host system is not online, a message indicating this.

You can access commands by using the menu bar or the toolbar, or by
right-clicking an object and using its pop-up menu.

Figure 3-1. NzAdmin main system window

Using the System view


When displaying the System view type, you can navigate among the system
components by using either the navigation pane or the content pane. As you move
the mouse pointer over an image in the content pane, a tool tip displays
information about the corresponding component, such as its hardware ID. Click an
image in the content pane to display more information about the corresponding
component.

For example, as you move the mouse pointer over the image of a SPA unit, a tool
tip displays the slot number, hardware ID, role, and state of each of the SPUs that
comprise it. Clicking a SPU displays the SPU status window and selects the
corresponding object in the tree view shown in the navigation pane.

Figure 3-2. NzAdmin hyperlink support

Status indicators
Each component has a status indicator:
Table 3-4. Status indicators

Status      Description
Normal      The component is operating normally.
Warning     The meaning depends on the specific component or components.
Failed      The component is down, has failed, or is likely to fail. For example,
            if two fans on the same SPA are down, the SPA is flagged as being
            likely to fail.
Missing     The component is missing and so no state information is available
            for it.

Main menu
The NzAdmin main menu contains the following items:

Command Description
File > New Create a new database, table, view, materialized
view, sequence, synonym, user, or group. Available
only in the Database view.
File > System State Change the system state.
File > Reconnect Reconnect to the system with a different host name,
address, or user name.
File > Exit Exit the NzAdmin tool.
View > Toolbar Show or hide the toolbar.
View > Status Bar Show or hide the status bar.
View > System Objects Show or hide the system tables and views, and the
object privilege lists in the Object Privileges window.
View > SQL Statements Display the SQL window, which shows a subset of
the SQL commands run in this session.
View > Refresh Refresh the current view. This can be either the
System or Database view.
Tools > Workload Management Display workload management information:
    Performance
        Summary, history, and graph workload management information.
    Settings
        The system defaults that determine the limits on session timeout,
        row set, query timeout, and session priority; and the resource
        allocation that determines resource usage among groups.
Tools > Table Skew Display any tables that meet or exceed a specified
skew threshold.
Tools > Table Storage Display table and materialized view storage usage
by database or by user.
Tools > Query History Configuration
    Display a window that you can use to create and alter history
    configurations, and to set the current configuration.
Tools > Default Settings Display the materialized view refresh threshold.
Tools > Options Display the Preferences tab where you can set the
object naming preferences and whether you want to
automatically refresh the NzAdmin window.
Help > NzAdmin Help Display the online help for the NzAdmin tool.
Help > About NzAdmin Display the NzAdmin and Netezza revision
numbers and copyright text.

Administration commands
You can access system and database administration commands from both the tree
view and the status pane of the NzAdmin tool. In either case, a pop-up menu lists
the commands that can be issued for the selected components.
v To activate a pop-up menu, right-click a component in a list.

v The Options hyperlink menu is in the top bar of the window.

Setting an automatic refresh interval


About this task

You can manually refresh the current (System or Database) view by clicking the
refresh icon on the toolbar, or by choosing Refresh from a menu. In addition, you
can specify that both views are to be periodically automatically refreshed, and the
refresh interval. To do this:

Procedure
1. In the main menu, click Tools > Options
2. In the Preferences tab, enable automatic refresh and specify a refresh interval.

Results

The refresh interval you specify remains in effect until you change it.

To reduce communication with the server, the NzAdmin tool refreshes data based
on the item you select in the left pane. The following table lists the items and
corresponding data that is retrieved on refresh.
Table 3-5. Automatic refresh

Server (system view): SPA Units, SPA ID n, SPU units
    All topology and hardware state information.
Event rules
    Event rules.
Individual statistic such as DBMS Group
    Specific statistic information.
Server (database view)
    All databases and their associated objects, users, groups, and session
    information.
Databases
    All database information and associated objects.
Database <name>
    Specific database, table, and view information.
Schemas
    All schemas within a database (for systems that support multiple schemas).
Schema <name>
    Specific schema, table, and view information.
Tables
    Table information.
Views
    View information.
Sequences
    Sequences information.
Synonyms
    Synonyms information.
Functions
    User-defined functions information.
Aggregates
    User-defined aggregates information.
Procedures
    Stored procedure information.
Users
    User information.
Groups
    Group information.
Sessions
    Session information.

If the NzAdmin tool is busy communicating with the server (for example, if it is
processing a user command or doing a manual refresh), it does not perform an
automatic refresh.

Netezza Performance Portal overview


Use the IBM Netezza Performance Portal to monitor and administer a Netezza
system.

The Netezza Performance Portal is a web-based client for monitoring and
administering Netezza systems. It replaces the deprecated Web Admin client. For
more information, refer to the IBM Netezza Performance Portal User's Guide.

Chapter 4. Manage Netezza HA systems
The IBM Netezza high availability (HA) solution uses Linux-HA and Distributed
Replicated Block Device (DRBD) as the foundation for cluster management and
data mirroring. The Linux-HA and DRBD applications are commonly used,
established, open source projects for creating HA clusters in various environments.

They are supported by a large and active community for improvements and fixes,
and they also offer the flexibility for Netezza to add corrections or improvements
on a faster basis, without waiting for updates from third-party vendors.

All the Netezza models except the Netezza 100 are HA systems, which means that
they have two host servers for managing Netezza operations. The host server
(often called host within the publication) is a Linux server that runs the Netezza
software and utilities.

Linux-HA and DRBD overview


High-Availability Linux (also called Linux-HA) provides the failover capabilities
from a primary or active IBM Netezza host to a secondary or standby Netezza
host. The main cluster management daemon in the Linux-HA solution is called
Heartbeat. Heartbeat watches the hosts and manages the communication and status
checks of services. Each service is a resource. Netezza groups the Netezza-specific
services into the nps resource group. When Heartbeat detects problems that imply
a host failure condition or loss of service to the Netezza users, Heartbeat can
initiate a failover to the standby host. For details about Linux-HA and its terms
and operations, see the documentation at http://www.linux-ha.org.

Distributed Replicated Block Device (DRBD) is a block device driver that mirrors
the content of block devices (hard disks, partitions, and logical volumes) between
the hosts. Netezza uses the DRBD replication only on the /nz and /export/home
partitions. As new data is written to the /nz partition and the /export/home
partition on the primary host, the DRBD software automatically makes the same
changes to the /nz and /export/home partition of the standby host.

The Netezza implementation uses DRBD in a synchronous mode, which is a tightly
coupled mirroring system. When a block is written, the active host does not record
the write as complete until both the active and the standby hosts successfully write
the block. The active host must receive an acknowledgement from the standby host
that it also completed the write. Synchronous mirroring (DRBD protocol C) is most
often used in HA environments that want the highest possible assurance of no lost
transactions if the active node fails over to the standby node. Heartbeat typically
controls the DRBD services, but commands are available to manually manage the
services.

For details about DRBD and its terms and operations, see the documentation at
http://www.drbd.org.

Differences with the previous Netezza HA solution
In previous releases, the IBM Netezza HA solution used the Red Hat Cluster
Manager as the foundation for managing HA host systems. The Linux-HA solution
uses different commands to manage the cluster. The following table outlines the
common tasks and the commands that are used in each HA environment.
Table 4-1. HA tasks and commands (old design and new design)

Display cluster status
    Old command (Cluster Manager): clustat -i 5
    New command (Linux-HA): crm_mon -i5
Relocate NPS service
    Old command (Cluster Manager): cluadmin -- service relocate nps
    New command (Linux-HA): /nzlocal/scripts/heartbeat_admin.sh --migrate
Enable the NPS service
    Old command (Cluster Manager): cluadmin -- service enable nps
    New command (Linux-HA): crm_resource -r nps -p target_role -v started
Disable the NPS service
    Old command (Cluster Manager): cluadmin -- service disable nps
    New command (Linux-HA): crm_resource -r nps -p target_role -v stopped
Start the cluster on each node
    Old command (Cluster Manager): service cluster start
    New command (Linux-HA): service heartbeat start
Stop the cluster on each node
    Old command (Cluster Manager): service cluster stop
    New command (Linux-HA): service heartbeat stop

Some additional points of difference between the solutions:


v All Linux-HA and DRBD logging information is written to /var/log/messages
on each host.
v In the new cluster environment, pingd replaces netchecker (the Network Failure
Daemon). pingd is a built-in part of the Linux-HA suite.
v The cluster manager HA solution also required a storage array (the MSA500) as
a quorum disk to hold the shared data. A storage array is not used in the new
Linux-HA/DRBD solution, as DRBD automatically mirrors the data in the /nz
and /export/home partitions from the primary host to the secondary host.

Note: The /nzdata and /shrres file systems on the MSA500 are deprecated.
v In some customer environments that used the previous cluster manager solution,
it was possible to have only the active host running while the secondary was
powered off. If problems occurred on the active host, the Netezza administrator
on-site would power off the active host and power on the standby. In the new
Linux-HA DRBD solution, both HA hosts must be operational at all times.
DRBD ensures that the data saved on both hosts is synchronized, and when
Heartbeat detects problems on the active host, the software automatically fails
over to the standby with no manual intervention.
Related concepts:
“Logging and messages” on page 4-12

Linux-HA administration
When you start an IBM Netezza HA system, Heartbeat automatically starts on both
hosts. It can take a few minutes for Heartbeat to start all the members of the nps
resource group. You can use the crm_mon command from either host to observe the
status, as described in “Cluster and resource group status” on page 4-5.

Heartbeat configuration
Heartbeat uses the /etc/ha.d/ha.cf configuration file first to load its configuration.
The file contains low-level information about fencing mechanisms, timing
parameters, and whether the configuration is v1 (old-style) or v2 (CIB). IBM
Netezza uses the v2 implementation.

CAUTION:
Do not modify the file unless directed to in Netezza documentation or by
Netezza Support.

Cluster Information Base


Most of the Heartbeat configuration is stored in the Cluster Information Base (CIB).
The CIB is on disk at /var/lib/heartbeat/crm/cib.xml. Heartbeat synchronizes it
automatically between the two Netezza hosts.

CAUTION:
Never manually edit the CIB file. You must use cibadmin (or crm_resource) to
modify the Heartbeat configuration. Wrapper scripts like heartbeat_admin.sh
update the file safely.

Note: It is possible to get into a situation where Heartbeat does not start properly
because of a manual CIB modification. The CIB cannot be safely modified if
Heartbeat is not started (that is, cibadmin cannot run). In this situation, you can
run /nzlocal/scripts/heartbeat_config.sh to reset the CIB and /etc/ha.d/ha.cf
to factory-default status. After you do this, it is necessary to run
/nzlocal/scripts/heartbeat_admin.sh --enable-nps to complete the CIB
configuration.

Important information about host 1 and host 2


In the Red Hat cluster manager implementation, the HA hosts were commonly
called HA1 and HA2. The terms stemmed from the hardware and rack
configurations as HA systems were typically multi-rack systems, and HA1 was
located in the first rack (usually the leftmost rack from the front), while HA2 was
located in the second rack of the HA system. Either HA1 or HA2 could serve as
the active or standby host, although HA1 was most often the “default” active host
and so HA1 is often synonymous with the active host. The names HA1 and HA2
are still used to refer to the host servers regardless of their active/standby role.

In IBM Netezza HA system designs, host1/HA1 is configured by default to be the
active host. You can run cluster management commands from either the active or
the standby host. The nz commands must be run on the active host, but the
commands run the same regardless of whether host 1 or host 2 is the active host.
The Netezza software operation is not affected by the host that it runs on; the
operation is identical when either host 1 or host 2 is the active host.

However, when host 1 is the active host, certain system-level operations such as
S-Blade restarts and system reboots often complete more quickly than when host
2/HA2 is the active host. An S-Blade restart can take one to two minutes longer to
complete when host 2 is the active host. Certain tasks such as manufacturing and
system configuration scripts can require host 1 to be the active host, and they
display an error if run on host 2 as the active host. The documentation for these
commands indicates whether they require host 1 to be the active host, or if special
steps are required when host 2 is the active host.

Failover timers
There are several failover timers that monitor Heartbeat operations and timings.
The default settings cover the general range of IBM Netezza system
implementations. Although Netezza has not encountered many cases where
environments require different values, each customer environment is unique. Do
not change failover timers without consultation from Netezza Support.

The failover timers are configured in /etc/ha.d/ha.cf.


Deadtime
Specifies the failure detection time (default: 30 seconds). For a busy
Netezza system in a heavily loaded environment, you might increase this
value if you observe frequent No local heartbeat errors or Cluster node
returning after partition errors in the /var/log/messages file.
Warntime
Specifies the warning for late heartbeat (default: 10 seconds).
Keepalive
Specifies the interval between liveness pings (default: 2 seconds).

You can change the settings by editing the values in ha.cf on both hosts and
restarting Heartbeat, but use care when you are editing the file.
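
The entries in ha.cf are keyword-and-value lines. The following sketch shows only the three timer settings with their default values; the other lines of the file are omitted here:
deadtime 30
warntime 10
keepalive 2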

Netezza cluster management scripts


IBM Netezza provides wrapper scripts for many of the common cluster
management tasks. These wrapper scripts help to simplify the operations and to
guard against accidental configuration changes that can cause the Netezza HA
operations to fail.

The following table lists the common commands. These commands are listed here
for reference.
Table 4-2. Cluster management scripts

Initial installation scripts
    heartbeat_config.sh sets up Heartbeat for the first time.
    heartbeat_admin.sh --enable-nps adds Netezza services to cluster control
    after initial installation.
Host name change
    heartbeat_admin.sh --change-hostname
Fabric IP change
    heartbeat_admin.sh --change-fabric-ip
Wall IP change
    heartbeat_admin.sh --change-wall-ip
Manual migrate (relocate)
    heartbeat_admin.sh --migrate
Linux-HA status and troubleshooting commands
    crm_mon monitors cluster status.
    crm_verify sanity checks the configuration and prints status.

The following is a list of other Linux-HA commands available. This list is also
provided as a reference, but do not use any of these commands unless directed to
by Netezza documentation or by Netezza Support.

Linux-HA configuration commands:

cibadmin
Main interface to modify configuration
crm_resource
Shortcut interface for modifying configuration
crm_attribute
Shortcut interface for modifying configuration
crm_diff
Diff and patch two different CIBs

Linux-HA administration commands:


crmadmin
Low-level query and control
crm_failcount
Query and reset failcount
crm_standby
Mark a node as standby, usually for maintenance

Active and standby nodes


There are two ways to determine which IBM Netezza host is the active host and
which is the standby.
v Use the crm_resource command.
v Review the output of the crm_mon command.

A sample crm_resource command and its output follow.


[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/01/31_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1

The command output displays a message about how it was started, and then
displays the host name where the nps resource group is running. The host that
runs the nps resource group is the active host.

You can obtain more information about the state of the cluster and which host is
active by using the crm_mon command. See the sample output that is shown in
“Cluster and resource group status.”

If the nps resource group is unable to start, or if it has been manually stopped
(such as by crm_resource -r nps -p target_role -v stopped), neither host is
considered the active host and the crm_resource -r nps -W command does not
return a host name.

Cluster and resource group status


To check the state of the cluster and the nps resource group, run the following
command:
crm_mon -i5

Sample output follows. This command refreshes its display every 5 seconds, but
you can specify a different refresh rate (for example, -i10 is a 10-second refresh
rate). Press Control-C to exit the command.

[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Wed Sep 30 13:42:39 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1

The host that is running the nps resource group is the active host. Every member
of the nps resource group starts on the same host. The sample output shows that
they are all running on nzhost1.

The crm_mon output also shows the name of the Current Designated Coordinator
(DC). The DC host is not an indication of the active host. The DC is an
automatically assigned role that Linux-HA uses to identify a node that acts as a
coordinator when the cluster is in a healthy state. This is a Linux-HA
implementation detail and does not affect Netezza. Each host recognizes and
recovers from failure, regardless of which one is the DC. For more information
about the DC and Linux-HA implementation details, see http://www.linux-ha.org/DesignatedCoordinator.

The resources under the nps resource group are as follows:


v The DRBD devices:
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
v Both file system mounts:
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
v The 10.0.0.1 IP setup on the fabric interface:
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
v The floating wall IP (external IP for HA1 + 3):
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
v The DNS daemon for Netezza:
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
v The Netezza daemon which performs prerequisite work and then starts the
Netezza software:
nzinit (lsb:nzinit): Started nzhost1

The fence routes for internal Heartbeat use are not part of the nps resource group.
If these services are started, it means that failovers are possible:
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1

The nps resource group
The nps resource group contains the following services or resources:
v drbd_exphome_device
v drbd_nz_device
v exphome_filesystem
v nz_filesystem
v fabric_ip
v wall_ip
v nz_dnsmasq
v nzinit

The order of the members of the group matters; group members are started
sequentially from first to last. They are stopped sequentially in reverse order, from
last to first. Heartbeat does not attempt to start the next group member until the
previous member starts successfully. If any member of the resource group is unable
to start (returns an error or times out), Heartbeat performs a failover to the
standby node.

Failover criteria
During a failover or resource migration, the nps resource group is stopped on the
active host and started on the standby host. The standby host then becomes the
active host.

It is important to differentiate between a resource failover and a resource migration


(or relocation). A failover is an automated event which the cluster manager
performs without human intervention when it detects a failure case. A resource
migration occurs when an administrator intentionally moves the resources to the
standby.

A failover can be triggered by any of the following events:


v Both maintenance network links to the active host are lost.
v All fabric network links to the active host are lost.
v A user manually stops Heartbeat on the active host.
v The active host is cleanly shut down, such as if someone issued the command
shutdown -h on that host.
v The active host is uncleanly shut down, such as during a power failure to the
system (both power supplies fail).
v If any member of the nps resource group cannot start properly when the
resource group is initially started.
v If any one of the following members of the nps resource group fails after the
resource group was successfully started:
– drbd_exphome_device or drbd_nz_device: These correspond to low-level
DRBD devices that serve the shared file systems. If these devices fail, the
shared data would not be accessible on that host.
– exphome_filesystem or nz_filesystem: These are the actual mounts for the
DRBD devices.
– nz_dnsmasq: The DNS daemon for the IBM Netezza system.

Note: If any of these resource group members experiences a failure, Heartbeat first
tries to restart or repair the process locally. The failover is triggered only if that

repair or restart process does not work. Other resources in the group that are not
listed previously are not monitored for failover detection.

The following common situations do not trigger a failover:


v Any of the failover criteria that occurs on the STANDBY host while the active
host is working correctly.
Heartbeat might decide to fence (forcibly power cycle) the standby host when it
detects certain failures to try to restore the standby host to a state of good
health.
v A single maintenance network link to the active host is lost.
v Losing some (but not all) of the fabric network links to the active host.
v Network connectivity from the Netezza host (either active or standby) to the
customer network is lost.
v One or both network connections that serve the DRBD network fail.

Relocate to the standby node


The following commands can be used to manually relocate the nps resource group
from the active IBM Netezza node to the standby node. At the conclusion of this
process, the standby node becomes the active node and the previous active node
becomes the standby.

Note: In the previous Netezza Cluster Manager solution, HA1 is the name of the
primary node, and HA2 the secondary node. In Linux-HA/DRBD, either host can
be primary; thus, these procedures refer to one host as the active host and one as
the standby host.

To relocate the nps resource group from the active host to the standby host:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --migrate
Testing DRBD communication channel...Done.
Checking DRBD state...Done.

Migrating the NPS resource group from NZHOST1 to
NZHOST2................................................Complete.
20100112_084039 INFO : Run crm_mon to check NPS' initialization status.

The command blocks until the nps resource group stops completely. To monitor
the status, use the crm_mon -i5 command. You can run the command on either
host, although on the active host you must run it from a different terminal
window.

Safe manual control of the hosts and Heartbeat


About this task

In general, you should not have to stop Heartbeat unless the IBM Netezza HA
system requires hardware or software maintenance or troubleshooting. During
these times, it is important that you control Heartbeat to ensure that it does not
interfere with your work by taking STONITH actions to regain control of the hosts.
The recommended practice is to shut down Heartbeat completely for service.

To shut down the nps resource group and Heartbeat, complete the following steps:

Procedure
1. Identify which node is the active node by using the following command:

[root@nzhost1 ~]# crm_resource -r nps -W
resource nps is running on: nzhost1
2. Stop Heartbeat on the standby Netezza host:
[root@nzhost2 ~]# service heartbeat stop
Stopping High-Availability services:
[ OK ]
This command blocks until it completes successfully. Wait and let the command
complete. You can check /var/log/messages for status messages, or you can
monitor progress on a separate terminal session by using either of the
following commands: tail -f /var/log/messages or crm_mon -i5.
3. Stop Heartbeat on the active Netezza host:
[root@nzhost1 ~]# service heartbeat stop
Stopping High-Availability services:
[ OK ]
In some rare cases, the Heartbeat cannot be stopped by using this process. In
these cases, you can force Heartbeat to stop as described in “Force Heartbeat to
shut down” on page 4-16.
Related concepts:
“Shut down Heartbeat on both nodes without causing relocate” on page 4-16

Transitioning to maintenance (non-Heartbeat) mode


About this task

To enter maintenance mode, complete the following steps:

Procedure
1. While logged in to either host as root, display the name of the active node:
[root@nzhost1 ~]# crm_resource -r nps -W
resource nps is running on: nzhost1
2. As root, stop Heartbeat on the standby node (nzhost2 in this example):
[root@nzhost2 ~]# service heartbeat stop
3. As root, stop Heartbeat on the active node:
[root@nzhost1 ~]# service heartbeat stop
4. As root, make sure that there are no open nz sessions or any open files in the
shared directories /nz, /export/home, or both. For details, see “Checking for
user sessions and activity” on page 4-18.
[root@nzhost1 ~]# lsof /nz /export/home
5. Run the following script in /nzlocal/scripts to make the IBM Netezza system
ready for non-clustered operations. The command prompts you for a
confirmation to continue, shown as Enter in the output.
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
---------------------------------------------------------------
Thu Jan 7 15:13:27 EST 2010

File systems and eth2 on this host are okay. Going on.

File systems and eth2 on other host are okay. Going on.

This script will configure Host 1 or 2 to own the shared disks and
own the fabric.

When complete, this script will have:


mounted /export/home and /nz
aliased 10.0.0.1 on eth2
run the rackenable script appropriate for this host
based on the last octet of eth2
being 2 for rack 1 or 3 for rack 2

To proceed, please hit enter. Otherwise, abort this. Enter

Okay, we are proceeding.


Thu Jan 7 15:13:29 EST 2010
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda6 16253924 935980 14478952 7% /
/dev/sda10 8123168 435272 7268604 6% /tmp
/dev/sda9 8123168 998808 6705068 13% /usr
/dev/sda8 8123168 211916 7491960 3% /var
/dev/sda7 8123168 500392 7203484 7% /opt
/dev/sda3 312925264 535788 296237324 1% /nzscratch
/dev/sda1 1019208 40192 926408 5% /boot
none 8704000 2228 8701772 1% /dev/shm
/dev/sda12 4061540 73940 3777956 2% /usr/local
/dev/drbd0 16387068 175972 15378660 2% /export/home
/dev/drbd1 309510044 5447740 288340020 2% /nz
Done mounting file systems
eth2:0 Link encap:Ethernet HWaddr 00:07:43:05:8E:26
inet addr:10.0.0.1 Bcast:10.0.15.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
Interrupt:122 Memory:c1fff000-c1ffffff

Done enabling IP alias

Running nz_dnsmasq: [ OK ]
nz_dnsmasq started.

Ready to use NPS in non-cluster environment


6. As the nz user, start the Netezza software:
[nz@nzhost1 ~] nzstart

Transitioning from maintenance to clustering mode


About this task

To reinstate the cluster from a maintenance mode, complete the following steps:

Procedure
1. Stop the IBM Netezza software by using the nzstop command.
2. Make sure that Heartbeat is not running on either node. Use the service
heartbeat stop command to stop the Heartbeat on either host if it is running.
3. Make sure that there are no nz user login sessions, and make sure that no users
are in the /nz or /export/home directories. Otherwise, the nz.heartbeat.sh
command is not able to unmount the DRBD partitions. For details, see
“Checking for user sessions and activity” on page 4-18.
4. Run the following script in /nzlocal/scripts to make the Netezza system
ready for clustered operations. The command prompts you for a confirmation
to continue, shown as Enter in the output.
[root@nzhost1 ~]# /nzlocal/scripts/nz.heartbeat.sh
---------------------------------------------------------------
Thu Jan 7 15:14:32 EST 2010

This script will configure Host 1 or 2 to run in a cluster

When complete, this script will have:


unmounted /export/home and /nz
Disabling IP alias 10.0.0.1 from eth2

To proceed, please hit enter. Otherwise, abort this. Enter

Okay, we are proceeding.
Thu Jan 7 15:14:33 EST 2010
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda6 16253924 935980 14478952 7% /
/dev/sda10 8123168 435272 7268604 6% /tmp
/dev/sda9 8123168 998808 6705068 13% /usr
/dev/sda8 8123168 211928 7491948 3% /var
/dev/sda7 8123168 500544 7203332 7% /opt
/dev/sda3 312925264 535788 296237324 1% /nzscratch
/dev/sda1 1019208 40192 926408 5% /boot
none 8704000 2228 8701772 1% /dev/shm
/dev/sda12 4061540 73940 3777956 2% /usr/local
Done unmounting file systems
eth2:0 Link encap:Ethernet HWaddr 00:07:43:05:8E:26
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
Interrupt:122 Memory:c1fff000-c1ffffff

Done disabling IP alias


Shutting down dnsmasq: [ OK ]
nz_dnsmasq stopped.
Ready to use NPS in a cluster environment

Note: If the command reports errors that it is unable to unmount /nz or
/export/home, manually verify that both partitions are mounted before you run
the command again. The script might have unmounted one of the partitions even
though it failed overall, and it might not run correctly unless both
partitions are mounted when it starts.
5. As root, start the cluster on the first node, which becomes the active node:
[root@nzhost1 ~] service heartbeat start
Starting High-Availability services:
[ OK ]
6. As root, start the cluster on the second node, which becomes the standby node:
[root@nzhost2 ~] service heartbeat start
Starting High-Availability services:
[ OK ]
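To verify that both nodes rejoined the cluster and that the nps resource group
started on the active node, you can watch the cluster status with the crm_mon
monitor. The following invocation is an illustrative sketch; the -i5 option
refreshes the display every 5 seconds, and you press Ctrl+C to exit:
[root@nzhost1 ~]# crm_mon -i5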

Configuring Cluster Manager events


About this task

You can configure the Cluster Manager to send event notifications when a
failover is caused by any of the following:
v Node shutdown
v Node restart
v Node fencing actions (STONITH actions)

To configure the Cluster Manager, complete the following steps:

Procedure
1. Log in to the active host as the root user.
2. Using a text editor, edit the /nzlocal/maillist file as follows, adding
your recipient and sender addresses under the TO: and FROM: entries as shown
in the following example.
#
#Email notification list for the cluster manager problems
#
#Enter email addresses of mail recipients under the TO entry, one to a line
#
#Enter email address of from email address (if a non-default is desired)
#under the FROM entry
#
TO:
[email protected]
[email protected]
FROM:
[email protected]
For the TO: email addresses, specify one or more email addresses for the users
who want to receive email about cluster manager events. For the FROM: email
address, specify the email address that you want to use as the sender of the
event email.
3. Save and close the maillist file.
4. Log in as root to the standby host and repeat steps 2 on page 4-11 and 3 on the
standby host.
The /nzlocal/maillist files must be identical on both hosts in the cluster.
5. After you configure the maillist files, test the event mail by shutting down or
restarting either host in the cluster. Your specified TO addresses will receive
email about the event.

Logging and messages


All the logging information is stored in the /var/log/messages file on each host.
The log file on the active host typically contains more information, but messages
can be written to the log files on both hosts. Any event or change in status for
Heartbeat is documented in this log file. If something goes wrong, you can often
find the explanation in this log file. If you are working with IBM Netezza Support
to troubleshoot Linux-HA or DRBD issues, be sure to send a copy of the log files
from both Netezza hosts.
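For example, to pull the most recent cluster-related entries from the log
while you investigate a problem, you can filter it with standard Linux tools;
the search patterns here are only an illustrative sketch and can be adjusted
as needed:
[root@nzhost1 ~]# grep -iE 'heartbeat|drbd|stonith' /var/log/messages | tail -50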
Related reference:
“Differences with the previous Netezza HA solution” on page 4-2

DRBD administration
DRBD provides replicated storage of the data in managed partitions (that is, /nz
and /export/home). When a write occurs to one of these locations, the write action
occurs at both the local node and the peer standby node. Both perform the same
write to keep the data in synchronization. The peer responds to the active node
when finished, and if the local write operation is also successfully finished, the
active node reports the write as complete.

Read operations are always performed by the local node.

The DRBD software can be started, stopped, and monitored by using the
/sbin/service drbd start/stop/status command (as root):
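For example (an illustrative sketch of the three forms, run as root):
[root@nzhost1 ~]# /sbin/service drbd status
[root@nzhost1 ~]# /sbin/service drbd stop
[root@nzhost1 ~]# /sbin/service drbd start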

While you can use the status command as needed, only stop and start the DRBD
processes during routine maintenance procedures or when directed by IBM
Netezza Support. Do not stop the DRBD processes on an active, properly working
Netezza HA host to avoid the risk of split-brain.
Related tasks:
“Detecting split-brain” on page 4-14

Monitor DRBD status
You can monitor the DRBD status by using one of two methods:
v service drbd status
v cat /proc/drbd

Sample output of the commands follows. These examples assume that you are
running the commands on the primary (active) IBM Netezza host. If you run them
from the standby host, the output shows the secondary status first, then the
primary.
[root@nzhost1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@nps22094, 2009-06-09
16:25:53
m:res cs st ds p mounted fstype
0:r1 Connected Primary/Secondary UpToDate/UpToDate C /export/home ext3
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
[root@nzhost1 ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@nps22094, 2009-06-09
16:25:53
0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:15068 nr:1032 dw:16100 dr:3529 al:22 bm:37 lo:0 pe:0 ua:0 ap:0 oos:0
1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:66084648 nr:130552 dw:66215200 dr:3052965 al:23975 bm:650 lo:0 pe:0 ua:0 ap:0 oos:0

In the sample output, the DRBD states are one of the following values:
Primary/Secondary
The "healthy" state for DRBD. One device is Primary and one is Secondary.
Secondary/Secondary
DRBD is in a suspended or waiting mode. This usually occurs at boot time
or when the nps resource group is stopped.
Primary/Unknown
One node is available and healthy, the other node is either down or the
cable is not connected.
Secondary/Unknown
This is a rare case where one node is in standby, the other is either down
or the cable is not connected, and DRBD cannot declare a node as the
primary/active node. If the other host also shows this status, the problem
is most likely in the connection between the hosts. Contact Netezza
Support for assistance in troubleshooting this case.

The common Connection State values include the following values:


Connected
The normal and operating state; the host is communicating with its peer.
WFConnection
The host is waiting for its peer node connection; usually seen when other
node is rebooting.
Standalone
The node is functioning alone because of a lack of network connection
with its peer. It does not try to reconnect. If the cluster is in this state, it
means that data is not being replicated. Manual intervention is required to
fix this problem.

The common State values include the following values:
Primary
The primary image; local on active host.
Secondary
The mirror image, which receives updates from the primary; local on
standby host.
Unknown
Always on other host; state of image is unknown.

The common Disk State values include the following values:


UpToDate
The data on the image is current.
DUnknown
This value is an unknown data state; usually results from a broken
connection.

Sample DRBD status output


The DRBD status before Heartbeat start:
m:res cs st ds p mounted fstype
0:r1 Connected Secondary/Secondary UpToDate/UpToDate C
1:r0 Connected Secondary/Secondary UpToDate/UpToDate C

The DRBD status when the current node is active and the standby node is down:
m:res cs st ds p mounted fstype
0:r1 WFConnection Primary/Unknown UpToDate/DUnknown C /export/home ext3
1:r0 WFConnection Primary/Unknown UpToDate/DUnknown C /nz ext3

The DRBD status as displayed from the standby node:


m:res cs st ds p mounted fstype
0:r1 Connected Secondary/Primary UpToDate/UpToDate C
1:r0 Connected Secondary/Primary UpToDate/UpToDate C

Detecting split-brain
About this task

Split-brain is an error state that occurs when the images of data on each IBM
Netezza host are different. It typically occurs when synchronization is disabled and
users change data independently on each Netezza host. As a result, the two
Netezza host images are different, and it becomes difficult to resolve what the
latest, correct image should be.

Important: Split-brain does not occur if clustering is enabled. The fencing controls
prevent users from changing the replicated data on the standby node. Allow DRBD
management to be controlled by Heartbeat to avoid the split-brain problems.

However, if a split-brain problem occurs, the following message is written to
the /var/log/messages file:
Split-Brain detected, dropping connection!

While DRBD does have automatic correction processes to resolve split-brain
situations, the Netezza implementation disables the automatic correction.
Manual intervention is required; it is the best way to ensure that as many of
the data changes as possible are restored.

To detect and repair split-brain, work with IBM Support to follow this procedure:

Procedure
1. Look for Split in /var/log/messages, usually on the host that you are trying to
make the primary/active host. Let DRBD detect this condition.
2. Because split-brain results from running both images as primary Netezza hosts
without synchronization, check the Netezza logs on both hosts. For example,
check the pg.log files on both hosts to see when/if updates occur. If there is an
overlap in times, both images have different information.
3. Identify which host image, if either, is the correct image. In some cases, neither
host image might be fully correct. You must choose the image that is the more
correct. The host that has the image which you decide is correct is the
“survivor”, and the other host is the “victim”.
4. Perform the following procedure:
a. Log in to the victim host as root and run these commands:
drbdadm secondary resource
drbdadm disconnect resource
drbdadm -- --discard-my-data connect resource

where resource can be r0, r1, or all.


Complete these steps for one resource at a time; that is, run all the
commands in steps a. and b. for r0 and then repeat them all for r1. There is
an all option, but use it carefully. The individual resource commands
usually work more effectively.
b. Log in to the survivor host as root and run this command:
drbdadm connect resource

where resource can be r0, r1, or all

Note: The connect command might display an error that instructs you to
run drbdadm disconnect first.
5. Check the status of the fix by using drbdadm primary resource and the service
drbd status command. Make sure that you run drbdadm secondary resource
before you start Heartbeat.
Related concepts:
“DRBD administration” on page 4-12

Administration reference and troubleshooting


The following sections describe some common administration task references and
troubleshooting steps.

IP address requirements
The following table is an example block of the eight IP addresses that are
recommended for a customer to reserve for an HA system:
Table 4-3. HA IP addresses
Entity               Sample IP address
HA1                  172.16.103.209
HA1 Host Management  172.16.103.210
Floating IP          172.16.103.212
HA2                  172.16.103.213
HA2 Host Management  172.16.103.214
Reserved             172.16.103.215
Reserved             172.16.103.216

In the IP addressing scheme, there are two host IPs, two host management IPs, and
the floating IP, which is HA1 + 3.

Force Heartbeat to shut down


There might be times when you try to stop Heartbeat by using the normal process
as described in “Safe manual control of the hosts and Heartbeat” on page 4-8, but
Heartbeat does not stop even after it waits a few minutes. If you must stop
Heartbeat, you can use the following command to force Heartbeat to stop itself:
crmadmin -K hostname

You must run this command twice. Then, try to stop Heartbeat again by using
service heartbeat stop. This process might not stop all of the resources that
Heartbeat manages, such as /nz mount, drbd devices, nzbootpd, and other
resources.

Shut down Heartbeat on both nodes without causing relocate


If you stop Heartbeat on the active node first, Linux-HA identifies this as a
resource failure and initiates a failover to the standby node. To avoid this, always
stop Heartbeat on the standby first. After it stops completely, you can stop
Heartbeat on the active node.
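For example, assuming that nzhost1 is the active node and nzhost2 is the
standby, the sequence looks like the following sketch; run the second command
only after Heartbeat has stopped completely on the standby:
[root@nzhost2 ~]# service heartbeat stop
[root@nzhost1 ~]# service heartbeat stop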
Related tasks:
“Safe manual control of the hosts and Heartbeat” on page 4-8

Restart Heartbeat after maintenance network issues


If a host loses its maintenance network connection to the system devices, the IBM
Netezza HA system performs a fencing operation (STONITH) to stop the failed
host. After the host restarts, Heartbeat fails to start on the reboot. After the
maintenance network is repaired, you must manually restart Heartbeat to resume
normal cluster operations. To restart Heartbeat on the recovered node, log in to
that host as root and use the service heartbeat start command.

Resolve configuration problems


If you make a configuration change to the nps resource group or Heartbeat, and
there are problems after the change, you can often diagnose the problem from the
status information of the crm_verify command:
crm_verify -LVVVV

You can specify one or more V characters. The more Vs that you specify, the more
verbose the output. Specify at least four or five Vs and increase the number as
needed. You can specify up to 12 Vs, but that large a number is not recommended.

Sample output follows:

[root@nzhost1 ha.d]# crm_verify -LVVV
crm_verify[18488]: 2008/11/18_00:02:03 info: main: =#=#=#=#= Getting XML
=#=#=#=#=
crm_verify[18488]: 2008/11/18_00:02:03 info: main: Reading XML from: live
cluster
crm_verify[18488]: 2008/11/18_00:02:03 notice: main: Required feature set:
1.1
crm_verify[18488]: 2008/11/18_00:02:03 notice: cluster_option: Using
default value ’60s’ for cluster option ’cluster-delay’
crm_verify[18488]: 2008/11/18_00:02:03 notice: cluster_option: Using
default value ’-1’ for cluster option ’pe-error-series-max’
crm_verify[18488]: 2008/11/18_00:02:03 notice: cluster_option: Using
default value ’-1’ for cluster option ’pe-warn-series-max’
crm_verify[18488]: 2008/11/18_00:02:03 notice: cluster_option: Using
default value ’-1’ for cluster option ’pe-input-series-max’
crm_verify[18488]: 2008/11/18_00:02:03 notice: cluster_option: Using
default value ’true’ for cluster option ’startup-fencing’
crm_verify[18488]: 2008/11/18_00:02:03 info: determine_online_status:
Node nzhost1 is online
crm_verify[18488]: 2008/11/18_00:02:03 info: determine_online_status:
Node nzhost2 is online

Fixed a problem, but crm_mon still shows failed items


Heartbeat sometimes leaves error status on crm_mon output, even after an item is
fixed. To resolve this problem, use crm_resource in Cleanup Mode:
crm_resource -r name_of_resource -C -H hostname

For example, if the fencing route to ha1 is listed as failed on host1, use the
crm_resource -r fencing_route_to_ha1 -C -H host1 command.

Output from crm_mon does not show the nps resource group
If the log messages indicate that the nps resource group cannot run anywhere, the
cause is that Heartbeat tried to run the resource group on both HA1 and HA2, but
it failed in both cases. Search in /var/log/messages on each host to find this first
failure. Search from the bottom of the log for the message cannot run anywhere
and then scan upward in the log to find the service failures. You must fix the
problems that caused a service to fail to start before you can successfully start the
cluster.
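For example, the following commands are an illustrative way to locate the most
recent failure messages on a host; adjust the search strings to your
environment:
[root@nzhost1 ~]# grep -n "cannot run anywhere" /var/log/messages | tail -1
[root@nzhost1 ~]# grep -iE "fail|error" /var/log/messages | tail -100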

After you fix the failure case, you must restart Heartbeat following the instructions
in “Transitioning from maintenance to clustering mode” on page 4-10.

Linux users and groups required for HA


To operate properly, Heartbeat requires the following Linux user and groups that
are added automatically to each of the IBM Netezza hosts during the Heartbeat
RPM installation:
v User: hacluster:x:750:750::/home/hacluster:/bin/bash
v Groups:
– hacluster:x:750:
– haclient:x:65:

Do not modify or remove the user or groups because those changes will impact
Heartbeat and disrupt HA operations on the Netezza system.
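To confirm that the user and groups are present with the expected IDs, you can
query the local account databases; this is a quick verification sketch:
[root@nzhost1 ~]# getent passwd hacluster
hacluster:x:750:750::/home/hacluster:/bin/bash
[root@nzhost1 ~]# getent group hacluster haclient
hacluster:x:750:
haclient:x:65: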
Related concepts:
“Initial system setup and information” on page 1-1

Checking for user sessions and activity
About this task

Open nz user sessions and nz user activity can cause the procedures for
stopping Heartbeat and for returning to clustering mode to fail. Use the
nzsession command to see
whether there are active database sessions in progress. For example:
[nz@nzhost1 ~]$ nzsession -u admin -pw password
ID Type User Start Time PID Database State Priority
Name Client IP Client PID Command
----- ---- ----- ----------------------- ----- -------- ------
------------- --------- ---------- ------------------------
16748 sql ADMIN 14-Jan-10, 08:56:56 EST 4500 CUST active normal
127.0.0.1 4499 create table test_2
16753 sql ADMIN 14-Jan-10, 09:12:36 EST 7748 INV active normal
127.0.0.1 7747 create table test_s
16948 sql ADMIN 14-Jan-10, 10:14:32 EST 21098 SYSTEM active normal
127.0.0.1 21097 SELECT session_id, clien

The sample output shows three sessions: the last entry is the session that is created
to generate the results for the nzsession command. The first two entries are user
activity. Wait for those sessions to complete or stop them before you use the
nz.heartbeat.sh or nz.non-heartbeat.sh commands.

To check for connections to the /export/home and /nz directory, complete the
following steps:

Procedure
1. As the nz user on the active host, stop the IBM Netezza software:
[nz@nzhost1 ~]$ /nz/kit/bin/nzstop
2. Log out of the nz account and return to the root account; then use the lsof
command to list any open files that are in /nz or /export/home.

Results

The output that is displayed will look similar to this:


[root@nzhost1 ~]# lsof /nz /export/home
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
bash 2913 nz cwd DIR 8,5 4096 1497025 /export/home/nz
indexall. 4493 nz cwd DIR 8,5 4096 1497025 /export/home/nz
less 7399 nz cwd DIR 8,5 4096 1497025 /export/home/nz
lsof 13205 nz cwd DIR 8,5 4096 1497025 /export/home/nz
grep 13206 nz cwd DIR 8,5 4096 1497025 /export/home/nz
tail 22819 nz 3r REG 8,5 146995 1497188 /export/home/nz/f_5.log

This example shows several open files in the /export/home directory. If necessary,
you can close open files by issuing a command such as kill and supplying the
process ID (PID) shown in the second column from the left. Use caution with the
kill command; if you are not familiar with Linux system commands, contact
Support or your Linux system administrator for assistance.
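For example, to close the less session that is shown in the sample output, you
could terminate its process; this is illustrative only, so verify the PID
against your own lsof output first:
[root@nzhost1 ~]# kill 7399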

Chapter 5. Manage the Netezza hardware
This section describes administration tasks for hardware components of the IBM
Netezza appliance. Most of the administration tasks focus on obtaining status and
information about the operation of the appliance, and in becoming familiar with
the hardware states. This section also describes tasks to perform if a hardware
component fails.

Netezza hardware components


The IBM Netezza appliance has a number of hardware components that support
the operation of the device. The Netezza appliance consists of one or more racks of
hardware, with host servers, switches, SPUs, disks, power controllers, cooling
devices, I/O cards, management modules, and cables. However, in the day-to-day
administration of the device, only a subset of these components require
administrative attention of any kind. Many of these components are redundant and
hot-swappable to ensure highly available operation of the hardware.

The following table lists the key hardware components to monitor:


Table 5-1. Key Netezza hardware components to monitor

Host servers
    Each Netezza HA system has one or two host servers to run the Netezza
    software and supporting applications. If a system has two host servers,
    the hosts operate in a highly available (HA) configuration; that is, one
    host is the active or primary host, and the other is a standby host ready
    to take over if the active host fails. IBM Netezza 100 systems have one
    host server and thus are not HA configurations.
    Management focus: Tasks include monitoring of the hardware status of the
    active/standby hosts, and occasional monitoring of disk space usage on the
    hosts. At times, the host might require Linux OS or health driver upgrades
    to improve its operational software.
Snippet processing arrays (SPAs)
    SPAs contain the SPUs and associated disk storage which drive the query
    processing on the Netezza appliance.
    Management focus: Tasks include monitoring of the SPA environment, such as
    fans, power, and temperature. SPUs and disks are monitored separately.
Snippet Processing Units (SPUs)
    SPUs provide the CPU, memory, and Netezza FPGA processing power for the
    queries that run on the system.
    Management focus: Tasks include monitoring the status of each SPU. If a
    SPU fails, the disks that it “owns” are redirected to other SPUs for
    processing ownership.
Storage group
    In the IBM Netezza High Capacity Appliance C1000 model, disks reside
    within a storage group. The storage group consists of three disk
    enclosures: an intelligent storage enclosure with redundant hardware RAID
    controllers, and two expansion disk enclosures. There are four storage
    groups in each Netezza C1000 rack.
    Management focus: Tasks include monitoring the status of the disks within
    the storage group.
Disks
    Disks are the storage media for the user databases and tables that are
    managed by the Netezza appliance.
    Management focus: Tasks include monitoring the health and status of the
    disk hardware. If a disk fails, tasks include regenerating the disk to a
    spare and replacing the disk.
Data slices
    Data slices are virtual partitions on the disks. They contain user
    databases and tables, and their content is mirrored to ensure HA access
    to the data in the event of a disk failure.
    Management focus: Tasks include monitoring the mirroring status of the
    data slices and also the space consumption of the data slice.
Fans and blowers
    These components control the thermal cooling for the racks and components
    such as SPAs and disk enclosures.
    Management focus: Tasks include monitoring the status of the fans and
    blowers, and if a component fails, replacing the component to ensure
    proper cooling of the hardware.
Power supplies
    These components provide electrical power to the various hardware
    components of the system.
    Management focus: Tasks include monitoring the status of the power
    supplies, and if a component fails, replacing the component to ensure
    redundant power to the hardware.

The Netezza appliance uses SNMP events (described in Chapter 8, “Event rules,”
on page 8-1) and status indicators to send notifications of any hardware failures.
Most hardware components are redundant; thus, a failure typically means that the
remaining hardware components assume the work of the component that failed.
The system might or might not be operating in a degraded state, depending on the
component that failed.

CAUTION:
Never run the system in a degraded state for a long time. It is imperative to
replace a failed component in a timely manner so that the system returns to an
optimal topology and best performance.

Netezza Support and Field Service work with you to replace failed components to
ensure that the system returns to full service as quickly as possible. Most of the
system components require Field Service support to replace. Components such as
disks can be replaced by customer administrators.

Display hardware components


You use the nzhw show command to display information about the hardware
components of your IBM Netezza system. You can also use the NzAdmin tool or
IBM Netezza Performance Portal interface to display hardware information and
status.

The following figure shows some sample output of the nzhw show command:

Figure 5-1. Sample nzhw show output

Legend:
1 Hardware type
2 Hardware ID
3 Hardware role
4 Hardware state
5 Security
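An abbreviated, illustrative sample of the output layout follows; the rows and
values are examples only and vary by model and release:
[nz@nzhost ~]$ nzhw show
Description HW ID Location              Role   State Security
----------- ----- --------------------- ------ ----- --------
SPU         1012  spa1.spu11            Active Ok    N/A
Disk        1609  spa1.diskEncl1.disk13 Active Ok    Enabled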

For an IBM Netezza High Capacity Appliance C1000 system, the output of the
nzhw show command includes information about the storage groups:

Figure 5-2. Sample nzhw show output for Netezza C1000

Related reference:
“The nzhw command” on page A-28
Use the nzhw command to manage the hardware of the IBM Netezza system.

Hardware types
Each hardware component of the IBM Netezza system has a type that identifies the
hardware component.

The following table describes the hardware types. You see these types when you
run the nzhw command or display hardware by using the NzAdmin or IBM
Netezza Performance Portal UIs.
Table 5-2. Hardware description types

Rack
    A hardware rack for the Netezza system.
SPA
    Snippet processing array (SPA).
SPU
    Snippet processing unit (SPU).
Disk enclosure
    A disk enclosure chassis, which contains the disk devices.
Disk
    A storage disk, which contains the user databases and tables.
Fan
    A thermal cooling device for the system.
Blower
    A fan pack used within the S-Blade chassis for thermal cooling.
Power supply
    A power supply for an enclosure (SPU chassis or disk).
MM
    A management device for the associated unit (SPU chassis, disk enclosure).
    These devices include the AMM and ESM components, or a RAID controller for
    an intelligent storage enclosure in a Netezza C1000 system.
Store group
    A group of three disk enclosures within a Netezza C1000 system managed by
    redundant hardware RAID controllers.
Ethernet switch
    Ethernet switch (for internal network traffic on the system).
Host
    A high availability (HA) host on the Netezza appliance.
SAS Controller
    A SAS controller within the Netezza HA hosts.
Host disk
    A disk resident on the host that provides local storage to the host.
Database accelerator card
    A Netezza Database Accelerator Card (DAC), which is part of the
    S-Blade/SPU pair.

Hardware IDs
Each hardware component has a unique hardware identifier (ID) that is in the
form of an integer, such as 1000, 1001, or 1014. You can use the hardware ID to
manage a specific hardware component, or to uniquely identify a component in
command output or other informational displays.

To display information about a component by the hardware ID, specify the ID in
the command as in the following example:
[nz@nzhost ~]$ nzhw show -id 1609
Description HW ID Location Role State Security
------------- ----- -------------------------- --------- ----------- --------
Disk 1609 spa1.diskEncl1.disk13 Active Ok Enabled

Hardware location
IBM Netezza uses two formats to describe the position of a hardware component
within a rack.
v The logical location is a string in a dot format that describes the position of a
hardware component within the Netezza rack. For example, the nzhw output that
is shown in Figure 5-1 on page 5-3 shows the logical location for components; a
Disk component description follows:
Disk 1609 spa1.diskEncl1.disk13 Active Ok Enabled
In this example, the disk is located in SPA 1, disk enclosure 1, disk
position 13.
Similarly, the location for a disk on a Netezza C1000 system includes the
storage group:
Disk 1029 spa1.storeGrp1.diskEncl2.disk5 Active Ok
v The physical location is a text string that describes the location of a component.
You can display the physical location of a component by using the nzhw locate
command. For example, to display the physical location of disk ID 1011:
[nz@nzhost ~]$ nzhw locate -id 1011
Turned locator LED ’ON’ for Disk: Logical
Name:’spa1.diskEncl4.disk1’ Physical Location:’1st Rack, 4th
DiskEnclosure, Disk in Row 1/Column 1’
As shown in the command output, the nzhw locate command also lights the
locator LED for components such as SPUs, disks, and disk enclosures. For
hardware components that do not have LEDs, the command displays the
physical location string.

The following figure shows an IBM PureData System for Analytics N200x-010
system with a closer view of the storage arrays and SPU chassis components and
locations.

Figure 5-3. IBM PureData System for Analytics N200x system components and locations

A Each IBM PureData System for Analytics N200x rack is one array of disk
enclosures. There are 12 enclosures in a full rack configuration, and IBM
PureData System for Analytics N200x-005 half racks have 6 enclosures.
Each disk enclosure has 24 disks, numbered 1 to 24 from left to right on
the front of the rack.
B SPU1 occupies slots 1 and 2. SPU3 occupies slots 3 and 4, up to SPU13
which occupies slots 13 and 14
C The disk enclosures
D Host 1, host 2, and a KVM
E SPU chassis

The following figure shows an IBM Netezza 1000-12 system or an IBM PureData
System for Analytics N1001-010 with a closer view of the storage arrays and SPU
chassis components and locations.

Figure 5-4. IBM Netezza system components and locations

A Each disk array has four disk enclosures. Each enclosure has 12 disks,
numbered as in the chart shown in the figure.
B SPU1 occupies slots 1 and 2. SPU3 occupies slots 3 and 4, up to SPU11
which occupies slots 11 and 12
C Disk array 1 with four enclosures.
D Disk array 2 with four enclosures.
E Host 1, host 2, and a KVM
F SPU chassis 1
G SPU chassis 2

For detailed information about the locations of various components in the front
and back of the system racks, see the Site Preparation and Specifications guide for
your model type.

The following figure shows an IBM PureData System for Analytics N3001-001
system with host and disk numbering.

Figure 5-5. IBM PureData System for Analytics N3001-001 components and locations

A The host marked in the figure is HA1. It is always placed in the rack
directly above HA2.
B The first disk in the host occupies the slot labeled as 0, the second one
occupies slot 1, and, following this pattern, the last disk resides in slot 23.

Sample output of the nzhw locate command on this system looks like the
following:
[nz@v10-12-h1 ~]$ nzhw locate -id 1011
Turned locator LED ’ON’ for disk: Logical Name:’spa1.diskEncl1.disk12’
Physical Location:’upper host, host disk in slot 11’.

Hardware roles
Each hardware component of the IBM Netezza system has a hardware role, which
represents how the hardware is being used. The following table describes the
hardware roles. You see these roles when you run the nzhw command or display
hardware status by using the NzAdmin or IBM Netezza Performance Portal UIs.
Table 5-3. Hardware roles

None
    The None role indicates that the hardware is initialized, but it has yet
    to be discovered by the Netezza system. This process usually occurs during
    system startup before any of the SPUs send their discovery information.
    Comments: All active SPUs must be discovered before the system can make
    the transition from the Discovery state to the Initializing state.
Active
    The hardware component is an active system participant. Failing over this
    device can impact the Netezza system. In model N3001-001, for host disks
    9 - 24, this role means that the disk is currently associated with the
    virtual SPU.
    Comments: Normal system state.
Assigned
    The hardware is transitioning from spare to active. In Netezza 100, IBM
    Netezza 1000, and IBM PureData System for Analytics appliances, this is
    the role when a disk is involved in a regeneration. It is not yet active,
    so it cannot participate in queries.
    Comments: Transitional state.
Failed
    The hardware failed. It cannot be used as a spare. After maintenance is
    performed, you must activate the hardware by using the nzhw command before
    it can become a spare and be used in the system.
    Comments: Monitor your supply of spare disks. Do not operate without spare
    disks.
Inactive
    The hardware is not available for any system operations. In model
    N3001-001, for host disks 9 - 24, this role means that the disk is
    currently not associated with the virtual SPU. To associate the disk with
    the virtual SPU, run the nzhw activate command.
    Comments: The Inactive role could refer to an S-Blade that was
    deactivated, or a new replacement disk that was added to the system. For
    an S-Blade, you must activate the hardware by using the nzhw command
    before it can become a spare and be used in the system. After a new disk
    is added to the system, its role is set to Inactive. The disk transitions
    to Sparing while the system performs firmware checks, updates, and formats
    the disk, and then the disk becomes a Spare.
Mismatched
    This role is specific to disks. If the disk has a UUID that does not match
    the host UUID, then it is considered mismatched. You must activate the
    hardware by using the nzhw command before it can become a spare and be
    used in the system.
    Comments: To use the disk as a spare, activate it; otherwise, remove it
    from the system. To delete it from the system catalog, use the nzhw delete
    command.
Spare
    The hardware is not used in the current running Netezza system, but it is
    available to become active in the event of a failover.
    Comments: Normal system state.
Sparing
    A replacement disk is in the process of firmware updates and formatting.
    Comments: A transitional system state where a replacement disk is checked
    and formatted before it can become a spare disk.
Incompatible
    The hardware is incompatible with the system. It must be removed and
    replaced with compatible hardware.
    Comments: Some examples are disks that are smaller in capacity than the
    smallest disk in use, or blade cards which are not Netezza SPUs.

Hardware states
The state of a hardware component represents the power status of the hardware.
Each hardware component has a state.

The following table describes the hardware states for all components except a SPU.
SPU states are the system states, which are described in Table 7-3 on page 7-4.

You see these states when you run the nzhw command or display hardware status
by using the NzAdmin or IBM Netezza Performance Portal UIs.
Table 5-4. Hardware states

None
    The None state indicates that the hardware is initialized, but it has yet
    to be discovered by the IBM Netezza system. This process usually occurs
    during system startup before any of the SPUs have sent their discovery
    information.
    Comments: All active SPUs must be discovered before the system can make
    the transition from the Discovery state to the Initializing state. If any
    active SPUs are still in the booting state, there can be an issue with the
    hardware startup.
Ok
    The Netezza system has received the discovery information for this device,
    and it is working properly.
    Comments: Normal state.
Down
    The device is turned off.
Invalid
Online
    The system is running normally. It can service requests.
Missing
    The System Manager detects a new device in a slot that was previously
    occupied but not deleted.
    Comments: This typically occurs when a disk or SPU has been removed and
    replaced with a spare without deleting the old device. The old device is
    considered absent because the System Manager cannot find it within the
    system.
Unreachable
    The System Manager cannot communicate with a previously discovered device.
    Comments: The device may have been failed or physically removed from the
    system.
Critical
    The management module detects a critical hardware problem, and the problem
    component amber service light might be illuminated.
    Comments: Contact Netezza Support to obtain help with identifying and
    troubleshooting the cause of the critical alarm.
Warning
    The system manager has detected a condition that requires investigation.
    For example, a host disk may have reported a predictive failure error
    (PFE), which indicates that the disk is reporting internal errors.
    Comments: Contact Netezza Support to troubleshoot the warning condition
    and to determine whether a proactive replacement is needed.
Checking Firmware, Updating Firmware
    The system manager is checking or updating the firmware of a disk before
    it can be brought online as a spare.
    Comments: These are normal states for new replacement disks that are being
    checked and updated before they are added to service.
Unsupported
    The hardware component is not a supported model for the appliance.
    Comments: Contact Netezza Support because the replacement part is not
    supported on the appliance.

The System Manager also monitors the management modules (MMs) in the
system, which have a status view of all the blades in the system. As a result, you
might see messages similar to the following in the sysmgr.log file:

2011-05-18 13:34:44.711813 EDT Info: Blade in SPA 5, slot 11 changed
from state ’good’ to ’discovering’, reason is ’No critical or warning
events’
2011-05-18 13:35:33.172005 EDT Info: Blade in SPA 5, slot 11 changed
from state ’discovering’ to ’good’, reason is ’No critical or warning
events’

A transition from “good” to “discovering” indicates that the IMM (a management
processor on the blade) rebooted and that it is querying the blade hardware for
status. The blade remains in the “discovering” state during the query. The IMM
then determines whether the blade hardware state is good, warning, or critical, and
posts the result to the AMM. The System Manager reports the AMM status by
using these log messages. You can ignore these normal messages. However, if you
see a frequent number of these messages for the same blade, there might be an
issue with the IMM processor on that blade.

Data slices, data partitions, and disks


It is important to understand the relationship of SPUs, disks, data slices, and data
partitions in the IBM Netezza appliance. Netezza uses these terms to help identify
hardware components for system management tasks and troubleshooting events.

Disks

A disk is a physical drive on which data resides. In a Netezza system, host servers
have several disks that hold the Netezza software, host operating system, database
metadata, and sometimes small user files. The Netezza system also has many more
disks that hold the user databases and tables. Each disk has a unique hardware ID
to identify it.

For the IBM PureData System for Analytics N200x appliances, 24 disks reside in
each disk enclosure, and full rack models have 12 enclosures per rack for a total of
288 disks per rack.

For IBM Netezza 1000 or IBM PureData System for Analytics N1001 systems, 48
disks reside in one storage array; a full-rack system has two storage arrays for a
total of 96 disks.

For IBM PureData System for Analytics N3001-001 appliances, all disks are located
on two hosts. 16 out of 24 disks on each host are used for storing data slices.

Data slices

A data slice is a logical representation of the data that is saved on a disk. The data
slice contains “pieces” of each user database and table. When users create tables
and load their data, they distribute the data for the table across the data slices in
the system by using a distribution key. An optimal distribution is one where each
data slice has approximately the same amount of each user table as any other. The
Netezza system distributes the user data to all of the data slices in the system by
using a hashing algorithm.

Data partitions

A data partition is a logical representation of a data slice that is managed
by a specific SPU. That is, each SPU owns one or more data partitions, which
contain
the user data that the SPU is responsible for processing during queries. For
example, in the IBM PureData System for Analytics N200x appliances, each SPU

typically owns 40 data partitions although one or two may own 32 partitions. For
example, in IBM Netezza 1000 or IBM PureData System for Analytics N1001
systems, each SPU typically owns 8 data partitions although one SPU has only 6
partitions. For a Netezza C1000 system, each SPU owns 9 data partitions by
default. SPUs could own more than their default number of partitions; if a SPU
fails, its data partitions are reassigned to the other active SPUs in the system. In
IBM PureData System for Analytics N3001-001 appliances, each of the two virtual
SPUs owns 14 data partitions.

IBM Netezza Storage Design


The following figure shows a conceptual overview of SPUs, disks, data slices, and
data partitions in an IBM Netezza 100, IBM Netezza 1000, IBM PureData System
for Analytics N1001, and IBM PureData System for Analytics N200x appliance.

Each SPU in an IBM Netezza system "owns" a set of data partitions where the user
data is stored. For the IBM Netezza 100, IBM Netezza 1000, and IBM PureData
System for Analytics N1001 systems, each SPU owns eight data partitions which
are numbered from 0 to 7. For IBM PureData System for Analytics N200x systems,
each SPU typically owns 40 data partitions which are numbered 0 through 39.

For SPU ID 1003, its first data partition (0) points to data slice ID 9, which is stored
on disk 1070. Each data partition points to a data slice. As an example, assume that
disk 1014 fails and its contents are regenerated to a spare disk ID 1024. In this
situation, the SPU 1003’s data partition 7, which previously pointed to data slice 16
on disk 1014, is updated to point to data slice 16 on the new disk 1024 (not
shown).

Figure 5-6. SPUs, disks, data slices, and data partitions

If a SPU fails, the system moves all its data slices to the remaining active SPUs for
management. The system moves them in pairs (the pair of disks that contain the
primary and mirror data slices of each other). In this situation, some SPUs that
normally had 8 partitions will now own 10 data partitions. You can use the nzds
command to review the data slices on the system and the SPUs that manage them.
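For example, running the following command as the nz user lists each data
slice with its status and the SPU that currently manages it; this is a usage
sketch only, and the exact columns in the output vary by release:
[nz@nzhost ~]$ nzds show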

Netezza C1000 Storage Design


In a Netezza C1000 system, each storage group has an intelligent storage controller
which resides in disk enclosure 3.

The intelligent storage controller contains two redundant RAID controllers that
manage the disks and associated hardware within a storage group. The RAID
controllers are caching devices, which improves the performance of the read and
write operations to the disks. The caches are mirrored between the two RAID
controllers for redundancy; each controller has a flash backup device and a battery
to protect the cache against power loss.

The RAID controllers operate independently of the Netezza software and hosts.
For example, if you stop the Netezza software (such as for an upgrade or other
maintenance tasks), the RAID controllers continue to run and manage the disks
within their storage group. It is common to see the activity LEDS on the storage
groups operating even when the Netezza system is stopped. If a disk fails, the
RAID controller initiates the recovery and regeneration process; the regeneration
continues to run even when the Netezza software is stopped. If you use the nzhw
command to activate, fail, or otherwise manage disks manually, the RAID
controllers ensure that the action is allowed at that time; in some cases, commands
return an error when the requested operation, such as a disk failover, is not
allowed.

The RAID controller caches are disabled when any of the following conditions
occur:
v Battery failure
v Cache backup device failure
v Peer RAID controller failure (that is, a loss of the mirrored cache)

When the cache is disabled, the storage group (and the Netezza system)
experiences a performance degradation until the condition is resolved and the
cache is enabled again.

The following figure shows an illustration of the SPU/storage mapping. Each SPU
in a Netezza C1000 system owns nine user data slices by default. Each data slice is
supported by a three disk RAID 5 storage array. The RAID 5 array can support a
single disk failure within the three-disk array. (More than one disk failure within
the three-disk array results in the loss of the data slice.) Seven disks within the
storage group in a RAID 5 array are used to hold important system information
such as the nzlocal, swap and log partition.

Figure 5-7. Netezza C1000 SPU and storage representation

A SPU
B Data slice 1
C Data slice 9
D nzlocal, swap, and log partitions

If a SPU fails, the system manager distributes the user data partitions and the
nzlocal and log partitions to the other active SPUs in the same SPU chassis. A

Netezza C1000 system requires a minimum of three active SPUs; if only three SPUs
are active and one fails, the system transitions to the down state.

IBM PureData System for Analytics N3001-001 storage design


Model N3001-001 uses 28 out of its 48 disks to store data in the warehouse. On
each of these disks, four equal disk partitions are created.

Each disk partition is used to store one copy of a data slice. Disks are
divided into groups of four, with two disks from each host in each group. In
each group, there are 16 disk partitions (four on every disk) that are used to
store data slices, with four copies of every data slice.

Each of the data slices always uses disk partition 1, 2, 3, 4 from the disks in the
group.

Each host runs one virtual SPU. The data slice is owned by the virtual SPU that
runs on the host where the disk with the first disk partition of that SPU is
physically located.

Data mirroring for these disk partitions is handled on the software level by the
virtual SPU as RAID0 with 4 parties.

Remote disks are accessed through iSCSI over the network that connects the two hosts.

In addition, there are 4 spare disks (2 per host) that are used as targets for
the regeneration of failed disks.

Figure 5-8. Model N3001-001 storage architecture overview. The figure presents how data slices are mapped to disk
partitions. A1 - A14 are data slices of the first SPU and B1 - B14 are data slices of the second SPU. The arrows show
how disk partitions are mirrored for data slice A1. This mirroring pattern is analogous for all other data slices.

One-host mode

When one of the hosts is not available, is manually failed over, or has its
SPU manually failed by using nzhw, the system switches into one-host mode.

In this mode, only one virtual SPU is working and only two disks from each disk
group are used.

Each data slice is now stored on two disk partitions instead of four, and two data
slices must read data from the same disk.

Figure 5-9. Model N3001-001 storage architecture overview in one-host mode

System resource balance recovery


The system resource balance is an important part of overall system performance.
When a component fails, or when an administrator performs a manual failover, the
resulting configuration (that is, topology) can result in unequal workloads among
the resources and possible performance impacts.

For example, the default disk topology for IBM Netezza 100/1000 or IBM PureData
System for Analytics N1001 systems configures each S-Blade with eight disks that
are evenly distributed across the disk enclosures of its SPA, as shown in the
following figure. If disks failover and regenerate to spares, it is possible to have an
unbalanced topology where the disks are not evenly distributed among the
odd-numbered and even-numbered enclosures. This causes some of the SAS (also
called HBA) paths, which are shown as the dark lines that connect the blade
chassis to the disk enclosures, to carry more traffic than the others.

Figure 5-10. Balanced and unbalanced disk topologies

The System Manager can detect and respond to disk topology issues. For example,
if an S-Blade has more disks in the odd-numbered enclosures of its array, the
System Manager reports the problem as an overloaded SAS bus. You can use the
nzhw rebalance command to reconfigure the topology so that half of the disks are
in the odd-numbered enclosures and half in the even-numbered. The rebalance
process requires the system to transition to the “pausing now” state for the
topology update.
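A usage sketch follows; the options shown here simply mirror the style of the
other nzhw subcommands in this chapter, and the system pauses while the
topology is updated:
[nz@nzhost ~]$ nzhw rebalance -u admin -pw password -host nzhost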

When the Netezza system restarts, the restart process checks for topology issues
such as overloaded SAS buses or SPAs that have S-Blades with uneven shares of
data slices. If the system detects a spare S-Blade for example, it will reconfigure the
data slice topology to distribute the workload equally among the S-Blades.
Related reference:
“Hardware path down” on page 8-20
“Rebalance data slices” on page 5-29

Hardware management tasks


This section describes some administration tasks for the hardware components that
are typically monitored and managed by IBM Netezza administrators.

These components include the following:


v Hosts
v SPUs
v Disks

Other hardware components of the system do not have special administration
tasks. In general, if one of the other components, such as a power supply,
fan, or host, fails, you are alerted. Netezza Support works with you to
schedule Service so that the failed components can be replaced to restore full
operations and hardware redundancy.

Callhome file
The callHome.txt file is in the /nz/data/config directory and it defines important
information about the IBM Netezza system such as primary and secondary
administrator contact information, and system information such as location, model
number, and serial number. Typically, the Netezza installation team member edits
this file for you when the Netezza system is installed on-site, but you can review
or edit the file as needed to ensure that the contact information is current.
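For example, to review the current contact and system information, you can
display the file on the host (an illustrative check):
[nz@nzhost ~]$ cat /nz/data/config/callHome.txt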
Related reference:
“Add an event rule” on page 8-7
You can use the nzevent add command to add an event rule. You can also use the
NzAdmin tool to add event rules by using a wizard for creating events.
“Display a specific table” on page 16-12
You can use the nzstats command to display the statistics of a specific group or
table.

Display hardware issues


You can display a list of the hardware components that have problems and require
administrative attention by using the nzhw show -issues command.

This command displays such problems as components that have failed or
components that are in an “abnormal” state such as: disks that are assigned,
missing, incompatible, or unsupported; SPUs that are incompatible.

For example, the following command shows two failed disks on the system:
[nz@nzhost ~]$ nzhw show -issues
Description HW ID Location Role State Security
----------- ----- ---------------------- -------- ----------- --------
Disk 1498 spa1.diskEncl11.disk21 Failed Ok Disabled
Disk 1526 spa1.diskEncl9.disk4 Failed Ok Disabled

The disks must be replaced to ensure that the system has spares and an optimal
topology. You can also use the NzAdmin and IBM Netezza Performance Portal
interfaces to obtain visibility to hardware issues and failures.

Manage hosts
In general, there are few management tasks that relate to the IBM Netezza hosts. In
most cases, the tasks are for the optimal operation of the host. For example:
v Do not change or customize the kernel or operating system files unless directed
to do so by Netezza Support or Netezza customer documentation. Changes to
the kernel or operating system files can impact the performance of the host.
v Do not install third-party software on the Netezza host without first testing the
impact on a development or test Netezza system. While management agents or
other applications might be of interest, it is important to test and verify that the
application does not impact the performance or operation of the Netezza system.
v During Netezza software upgrades, host and kernel software revisions are
verified to ensure that the host software is operating with the latest required
levels. The upgrade processes might display messages that inform you to update
the host software to obtain the latest performance and security features.
v On Netezza HA systems, Netezza uses DRBD replication only on the /nz and
/export/home partitions. As new data is written to the Netezza /nz partition and
the /export/home partition on the primary Netezza system, the DRBD software
automatically makes the same changes to the /nz and /export/home partition of
the standby Netezza system.

v Use caution when you are saving files to the host disks; in general, do not store
Netezza database backups on the host disks, and do not use the host disks to
store large files that can grow and fill the host disks over time. Be sure to clean
up and remove any temporary files that you create on the host disks to keep the
disk space as available as possible for Netezza software and database use. A
quick way to check free space is shown after this list.
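A quick, illustrative way to check the free space on the host file systems
that Netezza uses is a standard df command; the file systems listed here are
the ones referenced elsewhere in this guide:
[nz@nzhost1 ~]$ df -h / /nz /export/home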

If the active host fails, the Netezza HA software typically fails over to the standby
host to run the Netezza database and system. Netezza Support works with you to
schedule field service to repair the failed host.

For the N3001-001 appliance, this process is similar. If the active host is
unreachable, the NPS services automatically fail over to the second host. It may
take 15 minutes for NPS to start discovering its SPUs. Next, the discovery process
waits up to 15 minutes for both SPUs to report their status. After that time, if only
the local SPU reports status, the system transitions into one-host mode. If the
second host becomes unreachable for more than 15 minutes, the active host
transitions into one-host mode.

Model N3001-001

For the N3001-001 appliance, both hosts are by default used for running the virtual
SPUs. Resources of both hosts, such as CPU or memory, are in use, and neither
host is marked as spare in nzhw.

You can switch to the one-host mode in which the resources of only one host are in
use. To do this, run the following command:
nzhw failover -id XXXX

where XXXX is the hwid of the host that you do not want to use. It is only
possible to fail over a host that is a standby in the cluster.

When the system runs in one-host mode, the role of the other host and its virtual
SPU is Failed and disks located on that host that are normally used to store data
(disks 9 - 24) have the role Inactive. The data slices remain mirrored but only with
two disks. In the two-host mode, each data slice is backed up by four disks.

To switch back from one-host mode to two-host mode, run the following
command:
nzhw activate -id XXXX

where XXXX is the hwid of the failed host. This operation activates the host, its
SPU, and all of its disks. Then, a rebalance is requested.

Note: Switching from one-host mode to two-host mode may take a significant
amount of time, for example a few hours. It depends on the amount of data stored
in the system.

Manage SPUs
Snippet Processing Units (SPUs) or S-Blades are hardware components that serve
as the query processing engines of the IBM Netezza appliance.

Each SPU has CPUs, FPGAs, memory, and I/O to process queries and
query results. Each SPU has associated data partitions that it “owns” to store the
portions of the user databases and tables that the SPU processes during queries.

In model N3001-001, the SPUs are emulated using host resources, such as CPU and
memory. The SPUs are not physical components of the system and there is no
FPGA.

You can use the nzhw command to activate, deactivate, failover, locate, and reset a
SPU, or delete SPU information from the system catalog.

To indicate which SPU you want to control, you can refer to the SPU by using its
hardware ID. You can use the nzhw command to display the IDs, and obtain the
information from management UIs such as NzAdmin or IBM Netezza Performance
Portal.

The basic SPU management tasks are as follows:

Monitor SPU Status

To obtain the status of one or more SPUs, you can use the nzhw command with the
show options.

To show the status of all the SPUs:


[nz@nzhost ~]$ nzhw show -type spu
Description HW ID Location Role State Security
----------- ----- ---------- ------ ----------- --------
SPU 1007 spa1.spu1 Failed Booting N/A
SPU 1008 spa1.spu3 Failed Booting N/A
SPU 1009 spa1.spu5 Spare Booting N/A
SPU 1010 spa1.spu7 Spare Discovering N/A
SPU 1011 spa1.spu9 Active Discovering N/A
SPU 1012 spa1.spu11 Active Discovering N/A
SPU 1013 spa1.spu13 Active Discovering N/A

To show detailed information about SPU ID 1082:


[nz@nzhost ~]$ nzhw show -id 1082 -detail
Description HW Location Role State Security Serial Version
ID number
----------- ---- ---------- ------ ----------- -------- ------------ -------
SPU 1012 spa1.spu11 Active Discovering N/A Y011UF3CD01L 10.0
Detail
-----------------------------------------------------
40 CPU Cores; 125.90GB Memory; Ip Addr: 10.0.13.197;;

Activate a SPU

You can use the nzhw command to activate a SPU that is inactive or failed.

To activate a SPU:
nzhw activate -u admin -pw password -host nzhost -id 1004

For model N3001-001, if you enabled one-host mode by failing over a SPU, you
must activate that SPU to switch back to two-host mode; a rebalance operation is
then requested automatically. In such a case, switching to two-host mode can take
a significant amount of time, for example a few hours, depending on the amount
of data stored on the system.
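As a sketch, assuming that the failed virtual SPU is reported with hardware ID
1004 (an illustrative value), the switch back to two-host mode might look like the
following; the nzds show output lets you watch the data slices return to the
Healthy state as the rebalance completes:
[nz@nzhost ~]$ nzhw activate -u admin -pw password -host nzhost -id 1004
[nz@nzhost ~]$ nzds show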



Deactivate a SPU

You can use the nzhw command to make a spare SPU unavailable to the system. If
the specified SPU is active, the command displays an error.

To deactivate a spare SPU:


nzhw deactivate -u admin -pw password -host nzhost -id 1004

Fail over a SPU

You can use the nzhw command to initiate a SPU failover.

To fail over a SPU, enter:


nzhw failover -u admin -pw password -host nzhost -id 1004

For model N3001-001, when a SPU is failed over, the system switches into one-host
mode, in which the resources of only one host are used. You can fail over only the
SPU of the standby host. To fail over a SPU that is running on the active host, you
must first migrate the cluster to the other host. To switch back to two-host mode,
activate the failed SPU.

Locate a SPU

You can use the nzhw command to turn on or off a SPU LED and display the
physical location of the SPU. The default is on.

To locate a SPU, enter:


nzhw locate -u admin -pw password -host nzhost -id 1082
Turned locator LED ’ON’ for SPU: Logical Name:’spa1.spu11’ Physical
Location:’1st Rack, 1st SPA, SPU in 11th slot’

To turn off a SPU LED, enter:


nzhw locate -u admin -pw password -host nzhost -id 1082 -off
Turned locator LED ’OFF’ for SPU: Logical Name:’spa1.spu11’
Physical Location:’1st Rack, 1st SPA, SPU in 11th slot’

For model N3001-001, the SPUs are emulated and the output of the locate
command is the following:
Logical Name:’spa1.spu2’ Physical Location:’lower host, virtual SPU’

Reset a SPU

You can use the nzhw command to power cycle a SPU (a hard reset).

To reset a SPU, enter:


nzhw reset -u admin -pw password -id 1006

Delete a SPU entry from the system catalog

You can use the nzhw command to remove a failed, inactive, or incompatible SPU
from the system catalog.

To delete a SPU entry, enter:


nzhw delete -u admin -pw password -host nzhost -id 1004



Replace a failed SPU

If a SPU hardware component fails and must be replaced, Netezza Support works
with you to schedule service to replace the SPU.
Related reference:
“The nzhw command” on page A-28
Use the nzhw command to manage the hardware of the IBM Netezza system.

Manage disks
The disks on the system store the user databases and tables that are managed and
queried by the IBM Netezza appliance. You can use the nzhw command to activate,
failover, and locate a disk, or delete disk information from the system catalog.

To protect against data loss, never remove a disk, RAID controller, or ESM card
from its enclosure unless directed to do so by Netezza Support or unless you are
following the hardware replacement procedure documentation. If you remove an
Active or Spare disk drive, you could cause the system to restart or to transition
to the down state. Data loss and system issues can occur if you remove these
components when it is not safe to do so.

Netezza C1000 systems have RAID controllers to manage the disks and hardware
in the storage groups. You cannot deactivate a disk on a C1000 system, and the
commands to activate, fail, or delete a disk return an error if the storage group
cannot support the action at that time.

To indicate which disk you want to control, you can refer to the disk by using its
hardware ID. You can use the nzhw command to display the IDs, and obtain the
information from management UIs such as NzAdmin or IBM Netezza Performance
Portal.

For model IBM PureData System for Analytics N3001-001, each physical disk is
represented as two nzhw objects:
v A disk in an emulated enclosure in the SPA (as on other N3001 systems).
v A host disk.
The majority of disk management operations should be performed on the storage
array disks, not on the host disks. The only operation that must be run on a host
disk is activation, which is required to assign a newly inserted physical disk to
the virtual SPU.

The basic disk management tasks are as follows:

Monitor disk status

To obtain the status of one or more disks, you can use the nzhw command with the
show options.

To show the status of all the disks (the sample output is abbreviated for the
documentation), enter:
[nz@nzhost ~]$ nzhw show -type disk
Description HW ID Location Role State Security
----------- ----- ---------------------- ------ ----------- --------
Disk 1076 spa1.diskEncl4.disk2 Active Ok Enabled
Disk 1077 spa1.diskEncl4.disk3 Active Ok Enabled
Disk 1078 spa1.diskEncl4.disk4 Active Ok Enabled
Disk 1079 spa1.diskEncl4.disk5 Active Ok Enabled



Disk 1080 spa1.diskEncl4.disk6 Active Ok Enabled
Disk 1081 spa1.diskEncl4.disk7 Active Ok Enabled
Disk 1082 spa1.diskEncl4.disk8 Active Ok Enabled
Disk 1083 spa1.diskEncl4.disk9 Active Ok Enabled
Disk 1084 spa1.diskEncl4.disk10 Active Ok Enabled
Disk 1085 spa1.diskEncl4.disk11 Active Ok Enabled
Disk 1086 spa1.diskEncl4.disk12 Active Ok Enabled

To show detailed information about disk ID 1076, enter:


[nz@nzhost ~]$ nzhw show -id 1076 -detail
Description HW Location Role State Security Serial number
ID
----------- ---- -------------------- ------ ----- -------- --------------------
Disk 1076 spa1.diskEncl4.disk2 Active Ok Enabled S0M1YDHX0000B429FVYS
Version Detail
------- ------------------------------------
E56B 558.91 GiB; Model ST600MM0026; SED;

Activate a disk

You can use the nzhw command to make an inactive, failed, or mismatched disk
available to the system as a spare.

To activate a disk, enter:


nzhw activate -u admin -pw password -host nzhost -id 1004

In some cases, the system might display a message that it cannot activate the disk
yet because the SPU has not finished an existing activation request. Disk activation
usually occurs quickly, unless there are several activations that are taking place at
the same time. In this case, later activations wait until they are processed in turn.

Note: For a Netezza C1000 system, you cannot activate a disk that is being used
by the RAID controller for a regeneration or other task. If the disk cannot be
activated, an error message similar to the following appears:
Error: Can not update role of Disk 1004 to Spare - The disk is still part of a non healthy
array. Please wait for the array to become healthy before activating.

For model N3001-001, in addition to activating a disk, it is sometimes necessary to
activate a host disk. This is required when a new disk is inserted into the system.
The new disk is not assigned to the virtual SPU and therefore cannot be used until
the activate operation is called on the host disk that represents the same physical
disk. When a host disk is being activated, the system is paused, the storage
configuration between the hosts is synchronized, and the SPUs are restarted. The
entire operation takes about four minutes on an idle system. When the system is in
use, the duration of the operation depends on how quickly the system can be
paused, because pausing does not interrupt the running transactions. After the
operation, the system returns to the online state. During the whole process, several
intermediate system states can be observed (such as Discovering, Initializing, and
Resuming).
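For example, assuming that the newly inserted physical disk is represented by a
host disk with hardware ID 1300 (an illustrative value), the activation is a single
command; you can then run nzstate until the system returns to the Online state:
[nz@nzhost ~]$ nzhw activate -id 1300
[nz@nzhost ~]$ nzstate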

Fail over a disk

You can use the nzhw command to initiate a failover. You cannot fail over a disk
until the system is at least in the initialized state.

To fail over a disk, enter:


nzhw failover -u admin -pw password -host nzhost -id 1004



On a Netezza C1000 system, when you fail a disk, the RAID controller
automatically starts a regeneration to a spare disk. The RAID controller might not
allow you to fail a disk in a RAID 5 array that already has a failed disk.

Note: For a Netezza C1000 system, the RAID controller still considers a failed disk
to be part of the array until the regeneration is complete. After the regen
completes, the failed disk is logically removed from the array.

Locate a disk

You can use the nzhw command to turn on or off the LED on a disk in the storage
arrays. (This command does not work for disks in the hosts.) The default is on.
The command also displays the physical location of the disk.

To turn on a disk LED, enter:


nzhw locate -u admin -pw password -host nzhost -id 1004
Turned locator LED ’ON’ for Disk: Logical
Name:’spa1.diskEncl4.disk1’ Physical Location:’1st Rack, 4th
DiskEnclosure, Disk in Row 1/Column 1’

To turn off a disk LED, enter:


nzhw locate -u admin -pw password -host nzhost -id 1004 -off
Turned locator LED ’OFF’ for Disk: Logical
Name:’spa1.diskEncl4.disk1’ Physical Location:’1st Rack, 4th
DiskEnclosure, Disk in Row 1/Column 1’

For model N3001-001, you can locate both disks and host disks, including the host
disks managed by the hardware RAID controller.

Delete a disk entry from the system catalog

You can use the nzhw command to remove a failed, inactive, mismatched, or
incompatible disk from the system catalog. For Netezza C1000 systems, do not
delete the hardware ID of a failed disk until after you have successfully replaced
it by using the instructions in the Replacement Procedures: IBM Netezza C1000
Systems.

To delete a disk entry, enter:


nzhw delete -u admin -pw password -host nzhost -id 1004

Replace a failed disk

If a disk hardware component fails and must be replaced, Netezza Support works
with you to schedule service to replace the disk.
Related reference:
“The nzhw command” on page A-28
Use the nzhw command to manage the hardware of the IBM Netezza system.

Manage data slices


A data slice is a logical representation of the data that is saved in the partitions of
a disk. The data slice contains pieces of each user database and table. The IBM
Netezza system distributes the user data to all of the disks in the system by using
a hashing algorithm.



Each data slice has an ID, and is logically owned by a SPU to process queries on
the data that is contained within that data slice.

The following are basic data slice management tasks:


v Monitor status, space consumption, and overall health
v Rebalance data slices to the available SPUs
v Regenerate (or regen) a data slice after a disk failure
v Display the current topology of the data slices

You can use the nzhw, nzds, and nzspupart commands to manage data slices. To
indicate which data slice you want to control, you can refer to the data slice by
using its data slice ID. You can use the nzds command to display the IDs, and
obtain the information from management UIs such as NzAdmin or IBM Netezza
Performance Portal.
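For example, the commands that are most often used for these tasks, each of
which is described in detail later in this section, are the following:
[nz@nzhost ~]$ nzds show
[nz@nzhost ~]$ nzds show -issues
[nz@nzhost ~]$ nzds show -topology
[nz@nzhost ~]$ nzhw rebalance -check
[nz@nzhost ~]$ nzspupart regen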
Related reference:
“The nzds command” on page A-10
Use the nzds command to manage and obtain information about the data slices in
the system.

Display data slice issues


You can quickly display a list of any data slices that have issues and that might
require administrative attention by using the nzds show -issues command. This
command displays data slices that are in the Degraded state (a loss of data
redundancy) or that are Repairing (that is, the data is being regenerated to a spare
disk). For example, the following command shows that several data slices are
being repaired:
[nz@nzhost ~]$ nzds show -issues
Data Slice Status SPU Partition Size (GiB) % Used Supporting Disks
---------- --------- ---- --------- ---------- ------ ----------------
15 Repairing 1137 3 356 46.87 1080,1086
16 Repairing 1137 2 356 46.79 1080,1086
46 Repairing 1135 4 356 46.73 1055,1098

You can also use the NzAdmin and IBM Netezza Performance Portal interfaces to
obtain visibility to hardware issues and failures.

Monitor data slice status


To obtain the status of one or more data slices, you can use the nzds command
with the show options.

To show the status of all the data slices (the sample output is abbreviated for the
documentation), enter:
[nz@nzhost ~]$ nzds show
Data Slice Status SPU Partition Size (GiB) % Used Supporting Disks
---------- ------- ---- --------- ---------- ------ ----------------
1 Repairing 1017 2 356 58.54 1021,1029
2 Repairing 1017 3 356 58.54 1021,1029
3 Healthy 1017 5 356 58.53 1022,1030
4 Healthy 1017 4 356 58.53 1022,1030
5 Healthy 1017 0 356 58.53 1023,1031
6 Healthy 1017 1 356 58.53 1023,1031
7 Healthy 1017 7 356 58.53 1024,1032
8 Healthy 1017 6 356 58.53 1024,1032

Data slices 1 and 2 in the sample output are regenerating because of a disk failure.
The command output can differ on different models of appliances.



Note: For model N3001-001, each healthy data slice is stored on partitions of four
different disks. Two of these disks are always located on the first host and the
other two are located on the second host.

Note: For a Netezza C1000 system, three disks hold the user data for a data slice;
the fourth disk is the regen target for the failed drive. The RAID controller still
considers a failed disk to be part of the array until the regeneration is complete.
After the regen completes, the failed disk is logically removed from the array.

To show detailed information about the data slices that are being regenerated, you
can use the -regenstatus and -detail options, for example:
[nz@nzhost ~]$ nzds show -regenstatus -detail
Data Slice Status SPU Partition Size (GiB) % Used Supporting Disks
Start Time % Done
---------- --------- ---- --------- ---------- ------ -------------------
------------------- ------
2 Repairing 1255 1 3725 0.00 1012,1028,1031,1056
2011-07-01 10:41:44 23

The status of a data slice shows the health of the data slice. The following table
describes the status values for a data slice. You see these states when you run the
nzds command or display data slices by using the NzAdmin or IBM Netezza
Performance Portal UIs.
Table 5-5. Data slice status
State Description
Healthy The data slice is operating normally and the data is protected in a
redundant configuration; that is, the data is fully mirrored.
Repairing The data slice is in the process of being regenerated to a spare disk
because of a disk failure.
Degraded The data slice is not protected in a redundant configuration. Another
disk failure could result in loss of a data slice, and the degraded
condition impacts system performance.

On an N3001-001 system, a data slice is marked as degraded when
fewer than four disks are used to store it. If two or three disks are
used for it, the data slice is still mirrored.

Regenerate a data slice


If a disk encounters problems or fails, you perform a data slice regeneration to
create a copy of the primary and mirror data slices on an available spare disk.
Regular system processing continues during most of the regeneration.

Note: In the IBM PureData System for Analytics N1001 or IBM Netezza 1000 and
later models, the system does not change states during a regeneration; that is, the
system remains online while the regeneration is in progress. There is no
synchronization state change and no interruption to active jobs during this process.
If the regeneration process fails or stops for any reason, the system transitions to
the Discovering state to establish the topology of the system.

You can use the nzspupart regen command or the NzAdmin interface to
regenerate a disk. If you do not specify any options, the system manager checks
for degraded partitions and, if any are found, starts a regeneration if a spare disk
is available in the system.



[nz@nzhost ~]$ nzspupart regen
Are you sure you want to proceed (y|n)? [n] y
Info: Regen Configuration - Regen configured on SPA:1 Data slice 20 and 19

For IBM PureData System for Analytics N2001 and later systems, each disk
contains partitions for the user data, the log, and swap. When the system
regenerates a disk to a spare, it copies all of the partitions to the spare. If you
issue the nzspupart regen command manually, specify:
v The hardware ID of the SPU that has the degraded partitions
v One of the partition IDs
v The hardware ID for the spare disk
The regeneration affects all partitions on that disk. For example:
[nz@nzhost ~]$ nzspupart regen -spu 1099 -part 1 -dest 1066

You can then issue the nzspupart show -regenstatus or the nzds show
-regenstatus command to display the status of the regeneration. For example:
[nz@nzhost ~]$ nzspupart show -regenstatus
SPU Partition Id Partition Type Status Size (GiB) % Used Supporting Disks % Done Repairing Disks Starttime
---- ------------ -------------- --------- ---------- ------ ------------------- ------- --------------- ---------
1099 0 Data Repairing 356 0.00 1065,1066 0.00 1066 0
1099 1 Data Repairing 356 0.00 1065,1066 0.00 1066 0
1099 100 NzLocal Repairing 1920989772 0.00 1065,1066,1076,1087 0.00 1066 0
1099 101 Swap Repairing 32 0.00 1065,1066,1076,1087 0.00 1066 0
1099 110 Log Repairing 1 3.31 1065,1066 0.00 1066 0

For systems earlier than the N200x models, you must specify the data slice IDs
and the spare disk ID. For example, to regenerate data slice IDs 11 and 17, which
are affected by the failing disk, onto spare disk ID 1024, enter:
nzds regen -u admin -pw password -ds "11,17" -dest 1024

If you want to control the regeneration source and target destinations, you can
specify source SPU and partition IDs, and the target or destination disk ID. The
spare disk must reside in the same SPA as the disk that you are regenerating. You
can obtain the IDs for the source partition by issuing the nzspupart show -details
command.

To regenerate a degraded partition and specify the information for the source and
destination, enter the following command:
nzspupart regen -spu 1035 -part 7 -dest 1024

Note: Regeneration can take several hours to complete. If the system is idle and
has no other activity except the regeneration, or if the user data partitions are not
very full, the regeneration takes less time to complete. You can review the status of
the regeneration by issuing the nzspupart show -regenStatus command. During
the regeneration, user query performance can be impacted while the system is
busy processing the regeneration. Likewise, user query activity can increase the
time that is required for the regeneration.

If the system manager is unable to remove the failed disk from the RAID array, or
if it cannot add the spare disk to the RAID array, a regeneration setup failure can
occur. If a regeneration failure occurs, or if a spare disk is not available for the
regeneration, the system continues processing jobs. The data slices that lose their
mirror continue to operate in an unmirrored or degraded state; however, you
should replace your spare disks as soon as possible and ensure that all data slices
are mirrored. If an unmirrored disk fails, the system is brought to a down state.
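For example, to confirm whether any data slices are still operating in a degraded,
unmirrored state after such a failure, you can rerun the nzds show -issues
command that is described earlier in this section:
[nz@nzhost ~]$ nzds show -issues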



Rebalance data slices
Each SPU owns or manages a number of data slices for query processing. The
SPUs and their data slices must reside in the same SPA. If a SPU fails, the System
Manager reassigns its data slices to the other active SPUs in the same SPA. The
System Manager randomly assigns a pair of data slices (the primary and mirrors)
from the failed SPU to an available SPU in the SPA. The System Manager ensures
that each SPU has no more than two data slices more than one of its peers.

After the failed SPU is replaced or reactivated, you must rebalance the data slices
to return to optimal performance. The rebalance process checks each SPU in the
SPA; if a SPU has more than two data slices more than another SPU, the System
Manager redistributes the data slices to equalize the workload and return the SPA
to an optimal performance topology. (The System Manager changes the system to
the discovering state to perform the rebalance.)

In addition, if an S-Blade does not have an equal distribution of disks in the
odd-numbered versus even-numbered enclosures of its array, the System Manager
reports the problem as an overloaded SAS bus. The nzhw rebalance command also
reconfigures the topology so that half of the disks are in the odd-numbered
enclosures and half in the even-numbered.

You can use the nzhw command to rebalance the data slice topology. The system
also runs the rebalance check each time that the system is restarted, or after a SPU
failover or a disk regeneration setup failure.

To rebalance the data slices:


nzhw rebalance -u admin -pw password

If a rebalance is not required, the command displays a message that a rebalance is
not necessary and exits without completing the step.

You can also use the nzhw rebalance -check option to have the system check the
topology and only report whether a rebalance is needed. The command displays
the message Rebalance is needed or There is nothing to rebalance. If a
rebalance is needed, you can run the nzhw rebalance command to perform the
rebalance, or you could wait until the next time the Netezza software is stopped
and restarted to rebalance the system.
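For example, a typical check-and-rebalance sequence after a SPU is replaced or
reactivated looks like the following:
[nz@nzhost ~]$ nzhw rebalance -check
Rebalance is needed
[nz@nzhost ~]$ nzhw rebalance -u admin -pw password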

For an N3001-001 system, the rebalance operation is used to switch the system
back to two-host mode after the failed SPU is activated. The rebalance is
automatically requested by the system when the transition to two-host mode is
requested by the activate operation for a host.
Related concepts:
“System resource balance recovery” on page 5-17

Active path topology


The active path topology defines the ports and switches that offer the best
connection performance to carry the traffic between the S-Blades and their disks.
For best system performance, all links and components must remain balanced and
equally loaded.

To display the current storage topology, use the nzds show -topology command:



[nz@nzhost ~]$ nzds show -topology
===============================================================================
Topology for SPA 1
spu0101 has 8 datapartitions: [ 0:1 1:2 2:11 3:12 4:10 5:9 6:18 7:17 ]
hba[0] 4 disks
port[2] 2 disks: [ 1:encl1Slot10 11:encl1Slot06 ] -> switch 0
port[3] 2 disks: [ 10:encl2Slot05 18:encl2Slot09 ] -> switch 1
hba[1] 4 disks
port[0] 2 disks: [ 2:encl2Slot01 12:encl2Slot06 ] -> switch 0
port[1] 2 disks: [ 9:encl1Slot05 17:encl1Slot12 ] -> switch 1
...............................................................................
spu0103 has 8 datapartitions: [ 0:22 1:21 2:16 3:15 4:13 5:14 6:5 7:6 ]
hba[0] 4 disks
port[2] 2 disks: [ 16:encl2Slot02 22:encl2Slot11 ] -> switch 0
port[3] 2 disks: [ 5:encl1Slot03 13:encl1Slot07 ] -> switch 1
hba[1] 4 disks
port[0] 2 disks: [ 15:encl1Slot08 21:encl1Slot11 ] -> switch 0
port[1] 2 disks: [ 6:encl2Slot03 14:encl2Slot07 ] -> switch 1
...............................................................................
spu0105 has 6 datapartitions: [ 0:19 1:20 2:7 3:8 4:4 5:3 ]
hba[0] 3 disks
port[2] 2 disks: [ 7:encl1Slot04 19:encl1Slot09 ] -> switch 0
port[3] 1 disks: [ 4:encl2Slot12 ] -> switch 1
hba[1] 3 disks
port[0] 2 disks: [ 8:encl2Slot04 20:encl2Slot10 ] -> switch 0
port[1] 1 disks: [ 3:encl1Slot01 ] -> switch 1
...............................................................................
Switch 0
port[1] 6 disks: [ 1:encl1Slot10 7:encl1Slot04 11:encl1Slot06 15:encl1Slot08
19:encl1Slot09 21:encl1Slot11 ] -> encl1

port[2] 6 disks: [ 2:encl2Slot01 8:encl2Slot04 12:encl2Slot06 16:encl2Slot02


20:encl2Slot10 22:encl2Slot11 ] -> encl2

Switch 1
port[1] 5 disks: [ 3:encl1Slot01 5:encl1Slot03 9:encl1Slot05 13:encl1Slot07
17:encl1Slot12 ] -> encl1

port[2] 5 disks: [ 4:encl2Slot12 6:encl2Slot03 10:encl2Slot05 14:encl2Slot07


18:encl2Slot09 ] -> encl2
===============================================================================

This sample output shows a normal topology for an IBM Netezza 1000-3 system.
The command output is complex and is typically used by Netezza Support to
troubleshoot problems. If there are any issues to investigate in the topology, the
command displays a WARNING section at the bottom, for example:
WARNING: 2 issues detected
spu0101 hba [0] port [2] has 3 disks
SPA 1 SAS switch [sassw01a] port [3] has 7 disks

These warnings indicate problems in the path topology where storage components
are overloaded. These problems can affect query performance and also system
availability if other path failures occur. Contact Support to troubleshoot these
warnings.

To display detailed information about path failure problems, you can use the
following command:
[nz@nzhost ~]$ nzpush -a mpath -issues
spu0109: Encl: 4 Slot: 4 DM: dm-5 HWID: 1093 SN: number PathCnt: 1
PrefPath: yes
spu0107: Encl: 2 Slot: 8 DM: dm-1 HWID: 1055 SN: number PathCnt: 1
PrefPath: yes
spu0111: Encl: 1 Slot: 10 DM: dm-0 HWID: 1036 SN: number PathCnt: 1
PrefPath: no



If the command does not return any output, there are no path failures observed on
the system. It is not uncommon for some path failures to occur and then clear
quickly. However, if the command displays some output, as in this example, there
are path failures on the system and system performance can be degraded. The
sample output shows that spu0111 is not using the higher performing preferred
path (PrefPath: no) and there is only one path to each disk (PathCnt: 1) instead of
the normal two paths. Contact Netezza Support and report the path failures to
initiate troubleshooting and repair.

Note: It is possible to see errors that are reported in the nzpush command output
even if the nzds -topology command does not report any warnings. In these cases,
the errors are still problems in the topology, but they do not affect the performance
and availability of the current topology. Be sure to report any path failures to
ensure that problems are diagnosed and resolved by Support for optimal system
performance.
Related reference:
“Hardware path down” on page 8-20

Handle transactions during failover and regeneration


When a disk failover occurs, the system continues processing any active jobs while
it runs a disk regeneration. Active queries do not have to be stopped and restarted.

If a SPU fails, the system state changes to the pausing -now state (which stops
active jobs), and then transitions to the discovering state to identify the active SPUs
in the SPA. The system also rebalances the data slices to the active SPUs.

After the system returns to an online state:


v The system restarts transactions that did not return data before the pause -now
transition.
v Read-only queries begin again with their original transaction ID and priority.

The following table describes the system states and the way IBM Netezza handles
transactions during failover.
Table 5-6. System states and transactions
System state       Active transactions                              New transactions
Offline(ing) Now   Aborts all transactions.                         Returns an error.
Offline(ing)       Waits for the transaction to finish.             Returns an error.
Pause(ing) Now     Aborts only those transactions that cannot       Queues the transaction.
                   be restarted.
Pause(ing)         Waits for the transaction to finish.             Queues the transaction.

The following examples provide specific instances of how the system handles
failovers that happen before, during, or after data is returned.
v If the pause -now occurs immediately after a BEGIN command completes, before
data is returned, the transaction is restarted when the system returns to an
online state.
v If a statement such as the following completes and then the system transitions,
the transaction can restart because data has not been modified and the reboot
does not interrupt a transaction.
BEGIN;
SELECT * FROM emp;



v If a statement such as the following completes, but the system transitions
before the commit to disk, the transaction is aborted.
BEGIN;
INSERT INTO emp2 SELECT * FROM emp;
v A statement such as the following can be restarted if it has not returned data, in
this case a single number that represents the number of rows in a table. This
sample includes an implicit BEGIN command.
SELECT count(*) FROM small_lineitem;
v If a statement such as the following begins to return rows before the system
transitions, the statement is aborted.
INSERT INTO emp2 SELECT * FROM externaltable;
This transaction, and others that would normally be aborted, are restarted if
the nzload -allowReplay option is applied to the associated table.

Note: There is a retry count for each transaction. If the system transitions to
pause -now more than the number of retries that are allowed, the transaction is
stopped.

Automatic query and load continuation


When a SPU unexpectedly restarts or is failed-over, the System Manager initiates a
state change from online to pause -now. During this transition, rather than aborting
all transactions, the IBM Netezza system aborts only those transactions that cannot
be restarted.

The system restarts the following transactions:


v Read-only queries that have not returned data. The system restarts the request
with a new plan and the same transaction ID.
v Loads. If you have enabled load continuation, the system rolls back the load to
the beginning of the replay region and resends the data.

After the system restarts these transactions, the system state returns to online. For
more information, see the IBM Netezza Data Loading Guide.

Power procedures
This section describes how to power on an IBM Netezza system and how to
power off the system. Typically, you power off the system only if you are
physically moving the system within the data center, or for maintenance or
emergency conditions within the data center.

The instructions to power on or off an IBM Netezza 100 system are available in the
Site Preparation and Specifications: IBM Netezza 100 Systems.

Note: To power cycle a Netezza system, you must have physical access to the
system to press power switches and to connect or disconnect cables. Netezza
systems have keyboard/video/mouse (KVM) units that you can use to enter
administrative commands on the hosts.

PDU and circuit breakers overview


On the IBM Netezza 1000-6 and larger models, and on the IBM PureData System
for Analytics N1001-005 and larger models, the main input power distribution
units (PDUs) are at the bottom of the rack on the right and left sides, as shown in
the following figure.

Figure 5-11. IBM Netezza 1000-6 and N1001-005 and larger PDUs and circuit breakers

A OFF setting
B ON setting
C PDU circuit breakers. 3 rows of 3 breaker pins.
v To close the circuit breakers (power up the PDUs), press in each of the nine
breaker pins until they engage. Be sure to close the nine pins on both main
PDUs in each rack of the system.
v To open the circuit breakers (power off the PDUs), pull out each of the nine
breaker pins on the left and the right PDU in the rack. If it becomes difficult to
pull out the breaker pins by using your fingers, you can use a tool such as a
pair of needle-nose pliers to gently pull out the pins.

On the IBM Netezza 1000-3 and IBM PureData System for Analytics N1001-002
models, the main input power distribution units (PDUs) are on the right and left
sides of the rack, as shown in the following figure.



Figure 5-12. IBM Netezza 1000-3 and IBM PureData System for Analytics N1001-002 PDUs and circuit breakers

A Two circuit breakers at the top of the PDU


B OFF setting
C ON setting

At the top of each PDU is a pair of breaker rocker switches. The labels on the
switches are upside down when you view the PDUs.
v To close the circuit breakers (power up the PDUs), you push the On toggle of
the rocker switch in. Make sure that you push in all four rocker switches, two
on each PDU.
v To open the circuit breakers (power off the PDUs), you must use a tool such as a
small flathead screwdriver; insert the tool into the hole that is labeled OFF and
gently press until the rocker toggle pops out. Make sure that you open all four
of the rocker toggles, two on each PDU.

Powering on the IBM Netezza 1000 and IBM PureData System for Analytics N1001
About this task

To power on an IBM Netezza 1000 or IBM PureData System for Analytics N1001
system, complete the following steps:

Procedure
1. Make sure that the two main power cables are connected to the data center
drops; there are two power cables for each rack of the system.
2. Do one of the following steps depending on which system model you have:



v For an IBM Netezza 1000-6, N1001-005 or larger model, push in the nine
breaker pins on both the left and right lower PDUs as shown in Figure 5-11
on page 5-33. Repeat these steps for each rack of the system.
v For an IBM Netezza 1000-3 or N1001-002 model, close the two breaker
switches on both the left and right PDUs as shown in Figure 5-12 on page
5-34.
3. Press the power button on both host servers and wait for the servers to start.
This process can take a few minutes.
4. Log in as root to one of the hosts and confirm that the Netezza software is
started:
a. Run the crm_mon command to obtain the cluster status:
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Tue Jun 2 11:46:43 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
b. Identify the active host in the cluster, which is the host where the nps
resource group is running:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/01_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
5. Log in as nz to the active host and verify that the Netezza server is online:
[nz@nzhost1 ~]$ nzstate
System state is ’Online’.
6. If your system runs the Call Home support feature, enable it.
[nz@nzhost1 ~]$ nzOpenPmr --on

Powering off the IBM Netezza 1000 or IBM PureData System for Analytics N1001 system
About this task

To power off an IBM Netezza 1000 or IBM PureData System for Analytics N1001
system, complete the following steps:

Procedure
1. Log in to the host server (ha1) as root.

Note: Do not use the su command to become root.



2. Identify the active host in the cluster, which is the host where the nps resource
group is running:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
3. Log in to the active host (nzhost1 in this example) as the nz user.
4. Check to see if Call Home is enabled, and if so, disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzOpenPmr --status
b. If Call Home is enabled, disable it:
[nz@nzhost1 ~]$ nzOpenPmr --off
5. Run the following command to stop the IBM Netezza server:
[nz@nzhost1 ~]$ nzstop
6. From the active host, stop Heartbeat on each host, starting with the non-active
host. For example, if HA1 is the active host, type the following commands to
stop the clustering processes:
[root@nzhost1 ~]# ssh ha2 ’service heartbeat stop’
[root@nzhost1 ~]# service heartbeat stop
7. Log in as root on the standby host (nzhost2 in this example), and run the
following command to shut down the host:
[root@nzhost2 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other
system activity. When it finishes, it displays the message “power down”
which indicates that it is now safe to turn off the power to the server.
8. Press the power button on Host 2 (located in the front of the cabinet) to
power down that NPS host.
9. As root on the active host, run the following command to shut down the host:
[root@nzhost1 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other
system activity. When it finishes, it displays the message “power down”
which indicates that it is now safe to turn off the power to the server.
10. Press the power button on Host 1 (located in the front of the cabinet) to
power down that NPS host.
11. Do one of the following steps depending on which appliance model you have:
v For an IBM Netezza 1000-6 or N1001-005 or larger model, pull out the nine
breaker pins on both the left and right lower PDUs as shown in Figure 5-11
on page 5-33. Repeat these steps for each rack of the system.
v For an IBM Netezza 1000-3 or N1001-002 model, use a small tool such as a
pocket screwdriver to open the two breaker switches on both the left and
right PDUs as shown in Figure 5-12 on page 5-34.
12. Disconnect the main input power cables (two per rack) from the data center
power drops. Do not disconnect the power cords from the plug/connector on
the PDUs in the rack; instead, disconnect them from the power drops outside
the IBM Netezza 1000 rack.



Powering on the IBM PureData System for Analytics N200x
About this task

To power on an IBM PureData System for Analytics N200x system, complete the
following steps:

Procedure
1. Switch on the power to the two PDUs located in the rear of the cabinet at the
bottom. Make sure that you switch on both power controls. Repeat this step
for each rack of a multi-rack system.
2. Press the power button on Host 1. The power button is on the host in the front
of the cabinet. Host 1 is the upper host in the rack, or the host located in rack
one of older multi-rack systems. A series of messages appears as the host
system boots.
3. Wait at least 30 seconds after powering up Host 1, then press the power button
on Host 2. (Host 2 is the lower host in the rack, or the host located in rack two
of older multi-rack systems.) The delay ensures that Host 1 completes its
start-up operations first, and thus is the primary host for the system.
4. Log in as root to Host 1 and run the crm_mon command to monitor the status of
the HA services and cluster operations:
[root@nzhost1 ~]# crm_mon -i5
The output of the command refreshes at the specified interval rate of 5 seconds
(-i5).
5. Review the output and watch for the resource groups to all have a Started
status. This usually takes about 2 to 3 minutes, then proceed to the next step.
Sample output follows:
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Tue Jun 2 11:46:43 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
6. Press Ctrl-C to exit the crm_mon command and return to the command prompt.
7. Log in to the nz account.
[root@nzhost1 ~]# su - nz
8. Verify that the system is online using the following command:
[nz@nzhost1 ~]$ nzstate
System state is ’Online’.
9. If your system runs the Call Home support feature, enable it.
[nz@nzhost1 ~]$ nzOpenPmr --on



Powering off the IBM PureData System for Analytics N200x
system
About this task

To power off an IBM PureData System for Analytics N200x system, complete the
following steps:

Procedure
1. Log in to the host server (ha1) as root.

Note: Do not use the su command to become root.


2. Identify the active host in the cluster, which is the host where the nps resource
group is running:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
3. Log in to the active host (nzhost1 in this example) as the nz user.
4. Check to see if Call Home is enabled, and if so, disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzOpenPmr --status
b. If Call Home is enabled, disable it:
[nz@nzhost1 ~]$ nzOpenPmr --off
5. On the active host (nzhost1 in this example), run the following command to
stop the Netezza server:
[nz@nzhost1 ~]$ nzstop
6. Log in as root to ha1 and type the following commands to stop the clustering
processes:
[root@nzhost1 ~]# ssh ha2 ’service heartbeat stop’
[root@nzhost1 ~]# service heartbeat stop
7. As root on the standby host (nzhost2 in this example), run the following
command to shut down the host:
[root@nzhost2 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other
system activity. When it finishes, it displays the message power down which
indicates that it is now safe to turn off the power to the server.
8. Press the power button on Host 2 (located in the front of the cabinet) to
power down that Netezza host.
9. On host 1, shut down the Linux operating system using the following
command:
[root@nzhost1 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other
system activity. When it finishes, it displays the message power down which
indicates that it is now safe to turn off the power to the server.
10. Press the power button on Host 1 (located in the front of the cabinet) to
power down that Netezza host.
11. Switch off the power to the two PDUs located in the rear of the cabinet at the
bottom. Make sure that you switch off both power controls. (Repeat this step
for each rack of a multi-rack system.)



Powering on the IBM Netezza High Capacity Appliance C1000
About this task

To power on an IBM Netezza High Capacity Appliance C1000, complete the
following steps:

Procedure
1. Make sure that the two main power cables are connected to the data center
drops; there are two power cables for each rack of the system. For a North
American power configuration, there are four power cables for the first two
racks of a Netezza C1000 (or two cables for a European Union power
configuration).
2. Switch the breakers to ON on both the left and right PDUs. (Repeat this step
for each rack of the system.)
3. Press the power button on both host servers and wait for the servers to start.
This process can take a few minutes.
4. Log in to the host server (ha1) as root.
5. Change to the nz user account and run the following command to stop the
Netezza server: nzstop
6. Wait for the Netezza system to stop.
7. Log out of the nz account to return to the root account, then type the
following command to power on the storage groups:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all -j all
8. Wait five minutes and then type the following command to power on all the
S-blade chassis:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all
9. Run the crm_mon -i5 command to monitor the status of the HA services and
cluster operations. Review the output and watch for the resource groups to all
have a Started status. This usually takes about 2 to 3 minutes, then proceed to
the next step.
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Tue Jun 2 11:46:43 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
10. Press Ctrl-C to exit the crm_mon command and return to the command prompt.
11. Log into the nz account.
[root@nzhost1 ~]# su - nz
12. Verify that the system is online using the following command:



[nz@nzhost1 ~]$ nzstate
System state is ’Online’.

Powering off the IBM Netezza High Capacity Appliance C1000


About this task

To power off an IBM Netezza High Capacity Appliance C1000, complete the
following steps:

CAUTION:
Unless the system shutdown is an emergency situation, do not power down a
Netezza C1000 system when there are any amber (Needs Attention) LEDs
illuminated in the storage groups. It is highly recommended that you resolve the
problems that are causing the Needs Attention LEDs before you power off a
system to ensure that the power-up procedures are not impacted by the
unresolved conditions within the groups.

Procedure
1. Identify the active host in the cluster, which is the host where the nps resource
group is running:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
2. Log in as root to the active host (nzhost1 in this example) and run the
following command to stop the Netezza server:
[root@nzhost1 ~]# nzstop
3. Type the following commands to stop the clustering processes:
[root@nzhost1 ~]# ssh ha2 ’service heartbeat stop’
[root@nzhost1 ~]# service heartbeat stop
4. On ha1, type the following commands to power off the S-blade chassis and
storage groups:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -off all
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -off all -j all
5. Log into ha2 as root and shut down the Linux operating system using the
following command:
[root@nzhost2 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other system
activity. When it finishes, it displays the message power down which indicates
that it is now safe to turn off the power to the server.
6. Press the power button on host 2 (located in the front of the cabinet) to power
down that Netezza host.
7. Log into ha1 as root and shut down the Linux operating system using the
following command:
[root@nzhost1 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other system
activity. When it finishes, it displays the message power down which indicates
that it is now safe to turn off the power to the server.
8. Press the power button on host 1 (located in the front of the cabinet) to power
down that Netezza host.
9. Switch the breakers to OFF on both the left and right PDUs. (Repeat this step
for each rack of the system.)



Powering on IBM PureData System for Analytics N3001-001
Perform the following steps to power on the IBM PureData System for Analytics
N3001-001 appliance.

Procedure
1. Press the power button on Host 1. The power button is located on the host in
the front of the cabinet. Host 1 is the upper host in a single-rack system, or the
host located in rack one of a multi-rack system. A series of messages appears as
the host system boots.
2. Wait for at least 30 seconds after powering up Host 1. Then press the power
button of Host 2. The delay ensures that Host 1 completes its startup
operations first and therefore becomes the primary host for the system. Host 2
is the lower host in a single-rack system, or the host located in rack two of a
multi-rack system.
3. Log in to Host 1 as root and run the crm_mon command to monitor the status of
HA services and cluster operations:
[root@nzhost1 ~]# crm_mon -i5

The output of the command is refreshed at the specified interval rate of 5
seconds (-i5).
4. Review the output of the command and wait until the resource groups all have
the Started status. This usually takes two to three minutes. Then, proceed to
the next step. The command returns the following output:
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Fri Aug 29 02:19:25 2014
Current DC: hostname-1 (3389b15b-5fee-435d-8726-a95120f437dd)
2 Nodes configured.
2 Resources configured.
============
Node: hostname-1 (3389b15b-5fee-435d-8726-a95120f437dd): online
Node: hostname-2 (d346b950-bd01-49e3-9385-88b0a7bbbd6a): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started hostname-1
drbd_nz_device (heartbeat:drbddisk): Started hostname-1
exphome_filesystem (heartbeat::ocf:Filesystem): Started hostname-1
nz_filesystem (heartbeat::ocf:Filesystem): Started hostname-1
fabric_ip (heartbeat::ocf:IPaddr):
wall_ip (heartbeat::ocf:IPaddr):
nz_dnsmasq (lsb:nz_dnsmasq):
nzinit (lsb:nzinit): Started hostname-1
floating_management_ip (heartbeat::ocf:IPaddr): Started hostname-1
fencing_route_to_ha1 (stonith:external/ipmioverlan): Started hostname-2
fencing_route_to_ha2 (stonith:external/ipmioverlan): Started hostname-1

5. Press Ctrl + C to exit the crm_mon command and return to the command
prompt.
6. Log in to the nz account:
[root@nzhost1 ~]# su - nz

7. Run the following command to verify that the system is online:



[nz@nzhost1 ~]$ nzstate
System state is 'Online'.

Powering off IBM PureData System for Analytics N3001-001


Perform the following steps to power off the IBM PureData System for Analytics
N3001-001 appliance.

Procedure
1. Log in to Host 1 (ha1) as root.

Note: Do not use the su command to switch to root.


2. Identify the active host in the cluster. This is the host on which the NPS
resource group is running. To do this, run the following command:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1

3. On the active host, in this example nzhost1, run the following commands to
stop the Netezza server:
[root@nzhost1 ~]# su - nz
[nz@nzhost1 ~]$ nzstop
[nz@nzhost1 ~]$ exit

4. Run the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 'service heartbeat stop'
[root@nzhost1 ~]# service heartbeat stop

5. Log in as root to the standby host, in this example nzhost2. Run the following
command to shut down the host:
[root@nzhost2 ~]# shutdown -h now

The system displays a series of messages as it stops processes and other system
activity. When it finishes, a message is displayed that indicates that it is now
safe to power down the server.
6. Press the power button on Host 2 to power down that Netezza host. The
button is located in the front of the cabinet.
7. On Host 1, run the following command to shut down the Linux operating
system:
[root@nzhost1 ~]# shutdown -h now

The system displays a series of messages as it stops processes and other system
activity. When it finishes, a message is displayed that indicates that it is now
safe to power down the server.
8. Press the power button on Host 1 to power down that Netezza host. The
button is located in the front of the cabinet.



Chapter 6. About self-encrypting drives
The IBM PureData System for Analytics N3001 and N3001-001 appliances use
self-encrypting drives (SEDs) for improved security and protection of the data
stored on the appliance.

Self-encrypting drives (SEDs) encrypt data as it is written to the disk. Each disk
has a disk encryption key (DEK) that is set at the factory and stored on the disk.
The disk uses the DEK to encrypt data as it writes, and then to decrypt the data as
it is read from disk. The operation of the disk, and its encryption and decryption,
is transparent to the users who are reading and writing data. This default
encryption and decryption mode is referred to as secure erase mode. In secure erase
mode, you do not need an authentication key or password to decrypt and read
data. SEDs offer improved capabilities for an easy and speedy secure erase for
situations when disks must be repurposed or returned for support or warranty
reasons.

For the optimal security of the data stored on the disks, SEDs have a mode
referred to as auto-lock mode. In auto-lock mode, the disk uses an authentication
encryption key (AEK) to protect its DEK. When a disk is powered off, the disks are
automatically locked. When the disk is powered on, the SED requires a valid AEK
to read the DEK and unlock the disk to proceed with read and write operations. If
the SED does not receive a valid authentication key, the data on the disk cannot be
read. The auto-lock mode helps to protect the data when disks are accidentally or
intentionally removed from the system.

In many environments, the secure erase mode may be sufficient for normal
operations and provides you with easy access to commands that can quickly and
securely erase the contents of the disk before a maintenance or repurposing task.
For environments where protection against data theft is paramount, the auto-lock
mode adds an extra layer of access protection for the data stored on your disks.

SEDs are currently available on the following appliances:


v IBM PureData System for Analytics N3001
v IBM PureData System for Analytics N3001-001

Locking the SEDs


The IBM NPS software provides commands to configure the SEDs on the IBM
PureData System for Analytics N3001 models to use auto-lock mode.

By default, the SEDs on the IBM PureData System for Analytics N3001 appliances
operate in secure erase mode. The IBM installation team can configure the disks to
run in auto-lock mode by creating a keystore and defining an authentication key
for your host and storage disks when the system is installed in your data center. If
you choose not to auto-lock the disks during system installation, you can lock
them later. Contact IBM Support to enable the auto-lock mode. The process to
auto-lock the disks requires a short NPS service downtime window.



CAUTION:
Do not attempt to auto-lock the disks on your own. You must work with IBM
Support to ensure that the disks are auto-locked correctly and completely with a
conforming authentication key. If the process is not performed correctly, or if
the authentication key used to auto-lock the disks is incorrect or lost, your
system could be unable to start and you could lose the data on your disks.

While it is recommended that you configure your SEDs to operate in auto-lock
mode, make sure that this is appropriate for your environment. After the drives are
configured for auto-lock mode, you cannot easily disable or undo the auto-lock
mode for SEDs.

The NPS system requires an AEK for the host drives and an AEK for the drives in
the storage arrays that are managed by the SPUs. You have two options for storing
the keys. The AEKs can be stored in a password protected keystore repository on
the NPS host, or if you have implemented an IBM Security Key Lifecycle Manager
(ISKLM) server, you can store the AEKs in your ISKLM server for use with the
appliance. The commands to create the keys are the same for locally or ISKLM
stored systems.

For locally stored keys, the key repository is stored in the /nz/var/keystore
directory on the NPS host. The repository is locked and protected.

For ISKLM configurations, there is no local keystore on the NPS hosts. The ISKLM
support requires some additional configuration for your NPS hosts to become a
client of the ISKLM server. The configuration steps are described in the section
“IBM Security Key Lifecycle Manager configuration steps” on page 6-4.

You should use the nzkeybackup command to create a backup copy of the AEKs
after you change the keys. If the keystore on the NPS host or the ISKLM server is
lost, the disks cannot be read. Make sure that you carefully protect the keystore
backups for the appliance in a secure area, typically in a location that is not on the
NPS hosts.

Note: When auto-lock mode is enabled, and a disk is failed over either
automatically or manually using the nzhw failover -id <diskHwId> command, the
system automatically securely erases the disk contents. Contact IBM Support for
assistance with the process to securely erase one or more disks on the system. If a
disk is physically removed from the system before it is failed over, the system
detects the missing drive and fails over to an available spare disk, but the removed
disk is not securely erased because it is no longer in the system. In auto-lock
mode, the disk is locked when it is powered down, so the contents are not
readable.
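
For example, the following command manually fails over a disk so that its contents
are securely erased as described above; the hardware ID 1034 is a hypothetical
value, and you should confirm the correct disk ID and procedure with IBM Support
before you fail over a disk:
[nz@nzhost ~]$ nzhw failover -id 1034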

About the IBM Security Key Lifecycle Manager support

Starting in NPS release 7.2.1, you can configure your IBM PureData System for
Analytics N3001 models to send the AEKs to an IBM Security Key Lifecycle
Manager (ISKLM) server in your environment. The NPS support requires ISKLM
version 2.5.0.5 or later.

In this configuration, the ISKLM server only stores and sends the AEKs that are
manually generated on the NPS host. The ISKLM server cannot be used to
automatically create and rotate the AEKs on a scheduled basis. You must have an
ISKLM server already set up and running in your environment. You need
assistance from the ISKLM administrator to add the NPS host as a client of the
ISKLM server and to obtain information such as the device group name, device
serial number, certificates, and the ISKLM server and port information. The NPS
documentation does not provide instructions for setting up and activating ISKLM
in your environment.

Important: If you configure your N3001 system to use ISKLM as the key
repository, note that you cannot downgrade from NPS release 7.2.1 to an earlier 7.2
release unless you convert from ISKLM to a local keystore for your SEDs. The IBM
Netezza Software Upgrade Guide has instructions for disabling ISKLM support and
returning to a local keystore before downgrading.

Unlocking the SEDs


After your SEDs are operating in auto-lock mode, it is possible to unlock them and
restore them to the default secure erase mode, but the process requires significant
assistance from IBM Support.

Typically, after you configure SEDs to use auto-lock mode, you would never
change them back to the default secure erase mode. If for some reason you must
reconfigure the SEDs, it is possible to do so, but this process is very complex and
requires a lengthy service window and possible service charges. There is also a risk
of data loss, especially if your backups for the system are stale or incomplete. Make
sure that reconfiguring your SEDs to secure erase mode is appropriate for your
environment.

CAUTION:
The process to reconfigure SEDs to secure erase mode from the auto-lock mode
is not a process that you can run on your own. You must work with IBM
Support to reset the system correctly.

There are two options for reconfiguring your host SEDs to secure erase mode:
v The first option is to have IBM Support replace your host drives with a set of
new drives that are custom-built with the correct releases of software for your
system. The host motherboards/planars must also be replaced (or the host disks
securely erased) to clear the RAID controller NVRAM that holds the AEK.
Reconfiguring the host SEDs requires charges for the replacement disks and
planars, and approximately a day of system downtime to replace the disks and
restore your NPS host backups and metadata.
v The second option is to completely reinitialize your system to a factory default
level, then reload all your data from the most recent full backup. This option
could require a service window of several days for the reinitialization and
complete reload.

To change the storage array SEDs from auto-lock mode to standard secure erase
mode, there is an IBM Support process to disable the authentication key. This
process requires you to securely erase the storage drives and reload the full
database backups from your most recent NPS backup. If it is an option, such as for
a non-production test system, a full system reinitialization would also reset the
drives from auto-lock mode. You would then need to restore your NPS data from
your backups, or start creating new data from new load sources.

SED keystore
The keystore holds the AEKs for unlocking the host and SPU drives that are
configured to run in auto-lock mode.



If you chose to auto-lock the drives at installation time, the IBM installation team
typically created the keystore during the appliance installation process. The
nzkeydb command creates and manages the keystores. If you use a local keystore,
the keystore is stored in the /nz/var/keystore directory. The keystore has a
password to help protect it from users who are not allowed to see the contents or
manage keys.

Important: If you use the IBM Security Key Lifecycle Manager (ISKLM) to store
and retrieve the AEKs for your NPS appliance, you can lock the drives using a
local keystore and then migrate to ISKLM management of the keys, or you can
configure the system to use ISKLM to create the keys and lock the drives. See the
“IBM Security Key Lifecycle Manager configuration steps” section for the
instructions to configure ISKLM support. After you configure ISKLM, the keys are
sent to the ISKLM server for storage and are not stored locally on the system.

If you lose the keystore, either because the local keystore is corrupted or deleted,
or because connectivity to the ISKLM server is lost, you lose the ability to unlock
your SED drives when they power on. As a best practice, make sure that you have
a recent backup of the current keys. You use the nzkeybackup command to create a
compressed tar file backup of the current keystore. You should always back up the
keystore after any key changes. Make sure that you save the keystore backups in a
safe location away from the NPS appliance.

Note: The nzhostbackup command also captures the local keystore in the host backup, but
nzkeybackup is better because it does not require you to pause the NPS system and
stop query activity, and nzkeybackup -sklm can capture the keys that are stored in
an ISKLM server.

You can use the nzkeyrestore command to restore a keystore from a keystore
backup file.

Refer to the following command descriptions for more information:


v “The nzkeydb command” on page A-37
v “The nzkeybackup command” on page A-39
v “The nzkeyrestore command” on page A-40

IBM Security Key Lifecycle Manager configuration steps


If you want to store and retrieve your SED AEKs from an IBM Security Key
Lifecycle Manager (ISKLM) server in your environment, you must configure the
NPS appliance as a client.

The following list summarizes the steps needed for the ISKLM server setup. It is
important to work with your IBM Security Key Lifecycle Manager (ISKLM) system
administrator to configure the ISKLM server to communicate with the NPS
appliance.

ISKLM server setup

The ISKLM administrator must complete the following tasks:


v Create a device group for a Generic device type. The group is identified by its
group name.



v Create a device under the new group to represent the NPS appliance. The device
is identified by a unique serial number that is assigned by the ISKLM
administrator. (This serial number does not have to be the serial number of the
NPS appliance.)
v Set the new device to allow KMIP delete action. (This allows the ISKLM server
to delete old AEKs after the NPS administrator changes them on the NPS host.)
v Export the CA certificate that the client (the NPS appliance) can use to
authenticate the server certificate. The certificate must be in PEM (.pem) format.

NPS appliance setup

After the ISKLM administrator has added the NPS appliance to the ISKLM server,
make sure that you have the following information:
v The CA certificate and the client certificate in .pem format from the ISKLM
server
v The device group name created on the ISKLM server
v The device serial number created on the ISKLM server
v The ISKLM IP address and KMIP port value

To configure the ISKLM information on the NPS appliance, the NPS administrator
must do the following steps:
1. Log in to the active NPS host as the root user.
2. Save a copy of the CA certificate and client certificate files (must be in .pem
format) in the /nz/data/security directory.
3. Log in to the active NPS host as the nz user.
4. Using any text editor, edit the /nz/data/config/system.cfg file (or create the
file if it does not exist).
5. Define the following settings in the system.cfg file:
startup.kmipDevGrpSrNum = Device_serial_number
startup.kmipDevGrp = Device_group_name
startup.kmipClientCert = /nz/data/security/client.pem
startup.kmipClientKey = /nz/data/security/privkey.pem
startup.kmipCaCert = /nz/data/security/ca.pem
startup.keyMgmtServer = tls://ISKLM_IP_ADDRESS:KMIP_PORT
startup.keyMgmtProtocol = local

The keyMgmtProtocol = local setting indicates that the system uses a locally
managed keystore and keys. Keep the local setting until you verify that the
connections to the ISKLM server are correctly configured and working. After
that verification, and after uploading the AEKs to the ISKLM server, you can
change the setting to use the ISKLM keystore.
6. Save the system.cfg file.
7. Log out of the nz account and return to the root account.

Testing the ISKLM server connection

As root, use the nzkmip test command on the NPS host to test ISKLM
connectivity. This command requires you to specify a label and key (either directly
or in a file) to test the ISKLM server operations:
[root@nzhost ~]# /nz/kit/bin/adm/nzkmip test -label spuaek
-file /tmp/new_spukey.pem
Connecting to SKLM server at tls://1.2.3.4:5696
Success: Connection to SKLM store succeeded



Preparing to switch from the local to the ISKLM keystore

After you confirm that the ISKLM connection is working, follow these steps to
prepare for switching over to the ISKLM server.
1. As root, run the following command to populate the keys from the local
keystore to the ISKLM keystore:
[root@nzhost ~]# /nz/kit/bin/adm/nzkmip populate
2. To confirm that the keys were populated correctly, query the _t_kmip_mapping
table:
SYSTEM.ADMIN(ADMIN)=> select * from _t_kmip_mapping;
DISKLABEL | UID
-------------+-----------------------------------------
spuaek | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898211
spuaekOld | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898312
hostkey1 | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898432
hostkey1Old | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898541
hostkey2 | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898865
hostkey2Old | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898901
(6 rows)
3. For each UUID listed in the table, run the following command to display the
value of the key:
[root@nzhost ~]# /nz/kit/bin/adm/nzkmip get
-uuid KEY-56e36030-3a9c-4313-8ce6-4c6d5d898211
Key Value : t7Nº×nq¦CÃ<"*"ºìýGse»¤;|%
4. Create a backup of the local keystore with nzkeybackup. As a best practice, save
the backup to a secure location away from the NPS host.

Switch from local keystore to ISKLM

After you have completed and tested the ISKLM connection, and you have created
a local keystore backup file, follow these steps to switch to the ISKLM server:
1. Log in to the NPS host as the nz user.
2. Stop the system using the nzstop command.
3. Rename the local GSKit keystore files keydb.p12 and keydb.sth so that they are
no longer used (see the example after this procedure).
4. Log in as root and edit the /nz/data/config/system.cfg file.
5. Change the setting for the keyMgmtProtocol to kmipv1.1 to switch to the
ISKLM server support:
startup.keyMgmtProtocol = kmipv1.1
6. Save and close the system.cfg file.
7. Log out of the root account to return to the nz account.
8. Start the system using the nzstart command. After the system starts, AEKs that
you create with the nzkey command are stored in and retrieved from the
ISKLM server.
9. Remove the renamed GSKit keystore files keydb.p12 and keydb.sth.
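
The following commands are a minimal sketch of the rename in step 3 and the
removal in step 9, assuming that the local GSKit keystore files are in the
/nz/var/keystore directory and that a .bak suffix is used for the renamed copies;
adjust the paths and file names for your system:
[nz@nzhost1 ~]$ cd /nz/var/keystore
[nz@nzhost1 keystore]$ mv keydb.p12 keydb.p12.bak
[nz@nzhost1 keystore]$ mv keydb.sth keydb.sth.bak
Later, in step 9, remove the renamed files:
[nz@nzhost1 keystore]$ rm keydb.p12.bak keydb.sth.bak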

Disable ISKLM AEK storage

If you need to change the NPS host to disable ISKLM support and return to a local
GSKit keystore for managing the keys, follow these steps:
1. Log in as root to the NPS host.
2. Dump the keys from ISKLM server to a local GSKit keystore:
[root@nzhost ~]# /nz/kit/bin/adm/nzkey dump
DB creation successful



Switch from ISKLM to the local keystore

After you have dumped the AEKs from the ISKLM server, follow these steps to
switch to a local keystore for the AEKs:
1. Log in to the NPS host as the nz user.
2. Stop the system using the nzstop command.
3. Log in as root and edit the /nz/data/config/system.cfg file.
4. Change the setting for the keyMgmtProtocol to local to switch to the local
GSKit keystore support:
startup.keyMgmtProtocol = local
5. Save and close the system.cfg file.
6. Run the following command to verify that the keys were dumped correctly:
[root@nzhost ~]# /nz/kit/bin/adm/nzkey list
7. Log out of the root account to return to the nz account.
8. Start the system using the nzstart command.
9. After the system starts, use the nzsql command to connect to the SYSTEM
database and delete entries from the _t_kmip_mapping table because the
system is now using a local GSKit keystore.
SYSTEM.ADMIN(ADMIN)=> truncate table _t_kmip_mapping;
TRUNCATE TABLE

After the system starts, AEKs that you create with the nzkey command are stored
and retrieved from the local keystore.

SED authentication keys


The authentication keys you use to lock the SEDs have the following requirements
and behaviors.

You can create and apply an authentication key to auto-lock the host drives and
the drives in the storage arrays. An authentication key must be 32 bytes. The keys
are managed using the IBM GSKit software. No other key management software or
server is required.

CAUTION:
Always protect and back up the authentication keys that you create and apply to
the disks. If you lose the keys, the disks cannot be unlocked when they are
powered on. You will be unable to read data from the disks, and you could
prevent the NPS system from starting.

You can create a conforming key for the host and SPU AEKs manually, but as a
best practice, you should use the nzkey generate command to automatically create a
random, conformant AEK for the host or SPU drives and store it in your local
keystore or in the IBM Security Key Lifecycle Manager if you have configured that
support for your appliance.

Each of the hosts in the appliance uses an AEK to auto-lock the SEDs. The keys are
referred to as hostkey1 and hostkey2. The host RAID controllers have specific
requirements for the host authentication keys:
v The key value must be 32 bytes in length.
v The key is case-sensitive.
v The key must contain at least one number, one lowercase letter, one uppercase
letter, and one non-alphanumeric character (for example, < > @ +). You cannot
specify a blank space, single quotation character, double quotation character,
exclamation point, or equals sign in the key value.
v The key can use only the printable characters in the range ASCII 0x21 to 0x7E.
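
For example, the following is a hypothetical 32-character value that satisfies these
host key rules (it contains numbers, lowercase and uppercase letters, and the
non-alphanumeric character #); do not reuse this sample value on a production
system:
Nz7#aB9kQw2rTx5pLm8cVd4hJf6sYe3u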

The SEDs in the storage arrays use the SPU AEK to auto-lock the drives. The
storage array SPU keys must meet the following requirements:
v The key value must be 32 bytes in length.
v The key can use characters in the range ASCII from 0x00 to 0xFF.

Generate authentication keys


You use the nzkey generate command to create a conformant host or SPU key for
the SED drives.

Before you begin

If you want to change the host or SPU key that is used to lock your SEDs, you can
create a key manually, or you can use the nzkey generate command to create a
conforming key. Run separate commands to create the host key and the SPU key.

Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to create a host key:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey generate -hostkey
-file /export/home/nz/hostkey.txt
Host key written to file
3. Use the following command to create a SPU key:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey generate -spukey
-file /export/home/nz/spukey.txt
SPU key written to file

Results

The command saves the key in the specified file in plaintext. You can then
specify the host or SPU key file as part of an nzkey change operation.

Important: The key files are in plain text and unencrypted. After you use the files
to change the key for the hosts or SPUs, make sure that you delete the generated
key files to protect the keys from being read by users who log in to the NPS
system.
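
For example, assuming the file names that are used in the preceding steps, you can
remove the generated key files after the key change is complete:
[root@nzhost1 nz]# rm /export/home/nz/hostkey.txt /export/home/nz/spukey.txt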

List authentication keys


You use the nzkey list command to list the key labels that are in the keystore.

Before you begin

You can use the nzkey list command to display information about the keys that
are currently defined in the keystore without displaying the key text.

Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to list the key labels:



[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey list
hostkey1
hostkey1Old
hostkey2
hostkey2Old
spuaek
spuaekOld
The sample output shows all the possible keys, although the output could
include a subset of these labels.
v hostkey1 is the label for the current AEK for host1. It is in the keystore if the
host1 drives have been auto-locked.
v hostkey1Old is the label for the previous AEK for host1 when a key change
is in progress. It is in the keystore if you changed the AEK for the host1
drives.
v hostkey2 is the label for the current AEK for host2. It is in the keystore if the
host2 drives have been auto-locked.
v hostkey2Old is the label for the previous AEK for host2 when a key change
is in progress. It is in the keystore if you changed the AEK for the host2
drives.
v spuaek is the label for the current AEK for the SPU. It is in the keystore if
the SPU drives have been auto-locked.
v spuaekOld is the label for the previous AEK for the SPU when a key change
is in progress. It is in the keystore if you changed the AEK for the SPU
drives.

Results

The command shows the labels for the keys that are currently in the keystore. If
no AEKs have been set, the command displays the message No keys found in key
store. You can use the -hostkey or -spukey option to list only the AEK labels for
the hosts or SPU.
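
For example, the following command lists only the host key labels; the sample
output assumes that both hosts are auto-locked and that no key change is in
progress:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey list -hostkey
hostkey1
hostkey2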

Check authentication keys


You use the nzkey check command to check whether the AEK feature is enabled,
whether the SEDs are auto-locked, and to display more information about the AEK
state of the SEDs.

Before you begin

You can use the nzkey check command to display information about auto-lock
state for the SEDs on the hosts and SPUs.

Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to check the AEK status. You must specify the
-spukey or the -hostkey option.
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey check {-spukey | -hostkey}
The command displays the following output.



Table 6-1. nzkey check output samples. Possible output for the nzkey check command.

Output: AEK feature is enabled
Description: The AEK key feature is enabled for the system. This indicates that the
IBM installers or IBM Support has enabled the auto-lock feature on the system.

Output: AEK feature is not enabled
Description: The AEK key feature is not enabled for the system. This indicates that
the IBM installers or IBM Support has not enabled the auto-lock feature on the
system.

Output:
Host AEK status:
Verification of host ha1 AEK key: SUCCESS
Verification of host ha2 AEK key: SUCCESS
Description: When you specify the -hostkey operation and AEK is enabled, this
message indicates that the host SEDs are auto-locked. If the report shows a failure,
make sure that you contact IBM Support promptly to investigate. Failures indicate
problems that must be resolved to avoid downtime and potential data loss.

Output:
Host AEK status:
Host ha1 AEK key not set
Host ha2 AEK key not set
Description: When you specify the -hostkey operation and AEK is enabled, this
message indicates that the host SEDs are not auto-locked.

Output:
AEK key operations are not in progress
SPU AEK status:
Unused = 2
Unset = 0
Old = 0
New = 286
Error = 0
Error(Old) = 0
Fatal = 0
Repair = 0
------------------
Total disks = 288
Description: When you specify the -spukey operation and AEK is enabled, the
command displays information about whether AEK operations are in progress, and
the status of the SPU AEKs applied to the storage array disks. If the SPU key
change is in progress, you can also use the -progress option to display a progress
percentage until the update is complete. You can use this command to identify
whether some or all of the disks are using the new AEK, the old/former AEK, or
whether there are errors or issues to investigate. The status values indicate the
following conditions:
v Unused indicates that the drives are out of service and cannot be checked. (They
may be in the inactive or failed role.)
v Unset indicates that the key for a disk in the Assigned, Assigning, Active, and
Spare states is not set.
v Old indicates that the drives use the previous/former AEK.
v New indicates that the drives use the new AEK.
v Error indicates that the key check failed due to an unexpected error, such as a
disk I/O error.
v Error(Old) indicates that the drives could authenticate with the old key, but not
the new one.
v Fatal indicates that the disk could not authenticate with either the new key or
the old key.
v Repair indicates that the key operation has been deferred until a regen is
complete on the disk's RAID partner.
This sample output shows that 286 of 288 drives in an N3001-010 system are using
the new SPU AEK for auto-locking. Two drives (Unused) are out of service and
could not be checked.

Output: SPU AEK key not set
Description: When you specify the -spukey operation and AEK is enabled, this
message indicates that the storage array SEDs are not auto-locked.



Results

The command provides information about whether the AEK feature is enabled or
disabled, and whether keys have been applied to auto-lock the SEDs in the hosts
and storage arrays. The command also provides information to alert you when
there may be issues with the drives that need further investigation and possible
troubleshooting from IBM Support.
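
For example, while a SPU key change is in progress, the following invocation is a
hedged sketch of adding the -progress option that is described in the table to
display a completion percentage:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey check -spukey -progress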

Extract authentication keys


You use the nzkey extract command to extract the AEK defined in the keystore
and save the plaintext key for a specific key label to a file.

Before you begin

You can use the nzkey list command to list the available key labels defined in the
keystore. You can extract only one key to a file. If the file exists, the command
displays an error.

Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to extract the key for a specified label. For
example:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey extract -label hostkey1
-file /nz/var/hostkey1.txt
Key written to file

Results

The command creates a file with the extracted AEK. This file can be helpful in
cases where you need the current key to reapply a key to SEDs for
troubleshooting, or if you want to preserve the key in a third-party key tracking
system. As a best practice, make sure that the output file is safe from unauthorized
access. Consider deleting the file or moving it to a secure location to protect the
key.

Change the host authentication key


You use the nzkey change -hostkey command to change the AEK for the hosts on
the appliance.

Before you begin

Before you begin, make sure that you have your new AEK for the hosts. You
should use the nzkey generate command to generate a new AEK for the host key.

To change the host AEK, the NPS system must be in the Stopped state. The new
AEK takes effect on both hosts when the nzkey command finishes running
successfully. The command creates a backup copy of the current keystore before it
changes the key. After the change is finished, you should create a backup of the
new keystore using the nzkeybackup command.

Procedure
1. Log in to the active host of the NPS system as the nz user.
2. Transition the system to the Stopped state, for example:



[nz@nzhost1 ~]$ nzsystem stop
3. Run the nzstate command to confirm that the system is in the Stopped state.
4. Log in as the root user:
[nz@nzhost1 ~]$ su - root
5. Use the nzkey change command to change the host key:
[root@nzhost-h1 ~] /nz/kit/bin/adm/nzkey change -hostkey
-file /usr/key/hostkey_change -backupdir /nz/var/backups/
# Keystore archive /nz/var/backups/keydb_20140711054140.tar.gz written
==========================================================
AEK Summary
==========================================================

Result: Key operation completed successfully.


6. Create a backup of the updated keystore:
[root@nzhost-h1 ~] /nz/kit/bin/adm/nzkeybackup /nz/var/keybackup.tar.gz
Keystore archive /nz/var/keybackup.tar.gz written
7. Log out of the root account and return to the nz account.
8. Run the nzstart command to return the system to the Online state.

What to do next

After the key is successfully applied, move the /nz/var/keybackup.tar.gz file to a
secure location away from the appliance so that you have a backup available for
disaster recovery of the keystore. You should also delete the host key file
(/usr/key/hostkey_change in this example) as a security precaution, so that your
authentication key cannot be found outside the protected keystore.

For more information about the command:


v “The nzkey command” on page A-34

Resume host AEK key change


You use the nzkey resume command to resume an interrupted host AEK create or
change operation.

Before you begin

You would typically use the nzkey resume command to resume a host AEK change
operation that was interrupted and did not complete. This command can also be
used to resume a host AEK create operation, but typically the IBM installers or
IBM support perform the tasks to create and enable the AEKs to auto-lock drives.
To resume the host AEK operation, you must have the backup file pathname for
the interrupted operation.

Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to resume a host AEK change operation. For
example:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey resume
-backupDir /nz/var/hostbup_01

Results

The command resumes the host key operation. If the command displays an error,
contact IBM Support for assistance.



Change the SPU authentication key
You use the nzkey change -spukey command to change the AEK for the storage
array SEDs.

Before you begin

Before you begin, make sure that you have your new AEK for the SPU. You should
use the nzkey generate command to generate a new AEK for the SPU key.

If you are changing the SPU key for the storage array drives, the system must be in
the Paused or Offline state because the system manager must be running to
propagate the new key, but no queries or I/O activity should be active. The new
AEK is immediately communicated from the system manager to the SPUs. Note
that if you attempt to transition the system to the Online state, the state transition
waits until all the SPUs and disks are updated with the new AEK. The command
creates a backup copy of the current keystore before it changes the key. After the
change is finished, you should create a backup of the new keystore using the
nzkeybackup command.

Procedure
1. Log in to the active host of the NPS system as the nz user.
2. Transition the system to the Paused or Offline state, for example:
[nz@nzhost1 ~]$ nzsystem pause
Are you sure you want to pause the system (y|n)? [n] y
3. Log in as the root user:
[nz@nzhost1 ~]$ su - root
4. Use the nzkey change command to change the SPU key:
[root@nzhost-h1 ~] /nz/kit/bin/adm/nzkey change -spukey
-file /tmp/spukey_change -backupdir /tmp/backups/
# Keystore archive /tmp/backups/keydb_20140711054140.tar.gz written
==========================================================
AEK Summary
==========================================================

Result: Key operation completed successfully.


-> You can run ’nzsystem resume’ to resume the system state.
5. Create a backup of the updated keystore:
[root@nzhost-h1 ~] /nz/kit/bin/adm/nzkeybackup /nz/var/keybackup.tar.gz
Keystore archive /nz/var/keybackup.tar.gz written
6. Log out of the root account and return to the nz account.
7. Run the nzsystem resume command to return the system to the Online state.

What to do next

After the key is successfully applied, move the /nz/var/keybackup.tar.gz file to a
safe location away from the appliance so that you have a backup available for
disaster recovery of the keystore. You should also delete or move the SPU key file
(/tmp/spukey_change in this example) as a precaution.

For more information about the command:


v “The nzkey command” on page A-34



AekSecurityEvent
The NPS system has a default event type AekSecurityEvent that can monitor and
report issues with the SED drives.

The AekSecurityEvent monitors the SED drives and sends an email to the
configured event contacts when any of the following conditions occur:
v The system has transitioned to the Down state because of a SPU AEK operation
failure.
v A SPU AEK operation has occurred, such as successful completion of key create
or change for the SPU key.
v A labelError has been detected on a disk for the SPU key. A labelError typically
occurs when the new SPU key is not applied to a disk and the disk still uses the
old/former key to authenticate.
v A fatal error is detected on a disk for the SPU key. A fatal error occurs when
neither the current SPU key nor the previous SPU key can be used to
authenticate the drive.
v A key repair state is detected on a disk during a SPU key create or change. A
key repair state issue occurs when the key operation is deferred on a SED
because of a key fatal error on the drive's RAID partner disk.
v The system manager has started a key repair operation. This usually occurs just
before applying the key on the deferred disk after the regen on the disk has
finished.

To create and enable an event rule for the AekSecurityEvent, you use the nzevent
command to add an event rule as in the following example. Make sure that you
run the command on the active host.
[nz@nzhost1 ~]$ nzevent copy -useTemplate
-name AekSecurityEvent -newName SedAekEvent -eventType AekSecurityEvent
-on 1 -dst [email protected]

Chapter 7. Manage the Netezza server
This section describes how to manage the IBM Netezza server and processes. The
Netezza software that runs on the appliance can be stopped and started for
maintenance tasks, so this section describes the meaning and impact of system
states.

This section also describes log files and where to find operational and error
messages for troubleshooting activities. Although the system is configured for
typical use in most customer environments, you can also tailor software operations
to meet the special needs of your environment and users by using configuration
settings.

Software revision levels


The software revision level is the release or version of the IBM Netezza software
that is running on your Netezza appliance.

The revision level typically includes a major version number, a release number, a
maintenance release number, and a fix pack number. Some releases also include a
patch designation such as P1 or P2.

Display the Netezza software revision


You can use the nzrev command to display the current IBM Netezza software
revision. If you enter the nzrev command with no arguments, Netezza returns the
release number string and the build number. For example:
nzrev
Release 7.1.0.0 [Build 34879]

When you enter the nzrev -rev command, Netezza returns the entire revision
number string, including all fields (such as variant and patch level, which in this
example are both zero).
nzrev -rev
7.1.0.0-P0-F1-Bld34879

From a client system, you can use the following command to display the revision
information:
nzsystem showRev -host host -u user -pw password
Related reference:
“The nzrev command” on page A-47
Use the nzrev command to display the IBM Netezza software revision level.

Display the software revision levels


You can use the nzcontents command to display the revision and build number of
all the executable files on the host. This command takes several seconds to run and
results in multiple lines of output.

Note: Programs with no revisions are scripts or special binary files.



When you enter the nzcontents command, IBM Netezza displays the program
names, the revision stamps, the build stamps, and checksums. The following sample
output shows a small set of output, and the checksum values are truncated to fit
the output messages on the page.
Program Revision Stamp Build Stamp CheckSum
-------------- ------------------------ ----------------------------- --------------
adm Directory
nzbackup 7.1.0.0-P0-F1-Bld34879 2014-01-08.34879...24438 3d5da...
nzcontents ab685...
nzconvert 7.1.0.0-P0-F1-Bld34879 2014-01-08.34879...24438 3a52...
nzds 7.1.0.0-P0-F1-Bld34879 2014-01-08.34879...24438 d3f2...

The following table describes the components of the Revision Stamp fields.
Table 7-1. Netezza software revision numbering
v Version (Numeric): Incremented for major releases.
v Release (Numeric): Incremented for minor releases.
v Maintenance (Numeric): Incremented for maintenance releases.
v Fixpack (Numeric): Incremented for quarterly update releases.
v -Pn (Alphanumeric): Incremented for a monthly patch release after a fixpack
release. Commonly P1 and P2.
v -Fn (Alphanumeric): Used in rare cases of a hot fix release.
v -Bldn (Alphanumeric): Incremented serially for a production build number.

System states
The IBM Netezza system state is the current operational state of the appliance.

In most cases, the system is online and operating normally. There might be times
when you must stop the system for maintenance tasks or as part of a larger
procedure.

You can manage the Netezza system state by using the nzstate command. It can
display and wait for a specific state to occur.
Related reference:
“The nzstate command” on page A-58
Use the nzstate command to display the current system state or to wait for a
particular system state to occur.

Display the current system state


You can use the nzstate command to display the current system state.
[nz@nzhost ~]$ nzstate
System state is ’Online’.

The following table lists the common system states and how they are invoked and
exited.



Table 7-2. Common system states

Online
Description: Select this state to make the IBM Netezza fully operational. This state
is the most common system state. In this state, the system is ready to process or is
processing user queries.
Invoked: The system enters this state when you use the nzsystem restart or resume
command, or after you boot the system.
Exited: The system exits the online state when you use the nzsystem stop, offline,
pause, or restart commands.
Tip: You can also use the nzsystem restart command to quickly stop and start all
server software. You can use the nzsystem restart command only on a running
Netezza that is in a non-stopped state.

Offline
Description: Select this state to interrupt the Netezza. In this state, the system
completes any running queries, but displays errors for any queued and new
queries.
Invoked: The system enters this state when you use the nzsystem offline command.
Exited: The system exits this state when you use the nzsystem resume or stop
command.

Paused
Description: Select this state when you expect a brief interruption of server
availability. In this state, the system completes any running queries, but prevents
queued or new queries from starting. Except for the delay while in the paused
state, users do not notice any interruption in service.
Invoked: The system enters the paused state when you use the nzsystem pause
command.
Exited: The system exits the paused state when you use the nzsystem resume or
stop command, or if there is a hardware failure on an active SPU.

Down
Description: The system enters the down state if there is insufficient hardware for
the system to function even in failover mode. For more information about the
cause of the Down state, use the nzstate -reason command.
Invoked: Not user invoked.
Exited: You must repair the system hardware and then use the nzsystem resume
command.

Stopped
Description: Select this state for planned tasks such as installation of new software.
In this state, the system waits for currently running queries to complete, prevents
queued or new queries from starting, and then shuts down all Netezza software.
Invoked: The system enters the stopped state when you use the nzsystem stop or
the nzstop command. If you use the nzstop command, the system stops all running
queries.
Exited: The system exits the stopped state when you use the nzstart command.
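
For example, if the system reports the Down state, the nzstate -reason command
that is mentioned in the Down description displays more detail about the cause;
the output varies with the failure:
[nz@nzhost ~]$ nzstate -reason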

System states reference


When the IBM Netezza software is running, the system and SPUs can transition
through the following operational states. The states that end in the letters ing (such
as Pausing, Pausing Now, Discovering) are typically transitional states that are
short in duration. The other states such as those described in Table 7-2 on page 7-3
are usually the longer duration states; the system usually remains in those states
until operator action forces a state change. The following table describes all of the
system states.
Table 7-3. System states reference
State Description
Down The system is not configured (there is no configuration information
for the data slices to SPU topology) or there is not enough working
hardware to operate the system even in failover.

The SPUs can never be in this state.


Discovered The SPUs and other components are discovered, but the system is
waiting for all components to complete start-up before transitioning
to the initializing state.
Discovering The System Manager is in the process of discovering all the system
components that it manages.
Going Offline The system is in an interim state that is going to offline.
Going Offline (Now) The system is in an interim state that is going to offline now.
Going Pre-Online The system is in an interim state, going to pre-online.
Going to Maintain
Initialized The system uses this state during the initial startup sequence.
Initializing The system is initializing. You cannot execute queries or
transactions in this state.
Maintain
Missing The System Manager detected a new, unknown SPU in a slot that
was previously occupied but not deleted.
Offline (Now) This state is similar to offline, except that the system stops user jobs
immediately during the transition to offline.

For more information, see Table 5-4 on page 5-10.


Online The system is running normally. It can service requests.

Paused The system is paused. You cannot run user jobs.
Paused (Now) This state is similar to paused, except that the system stops user
jobs immediately when it transitions to paused.

For more information, see Table 5-4 on page 5-10.


Pausing The system is transitioning from online to paused. In this state, no
new queries or transactions are queued, although the system allows
current transactions to complete unless you used the nzsystem
pause -now command.
Pausing Now The system is attempting to pause because of a hardware failure, or
the administrator entered the nzsystem pause -now command.
Pre-Online The system has completed initialization. The system goes to the
resume state.
Resuming The system is waiting for all its components (SPUs and host
processes) to reach the online state before it changes the system
state to online.
Stopped The system is not running. Commands assume this state when they
attempt to connect to a system and get no response.

The SPUs can never be in this state.


Stopped (Now) This state is similar to stopped, except that the system stops user
jobs immediately when it makes the transition to stopped.
Stopping The system is transitioning from online to stopped.
Stopping Now The system is attempting to stop, or the administrator entered the
nzsystem stop -now command.
Unreachable The System Manager cannot communicate with the SPU because it
has failed or it was physically removed from the system.

Wait for a system state


You use the nzstate command to wait for a specific operational state to occur
before you proceed with other commands or actions. You can use the nzstate
command to list the system states that you can wait for, as follows:
[nz@nzhost ~]$ nzstate listStates

State Symbol Description


------------ ------------------------------------------------------------
initialized used by a system component when first starting
paused already running queries will complete but new ones are queued
pausedNow like paused, except running queries are aborted
offline no queries are queued, only maintenance is allowed
offlineNow like offline, except user jobs are stopped immediately
online system is running normally
stopped system software is not running
down system was not able to initialize successfully
v To wait for the online state or else timeout after 10 seconds, enter:
nzstate waitfor -u admin -pw password -host nzhost -type online
-timeout 10
v To test scripts or do maintenance, enter:
nzsystem pause -force
nzstate waitfor -u admin -pw password -host nzhost -type paused
-timeout 300



Do some maintenance, and then resume the system:
nzsystem resume
nzstate waitfor -u admin -pw password -host nzhost -type online
-timeout 120

Run a query.

Manage the system state


You use the nzstart command to start the IBM Netezza system operations. You use
the nzstop command to stop the Netezza system operations. The nzsystem command
provides more state change options, such as pausing the system, resuming the
system, and restarting the system.

All nzsystem subcommands, except the nzsystem showState and showRev
commands, require the Manage System administrative privilege. For more
information, see Table 11-1 on page 11-10.

Note: When you stop and start the Netezza system operations on a Netezza C1000
system, the storage groups continue to run and perform tasks such as media
checks and health checks for the disks in the array, as well as disk regenerations
for disks that fail. The RAID controllers are not affected by the Netezza system
state.

Start the system


When you start the IBM Netezza system, you bring the system and database
processes fully online so that the Netezza system is ready to run user queries and
other tasks.

You can use the nzstart command to start system operation if the system is in the
stopped state. The nzstart command is a script that initiates a system start by
setting up the environment and invoking the startup server. The nzstart command
does not complete until the system is online. The nzstart command also verifies
the host configuration to ensure that the environment is configured correctly and
completely; it displays messages to direct you to files or settings that are missing
or misconfigured.

To start the Netezza system, enter:


nzstart
(startupsvr) Info: NZ-00022: --- program ’startupsvr’ (23328)
starting on host ’nzhost’ ... ---

Restriction: You must run nzstart on the host and be logged on as the user nz.
You cannot run it remotely from Netezza client systems.

For IBM Netezza 1000 and IBM PureData System for Analytics N1001 systems, a
message is written to the sysmgr.log file if there are any storage path issues that
are detected when the system starts. The log displays a message similar to mpath
issues detected: degraded disk path(s) or SPU communication error, which
helps to identify problems within storage arrays.
Related reference:
“The nzstart command” on page A-56
Use the nzstart command to start system operation after you stop the system. The
nzstart command is a script that initiates a system start by setting up the
environment and starting the startup server.



“Hardware path down” on page 8-20

Stop the system


When you stop the IBM Netezza system, you stop the database processes and
services. The system aborts and rolls back any active queries, so user queries or
tasks such as loads, backups, and others cannot run. Typically, you stop the server
when directed to do so as part of a specific administration procedure or when you
must perform a major management task. You can use the nzstop command to stop
a running system. (You can also use the nzsystem stop command, but nzstop is the
recommended method.) Stopping a system stops all Netezza host processes.

Restriction: You must run nzstop on the host and be logged on as the user nz.
You cannot run it remotely.

To stop the system, use the nzstop command.

To stop the system or exit after waiting for 5 minutes (300 seconds), enter nzstop
-timeout 300.
Related reference:
“The nzstop command” on page A-63
Use the nzstop command to stop the IBM Netezza software operations. Stopping a
system stops all the IBM Netezza processes that were started with the nzstart
command.

Pause the system


Certain management tasks such as host backups require the system to be in the
paused state. When you pause the system, the system queues any new queries or
work until the system is resumed. By default, the system finishes the queries and
transactions that are active at the time the pause command is issued.

To transition to the paused state, enter:


[nz@nzhost ~]$ nzsystem pause
Are you sure you want to pause the system (y|n)? [n] y

Enter y to continue. The transition completes quickly on an idle system, but it can
take much longer if the system is busy processing active queries and transactions.
When the transition completes, the system enters the paused state, which you can
confirm with the nzstate command as follows:
[nz@nzhost ~]$ nzstate
System state is ’Paused’.

You can use the -now option to force a transition to the paused state, which causes
the system to abort any active queries and transactions. As a best practice, use the
nzsession show -activeTxn command to display a list of the current active
transactions before you force the system to terminate them.
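
For example, a cautious sequence is to review the active transactions and then
force the pause:
[nz@nzhost ~]$ nzsession show -activeTxn
[nz@nzhost ~]$ nzsystem pause -now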

Resume the system


When a system is paused or offline, you can resume the normal operations by
resuming the system. When you resume the system from a paused state, it starts to
process all the transactions that were submitted and queued while it was paused.
In some cases, the system also restarts certain transactions that support the restart
operations.

To resume the system and return it to the online state, enter:



[nz@nzhost ~]$ nzsystem resume

The command usually completes quickly; you can confirm that the system has
returned to the online state by using the following command:
[nz@nzhost ~]$ nzstate
System state is ’Online’.

Take the system offline


When you take the system offline, the system does not queue any new work or
transactions. The state only allows maintenance tasks to run. By default, the system
finishes the queries and transactions that are active at the time the offline
command is issued.

To make the transition to the offline state, enter:


[nz@nzhost ~]$ nzsystem offline
Are you sure you want to take the system offline (y|n)? [n] y

Enter y to continue. The transition completes quickly on an idle system, but it can
take much longer if the system is busy processing active queries and transactions.
When the transition completes, the system enters the offline state, which you can
confirm with the nzstate command as follows:
[nz@nzhost ~]$ nzstate
System state is ’Offline’.

You can use the -now option to force a transition to the offline state, which causes
the system to abort any active queries and transactions. As a best practice, use the
nzsession show -activeTxn command to display a list of the current active
transactions before you force the system to terminate them.

Restart the system


When a system is in the online state but a system problem occurs, you can restart
the system, which stops and starts all server software. You can use the nzsystem
restart command only on a running system that is in a non-stopped state.

To restart the system, enter:


[nz@docspubox ~]$ nzsystem restart
Are you sure you want to restart the system (y|n)? [n] y

Overview of the Netezza system processing


When you start the IBM Netezza system, you automatically start a number of
system processes. The following table describes the Netezza processes.
Table 7-4. Netezza processes
Process Description
bnrmgr v Handles incoming connections from the nzbackup and nzrestore
commands.
v Starts an instance of the backupsvr or restoresvr to handle each client
instance.
bootsvr v Informs TFTP client (the SPUs) of the location of their initial program
or download images on the host.
v Informs the SPUs where to upload their core file if a SPU is instructed
to create a core image for debugging purposes.

clientmgr v Handles incoming connections from nz applications.
v This is similar to the postmaster, which handles incoming connections
from nzsql, ODBC, and others.
dbosDispatch v Accepts execution plans from the Postgres, backup, and restore process
or processes.
v Dynamically generates C code to process the query, and cross-compiles
the query so that it can be run on the host.
v Broadcasts the compiled code to the SPUs for execution.
dbosEvent v Receives responses and results from the SPUs. As appropriate, it might
have the SPUs do more steps as part of the query.
v Rolls up the individual result sets (aggregated, sorted, consolidated)
and sends the final results back to the client's Postgres, backup, or
restore process.
eventmgr v Processes events and event rules. When an event occurs, such as the
system changes state, a hardware component fails or is restarted, the
eventmgr checks to see whether any action must be taken based on
the event and if so, it takes action. The action can be sending an email
message or running an external program. For more information about
event rules, see Chapter 8, “Event rules,” on page 8-1.
loadmgr v Handles incoming connections from the nzload command.
v Starts an instance of the loadsvr to handle each instance of the nzload
command.
nzvacuumcat v At boot time, the system starts the nzvacuumcat command, which in
turn invokes the internal VACUUM command on system catalogs to
remove unneeded rows from system tables and compact disk space to
enable faster system table scanning.
v During system operation, the nzvacuumcat program monitors the
amount of host disk space that is used by system tables in each
database. It checks every 60 seconds. If the system catalog disk space
for a particular database grows over a threshold amount (128 KB), the
nzvacuumcat program initiates a system table vacuum (VACUUM) on
that database.
v The VACUUM command works on system tables only after it obtains
an exclusive lock on all system catalog tables. If it is unable to lock the
system catalog tables, it quits and tries again. Only when the
VACUUM command succeeds does the nzvacuumcat program change
the size of the database.
v While the VACUUM command is working, the system prevents any
new SQL or system table activity to start. This window of time is
usually about 1 to 2 seconds, but can be longer if significant amounts
of system catalog updates or deletes have occurred since the last
VACUUM operation.
postgres v Validates the access rights (user name, password, ACL).
v Parses the SQL, and generates the optimized execution plan.
v Returns the results set to the client application when the query finishes
executing.
Two default postgres jobs are associated with the sysmgr and the
sessionmgr processes.

postmaster v Accepts connection requests from clients (nzsql, ODBC, and other
clients).
v Starts one postgres process per connection to service the client.
sessionmgr v Keeps the session table current with the state of the different sessions
that are running the system. For more information, see “Session
manager” on page 7-16.
startupsvr v Launches and then monitors all of the other processes. If any system
process dies, startupsvr follows a set of predefined rules, and either
restarts the failed process or restarts the entire system.
v Controlled by /nz/kit/sys/startup.cfg
statsmgr v Handles requests for statistics from the nzstats command. For more
information, see “Statistics server” on page 7-17.
statsSvr v Communicates with the nzstats command to obtain host-side
operational statistics.
v The nzstats command communicates with the sysmgr to obtain SPU
statistics.
sysmgr v Monitors and manages the overall state of the system.
v Periodically polls the SPUs to ensure that they are operational.
v Initiates state changes upon requests from the user or as a result of a
change in hardware status (for example, a SPU failure).

Related reference:
“System logs” on page 7-12

System states when Netezza starts


When you boot the system, the IBM Netezza software automatically starts. The
system goes through the following states:
1. Stopped
2. Discovering
3. Initializing
4. Preonlining
5. Resuming
6. Online

When you power up (or reset) the hardware, each SPU loads an image from its
flash memory and runs it. This image is then responsible for running diagnostic
tests on the SPU, registering the SPU with the host, and downloading runtime
images for the SPU CPU and the FPGA disk controller. The system downloads
these images from the host through TFTP.



System errors
During system operation different types of errors can occur. The following table
describes some of those errors.
Table 7-5. Error categories

User error
Description: An error on the part of the user, usually because of incorrect or invalid
input.
Example: Invalid user name, invalid SQL syntax.

Component failure
Description: A hardware or software system component failure.
Example: SPU failure; host process crashes.

Environment failure
Description: A request of an environment facility fails. This is often because of
resource or access problems.
Example: A file is locked; a buffer is full.

Recoverable internal error
Description: A detected internal programming error that is not severe enough to
abort the program.
Example: Unknown case value or msg type; file close fails.

Unrecoverable internal error
Description: A detected internal programming error or a corrupted internal state
that requires the program to abort.
Example: Core, memory corruption, assert fails.

The IBM Netezza system can take the following actions when an error occurs:
Display an error message
Presents an error message string to the users that describes the error. This
is the common system response whenever a user request is not fulfilled.
Try again
During intermittent or temporary failures, keep trying until the error
condition disappears. The retries are often needed when resources are
limited, congested, or locked.
Fail over
Switches to an alternate or spare component because an active component
has failed. Failover is a system-level recovery mechanism and can be
triggered by a system monitor or an error that is detected by software that
is trying to use the component.
Log the error
Adds an entry to a component log. A log entry contains a date and time, a
severity level, and an error/event description.
Send an event notification
Sends notification through email or by running a command. The decision
whether to send an event notification is based on a set of user-configurable
event rules.
Abort the program
Terminates the program because it cannot continue because of an
irreparably damaged internal state or because continuing would corrupt
user data. Software asserts that detect internal programming mistakes often
fall into this category because it is difficult to determine that it is safe to
continue.
Clean up resources
Frees or releases resources that are no longer needed. Software components
are responsible for their own resource cleanup. In many cases, resources

are freed locally as part of each specific error handler. In severe cases, a
program cleanup handler runs before the program exits and frees/releases
any resources that are still held.

System logs
All major software components that run on the host have an associated log. Log
files have the following characteristics:
v Each log consists of a set of files that are stored in a component-specific
directory. For managers, there is one log per manager. For servers, there is one
log per session, and their log files have pid identifiers, date identifiers, or both
(<pid>.<yyyy-mm-dd>).
v Each file contains one day of entries, for a default maximum of seven days.
v Each file contains entries that have a timestamp (date and time), an entry
severity type, and a message.

The system rotates log files, that is, for all the major components there are the
current log files and the archived log files.
v For all IBM Netezza components (except postgres), the system creates a new log
file at midnight if there is constant activity for that component. If, however, you
load data on Monday and then do not load again until Friday, the system creates
a new log file dated the day before the new activity, in this case, Thursday.
Although the size of the log files is unlimited, every 30 days the system removes
all log files that were not accessed.
v For postgres logs, by default, the system checks the size of the log file daily and
rotates it to an archive file if it is greater than 1 GB in size. The system keeps 28
days (four weeks) of archived log files. (Netezza Support can help you to
customize these settings if needed.)

To view the logs, log on to the host as user nz. When you view an active log file,
use a file viewer command such as more, less, cat, or tail. If you use a text editor
such as emacs or vi, you could interrupt, and possibly lose information from, log
files that are actively capturing messages while the system is running.
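For example, to follow new entries in the system manager log as they are written, a read-only viewer such as tail is safe (a sketch; the sysmgr log path is the one documented later in this section):

su - nz
tail -f /nz/kit/log/sysmgr/sysmgr.log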
Related concepts:
“Logging Netezza SQL information” on page 11-39
You can log information about all user or application activity on the server, and
you can log information that is generated by individual Windows clients.
Related tasks:
“Logging Netezza SQL information on the server” on page 11-39
Related reference:
“Overview of the Netezza system processing” on page 7-8

Backup and restore manager


The backup and restore manager logs information about the nzbackup and
nzrestore commands. The log file records the start and stop times of the nzbackup
and nzrestore processes and the start and stop times of the backupsvr and
restoresvr processes.

Log file
/nz/kit/log/bnrmgr/bnrmgr.log
Current backup and restore manager log



/nz/kit/log/bnrmgr/bnrmgr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:05.645586 EST Info: NZ-00022: --- program ’bnrmgr’ (26082)
starting on host ’nzhost’ ... ---
2012-12-12 18:17:09.315244 EST Info: system is online - enabling backup and
restore sessions

Bootserver manager
The bootsvr.log file records the initiation of all SPUs on the system, usually when
the system is restarted by the nzstart command and also all stopping and
restarting of the bootsvr process.

Log files
/nz/kit/log/bootsvr/bootsvr.log
Current log
/nz/kit/log/bootsvr/bootsvr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:07.399506 EST Info: NZ-00022: --- program ’bootsvr’ (26094)
starting on host ’nzhost’ ... ---
2012-12-12 18:15:25.242471 EST Info: Responded to boot request from device
[ip=10.0.14.28 SPA=1 Slot=1] Run Level = 3

Client manager
The clientmgr.log file records all connection requests to the database server and
also all stopping and starting of the clientmgr process.

Log files
/nz/kit/log/clientmgr/clientmgr.log
Current log
/nz/kit/log/clientmgr/clientmgr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:05.874413 EST Info: NZ-00022: --- program ’clientmgr’ (26080)
starting on host ’nzhost’ ... ---
2012-12-12 18:12:05.874714 EST Info: Set timeout for receiving from the socket
300 sec.
2012-12-12 18:17:21.642075 EST Info: admin: login successful

Database operating system


The dbos.log file records information about the SQL plans submitted to the
database server and also the restarting of the dbos process.

Log files
/nz/kit/log/dbos/dbos.log
Current log
/nz/kit/log/dbos/dbos.YYYY-MM-DD.log
Archived log



Sample messages
2012-12-12 18:12:03.258402 EST Info: NZ-00022: --- program ’dbos’ (25991) starting
on host ’nzhost’ ... ---
2012-12-12 18:12:03.321092 EST Info: cacheDir=/nz/data.1.0/cache cacheImg=/nz/data
.1.0/cache/cache_img.txt m_size=52190 m_numDirs=307 m_ROWSIZE=170 m_planHistFiles=
2000
2012-12-12 18:17:38.713027 EST Info: plan queued: planid 1 tx 0x189402 cli 1 uid 1
001 sid 16056 pid [29672] (run 0/1)
2012-12-12 18:17:38.713306 EST Info: plan prep : planid 1 tx 0x189402 cli 1 uid 1
001 sid 16056 pid [29672] (run 0/1)
Plan ID
The plan number queued or started. This number relates to the
corresponding execution plan in the /nz/data/plans directory. The system
increments it for each new portion of SQL processed and resets it to 1
when you restart the system.
Q ID The queue to which this plan was assigned.
Tx ID The unique transaction identifier.
cli The ID of the client process.
UID The unique ID of the dbos client. Every time a client connects it receives a
unique number.
SID The ID related to the ID returned from the nzsession.
PID The process ID of the calling process running on the Netezza host.

Event manager
The eventmgr.log file records system events and the stopping and starting of the
eventmgr process.

Log files
/nz/kit/log/eventmgr/eventmgr.log
Current log
/nz/kit/log/eventmgr/eventmgr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:05.926667 EST Info: NZ-00022: --- program ’eventmgr’ (26081)
starting on host ’nzhost’ ... ---
2012-12-12 18:13:25.064891 EST Info: received & processing event type =
hwNeedsAttention, event args = ’hwId=1037, hwType=host, location=upper host,
devSerial=06LTY66, eventSource=system, errString=Eth RX Errors exceeded threshold,
reasonCode=1052’ event source = ’System initiated event’
2012-12-12 18:16:45.987066 EST Info: received & processing event type =
sysStateChanged, event args = ’previousState=discovering, currentState=initializing,
eventSource=user’ event source = ’User initiated event’
Event type
The event that triggered the notification.
Event args
The argument that is being processed.
ErrString
The event message, which can include hardware identifications and other
details.



eventSource
The source of the event; system is the typical value.

Flow communications retransmit


The flow communications retransmit log file records retransmission processes.

Log files
/nz/kit/log/fcommrtx/fcommrtx.log
Current log
/nz/kit/log/fcommrtx/fcommrtx.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:03.055247 EST Info: NZ-00022: --- program ’fcommrtx’ (25990) star
ting on host ’nzhost’ ... ---
2012-12-12 18:12:03.055481 EST Info: FComm : g_defenv_spu2port=0,6,1,7,2,8,3,9,4,1
0,5,11,6,0,7,0,8,1,9,2,10,3,11,4,12,5,13,0
2012-12-12 18:12:03.055497 EST Info: FComm : g_defenv_port2hostthread=0,1,2,3,4,5,
6,7,8,9,10,11,12,13

Host statistics generator


The hostStatsGen.log file records the starting and stopping of the hostStatsGen
process.

Log files
/nz/kit/log/hostStatsGen/hostStatsGen.log
Current log
/nz/kit/log/hostStatsGen/hostStatsGen.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:04.969116 EST Info: NZ-00022: --- program ’hostStatsGen’ (26077)
starting on host ’nzhost’ ... ---

Load manager
The loadmgr.log file records details of load requests, and the stopping and starting
of the loadmgr process.

Log file
/nz/kit/log/loadmgr/loadmgr.log
Current log
/nz/kit/log/loadmgr/loadmgr.YYYY-MM-DD.log
Archived log

Sample messages
2004-05-13 14:45:07.454286 EDT Info: NZ-00022:
--- log file ’loadmgr’ (12225) starting on host ’nzhost’ ...

Postgres
The postgres.log file is the main database log file. It contains information about
database activities.



Log files
/nz/kit/log/postgres/pg.log
Current log
/nz/kit/log/postgres/pg.log.n
Archived log

Sample messages
2012-12-31 04:02:10.229470 EST [19122] DEBUG: connection: host=1.2.3.4 user=
MYUSR database=SYSTEM remotepid=6792 fetype=1
2012-12-31 04:02:10.229485 EST [19122] DEBUG: Session id is 325340
2012-12-31 04:02:10.231134 EST [19122] DEBUG: QUERY: set min_quotient_scale to
default
2012-12-31 04:02:10.231443 EST [19122] DEBUG: QUERY: set timezone = ’gmt’
2012-12-31 09:02:10.231683 gmt [19122] DEBUG: QUERY: select current_timestamp,
avg(sds_size*1.05)::integer as avg_ds_total, avg(sds_used/(1024*1024))::integer as
avg_ds_used from _v_spudevicestate

Session manager
The sessionmgr.log file records details about the starting and stopping of the
sessionmgr process, and any errors that are associated with this process.

Log files
/nz/kit/log/sessionmgr/sessionmgr.log
Current log
/nz/kit/log/sessionmgr/sessionmgr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:11:50.868454 EST Info: NZ-00022: --- program ’sessionmgr’ (25843)
starting on host ’nzhost’ ... ---

SPU cores manager


The spucores.log file contains the core file if a SPU aborts. If several SPUs abort,
the system creates a core file for two of the SPUs. A SPU log file has a name in the
form /nz/kit/log/spucores/spulog<spuid>.<date>_<info>.gz.

Startup server
The startupsvr.log file records the start of the IBM Netezza processes and any
errors that are encountered with this process.

Log files
/nz/kit/log/startupsvr/startupsvr.log
Current log
/nz/kit/log/startupsvr/startupsvr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:11:43.951689 EST Info: NZ-00022: --- program ’startupsvr’ (25173)
starting on host ’nzhost’ ... ---
2012-12-12 18:11:43.952733 EST Info: NZ-00307: starting the system, restart = no
2012-12-12 18:11:43.952778 EST Info: NZ-00313: running onStart: ’prepareForStart’
2012-12-12 18:11:43 EST: Rebooting SPUs via RICMP ...



Statistics server
The statssvr.log file records the details of starting and stopping the statsSvr and
any associated errors.

Log files
/nz/kit/log/statsSvr/statsSvr.log
Current log
/nz/kit/log/statsSvr/statsSvr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:05.794050 EST Info: NZ-00022: --- program ’statsSvr’ (26079)
starting on host ’nzhost’ ... ---

System Manager
The sysmgr log file records details of stopping and starting the sysmgr process,
and details of system initialization and system state status.

Log file
/nz/kit/log/sysmgr/sysmgr.log
Current log
/nz/kit/log/sysmgr/sysmgr.YYYY-MM-DD.log
Archived log

Sample messages
2012-12-12 18:12:05.578573 EST Info: NZ-00022: --- program ’sysmgr’ (26078) starting
on host ’nzhost’ ... ---
2012-12-12 18:12:05.579716 EST Info: Starting sysmgr with existing topology
2012-12-12 18:12:05.882697 EST Info: Number of chassis level switches for each
chassis in this system: 1

The nzDbosSpill file


The host data handling software in DbosEvent has a disk work area that the
system uses for large sorts on the host.

The IBM Netezza system has two sorting mechanisms:


v The Host Merge that takes sorted SPU return sets and produces a single-sorted
set. It uses temporary disk space to handle SPU double-duty situations.
v A traditional sorter that begins with a random table on the host and sorts it into
the order that you want. It can use a simple external sort method to handle large
data sets.

The file on the Linux host for this disk work area is $NZ_TMP_DIR/nzDbosSpill.
Within DBOS, there is a database that tracks segments of the file presently in use.

To prevent a runaway query from using up all of the disk space on the host
computer, there is a limit on the DbosEvent database and, therefore, on the size of
the Linux file. This limit is stored in the Netezza registry file. The tag for the value
is startup.hostSwapSpaceLimit.
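For example, to check the current limit, you can query the configuration registry in the same way as the registry queries shown in the next section (a sketch):

nzsystem showRegistry | grep hostSwapSpaceLimit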



System configuration
The behavior of an IBM Netezza system is determined by configuration settings
that are loaded from the configuration file, system.cfg, during system startup and
are stored in the configuration registry. These settings influence such things as
system management, workload management, host processes, and SPUs, and are
used to control and tune the system.

Attention: Never change a configuration setting unless directed to do so explicitly by either:
v Your IBM Netezza support representative
v A documented Netezza procedure
Incorrect configuration settings can result in significant loss of performance and other problems.
Related reference:
“The nzsystem command” on page A-65
Use the nzsystem command to change the system state, and show and set
configuration information.

Displaying the software revision level and configuration registry settings
To display the software revision level and the configuration registry settings, issue
the nzsystem command.

For example:
v To display all system registry settings, enter:
nzsystem showRegistry

The resulting output looks similar to this:


# Netezza NPS configuration registry
# Date: 30-Apr-09 12:48:44 EDT
# Revision: 5.0.D1
#
# Configuration options used during system start
# These options cannot be changed on a running system
#
startup.numSpus = 6
startup.numSpares = 0
startup.simMode = no
startup.autoCreateDb = 0
.
.
.
system.twoPhaseSpu2SpuDist = no
system.autoDumpandgo = yes
v To display only those registry settings that affect the plan history, enter:
nzsystem showRegistry | grep planHist

The resulting output looks similar to this:


host.planHistFiles = 2000
host.planHistKeepArchives = 10
host.planHistArchive = yes



Changing configuration settings
There are some configuration settings that you can change yourself, without
involving your IBM Netezza support representative. These settings are referred to
in documented Netezza procedures elsewhere in this publication or elsewhere in
the Netezza product library.

Attention: Never change a configuration setting unless directed to do so explicitly by either:
v Your IBM Netezza support representative
v A documented Netezza procedure
Incorrect configuration settings can result in significant loss of performance and other problems.

There are two ways to change configuration settings:


Temporarily
For you to be able to change a configuration setting temporarily, you must
either be the admin user or your user account must have the Manage
System privilege. To change a configuration setting temporarily, pause the
system, issue the appropriate nzsystem set command, then resume the
system. This is done by issuing the following commands:
nzsystem pause
nzsystem set -arg setting=value
nzsystem resume

A change made in this way remains effective only until the system is
restarted; at system startup, all configuration settings are read from the
system configuration file and loaded into the registry.
Permanently
To change a configuration setting permanently, edit the corresponding line
in the configuration file, system.cfg. Configuration settings are loaded
from this file to the registry during system startup.
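For example, to set the short query threshold (host.schedSQBNominalSecs, described in Table 7-6) to 3 seconds, an illustrative value only, you could change it temporarily:

nzsystem pause
nzsystem set -arg host.schedSQBNominalSecs=3
nzsystem resume

or make it permanent by adding or editing the corresponding line in system.cfg, assuming the same setting = value form that the registry listing earlier in this section uses:

host.schedSQBNominalSecs = 3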

The following tables describe the configuration settings that you can change
yourself, without involving your IBM Netezza support representative.
Table 7-6. Configuration settings for short query bias (SQB)
Setting Type Default Description
host.schedSQBEnabled bool true Whether SQB is enabled (true) or disabled (false).
host.schedSQBNominalSecs int 2 The threshold, in seconds, below which a query is to be
regarded as being short.
host.schedSQBReservedGraSlots int 10 The number of GRA scheduler slots that are to be reserved
for short queries.
host.schedSQBReservedSnSlots int 6 The number of snippet scheduler slots that are to be reserved
for short queries.
host.schedSQBReservedSnMb int 50 The amount of memory, in MB, that each SPU is to reserve
for short query execution.
host.schedSQBReservedHostMb int 64 The amount of memory, in MB, that the host is to reserve for
short query execution.



Table 7-7. Configuration settings for the updating virtual tables
Setting Type Default Description
host.graVtUpdateInterval int 600 The number of seconds between updates to the
_vt_sched_gra table, which contains resource usage statistics
for completed jobs.
host.snVtUpdateInterval int 60 The number of seconds between updates to the _vt_sched_sn
table, which contains resource usage statistics for completed
snippets.
host.sysVtUpdateInterval int 600 The number of seconds between updates of the
_vt_sched_sys virtual table.
host.sysUtilVtUpdateInterval int 60 The number of seconds between updates of the
_vt_system_util virtual table.

Table 7-8. Configuration settings for plan history


Setting Type Default Description
host.planHistArchive bool true The system stores the plan files and C++ source files that it
creates for each snippet. This setting determines where these
files are stored:
true The files are stored in a compressed archive (tar)
file. This option was introduced in Release 7.0.
false The files are stored in plan directories. This was the
only option that was available in Release 6.0.8 and
earlier.
host.planHistFiles int 2000 If host.planHistArchive=true, the maximum number of plans
that can be stored in a single compressed archive file. If
host.planHistArchive=false, the maximum number of plan
directories.
host.planHistKeepArchives int 10 The number of compressed archive files to keep.

Table 7-9. Configuration settings for downtime event logging


Setting Type Default Description
sysmgr.numberOfDownPortToRiseEvent int 5 The number of ports on the same switch that must be
down for the amount of time specified by
sysmgr.portDownTime1ToRiseEvent before the
system logs a HW_NEEDS_ATTENTION event.
Specify 0 to deactivate logging
HW_NEEDS_ATTENTION events based on
sysmgr.portDownTime1ToRiseEvent.
sysmgr.portDownTime1ToRiseEvent int 300 The number of seconds that the system waits after
the number of ports specified by
sysmgr.numberOfDownPortToRiseEvent go down
before logging a HW_NEEDS_ATTENTION event.
For example, if
sysmgr.numberOfDownPortToRiseEvent=5 and
sysmgr.portDownTime1ToRiseEvent=300, when at
least 5 ports on the same switch are down for at least
300 seconds the system logs a
HW_NEEDS_ATTENTION event.

Under normal operating conditions, ports sometimes
go down for a few seconds at a time. The delay
introduced by this setting prevents events from being
logged for such transient state changes.
sysmgr.portDownTime2ToRiseEvent int 600 The number of seconds that any one port must be
down before the system logs a
HW_NEEDS_ATTENTION event for that port. This
happens independently of the settings for
sysmgr.numberOfDownPortToRiseEvent and
sysmgr.portDownTime1ToRiseEvent.

Under normal operating conditions, ports sometimes
go down for a few seconds at a time. The delay
introduced by this setting prevents events from being
logged for such transient state changes.

Table 7-10. Configuration settings for backup


Setting Type Default Description
host.bnrNumStreamsDefault int 0 The default number of streams to use for a backup operation
as described in “Multi-stream backup” on page 13-4. A value
of 0 or 1 causes the system to use one stream per backup
location. The maximum number of streams is 16.

Related reference:
“The nzsystem command” on page A-65
Use the nzsystem command to change the system state, and show and set
configuration information.

Chapter 8. Event rules
The IBM Netezza event manager monitors the health, status, and activity of the
Netezza system operation and can act when a specific event occurs. Event
monitoring is a proactive way to manage the system without continuous human
observation.

You can configure the event manager to continually watch for specific conditions
such as system state changes, hardware restarts, faults, or failures. In addition, the
event manager can watch for conditions such as reaching a certain percentage of
full disk space, queries that have run for longer than expected, and other Netezza
system behaviors.

This section describes how to administer the Netezza system by using event rules
that you create and manage.

Template event rules


Event management consists of creating rules that define conditions to monitor and
the actions to take when that condition is detected. The event manager uses these
rules to define its monitoring scope, and thus its behavior when a rule is triggered.
Creating event rules can be a complex process because you must define the
condition clearly so that the event manager can detect it, and you must define the
actions to take when the match occurs.

To help ease the process of creating event rules, IBM Netezza supplies template
event rules that you can copy and tailor for your system. The template events
define a set of common conditions to monitor with actions that are based on the
type or effect of the condition. The template event rules are not enabled by default,
and you cannot change or delete the template events. You can copy them as starter
rules for more customized rules in your environment.

As a best practice, you can begin by copying and by using the template rules. If
you are familiar with event management and the operational characteristics of
your Netezza appliance, you can also create your own rules to monitor conditions
that are important to you. You can display the template event rules by using the
nzevent show -template command.

Note: Release 5.0.x introduced new template events for the IBM Netezza 100, IBM
Netezza 1000, and later systems. Previous event template rules specific to the
z-series platform do not apply to the new models and were replaced by similar,
new events.

The following table lists the predefined template event rules.


Table 8-1. Template event rules
Template event rule name Description
Disk80PercentFull Notifies you when a disk is more than 80 percent full.
“Disk space threshold notification” on page 8-22.
Disk90PercentFull Notifies you when a disk is more than 90 percent full.
“Disk space threshold notification” on page 8-22.

HardwareNeedsAttention Notifies you when the system detects a condition that
can impact the hardware. For more information, see
“Hardware needs attention” on page 8-20.
HardwareRestarted Notifies you when a hardware component successfully
restarts. For more information, see “Hardware
restarted” on page 8-22.
HardwareServiceRequested Notifies you of the failure of a hardware component,
which most likely requires a service call, hardware
replacement, or both. For more information, see
“Hardware service requested” on page 8-18.
HistCaptureEvent Notifies you if there is a problem that prevents
history-data files from being written to the staging
area.
HistLoadEvent Notifies you if there is a problem that prevents the
external tables that contain history data from being
loaded to the history database.
HwPathDown Notifies you when the status of a disk path changes
from the Up to the Down state (a path has failed). For
more information, see “Hardware path down” on page
8-20.
NPSNoLongerOnline Notifies you when the system goes from the online
state to another state. For more information, see
“System state changes” on page 8-17.
RegenFault Notifies you when the system cannot set up a data slice
regeneration.
RunAwayQuery Notifies you when a query exceeds a timeout limit. For
more information, see “Runaway query notification” on
page 8-24.
SCSIDiskError Notifies you when the System Manager detects that an
active disk has failed, or when an FPGA error occurs.
SCSIPredictiveFailure Notifies you when the SCSI SMART threshold of the
disk is exceeded.
SpuCore Notifies you when the system detects that a SPU
process has restarted and resulted in a core file. For
more information, see “SPU cores event” on page 8-32.
SystemHeatThresholdExceeded When any three boards in an SPA reach the red
temperature threshold, the event runs a command to
shut down the SPAs, SFIs, and RPCs. For more
information, see “System temperature event” on page
8-29. Enabled by default for z-series systems only.
SystemOnline Notifies you when the system is online. For more
information, see “System state changes” on page 8-17.
SystemStuckInState Notifies you when the system is stuck in the Pausing
Now state for more than the timeout specified by the
sysmgr.pausingStateTimeout (420 seconds). For more
information, see “System state changes” on page 8-17.
ThermalFault Notifies you when the temperature of a hardware
component exceeds its operating thresholds. For more
information, see “Hardware temperature event” on
page 8-28.

TopologyImbalance Notifies you when the system detects a disk topology
imbalance after a disk regeneration or when the system
transitions to the online state after a rebalance. For
more information, see “Topology imbalance event” on
page 8-35.
Transaction Limit Event Sends an email notification when the number of
outstanding transaction objects exceeds 90 percent of
the available objects. For more information, see
“Transaction limits event” on page 8-33.
VoltageFault Notifies you when the voltage of a hardware
component exceeds its operating thresholds. For more
information, see “Voltage faults event” on page 8-33.

Netezza might add new event types to monitor conditions on the system. These
event types might not be available as templates, which means you must manually
add a rule to enable them. For a description of more event types that can assist
you with monitoring and managing the system, see “Event types reference” on
page 8-36.

The action to take for an event often depends on the type of event (its effect on the
system operations or performance). The following table lists some of the
predefined template events and their corresponding effects and actions.
Table 8-2. Netezza template event rules

Disk80PercentFull, Disk90PercentFull
Type: hwDiskFull (Notice). Notify: Admins, DBAs. Severity: Moderate to Serious.
Effect: A full disk prevents some operations.
Action: Reclaim space or remove unwanted databases or older data. For more information, see “Disk space threshold notification” on page 8-22.

HardwareNeedsAttention
Type: hwNeedsAttention. Notify: Admins, NPS. Severity: Moderate.
Effect: Possible change or issue that can start to affect performance.
Action: Investigate and identify whether more assistance is required from Support. For more information, see “Hardware needs attention” on page 8-20.

HardwareRestarted
Type: hwRestarted (Notice). Notify: Admins, NPS. Severity: Moderate.
Effect: Any query or data load in progress is lost.
Action: Investigate whether the cause is hardware or software. Check for SPU cores. For more information, see “Hardware restarted” on page 8-22.

HardwareServiceRequested
Type: hwServiceRequested (Warning). Notify: Admins, NPS. Severity: Moderate to Serious.
Effect: Any query or work in progress is lost. Disk failures initiate a regeneration.
Action: Contact Netezza. For more information, see “Hardware service requested” on page 8-18.

HistCaptureEvent
Type: histCaptureEvent. Notify: Admins, NPS. Severity: Moderate to Serious.
Effect: The history-data collection process (alcapp) is unable to save captured history data in the staging area; alcapp stops collecting new data.
Action: The size of the staging area reaches the configured size threshold, or there is no available disk space in /nz/data. Either increase the size of the threshold or free up disk space by deleting old files.

HistLoadEvent
Type: histLoadEvent. Notify: Admins, NPS. Severity: Moderate to Serious.
Effect: The history-data loader process (alcloader) is unable to load history data into the history database; new history data is not available in reports until it can be loaded.
Action: The history configuration might be changed, the history database might be deleted, or there might be some session connection error.

HwPathDown
Type: hwPathDown. Notify: Admins. Severity: Serious to Critical.
Effect: Query performance and possible system downtime.
Action: Contact Netezza Support. For more information, see “Hardware path down” on page 8-20.

NPSNoLongerOnline, SystemOnline
Type: sysStateChanged (Information). Notify: Admins, NPS, DBAs. Severity: Varies.
Effect: Availability status.
Action: Depends on the current state. For more information, see “System state changes” on page 8-17.

RegenFault
Type: regenFault. Notify: Admins, NPS. Severity: Critical.
Effect: Might prevent user data from being regenerated.
Action: Contact Netezza Support. For more information, see “Regeneration errors” on page 8-26.

RunAwayQuery
Type: runawayQuery (Notice). Notify: Admins, DBAs. Severity: Moderate.
Effect: Can consume resources that are needed for operations.
Action: Determine whether to allow the query to run; manage workload. For more information, see “Runaway query notification” on page 8-24.

SCSIDiskError
Type: scsiDiskError. Notify: Admins, NPS. Severity: Serious.
Effect: Adversely affects system performance.
Action: Schedule disk replacement as soon as possible. See “Disk errors event” on page 8-27.

SCSIPredictiveFailure
Type: scsiPredictiveFailure. Notify: Admins, NPS. Severity: Critical.
Effect: Adversely affects performance.
Action: Schedule disk replacement as soon as possible. See “Disk predictive failure errors event” on page 8-25.

SpuCore
Type: spuCore. Notify: Admins, NPS. Severity: Moderate.
Effect: A SPU core file was created.
Action: The system created a SPU core file. See “SPU cores event” on page 8-32.

SystemHeatThresholdExceeded
Type: sysHeatThreshold. Notify: Admins, NPS. Severity: Critical.
Effect: System shutdown.
Action: Before you power on the system, check the SPA that caused this event to occur. For more information, see “System temperature event” on page 8-29.

SystemStuckInState
Type: systemStuckInState (Information). Notify: Admins, NPS. Severity: Moderate.
Effect: A system is stuck in the Pausing Now state.
Action: Contact Netezza Support. See “System state” on page 8-25.

ThermalFault
Type: hwThermalFault. Notify: Admins, NPS. Severity: Serious.
Effect: Can drastically reduce disk life expectancy if ignored.
Action: Contact Netezza Support. For more information, see “Hardware temperature event” on page 8-28.

TopologyImbalance
Type: topologyImbalance. Notify: Admins, NPS. Severity: Serious.
Effect: Can impact query performance until the imbalance is resolved.
Action: Contact Netezza Support. For more information, see “Topology imbalance event” on page 8-35.

TransactionLimitEvent
Type: transactionLimitEvent. Notify: Admins, NPS. Severity: Serious.
Effect: New transactions are blocked if the limit is reached.
Action: Stop some existing sessions that might be old and require cleanup, or stop and start the Netezza server to close all existing transactions.

VoltageFault
Type: hwVoltageFault. Notify: Admins, NPS. Severity: Serious.
Effect: Might indicate power supply issues.
Action: For more information, see “Voltage faults event” on page 8-33.

Event rule management


To start using events, you must create and enable some event rules. You can use
any of the following methods to create and activate event rules:
v Copy and enable a template event rule
v Add an event rule

You can copy, modify, and add events by using the nzevent command or the
NzAdmin interface. You can also generate events to test the conditions and event
notifications that you are configuring. The following sections describe how to
manage events by using the nzevent command. The NzAdmin interface has an
intuitive interface for managing events, including a wizard tool for creating events.
For information about accessing the NzAdmin interface, see “NzAdmin overview”
on page 3-12.

Copy a template event to create an event rule


You can use the nzevent copy command to copy a predefined template for
activation.

The following example copies a template event named NPSNoLongerOnline to
create a user-defined rule of the same name, adds a sample email address for
contact, and activates the rule:

nzevent copy -u admin -pw password -useTemplate -name
NPSNoLongerOnline -newName NPSNoLongerOnline -on yes -dst
[email protected]

When you copy a template event rule, which is disabled by default, your new rule
is likewise disabled by default. You must enable it by using the -on yes argument.
In addition, if the template rule sends email notifications, you must specify a
destination email address.

Copy and modify a user-defined event rule


You can copy, modify, and rename an existing user-defined rule by using the
nzevent copy command.

The following example copies, renames, and modifies an existing event rule:
nzevent copy -u admin -pw password -name NPSNoLongerOnline -newName
MyModNPSNoLongerOnline -on yes -dst [email protected] -ccDst
[email protected] -callhome yes

When you copy an existing user-defined event rule, your new rule is enabled
automatically if the existing rule is enabled. If the existing rule is disabled, your
new rule is disabled by default. You must enable it by using the -on yes argument.
You must specify a unique name for your new rule; it cannot match the name of
the existing user-defined rule.

Generate an event
You can use the nzevent generate command to trigger an event for the event
manager. If the event matches a current event rule, the system takes the action that
is defined by the event rule.

You might generate events for the following cases:


v To simulate a system event to test an event rule.
v To add new events because the system is not generating events for conditions
for which you would like notification.

If the event that you want to generate has a restriction, specify the arguments that
would trigger the restriction by using the -eventArgs option. For example, if a
runaway query event has a restriction that the duration of the query must be
greater than 30 seconds, use a command similar to the following to ensure that a
generated event is triggered:
nzevent generate -eventtype runawayquery -eventArgs ’duration=50’

In this example, the duration meets the event criteria (greater than 30) and the
event is triggered. If you do not specify a value for a restriction argument in the
-eventArgs string, the command uses default values for the arguments. In this
example, duration has a default of 0, so the event would not be triggered since it
did not meet the event criteria.

To generate an event for a system state change:


nzevent generate -eventType sysStateChanged
-eventArgs ’previousState=online, currentState=paused’

Delete an event rule


You can delete event rules that you create. You cannot delete the template events.



To delete an event rule, enter: nzevent delete -u admin -pw password -name
<rule_name>.

Disable an event rule


To disable an event rule, enter: nzevent modify -u admin -pw password -name
<rule_name> -on no.

Add an event rule


You can use the nzevent add command to add an event rule. You can also use the
NzAdmin tool to add event rules by using a wizard for creating events.

Adding an event rule consists of two tasks: specifying the event match criteria and
specifying the notification method. These tasks are described in more detail after
the examples.

Note: Although the z-series events are not templates on IBM Netezza 1000 or
N1001 systems, you can add them by using nzevent if you have the syntax that is
documented in the previous releases. However, these events are not supported on
IBM Netezza 1000 or later systems.

To add an event rule that sends an email when the system transitions from the
online state to any other state, enter:
nzevent add -name TheSystemGoingOnline -u admin -pw password
-on yes -eventType sysStateChanged -eventArgsExpr ’$previousState
== online && $currentState != online’ -notifyType email -dst
[email protected] -msg ’NPS system $HOST went from $previousState to
$currentState at $eventTimestamp.’ -bodyText
’$notifyMsg\n\nEvent:\n$eventDetail\nEvent
Rule:\n$eventRuleDetail’

Note: If you are creating event rules on a Windows client system, use double
quotation marks instead of single quotation marks to specify strings.
Related concepts:
“Callhome file” on page 5-19

Event match criteria


The IBM Netezza event manager uses the match criterion portion of the event rule
to determine which events generate a notification and which ones the system
merely logs. A match occurs if the event type is the same and the optional event
args expression evaluates to true. If you do not specify an expression, the event
manager uses only the event type to determine a match.

The event manager generates notifications for all rules that match the criteria, not
just for the first event rule that matches. The following table lists the event types
that you can specify and the arguments and the values that are passed with each
event. You can list the defined event types by using the nzevent listEventTypes
command.
Table 8-3. Event types
Event type Tag name Possible values
sysStateChanged previousState, currentState, <any system state>, <Event
eventSource Source>

hwFailed Used only on z-series systems such as the 10000-series,
8000z-series, and 5200-series systems.
hwRestarted hwType, hwId, spaId, v spu, <SPU HW ID>, <SPA
spaSlot, devSerial, ID>, <SPA Slot>,
devHwRev, devFwRev <SPU/SFI Serial>,
<Hardware Revision>,
<Firmware Revision>
v sfi, <SFI HW ID>, <SPA
ID>, <SPA Slot>,
<SPU/SFI Serial>,
<Hardware Revision>,
<Firmware Revision>
v fan, <FAN HW ID>, <SPA
ID>, <SPA Slot>
v pwr, <POWER SUPPLY
HW ID>, <SPA ID>, <SPA
Slot>
hwDiskFull hwType, hwId, spaId, spu, <SPU HW ID>, <Spa
spaSlot, diskHwId, location, Id>,<SPA Slot>, <Disk HW
partition, threshold, value ID>, <Location String>

<partition #>, <threshold>,


<value>
For more information, see “Disk space threshold
notification” on page 8-22
runawayQuery sessionId, planId, duration <Session Id>, <Plan Id>,
<seconds>
For more information, see “Runaway query notification” on
page 8-24.
custom1 or custom2 User-specified rule. Use with the nzevent generate
command.
For more information, see “Creating a custom event rule”
on page 8-16.
smartThreshold Used only on z-series systems such as the 10000-series,
8000z-series, and 5200-series systems.
regenError Used only on z-series systems such as the 10000-series,
8000z-series, and 5200-series systems.
diskError Used only on z-series systems such as the 10000-series,
8000z-series, and 5200-series systems.
hwHeatThreshold Used only on z-series systems such as the 10000-series,
8000z-series, and 5200-series systems.
sysHeatThreshold errType, errCode, errString <Err Type>, <Err Code>,
<Err String
For more information, see “System temperature event” on
page 8-29.
fwMismatch Used only on z-series systems such as the 10000-series,
8000z-series, and 5200-series systems.

systemStuckInState duration, currentState, <seconds>, <any system
expectedState state>, <any system state>
For more information, see “System state changes” on page
8-17.
histCaptureEvent configName, histType, <Config Name>,
storageLimit, <query/audit>, <Storage
loadMinThreshold, Limit>, <Min Threshold>,
loadMaxThreshold, <Max Threshold>, <Disk Full
diskFullThreshold, Threshold>, <Load Interval>,
loadInterval, nps, database, <Target NPS>, <Target DB
capturedSize, stagedSize, Name>, <Captured Size
storageSize, dirName, MB>, <Staged Size MB>,
errCode, errString <Storage(total) Size MB>,
<Dir Name>, <Err Code>,
<Err String>
histLoadEvent configName, histType, <Config Name>,
storageLimit, <query/audit>, <Storage
loadMinThreshold, Limit>, <Min Threshold>,
loadMaxThreshold, <Max Threshold>, <Disk Full
diskFullThreshold, Threshold>, <Load Interval>,
loadInterval, nps, database, <Target NPS>, <Target DB
batchSize, stagedSize, Name>, <Batch Size MB>,
dirName, errCode, errString <Staged Size MB>, <Dir
Name>, <Err Code>, <Err
String>
hwVoltageFault hwType, hwId, label, v SPU, <SPU HW ID>,
location, curVolt, errString, <Label String>, <Location
eventSource String>, <Current
Voltage>, <Err String>,
<Event Source>
v Disk Enclosure, <Encl HW
ID>, <Label String>,
<Location String>,
<Current Voltage>, <Err
String>, <Event Source>
spuCore hwId, location, errString, <HW ID>, <Location
eventSource String>, <Err String>, <Event
Source>
regenFault hwIdSpu, hwIdSrc, <SPU HW ID>, <Source Disk
locationSrc, hwIdTgt, HW ID>, <Source Disk
locationTgt, errString, Location>, <Target Disk HW
devSerial, eventSource ID>, <Target Disk
Location>,<Error String>,
<SPU Serial>, <Event
Source>

hwServiceRequested hwType, hwId, location, v spu, <SPU HW ID>,
errString, devSerial, <Location String>, <Error
eventSource String>, <SPU Serial>,
<Event Source>
v disk, <Disk HW ID>,
<Location String>, <Error
String>, <Disk Serial>,
<Event Source>
v disk enclosure power
supply, <PS HW ID>,
<Location String>, <Error
String>, <Unknown>,
<Event Source>
v disk enclosure fan, <Fan
HW ID>, <Location
String>, <Error String>,
<Unknown>, <Event
Source>
v AMM, <AMM HW ID>,
<Location String>, <Error
String>, <Unknown>,
<Event Source>
v chassis power supply, <PS
HW ID>, <Location
String>, <Error String>,
<Unknown>,<Event
Source>
v chassis fan, <FAN HW
ID>, <Location String>,
<Error String>,
<Unknown>, <Event
Source>
scsiDiskError spuHwId, diskHwId, <SPU HW ID>, <Location
location, errType, errCode, String>, <Err Type>, <Err
oper, dataPartition, lba, Code>, <Oper>, <Data
tableId, dataSliceId, Partition>, <LBA>, <Table>,
devSerial, fpgaBoardSerial, <DataSlice>, <Block>, <SPU
diskSerial, diskModel, Serial>, <FPGA Board
diskMfg, eventSource Serial>, <Disk Serial>, <Disk
Model>, <Disk
Manufacturer>, <Event
Source>
scsiPredictiveFailure spuHwId, diskHwId, <SPU HW ID>, <Disk HW
scsiAsc, scsiAscq, fru, ID>, <SCSI ASC>, <SCSI
location, devSerial, ASCQ>,<Fru>, <Location
diskSerial, diskModel, String>, <SPU Serial>, <Disk
diskMfg, eventSource Serial>, <Disk Model>, <Disk
Manufacturer>, <Event
Source>

hwThermalFault hwType, hwId, label, spu, <SPU HW ID>, <Label
location, devSerial, errString, String>, <Location String>,
curVal, eventSource <SPU Serial>, <Error String>,
<Current Value>, <Event
Source>

Disk Enclosure, <Encl HW


ID>, <Label String>,
<Location String>, <Error
String>, <Current Value>,
<Event Source>
For more information, see “Hardware temperature event”
on page 8-28.
transactionLimitEvent CurNumTX <Current Number of
Transactions>
nwIfChanged hwType, hwId, location, <HW TYPE>, <HW ID>,
interfaceName, prevState, <Location String>, <interface
currState name>, <previous state>,
<current state>
numCpuCoreChanged hwId, location, <SPU HW ID>, <Location
initialNumCore, String>, <Number of CPU
lastNumCore, curNumCore cores of SPU on
initialization>, <Last Number
of Online CPU cores of
SPU>, <Current Number of
Online CPU Cores of SPU>
hwNeedsAttention hwType, hwId, location, spu, <SPU HW ID>,
errString, devSerial, <Location String>, <Error
eventSource String>, <SPU Serial>,
<Event Source>
hwPathDown hwType, hwId, location, spu, <SPU HW ID>,
errString, devSerial, <Location String>, <Error
eventSource String>, <SPU Serial>,
<Event Source>
topologyImbalance errString <Error String>

Event rule attributes


An event consists of three attributes: an event type, a timestamp, and a list of
event-specific arguments that are represented as a list of tag=value pairs. By using
text substitution in the form of $tag, you can create an expression to match a
specific event instance rather than all events of a specific type.

For example, to receive an email when the system is not online, it is not enough to
create an event rule for a sysStateChanged event. Because the sysStateChanged
event recognizes every state transition, you would be notified whenever the state
changes at all, such as from online to paused.

You can add an event args expression to further qualify the event for notification.
If you specify an expression, the system substitutes the event arguments into the
expression before evaluating it. The system uses the result combined with the
event type to determine a match. So, to send an email message when the system is
no longer online, you would use the expression: $previousState == online &&

Chapter 8. Event rules 8-11


$currentState!=online. The system gets the value of previousState and
currentState from the actual argument values of a sysStateChanged event.

You can specify an event by using equality expressions, wildcard expressions,
compound AND expressions, or OR expressions. The following table describes
these expressions.
Table 8-4. Event argument expression syntax
Expression Syntax Example
EqualityExpr <string> == <string> ‘$hwType == spu’

<string> != <string> ‘$hwType != spu’


WildcardExpr <string> ~ <string> '$errString ~ *spu*'

<string> !~ <string> '$errString !~ *ascq*'


AndExpr EqualityExpr ‘&&’ EqualityExpr '$previousState == online &&
$currentState != online’
OrExpr EqualityExpr ‘||’ EqualityExpr '$threshold == 80 || $threshold
== 85’
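For example, the OR expression from the table can be combined with the nzevent add syntax shown earlier to restrict a disk-full rule to specific thresholds. This is a sketch only; the rule name and email address are placeholders:

nzevent add -u admin -pw password -name DiskNearlyFull -on yes
-eventType hwDiskFull -eventArgsExpr '$threshold == 80 || $threshold == 85'
-notifyType email -dst '[email protected]'
-msg 'Disk $hwId on $HOST is $value percent full'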

Event notifications
When an event occurs, you can have the system send an email or run an external
command. Email can be aggregated whereas commands cannot.
v To specify an email, you must specify a notification type (-notifyType email), a
destination (-dst), a message (-msg), and optionally, a body text (-bodyText), and
the callhome file (-callHome).
You can specify multiple email addresses that are separated by a comma and no
space. For example,
[email protected],[email protected],[email protected]
v To specify that you want to run a command, you must specify a notification
type (-notifyType runCmd), a destination (-dst), a message (-msg), and
optionally, a body text (-bodyText), and the callhome file (-callHome).

When you are defining notification fields that are strings (-dst, -ccDst, -msg,
-bodyText), you can use $tag syntax to substitute known system or event values.
Table 8-5 on page 8-13 lists the system-defined tags that are available.
Related concepts:
“Event email aggregation” on page 8-14

The sendMail.cfg file


If you send email, you must modify the sendMail.cfg file. It contains the name of
the mail server and its port, the sender name and address, and a CC field for a list
of email names that are automatically appended to the ccDst field defined in the
event rule.

The sendmail.cfg file also contains options that you can use to specify a user
name and password for authentication on the mail server. You can find a copy of
this file in the /nz/data/config directory on the IBM Netezza host.



Table 8-5. Notification substitution tags
Source Tag Description
Event eventType One of the event types (for example,
sysStateChanged).
eventTimestamp The data and time the event occurred (for
example 17-Jun-02, 14:35:33 EDT).
eventArgs The event arguments (for example, hwType = spu,
hwId =1002).
eventDetail Shorthand for the eventType, eventArgs, and
eventTimestamp.
Event rule eventType One of the event types (for example, hwDiskFull).
eventArgsExpr The event argument match expression (for
example, hwType == spu).
notifyType The type of notification, email, or runCmd.
notifyDst The notification destination (from -dst) (for
example, [email protected]).
notifyCcDst The cc notification destination (from -ccDst) (for
example, [email protected]).
notifyMsg The notification message (from -msg).
notifyCallHome A boolean that indicates whether callhome was
requested (from -callHome).
notifyCallHomeFile The callhome file name.
eventRuleDetail Shorthand for tags eventArgsExpr through
notifyCallHomeFile.
eventAggrCount The aggregate count of events for notification
(email only)
Environment NZ_HOST The host environment variable.
NZ_DIR The nz directory.
NZ_BIN_DIR The nz bin directory.
NZ_DATA_DIR The nz data directory.
NZ_KIT_DIR The nz kit directory.
NZ_LOG_DIR The nz log directory.
NZ_SBIN_DIR The nz sbin directory.
NZ_SYS_DIR The nz system directory.
NZ_TMP_DIR The nz temp directory.

If you specify the email or runCmd arguments, you must enter the destination and
the subject header. You can use all the following arguments with either command,
except the -ccDst argument, which you cannot use with the runCmd. The
following table lists the syntax of the message.
Table 8-6. Notification syntax

-dst
Description: Your email address. You can specify multiple recipients.
Example: -dst '[email protected],[email protected]'

-msg
Description: The subject field of the email.
Example: -msg 'NPS system $HOST went from $previousState to $currentState at $eventTimestamp.'
This message substitutes the host name for $HOST, the previous system state for $previousState, the current system state for $currentState, and the date and time the event occurred for $eventTimestamp.

-bodyText
Description: Optional body of the email.
Example: -bodyText '$notifyMsg\n\nEvent:\n$eventDetail\nEvent Rule:\n$eventRuleDetail'
This message substitutes the text in the -msg argument for $notifyMsg, outputs a newline and the word 'Event' then the contents of eventType through eventArgs, a newline, and the word 'Event Rule' and then the contents of eventArgsExpr through notifyCallHomeFile.

-ccDst
Description: Optional cc specification. You can specify multiple recipients.
Example: -ccDst '[email protected],[email protected]'

-callHome
Description: Optional file.
Example: -callHome yes

Event email aggregation


Some events, such as notifications of power recycling or host-to-switch network
connectivity failures, can generate many email messages. To avoid filling your
inbox with email, you can aggregate your event rule notifications by defining a
system-wide aggregation time interval and a per-event-rule notification count.

If you set email aggregation, the system aggregates the events and sends a single
email per event rule when either the events-per-rule count reaches the threshold
for that rule or the aggregation time interval expires.

Note: You specify aggregation only for event rules that send email, not for event
rules that run commands.
Related concepts:
“Event notifications” on page 8-12
Related reference:
“Hardware restarted” on page 8-22
If you enable the event rule HardwareRestarted, you receive notifications when
each SPU successfully restarts (after the initial startup). Restarts are usually related
to a software fault, whereas hardware causes can include uncorrectable memory
faults or a failed disk driver interaction.
“Disk space threshold notification” on page 8-22



Setting the system-wide aggregation time interval
About this task

You can enable event aggregation system-wide and specify the time interval. You
can specify 0 - 86400 seconds. If you specify 0 seconds, there is no aggregation,
even if aggregation is specified on individual events.

To set system-wide aggregation, complete the following steps:

Procedure
1. Pause the system using the command nzsystem pause -u bob -pw 1234 -host
nzhost
2. Specify aggregation of 2 minutes (120 seconds), enter nzsystem set -arg
sysmgr.maxAggregateEventInterval=120
3. Resume the system, enter nzsystem resume -u bob -pw 1234 -host nzhost
4. Display the aggregation setting, enter nzsystem showRegistry | grep
maxAggregateEventInterval

Specify event rule email aggregation


When you add or enable (modify) an event that specifies email notification, you
can enable event aggregation by specifying the aggregation count.

To aggregate email messages for the event rule NPSNoLongerOnline, enter:


nzevent modify -u admin -pw password -name NPSNoLongerOnline -on
yes -dst [email protected] -eventAggrCount 1

You can specify any aggregation count 1 - 1000.


v If you issue the nzstop command, the system sends no in-memory aggregations;
instead, it updates the event log. In such cases, check the event log, especially if
the aggregation interval is 15 minutes or longer.
v If you modify or delete an event rule, the system flushes all events that are
aggregated for the event rule.

Disable individual event rule email aggregation


If event rule aggregation is enabled system wide, you can disable event rule
aggregation for individual event rules by setting the count to 0.

To disable email aggregation for the event rule NPSNoLongerOnline, enter:


nzevent modify -u admin -pw password -name NPSNoLongerOnline -on
yes -dst [email protected] -eventAggrCount 0

Sample aggregated email messages


The aggregated email describes the number of messages that are aggregated and
the time interval for the event rule.

The body of the message lists the messages by time, with the earliest events first.
The Reporting Interval indicates whether the notification trigger was the count or
time interval. The Activity Duration indicates the time interval between the first
and last event so that you can determine the granularity of the events.

For example, the following aggregation is for the Memory ECC event:
Subject: NPS nzdev1 : 2 occurrences of Memory ECC Error from 11-Jun-07
18:41:59 PDT over 2 minutes.

Date: Sun, 11 Jun 2007 18:44:05 PDT



From: NPS Event Manager <[email protected]>
Reply-To: NPS Event Manager <[email protected]>

To: Jane Doe


Message Header
Host : nzdev1.
Event : Memory ECC Error.
Event Rule Detail .
Start : 06-11-07 18:41:59.
Reporting Interval : 2 minutes.
Activity Duration : 00:00:05.
Number of events : 2.
Message Details
1 hwType=spu, hwId=1002, spaId=1, spaSlot=4, errType=2,
errCode=12,devSerial=040908061230, devHwRev=5.20814r1.20806r1,
devFwRev=3.0B1 BLD[4428], eventSource=System initiated, Memory ECC
Error on 06-11-07 18:42:05 PDT
2 hwType=spu, hwId=1002, spaId=1, spaSlot=4, errType=2, errCode=12,
devSerial=040908061230, devHwRev=5.20814r1.20806r1, devFwRev=3.0B1
BLD[4428], eventSource=System initiated, Memory ECC Error on 06-11-07
18:42:10 PDT

Creating a custom event rule


About this task

You can use the Custom1 and Custom2 event rules to define and generate events
of your own design for conditions that are not already defined as events by the
NPS software. An example of a custom event might be to track the user login
information, but these events can also be used to construct complex events.

If you define a custom event, you must also define a process to trigger the event
using the nzevent generate command. Typically, these events are generated by a
customer-created script which is invoked in response to either existing NPS events
or other conditions that you want to monitor.

To create a custom event rule, complete the following steps:

Procedure
1. Use the nzevent add command to define a new event type. Custom events are
never based on any existing event types. This example creates three different
custom events. NewRule4 and NewRule5 use the variable eventType to
distinguish between the event types. The NewRule6 event type uses a custom
variable and compares it with the standard event type.
[nz@nzhost ~]$ nzevent add -eventType custom1 -name NewRule4
-notifyType email -dst [email protected] -msg "NewRule4 message"
-eventArgsExpr ’$eventType==RandomCustomEvent’

[nz@nzhost ~]$ nzevent add -eventType custom1 -name NewRule5


-notifyType email -dst [email protected] -msg "NewRule5 message"
-eventArgsExpr ’$eventType==sysStateChanged’

[nz@nzhost ~]$ nzevent add -eventType custom1 -name NewRule6


-notifyType email -dst [email protected] -msg "NewRule6 message"
-eventArgsExpr ’$randomEventType==sysStateChanged’
2. Use the nzevent generate command to trigger the custom events.
[nz@nzhost ~]$ nzevent generate -eventtype custom1
-eventArgs ’eventType=RandomCustomEvent’
[nz@nzhost ~]$ nzevent generate -eventtype custom1
-eventArgs ’eventType=sysStateChanged’
[nz@nzhost ~]$ nzevent generate -eventtype custom1
-eventArgs ’randomEventType=sysStateChanged’



3. After generating these custom events, email is sent to the specified destination
and the following messages are logged in the /nz/kit/log/eventmgr/
eventmgr.log file:
2015-11-24 09:43:31.820612 EST (16210) Info: received & processing event
type =custom1, event args = ’eventType=RandomCustomEvent’ event source =
’User initiated event’
2015-11-24 09:43:31.820724 EST (16210) Info: invoking mail notifier,
cmd = ’/nz/kit.7.2.1.0.45837/sbin/sendMail -dst "[email protected]"
-msg "NewRule4 message"’
2015-11-24 09:43:31.838814 EST (16210) Info: received & processing event
type =custom1, event args = ’eventType=sysStateChanged’ event source =
’User initiated event’
2015-11-24 09:43:31.838920 EST (16210) Info: invoking mail notifier,
cmd = ’/nz/kit.7.2.1.0.45837/sbin/sendMail -dst "[email protected]"
-msg "NewRule5 message"’
2015-11-24 09:43:32.745636 EST (16210) Info: received & processing event
type =custom1, event args = ’randomEventType=sysStateChanged’ event
source = ’User initiated event’
2015-11-24 09:43:32.745802 EST (16210) Info: invoking mail notifier,
cmd = ’/nz/kit.7.2.1.0.45837/sbin/sendMail -dst "[email protected]"
-msg "NewRule6 message"’

What to do next

Consider creating a script that runs the nzevent generate command as needed
when your custom events occur.
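
The following is a minimal sketch of such a script. The use case (noticing new
database sessions), the counter file path, and the eventType value
NewSessionDetected are illustrative assumptions and are not part of the NPS
software; only the nzevent generate and nzsql commands themselves are standard.

#!/bin/bash
# Hypothetical wrapper script: raise a custom1 event when the number of
# database sessions increases. Adjust the query, paths, and event arguments
# for your own monitoring needs.
COUNT_FILE=/tmp/last_session_count
CURRENT=$(nzsql -A -t -c "SELECT count(*) FROM _v_session;")
LAST=$(cat "$COUNT_FILE" 2>/dev/null || echo 0)

if [ "$CURRENT" -gt "$LAST" ]; then
    # The event manager matches the -eventArgs values against the
    # -eventArgsExpr of any enabled custom1 rules (such as NewRule4).
    nzevent generate -eventtype custom1 \
        -eventArgs "eventType=NewSessionDetected, sessionCount=$CURRENT"
fi
echo "$CURRENT" > "$COUNT_FILE"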

Template event reference


The following sections describe the predefined template event rules in more detail.
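
The templates are defined with -on no (disabled) and with placeholder email
addresses, so a typical first step is to list the rules that are defined on your
system and then enable the templates that you want with your own destination
address. The following is a minimal sketch that uses the SystemOnline rule
described in the next section; the credentials and address are placeholders, and
you can verify the available options with nzevent -h on your release.
[nz@nzhost ~]$ nzevent show -u admin -pw password
[nz@nzhost ~]$ nzevent modify -u admin -pw password -name SystemOnline
-on yes -dst '[email protected]'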

System state changes


The NPSNoLongerOnline and SystemOnline rules enable the system to notify you
when the system state changes, or when a state change exceeds a specified
timeout.

These events occur when the system is running. The typical states are
v Online
v Pausing Now
v Going Pre-Online
v Resuming
v Going OffLine Now
v Offline (now)
v Initializing
v Stopped
The Failing Back and Synchronizing states apply only to z-series systems.

The following is the syntax for the template event rule NPSNoLongerOnline:
-name NPSNoLongerOnline -on no -eventType sysStateChanged
-eventArgsExpr ’$previousState == online && $currentState != online’
-notifyType email -dst ’[email protected]’ -ccDst ’’ -msg ’NPS system
$HOST went from $previousState to $currentState at $eventTimestamp
$eventSource.’ -bodyText ’$notifyMsg\n\nEvent:\n$eventDetail\n’
-callHome yes -eventAggrCount 1

The following is the syntax for event rule SystemOnline:

-name SystemOnline -on no -eventType sysStateChanged -eventArgsExpr
’$previousState != online && $currentState == online’ -notifyType
email -dst ’[email protected]’ -ccDst ’’ -msg ’NPS system $HOST went
online at $eventTimestamp $eventSource.’ -bodyText
’$notifyMsg\n\nEvent:\n$eventDetail\n’ -callHome yes -eventAggrCount
50

The valid values for the previousState and currentState arguments are:
initializing pausedNow syncingNow
initialized preOnline syncedNow
offlining preOnlining failingBack
offliningNow resuming failedBack
offline restrictedResuming maintaining
offlineNow stopping maintain
online stoppingNow recovering
restrictedOnline stopped recovered
pausing stoppedNow down
pausingNow syncing unreachable
paused synced badState

For more information about states, see Table 5-4 on page 5-10.

The following table describes the state changes.


Table 8-7. System state changes
Previous state | Current state | Severity | Notify | Impact | Action
Online | Not Online | Varies | Admins, NPS, DBAs | System no longer processing queries | Determine cause
Not Online | Online | n/a | Admins, NPS, DBAs | Normal | None
Not Synchronizing | Synchronizing | n/a | Admins, NPS | Query processing is suspended until complete | None
Synchronizing | Not Synchronizing | n/a | Admins, NPS | Query processing is resumed when Online | Contact IBM Netezza
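
The template rules cover the general online and not-online transitions. If you
also want a separate notification for one specific transition, for example when
the system reaches the down state, you can add your own rule for the same event
type. The following is a minimal sketch; the rule name, destination address, and
chosen target state are illustrative assumptions, not predefined rules.
[nz@nzhost ~]$ nzevent add -name SystemWentDown -eventType sysStateChanged
-eventArgsExpr '$previousState == online && $currentState == down'
-notifyType email -dst '[email protected]'
-msg 'NPS system $HOST went from $previousState to $currentState at
$eventTimestamp'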

Hardware service requested


It is important to be notified when a hardware component fails so that Support can
notify service technicians who can replace or repair the component. For devices
such as disks, a hardware failure causes the system to bring a spare disk online,
and after an activation period, the spare disk transparently replaces the failed disk.
However, it is important to replace the failed disk with a healthy disk so that you
restore the system to its normal operation with its complement of spares.

In other cases, such as SPU failures, the system reroutes the work of the failed SPU
to the other available SPUs. The system performance is affected because the
healthy resources take on extra workload. Again, it is critical to obtain service to
replace the faulty component and restore the system to its normal performance.

If you enable the event rule HardwareServiceRequested, the system generates a
notification when there is a hardware failure and service technicians might be
required to replace or repair components.

The following is the syntax for the event rule HardwareServiceRequested:

-name ’HardwareServiceRequested’ -on no -eventType hwServiceRequested
-eventArgsExpr ’’ -notifyType email -dst ’[email protected]’ -ccDst ’’
-msg ’NPS system $HOST - Service requested for $hwType $hwId at
$eventTimestamp $eventSource.’ -bodyText
’$notifyMsg\n\nlocation:$location\nerror
string:$errString\ndevSerial:$devSerial\nevent source:$eventSource\n’
-callHome yes -eventAggrCount 0

The following table lists the arguments to the HardwareServiceRequested event
rule.
Table 8-8. HardwareServiceRequested event rule
Arguments Description Example
hwType The type of hardware affected spu, disk, pwr, fan, mm
hwId The hardware ID of the component 1013
that reports a problem
location A string that describes the physical
location of the component
errString Specifies more information about the
error or condition that triggered the
event. If the failed component is not
inventoried, it is specified in this
string.
devSerial Specifies the serial number of the 601S496A2012
component, or Unknown if the
component has no serial number.

Restriction: Do not aggregate this event.

For source disks used in a disk regeneration to a spare disk, the
HardwareServiceRequested event also notifies you when regeneration encounters a
read sector error on the source disk. The event helps you to identify when a
regeneration requires some attention to address possible issues on the source and
newly created mirror disks. The error messages in the event notification and in the
sysmgr.log and eventmgr.log files contain information about the bad sector, as in
the following example:
2012-04-05 19:52:41.637742 EDT Info: received & processing event type
= hwServiceRequested, event args = ’hwType=disk, hwId=1073,
location=Logical Name:’spa1.diskEncl2.disk1’ Logical Location:’1st
rack, 2nd disk enclosure, disk in Row 1/Column 1’, errString=disk md:
md2 sector: 2051 partition type: DATA table: 201328,
devSerial=9QJ2FMKN00009838VVR9...

The errString value contains more information about the sector that had a read
error:
v The md value specifies the RAID device on the SPU that encountered the issue.
v The sector value specifies which sector in the device has the read error.
v The partition type specifies whether the partition is a user data (DATA) or
SYSTEM partition.
v The table value specifies the table ID of the user table that is affected by the bad
sector.

If the system notifies you of a read sector error, contact IBM Netezza Support for
assistance with troubleshooting and resolving the problems.

Hardware needs attention
The system monitors the overall health and status of the hardware and can notify
you when changes occur that can affect the system availability or performance.
These changes can include replacement disks with invalid firmware, storage
configuration changes, unavailable/unreachable components, disks that reach a
grown defects early warning threshold, ethernet switch ports that are down, and
other conditions that can be early warnings of problems that can affect system
behavior or the ability to manage devices within the system.

If you enable the HardwareNeedsAttention event rule, the system generates a notification
when it detects conditions that can lead to problems or that serve as symptoms of
possible hardware failure or performance impacts.

The following is the syntax for the HardwareNeedsAttention event rule:


-name ’HardwareNeedsAttention’ -on no -eventType hwNeedsAttention
-eventArgsExpr ’’ -notifyType email -dst ’[email protected]’ -ccDst ’’
-msg ’NPS system $HOST - $hwType $hwId Needs attention. $eventSource.’
-bodyText ’$notifyMsg\n\nlocation:$location\nerror
string:$errString\ndevSerial:$devSerial\nevent source:$eventSource\n’
-callHome yes -eventAggrCount 0

The following table lists the arguments to the HardwareNeedsAttention event rule.
Table 8-9. HardwareNeedsAttention event rule
Arguments Description Example
hwType The type of hardware affected spu
hwId The hardware ID of the component 1013
that has a condition to investigate
location A string that describes the physical
location of the component
errString If the failed component is not
inventoried, it is specified in this
string.
devSerial Specifies the serial number of the 601S496A2012
component, or Unknown if the
component has no serial number.

Restriction: Do not aggregate this event.

Hardware path down


If the path between an S-Blade/SPU and its disks fails, and you enable the
HwPathDown event rule, the system generates a notification when it detects that a
storage path has transitioned from the Up to the Down state. Failed paths
adversely affect system and query performance.

The following is the syntax for the HwPathDown event rule:


-name ’HwPathDown’ -on no -eventType hwPathDown -eventArgsExpr ’’
-notifyType email -dst ’[email protected]’ -ccDst ’’
-msg ’NPS system $HOST - $hwType $hwId - Hardware Path Down.
$eventSource.’ -bodyText ’$notifyMsg\n\nlocation:$location\nerror
string:$errString\ndevSerial:$devSerial\nevent source:$eventSource\n’
-callHome yes -eventAggrCount 1000

Note: The aggregation count of 1000 is large because some kinds of storage
failures can cause hundreds of path failures on large, multi-rack systems. The
aggregation count reduces the number of email notifications for those cases. All
path failures in the last 2 minutes are grouped into a single notification email.

The following table lists the arguments to the HwPathDown event rule.
Table 8-10. HwPathDown event rule
Arguments | Description | Example
hwType | For a path down event, the SPU that reported the problem | SPU
hwId | The hardware ID of the SPU that loses path connections to disks | 1013
location | A string that describes the physical location of the SPU | First Rack, First SPA, SPU in third slot
errString | If the failed component is not inventoried, it is specified in this string. | Disk path event:Spu[1st Rack, 1st SPA, SPU in 5th slot] to Disk [disk hwid=1034 sn="9WK4WX9D00009150ECWM" SPA=1 Parent=1014 Position=12 Address=0x8e92728 ParentEnclPosition=1 Spu=1013] (es=encl1Slot12 dev=sdl major=8 minor=176 status=DOWN)

If you are notified of hardware path down events, contact IBM Netezza Support
and alert them to the path failure or failures. It is important to identify and resolve
the issues that are causing path failures to return the system to optimal
performance as soon as possible.

A sample email follows:


Event:
Message Header
Host:nzhost.
Event:Hardware Path Down.
Event Rule Detail:.
Start: 11-08-11 11:10:41 EST.
Reporting Interval: 2 minutes.
Activity Duration:00:00:01.
Number of events:12.

Message Details

1 hwType=SPU, hwId=1017, location=1st Rack, 1st SPA, SPU in 9th slot,
devSerial=Y011UF0CJ23G, eventSource=system, errString=Disk path
event:Spu\[1st Rack, 1st SPA, SPU in 9th slot\] to
Disk\[sn=9QJ60E9M000090170SXW hwid=1027 eshp=NA es=encl4Slot01 dev=sda
Major=8 Minor=0 status=DOWN]

If you receive a path down event, you can obtain more information about the
problems. This information might be helpful when you contact Netezza Support.

To see whether there are current topology issues, use the nzds show -topology
command. The command displays the current topology, and if there are issues, a
WARNING section at the end of the output.
Related concepts:
“System resource balance recovery” on page 5-17

“Active path topology” on page 5-29
Related reference:
“Start the system” on page 7-6

Hardware restarted
If you enable the event rule HardwareRestarted, you receive notifications when
each SPU successfully restarts (after the initial startup). Restarts are usually related
to a software fault, whereas hardware causes can include uncorrectable memory
faults or a failed disk driver interaction.

The following is the syntax for the event rule HardwareRestarted:


-name HardwareRestarted -on no -eventType hwRestarted -eventArgsExpr
’’ -notifyType email -dst ’[email protected]’ -ccDst ’’ -msg ’NPS system
$HOST - $hwType $hwId restarted at $eventTimestamp.’ -bodyText
’$notifyMsg\n\nSPA ID: $spaId\nSPA Slot: $spaSlot\n’ -callHome yes
-eventAggrCount 50

You can modify the event rule to specify that the system include the device serial
number, its hardware revision, and firmware revision as part of the message,
subject, or both.

The following table describes the arguments to the HardwareRestarted event rule.
Table 8-11. HardwareRestarted event rule
Arguments Description Example
hwType The type of hardware affected spu
hwId The hardware ID of the SPU that restarted 1013
spaId The ID of the SPA A number 1 - 32
spaSlot The SPA slot number Usually a slot number from 1
to 13
devSerial The serial number of the SPU 601S496A2012
devHwRev The hardware revision 7.21496rA2.21091rB1
devFwRev The firmware revision 1.36

Related concepts:
“Event email aggregation” on page 8-14

Disk space threshold notification


You can use the hwDiskFull event type (used by the default event rules
Disk80PercentFull and Disk90PercentFull) to receive notification when the disk
space on any of the system’s SPUs becomes more than 80 - 85 percent, or 90 - 95
percent, full.

The following is the syntax for the event rule Disk80PercentFull:


-name Disk80PercentFull -on no -eventType hwDiskFull -eventArgsExpr
’$threshold == 80 || $threshold == 85’ -notifyType email -dst
’[email protected]’ -ccDst ’’ -msg ’NPS system $HOST - $hwType $hwId
$partition partition is $value % full at $eventTimestamp.’ -bodyText
’$notifyMsg\n\nSPA ID: $spaId\nSPA Slot: $spaSlot\nThreshold:
$threshold\nValue: $value\n’ -callHome yes -eventAggrCount 50

The following is the syntax for the event rule Disk90PercentFull:

-name Disk90PercentFull -on no -eventType hwDiskFull -eventArgsExpr
’$threshold == 90 || $threshold == 95’ -notifyType email -dst ’<your
email here>’ -ccDst ’’ -msg ’URGENT: NPS system $HOST - $hwType $hwId
$partition partition is $value % full at $eventTimestamp.’ -bodyText
’$notifyMsg\n\nSPA ID: $spaId\nSPA Slot: $spaSlot\nThreshold:
$threshold\nValue: $value\n’ -callHome yes -eventAggrCount 50

The following table lists the arguments to the DiskSpace event rules.
Table 8-12. DiskSpace event rules
Arguments Description Example
hwType The type of hardware affected spu, disk
hwId The hardware ID of the disk that has 1013
the disk space issue
spaId The ID of the SPA
spaSlot The SPA slot number
partition The data slice number 0,1,2,3
threshold The threshold value 75, 80, 85, 90, 95
value The actual percentage full value 84

After you enable the event rule, the event manager sends you an email when the
system disk space percentage exceeds the first threshold and is below the next
threshold value. The event manager sends only one event per sampled value.

For example, if you enable the event rule Disk80PercentFull, which specifies
thresholds 80 and 85 percent, the event manager sends you an email when the disk
space is at least 80, but less than 85 percent full. When you receive the email, your
actual disk space might be 84 percent full.

The event manager maintains thresholds for the values 75, 80, 85, 90, and 95. Each
of these values (except for 75) can be in the following states:
Armed
The system has not reached this value.
Disarmed
The system has exceeded this value.
Fired The system has reached this value.
Rearmed
The system has fallen below this value.

Note: If you enable an event rule after the system reached a threshold, you are not
notified that it reached this threshold until you restart the system.

The following table lists these thresholds and their states.


Table 8-13. Threshold and states
Threshold Armed Triggered Disarmed Rearmed
75 never never never never
80 startup >= 80 && < 85 >= 80 < 75
85 startup >= 85 && < 90 >= 85 < 80
90 startup >= 90 && < 95 >= 90 < 85
95 startup >= 95 >= 95 < 90

After the IBM Netezza System Manager sends an event for a particular threshold,
it disarms all thresholds at or below that value. (So if 90 is triggered, it does not
trigger again until it is rearmed). The Netezza System Manager rearms all
disarmed higher thresholds when the disk space percentage full value falls below
the previous threshold, which can occur when you delete tables or databases. The
Netezza System Manager arms all thresholds (except 75) when the system starts
up.

Tip: To ensure maximum coverage, enable both event rules Disk80PercentFull and
Disk90PercentFull.

To send an email when the disk is more than 80 percent full, enable the predefined
event rule Disk80PercentFull:
nzevent modify -u admin -pw password -name Disk80PercentFull -on
yes -dst [email protected]

If you receive a diskFull notification from one or two disks, your data might be
unevenly distributed across the data slices (data skew). Data skew can adversely
affect performance for the tables that are involved and for combined workloads.

Tip: Consider aggregating the email messages for this event. Set the aggregation
count to the number of SPUs.
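
For example, on a system with 28 SPUs you might enable the rule and set a
matching aggregation count. This is a minimal sketch with placeholder
credentials and addresses; it assumes that nzevent modify accepts the
-eventAggrCount attribute on your release, which you can confirm with
nzevent modify -h.
[nz@nzhost ~]$ nzevent modify -u admin -pw password -name Disk80PercentFull
-on yes -dst '[email protected]' -eventAggrCount 28
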
Related concepts:
“Data skew” on page 12-10
“Event email aggregation” on page 8-14

Runaway query notification


You can use the RunAwayQuery event type to monitor queries that exceed
configured query timeout limits.

The runaway query timeout is a limit that you can specify system-wide (for all
users), or for specific groups or users. The default query timeout is unlimited for
users and groups, but you can establish query timeout limits by using a system
default setting, or when you create or alter users or groups. The runaway query
timeout limit does not apply to the admin database user.

The following is the syntax for the event rule RunAwayQuery:


-name ’RunAwayQuery’ -on no -eventType runawayQuery -eventArgsExpr ’’
-notifyType email -dst ’[email protected]’ -ccDst ’’ -msg ’NPS system
$HOST - long-running query detected at $eventTimestamp.’ -bodyText
’$notifyMsg\n\nsessionId: $sessionId\nplanId: $planId\nduration:
$duration seconds’ -callHome yes -eventAggrCount 0

The following table lists the arguments to the RunAwayQuery event rule. The
arguments are case-sensitive.

Table 8-14. RunAwayQuery event rule
Arguments Description Examples
sessionId The ID of the runaway session Use these arguments for the email
message.
planId The ID of the plan
duration The amount of time (in seconds) that
the query was running when it
exceeded its timeout.

Note: Typically you do not aggregate this event because you should consider the
performance impact of each individual runaway query.

When you specify the duration argument in the -eventArgsExpr string, you can
use an operator such as ‘==’, ‘!=’, ‘>’, ‘>=’, ‘<’, or ‘<=’ to control when to send
the event notification. Use the greater-than (or less-than) versions of the operators
rather than an exact equality comparison, which might never match the sampled
duration value. For example, to ensure that a
notification event is triggered when the duration of a query exceeds 100 seconds,
specify the -eventArgsExpr as follows:
-eventArgsExpr ’$duration > 100’
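
For example, to enable the template so that it notifies you only for queries that
run longer than one hour, you might combine the expression with the enablement
attributes. This is a minimal sketch with placeholder credentials and addresses;
it assumes that nzevent modify accepts the -eventArgsExpr attribute, which you
can confirm with nzevent modify -h.
[nz@nzhost ~]$ nzevent modify -u admin -pw password -name RunAwayQuery
-on yes -dst '[email protected]' -eventArgsExpr '$duration >= 3600'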

If a query exceeds its timeout threshold and you added a runaway query rule, the
system sends you an email that informs you how long the query ran. For example:
NPS system alpha - long-running query detected at 07-Nov-03, 15:43:49
EST.
sessionId: 10056
planId: 27
duration: 105 seconds
Related concepts:
“Query timeout limits” on page 11-37
You can place a limit on the amount of time a query is allowed to run before the
system notifies you by using the runaway query event. The event email shows
how long the query has been running, and you can decide whether to terminate
the query.

System state
You can also monitor for events when a system is “stuck” in the Pausing Now
state. The following is the syntax for event rule SystemStuckInState:
-name ’SystemStuckInState’ -on no -eventType systemStuckInState
-eventArgsExpr ’’ -notifyType email -dst ’<your email here>’ -ccDst ’’
-msg ’NPS system $HOST - System Stuck in state $currentState for
$duration seconds’ -bodyText ’The system is stuck in state change.
Contact Netezza support team\nduration: $duration seconds\nCurrent
State: $currentState\nExpected State: $expectedState’ -callHome yes
-eventAggrCount 0

It is important to monitor the transition to or from the Online state because that
transition affects system availability.

Disk predictive failure errors event


The hard disks where your user databases reside record certain performance and
reliability data as they perform I/O. This status is referred to as Self-Monitoring
Analysis and Reporting Technology (SMART) status. You can use the event
manager to notify you when certain threshold values are crossed in the recorded
performance or reliability data.

Exceeding these thresholds might indicate that the disk has started to run poorly
(that is, it reads or writes data more slowly than it should), which affects the
speed at which queries are processed. It might even indicate that the disk is
likely to fail soon.

IBM Netezza sets the thresholds that are based on analysis of disk drives and their
performance characteristics. If you receive any of these events, contact Netezza
Support and have them determine the state of your disk. Do not aggregate these
events. The templates do not aggregate these events by default.

The following is the syntax for the event rule SCSIPredictiveFailure event:
-name ’SCSIPredictiveFailure’ -on no -eventType scsiPredictiveFailure
-eventArgsExpr ’’ -notifyType email -dst ’[email protected]’ -ccDst ’’
-msg ’NPS system $HOST - SCSI Predictive Failure value exceeded for
disk $diskHwId at $eventTimestamp’ -bodyText
’$notifyMsg\n\nspuHwId:$spuHwId\ndisk location:$location\n
scsiAsc:$scsiAsc\nscsiAscq:$scsiAscq\nfru:$fru\ndevSerial:$devSerial\n
diskSerial:$diskSerial\ndiskModel:$diskModel\ndiskMfg:$diskMfg\n
event source:$eventSource\n’ -callHome no -eventAggrCount 0

The following table lists the output from the SCSIPredictiveFailure event rule.
Table 8-15. SCSIPredictiveFailure event rule
Arguments Description Example
spuHwId The hardware ID of the SPU that owns or
manages the disk that reported the event
diskHwId The hardware ID of the disk 1013
scsiAsc The attribute sense code, which is an Vendor specific
identifier of the SMART attribute
scsiAscq The attribute sense code qualifier of the Vendor specific
SMART attribute
fru The FRU ID for the disk
location The location of the disk
devSerial The serial number of the SPU to which 601S496A2012
the disk is assigned
diskSerial The disk serial number 7.21496rA2.21091rB1
diskModel The disk model number
diskMfg The disk manufacturer

Restriction: Do not aggregate this event.

Regeneration errors
If the system encounters hardware problems while it attempts to set up or perform
a regeneration, the system triggers a RegenFault event rule.

The following is the syntax for the event rule RegenFault:


-name ’RegenFault’ -on no -eventType regenFault -eventArgsExpr ’’
-notifyType email -dst ’<your email here>’ -ccDst ’’ -msg ’NPS system
$HOST - regen fault on SPU $hwIdSpu.’ -bodyText
’$notifyMsg\n\nhwIdSrc:$hwIdSrc\nsource
location:$locationSrc\nhwIdTgt:$hwIdTgt\ntarget
location:$locationTgt\ndevSerial:$devSerial\nerror
string:$errString\nevent source:$eventSource\n’ -callHome no
-eventAggrCount 0

The event rule RegenFault is not enabled by default.

The following table lists the output from the event rule RegenFault.
Table 8-16. RegenFault event rule
Arguments Description Examples
hwIdSpu The hardware ID of the SPU that owns or 1013
manages the problem disk
hwIdSrc The hardware ID of the source disk
locationSrc The location string of the source disk
hwIdTgt The hardware ID of the target spare disk
locationTgt The location string of the target disk
errString The error string for the regeneration issue
devSerial The serial number of the owning or reporting
SPU

Restriction: Do not aggregate this event.

Disk errors event


When the disk driver detects an error, it notifies the system. If a serious error
occurs, the system fails over the disk. You can also configure the event manager to
notify you with email when the disk is failed over.

Note: If you receive a significant number of disk error messages, contact IBM
Netezza Support to investigate the state of your disks.

If you enable the event rule SCSIDiskError, the system sends you an email message
when it fails a disk.

The following is the syntax for the event rule SCSIDiskError:


-name ’SCSIDiskError’ -on no -eventType scsiDiskError -eventArgsExpr
’’ -notifyType email -dst ’<your email here>’ -ccDst ’’ -msg ’NPS
system $HOST - disk error on disk $diskHwId.’ -bodyText
’$notifyMsg\nspuHwId:$spuHwId\ndisk location:$location\nerrType:$errType\n
errCode:$errCode\noper:$oper\ndataPartition:$dataPartition\nlba:$lba\n
dataSliceId:$dataSliceId\ntableId:$tableId\nblock:$block\n
devSerial:$devSerial\nfpgaBoardSerial:$fpgaBoardSerial\n
diskSerial:$diskSerial\ndiskModel:$diskModel\ndiskMfg:$diskMfg\n
event source:$eventSource\n’ -callHome no -eventAggrCount 0

The following table lists the output from the SCSIDiskError event rule.
Table 8-17. SCSIDiskError event rule
Argument Description Examples
spuHwId The hardware ID of the SPU that owns or manages
the disk or FPGA
diskHwId The hardware ID of the disk where the error 1013
occurred
location The location string for the disk
errType The type of error, that is, whether the error is a 1 (Failure), 2 (Failure imminent),
failure, an imminent failure, a possible failure, or unknown 3 (Failure possible), 4 (Failure unknown)
errCode The error code that specifies the cause of the error 110
oper The operation in progress when the disk driver 0 (read), 1 (write)
encountered the error; the possible values are read
or write
dataPartition The data partition number on which the error 1
occurred
lba The logical block address where the error occurred 145214014
tableId The table ID where the error occurred 200350
dataSliceId The data slice ID where the error occurred 3
block The table-relative block number where the error 9
occurred
devSerial The serial number of the SPU that owns the disk or
FPGA
fpgaBoardSerial The serial number of the Netezza DB Accelerator
card where the FPGA resides
diskSerial The disk serial number 7.21496rA2.21091rB1
diskModel The disk model number WesternDigital
diskMfg The disk manufacturer

Hardware temperature event


The event manager monitors the hardware temperature of key components within
the system to maintain reliability and prevent failures because of overheating. The
system monitors the actual temperatures from SPUs and disk enclosures. If the
internal temperature rises above specified operational levels, the system sends the
hwThermalFault event through the event rule ThermalFault. Some hardware
components and models do not report thermal settings. For example, the S-blades
(SPUs) on IBM PureData System for Analytics N200x systems do not report
thermal data.

Running a system at elevated temperatures can adversely affect disk life
expectancy. If you receive a hardware temperature event, do the following:
v Physically investigate the machine room.
v Verify that the ambient temperature is within acceptable limits.
v Check that the airflow to and from the Netezza system is not occluded.
v Verify that there are no signs of combustion.
v Check that the cooling components (fans, blowers, or both) are functioning
properly.
v Check the temperature event emails for specific details.

In some cases, you might need to replace components such as cooling units (fans,
blowers, or both), or perhaps a SPU.

The following is the syntax for event rule ThermalFault:


-name ’ThermalFault’ -on no -eventType hwThermalFault -eventArgsExpr
’’ -notifyType email -dst ’[email protected]’ -ccDst ’’ -msg ’NPS system
$HOST -$hwType $hwId Hardware Thermal Fault at $eventTimestamp’
-bodyText
’$notifyMsg\n\nlabel:$label\nlocation:$location\ncurVal:$curVal\nerror
string:$errString\nevent source:$eventSource\n’ -callHome no
-eventAggrCount 0

The following table lists the output from the ThermalFault event rule.
Table 8-18. ThermalFault event rule
Argument Description Examples
hwType The hardware type where the error occurred SPU* or disk enclosure
hwId The hardware ID of the component where the 1013
fault occurred
label The label for the temperature sensor. For the
IBM Netezza Database Accelerator card, this
label is the BIE temperature. For a disk
enclosure, it is temp-1-1 for the first
temperature sensor on the first enclosure.
location A string that describes the physical location of
the component
curVal The current temperature reading for the
hardware component
errString The error message The board temperature
for the SPU exceeded 45
degrees centigrade.

The default behavior is to send email notification.

System temperature event


For a Netezza z-series (also called Mustang) system, the system temperature
event can notify you when three boards (SPUs or SFIs) in an SPA reach the red
threshold.

The following is the syntax for event rule Syste