IBM Netezza System Administrator's Guide
Release 7.2.1.1
Note
Before using this information and the product it supports, read the information in “Notices” on page D-1.
Contents

Criteria for selecting distribution keys  12-7
Choose a distribution key for a subset table  12-7
Distribution keys and collocated joins  12-8
Dynamic redistribution or broadcasts  12-8
Verify distribution  12-8
Data skew  12-10
Specify distribution keys  12-10
View data skew  12-11
Clustered base tables  12-12
Organizing keys and zone maps  12-13
Select organizing keys  12-14
Reorganize the table data  12-14
Copy clustered base tables  12-15
Database statistics  12-15
Maintain table statistics automatically  12-16
GENERATE STATISTICS command  12-17
Just in Time statistics  12-17
Zone maps  12-18
Groom tables  12-20
GROOM and the nzreclaim command  12-20
Identify clustered base tables that require grooming  12-21
Organization percentage  12-22
Groom and backup synchronization  12-23
Session management  12-23
The nzsession command  12-23
Transactions  12-25
Transaction control and monitoring  12-25
Transactions per system  12-25
Transaction concurrency and isolation  12-26
Concurrent transaction serialization and queueing, implicit transactions  12-26
Concurrent transaction serialization and queueing, explicit transactions  12-27
Netezza optimizer and query plans  12-28
Execution plans  12-28
Display plan types  12-28
Analyze query performance  12-29
Query status and history  12-30

Chapter 13. Database backup and restore  13-1
General information about backup and restore methods  13-1
Backup options overview  13-2
Database completeness  13-3
Portability  13-3
Compression in backups and restores  13-4
Multi-stream backup  13-4
Multi-stream restore  13-5
Special columns  13-6
Upgrade and downgrade concerns  13-6
Compressed unload and reload  13-7
Encryption key management in backup and restore  13-7
File system connector for backup and recovery  13-7
Third-party backup and recovery solutions support  13-8
Host backup and restore  13-9
Create a host backup  13-10
Restore the host data directory and catalog  13-10
The nzbackup command  13-11
Command syntax for nzbackup  13-12
Specifying backup privileges  13-15
Examples of the nzbackup command  13-15
Backup archive directory  13-17
Incremental backups  13-18
Backup History report  13-20
Back up and restore users, groups, and permissions  13-21
The nzrestore command  13-22
The nzrestore command syntax  13-23
Specifying restore privileges  13-28
Examples of the nzrestore command  13-29
Database statistics after restore  13-30
Restore tables  13-30
Incremental restoration  13-31
Veritas NetBackup connector  13-34
Installing the Veritas NetBackup license  13-34
Configuring NetBackup for a Netezza client  13-35
Integrate Veritas NetBackup to Netezza  13-36
NetBackup troubleshooting  13-40
Procedures for backing up and restoring by using Veritas NetBackup  13-40
IBM Spectrum Protect (formerly Tivoli Storage Manager) connector  13-42
Tivoli Storage Manager backup integration  13-43
Tivoli Storage Manager encrypted backup support  13-43
Configuring the Netezza host  13-43
Configure the Tivoli Storage Manager server  13-48
Special considerations for large databases  13-54
The nzbackup and nzrestore commands with the Tivoli Storage Manager connector  13-57
Host backup and restore to the Tivoli Storage Manager server  13-57
Backing up and restoring data by using the Tivoli Storage Manager interfaces  13-58
Troubleshooting  13-60
EMC NetWorker connector  13-61
Preparing your system for EMC NetWorker integration  13-62
NetWorker installation  13-62
NetWorker configuration  13-62
NetWorker backup and restore  13-64
Host backup and restore  13-66
NetWorker troubleshooting  13-67

Chapter 14. History data collection  14-1
Types of history databases  14-1
History database versions  14-2
History-data staging and loading processes  14-2
History-data files  14-3
History log files  14-4
History event notifications  14-4
Setting up the system to collect history data  14-4
Planning for history-data collection  14-4
Creating a history database  14-5
Creating history configurations  14-5
Managing access to a history database  14-9
Managing the collection of history data  14-9
Changing the owner of a history database  14-9

The nzreclaim command  A-45
The nzrestore command  A-47
The nzrev command  A-47
The nzsession command  A-49
The nzspupart command  A-54
The nzstart command  A-56
The nzstate command  A-58
The nzstats command  A-60
The nzstop command  A-63
The nzsystem command  A-65
The nzzonemapformat command  A-68
Customer service troubleshooting commands  A-69
The nzconvertsyscase command  A-70
The nzdumpschema command  A-71
The nzinitsystem command  A-73
The nzlogmerge command  A-73

Host name and IP address changes  B-4
Rebooting the system  B-4
Reformat the host disks  B-5
Fix system errors  B-5
View system processes  B-5
Stop errant processes  B-5
Change the system time  B-6
Determine the kernel release level  B-6
Linux system administration  B-6
Display directories  B-7
Find files  B-7
Display file content  B-7
Find Netezza hardware  B-7
Time command execution  B-8
Set default command line editing  B-8
Miscellaneous commands  B-8
This equipment was tested and found to comply with the limits for a Class A
digital device, according to Part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against harmful interference when the equipment is
operated in a commercial environment. This equipment generates, uses, and can
radiate radio frequency energy and, if not installed and used in accordance with
the instruction manual, might cause harmful interference to radio communications.
Operation of this equipment in a residential area is likely to cause harmful
interference, in which case the user is required to correct the interference at their
own expense.
Properly shielded and grounded cables and connectors must be used to meet FCC
emission limits. IBM® is not responsible for any radio or television interference
caused by using other than recommended cables and connectors or by
unauthorized changes or modifications to this equipment. Unauthorized changes
or modifications might void the authority of the user to operate the equipment.
This device complies with Part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) this device might not cause harmful interference, and
(2) this device must accept any interference received, including interference that
might cause undesired operation.
Responsible manufacturer:
This device is authorized to bear the EC conformity mark (CE) in accordance with
the German EMVG (the German Electromagnetic Compatibility Act).
The manufacturer responsible for compliance with the EMC regulations is:
IBM Deutschland
Technical Regulations, Department M456
IBM-Allee 1, 71137 Ehningen, Germany
Telephone: +49 7032 15-2937
Email: [email protected]
This product is a Class A product based on the standard of the Voluntary Control
Council for Interference (VCCI). If this equipment is used in a domestic
environment, radio interference might occur, in which case the user might be
required to take corrective actions.
This equipment has been evaluated for electromagnetic compatibility for business
use (Type A). Sellers and users should note that it is intended for use in areas
other than the home.
Install the NPS® system in a restricted-access location. Ensure that only those
people trained to operate or service the equipment have physical access to it.
Ensure that each AC power outlet into which an NPS rack is plugged is located
near the rack and remains freely accessible.
The IBM PureData® System for Analytics appliance requires a readily accessible
power cutoff. This can be a Unit Emergency Power Off (UEPO) switch, a circuit
breaker, or the complete removal of power from the equipment by disconnecting
the Appliance Coupler (line cord) from all rack PDUs.
CAUTION:
Disconnecting power from the appliance without first stopping the NPS
software and high availability processes might result in data loss and increased
service time to restart the appliance. For all non-emergency situations, follow the
documented power-down procedures in the IBM Netezza System Administrator’s
Guide to ensure that the software and databases are stopped correctly, in order, to
avoid data loss or file corruption.
High leakage current. Earth connection essential before connecting supply. Courant
de fuite élevé. Raccordement à la terre indispensable avant le raccordement au
réseau.
Homologation Statement
This product may not be certified in your country for connection by any means
whatsoever to interfaces of public telecommunications networks. Further
certification may be required by law prior to making any such connection. Contact
an IBM representative or reseller for any questions.
These topics are written for system administrators and database administrators. In
some customer environments, these roles can be the responsibility of one person or
several administrators.
You should be familiar with Netezza concepts and user interfaces, as described in
the IBM Netezza Getting Started Tips. You should also be comfortable using
command-line interfaces, Linux operating system utilities, and window-based
administration interfaces, and with installing software on client systems to access
the Netezza appliance.
Administrator’s roles
IBM Netezza administration tasks typically fall into two categories:
System administration
Managing the hardware, configuration settings, system status, access, disk
space, usage, upgrades, and other tasks
Database administration
Managing the user databases and their content, loading data, backing up
data, restoring data, controlling access to data and permissions
In some customer environments, one person can be both the system and database
administrator to do the tasks when needed. In other environments, multiple people
might share these responsibilities, or they might own specific tasks or
responsibilities. You can develop the administrative model that works best for your
environment.
In addition to the administrator roles, there are also database user roles. A database
user is someone who has access to one or more databases and has permission to
run queries on the data that is stored within those databases. In general, database
users have access permissions to one or more user databases, or to one or more
schemas within databases, and they have permission to do certain types of tasks
and to create or manage certain types of objects within those databases.
Administration tasks
The administration tasks generally fall into these categories:
v Service level planning
v Deploying and installing Netezza clients
v Managing a Netezza system
v Managing system notifications and events
v Managing Netezza users and groups
v Managing databases
v Loading data (described in the IBM Netezza Data Loading Guide)
v Backing up and restoring databases
v Collecting and evaluating history data
v Workload management
Netezza Support and Sales representatives work with you to install and initially
configure the Netezza system in your customer environment. Typically, the initial
rollout consists of installing the system in your data center and then configuring
the system host name and IP address to connect the system to your network and
make it accessible to users. They also work with you to do initial studies of system
usage and query performance, and might recommend other configuration settings
or administration practices to improve the performance of, and access to, the
Netezza system for your users.
Related concepts:
“Linux users and groups required for HA” on page 4-17
The /nz directory is the top-level directory that contains the Netezza software
installation kits, data, and important information for the system and database. As a
best practice, use caution when you are viewing files in this directory or its
subfolders because unintended changes can impact the operation of the Netezza
system or cause data loss. Never delete or modify files or folders in the /nz
directory unless directed to do so by Netezza Support or an IBM representative.
Do not store large files, unrelated files, or backups in the /nz directory.
The system manager monitors the size of the /nz directory. If the /nz directory
reaches a configured usage percentage, the system manager stops the Netezza
software and logs a message in the sysmgr.log file. The default threshold is 95%,
which is specified by the value of the
sysmgr.hostFileSystemUsageThresholdToStopSystem registry setting. Do not
change the value of the registry setting unless directed to do so by Netezza
Support.
A sample sysmgr.log file message for a case where the /nz directory has reached
the configured 95% capacity threshold follows.
Error: File system /nz usage exceeded 95 threshold on rack1.host1 System will
be stopped
If the Netezza software stops and this message is in the sysmgr.log file, contact
Netezza Support for assistance to carefully review the contents of the /nz directory
and to delete appropriate files. When the /nz directory usage falls below the
configured threshold, you can start the Netezza software.
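For example, you can check the current usage of the /nz file system with a
standard Linux command such as df; the exact output columns depend on your
Linux release:
[nz@nzhost1 ~]$ df -h /nz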
CAUTION:
If you need to change the host name or IP address information, do not use the
general Linux procedures to change this information. Contact Netezza Support
for assistance to ensure that the changes are made by using Netezza procedures
and are propagated to the high availability configuration and related services.
To change the DNS settings for your system, use the nzresolv service to manage
the DNS updates. The nzresolv service updates the resolv.conf information
stored on the Netezza host; for highly available Netezza systems, the nzresolv
service updates the information stored on both hosts. (You can log in to either host
to do the DNS updates.) You must be able to log in as the root user to update the
resolv.conf information; any Linux user such as nz can display the DNS
information by using the show option.
The Netezza system manages the DNS services as needed during actions such as
host failovers from the master host to the standby host. Never manually restart the
nzresolv service unless directed to do so by Netezza Support for troubleshooting.
A restart can cause loss of contact with the localhost DNS service, and
communication issues between the host and the system hardware components. Do
not use any of the nzresolv subcommands other than update, status, or show
unless directed to do so by Netezza Support.
To display the current DNS information for the system, do the following steps:
Procedure
1. Log in to the active host as a Linux user such as nz.
2. Enter the following command:
[nz@nzhost1 ~]$ service nzresolv show
You update the DNS information by using the nzresolv service. You can change
the DNS information by using a text editor, and read the DNS information from a
file or enter it on the command line. Any changes that you make take effect
immediately (and on both hosts, for HA systems). The DNS server uses the
changes for the subsequent DNS lookup requests.
Procedure
1. Log in to either host as root.
2. Enter the following command:
[root@nzhost1 ~]# service nzresolv update
Note: If you use the service command to edit the DNS information, you must
use vi as the text editor tool, as shown in these examples. However, if you
prefer a different text editor, you can set the $EDITOR environment variable and
use the /etc/init.d/nzresolv update command to edit the files by using your
editor of choice.
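For example, to edit the information with a different editor such as nano (any
editor that is installed on the host can be used; the path shown is illustrative):
[root@nzhost1 ~]# export EDITOR=/usr/bin/nano
[root@nzhost1 ~]# /etc/init.d/nzresolv update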
3. Review the system DNS information as shown in the sample file.
CAUTION:
Use caution before you change the DNS information; incorrect changes can
affect the operation of the IBM Netezza system. Review any changes with
the DNS administrator at your site to ensure that the changes are correct.
To change the DNS information by reading the information from an existing text
file, do the following steps:
Procedure
1. Log in to either host as root.
2. Create a text file with your DNS information. Make your text file similar to the
following format:
search yourcompany.com
nameserver 1.2.3.4
nameserver 1.2.5.6
3. Enter the following command, where file is the fully qualified path name to
the text file:
[root@nzhost1 ~]# service nzresolv update file
To change the DNS information by entering the information from the command
prompt, do the following steps:
Procedure
1. Log in to either host as root.
2. Enter the following command (note the dash character at the end of the
command):
[root@nzhost1 ~]# service nzresolv update -
The command prompt proceeds to a new line where you can enter the DNS
information. Enter the complete DNS information because the text that you
type replaces the existing information in the resolv.conf file.
3. After you finish typing the DNS information, type one of the following
commands:
v Control-D to save the information that you entered and exit the editor.
v Control-C to exit without saving any changes.
To display the current status of the Netezza nzresolv service, do the following
steps:
Procedure
1. Log in to the active host as a Linux user such as nz.
2. Enter the following command:
[nz@nzhost1 ~]$ service nzresolv status
Example
If you log in to the standby host of the Netezza system and run the command, the
status message is Configured for upstream resolv.conf.
Remote access
IBM Netezza systems are typically installed in a data center, which is often highly
secured from user access and sometimes in a geographically separate location.
Thus, you might need to set up remote access to Netezza so that your users can
connect to the system through the corporate network. Common ways to remotely
log on to another system through a shell (Telnet, rlogin, or rsh) do not encrypt data
that is sent over the connection between the client and the server. Consequently,
the type of remote access you choose depends upon the security considerations at
your site. Telnet is the least secure and SSH (Secure Shell) is the most secure.
If you allow remote access through Telnet, rlogin, or rsh, you can more easily
manage this access through the xinetd daemon (Extended Internet Services). The
xinetd daemon starts programs that provide Internet services. This daemon uses a
configuration file, /etc/xinetd.conf, to specify services to start. Use this file to
enable or disable remote access services according to the policy at your site.
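For example, on a typical Red Hat host the Telnet service is defined in a file that
xinetd reads (often /etc/xinetd.d/telnet; the location can vary by release), and you
can disable it by setting the disable attribute and then reloading the daemon:
service telnet
{
        disable = yes
}
[root@nzhost1 ~]# service xinetd reload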
If you use SSH, it does not use xinetd, but rather its own configuration files. For
more information, see the Red Hat documentation.
Administration interfaces
IBM Netezza offers several ways or interfaces that you can use to perform the
various system and database management tasks:
v Netezza commands (nz* commands) are installed in the /nz/kit/bin directory
on the Netezza host. For many of the nz* commands, you must be able to log on
to the Netezza system to access and run those commands. In most cases, users
log in as the default nz user account, but you can create other Linux user
accounts on your system. Some commands require you to specify a database
user account, password, and database to ensure that you have permissions to do
the task.
v The Netezza CLI client kits package a subset of the nz* commands that can be
run from Windows and UNIX client systems. The client commands might also
require you to specify connection information such as the Netezza host, a
database user account, and a password.
The nz* commands are installed and available on the Netezza system, but it is
more common for users to install Netezza client applications on client
workstations. Netezza supports various Windows and UNIX client operating
systems. Chapter 2, “Netezza client software installation,” on page 2-1 describes
the Netezza clients and how to install them. Chapter 3, “Netezza administration
interfaces,” on page 3-1 describes how to get started by using the administration
interfaces.
The client interfaces provide you with different ways to do similar tasks. While
most users tend to use the nz* commands or SQL commands for tasks, you can
use any combination of the client interfaces, depending on the task or your
workstation environment, or interface preferences.
Related concepts:
Chapter 2, “Netezza client software installation,” on page 2-1
This section describes how to install the Netezza CLI clients and the NzAdmin
tool.
There are several Netezza documents that offer more specialized information about
features or tasks. For more information, see IBM Netezza Getting Started Tips.
In most cases, the only applications that IBM Netezza administrators or users must
install are the client applications to access the Netezza system. Netezza provides
client software that runs on various systems such as Windows, Linux, Solaris,
AIX®, and HP-UX systems.
The instructions to install and use the Netezza Performance Portal are in the IBM
Netezza Performance Portal User's Guide, which is available with the software kit for
that interface.
This section does not describe how to install the Netezza system software or how
to upgrade the Netezza host software. Typically, Netezza Support works with you
for any situations that might require software reinstallations, and the steps to
upgrade a Netezza system are described in the IBM Netezza Software Upgrade Guide.
If your users or their business reporting applications access the Netezza system
through ODBC, JDBC, or OLE-DB Provider APIs, see the IBM Netezza ODBC,
JDBC, OLE DB, and .NET Installation and Configuration Guide for detailed
instructions on the installation and setup of these data connectivity clients.
Related concepts:
“Administration interfaces” on page 1-8
The following table lists the supported operating systems and revisions for the
Netezza CLI clients.
Table 2-1. Netezza supported platforms
Operating system                                     32-bit         64-bit
Windows
Windows 2008, Vista, 7, 8                            Intel / AMD    Intel / AMD
Windows Server 2012, 2012 R2                         N/A            Intel / AMD
Linux
Red Hat Enterprise Linux 5.2, 5.3, 5.5, 5.9,         Intel / AMD    Intel / AMD
and 6 through 6.5
Red Hat Enterprise Linux 6.2+                        N/A            PowerPC®
Red Hat Enterprise Linux 7.1                         N/A            POWER8® LE mode
The Netezza client kits are designed to run on the proprietary hardware
architecture for the vendor. For example, the AIX, HP-UX, and Solaris clients are
intended for the proprietary RISC architecture. The Linux client is intended for
Red Hat or SUSE on the 32-bit Intel architecture.
Note: Typically, the Netezza clients also support the update releases for each of the
OS versions listed in the table, unless the OS vendor introduced architecture
changes in the update.
If you are installing the clients on 64-bit operating systems, there are some
additional steps to install a second, 64-bit client package. The IBM Netezza clients
are 32-bit operating system executables and they require 32-bit libraries that are not
provided with the clients. If the libraries are not already installed on your system,
you must obtain and install the libraries using your operating system update
process.
Procedure
1. Obtain the nz-platformclient-version client package archive from the IBM
Fix Central site and download it to the client system. Use or create a new,
empty directory to reduce any confusion with other files or directories. There
are several client packages available for different common operating system
types, as described in “Client software packages” on page 2-1. Make sure that
you download the package that matches your client operating system.
Note: On an HP-UX 11i client, /bin/sh might not be available. You can use the
command form sh ./unpack to unpack the client.
The unpack command checks the client system to ensure that it supports the
CLI package and prompts you for an installation location. The default is
/usr/local/nz for Linux, but you can install the CLI tools to any location on
the client. The program prompts you to create the directory if it does not
already exist. Sample command output follows:
------------------------------------------------------------------
IBM Netezza -- NPS Linux Client 7.1
(C) Copyright IBM Corp. 2002, 2013 All Rights Reserved.
------------------------------------------------------------------
Validating package checksum ... ok
Where should the NPS Linux Client be unpacked? [/usr/local/nz]
Directory '/usr/local/nz' does not exist; create it (y/n)? [y] Enter
0% 25% 50% 75% 100%
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Unpacking complete.
5. If your client has a 64-bit operating system, change to the linux64 directory
and run the unpack command to install the additional 64-bit files: ./unpack.
The unpack command prompts you for an installation location. The default is
/usr/local/nz for Linux, but you should use the same location that you used
for the 32-bit CLI files in the previous step. Sample command output follows:
------------------------------------------------------------------
IBM Netezza -- NPS Linux Client 7.1
(C) Copyright IBM Corp. 2002, 2013 All Rights Reserved.
------------------------------------------------------------------
Validating package checksum ... ok
Where should the NPS Linux Client be unpacked? [/usr/local/nz]
Installing in an existing directory. Changing permissions to
overwrite existing files...
0% 25% 50% 75% 100%
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Unpacking complete.
Results
The client installation steps are complete, and the Netezza CLI commands are
installed to your specified destination directory. The NPS commands are located in
the bin directory where you unpacked the NPS clients. If you are using a 64-bit
operating system on your workstation, note that there is a 64-bit nzodbcsql
command in the bin64 directory for testing the SQL command connections.
Test to make sure that you can run the client commands. Change to the bin
subdirectory of the client installation directory (for example, /usr/local/nz/bin).
Run a sample command such as the nzds command to verify that the command
succeeds or to identify any errors.
./nzds -host nzhost -u user -pw password
The command displays a list of the data slices on the target NPS system. If the
command runs without error, your client system has the required libraries and
packages to support the Netezza clients. If the command fails with a library or
other error, the client may require some additional libraries or shared objects.
For example, on a Red Hat Enterprise Linux 64-bit client system, you could see an
error similar to the following:
[root@myrhsystem bin]# ./nzds
-bash: ./nzds: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
For example, on a SUSE 10/11 64-bit client system, you could see an error similar
to the following:
mylinux:/usr/local/nz/bin # ./nzds
./nzds: error while loading shared libraries: libssl.so.4: cannot open shared
object file: No such file or directory
These errors indicate that the client is missing 32-bit library files that are required
to run the NPS clients. Identify the packages that provide the library and obtain
those packages. You may need assistance from your local workstation IT
administrators to obtain the operating system packages for your workstation.
To identify and obtain the required Red Hat packages, you could use a process
similar to the following.
v Use the yum provides command and specify the file name to see which package
provides the file that could not be found (ld-linux.so.2 in this example).
yum provides ld-linux.so.2
Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use
subscription-manager to register.
RHEL64 | 3.9 kB 00:00 ...
glibc-2.12-1.107.el6.i686 : The GNU libc libraries
Repo : RHEL64
Matched from:
Other : ld-linux.so.2
In this example, the missing package is glibc-2.12-1.107.el6.i686.
v In some cases, the NPS command could report an error for a missing libssl file.
You can use the yum provides command to obtain more information about the
packages that contain the library, and if any of the files already exist on your
workstation.
Based on the missing libraries and packages, use the following steps to obtain the
Red Hat packages.
v Mount the Red Hat distribution DVD or ISO file to the client system. Insert the
DVD into the DVD drive.
v Open a terminal window and log in as root.
v Run the following commands:
[root@myrhsystem]# mkdir /mnt/cdrom
[root@myrhsystem]# mount -o ro /dev/cdrom /mnt/cdrom
v Create the text file server.repo in the /etc/yum.repos.d directory.
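A minimal server.repo definition, assuming that the DVD is mounted at /mnt/cdrom
as shown above (the repository ID and name are illustrative), might look like the
following:
[rhel-dvd]
name=RHEL DVD
baseurl=file:///mnt/cdrom
enabled=1
gpgcheck=0
With the repository defined, you could then install the package that was identified
earlier, for example:
[root@myrhsystem]# yum install glibc.i686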
To identify and obtain the required SUSE packages, you could use a process
similar to the following.
v Log in to the SUSE system as root or a superuser.
v If the test NPS command failed with the error that libssl.so.4 or
libcrypto.so.4 or both could not be found, you could be able to resolve the
issue by adding a symbolic link to the missing file from the NPS client
installation directory (for example, /usr/local/nz/lib). Use the ls /lib/libssl*
command to list the available libraries in the standard OS directories. You could
then create symbolic links to one of your existing libssl.so and libcrypto.so
files by using commands similar to the following:
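For example, assuming that the ls command shows /lib/libssl.so.0.9.8 and
/lib/libcrypto.so.0.9.8 on your workstation (the version suffixes vary by SUSE
release) and that the clients are installed in /usr/local/nz, you could create the
links with commands similar to the following:
ln -s /lib/libssl.so.0.9.8 /usr/local/nz/lib/libssl.so.4
ln -s /lib/libcrypto.so.0.9.8 /usr/local/nz/lib/libcrypto.so.4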
If the error indicates that you are missing other libraries or packages, use the
following steps to obtain the SUSE packages.
v Open a terminal window and log in as root.
v Run the yast command to open the YaST interface.
v On the YaST Control Center, select Software and go to the software repositories
to configure and enable a DVD, a server, or an ISO file as a repository source.
Select the appropriate source for your SUSE environment. Consult with your IT
department about the policies for package updates in your environment.
v On the Software tab, go to Software Management and search for the required
package or library such as glibc-32bit in this example.
v Click Accept to install the required package.
v Exit YaST by clicking Quit.
To run the CLI commands on Solaris, you must include /usr/local/lib in your
environment variable LD_LIBRARY_PATH. Additionally, to use the ODBC driver on
Linux, Solaris, or HP-UX, you must include /usr/local/nz/lib, or the directory
path to nz/lib where you installed the Netezza CLI tools.
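For example, in a Bourne-style shell you might set the variables as follows; the
paths assume the default /usr/local/nz client installation location:
LD_LIBRARY_PATH=/usr/local/lib:/usr/local/nz/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH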
Related reference:
“Command locations” on page 3-3
To remove the client CLI kits from a UNIX system, complete the following steps:
Procedure
1. Change to the directory where you installed the clients. For example,
/usr/local/nz.
2. Delete the nz commands manually.
If you are using or viewing object names that use UTF-8 encoded characters, your
Windows client systems require the Microsoft universal font to display the
characters within the NzAdmin tool. The Arial Unicode MS font is installed by
default on some Windows systems, but you might have to run a manual
installation for other Windows platforms such as 2003 or others. For more
information, see the Microsoft support topic at
http://office.microsoft.com/en-us/help/hp052558401033.aspx.
To install the IBM Netezza tools on Windows, complete the following steps:
Procedure
1. Insert the IBM Netezza Client Components for Windows DVD in your media drive
and go to the admin directory. If you downloaded the client package
(nzsetup.exe) to a directory on your client system, change to that directory.
2. Double-click or run nzsetup.exe.
This program is a standard installation program that consists of a series of
steps in which you select and enter information that is used to configure the
installation. You can cancel the installation at any time.
Results
The installation program displays a license agreement, which you must accept to
install the client tools. You can also specify the following information:
Destination folder
You can use the default installation folder or specify an alternative
location. The default folder is C:\Program Files\IBM Netezza Tools. If you
choose a different folder, the installation program creates the folder if one
does not exist.
Setup type
Select the type of installation: typical, minimal, or custom.
Typical
Install the nzadmin program, the help file, the documentation, and
the console utilities, including the loader.
Minimal
Install the nzadmin program and help files.
Custom
Displays a screen where you can select to install any combination
of the administration application, console applications, or
documentation.
After you complete the selections and review the installation options, the client
installer creates the Netezza Tools folder, which has several subfolders. You cannot
change the subfolder names or locations.
The installer stores copies of the software licenses in the installation directory,
which is usually C:\Program Files\IBM Netezza Tools (unless you specified a
different location).
The installation program adds the Netezza commands to the Windows Start >
Programs menu. The program group is IBM Netezza and it has the suboptions
IBM Netezza Administrator and Documentation. The IBM Netezza Administrator
command starts the NzAdmin tool. The Documentation command lists the PDFs of
the installed documentation.
To use the commands in the bin directory, you must open a Windows
command-line prompt (a DOS prompt).
Environment variables
The following table lists the operating system environment variables that the
installation tool adds for the IBM Netezza console applications.
Table 2-2. Environment variables
Variable    Operation    Setting
PATH        append       <installation directory>\bin
NZ_DIR      set          Installation directory (for example, C:\Program Files\IBM Netezza Tools)
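For example, to confirm the settings from a Windows command prompt after the
installation (the value shown assumes the default installation folder):
C:\> echo %NZ_DIR%
C:\Program Files\IBM Netezza Tools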
You can remove or uninstall the Windows tools by using the Windows Add or
Remove Programs interface in the Control Panel. The uninstallation program
removes all folders, files, menu commands, and environment variables. The
registry entries that are created by other IBM Netezza applications, however, are
not removed.
To remove the IBM Netezza tools from a Windows client, complete the following
steps:
Procedure
1. Click Start > Control Panel > Uninstall. The menu options can vary with each
Windows operating system type.
IBM Netezza commands that display object names such as nzload, nzbackup, and
nzsession can also display non-ASCII characters, but they must operate on a
UTF-8 terminal or DOS window to display characters correctly.
For UNIX clients, make sure that the terminal window in which you run these nz
commands uses a UTF-8 locale. The output in the terminal window might not
align correctly.
As an alternative to these DOS setup steps, the input/output from the DOS clients
can be piped from/to nzconvert and converted to a local code page, such as 932
for Japanese.
On a Windows system, the fonts that you use for your display must meet the
Microsoft requirements as outlined on the Support site at
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q247815.
After you define (or modify) these settings in the postgresql.conf file, you must
restart the Netezza software to apply the changes.
Netezza personnel, if granted access for remote service, use port 22 for SSH, and
ports 20 and 21 for FTP.
For security or port conflict reasons, you can change one or more default port
numbers for the IBM Netezza database access.
Important: Be careful when you are changing the port numbers for the Netezza
database access. Errors can severely affect the operation of the Netezza system. If
you are not familiar with editing resource shell files or changing environment
variables, contact Netezza Support for assistance.
Before you begin, make sure that you choose a port number that is not already in
use. To check the port number, you can review the /etc/services file to see
whether the port number is specified for another process. You can also use the
netstat | grep port command to see whether the designated port is in use.
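For example, to check whether a candidate port such as 5486 is already in use
(the port number is only an illustration), you could run commands similar to the
following:
[nz@nzhost ~]$ grep 5486 /etc/services
[nz@nzhost ~]$ netstat -an | grep 5486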
To change the default port numbers for your Netezza system, complete the
following steps:
Procedure
1. Log in to the Netezza host as the nz user.
2. Change to the /nz/kit/sys/init directory.
3. Create a backup of the current nzinitrc.sh file:
[nz@nzhost init]$ cp nzinitrc.sh nzinitrc.sh.backup
4. Review the nzinitrc.sh file to see whether the Netezza ports that you want to
change (listed in Table 2-3 on page 2-10) are already present in the file.
For example, you might find a section that looks similar to the following, or
you might find that these variables are defined separately within the
nzinitrc.sh file.
# Application Port Numbers
# ------------------------
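For example, the database access port variable, if present, typically appears in a
form similar to the following; 5480 is the standard default, and the exact contents
of your nzinitrc.sh file take precedence over this sketch:
NZ_DBMS_PORT=5480; export NZ_DBMS_PORT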
Tip: You can append the contents of the nzinitrc.sh.sample file to the
nzinitrc.sh file to create an editable section of variable definitions. You must
be able to log in to the Netezza host as the root user; then, change to the
/nz/kit/sys/init directory and run the following command:
[nz@nzhost init]$ cat nzinitrc.sh.backup nzinitrc.sh.sample > nzinitrc.sh
Some Netezza commands such as nzsql and nzload have a -port option that
allows the user to specify the DB access port. In addition, users can create local
definitions of the environment variables to specify the new port number.
For a Linux system, you can define a session-level variable by using a command
similar to the following format:
$ NZ_DBMS_PORT=5486; export NZ_DBMS_PORT
Encrypted passwords
Database user accounts must be authenticated during access requests to the IBM
Netezza database. For user accounts that use local authentication, Netezza stores
the password in encrypted form in the system catalog. For more information about
encrypting passwords on the host and the client, see the IBM Netezza Advanced
Security Administrator's Guide.
Local authentication requires a password for every account. If you use LDAP
authentication, a password is optional. During LDAP authentication, Netezza uses
the services of an LDAP server in your environment to validate and verify Netezza
database users.
v When you are using the Netezza CLI commands, the clear-text password must
be entered on the command line. You can set the environment variable
NZ_PASSWORD to avoid typing the password on the command line, but the
variable is stored in clear text with the other environment variables.
v To avoid displaying the password on the command line, in scripts, or in the
environment variables, you can use the nzpassword command to create a locally
stored encrypted password.
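The command takes a form similar to the following sketch (see the descriptions
that follow for the values):
nzpassword add -u username -pw password -host hostname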
Where:
v The user name is the Netezza database user name in the Netezza system catalog.
If you do not specify the user name on the command line, the nzpassword
command uses the environment variable NZ_USER.
v The password is the Netezza database user password in the Netezza system
catalog or the password that is specified in the environment variable
NZ_PASSWORD. If you do not supply a password on the command line or in the
environment variable, the system prompts you for a password.
v The host name is the Netezza host. If you do not specify the host name on the
command line, the nzpassword command uses the environment variable NZ_HOST.
You can create encrypted passwords for any number of user name/host pairs.
When you use the nzpassword add command to cache the password, quotation
marks are not required around the user name or password values. You must only
qualify the user name or password with a surrounding set of single quotation
mark, double quotation mark pairs (for example, '"Bob"') if the value is
case-sensitive. If you specify quoted or unquoted names or passwords in
nzpassword or other nz commands, you must use the same quoting style in all
cases.
If you qualify a user name that is not case-sensitive with quotation marks (for
example '"netezza"'), the command might still complete successfully, but it might
not work in all command cases.
Stored passwords
If client users use the nzpassword command to store database user passwords on a
client system, they can supply only a database user name and host on the
command line. Users can also continue to enter a password on the command line
if displaying clear-text passwords is not a concern for security.
If you supply a password on the command line, it takes precedence over the
environment variable NZ_PASSWORD. If the environment variable is not set, the
system checks the locally stored password file. If there is no password in this file
and you are using the nzsql command, the system prompts you for a password,
otherwise the authentication request fails.
In all cases, whether you use the -pw option on the command line, the
NZ_PASSWORD environment variable, or the locally stored password that was
created by the nzpassword command, IBM Netezza compares the password against
the entry in the system catalog for local authentication or against the LDAP or
Kerberos account definition. The authentication protocol is the same, and Netezza
never sends clear-text passwords over the network.
In release 6.0.x, the encryption that is used for locally encrypted passwords
changed. In previous releases, Netezza used the Blowfish encryption routines;
release 6.0 now uses the Advanced Encryption Standard AES-256 standard. When
you cache a password by using a release 6.0 client, the password is saved in
AES-256 format unless there is an existing password file in Blowfish format. In that
case, new stored passwords are saved in Blowfish format.
If you upgrade to a release 6.0.x or later client, the client can support passwords in
either the Blowfish format or the AES-256 format. If you want to convert your
existing password file to the AES-256 encryption format, you can use the
nzpassword resetkey command to update the file. If you want to convert your
password file from the AES-256 format to the Blowfish format, use the nzpassword
resetkey -none command.
Important: Older clients, such as those for release 5.0.x and those clients earlier
than release 4.6.6, do not support AES-256 format passwords. If your password file
is in AES-256 format, the older client commands prompt for a password, which can
For information about the Netezza Performance Portal, see the IBM Netezza
Performance Portal User's Guide, which is available with the software kit for that
interface.
In general, the Netezza CLI commands are used most often for the various
administration tasks. Many of the tasks can also be performed by using SQL
commands or the interactive interfaces. Throughout this publication, the primary
task descriptions use the CLI commands and reference other ways to do the same
task.
You can use Netezza CLI commands (also called nz commands) to monitor and
manage a Netezza system. Most nz* commands are issued on the Netezza host
system. Some are included with the Netezza client kits, and some are available in
optional support toolkits and other packages. This publication describes the host
and client nz commands.
Note: When investigating problems, Netezza support personnel might ask you to
issue other internal nz commands that are not listed.
Table 3-1. Command summary (availability in the Netezza host kit or the client kits)
nzbackup
Backs up an existing database. Available in the Netezza host kit only.
nzcontents
Displays the revision and build number of all the executable files, plus the
checksum of Netezza binaries. Available in the Netezza host kit only.
nzconvert
Converts character encodings for loading with the nzload command or external
tables. Available in the Netezza host kit and in the Linux, Solaris, HP, AIX, and
Windows client kits.
Command locations
The following table shows the default location of each CLI command and in which
of the host and client kits they are available:
Add the appropriate bin directory to your search path to simplify command
invocation.
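For example, on a Linux system you might add the directories that were mentioned
earlier; adjust the paths if you installed the clients in a different location:
export PATH=/nz/kit/bin:$PATH        # on the Netezza host
export PATH=/usr/local/nz/bin:$PATH  # on a Linux client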
Related concepts:
“Path for Netezza CLI client commands” on page 2-6
Command syntax
All IBM Netezza CLI commands share a common set of top-level syntax options.
For many Netezza CLI commands you can specify a timeout. This time is the
amount of time the system waits before it abandons the execution of the command.
If you specify a timeout without a value, the system waits 300 seconds. The
maximum timeout value is 100 million seconds.
Issuing commands
To issue an nz command, you must have access to the IBM Netezza system (either
directly on the Netezza KVM or through a remote shell connection) or you must
install the Netezza client kit on your workstation. If you are accessing the Netezza
system directly, you must be able to log in by using a Linux account (such as nz).
While some of the nz commands can operate and display information without
additional access requirements, some commands and operations require that you
specify a Netezza database user account and password. The account might also
require appropriate access and administrative permissions to display information
or process a command.
Note: In this example, you did not have to specify a host, user, or password.
The command displayed information that was available on the local Windows
client.
v To back up a Netezza database (you must run the command while logged in to
the Netezza system, as this is not supported from a client):
[nz@npshost ~]$ nzbackup -dir /home/user/backups -u user -pw password -db db1
Backup of database db1 to backupset 20090116125409 completed successfully.
Identifiers in commands
When you use the IBM Netezza commands and specify identifiers for users,
passwords, database names, and other objects, you can pass normal identifiers
without any special quoting.
However, if you use delimited identifiers, the supported way to pass them on the
Linux command line is to use the following syntax:
'\'Identifier\''
The syntax is single quotation mark, backslash, single quotation mark, identifier,
backslash, single quotation mark, single quotation mark. This syntax protects the
quotation marks so that the identifier remains quoted in the Netezza system.
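For example, to connect to a database whose name is case-sensitive by using the
nzsql command (the database name MyDB and the account values are illustrative):
nzsql -d '\'MyDB\'' -u user -pw password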
Throughout this publication, SQL commands are shown in uppercase (for example,
CREATE USER) to stand out as SQL commands. The commands are not
case-sensitive and can be entered by using any letter casing. Users must have
Netezza database accounts and applicable object or administrative permissions to
do tasks. For detailed information about the SQL commands and how to use them
to do various administrative tasks, see the IBM Netezza Database User’s Guide.
The following table describes the nzsql command parameters. For more
information about the command parameters and how to use the command, see the
IBM Netezza Database User’s Guide.
Table 3-2. nzsql command parameters
Parameters Description
-a Echo all input from a script.
-A Use unaligned table output mode. This is equivalent to specifying
-P format=unaligned.
-c <query> Run only a single query (or slash command) and exit.
-d <dbname> or -D <dbname>
Specify the name of the database to which to connect. If you do not specify
this parameter, the nzsql command uses the value specified for the
NZ_DATABASE environment variable (if it is specified) or prompts you for a
password (if it is not).
Within the nzsql command interpreter, enter the following slash commands for
help about or to run a command:
\h List all SQL commands.
\h <command>
Display help about the specified SQL command.
\? List and display help about all slash commands.
Starting in NPS release 7.2.1, the nzsql command is included as part of the
Windows client kit. In a Windows environment, note that there are some
behavioral differences when users press the Enter key or the Control-C key
sequence than in a UNIX nzsql command line environment. The Windows
command prompt environment does not support many of the common UNIX
command formats and options. However, if your Windows client is using a Linux
environment like cygwin or others, the nzsql.exe command could support more of
the UNIX-only command line options noted in the documentation.
In a UNIX environment, if you are typing a multiline SQL query into the nzsql
command line shell, the Enter key acts as a newline character to accept input for
the query until you type the semi-colon character and press Enter. The shell
prompt also changes from => to -> for the subsequent lines of the input.
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Enter)
MYDB.SCH(USER)-> ; (press Enter)
COUNT
-------
1274
(1 row)
In a UNIX environment, if you press Control-C, the entire query is cancelled and
you return to the command prompt:
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Control-C)
MYDB.SCH(USER)=>
In a Windows client environment, if you are typing a multiline SQL query into the
nzsql command line shell, the Enter key acts similarly as a newline character to
accept input for the query until you type the semi-colon character and press Enter.
However, Control-C behaves differently than in UNIX: it cancels only the current
input line, not the entire query.
MYDB.SCH(USER)=> select count(*) (press Enter)
MYDB.SCH(USER)-> from ne_part (press Enter)
MYDB.SCH(USER)-> where p_retailprice < 950.00 (press Control-C)
MYDB.SCH(USER)-> ; (press Enter)
COUNT
-------
[the count of all rows in ne_part]
(1 row)
The Control-C (or a Control-Break) cancelled only the WHERE clause on the third
input line, and thus the query results were larger without the restriction. On a
single input line (where the prompt is =>), Control-C cancels the query and you
return to the nzsql command prompt.
MYDB.SCH(USER)=> select count(*) from ne_part (press Control-C)
MYDB.SCH(USER)=>
When you run the nzsql command on a Windows client, you could see the error
more not recognized as an internal or external command. This error occurs
because nzsql uses the more command to process the query results. The error
indicates that the nzsql command could not locate the more command on your
Windows client.
To correct the problem, add the more.com command executable to your client
system's PATH environment variable. Each Windows OS version has a slightly
different way to modify the environment variables, so refer to your Windows
documentation for specific instructions. On a Windows 7 system, you could use a
process similar to the following:
v Click Start, and then type environment in the search field. In the search results,
click Edit the system environment variables. The System Properties dialog
opens and displays the Advanced tab.
v Click Environment variables. The Environment Variables dialog opens.
v In the System variables list, select the Path variable and click Edit. The Edit
System Variable dialog opens.
v Place the cursor at the end of the Variable value field. You can click anywhere in
the field and then press End to get to the end of the field.
v Append the value C:\Windows\System32; to the end of the Path field. Make
sure that you use a semi-colon character and type a space character at the end of
the string. If your system has the more.com file in a directory other than
C:\Windows\System32, use the pathname that is applicable on your client.
v Click OK in the Edit System Variable dialog, then click OK in the Environment
Variables dialog, then click OK in the System Properties dialog.
After you make this change, the nzsql command should run without displaying
the more not recognized error.
On Windows clients, you can use the up-arrow key to display the commands that
ran previously.
By default, an nzsql batch session continues even if the system encounters errors.
You can control this behavior with the ON_ERROR_STOP variable, for example:
nzsql -v ON_ERROR_STOP=
You can also toggle batch processing with a SQL script. For example:
\set ON_ERROR_STOP
\unset ON_ERROR_STOP
You can use the $HOME/.nzsqlrc file to store values, such as the ON_ERROR_STOP,
and have it apply to all future nzsql sessions and all scripts.
The following table describes the slash commands that display information about
objects or privileges within the database, or within the schema if the system
supports multiple schemas.
Table 3-3. The nzsql slash commands
Command Description
\d <object> Describe the named object such as a table, view, or
sequence
\da[+] List user-defined aggregates. Specify + for more detailed
information.
\df[+] List user-defined functions. Specify + for more detailed
information.
\de List temp tables.
\dg List groups (both user and resource groups) except
_ADMIN_.
\dG List user groups and their members.
\dGr List resource groups to which at least one user has been
assigned, including _ADMIN_, and the users assigned to
them.
\di List indexes.
\dm List materialized views
\ds List sequences.
\dt List tables.
\dv List views.
\dx List external tables.
\dy List synonyms.
\dSi List system indexes.
\dSs List system sequences.
\dSt List system tables.
\dSv List system views.
\dMi List system management indexes.
\dMs List system management sequences.
\dMt List system management tables.
\dMv List system management views.
\dp <user> List the privileges that were granted to a user either
directly or by membership in a user group.
Note: Starting in Release 7.0.3, the nzsql environment prompt has changed. As
shown in the example command, the prompt now shows the database and schema
(mydb.myschema) to which you are connected. For systems that do not support
multiple schemas, there is only one schema that matches the name of the user who
created the database. For systems that support multiple schemas within a database,
the schema name will match the current schema for the connection.
To suppress the row count information, you can use the nzsql -r command when
you start the SQL command-line session. When you run a query, the output does
not show a row count:
mydb.myschema(myuser)=> select count(*) from nation;
COUNT
-------
25
You can use the NO_ROWCOUNT session variable to toggle the display of the row
count information within a session. For example, the following query shows the
default behavior, which includes the row count:
mydb.myschema(myuser)=> select count(*) from nation;
COUNT
-------
25
(1 row)
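To suppress the row count for subsequent queries in the session, set the variable
and unset it to restore the display; this sketch follows the same \set and \unset
pattern as the ON_ERROR_STOP example earlier:
mydb.myschema(myuser)=> \set NO_ROWCOUNT
mydb.myschema(myuser)=> select count(*) from nation;
 COUNT
-------
    25
mydb.myschema(myuser)=> \unset NO_ROWCOUNT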
v Run the nzadmin.exe file from a command window. To bypass the login dialog,
enter the following login information:
– -host or /host and the name or IP address of the Netezza host.
– -user or /user and a Netezza user name. The name you specify can be
delimited. A delimited user name is contained in quotation marks.
– -pw or /pw and the password of the specified user. To specify that a saved
password is to be used, enter -pw without entering a password string.
You can specify these parameters in any order, but you must separate them by
spaces or commas. If you specify:
– All three parameters, NzAdmin bypasses the login dialog and connects you
to the host that you specify.
– Less than three parameters, NzAdmin displays the login dialog and prompts
you to complete the remaining fields.
When you log in to the NzAdmin tool you must specify the name of the host, your
user name, and your password. The drop-down list in the host field displays the
host addresses or names that you specified in the past. If you choose to save the
password on the local system, when you log in again, you need to enter only the
host and user names.
At the top of the navigation pane there are tabs that you can use to select the view
type:
System
The navigation pane displays components related to system hardware such
as SPA units, SPU units, and data slices.
Database
The navigation pane displays components related to database processing
such as databases, users, groups, and database sessions.
In the status bar at the bottom of the window, the NzAdmin tool displays your
user name and the duration (days, hours, and minutes) of the current NzAdmin
session or, if the host system is not online, a message indicating this.
You can access commands by using the menu bar or the toolbar, or by
right-clicking an object and using its pop-up menu.
For example, as you move the mouse pointer over the image of a SPA unit, a tool
tip displays the slot number, hardware ID, role, and state of each of the SPUs that
comprise it. Clicking a SPU displays the SPU status window and selects the
corresponding object in the tree view shown in the navigation pane.
Status indicators
Each component has a status indicator:
Table 3-4. Status indicators
Status Description
Normal The component is operating normally.
Failed The component is down, has failed, or is likely to fail. For example, if two fans on the same SPA are down, the SPA is flagged as being likely to fail.
Missing The component is missing and so no state information is available for it.
Command Description
File > New Create a new database, table, view, materialized
view, sequence, synonym, user, or group. Available
only in the Database view.
File > System State Change the system state.
File > Reconnect Reconnect to the system with a different host name,
address, or user name.
File > Exit Exit the NzAdmin tool.
View > Toolbar Show or hide the toolbar.
View > Status Bar Show or hide the status bar.
View > System Objects Show or hide the system tables and views, and the
object privilege lists in the Object Privileges window.
View > SQL Statements Display the SQL window, which shows a subset of
the SQL commands run in this session.
View > Refresh Refresh the current view. This can be either the
System or Database view.
Tools > Workload Management Display workload management information:
Performance
Summary, history, and graph workload
management information.
Settings
The system defaults that determine the
limits on session timeout, row set, query
timeout, and session priority; and the
resource allocation that determines resource
usage among groups.
Tools > Table Skew Display any tables that meet or exceed a specified
skew threshold.
Tools > Table Storage Display table and materialized view storage usage
by database or by user.
Tools > Query History Configuration Display a window that you can use to create
and alter history configurations, and to set the current
configuration.
Tools > Default Settings Display the materialized view refresh threshold.
Tools > Options Display the Preferences tab where you can set the
object naming preferences and whether you want to
automatically refresh the NzAdmin window.
Help > NzAdmin Help Display the online help for the NzAdmin tool.
Help > About NzAdmin Display the NzAdmin and Netezza revision
numbers and copyright text.
Administration commands
You can access system and database administration commands from both the tree
view and the status pane of the NzAdmin tool. In either case, a pop-up menu lists
the commands that can be issued for the selected components.
v To activate a pop-up menu, right-click a component in a list.
You can manually refresh the current (System or Database) view by clicking the
refresh icon on the toolbar, or by choosing Refresh from a menu. In addition, you
can specify that both views are to be periodically automatically refreshed, and the
refresh interval. To do this:
Procedure
1. In the main menu, click Tools > Options.
2. In the Preferences tab, enable automatic refresh and specify a refresh interval.
Results
The refresh interval you specify remains in effect until you change it.
To reduce communication with the server, the NzAdmin tool refreshes data based
on the item you select in the left pane. The following table lists the items and
corresponding data that is retrieved on refresh.
Table 3-5. Automatic refresh
Selected item Data retrieved
Server (system view): All topology and hardware state information.
v SPA Units
v SPA ID n
v SPU units
Event rules Event rules.
If the NzAdmin tool is busy communicating with the server (for example, if it is
processing a user command or doing a manual refresh), it does not perform an
automatic refresh.
They are supported by a large and active community for improvements and fixes,
and they also offer the flexibility for Netezza to add corrections or improvements
on a faster basis, without waiting for updates from third-party vendors.
All the Netezza models except the Netezza 100 are HA systems, which means that
they have two host servers for managing Netezza operations. The host server
(often called host within the publication) is a Linux server that runs the Netezza
software and utilities.
Distributed Replicated Block Device (DRBD) is a block device driver that mirrors
the content of block devices (hard disks, partitions, and logical volumes) between
the hosts. Netezza uses the DRBD replication only on the /nz and /export/home
partitions. As new data is written to the /nz partition and the /export/home
partition on the primary host, the DRBD software automatically makes the same
changes to the /nz and /export/home partition of the standby host.
For details about DRBD and its terms and operations, see the documentation at
http://www.drbd.org.
Note: The /nzdata and /shrres file systems on the MSA500 are deprecated.
v In some customer environments that used the previous cluster manager solution,
it was possible to have only the active host running while the secondary was
powered off. If problems occurred on the active host, the Netezza administrator
on-site would power off the active host and power on the standby. In the new
Linux-HA DRBD solution, both HA hosts must be operational at all times.
DRBD ensures that the data saved on both hosts is synchronized, and when
Heartbeat detects problems on the active host, the software automatically fails
over to the standby with no manual intervention.
Related concepts:
“Logging and messages” on page 4-12
Linux-HA administration
When you start an IBM Netezza HA system, Heartbeat automatically starts on both
hosts. It can take a few minutes for Heartbeat to start all the members of the nps
resource group. You can use the crm_mon command from either host to observe the
status, as described in “Cluster and resource group status” on page 4-5.
CAUTION:
Do not modify the file unless directed to in Netezza documentation or by
Netezza Support.
CAUTION:
Never manually edit the CIB file. You must use cibadmin (or crm_resource) to
modify the Heartbeat configuration. Wrapper scripts like heartbeat_admin.sh
update the file safely.
Note: It is possible to get into a situation where Heartbeat does not start properly
because of a manual CIB modification. The CIB cannot be safely modified if
Heartbeat is not started (that is, cibadmin cannot run). In this situation, you can
run /nzlocal/scripts/heartbeat_config.sh to reset the CIB and /etc/ha.d/ha.cf
to factory-default status. After you do this, it is necessary to run
/nzlocal/scripts/heartbeat_admin.sh --enable-nps to complete the CIB
configuration.
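If you must perform this recovery, the sequence looks like the following sketch (the host prompt is illustrative); run these scripts only when directed by Netezza documentation or by Netezza Support:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_config.sh
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --enable-nps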
However, when host 1 is the active host, certain system-level operations such as
S-Blade restarts and system reboots often complete more quickly than when host
2/HA2 is the active host. An S-Blade restart can take one to two minutes longer to
complete when host 2 is the active host. Certain tasks such as manufacturing and
system configuration scripts can require host 1 to be the active host, and they
display an error if run on host 2 as the active host. The documentation for these
commands indicates whether they require host 1 to be the active host, or if special
steps are required when host 2 is the active host.
You can change the settings by editing the values in ha.cf on both hosts and
restarting Heartbeat, but use care when you are editing the file.
The following table lists the common commands. These commands are listed here
for reference.
Table 4-2. Cluster management scripts
Type Scripts
Initial installation scripts heartbeat_config.sh sets up Heartbeat for the first time
    heartbeat_admin.sh --enable-nps adds Netezza services to cluster control after initial installation
Host name change heartbeat_admin.sh --change-hostname
Fabric IP change heartbeat_admin.sh --change-fabric-ip
Wall IP change heartbeat_admin.sh --change-wall-ip
Manual migrate (relocate) heartbeat_admin.sh --migrate
Linux-HA status and troubleshooting commands crm_mon monitors cluster status
    crm_verify sanity checks the configuration and prints status
The following is a list of other Linux-HA commands available. This list is also
provided as a reference, but do not use any of these commands unless directed to
by Netezza documentation or by Netezza Support.
The command output displays a message about how it was started, and then
displays the host name where the nps resource group is running. The host that
runs the nps resource group is the active host.
You can obtain more information about the state of the cluster and which host is
active by using the crm_mon command. See the sample output that is shown in
“Cluster and resource group status.”
If the nps resource group is unable to start, or if it has been manually stopped
(such as by crm_resource -r nps -p target_role -v stopped), neither host is
considered the active host and the crm_resource -r nps -W command does not
return a host name.
Sample output follows. This command refreshes its display every 5 seconds, but
you can specify a different refresh rate (for example, -i10 is a 10-second refresh
rate). Press Control-C to exit the command.
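For example, to watch the cluster with a 10-second refresh interval (the host prompt is illustrative):
[root@nzhost1 ~]# crm_mon -i10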
The host that is running the nps resource group is the active host. Every member
of the nps resource group starts on the same host. The sample output shows that
they are all running on nzhost1.
The crm_mon output also shows the name of the Current Designated Coordinator
(DC). The DC host is not an indication of the active host. The DC is an
automatically assigned role that Linux-HA uses to identify a node that acts as a
coordinator when the cluster is in a healthy state. This is a Linux-HA
implementation detail and does not affect Netezza. Each host recognizes and
recovers from failure, regardless of which one is the DC. For more information
about the DC and Linux-HA implementation details, see http://www.linux-
ha.org/DesignatedCoordinator.
The fence routes for internal Heartbeat use are not part of the nps resource group.
If these services are started, it means that failovers are possible:
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
The order of the members of the group matters; group members are started
sequentially from first to last. They are stopped sequentially in reverse order, from
last to first. Heartbeat does not attempt to start the next group member until the
previous member starts successfully. If any member of the resource group is unable
to start (returns an error or times out), Heartbeat performs a failover to the
standby node.
Failover criteria
During a failover or resource migration, the nps resource group is stopped on the
active host and started on the standby host. The standby host then becomes the
active host.
Note: If any of these resource group members experiences a failure, Heartbeat first
tries to restart or repair the process locally. The failover is triggered only if that
local restart or repair attempt fails.
Note: In the previous Netezza Cluster Manager solution, HA1 is the name of the
primary node, and HA2 is the secondary node. In Linux-HA/DRBD, either host can
be primary; thus, these procedures refer to one host as the active host and to the
other as the standby host.
To relocate the nps resource group from the active host to the standby host:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --migrate
Testing DRBD communication channel...Done.
Checking DRBD state...Done.
The command blocks until the nps resource group stops completely. To monitor
the status, use the crm_mon -i5 command. You can run the command on either
host, although on the active host you must run it from a different terminal
window.
In general, you should not have to stop Heartbeat unless the IBM Netezza HA
system requires hardware or software maintenance or troubleshooting. During
these times, it is important that you control Heartbeat to ensure that it does not
interfere with your work by taking STONITH actions to regain control of the hosts.
The recommended practice is to shut down Heartbeat completely for service.
To shut down the nps resource group and Heartbeat, complete the following steps:
Procedure
1. While logged in to either host as root, identify the active node by displaying
where the nps resource group is running:
[root@nzhost1 ~]# crm_resource -r nps -W
resource nps is running on: nzhost1
2. As root, stop Heartbeat on the standby node (nzhost2 in this example):
[root@nzhost2 ~]# service heartbeat stop
3. As root, stop Heartbeat on the active node:
[root@nzhost1 ~]# service heartbeat stop
4. As root, make sure that there are no open nz sessions or any open files in the
shared directories /nz, /export/home, or both. For details, see “Checking for
user sessions and activity” on page 4-18.
[root@nzhost1 ~]# lsof /nz /export/home
5. Run the following script in /nzlocal/scripts to make the IBM Netezza system
ready for non-clustered operations. The command prompts you for a
confirmation to continue, shown as Enter in the output.
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
---------------------------------------------------------------
Thu Jan 7 15:13:27 EST 2010
File systems and eth2 on this host are okay. Going on.
File systems and eth2 on other host are okay. Going on.
This script will configure Host 1 or 2 to own the shared disks and
own the fabric.
Running nz_dnsmasq: [ OK ]
nz_dnsmasq started.
To reinstate the cluster from a maintenance mode, complete the following steps:
Procedure
1. Stop the IBM Netezza software by using the nzstop command.
2. Make sure that Heartbeat is not running on either node. Use the service
heartbeat stop command to stop the Heartbeat on either host if it is running.
3. Make sure that there are no nz user login sessions, and make sure that no users
are in the /nz or /export/home directories. Otherwise, the nz.heartbeat.sh
command is not able to unmount the DRBD partitions. For details, see
“Checking for user sessions and activity” on page 4-18.
4. Run the following script in /nzlocal/scripts to make the Netezza system
ready for clustered operations. The command prompts you for a confirmation
to continue, shown as Enter in the output.
[root@nzhost1 ~]# /nzlocal/scripts/nz.heartbeat.sh
---------------------------------------------------------------
Thu Jan 7 15:14:32 EST 2010
You can configure the Cluster Manager to send event notifications when a failover
is caused by any of the following:
v Node shutdown
v Node restart
v Node fencing actions (STONITH actions)
Procedure
1. Log in to the active host as the root user.
2. Using a text editor, edit the /nzlocal/maillist file and add the email addresses
of the notification recipients, as described by the comment lines in the file:
#
#Email notification list for the cluster manager problems
#
#Enter email addresses of mail recipients under the TO entry, one to a line
#
#Enter email address of from email address (if a non-default is desired)
DRBD administration
DRBD provides replicated storage of the data in managed partitions (that is, /nz
and /export/home). When a write occurs to one of these locations, the write action
occurs at both the local node and the peer standby node. Both perform the same
write to keep the data in synchronization. The peer responds to the active node
when finished, and if the local write operation is also successfully finished, the
active node reports the write as complete.
The DRBD software can be started, stopped, and monitored by using the
/sbin/service drbd start/stop/status command (as root):
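For example (the host prompt is illustrative):
[root@nzhost1 ~]# service drbd status
[root@nzhost1 ~]# service drbd stop
[root@nzhost1 ~]# service drbd start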
While you can use the status command as needed, only stop and start the DRBD
processes during routine maintenance procedures or when directed by IBM
Netezza Support. Do not stop the DRBD processes on an active, properly working
Netezza HA host to avoid the risk of split-brain.
Related tasks:
“Detecting split-brain” on page 4-14
Sample output of the commands follows. These examples assume that you are
running the commands on the primary (active) IBM Netezza host. If you run them
from the standby host, the output shows the secondary status first, then the
primary.
[root@nzhost1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@nps22094, 2009-06-09
16:25:53
m:res cs st ds p mounted fstype
0:r1 Connected Primary/Secondary UpToDate/UpToDate C /export/home ext3
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
[root@nzhost1 ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@nps22094, 2009-06-09
16:25:53
0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:15068 nr:1032 dw:16100 dr:3529 al:22 bm:37 lo:0 pe:0 ua:0 ap:0 oos:0
1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:66084648 nr:130552 dw:66215200 dr:3052965 al:23975 bm:650 lo:0 pe:0 ua:0 ap:0 oos:0
In the sample output, the DRBD states are one of the following values:
Primary/Secondary
The "healthy" state for DRBD. One device is Primary and one is Secondary.
Secondary/Secondary
DRBD is in a suspended or waiting mode. This usually occurs at boot time
or when the nps resource group is stopped.
Primary/Unknown
One node is available and healthy, the other node is either down or the
cable is not connected.
Secondary/Unknown
This is a rare case where one node is in standby, the other is either down
or the cable is not connected, and DRBD cannot declare a node as the
primary/active node. If the other host also shows this status, the problem
is most likely in the connection between the hosts. Contact Netezza
Support for assistance in troubleshooting this case.
The DRBD status when the current node is active and the standby node is down:
m:res cs st ds p mounted fstype
0:r1 WFConnection Primary/Unknown UpToDate/DUnknown C /export/home ext3
1:r0 WFConnection Primary/Unknown UpToDate/DUnknown C /nz ext3
Detecting split-brain
About this task
Split-brain is an error state that occurs when the images of data on each IBM
Netezza host are different. It typically occurs when synchronization is disabled and
users change data independently on each Netezza host. As a result, the two
Netezza host images are different, and it becomes difficult to resolve what the
latest, correct image should be.
Important: Split-brain does not occur if clustering is enabled. The fencing controls
prevent users from changing the replicated data on the standby node. Allow DRBD
management to be controlled by Heartbeat to avoid the split-brain problems.
Procedure
1. Look for Split in /var/log/messages, usually on the host that you are trying to
make the primary/active host. Let DRBD detect this condition.
2. Because split-brain results from running both images as primary Netezza hosts
without synchronization, check the Netezza logs on both hosts. For example,
check the pg.log files on both hosts to see when/if updates occur. If there is an
overlap in times, both images have different information.
3. Identify which host image, if either, is the correct image. In some cases, neither
host image might be fully correct. You must choose the image that is the more
correct. The host that has the image which you decide is correct is the
“survivor”, and the other host is the “victim”.
4. Perform the following procedure:
a. Log in to the victim host as root and run these commands:
drbdadm secondary resource
drbdadm disconnect resource
drbdadm -- --discard-my-data connect resource
Note: The connect command might display an error that instructs you to
run drbdadm disconnect first.
5. Check the status of the fix by using drbdadm primary resource and the service
drbd status command. Make sure that you run drbdadm secondary resource
before you start Heartbeat.
Related concepts:
“DRBD administration” on page 4-12
IP address requirements
The following table is an example block of the eight IP addresses that are
recommended for a customer to reserve for an HA system:
Table 4-3. HA IP addresses
Entity Sample IP address
HA1 172.16.103.209
HA1 Host Management 172.16.103.210
Floating IP 172.16.103.212
In the IP addressing scheme, there are two host IPs, two host management IPs, and
the floating IP, which is HA1 + 3.
You must run this command twice. Then, try to stop Heartbeat again by using
service heartbeat stop. This process might not stop all of the resources that
Heartbeat manages, such as /nz mount, drbd devices, nzbootpd, and other
resources.
You can specify one or more V characters. The more Vs that you specify, the more
verbose the output. Specify at least four or five Vs and increase the number as
needed. You can specify up to 12 Vs, but that large a number is not recommended.
For example, if the fencing route to ha1 is listed as failed on host1, use the
crm_resource -r fencing_route_to_ha1 -C -H host1 command.
Output from crm_mon does not show the nps resource group
If the log messages indicate that the nps resource group cannot run anywhere, the
cause is that Heartbeat tried to run the resource group on both HA1 and HA2, but
it failed in both cases. Search in /var/log/messages on each host to find this first
failure. Search from the bottom of the log for the message cannot run anywhere
and then scan upward in the log to find the service failures. You must fix the
problems that caused a service to fail to start before you can successfully start the
cluster.
After you fix the failure case, you must restart Heartbeat following the instructions
in “Transitioning from maintenance to clustering mode” on page 4-10.
Do not modify or remove the user or groups because those changes will impact
Heartbeat and disrupt HA operations on the Netezza system.
Related concepts:
“Initial system setup and information” on page 1-1
Open nz user sessions and nz user activity can cause the procedures that stop
Heartbeat and that return the system to clustering to fail. Use the nzsession
command to see whether there are active database sessions in progress. For example:
[nz@nzhost1 ~]$ nzsession -u admin -pw password
ID Type User Start Time PID Database State Priority
Name Client IP Client PID Command
----- ---- ----- ----------------------- ----- -------- ------
------------- --------- ---------- ------------------------
16748 sql ADMIN 14-Jan-10, 08:56:56 EST 4500 CUST active normal
127.0.0.1 4499 create table test_2
16753 sql ADMIN 14-Jan-10, 09:12:36 EST 7748 INV active normal
127.0.0.1 7747 create table test_s
16948 sql ADMIN 14-Jan-10, 10:14:32 EST 21098 SYSTEM active normal
127.0.0.1 21097 SELECT session_id, clien
The sample output shows three sessions: the last entry is the session that is created
to generate the results for the nzsession command. The first two entries are user
activity. Wait for those sessions to complete or stop them before you use the
nz.heartbeat.sh or nz.non-heartbeat.sh commands.
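If you cannot wait for a session to finish, you can stop it by its session ID; the following is a sketch that assumes the nzsession abort subcommand and uses the first session ID from the sample output:
[nz@nzhost1 ~]$ nzsession abort -u admin -pw password -id 16748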
To check for connections to the /export/home and /nz directory, complete the
following steps:
Procedure
1. As the nz user on the active host, stop the IBM Netezza software:
[nz@nzhost1 ~]$ /nz/kit/bin/nzstop
2. Log out of the nz account and return to the root account; then use the lsof
command to list any open files that are in /nz or /export/home.
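For example, using the same lsof invocation that is shown earlier in this chapter (the prompt is illustrative):
[root@nzhost1 ~]# lsof /nz /export/home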
Results
This example shows several open files in the /export/home directory. If necessary,
you can close open files by issuing a command such as kill and supplying the
process ID (PID) shown in the second column from the left. Use caution with the
kill command; if you are not familiar with Linux system commands, contact
Support or your Linux system administrator for assistance.
The Netezza appliance uses SNMP events (described in Chapter 8, “Event rules,”
on page 8-1) and status indicators to send notifications of any hardware failures.
Most hardware components are redundant; thus, a failure typically means that the
remaining hardware components assume the work of the component that failed.
The system might or might not be operating in a degraded state, depending on the
component that failed.
CAUTION:
Never run the system in a degraded state for a long time. It is imperative to
replace a failed component in a timely manner so that the system returns to an
optimal topology and best performance.
Netezza Support and Field Service work with you to replace failed components to
ensure that the system returns to full service as quickly as possible. Most of the
system components require Field Service support to replace. Components such as
disks can be replaced by customer administrators.
The following figure shows some sample output of the nzhw show command:
Legend:
1 Hardware type
2 Hardware ID
3 Hardware role
4 Hardware state
5 Security
For an IBM Netezza High Capacity Appliance C1000 system, the output of the
nzhw show command includes information about the storage groups:
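The full sample output is not reproduced here; an abbreviated, illustrative line for a disk in a storage group (using the location format shown later in this chapter) looks like the following:
Disk 1029 spa1.storeGrp1.diskEncl2.disk5 Active Ok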
Related reference:
“The nzhw command” on page A-28
Use the nzhw command to manage the hardware of the IBM Netezza system.
Hardware types
Each hardware component of the IBM Netezza system has a type that identifies the
hardware component.
The following table describes the hardware types. You see these types when you
run the nzhw command or display hardware by using the NzAdmin or IBM
Netezza Performance Portal UIs.
Table 5-2. Hardware description types
Description Comments
Rack A hardware rack for the Netezza system
SPA Snippet processing array (SPA)
SPU Snippet processing unit (SPU)
Disk enclosure A disk enclosure chassis, which contains the disk devices
Disk A storage disk, contains the user databases and tables
Fan A thermal cooling device for the system
Blower A fan pack used within the S-Blade chassis for thermal cooling
Power supply A power supply for an enclosure (SPU chassis or disk)
MM A management device for the associated unit (SPU chassis, disk
enclosure). These devices include the AMM and ESM components, or a
RAID controller for an intelligent storage enclosure in a Netezza C1000
system.
Store group A group of three disk enclosures within a Netezza C1000 system
managed by redundant hardware RAID controllers
Ethernet switch Ethernet switch (for internal network traffic on the system)
Host A high availability (HA) host on the Netezza appliance
SAS Controller A SAS controller within the Netezza HA hosts
Hardware IDs
Each hardware component has a unique hardware identifier (ID) that is in the
form of an integer, such as 1000, 1001, or 1014. You can use the hardware ID to
manage a specific hardware component, or to uniquely identify a component in
command output or other informational displays.
Hardware location
IBM Netezza uses two formats to describe the position of a hardware component
within a rack.
v The logical location is a string in a dot format that describes the position of a
hardware component within the Netezza rack. For example, the nzhw output that
is shown in Figure 5-1 on page 5-3 shows the logical location for components; a
Disk component description follows:
Disk 1609 spa1.diskEncl1.disk13 Active Ok Enabled
In this example, the disk is located in SPA 1, disk enclosure 1, disk position 13.
Similarly, the disk location for a disk on a system shows the location including
storage group:
Disk 1029 spa1.storeGrp1.diskEncl2.disk5 Active Ok
v The physical location is a text string that describes the location of a component.
You can display the physical location of a component by using the nzhw locate
command. For example, to display the physical location of disk ID 1011:
[nz@nzhost ~]$ nzhw locate -id 1011
Turned locator LED 'ON' for Disk: Logical Name:'spa1.diskEncl4.disk1'
Physical Location:'1st Rack, 4th DiskEnclosure, Disk in Row 1/Column 1'
As shown in the command output, the nzhw locate command also lights the
locator LED for components such as SPUs, disks, and disk enclosures. For
hardware components that do not have LEDs, the command displays the
physical location string.
The following figure shows an IBM PureData System for Analytics N200x-010
system with a closer view of the storage arrays and SPU chassis components and
locations.
A Each IBM PureData System for Analytics N200x rack is one array of disk
enclosures. There are 12 enclosures in a full rack configuration, and IBM
PureData System for Analytics N200x-005 half racks have 6 enclosures.
Each disk enclosure has 24 disks, numbered 1 to 24 from left to right on
the front of the rack.
B SPU1 occupies slots 1 and 2, SPU3 occupies slots 3 and 4, and so on up to
SPU13, which occupies slots 13 and 14.
C The disk enclosures
D Host 1, host 2, and a KVM
E SPU chassis
The following figure shows an IBM Netezza 1000-12 system or an IBM PureData
System for Analytics N1001-010 with a closer view of the storage arrays and SPU
chassis components and locations.
A Each disk array has four disk enclosures. Each enclosure has 12 disks,
numbered as in the chart shown in the figure.
B SPU1 occupies slots 1 and 2, SPU3 occupies slots 3 and 4, and so on up to
SPU11, which occupies slots 11 and 12.
C Disk array 1 with four enclosures.
D Disk array 2 with four enclosures.
E Host 1, host 2, and a KVM
F SPU chassis 1
G SPU chassis 2
For detailed information about the locations of various components in the front
and back of the system racks, see the Site Preparation and Specifications guide for
your model type.
The following figure shows an IBM PureData System for Analytics N3001-001
system with host and disk numbering.
A The host marked in the figure is HA1. It is always placed in the rack
directly above HA2.
B The first disk in the host occupies the slot labeled as 0, the second one
occupies slot 1, and, following this pattern, the last disk resides in slot 23.
Sample output of the nzhw locate command on this system looks like the
following:
[nz@v10-12-h1 ~]$ nzhw locate -id 1011
Hardware roles
Each hardware component of the IBM Netezza system has a hardware role, which
represents how the hardware is being used. The following table describes the
hardware roles. You see these roles when you run the nzhw command or display
hardware status by using the NzAdmin or IBM Netezza Performance Portal UIs.
Table 5-3. Hardware roles
Role: None
Description: The None role indicates that the hardware is initialized, but it has yet to be discovered by the Netezza system. This process usually occurs during system startup before any of the SPUs send their discovery information.
Comments: All active SPUs must be discovered before the system can make the transition from the Discovery state to the Initializing state.
Role: Active
Description: The hardware component is an active system participant. Failing over this device can impact the Netezza system.
Comments: Normal system state.
Hardware states
The state of a hardware component represents the power status of the hardware.
Each hardware component has a state.
You see these states when you run the nzhw command or display hardware status
by using the NzAdmin or IBM Netezza Performance Portal UIs.
Table 5-4. Hardware states
State: None
Description: The None state indicates that the hardware is initialized, but it has yet to be discovered by the IBM Netezza system. This process usually occurs during system startup before any of the SPUs have sent their discovery information.
Comments: All active SPUs must be discovered before the system can make the transition from the Discovery state to the Initializing state. If any active SPUs are still in the booting state, there can be an issue with the hardware startup.
State: Ok
Description: The Netezza system has received the discovery information for this device, and it is working properly.
Comments: Normal state.
State: Down
Description: The device is turned off.
State: Invalid
State: Online
Description: The system is running normally. It can service requests.
State: Missing
Description: The System Manager detects a new device in a slot that was previously occupied but not deleted.
Comments: This typically occurs when a disk or SPU has been removed and replaced with a spare without deleting the old device. The old device is considered absent because the System Manager cannot find it within the system.
State: Unreachable
Description: The System Manager cannot communicate with a previously discovered device.
Comments: The device may have been failed or physically removed from the system.
State: Critical
Description: The management module detects a critical hardware problem, and the problem component amber service light might be illuminated.
Comments: Contact Netezza Support to obtain help with identifying and troubleshooting the cause of the critical alarm.
State: Warning
Description: The system manager has detected a condition that requires investigation. For example, a host disk may have reported a predictive failure error (PFE), which indicates that the disk is reporting internal errors.
Comments: Contact Netezza Support to troubleshoot the warning condition and to determine whether a proactive replacement is needed.
State: Checking Firmware / Updating Firmware
Description: The system manager is checking or updating the firmware of a disk before it can be brought online as a spare.
Comments: These are normal states for new replacement disks that are being checked and updated before they are added to service.
State: Unsupported
Description: The hardware component is not a supported model for the appliance.
Comments: Contact Netezza Support because the replacement part is not supported on the appliance.
The System Manager also monitors the management modules (MMs) in the
system, which have a status view of all the blades in the system. As a result, you
might see messages similar to the following in the sysmgr.log file:
Disks
A disk is a physical drive on which data resides. In a Netezza system, host servers
have several disks that hold the Netezza software, host operating system, database
metadata, and sometimes small user files. The Netezza system also has many more
disks that hold the user databases and tables. Each disk has a unique hardware ID
to identify it.
For the IBM PureData System for Analytics N200x appliances, 24 disks reside in
each disk enclosure, and full rack models have 12 enclosures per rack for a total of
288 disks per rack.
For IBM Netezza 1000 or IBM PureData System for Analytics N1001 systems, 48
disks reside in one storage array; a full-rack system has two storage arrays for a
total of 96 disks.
For IBM PureData System for Analytics N3001-001 appliances, all disks are located
in the two hosts; 16 of the 24 disks on each host are used for storing data slices.
Data slices
A data slice is a logical representation of the data that is saved on a disk. The data
slice contains “pieces” of each user database and table. When users create tables
and load their data, they distribute the data for the table across the data slices in
the system by using a distribution key. An optimal distribution is one where each
data slice has approximately the same amount of each user table as any other. The
Netezza system distributes the user data to all of the data slices in the system by
using a hashing algorithm.
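For example, a table definition can name an explicit distribution key when the table is created; the table and column names here are illustrative:
mydb.myschema(myuser)=> CREATE TABLE sales (order_id INTEGER, cust_id INTEGER, amount NUMERIC(9,2)) DISTRIBUTE ON (cust_id);
CREATE TABLE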
Data partitions
Each SPU in an IBM Netezza system "owns" a set of data partitions where the user
data is stored. For the IBM Netezza 100, IBM Netezza 1000, and IBM PureData
System for Analytics N1001 systems, each SPU owns eight data partitions which
are numbered from 0 to 7. For IBM PureData System for Analytics N200x systems,
each SPU typically owns 40 data partitions which are numbered 0 through 39.
For SPU ID 1003, its first data partition (0) points to data slice ID 9, which is stored
on disk 1070. Each data partition points to a data slice. As an example, assume that
disk 1014 fails and its contents are regenerated to a spare disk ID 1024. In this
situation, the SPU 1003’s data partition 7, which previously pointed to data slice 16
on disk 1014, is updated to point to data slice 16 on the new disk 1024 (not
shown).
If a SPU fails, the system moves all its data slices to the remaining active SPUs for
management. The system moves them in pairs (the pair of disks that contain the
primary and mirror data slices of each other). In this situation, some SPUs that
normally had 8 partitions will now own 10 data partitions. You can use the nzds
command to review the data slices on the system and the SPUs that manage them.
The intelligent storage controller contains two redundant RAID controllers that
manage the disks and associated hardware within a storage group. The RAID
controllers are caching devices, which improves the performance of the read and
write operations to the disks. The caches are mirrored between the two RAID
controllers for redundancy; each controller has a flash backup device and a battery
to protect the cache against power loss.
The RAID controllers operate independently of the Netezza software and hosts.
For example, if you stop the Netezza software (such as for an upgrade or other
maintenance tasks), the RAID controllers continue to run and manage the disks
within their storage group. It is common to see the activity LEDs on the storage
groups operating even when the Netezza system is stopped. If a disk fails, the
RAID controller initiates the recovery and regeneration process; the regeneration
continues to run even when the Netezza software is stopped. If you use the nzhw
command to activate, fail, or otherwise manage disks manually, the RAID
controllers ensure that the action is allowed at that time; in some cases, commands
return an error when the requested operation, such as a disk failover, is not
allowed.
The RAID controller caches are disabled when any of the following conditions
occur:
v Battery failure
v Cache backup device failure
v Peer RAID controller failure (that is, a loss of the mirrored cache)
When the cache is disabled, the storage group (and the Netezza system)
experiences a performance degradation until the condition is resolved and the
cache is enabled again.
The following figure shows an illustration of the SPU/storage mapping. Each SPU
in a Netezza C1000 system owns nine user data slices by default. Each data slice is
supported by a three disk RAID 5 storage array. The RAID 5 array can support a
single disk failure within the three-disk array. (More than one disk failure within
the three-disk array results in the loss of the data slice.) Seven disks within the
storage group in a RAID 5 array are used to hold important system information
such as the nzlocal, swap, and log partitions.
A SPU
B Data slice 1
C Data slice 9
D nzlocal, swap, and log partitions
If a SPU fails, the system manager distributes the user data partitions and the
nzlocal and log partitions to the other active SPUs in the same SPU chassis.
Each disk partition is used to store one copy of a data slice. Disks are divided into
groups of four with two disks from each host in such a group. In each group, there
are 16 disk partitions (four on every disk) that are used to store data slices with
four copies of every data slice.
Each of the data slices always uses disk partition 1, 2, 3, 4 from the disks in the
group.
Each host runs one virtual SPU. The data slice is owned by the virtual SPU that
runs on the host where the disk with the first disk partition of that SPU is
physically located.
Data mirroring for these disk partitions is handled at the software level by the
virtual SPU, which maintains the four copies of each data slice.
Remote disks are accessed through iSCSI over the network that connects the two
hosts. In addition, there are four spare disks (two per host) that are used as targets
for the regeneration of failed disks.
One-host mode
When one of the hosts is not available, manually failed, or its SPU is manually
failed using nzhw, the system switches into one-host mode.
In this mode, only one virtual SPU is working and only two disks from each disk
group are used.
Each data slice is now stored on two disk partitions instead of four, and two data
slices must read data from the same disk.
For example, the default disk topology for IBM Netezza 100/1000 or IBM PureData
System for Analytics N1001 systems configures each S-Blade with eight disks that
are evenly distributed across the disk enclosures of its SPA, as shown in the
following figure. If disks failover and regenerate to spares, it is possible to have an
unbalanced topology where the disks are not evenly distributed among the
odd-numbered and even-numbered enclosures. This causes some of the SAS (also
called HBA) paths, which are shown as the dark lines that connect the blade
chassis to the disk enclosures, to carry more traffic than the others.
The System Manager can detect and respond to disk topology issues. For example,
if an S-Blade has more disks in the odd-numbered enclosures of its array, the
System Manager reports the problem as an overloaded SAS bus. You can use the
nzhw rebalance command to reconfigure the topology so that half of the disks are
in the odd-numbered enclosures and half in the even-numbered. The rebalance
process requires the system to transition to the “pausing now” state for the
topology update.
When the Netezza system restarts, the restart process checks for topology issues
such as overloaded SAS buses or SPAs that have S-Blades with uneven shares of
data slices. If the system detects a spare S-Blade for example, it will reconfigure the
data slice topology to distribute the workload equally among the S-Blades.
Related reference:
“Hardware path down” on page 8-20
“Rebalance data slices” on page 5-29
For example, the following command shows two failed disks on the system:
[nz@nzhost ~]$ nzhw show -issues
Description HW ID Location Role State Security
----------- ----- ---------------------- -------- ----------- --------
Disk 1498 spa1.diskEncl11.disk21 Failed Ok Disabled
Disk 1526 spa1.diskEncl9.disk4 Failed Ok Disabled
The disks must be replaced to ensure that the system has spares and an optimal
topology. You can also use the NzAdmin and IBM Netezza Performance Portal
interfaces to obtain visibility to hardware issues and failures.
Manage hosts
In general, there are few management tasks that relate to the IBM Netezza hosts. In
most cases, the tasks are for the optimal operation of the host. For example:
v Do not change or customize the kernel or operating system files unless directed
to do so by Netezza Support or Netezza customer documentation. Changes to
the kernel or operating system files can impact the performance of the host.
v Do not install third-party software on the Netezza host without first testing the
impact on a development or test Netezza system. While management agents or
other applications might be of interest, it is important to test and verify that the
application does not impact the performance or operation of the Netezza system.
v During Netezza software upgrades, host and kernel software revisions are
verified to ensure that the host software is operating with the latest required
levels. The upgrade processes might display messages that inform you to update
the host software to obtain the latest performance and security features.
v On Netezza HA systems, Netezza uses DRBD replication only on the /nz and
/export/home partitions. As new data is written to the Netezza /nz partition and
the /export/home partition on the primary Netezza system, the DRBD software
automatically makes the same changes to the /nz and /export/home partition of
the standby Netezza system.
If the active host fails, the Netezza HA software typically fails over to the standby
host to run the Netezza database and system. Netezza Support works with you to
schedule field service to repair the failed host.
For the N3001-001 appliance, this process is similar. If the active host is
unreachable, the NPS services automatically fail over to the second host. It may
take 15 minutes for NPS to start discovering its SPUs. Next, the discovery process
waits up to 15 minutes for both SPUs to report their status. After that time, if only
the local SPU reports status, the system transitions into one-host mode. If the
second host becomes unreachable for more than 15 minutes, the active host
transitions into one-host mode.
Model N3001-001
For the N3001-001 appliance, both hosts are used by default for running the virtual
SPUs. Resources of both hosts, such as CPU and memory, are in use, and neither
host is marked as spare in nzhw.
You can switch to the one-host mode in which the resources of only one host are in
use. To do this, run the following command:
nzhw failover -id XXXX
where XXXX is the hwid of the host that you do not want to use. It is only
possible to fail over a host that is a standby in the cluster.
When the system runs in one-host mode, the role of the other host and its virtual
SPU is Failed and disks located on that host that are normally used to store data
(disks 9 - 24) have the role Inactive. The data slices remain mirrored but only with
two disks. In the two-host mode, each data slice is backed up by four disks.
To switch back from one-host mode to two-host mode, run the following
command:
nzhw activate -id XXXX
where XXXX is the hwid of the failed host. This operation activates the host, its
SPU, and all of its disks. Then, a rebalance is requested.
Note: Switching from one-host mode to two-host mode may take a significant
amount of time, for example a few hours. It depends on the amount of data stored
in the system.
Manage SPUs
Snippet Processing Units (SPUs) or S-Blades are hardware components that serve
as the query processing engines of the IBM Netezza appliance.
In model N3001-001, the SPUs are emulated using host resources, such as CPU and
memory. The SPUs are not physical components of the system and there is no
FPGA.
You can use the nzhw command to activate, deactivate, failover, locate, and reset a
SPU, or delete SPU information from the system catalog.
To indicate which SPU you want to control, you can refer to the SPU by using its
hardware ID. You can use the nzhw command to display the IDs, and obtain the
information from management UIs such as NzAdmin or IBM Netezza Performance
Portal.
To obtain the status of one or more SPUs, you can use the nzhw command with the
show options.
Activate a SPU
You can use the nzhw command to activate a SPU that is inactive or failed.
To activate a SPU:
nzhw activate -u admin -pw password -host nzhost -id 1004
For model N3001-001, if you have enabled the one-host mode by failing over a
SPU, you must activate that SPU to switch back to two-host mode. You must then
request a rebalance operation using nzds. In that case, switching to two-host mode
may take a significant amount of time, for example a few hours. It depends on the
amount of data stored in the system.
You can use the nzhw command to make a spare SPU unavailable to the system. If
the specified SPU is active, the command displays an error.
For model N3001-001, when a SPU is failed over, the system switches into one-host
mode in which the resources of only one host are used. You can only fail over a
SPU of the standby host. In order to fail over a SPU that is running on the active
host, you must first migrate the cluster to the other host. To switch back to
two-host mode, activate the failed SPU.
Locate a SPU
You can use the nzhw command to turn on or off a SPU LED and display the
physical location of the SPU. The default is on.
For model N3001-001, the SPUs are emulated and the output of the locate
command is the following:
Logical Name:’spa1.spu2’ Physical Location:’lower host, virtual SPU’
Reset a SPU
You can use the nzhw command to power cycle a SPU (a hard reset).
You can use the nzhw command to remove a failed, inactive, or incompatible SPU
from the system catalog.
If a SPU hardware component fails and must be replaced, Netezza Support works
with you to schedule service to replace the SPU.
Related reference:
“The nzhw command” on page A-28
Use the nzhw command to manage the hardware of the IBM Netezza system.
Manage disks
The disks on the system store the user databases and tables that are managed and
queried by the IBM Netezza appliance. You can use the nzhw command to activate,
failover, and locate a disk, or delete disk information from the system catalog.
To protect against data loss, never remove a disk from an enclosure or remove a
RAID controller or ESM card from its enclosure unless directed to do so by
Netezza Support or when you are using the hardware replacement procedure
documentation. If you remove an Active or Spare disk drive, you could cause the
system to restart or to transition to the down state. Data loss and system issues can
occur if you remove these components when it is not safe to do so.
Netezza C1000 systems have RAID controllers to manage the disks and hardware
in the storage groups. You cannot deactivate a disk on a C1000 system, and the
commands to activate, fail, or delete a disk return an error if the storage group
cannot support the action at that time.
To indicate which disk you want to control, you can refer to the disk by using its
hardware ID. You can use the nzhw command to display the IDs, and obtain the
information from management UIs such as NzAdmin or IBM Netezza Performance
Portal.
For model IBM PureData System for Analytics N3001-001, the physical disks are
represented as two nzhw objects:
v A disk in an emulated enclosure in SPA (like for other N3001 systems).
v A host disk.
The majority of disk management operations should be performed on the storage
array disks, not on the host disk. The only operation that must be run on the host
disk is activation. This operation is required to assign a newly inserted physical
disk to the virtual SPU.
To obtain the status of one or more disks, you can use the nzhw command with the
show options.
To show the status of all the disks (the sample output is abbreviated for the
documentation), enter:
[nz@nzhost ~]$ nzhw show -type disk
Description HW ID Location Role State Security
----------- ----- ---------------------- ------ ----------- --------
Disk 1076 spa1.diskEncl4.disk2 Active Ok Enabled
Disk 1077 spa1.diskEncl4.disk3 Active Ok Enabled
Disk 1078 spa1.diskEncl4.disk4 Active Ok Enabled
Disk 1079 spa1.diskEncl4.disk5 Active Ok Enabled
Activate a disk
You can use the nzhw command to make an inactive, failed, or mismatched disk
available to the system as a spare.
In some cases, the system might display a message that it cannot activate the disk
yet because the SPU has not finished an existing activation request. Disk activation
usually occurs quickly, unless there are several activations that are taking place at
the same time. In this case, later activations wait until they are processed in turn.
Note: For a Netezza C1000 system, you cannot activate a disk that is being used
by the RAID controller for a regeneration or other task. If the disk cannot be
activated, an error message similar to the following appears:
Error: Can not update role of Disk 1004 to Spare - The disk is still part of a non healthy
array. Please wait for the array to become healthy before activating.
You can use the nzhw command to initiate a failover. You cannot fail over a disk
until the system is at least in the initialized state.
Note: For a Netezza C1000 system, the RAID controller still considers a failed disk
to be part of the array until the regeneration is complete. After the regen
completes, the failed disk is logically removed from the array.
Locate a disk
You can use the nzhw command to turn on or off the LED on a disk in the storage
arrays. (This command does not work for disks in the hosts.) The default is on.
The command also displays the physical location of the disk.
For model N3001-001, you can locate both disks and host disks, including the host
disks managed by the hardware RAID controller.
You can use the nzhw command to remove a disk that is failed, inactive,
mismatched, or incompatible from the system catalog. For Netezza C1000 systems,
do not delete the hardware ID of a failed disk until after you have successfully
replaced it using the instructions in the Replacement Procedures: IBM Netezza C1000
Systems.
If a disk hardware component fails and must be replaced, Netezza Support works
with you to schedule service to replace the disk.
Related reference:
“The nzhw command” on page A-28
Use the nzhw command to manage the hardware of the IBM Netezza system.
You can use the nzhw, nzds, and nzspupart commands to manage data slices. To
indicate which data slice you want to control, you can refer to the data slice by
using its data slice ID. You can use the nzds command to display the IDs, and
obtain the information from management UIs such as NzAdmin or IBM Netezza
Performance Portal.
Related reference:
“The nzds command” on page A-10
Use the nzds command to manage and obtain information about the data slices in
the system.
You can also use the NzAdmin and IBM Netezza Performance Portal interfaces to
obtain visibility to hardware issues and failures.
To show the status of all the data slices (the sample output is abbreviated for the
documentation), enter:
[nz@nzhost ~]$ nzds show
Data Slice Status SPU Partition Size (GiB) % Used Supporting Disks
---------- ------- ---- --------- ---------- ------ ----------------
1 Repairing 1017 2 356 58.54 1021,1029
2 Repairing 1017 3 356 58.54 1021,1029
3 Healthy 1017 5 356 58.53 1022,1030
4 Healthy 1017 4 356 58.53 1022,1030
5 Healthy 1017 0 356 58.53 1023,1031
6 Healthy 1017 1 356 58.53 1023,1031
7 Healthy 1017 7 356 58.53 1024,1032
8 Healthy 1017 6 356 58.53 1024,1032
Data slices 1 and 2 in the sample output are regenerating because of a disk failure.
The command output can differ on different models of appliances.
Note: For a Netezza C1000 system, three disks hold the user data for a data slice;
the fourth disk is the regen target for the failed drive. The RAID controller still
considers a failed disk to be part of the array until the regeneration is complete.
After the regen completes, the failed disk is logically removed from the array.
To show detailed information about the data slices that are being regenerated, you
can use the -regenstatus and -detail options, for example:
[nz@nzhost ~]$ nzds show -regenstatus -detail
Data Slice Status SPU Partition Size (GiB) % Used Supporting Disks
Start Time % Done
---------- --------- ---- --------- ---------- ------ -------------------
------------------- ------
2 Repairing 1255 1 3725 0.00 1012,1028,1031,1056
2011-07-01 10:41:44 23
The status of a data slice shows the health of the data slice. The following table
describes the status values for a data slice. You see these states when you run the
nzds command or display data slices by using the NzAdmin or IBM Netezza
Performance Portal UIs.
Table 5-5. Data slice status
State Description
Healthy The data slice is operating normally and the data is protected in a
redundant configuration; that is, the data is fully mirrored.
Repairing The data slice is in the process of being regenerated to a spare disk
because of a disk failure.
Degraded The data slice is not protected in a redundant configuration. Another
disk failure could result in loss of a data slice, and the degraded
condition impacts system performance.
Note: In the IBM PureData System for Analytics N1001 or IBM Netezza 1000 and
later models, the system does not change states during a regeneration; that is, the
system remains online while the regeneration is in progress. There is no
synchronization state change and no interruption to active jobs during this process.
If the regeneration process fails or stops for any reason, the system transitions to
the Discovering state to establish the topology of the system.
You can use the nzspupart regen command or the NzAdmin interface to
regenerate a disk. If you do not specify any options, the system manager checks
for any degraded partitions and if found, starts a regeneration if there is a spare
disk in the system.
For IBM PureData System for Analytics N2001 and later systems, each disk
contains partitions for the user data and the log and swap partitions. When the
system regenerates a disk to a spare, the system copies all of the partitions to the
spare. If you issue the nzspupart regen command manually, specify:
v The hardware ID of the SPU that has the degraded partitions
v One of the partition IDs
v The hardware ID for the spare disk
The regeneration affects all partitions on that disk. For example:
[nz@nzhost ~]$ nzspupart regen -spu 1099 -part 1 -dest 1066
You can then issue the nzspupart show -regenstatus or the nzds show
-regenstatus command to display the status of the regeneration. For example:
[nz@nzhost ~]$ nzspupart show -regenstatus
SPU Partition Id Partition Type Status Size (GiB) % Used Supporting Disks % Done Repairing Disks Starttime
---- ------------ -------------- --------- ---------- ------ ------------------- ------- --------------- ---------
1099 0 Data Repairing 356 0.00 1065,1066 0.00 1066 0
1099 1 Data Repairing 356 0.00 1065,1066 0.00 1066 0
1099 100 NzLocal Repairing 1920989772 0.00 1065,1066,1076,1087 0.00 1066 0
1099 101 Swap Repairing 32 0.00 1065,1066,1076,1087 0.00 1066 0
1099 110 Log Repairing 1 3.31 1065,1066 0.00 1066 0
For systems earlier than the N200x models, you must specify the data slice IDs
and the spare disk ID. For example, to regenerate data slice IDs 11 and 17, which
are affected by the failing disk, onto spare disk ID 1024, enter:
nzds regen -u admin -pw password -ds "11,17" -dest 1024
If you want to control the regeneration source and target destinations, you can
specify source SPU and partition IDs, and the target or destination disk ID. The
spare disk must reside in the same SPA as the disk that you are regenerating. You
can obtain the IDs for the source partition by issuing the nzspupart show -details
command.
To regenerate a degraded partition and specify the information for the source and
destination, enter the following command:
nzspupart regen -spu 1035 -part 7 -dest 1024
Note: Regeneration can take several hours to complete. If the system is idle and
has no other activity except the regeneration, or if the user data partitions are not
very full, the regeneration takes less time to complete. You can review the status of
the regeneration by issuing the nzspupart show -regenStatus command. During
the regeneration, user query performance can be impacted while the system is
busy processing the regeneration. Likewise, user query activity can increase the
time that is required for the regeneration.
If the system manager is unable to remove the failed disk from the RAID array, or
if it cannot add the spare disk to the RAID array, a regeneration setup failure can
occur. If a regeneration failure occurs, or if a spare disk is not available for the
regeneration, the system continues processing jobs. The data slices that lose their
mirror continue to operate in an unmirrored or degraded state; however, you
should replace your spare disks as soon as possible and ensure that all data slices
are mirrored. If an unmirrored disk fails, the system is brought to a down state.
After the failed SPU is replaced or reactivated, you must rebalance the data slices
to return to optimal performance. The rebalance process checks each SPU in the
SPA; if a SPU has more than two data slices more than another SPU, the System
Manager redistributes the data slices to equalize the workload and return the SPA
to an optimal performance topology. (The System Manager changes the system to
the discovering state to perform the rebalance.)
You can use the nzhw command to rebalance the data slice topology. The system
also runs the rebalance check each time that the system is restarted, or after a SPU
failover or a disk regeneration setup failure.
You can also use the nzhw rebalance -check option to have the system check the
topology and only report whether a rebalance is needed. The command displays
the message Rebalance is needed or There is nothing to rebalance. If a
rebalance is needed, you can run the nzhw rebalance command to perform the
rebalance, or you could wait until the next time the Netezza software is stopped
and restarted to rebalance the system.
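For example, a typical check-then-rebalance sequence looks like the following
sketch; the message text shown is the wording described above and can vary by
release:
[nz@nzhost ~]$ nzhw rebalance -check
Rebalance is needed
[nz@nzhost ~]$ nzhw rebalance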
For an N3001-001 system, the rebalance operation is used to switch the system
back to two-host mode after activating the failed SPU. Rebalance is automatically
requested by the system when the transition to two-host mode is requested by the
activate operation for a host.
Related concepts:
“System resource balance recovery” on page 5-17
To display the current storage topology, use the nzds show -topology command:
Switch 1
port[1] 5 disks: [ 3:encl1Slot01 5:encl1Slot03 9:encl1Slot05 13:encl1Slot07
17:encl1Slot12 ] -> encl1
This sample output shows a normal topology for an IBM Netezza 1000-3 system.
The command output is complex and is typically used by Netezza Support to
troubleshoot problems. If there are any issues to investigate in the topology, the
command displays a WARNING section at the bottom, for example:
WARNING: 2 issues detected
spu0101 hba [0] port [2] has 3 disks
SPA 1 SAS switch [sassw01a] port [3] has 7 disks
These warnings indicate problems in the path topology where storage components
are overloaded. These problems can affect query performance and also system
availability if other path failures occur. Contact Support to troubleshoot these
warnings.
To display detailed information about path failure problems, you can use the
following command:
[nz@nzhost ~]$ nzpush -a mpath -issues
spu0109: Encl: 4 Slot: 4 DM: dm-5 HWID: 1093 SN: number PathCnt: 1
PrefPath: yes
spu0107: Encl: 2 Slot: 8 DM: dm-1 HWID: 1055 SN: number PathCnt: 1
PrefPath: yes
spu0111: Encl: 1 Slot: 10 DM: dm-0 HWID: 1036 SN: number PathCnt: 1
PrefPath: no
Note: It is possible to see errors that are reported in the nzpush command output
even if the nzds show -topology command does not report any warnings. In these cases,
the errors are still problems in the topology, but they do not affect the performance
and availability of the current topology. Be sure to report any path failures to
ensure that problems are diagnosed and resolved by Support for optimal system
performance.
Related reference:
“Hardware path down” on page 8-20
If a SPU fails, the system state changes to the pausing -now state (which stops
active jobs), and then transitions to the discovering state to identify the active SPUs
in the SPA. The system also rebalances the data slices to the active SPUs.
The following table describes the system states and the way IBM Netezza handles
transactions during failover.
Table 5-6. System states and transactions
System state      Active transactions                                       New transactions
Offline(ing) Now  Aborts all transactions.                                  Returns an error.
Offline(ing)      Waits for the transaction to finish.                      Returns an error.
Pause(ing) Now    Aborts only those transactions that cannot be restarted.  Queues the transaction.
Pause(ing)        Waits for the transaction to finish.                      Queues the transaction.
The following examples provide specific instances of how the system handles
failovers that happen before, during, or after data is returned.
v If the pause -now occurs immediately after a BEGIN command completes, before
data is returned, the transaction is restarted when the system returns to an
online state.
v If a statement such as the following completes and then the system transitions,
the transaction can restart because data has not been modified and the reboot
does not interrupt a transaction.
BEGIN;
SELECT * FROM emp;
Note: There is a retry count for each transaction. If the system transitions to
pause -now more than the number of retries that are allowed, the transaction is
stopped.
After the system restarts these transactions, the system state returns to online. For
more information, see the IBM Netezza Data Loading Guide.
Power procedures
This section describes how to power on an IBM Netezza system and how to
power off the system. Typically, you would only power off the system if you are
moving the system physically within the data center, or in the event of possible
maintenance or emergency conditions within the data center.
The instructions to power on or off an IBM Netezza 100 system are available in the
Site Preparation and Specifications: IBM Netezza 100 Systems.
Note: To power cycle a Netezza system, you must have physical access to the
system to press power switches and to connect or disconnect cables. Netezza
systems have keyboard/video/mouse (KVM) units that you can use to enter
administrative commands on the hosts.
Figure 5-11. IBM Netezza 1000-6 and N1001-005 and larger PDUs and circuit breakers
A OFF setting
B ON setting
C PDU circuit breakers. 3 rows of 3 breaker pins.
v To close the circuit breakers (power up the PDUs), press in each of the nine
breaker pins until they engage. Be sure to close the nine pins on both main
PDUs in each rack of the system.
v To open the circuit breakers (power off the PDUs), pull out each of the nine
breaker pins on the left and the right PDU in the rack. If it becomes difficult to
pull out the breaker pins by using your fingers, you can use a tool such as a
pair of needle-nose pliers to gently pull out the pins.
On the IBM Netezza 1000-3 and IBM PureData System for Analytics N1001-002
models, the main input power distribution units (PDUs) are on the right and left
sides of the rack, as shown in the following figure.
At the top of each PDU is a pair of breaker rocker switches. The labels on the
switches are upside down when you view the PDUs.
v To close the circuit breakers (power up the PDUs), you push the On toggle of
the rocker switch in. Make sure that you push in all four rocker switches, two
on each PDU.
v To open the circuit breakers (power off the PDUs), you must use a tool such as a
small flathead screwdriver; insert the tool into the hole that is labeled OFF and
gently press until the rocker toggle pops out. Make sure that you open all four
of the rocker toggles, two on each PDU.
To power on an IBM Netezza 1000 or IBM PureData System for Analytics N1001
system, complete the following steps:
Procedure
1. Make sure that the two main power cables are connected to the data center
drops; there are two power cables for each rack of the system.
2. Do one of the following steps depending on which system model you have:
To power off an IBM Netezza 1000 or IBM PureData System for Analytics N1001
system, complete the following steps:
Procedure
1. Log in to the host server (ha1) as root.
To power on an IBM PureData System for Analytics N200x system, complete the
following steps:
Procedure
1. Switch on the power to the two PDUs located in the rear of the cabinet at the
bottom. Make sure that you switch on both power controls. Repeat this step
for each rack of a multi-rack system.
2. Press the power button on Host 1. The power button is on the host in the front
of the cabinet. Host 1 is the upper host in the rack, or the host located in rack
one of older multi-rack systems. A series of messages appears as the host
system boots.
3. Wait at least 30 seconds after powering up Host 1, then press the power button
on Host 2. (Host 2 is the lower host in the rack, or the host located in rack two
of older multi-rack systems.) The delay ensures that Host 1 completes its
start-up operations first, and thus is the primary host for the system.
4. Log in as root to Host 1 and run the crm_mon command to monitor the status of
the HA services and cluster operations:
[root@nzhost1 ~]# crm_mon -i5
The output of the command refreshes at the specified interval rate of 5 seconds
(-i5).
5. Review the output and wait until all of the resource groups have a Started
status. This usually takes about 2 to 3 minutes. Then proceed to the next step.
Sample output follows:
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Tue Jun 2 11:46:43 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
6. Press Ctrl-C to exit the crm_mon command and return to the command prompt.
7. Log in to the nz account.
[root@nzhost1 ~]# su - nz
8. Verify that the system is online using the following command:
[nz@nzhost1 ~]$ nzstate
System state is ’Online’.
9. If your system runs the Call Home support feature, enable it.
[nz@nzhost1 ~]$ nzOpenPmr --on
To power off an IBM PureData System for Analytics N200x system, complete the
following steps:
Procedure
1. Log in to the host server (ha1) as root.
Procedure
1. Make sure that the two main power cables are connected to the data center
drops; there are two power cables for each rack of the system. For a North
American power configuration, there are four power cables for the first two
racks of a Netezza C1000 (or two cables for a European Union power
configuration).
2. Switch the breakers to ON on both the left and right PDUs. (Repeat these
steps for each rack of the system.)
3. Press the power button on both host servers and wait for the servers to start.
This process can take a few minutes.
4. Log in to the host server (ha1) as root.
5. Change to the nz user account and run the following command to stop the
Netezza server: nzstop
6. Wait for the Netezza system to stop.
7. Log out of the nz account to return to the root account, then type the
following command to power on the storage groups:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all -j all
8. Wait five minutes and then type the following command to power on all the
S-blade chassis:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all
9. Run the crm_mon -i5 command to monitor the status of the HA services and
cluster operations. Review the output and wait until all of the resource groups
have a Started status. This usually takes about 2 to 3 minutes. Then proceed to
the next step.
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Tue Jun 2 11:46:43 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
10. Press Ctrl-C to exit the crm_mon command and return to the command prompt.
11. Log in to the nz account.
[root@nzhost1 ~]# su - nz
12. Verify that the system is online using the following command:
To power off an IBM Netezza High Capacity Appliance C1000, complete the
following steps:
CAUTION:
Unless the system shutdown is an emergency situation, do not power down a
Netezza C1000 system when there are any amber (Needs Attention) LEDs
illuminated in the storage groups. It is highly recommended that you resolve the
problems that are causing the Needs Attention LEDs before you power off a
system to ensure that the power-up procedures are not impacted by the
unresolved conditions within the groups.
Procedure
1. Identify the active host in the cluster, which is the host where the nps resource
group is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Procedure
1. Press the power button on Host 1. The power button is located on the host in
the front of the cabinet. Host 1 is the upper host in a single-rack system, or the
host located in rack one of a multi-rack system. A series of messages appears as
the host system boots.
2. Wait for at least 30 seconds after powering up Host 1. Then press the power
button of Host 2. The delay ensures that Host 1 completes its startup
operations first and therefore becomes the primary host for the system. Host 2
is the lower host in a single-rack system, or the host located in rack two of a
multi-rack system.
3. Log in to Host 1 as root and run the crm_mon command to monitor the status of
HA services and cluster operations:
[root@nzhost1 ~]# crm_mon -i5
============
Last updated: Fri Aug 29 02:19:25 2014
Current DC: hostname-1 (3389b15b-5fee-435d-8726-a95120f437dd)
2 Nodes configured.
2 Resources configured.
============
5. Press Ctrl + C to exit the crm_mon command and return to the command
prompt.
6. Log in to the nz account:
[root@nzhost1 ~]# su - nz
Procedure
1. Log in to Host 1 (ha1) as root.
3. On the active host, in this example nzhost1, run the following commands to
stop the Netezza server:
[root@nzhost1 ~]# su - nz
[nz@nzhost1 ~]$ nzstop
[nz@nzhost1 ~]$ exit
5. Log in as root to the standby host, in this example nzhost2. Run the following
command to shut down the host:
[root@nzhost2 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other system
activity. When it finishes, a message is displayed that indicates that it is now
safe to power down the server.
6. Press the power button on Host 2 to power down that Netezza host. The
button is located in the front of the cabinet.
7. On Host 1, run the following command to shut down the Linux operating
system:
[root@nzhost1 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other system
activity. When it finishes, a message is displayed that indicates that it is now
safe to power down the server.
8. Press the power button on Host 1 to power down that Netezza host. The
button is located in the front of the cabinet.
Self-encrypting drives (SEDs) encrypt data as it is written to the disk. Each disk
has a disk encryption key (DEK) that is set at the factory and stored on the disk.
The disk uses the DEK to encrypt data as it writes, and then to decrypt the data as
it is read from disk. The operation of the disk, and its encryption and decryption,
is transparent to the users who are reading and writing data. This default
encryption and decryption mode is referred to as secure erase mode. In secure erase
mode, you do not need an authentication key or password to decrypt and read
data. SEDs offer improved capabilities for an easy and speedy secure erase for
situations when disks must be repurposed or returned for support or warranty
reasons.
For the optimal security of the data stored on the disks, SEDs have a mode
referred to as auto-lock mode. In auto-lock mode, the disk uses an authentication
encryption key (AEK) to protect its DEK. When a disk is powered off, it is
automatically locked. When the disk is powered on, the SED requires a valid AEK
to read the DEK and unlock the disk to proceed with read and write operations. If
the SED does not receive a valid authentication key, the data on the disk cannot be
read. The auto-lock mode helps to protect the data when disks are accidentally or
intentionally removed from the system.
In many environments, the secure erase mode may be sufficient for normal
operations and provides you with easy access to commands that can quickly and
securely erase the contents of the disk before a maintenance or repurposing task.
For environments where protection against data theft is paramount, the auto-lock
mode adds an extra layer of access protection for the data stored on your disks.
By default, the SEDs on the IBM PureData System for Analytics N3001 appliances
operate in secure erase mode. The IBM installation team can configure the disks to
run in auto-lock mode by creating a keystore and defining an authentication key
for your host and storage disks when the system is installed in your data center. If
you choose not to auto-lock the disks during system installation, you can lock
them later. Contact IBM Support to enable the auto-lock mode. The process to
auto-lock the disks requires a short NPS service downtime window.
The NPS system requires an AEK for the host drives and an AEK for the drives in
the storage arrays that are managed by the SPUs. You have two options for storing
the keys. The AEKs can be stored in a password protected keystore repository on
the NPS host, or if you have implemented an IBM Security Key Lifecycle Manager
(ISKLM) server, you can store the AEKs in your ISKLM server for use with the
appliance. The commands to create the keys are the same whether the keys are
stored locally or in an ISKLM server.
For locally stored keys, the key repository is stored in the /nz/var/keystore
directory on the NPS host. The repository is locked and protected.
For ISKLM configurations, there is no local keystore on the NPS hosts. The ISKLM
support requires some additional configuration for your NPS hosts to become a
client of the ISKLM server. The configuration steps are described in the section
“IBM Security Key Lifecycle Manager configuration steps” on page 6-4.
You should use the nzkeybackup command to create a backup copy of the AEKs
after you change the keys. If the keystore on the NPS host or the ISKLM server is
lost, the disks cannot be read. Make sure that you carefully protect the keystore
backups for the appliance in a secure area, typically in a location that is not on the
NPS hosts.
Note: When auto-lock mode is enabled, and a disk is failed over either
automatically or manually using the nzhw failover -id <diskHwId> command, the
system automatically securely erases the disk contents. Contact IBM Support for
assistance with the process to securely erase one or more disks on the system. If a
disk is physically removed from the system before it is failed over, the system
detects the missing drive and fails over to an available spare disk, but the removed
disk is not securely erased because it is no longer in the system. In auto-lock
mode, the disk is locked when it is powered down, so the contents are not
readable.
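For reference, a manual failover by hardware ID takes the following form; the disk
hardware ID shown here is illustrative only, so substitute an ID from your own
system, and contact IBM Support before failing over disks in an auto-lock
configuration:
[nz@nzhost ~]$ nzhw failover -id 1036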
Starting in NPS release 7.2.1, you can configure your IBM PureData System for
Analytics N3001 models to send the AEKs to an IBM Security Key Lifecycle
Manager (ISKLM) server in your environment. The NPS support requires ISKLM
version 2.5.0.5 or later.
In this configuration, the ISKLM server only stores and sends the AEKs that are
manually generated on the NPS host. The ISKLM server cannot be used to
automatically create and rotate the AEKs on a scheduled basis. You must have an
ISKLM server already set up and running in your environment, and you need
assistance from the ISKLM administrator to add the NPS host as a client of the
ISKLM server.
Important: If you configure your N3001 system to use ISKLM as the key
repository, note that you cannot downgrade from NPS release 7.2.1 to an earlier 7.2
release unless you convert from ISKLM to a local keystore for your SEDs. The IBM
Netezza Software Upgrade Guide has instructions for disabling ISKLM support and
returning to a local keystore before downgrading.
Typically, after you configure SEDs to use auto-lock mode, you would never
change them back to the default secure erase mode. If for some reason you must
reconfigure the SEDs, it is possible to do so, but this process is very complex and
requires a lengthy service window and possible service charges. There is also a risk
of data loss, especially if your backups for the system are stale or incomplete. Make
sure that reconfiguring your SEDs to secure erase mode is appropriate for your
environment.
CAUTION:
The process to reconfigure SEDs to secure erase mode from the auto-lock mode
is not a process that you can run on your own. You must work with IBM
Support to reset the system correctly.
There are two options for reconfiguring your host SEDs to secure erase mode:
v The first option is to have IBM Support replace your host drives with a set of
new drives that are custom-built with the correct releases of software for your
system. The host motherboards/planars must also be replaced (or the host disks
securely erased) to clear the RAID controller NVRAM that holds the AEK.
Reconfiguring the host SEDs requires system downtime, charges for the
replacement disks and planars, and approximately a day of downtime to replace
the disks and restore your NPS host backups and metadata.
v The second option is to completely reinitialize your system to a factory default
level, then reload all your data from the most recent full backup. This option
could require a service window of several days for the reinitialization and
complete reload.
To change the storage array SEDs from auto-lock mode to standard secure erase
mode, there is an IBM Support process to disable the authentication key. This
process requires you to securely erase the storage drives and reload the full
database backups from your most recent NPS backup. If it is an option, such as for
a non-production test system, a full system reinitialization would also reset the
drives from auto-lock mode. You would then need to restore your NPS data from
your backups, or start creating new data from new load sources.
SED keystore
The keystore holds the AEKs for unlocking the host and SPU drives that are
configured to run in auto-lock mode.
Important: If you use the IBM Security Key Lifecycle Manager (ISKLM) to store
and retrieve the AEKs for your NPS appliance, you can lock the drives using a
local keystore and then migrate to ISKLM management of the keys, or you can
configure the system to use ISKLM to create the keys and lock the drives. See the
“IBM Security Key Lifecycle Manager configuration steps” section for the
instructions to configure ISKLM support. After you configure ISKLM, the keys are
sent to the ISKLM server for storage and are not stored locally on the system.
If you lose the keystore, either because the local keystore is corrupted or deleted,
or because connectivity to the ISKLM server is lost, you lose the ability to unlock
your SED drives when they power on. As a best practice, make sure that you have
a recent backup of the current keys. You use the nzkeybackup command to create a
compressed tar file backup of the current keystore. You should always back up the
keystore after any key changes. Make sure that you save the keystore backups in a
safe location away from the NPS appliance.
Note: The nzhostbackup command also captures the local keystore in the host backup, but
nzkeybackup is better because it does not require you to pause the NPS system and
stop query activity, and nzkeybackup -sklm can capture the keys that are stored in
an ISKLM server.
You can use the nzkeyrestore command to restore a keystore from a keystore
backup file.
The following list summarizes the steps needed for the ISKLM server setup. It is
important to work with your IBM Security Key Lifecycle Manager (ISKLM) system
administrator to configure the ISKLM server to communicate with the NPS
appliance.
After the ISKLM administrator has added the NPS appliance to the ISKLM server,
make sure that you have the following information:
v The CA certificate and the client certificate in .pem format from the ISKLM
server
v The device group name created on the ISKLM server
v The device serial number created on the ISKLM server
v The ISKLM IP address and KMIP port value
To configure the ISKLM information on the NPS appliance, the NPS administrator
must do the following steps:
1. Log in to the active NPS host as the root user.
2. Save a copy of the CA certificate and client certificate files (must be in .pem
format) in the /nz/data/security directory.
3. Log in to the active NPS host as the nz user.
4. Using any text editor, edit the /nz/data/config/system.cfg file (or create the
file if it does not exist).
5. Define the following settings in the system.cfg file:
startup.kmipDevGrpSrNum = Device_serial_number
startup.kmipDevGrp = Device_group_name
startup.kmipClientCert = /nz/data/security/client.pem
startup.kmipClientKey = /nz/data/security/privkey.pem
startup.kmipCaCert = /nz/data/security/ca.pem
startup.keyMgmtServer = tls://ISKLM_IP_ADDRESS:KMIP_PORT
startup.keyMgmtProtocol = local
The keyMgmtProtocol = local setting indicates that the system uses a locally
managed keystore and keys. Keep the local setting until you verify that the
connections to the ISKLM server are correctly configured and working. After
that verification, and after uploading the AEKs to the ISKLM server, you can
change the setting to use the ISKLM keystore.
6. Save the system.cfg file.
7. Log out of the nz account and return to the root account.
As root, use the nzkmip test command on the NPS host to test ISKLM
connectivity. This command requires you to specify a label and key (either directly
or in a file) to test the ISKLM server operations:
[root@nzhost ~]# /nz/kit/bin/adm/nzkmip test -label spuaek
-file /tmp/new_spukey.pem
Connecting to SKLM server at tls://1.2.3.4:5696
Success: Connection to SKLM store succeeded
After you confirm that the ISKLM connection is working, follow these steps to
prepare for switching over to the ISKLM server.
1. As root, run the following command to populate the keys from the local
keystore to the ISKLM keystore:
[root@nzhost ~]# /nz/kit/bin/adm/nzkmip populate
2. To confirm that the keys were populated correctly, query the _t_kmip_mapping
table:
SYSTEM.ADMIN(ADMIN)=> select * from _t_kmip_mapping;
DISKLABEL | UID
-------------+-----------------------------------------
spuaek | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898211
spuaekOld | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898312
hostkey1 | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898432
hostkey1Old | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898541
hostkey2 | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898865
hostkey2Old | KEY-56e36030-3a9c-4313-8ce6-4c6d5d898901
(6 rows)
3. For each UUID listed in the table, run the following command to display the
value of the key:
[root@nzhost ~]# /nz/kit/bin/adm/nzkmip get
-uuid KEY-56e36030-3a9c-4313-8ce6-4c6d5d898211
Key Value : t7Nº×nq¦CÃ<"*"ºìýGse»¤;|%
4. Create a backup of the local keystore with nzkeybackup. As a best practice, save
the backup to a secure location away from the NPS host.
After you have completed and tested the ISKLM connection, and you have created
a local keystore backup file, follow these steps to switch to the ISKLM server:
1. Log in to the NPS host as the nz user.
2. Stop the system using the nzstop command.
3. Rename the local GSKit keystore files keydb.p12 and keydb.sth.
4. Log in as root and edit the /nz/data/config/system.cfg file.
5. Change the setting for the keyMgmtProtocol to kmipv1.1 to switch to the
ISKLM server support:
startup.keyMgmtProtocol = kmipv1.1
6. Save and close the system.cfg file.
7. Log out of the root account to return to the nz account.
8. Start the system using the nzstart command. After the system starts, AEKs that
you create with the nzkey command are stored in and retrieved from the
ISKLM server.
9. Remove the renamed GSKit keystore files keydb.p12 and keydb.sth.
If you need to change the NPS host to disable ISKLM support and return to a local
GSKit keystore for managing the keys, follow these steps:
1. Log in as root to the NPS host.
2. Dump the keys from ISKLM server to a local GSKit keystore:
[root@nzhost ~]# /nz/kit/bin/adm/nzkey dump
DB creation successful
After you have dumped the AEKs from the ISKLM server, follow these steps to
switch to a local keystore for the AEKs:
1. Log in to the NPS host as the nz user.
2. Stop the system using the nzstop command.
3. Log in as root and edit the /nz/data/config/system.cfg file.
4. Change the setting for the keyMgmtProtocol to local to switch to the local
GSKit keystore support:
startup.keyMgmtProtocol = local
5. Save and close the system.cfg file.
6. Run the following command to verify that the keys were dumped correctly:
[root@nzhost ~]# /nz/kit/bin/adm/nzkey list
7. Log out of the root account to return to the nz account.
8. Start the system using the nzstart command.
9. After the system starts, use the nzsql command to connect to the SYSTEM
database and delete entries from the _t_kmip_mapping table because the
system is now using a local GSKit keystore.
SYSTEM.ADMIN(ADMIN)=> truncate table _t_kmip_mapping;
TRUNCATE TABLE
After the system starts, AEKs that you create with the nzkey command are stored
and retrieved from the local keystore.
You can create and apply an authentication key to auto-lock the host drives and
the drives in the storage arrays. An authentication key must be 32 bytes. The keys
are managed using the IBM GSKit software. No other key management software or
server is required.
CAUTION:
Always protect and back up the authentication keys that you create and apply to
the disks. If you lose the keys, the disks cannot be unlocked when they are
powered on. You will be unable to read data from the disks, and you could
prevent the NPS system from starting.
You can create a conforming key for the host and SPU AEKs manually, but as a
best practice, use the nzkey generate command to automatically create a
random, conformant AEK for the host or SPU drives and store it in your local
keystore or in the IBM Security Key Lifecycle Manager if you have configured that
support for your appliance.
Each of the hosts in the appliance uses an AEK to auto-lock the SEDs. The keys are
referred to as hostkey1 and hostkey2. The host RAID controllers have specific
requirements for the host authentication keys:
v The key value must be 32 bytes in length.
v The key is case-sensitive.
v The key must contain at least one number, one lowercase letter, one uppercase
letter, and one non-alphanumeric character (for example, < > @ +). You cannot
specify a blank space, single quotation character, double quotation character,
exclamation point, or equals sign in the key value.
v The key can use only the printable characters in the range ASCII 0x21 to 0x7E.
The SEDs in the storage arrays use the SPU AEK to auto-lock the drives. The
storage array SPU keys must meet the following requirements:
v The key value must be 32 bytes in length.
v The key can use characters in the range 0x00 to 0xFF.
If you want to change the host or SPU key that is used to lock your SEDs, you can
create a key manually, or you can use the nzkey generate command to create a
conforming key. Run separate commands to create the host key and the SPU key.
Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to create a host key:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey generate -hostkey
-file /export/home/nz/hostkey.txt
Host key written to file
3. Use the following command to create a SPU key:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey generate -spukey
-file /export/home/nz/spukey.txt
SPU key written to file
Results
The command saves the generated key in the specified file in plaintext. You can then
specify the host or SPU key file as part of an nzkey change operation.
Important: The key files are in plain text and unencrypted. After you use the files
to change the key for the hosts or SPUs, make sure that you delete the generated
key files to protect the keys from being read by users who log in to the NPS
system.
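For example, to remove the key files that were written by the generate commands
shown earlier in this section:
[root@nzhost1 nz]# rm /export/home/nz/hostkey.txt /export/home/nz/spukey.txt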
You can use the nzkey list command to display information about the keys that
are currently defined in the keystore without displaying the key text.
Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to list the key labels:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey list
Results
The command shows the labels for the keys that are currently in the keystore. If
no AEKs have been set, the command displays the message No keys found in key
store. You can use the -hostkey or -spukey option to list only the AEK labels for
the hosts or SPUs.
You can use the nzkey check command to display information about auto-lock
state for the SEDs on the hosts and SPUs.
Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to check the AEK status. You must specify the
-spukey or the -hostkey option.
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey check {-spukey | -hostkey}
The command output indicates whether the AEK feature is enabled or
disabled, and whether keys have been applied to auto-lock the SEDs in the hosts
and storage arrays. The command also provides information to alert you when
there may be issues with the drives that need further investigation and possible
troubleshooting from IBM Support.
You can use the nzkey extract command to write the key for a specified label to a
file; use the nzkey list command to display the available key labels that are defined
in the keystore. You can extract only one key at a time. If the output file already
exists, the command displays an error.
Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to extract the key for a specified label. For
example:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey extract -label hostkey1
-file /nz/var/hostkey1.txt
Key written to file
Results
The command creates a file with the extracted AEK. This file can be helpful in
cases where you need the current key to reapply a key to SEDs for
troubleshooting, or if you want to preserve the key in a third-party key tracking
system. As a best practice, make sure that the output file is safe from unauthorized
access. Consider deleting the file or moving it to a secure location to protect the
key.
Before you begin, make sure that you have your new AEK for the hosts. You
should use the nzkey generate command to generate a new AEK for the host key.
To change the host AEK, the NPS system must be in the Stopped state. The new
AEK takes effect on both hosts when the nzkey command finishes running
successfully. The command creates a backup copy of the current keystore before it
changes the key. After the change is finished, you should create a backup of the
new keystore using the nzkeybackup command.
Procedure
1. Log in to the active host of the NPS system as the nz user.
2. Transition the system to the Stopped state, for example:
What to do next
You would typically use the nzkey resume command to resume a host AEK change
operation that was interrupted and did not complete. This command can also be
used to resume a host AEK create operation, but typically the IBM installers or
IBM support perform the tasks to create and enable the AEKs to auto-lock drives.
To resume the host AEK operation, you must have the backup file pathname for
the interrupted operation.
Procedure
1. Log in to the active NPS host as the root user.
2. Use the following command to resume a host AEK change operation. For
example:
[root@nzhost1 nz]# /nz/kit/bin/adm/nzkey resume
-backupDir /nz/var/hostbup_01
Results
The command resumes the host key operation. If the command displays an error,
contact IBM Support for assistance.
Before you begin, make sure that you have your new AEK for the SPU. You should
use the nzkey generate command to generate a new AEK for the SPU key.
If you are changing the SPU key for the storage array drives, the system must be in
the Paused or Offline state because the system manager must be running to
propagate the new key, but no queries or I/O activity should be active. The new
AEK is immediately communicated from the system manager to the SPUs. Note
that if you attempt to transition the system to the Online state, the state transition
waits until all the SPUs and disks are updated with the new AEK. The command
creates a backup copy of the current keystore before it changes the key. After the
change is finished, you should create a backup of the new keystore using the
nzkeybackup command.
Procedure
1. Log in to the active host of the NPS system as the nz user.
2. Transition the system to the Paused or Offline state, for example:
[nz@nzhost1 ~]$ nzsystem pause
Are you sure you want to pause the system (y|n)? [n] y
3. Log in as the root user:
[nz@nzhost1 ~]$ su - root
4. Use the nzkey change command to change the SPU key:
[root@nzhost-h1 ~]# /nz/kit/bin/adm/nzkey change -spukey
-file /tmp/spukey_change -backupdir /tmp/backups/
# Keystore archive /tmp/backups/keydb_20140711054140.tar.gz written
==========================================================
AEK Summary
==========================================================
What to do next
The AekSecurityEvent monitors the SED drives and sends an email to the
configured event contacts when any of the following conditions occur:
v The system has transitioned to the Down state because of a SPU AEK operation
failure.
v A SPU AEK operation has occurred, such as successful completion of key create
or change for the SPU key.
v A labelError has been detected on a disk for the SPU key. A labelError typically
occurs when the new SPU key is not applied to a disk and the disk still uses the
old/former key to authenticate.
v A fatal error is detected on a disk for the SPU key. A fatal error occurs when
neither the current SPU key nor the previous SPU key can be used to
authenticate the drive.
v A key repair state is detected on a disk during a SPU key create or change. A
key repair state issue occurs when the key operation is deferred on a SED
because of a key fatal error on the drive's RAID partner disk.
v The system manager has started a key repair operation. This usually occurs just
before applying the key on the deferred disk after the regen on the disk has
finished.
To create and enable an event rule for the AekSecurityEvent, you use the nzevent
command to add an event rule as in the following example. Make sure that you
run the command on the active host.
[nz@nzhost1 ~]$ nzevent copy -useTemplate
-name AekSecurityEvent -newName SedAekEvent -eventType AekSecurityEvent
-on 1 -dst [email protected]
This section also describes log files and where to find operational and error
messages for troubleshooting activities. Although the system is configured for
typical use in most customer environments, you can also tailor software operations
to meet the special needs of your environment and users by using configuration
settings.
The revision level typically includes a major version number, a release number, a
maintenance release number, and a fix pack number. Some releases also include a
patch designation such as P1 or P2.
When you enter the nzrev -rev command, Netezza returns the entire revision
number string, including all fields (such as variant and patch level, which in this
example are both zero).
nzrev -rev
7.1.0.0-P0-F1-Bld34879
From a client system, you can use the following command to display the revision
information:
nzsystem showRev -host host -u user -pw password
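For example, with illustrative connection values for the host, account, and
password, the output has the same form as the nzrev -rev string shown above:
nzsystem showRev -host nzhost -u admin -pw password
7.1.0.0-P0-F1-Bld34879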
Related reference:
“The nzrev command” on page A-47
Use the nzrev command to display the IBM Netezza software revision level.
The following table describes the components of the Revision Stamp fields.
Table 7-1. Netezza software revision numbering
Version Release Maintenance Fixpack -Pn -Fn -Bldn
Numeric Numeric Numeric Numeric Alphanumeric Alphanumeric Alphanumeric
System states
The IBM Netezza system state is the current operational state of the appliance.
In most cases, the system is online and operating normally. There might be times
when you must stop the system for maintenance tasks or as part of a larger
procedure.
You can manage the Netezza system state by using the nzstate command. It can
display and wait for a specific state to occur.
Related reference:
“The nzstate command” on page A-58
Use the nzstate command to display the current system state or to wait for a
particular system state to occur.
The following table lists the common system states and how they are invoked and
exited.
Note: When you stop and start the Netezza system operations on a Netezza C1000
system, the storage groups continue to run and perform tasks such as media
checks and health checks for the disks in the array, as well as disk regenerations
for disks that fail. The RAID controllers are not affected by the Netezza system
state.
You can use the nzstart command to start system operation if the system is in the
stopped state. The nzstart command is a script that initiates a system start by
setting up the environment and invoking the startup server. The nzstart command
does not complete until the system is online. The nzstart command also verifies
the host configuration to ensure that the environment is configured correctly and
completely; it displays messages to direct you to files or settings that are missing
or misconfigured.
Restriction: You must run nzstart on the host and be logged on as the user nz.
You cannot run it remotely from Netezza client systems.
For IBM Netezza 1000 and IBM PureData System for Analytics N1001 systems, a
message is written to the sysmgr.log file if there are any storage path issues that
are detected when the system starts. The log displays a message similar to mpath
-issues detected: degraded disk path(s) or SPU communication error, which
helps to identify problems within storage arrays.
Related reference:
“The nzstart command” on page A-56
Use the nzstart command to start system operation after you stop the system. The
nzstart command is a script that initiates a system start by setting up the
environment and starting the startup server.
Restriction: You must run nzstop on the host and be logged on as the user nz.
You cannot run it remotely.
To stop the system or exit after waiting for 5 minutes (300 seconds), enter nzstop
-timeout 300.
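Entered at the nz prompt on the host and followed by a state check, the sequence
looks like the following sketch (the nzstate output shown is illustrative):
[nz@nzhost ~]$ nzstop -timeout 300
[nz@nzhost ~]$ nzstate
System state is ’Stopped’.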
Related reference:
“The nzstop command” on page A-63
Use the nzstop command to stop the IBM Netezza software operations. Stopping a
system stops all the IBM Netezza processes that were started with the nzstart
command.
Enter y to continue. The transition completes quickly on an idle system, but it can
take much longer if the system is busy processing active queries and transactions.
When the transition completes, the system enters the paused state, which you can
confirm with the nzstate command as follows:
[nz@nzhost ~]$ nzstate
System state is ’Paused’.
You can use the -now option to force a transition to the paused state, which causes
the system to abort any active queries and transactions. As a best practice, use the
nzsession show -activeTxn command to display a list of the current active
transactions before you force the system to terminate them.
The command usually completes quickly; you can confirm that the system has
returned to the online state by using the following command:
[nz@nzhost ~]$ nzstate
System state is ’Online’.
Enter y to continue. The transition completes quickly on an idle system, but it can
take much longer if the system is busy processing active queries and transactions.
When the transition completes, the system enters the offline state, which you can
confirm with the nzstate command as follows:
[nz@nzhost ~]$ nzstate
System state is ’Offline’.
You can use the -now option to force a transition to the offline state, which causes
the system to abort any active queries and transactions. As a best practice, use the
nzsession show -activeTxn command to display a list of the current active
transactions before you force the system to terminate them.
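For example, a sketch of checking for active transactions and then forcing the
transition to the offline state:
[nz@nzhost ~]$ nzsession show -activeTxn
[nz@nzhost ~]$ nzsystem offline -now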
Related reference:
“System logs” on page 7-12
When you power up (or reset) the hardware, each SPU loads an image from its
flash memory and runs it. This image is then responsible for running diagnostic
tests on the SPU, registering the SPU with the host, and downloading runtime
images for the SPU CPU and the FPGA disk controller. The system downloads
these images from the host through TFTP.
The IBM Netezza system can take the following actions when an error occurs:
Display an error message
Presents an error message string to the users that describes the error. This
is the common system response whenever a user request is not fulfilled.
Try again
During intermittent or temporary failures, keep trying until the error
condition disappears. The retries are often needed when resources are
limited, congested, or locked.
Fail over
Switches to an alternate or spare component because an active component
has failed. Failover is a system-level recovery mechanism and can be
triggered by a system monitor or an error that is detected by software that
is trying to use the component.
Log the error
Adds an entry to a component log. A log entry contains a date and time, a
severity level, and an error/event description.
Send an event notification
Sends notification through email or by running a command. The decision
whether to send an event notification is based on a set of user-configurable
event rules.
Abort the program
Terminates the program when it cannot continue, either because of an
irreparably damaged internal state or because continuing would corrupt
user data. Software asserts that detect internal programming mistakes often
fall into this category because it is difficult to determine whether it is safe to
continue.
Clean up resources
Frees or releases resources that are no longer needed. Software components
are responsible for their own resource cleanup. In many cases, resources
System logs
All major software components that run on the host have an associated log. Log
files have the following characteristics:
v Each log consists of a set of files that are stored in a component-specific
directory. For managers, there is one log per manager. For servers, there is one
log per session, and their log files have pid identifiers, date identifiers, or both
(<pid>.<yyyy-mm-dd>).
v Each file contains one day of entries, for a default maximum of seven days.
v Each file contains entries that have a timestamp (date and time), an entry
severity type, and a message.
The system rotates log files, that is, for all the major components there are the
current log files and the archived log files.
v For all IBM Netezza components (except postgres), the system creates a new log
file at midnight if there is constant activity for that component. If, however, you
load data on Monday and then do not load again until Friday, the system creates
a new log file dated the previous day from the new activity, in this case,
Thursday. Although the size of the log files is unlimited, every 30 days the
system removes all log files that were not accessed.
v For postgres logs, by default, the system checks the size of the log file daily and
rotates it to an archive file if it is greater than 1 GB in size. The system keeps 28
days (four weeks) of archived log files. (Netezza Support can help you to
customize these settings if needed.)
To view the logs, log on to the host as user nz. When you view an active logfile,
use a file viewer command such as more, less, cat, tail, or similar commands. If
you use a text editor such as emacs or vi, you could cause an interruption and
possible information loss to log files that are actively capturing log messages while
the system is running.
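For example, to follow the current System Manager log (the path is listed with the
log file descriptions later in this section) by using a viewer that does not modify
the file:
[nz@nzhost ~]$ tail -f /nz/kit/log/sysmgr/sysmgr.log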
Related concepts:
“Logging Netezza SQL information” on page 11-39
You can log information about all user or application activity on the server, and
you can log information that is generated by individual Windows clients.
Related tasks:
“Logging Netezza SQL information on the server” on page 11-39
Related reference:
“Overview of the Netezza system processing” on page 7-8
Log file
/nz/kit/log/bnrmgr/bnrmgr.log
Current backup and restore manager log
Sample messages
2012-12-12 18:12:05.645586 EST Info: NZ-00022: --- program ’bnrmgr’ (26082)
starting on host ’nzhost’ ... ---
2012-12-12 18:17:09.315244 EST Info: system is online - enabling backup and
restore sessions
Bootserver manager
The bootsvr.log file records the initiation of all SPUs on the system, usually when
the system is restarted by the nzstart command and also all stopping and
restarting of the bootsvr process.
Log files
/nz/kit/log/bootsvr/bootsvr.log
Current log
/nz/kit/log/bootsvr/bootsvr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:12:07.399506 EST Info: NZ-00022: --- program ’bootsvr’ (26094)
starting on host ’nzhost’ ... ---
2012-12-12 18:15:25.242471 EST Info: Responded to boot request from device
[ip=10.0.14.28 SPA=1 Slot=1] Run Level = 3
Client manager
The clientmgr.log file records all connection requests to the database server and
also all stopping and starting of the clientmgr process.
Log files
/nz/kit/log/clientmgr/clientmgr.log
Current log
/nz/kit/log/clientmgr/clientmgr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:12:05.874413 EST Info: NZ-00022: --- program ’clientmgr’ (26080)
starting on host ’nzhost’ ... ---
2012-12-12 18:12:05.874714 EST Info: Set timeout for receiving from the socket
300 sec.
2012-12-12 18:17:21.642075 EST Info: admin: login successful
Log files
/nz/kit/log/dbos/dbos.log
Current log
/nz/kit/log/dbos/dbos.YYYY-MM-DD.log
Archived log
Event manager
The eventmgr.log file records system events and the stopping and starting of the
eventmgr process.
Log files
/nz/kit/log/eventmgr/eventmgr.log
Current log
/nz/kit/log/eventmgr/eventmgr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:12:05.926667 EST Info: NZ-00022: --- program ’eventmgr’ (26081)
starting on host ’nzhost’ ... ---
2012-12-12 18:13:25.064891 EST Info: received & processing event type =
hwNeedsAttention, event args = ’hwId=1037, hwType=host, location=upper host,
devSerial=06LTY66, eventSource=system, errString=Eth RX Errors exceeded threshold,
reasonCode=1052’ event source = ’System initiated event’
2012-12-12 18:16:45.987066 EST Info: received & processing event type =
sysStateChanged, event args = ’previousState=discovering, currentState=initializing,
eventSource=user’ event source = ’User initiated event’
Event type
The event that triggered the notification.
Event args
The argument that is being processed.
ErrString
The event message, which can include hardware identifications and other
details.
Log files
/nz/kit/log/fcommrtx/fcommrtx.log
Current log
/nz/kit/log/fcommrtx/fcommrtx.2006-03-01.log
Archived log
Sample messages
2012-12-12 18:12:03.055247 EST Info: NZ-00022: --- program ’fcommrtx’ (25990) star
ting on host ’nzhost’ ... ---
2012-12-12 18:12:03.055481 EST Info: FComm : g_defenv_spu2port=0,6,1,7,2,8,3,9,4,1
0,5,11,6,0,7,0,8,1,9,2,10,3,11,4,12,5,13,0
2012-12-12 18:12:03.055497 EST Info: FComm : g_defenv_port2hostthread=0,1,2,3,4,5,
6,7,8,9,10,11,12,13
Log files
/nz/kit/log/hostStatsGen/hostStatsGen.log
Current log
/nz/kit/log/hostStatsGen/hostStatsGen.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:12:04.969116 EST Info: NZ-00022: --- program ’hostStatsGen’ (26077)
starting on host ’nzhost’ ... ---
Load manager
The loadmgr.log file records details of load requests, and the stopping and starting
of the loadmgr process.
Log file
/nz/kit/log/loadmgr/loadmgr.log
Current log
/nz/kit/log/loadmgr/loadmgr.YYYY-MM-DD.log
Archived log
Sample messages
2004-05-13 14:45:07.454286 EDT Info: NZ-00022:
--- log file ’loadmgr’ (12225) starting on host ’nzhost’ ...
Postgres
The postgres.log file is the main database log file. It contains information about
database activities.
Sample messages
2012-12-31 04:02:10.229470 EST [19122] DEBUG: connection: host=1.2.3.4 user=
MYUSR database=SYSTEM remotepid=6792 fetype=1
2012-12-31 04:02:10.229485 EST [19122] DEBUG: Session id is 325340
2012-12-31 04:02:10.231134 EST [19122] DEBUG: QUERY: set min_quotient_scale to
default
2012-12-31 04:02:10.231443 EST [19122] DEBUG: QUERY: set timezone = ’gmt’
2012-12-31 09:02:10.231683 gmt [19122] DEBUG: QUERY: select current_timestamp,
avg(sds_size*1.05)::integer as avg_ds_total, avg(sds_used/(1024*1024))::integer as
avg_ds_used from _v_spudevicestate
Session manager
The sessionmgr.log file records details about the starting and stopping of the
sessionmgr process, and any errors that are associated with this process.
Log files
/nz/kit/log/sessionmgr/sessionmgr.log
Current log
/nz/kit/log/sessionmgr/sessionmgr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:11:50.868454 EST Info: NZ-00022: --- program ’sessionmgr’ (25843)
starting on host ’nzhost’ ... ---
Startup server
The startupsvr.log file records the start of the IBM Netezza processes and any
errors that are encountered with this process.
Log files
/nz/kit/log/startupsvr/startupsvr.log
Current log
/nz/kit/log/startupsvr/startupsvr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:11:43.951689 EST Info: NZ-00022: --- program ’startupsvr’ (25173)
starting on host ’nzhost’ ... ---
2012-12-12 18:11:43.952733 EST Info: NZ-00307: starting the system, restart = no
2012-12-12 18:11:43.952778 EST Info: NZ-00313: running onStart: ’prepareForStart’
2012-12-12 18:11:43 EST: Rebooting SPUs via RICMP ...
Log files
/nz/kit/log/statsSvr/statsSvr.log
Current log
/nz/kit/log/statsSvr/statsSvr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:12:05.794050 EST Info: NZ-00022: --- program ’statsSvr’ (26079)
starting on host ’nzhost’ ... ---
System Manager
The sysmgr log file records details of stopping and starting the sysmgr process,
and details of system initialization and system state status.
Log file
/nz/kit/log/sysmgr/sysmgr.log
Current log
/nz/kit/log/sysmgr/sysmgr.YYYY-MM-DD.log
Archived log
Sample messages
2012-12-12 18:12:05.578573 EST Info: NZ-00022: --- program ’sysmgr’ (26078) starting
on host ’nzhost’ ... ---
2012-12-12 18:12:05.579716 EST Info: Starting sysmgr with existing topology
2012-12-12 18:12:05.882697 EST Info: Number of chassis level switches for each
chassis in this system: 1
The file on the Linux host for this disk work area is $NZ_TMP_DIR/nzDbosSpill.
Within DBOS, a database tracks the segments of the file that are currently in
use. To prevent a runaway query from using up all the disk space on the host,
there is a limit on the size of the DbosEvent database, and hence on the size of
the Linux file. This limit is set in the Netezza registry file; the tag for the
value is startup.hostSwapSpaceLimit.
Temporarily
To change a configuration setting temporarily, use the nzsystem set -arg
command. For example:
v To display all system registry settings, enter:
nzsystem showRegistry
A change that is made in this way remains effective only until the system is
restarted; at system startup, all configuration settings are read from the
system configuration file and loaded into the registry.
Permanently
To change a configuration setting permanently, edit the corresponding line
in the configuration file, system.cfg. Configuration settings are loaded
from this file to the registry during system startup.
The following tables describe the configuration settings that you can change
yourself, without involving your IBM Netezza support representative.
Table 7-6. Configuration settings for short query bias (SQB)
host.schedSQBEnabled (bool, default true)
    Whether SQB is enabled (true) or disabled (false).
host.schedSQBNominalSecs (int, default 2)
    The threshold, in seconds, below which a query is to be regarded as being short.
host.schedSQBReservedGraSlots (int, default 10)
    The number of GRA scheduler slots that are to be reserved for short queries.
host.schedSQBReservedSnSlots (int, default 6)
    The number of snippet scheduler slots that are to be reserved for short queries.
host.schedSQBReservedSnMb (int, default 50)
    The amount of memory, in MB, that each SPU is to reserve for short query execution.
host.schedSQBReservedHostMb (int, default 64)
    The amount of memory, in MB, that the host is to reserve for short query execution.
Related reference:
“The nzsystem command” on page A-65
Use the nzsystem command to change the system state, and show and set
configuration information.
You can configure the event manager to continually watch for specific conditions
such as system state changes, hardware restarts, faults, or failures. In addition, the
event manager can watch for conditions such as reaching a certain percentage of
full disk space, queries that have run for longer than expected, and other Netezza
system behaviors.
This section describes how to administer the Netezza system by using event rules
that you create and manage.
To help ease the process of creating event rules, IBM Netezza supplies template
event rules that you can copy and tailor for your system. The template events
define a set of common conditions to monitor with actions that are based on the
type or effect of the condition. The template event rules are not enabled by default,
and you cannot change or delete the template events. You can copy them as starter
rules for more customized rules in your environment.
As a best practice, you can begin by copying and using the template rules. If
you are familiar with event management and the operational characteristics of
your Netezza appliance, you can also create your own rules to monitor conditions
that are important to you. You can display the template event rules by using the
nzevent show -template command.
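For example, assuming the admin account credentials that are used elsewhere in
this chapter, you might list the templates as follows:
nzevent show -template -u admin -pw password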
Note: Release 5.0.x introduced new template events for the IBM Netezza 100, IBM
Netezza 1000, and later systems. Previous event template rules specific to the
z-series platform do not apply to the new models and were replaced by similar,
new events.
Netezza might add new event types to monitor conditions on the system. These
event types might not be available as templates, which means you must manually
add a rule to enable them. For a description of more event types that can assist
you with monitoring and managing the system, see “Event types reference” on
page 8-36.
The action to take for an event often depends on the type of event (its effect on the
system operations or performance). The following table lists some of the
predefined template events and their corresponding effects and actions.
Table 8-2. Netezza template event rules

Disk80PercentFull (Notice), Disk90PercentFull
    Type: hwDiskFull. Notify: Admins, DBAs. Severity: Moderate to Serious.
    Effect: A full disk prevents some operations.
    Action: Reclaim space or remove unwanted databases or older data. For more information, see “Disk space threshold notification” on page 8-22.

HardwareNeedsAttention
    Type: hwNeedsAttention. Notify: Admins, NPS. Severity: Moderate.
    Effect: A possible change or issue that can start to affect performance.
    Action: Investigate and identify whether more assistance is required from Support. For more information, see “Hardware needs attention” on page 8-20.

HardwareRestarted (Notice)
    Type: hwRestarted. Notify: Admins, NPS. Severity: Moderate.
    Effect: Any query or data load in progress is lost.
    Action: Investigate whether the cause is hardware or software. Check for SPU cores. For more information, see “Hardware restarted” on page 8-22.

HardwareServiceRequested (Warning)
    Type: hwServiceRequested. Notify: Admins, NPS. Severity: Moderate to Serious.
    Effect: Any query or work in progress is lost. Disk failures initiate a regeneration.
    Action: Contact Netezza. For more information, see “Hardware service requested” on page 8-18.
You can copy, modify, and add events by using the nzevent command or the
NzAdmin interface. You can also generate events to test the conditions and event
notifications that you are configuring. The following sections describe how to
manage events by using the nzevent command. The NzAdmin tool provides an
intuitive interface for managing events, including a wizard for creating events.
For information about accessing the NzAdmin interface, see “NzAdmin overview”
on page 3-12.
When you copy a template event rule, which is disabled by default, your new rule
is likewise disabled by default. You must enable it by using the -on yes argument.
In addition, if the template rule sends email notifications, you must specify a
destination email address.
The following example copies, renames, and modifies an existing event rule:
nzevent copy -u admin -pw password -name NPSNoLongerOnline
-newName MyModNPSNoLongerOnline -on yes -dst [email protected]
-ccDst [email protected] -callhome yes
When you copy an existing user-defined event rule, your new rule is enabled
automatically if the existing rule is enabled. If the existing rule is disabled, your
new rule is disabled by default. You must enable it by using the -on yes argument.
You must specify a unique name for your new rule; it cannot match the name of
the existing user-defined rule.
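For example, to copy the user-defined rule that was created in the previous
example (the new rule name is an illustrative assumption; because the source rule
is enabled, the copy is enabled automatically):
nzevent copy -u admin -pw password -name MyModNPSNoLongerOnline
-newName MyTestNPSNoLongerOnline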
Generate an event
You can use the nzevent generate command to trigger an event for the event
manager. If the event matches a current event rule, the system takes the action that
is defined by the event rule.
If the event that you want to generate has a restriction, specify the arguments that
would trigger the restriction by using the -eventArgs option. For example, if a
runaway query event has a restriction that the duration of the query must be
greater than 30 seconds, use a command similar to the following to ensure that a
generated event is triggered:
nzevent generate -eventtype runawayquery -eventArgs 'duration=50'
In this example, the duration meets the event criteria (greater than 30) and the
event is triggered. If you do not specify a value for a restriction argument in the
-eventArgs string, the command uses default values for the arguments. In this
example, duration has a default of 0, so the event would not be triggered since it
did not meet the event criteria.
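For example, generating the same event without the restriction argument relies
on the default duration of 0, so no notification is sent:
nzevent generate -eventtype runawayquery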
Adding an event rule consists of two tasks: specifying the event match criteria and
specifying the notification method. These tasks are described in more detail after
the examples.
Note: Although the z-series events are not templates on IBM Netezza 1000 or
N1001 systems, you can add them by using nzevent if you have the syntax that is
documented in the previous releases. However, these events are not supported on
IBM Netezza 1000 or later systems.
To add an event rule that sends an email when the system transitions from the
online state to any other state, enter:
nzevent add -name TheSystemGoingOnline -u admin -pw password
-on yes -eventType sysStateChanged
-eventArgsExpr '$previousState == online && $currentState != online'
-notifyType email -dst [email protected]
-msg 'NPS system $HOST went from $previousState to $currentState at $eventTimestamp.'
-bodyText '$notifyMsg\n\nEvent:\n$eventDetail\nEvent Rule:\n$eventRuleDetail'
Note: If you are creating event rules on a Windows client system, use double
quotation marks instead of single quotation marks to specify strings.
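For instance, on a Windows client, the expression and message values from the
previous example would be quoted as follows:
-eventArgsExpr "$previousState == online && $currentState != online"
-msg "NPS system $HOST went from $previousState to $currentState at $eventTimestamp."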
Related concepts:
“Callhome file” on page 5-19
The event manager generates notifications for all rules that match the criteria, not
just for the first event rule that matches. The following table lists the event types
that you can specify and the arguments and the values that are passed with the
event. You can list the defined event types by using the nzevent listEventTypes
command.
Table 8-3. Event types
sysStateChanged
    Tag names: previousState, currentState, eventSource. Possible values: <any system state>, <Event Source>
For example, to receive an email when the system is not online, it is not enough to
create an event rule for a sysStateChanged event. Because the sysStateChanged
event recognizes every state transition, you would be notified whenever the state
changes at all, such as from online to paused.
You can add an event args expression to further qualify the event for notification.
If you specify an expression, the system substitutes the event arguments into the
expression before evaluating it. The system uses the result combined with the
event type to determine a match. So, to send an email message when the system is
no longer online, you would use the expression: $previousState == online &&
$currentState != online
Event notifications
When an event occurs, you can have the system send an email or run an external
command. Email can be aggregated whereas commands cannot.
v To specify an email, you must specify a notification type (-notifyType email), a
destination (-dst), a message (-msg), and optionally, a body text (-bodyText), and
the callhome file (-callHome).
You can specify multiple email addresses that are separated by a comma and no
space. For example,
[email protected],[email protected],[email protected]
v To specify that you want to run a command, you must specify a notification
type (-notifyType runCmd), a destination (-dst), a message (-msg), and
optionally, a body text (-bodyText), and the callhome file (-callHome).
When you are defining notification fields that are strings (-dst, -ccDst, -msg,
-bodyText), you can use $tag syntax to substitute known system or event values.
Table 8-5 on page 8-13 lists the system-defined tags that are available.
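As a sketch, a command-based notification for state changes might look like the
following; the rule name and the script path /nz/scripts/notify_state.sh are
illustrative assumptions:
nzevent add -name MyStateChangeCmd -u admin -pw password -on yes
-eventType sysStateChanged -notifyType runCmd
-dst '/nz/scripts/notify_state.sh'
-msg 'NPS system $HOST went from $previousState to $currentState at $eventTimestamp.'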
Related concepts:
“Event email aggregation” on page 8-14
The sendmail.cfg file also contains options that you can use to specify a user
name and password for authentication on the mail server. You can find a copy of
this file in the /nz/data/config directory on the IBM Netezza host.
If you specify the email or runCmd arguments, you must enter the destination and
the subject header. You can use all the following arguments with either command,
except the -ccDst argument, which you cannot use with the runCmd. The
following table lists the syntax of the message.
Table 8-6. Notification syntax
-dst
    Your email address. Example: -dst '[email protected],[email protected]'
If you set email aggregation and events-per-rule reach the threshold value for the
event rule or the time interval expires, the system aggregates the events and sends
a single email per event rule.
Note: You specify aggregation only for event rules that send email, not for event
rules that run commands.
Related concepts:
“Event notifications” on page 8-12
Related reference:
“Hardware restarted” on page 8-22
If you enable the event rule HardwareRestarted, you receive notifications when
each SPU successfully restarts (after the initial startup). Restarts are usually related
to a software fault, whereas hardware causes can include uncorrectable memory
faults or a failed disk driver interaction.
“Disk space threshold notification” on page 8-22
You can enable event aggregation system-wide and specify the time interval. You
can specify 0 - 86400 seconds. If you specify 0 seconds, there is no aggregation,
even if aggregation is specified on individual events.
Procedure
1. Pause the system:
nzsystem pause -u bob -pw 1234 -host nzhost
2. Specify aggregation of 2 minutes (120 seconds):
nzsystem set -arg sysmgr.maxAggregateEventInterval=120
3. Resume the system:
nzsystem resume -u bob -pw 1234 -host nzhost
4. Display the aggregation setting:
nzsystem showRegistry | grep maxAggregateEventInterval
The body of the message lists the messages by time, with the earliest events first.
The Reporting Interval indicates whether the notification trigger was the count or
time interval. The Activity Duration indicates the time interval between the first
and last event so that you can determine the granularity of the events.
For example, the following aggregation is for the Memory ECC event:
Subject: NPS nzdev1 : 2 occurrences of Memory ECC Error from 11-Jun-07
18:41:59 PDT over 2 minutes.
You can use the Custom1 and Custom2 event rules to define and generate events
of your own design for conditions that are not already defined as events by the
NPS software. An example of a custom event might be to track the user login
information, but these events can also be used to construct complex events.
If you define a custom event, you must also define a process to trigger the event
using the nzevent generate command. Typically, these events are generated by a
customer-created script which is invoked in response to either existing NPS events
or other conditions that you want to monitor.
Procedure
1. Use the nzevent add command to define a new event type. Custom events are
never based on any existing event types. This example creates three different
custom events. NewRule4 and NewRule5 use the variable eventType to
distinguish between the event types. The NewRule6 event type uses a custom
variable and compares it with the standard event type.
[nz@nzhost ~]$ nzevent add -eventType custom1 -name NewRule4
-notifyType email -dst [email protected] -msg "NewRule4 message"
-eventArgsExpr '$eventType==RandomCustomEvent'
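The remaining two rules are not shown here. A sketch of what they might look
like, based on the description in step 1 (the event names, destination addresses,
and the $myTag variable are illustrative assumptions):
[nz@nzhost ~]$ nzevent add -eventType custom2 -name NewRule5
-notifyType email -dst [email protected] -msg "NewRule5 message"
-eventArgsExpr '$eventType==AnotherCustomEvent'
[nz@nzhost ~]$ nzevent add -eventType custom1 -name NewRule6
-notifyType email -dst [email protected] -msg "NewRule6 message"
-eventArgsExpr '$myTag==$eventType'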
What to do next
Consider creating a script that runs the nzevent generate command as needed
when your custom events occur.
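For example, such a script might raise the NewRule4 event defined above by
passing the tag that its -eventArgsExpr tests; the exact tag and value pairing is
an assumption based on that expression:
nzevent generate -eventType custom1 -eventArgs 'eventType=RandomCustomEvent'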
These events occur when the system is running. The typical states are
v Online
v Pausing Now
v Going Pre-Online
v Resuming
v Going OffLine Now
v Offline (now)
v Initializing
v Stopped
The Failing Back and Synchronizing states apply only to z-series systems.
The following is the syntax for the template event rule NPSNoLongerOnline:
-name NPSNoLongerOnline -on no -eventType sysStateChanged
-eventArgsExpr '$previousState == online && $currentState != online'
-notifyType email -dst '[email protected]' -ccDst '' -msg 'NPS system
$HOST went from $previousState to $currentState at $eventTimestamp
$eventSource.' -bodyText '$notifyMsg\n\nEvent:\n$eventDetail\n'
-callHome yes -eventAggrCount 1
The valid values for the previousState and currentState arguments are:
initializing pausedNow syncingNow
initialized preOnline syncedNow
offlining preOnlining failingBack
offliningNow resuming failedBack
offline restrictedResuming maintaining
offlineNow stopping maintain
online stoppingNow recovering
restrictedOnline stopped recovered
pausing stoppedNow down
pausingNow syncing unreachable
paused synced badState
For more information about states, see Table 5-4 on page 5-10.
In other cases, such as SPU failures, the system reroutes the work of the failed SPU
to the other available SPUs. The system performance is affected because the
healthy resources take on extra workload. Again, it is critical to obtain service to
replace the faulty component and restore the system to its normal performance.
The errString value contains more information about the sector that had a read
error:
v The md value specifies the RAID device on the SPU that encountered the issue.
v The sector value specifies which sector in the device has the read error.
v The partition type specifies whether the partition is a user data (DATA) or
SYSTEM partition.
v The table value specifies the table ID of the user table that is affected by the bad
sector.
If the system notifies you of a read sector error, contact IBM Netezza Support for
assistance with troubleshooting and resolving the problems.
If you enable the HardwareNeedsAttention event rule, the system generates a
notification when it detects conditions that can lead to problems or that serve as
symptoms of possible hardware failure or performance impacts.
The following table lists the arguments to the HardwareNeedsAttention event rule.
Table 8-9. HardwareNeedsAttention event rule
hwType
    The type of hardware affected. Example: spu
hwId
    The hardware ID of the component that has a condition to investigate. Example: 1013
location
    A string that describes the physical location of the component.
errString
    If the failed component is not inventoried, it is specified in this string.
devSerial
    The serial number of the component, or Unknown if the component has no serial number. Example: 601S496A2012
The following table lists the arguments to the HardwarePathDown event rule.
Table 8-10. HardwarePathDown event rule
hwType
    For a path down event, the SPU that reported the problem. Example: SPU
hwId
    The hardware ID of the SPU that loses path connections to disks. Example: 1013
location
    A string that describes the physical location of the SPU. Example: First Rack, First SPA, SPU in third slot
errString
    If the failed component is not inventoried, it is specified in this string. Example: Disk path event:Spu[1st Rack, 1st SPA, SPU in 5th slot] to Disk [disk hwid=1034 sn="9WK4WX9D00009150ECWM" SPA=1 Parent=1014 Position=12 Address=0x8e92728 ParentEnclPosition=1 Spu=1013] (es=encl1Slot12 dev=sdl major=8 minor=176 status=DOWN)
If you are notified of hardware path down events, contact IBM Netezza Support
and alert them to the path failure or failures. It is important to identify and resolve
the issues that are causing path failures to return the system to optimal
performance as soon as possible.
Message Details
If you receive a path down event, you can obtain more information about the
problems. This information might be helpful when you contact Netezza Support.
To see whether there are current topology issues, use the nzds show -topology
command. The command displays the current topology, and if there are issues, a
WARNING section at the end of the output.
Related concepts:
“System resource balance recovery” on page 5-17
Hardware restarted
If you enable the event rule HardwareRestarted, you receive notifications when
each SPU successfully restarts (after the initial startup). Restarts are usually related
to a software fault, whereas hardware causes can include uncorrectable memory
faults or a failed disk driver interaction.
You can modify the event rule to specify that the system include the device serial
number, its hardware revision, and firmware revision as part of the message,
subject, or both.
The following table describes the arguments to the HardwareRestarted event rule.
Table 8-11. HardwareRestarted event rule
hwType
    The type of hardware affected. Example: spu
hwId
    The hardware ID of the regen source SPU having the problem. Example: 1013
spaId
    The ID of the SPA. Example: a number 1 - 32
spaSlot
    The SPA slot number. Example: usually a slot number from 1 to 13
devSerial
    The serial number of the SPU. Example: 601S496A2012
devHwRev
    The hardware revision. Example: 7.21496rA2.21091rB1
devFwRev
    The firmware revision. Example: 1.36
Related concepts:
“Event email aggregation” on page 8-14
The following table lists the arguments to the DiskSpace event rules.
Table 8-12. DiskSpace event rules
hwType
    The type of hardware affected. Example: spu, disk
hwId
    The hardware ID of the disk that has the disk space issue. Example: 1013
spaId
    The ID of the SPA.
spaSlot
    The SPA slot number.
partition
    The data slice number. Example: 0,1,2,3
threshold
    The threshold value. Example: 75, 80, 85, 90, 95
value
    The actual percentage full value. Example: 84
After you enable the event rule, the event manager sends you an email when the
system disk space percentage exceeds the first threshold and is below the next
threshold value. The event manager sends only one event per sampled value.
For example, if you enable the event rule Disk80PercentFull, which specifies
thresholds 80 and 85 percent, the event manager sends you an email when the disk
space is at least 80, but less than 85 percent full. When you receive the email, your
actual disk space might be 84 percent full.
The event manager maintains thresholds for the values 75, 80, 85, 90, and 95. Each
of these values (except for 75) can be in the following states:
Armed
The system has not reached this value.
Disarmed
The system has exceeded this value.
Fired The system has reached this value.
Rearmed
The system has fallen below this value.
Note: If you enable an event rule after the system reached a threshold, you are not
notified that it reached this threshold until you restart the system.
After the IBM Netezza System Manager sends an event for a particular threshold,
it disarms all thresholds at or below that value. (So if 90 is triggered, it does not
trigger again until it is rearmed). The Netezza System Manager rearms all
disarmed higher thresholds when the disk space percentage full value falls below
the previous threshold, which can occur when you delete tables or databases. The
Netezza System Manager arms all thresholds (except 75) when the system starts
up.
Tip: To ensure maximum coverage, enable both event rules Disk80PercentFull and
Disk90PercentFull.
To send an email when the disk is more than 80 percent full, enable the predefined
event rule Disk80PercentFull:
nzevent modify -u admin -pw password -name Disk80PercentFull
-on yes -dst [email protected]
If you receive a diskFull notification from one or two disks, your data might be
unevenly distributed across the data slices (data skew). Data skew can adversely
affect performance for the tables that are involved and for combined workloads.
Tip: Consider aggregating the email messages for this event. Set the aggregation
count to the number of SPUs.
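For example, on a system with 24 SPUs (an illustrative count), and assuming that
nzevent modify accepts the same -eventAggrCount flag that appears in the
template rule syntax later in this section, you might enter:
nzevent modify -u admin -pw password -name Disk80PercentFull -eventAggrCount 24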
Related concepts:
“Data skew” on page 12-10
“Event email aggregation” on page 8-14
The runaway query timeout is a limit that you can specify system-wide (for all
users), or for specific groups or users. The default query timeout is unlimited for
users and groups, but you can establish query timeout limits by using a system
default setting, or when you create or alter users or groups. The runaway query
timeout limit does not apply to the admin database user.
The following table lists the arguments to the RunAwayQuery event rule. The
arguments are case-sensitive.
Note: Typically you do not aggregate this event because you should consider the
performance impact of each individual runaway query.
When you specify the duration argument in the -eventArgsExpr string, you can
specify an operator such as '==', '!=', '>', '>=', '<', or '<=' to specify when to send
the event notification. Use the greater-than (or less-than) versions of the operators
to ensure that the expression triggers with a match. For example, to ensure that a
notification event is triggered when the duration of a query exceeds 100 seconds,
specify the -eventArgsExpr as follows:
-eventArgsExpr '$duration > 100'
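A sketch of a complete rule that uses this expression, modeled on the nzevent add
example earlier in this section (the rule name, destination address, and message
are illustrative assumptions):
nzevent add -name MyRunAwayQuery -u admin -pw password -on yes
-eventType runawayquery -eventArgsExpr '$duration > 100'
-notifyType email -dst [email protected]
-msg 'NPS system $HOST - long-running query detected at $eventTimestamp.'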
If a query exceeds its timeout threshold and you added a runaway query rule, the
system sends you an email that informs you how long the query ran. For example:
NPS system alpha - long-running query detected at 07-Nov-03, 15:43:49
EST.
sessionId: 10056
planId: 27
duration: 105 seconds
Related concepts:
“Query timeout limits” on page 11-37
You can place a limit on the amount of time a query is allowed to run before the
system notifies you by using the runaway query event. The event email shows
how long the query has been running, and you can decide whether to terminate
the query.
System state
You can also monitor for events when a system is “stuck” in the Pausing Now
state. The following is the syntax for event rule SystemStuckInState:
-name 'SystemStuckInState' -on no -eventType systemStuckInState
-eventArgsExpr '' -notifyType email -dst '<your email here>' -ccDst ''
-msg 'NPS system $HOST - System Stuck in state $currentState for
$duration seconds' -bodyText 'The system is stuck in state change.
Contact Netezza support team\nduration: $duration seconds\nCurrent
State: $currentState\nExpected State: $expectedState' -callHome yes
-eventAggrCount 0
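To enable this rule, you can copy the template, following the same pattern as the
earlier copy example; the new rule name and destination address are illustrative
assumptions:
nzevent copy -u admin -pw password -name SystemStuckInState
-newName MySystemStuckInState -on yes -dst [email protected]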
It is important to monitor the transition to or from the Online state because that
transition affects system availability.
IBM Netezza sets the thresholds that are based on analysis of disk drives and their
performance characteristics. If you receive any of these events, contact Netezza
Support and have them determine the state of your disk. Do not aggregate these
events. The templates do not aggregate these events by default.
The following is the syntax for the event rule SCSIPredictiveFailure event:
-name 'SCSIPredictiveFailure' -on no -eventType scsiPredictiveFailure
-eventArgsExpr '' -notifyType email -dst '[email protected]' -ccDst ''
-msg 'NPS system $HOST - SCSI Predictive Failure value exceeded for
disk $diskHwId at $eventTimestamp' -bodyText
'$notifyMsg\n\nspuHwId:$spuHwId\ndisk
location:$location\nscsiAsc:$scsiAsc\nscsiAscq:$scsiAscq\nfru:$fru\ndevSerial:$devSerial\ndiskSerial:$diskSerial\ndiskModel:$diskModel\ndiskMfg:$diskMfg\nevent source:$eventSource\n' -callHome no
-eventAggrCount 0
The following table lists the output from the SCSIPredictiveFailure event rule.
Table 8-15. SCSIPredictiveFailure event rule
spuHwId
    The hardware ID of the SPU that owns or manages the disk that reported the event.
diskHwId
    The hardware ID of the disk. Example: 1013
scsiAsc
    The attribute sense code, which is an identifier of the SMART attribute. Example: Vendor specific
scsiAscq
    The attribute sense code qualifier of the SMART attribute. Example: Vendor specific
fru
    The FRU ID for the disk.
location
    The location of the disk.
devSerial
    The serial number of the SPU to which the disk is assigned. Example: 601S496A2012
diskSerial
    The disk serial number. Example: 7.21496rA2.21091rB1
diskModel
    The disk model number.
diskMfg
    The disk manufacturer.
Regeneration errors
If the system encounters hardware problems while it attempts to set up or perform
a regeneration, the system triggers a RegenFault event rule.
The following table lists the output from the event rule RegenFault.
Table 8-16. RegenFault event rule
hwIdSpu
    The hardware ID of the SPU that owns or manages the problem disk. Example: 1013
hwIdSrc
    The hardware ID of the source disk.
locationSrc
    The location string of the source disk.
hwIdTgt
    The hardware ID of the target spare disk.
locationTgt
    The location string of the target disk.
errString
    The error string for the regeneration issue.
devSerial
    The serial number of the owning or reporting SPU.
Note: If you receive a significant number of disk error messages, contact IBM
Netezza Support to investigate the state of your disks.
If you enable the event rule SCSIDiskError, the system sends you an email message
when it fails a disk.
The following table lists the output from the SCSIDiskError event rule.
Table 8-17. SCSIDiskError event rule
spuHwId
    The hardware ID of the SPU that owns or manages the disk or FPGA.
diskHwId
    The hardware ID of the disk where the error occurred. Example: 1013
location
    The location string for the disk.
errType
    The type of error, that is, whether the error is the type failure, failure possible, or failure imminent. Example: 1 (Failure), 2 (Failure imminent), 3 (Failure possible), 4 (Failure unknown)
errCode
    The error code that specifies the cause of the error. Example: 110
In some cases, you might need to replace components such as cooling units (fans,
blowers, or both), or perhaps a SPU.
The following table lists the output from the ThermalFault event rule.
Table 8-18. ThermalFault event rule
hwType
    The hardware type where the error occurred. Example: SPU or disk enclosure
hwId
    The hardware ID of the component where the fault occurred. Example: 1013
label
    The label for the temperature sensor. For the IBM Netezza Database Accelerator card, this label is the BIE temperature. For a disk enclosure, it is temp-1-1 for the first temperature sensor on the first enclosure.
location
    A string that describes the physical location of the component.
curVal
    The current temperature reading for the hardware component.
errString
    The error message. Example: The board temperature for the SPU exceeded 45 degrees centigrade.