NULLTERM vs. NULLPAD in PyTables' generated HDF5 files #264

@dalleyg

PyTables uses STRPAD=H5T_STR_NULLTERM for its string data, but H5T_STR_NULLPAD seems like a more appropriate choice for fixed-length strings.

Here's a simple example of creating such an HDF5 file.

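import tables

# Write a fixed-length string that contains an embedded NUL byte.
f = tables.openFile('foo.h5', 'w')
x = 'Hello\0World'
atom = tables.StringAtom(len(x))
# Create the array at the root so that it is reachable as '/b' below.
node = f.createCArray('/', 'b', atom, (1,),
                      createparents=True)
node[0] = x
f.close()

# PyTables itself reads back all 11 bytes.
with tables.openFile('foo.h5', 'r') as f:
    y = f.getNode('/b')[0]
    assert x == y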

According to h5dump, this is stored as a NULLTERM string:

$ h5dump foo.h5
...
DATASET "b" {
   DATATYPE H5T_STRING {
      STRSIZE 11;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 1 ) / ( 5957 ) }
   DATA {
   (0): "Hello"
   }
...

Other tools like h5py, which expect NULLPAD to be used for fixed-length strings, then interpret the data incorrectly. For example, the following script

import h5py
f = h5py.File('foo.h5')
print len(f['b'][0])

prints 5 instead of the expected 11.
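
To make the difference concrete, here is a rough pure-Python model of how a reader might decode the same 11-byte buffer under each padding convention. This is only a sketch of the semantics as I understand them (NULLTERM: the first NUL ends the string; NULLPAD: only trailing NULs are padding). The function name decode_fixed and the string labels are made up for illustration and are not part of HDF5, PyTables, or h5py.

```python
def decode_fixed(buf, strpad):
    """Decode a fixed-length byte buffer under a given padding convention.

    Hypothetical helper: a simplified model, not HDF5 library code.
    """
    if strpad == "NULLTERM":
        # The first NUL terminates the string; everything after it is ignored.
        end = buf.find(b"\0")
        return buf if end < 0 else buf[:end]
    if strpad == "NULLPAD":
        # Only trailing NULs are padding; embedded NULs are real data.
        return buf.rstrip(b"\0")
    raise ValueError("unknown padding convention: %r" % (strpad,))


raw = b"Hello\0World"  # the 11 bytes stored in the dataset

print(len(decode_fixed(raw, "NULLTERM")))  # 5
print(len(decode_fixed(raw, "NULLPAD")))   # 11
```

Under this model the bytes on disk are identical in both cases; only the STRPAD flag on the datatype changes how a reader is entitled to interpret them.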

The main objective here is to have PyTables be able to write files that other tools like h5py can properly read.

(see https://groups.google.com/forum/?fromgroups#!topic/pytables-dev/K673TZ8yXqk and https://groups.google.com/forum/#!topic/h5py/W3LgQPUH8bE for more details).
