Suggestion: provide a padding function for Dask arrays #1926

@mratsim

Hi team,

I'm currently working on a dataset of 3D images that will be fed to a neural network.
The input arrays have varying sizes along all 3 axes, for example:
[390, 355, 355]
[390, 414, 414]
[398, 474, 474]
[403, 474, 474]
[412, 490, 490]
[530, 490, 490]

All images would fit in a 530 x 490 x 490 array.
I would like to pad the smaller images with 0 or, even better, with the 'edge' value as in numpy.pad, so they all have the same 530 x 490 x 490 shape.

I don't see how to do that within dask without falling back to NumPy and using either:

  1. assignment
    (from https://stackoverflow.com/questions/35751306/python-how-to-pad-numpy-array-with-zeros)
import numpy as np

def pad(array, reference_shape, offsets):
    """
    array: Array to be padded
    reference_shape: tuple giving the shape of the ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """

    # Create an array of zeros with the reference shape
    # (note: np.zeros defaults to float64, whatever the dtype of `array`)
    result = np.zeros(reference_shape)
    # Create a tuple of slices from offset to offset + shape in each dimension
    # (recent NumPy versions reject a plain list of slices as an index)
    insert_here = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim))
    # Insert the array in the result at the specified offsets
    result[insert_here] = array
    return result
  2. or np.pad (see https://stackoverflow.com/questions/29218785/numpy-scale-3d-array)
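
For reference, the np.pad route could look roughly like the sketch below. It is plain NumPy, not a dask solution; the helper name `pad_to_shape`, the choice to pad only at the end of each axis, and the small toy shapes (standing in for the real 530 x 490 x 490 volumes) are all assumptions made for illustration.

```python
import numpy as np

def pad_to_shape(array, target_shape, mode="edge"):
    """Pad `array` at the end of each axis until it has `target_shape`.

    Hypothetical helper: `mode` is forwarded to np.pad, so "edge" repeats
    the border values and "constant" pads with zeros.
    """
    if len(target_shape) != array.ndim:
        raise ValueError("target_shape must have one entry per array axis")
    pad_width = []
    for size, target in zip(array.shape, target_shape):
        if size > target:
            raise ValueError("array is larger than target_shape")
        # (before, after) padding for this axis: nothing before, the rest after
        pad_width.append((0, target - size))
    return np.pad(array, pad_width, mode=mode)

# Toy shapes standing in for the real image sizes
shapes = [(3, 2, 2), (4, 3, 3), (5, 3, 3)]
# Smallest shape that fits every image: the per-axis maximum
target = tuple(max(dims) for dims in zip(*shapes))
images = [np.arange(np.prod(s)).reshape(s) for s in shapes]
padded = [pad_to_shape(img, target) for img in images]
```

Padding only at the end keeps each image anchored at the origin; centering it instead would mean splitting `target - size` between the two entries of each `pad_width` pair.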

I believe this is a very common scenario while preprocessing images for machine learning.
