Skip to content

[MNT] Switch from np.column_stack() to np.vstack().T for performance #31130

@scottshambaugh

Description

@scottshambaugh

From the discussion in #31001 (comment), it looks like np.column_stack() is generally a slow operation compared to np.vstack().T. This is because the former has to interleave elements in memory whereas the second does contiguous memory copies and returns a view.

It's unclear to me how many of these operations are driving time spent in the hot paths of the code, but we should see a performance improvement by switching things over.

Marking this as an easy first issue since it's largely a find-and-replace.

10,000 elements: 10 runs x 10,000 iterations

With broadcast:
- `np.column_stack(np.broadcast_arrays(x, y))`: 36.47 us
- `np.vstack(np.broadcast_arrays(x, y)).T`: 27.67 us
- `np.empty + assign`: 30.09 us

Without broadcast:
- `np.column_stack([x, y])`: 20.63 us
- `np.vstack((x, y)).T`: 13.18 us
"""Benchmark different array combination methods."""

import timeit
import numpy as np

N = 10_000
NUMBER = 10_000
REPEAT = 10

x = np.linspace(0, 1, N)
y = np.float64(2.0)

# Pre-matched arrays for non-broadcast cases
x_full = x
y_full = np.full(N, 2.0)


def broadcast_column_stack():
    return np.column_stack(np.broadcast_arrays(x, y))

def broadcast_vstack_T():
    return np.vstack(np.broadcast_arrays(x, y)).T

def broadcast_empty_assign():
    out = np.empty((N, 2))
    bx, by = np.broadcast_arrays(x, y)
    out[:, 0] = bx
    out[:, 1] = by
    return out

def no_broadcast_column_stack():
    return np.column_stack([x_full, y_full])

def no_broadcast_vstack_T():
    return np.vstack((x_full, y_full)).T


def bench(func):
    times = timeit.repeat(func, number=NUMBER, repeat=REPEAT)
    return min(times) / NUMBER


if __name__ == "__main__":
    print(f"{N:,} elements: {REPEAT} runs x {NUMBER:,} iterations")

    broadcast_cases = [
        ("np.column_stack(np.broadcast_arrays(x, y))", broadcast_column_stack),
        ("np.vstack(np.broadcast_arrays(x, y)).T", broadcast_vstack_T),
        ("np.empty + assign", broadcast_empty_assign),
    ]

    no_broadcast_cases = [
        ("np.column_stack([x, y])", no_broadcast_column_stack),
        ("np.vstack((x, y)).T", no_broadcast_vstack_T),
    ]

    print("\nWith broadcast:")
    for label, func in broadcast_cases:
        t = bench(func)
        print(f"- `{label}`: {t * 1e6:.2f} us")

    print("\nWithout broadcast:")
    for label, func in no_broadcast_cases:
        t = bench(func)
        print(f"- `{label}`: {t * 1e6:.2f} us")

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions