cdist backward for Euclidean distance should be expressed through matrix multiplies #31599

@ngimel

Description

🚀 Feature

#25799 provides an optimized implementation for cdist with p=2 that uses matrix multiplies instead of specialized (inefficient) kernels. However, that PR did not change the backward logic, so even for the optimized case, backward calls the inefficient kernels. The cdist operation should be rewritten so that Euclidean distance becomes a compound operation without an explicit backward defined (the backward would then be derived automatically from the optimized forward ops), while the general case can still go through the existing kernels.
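For reference, the matrix-multiply formulation rests on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>. A minimal NumPy sketch of this approach (the function name `cdist_p2` is illustrative; this is not the actual PyTorch implementation):

```python
import numpy as np

def cdist_p2(x, y):
    """Pairwise Euclidean distances via a single GEMM.

    x: (n, d) array, y: (m, d) array -> (n, m) distance matrix.
    Uses ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 <x_i, y_j>,
    so the O(n*m*d) work is one matrix multiply instead of a
    specialized pairwise kernel.
    """
    x2 = (x * x).sum(axis=1)[:, None]   # (n, 1) squared norms
    y2 = (y * y).sum(axis=1)[None, :]   # (1, m) squared norms
    d2 = x2 + y2 - 2.0 * (x @ y.T)      # the GEMM dominates the cost
    # Clamp tiny negatives introduced by floating-point cancellation.
    return np.sqrt(np.maximum(d2, 0.0))
```

Because this forward pass is composed only of primitives with efficient gradients (sums, a matmul, a sqrt), an autograd engine can differentiate it without any hand-written backward kernel, which is the point of the proposed rewrite.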

Motivation

The optimized forward implementation improves performance by up to 50x, and we expect similar improvements for backward.
The current implementation also suffers from bugs at large sizes (see #31167). Those bugs still have to be fixed for the general case, but the GEMM-based implementation seems more robust. The large memory usage reported in #24345 might also be due to the inefficient implementation.

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @VitalyFedyunin @ngimel

Metadata

Assignees

No one assigned

    Labels

    - module: autograd — Related to torch.autograd, and the autograd engine in general
    - module: distance functions
    - module: performance — Issues related to performance, either of kernel code or framework glue
    - triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
