logsumexp: two low-impact perf suggestions (worth doing because logsumexp is used for optimizing/fusing cross_entropy over large vocabs) #31837

@vadimkantorov

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/ReduceOps.cpp#L372

  1. The exp could probably be done in-place, since its input is already a temporary?
  2. Hopefully torch.isinf could be made to avoid allocating a full copy via its abs call (the same currently happens in the Python impl of torch.isinf)
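
A rough Python-level sketch of what the two suggestions amount to, assuming the usual max-shift formulation of logsumexp. The function name `logsumexp_sketch` is hypothetical and this is not the ATen code linked above, only an illustration of where the temporaries go:

```python
import torch


def logsumexp_sketch(x: torch.Tensor, dim: int) -> torch.Tensor:
    # Hypothetical helper, for illustration only.
    # Per-slice maximum, kept as a size-1 dim so it broadcasts against x.
    maxes = x.amax(dim=dim, keepdim=True)

    # Suggestion 2: find infinite maxima by direct comparison instead of
    # abs() followed by a comparison, so no abs() copy is materialized.
    inf_mask = (maxes == float("inf")) | (maxes == float("-inf"))
    maxes = maxes.masked_fill(inf_mask, 0.0)

    # Suggestion 1: (x - maxes) is already a fresh temporary, so exp_()
    # can overwrite it rather than allocating a second full-size tensor.
    shifted = x - maxes
    shifted.exp_()

    # The reduced tensor is small, so out-of-place ops are fine here.
    out = shifted.sum(dim=dim, keepdim=True).log() + maxes
    return out.squeeze(dim)
```

On finite inputs this should agree with `torch.logsumexp(x, dim)` up to rounding. The only full-size allocation left is the `x - maxes` temporary.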

cc @VitalyFedyunin @ngimel @mruberry

Labels

  - module: cpu (CPU specific problem, e.g., perf, algorithm)
  - module: performance (Issues related to performance, either of kernel code or framework glue)
  - module: reductions
  - triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
