
Conversation

@IvanYashchuk (Collaborator) commented Apr 16, 2021

Stack from ghstack:

Differential Revision: D27960151

@facebook-github-bot (Contributor) commented Apr 16, 2021

💊 CI failures summary and remediations

As of commit 87182f9 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

======================================================================
FAIL [1.157s]: test_cudnn_multiple_threads_same_device (__main__.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 439, in wrapper
    fn(*args, **kwargs)
  File "test_cuda.py", line 2505, in test_cudnn_multiple_threads_same_device
    (2048 - test_iters) * (2048 - test_iters))
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1397, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! Comparing 2992900.0 and 1098304 gives a difference of 1894596.0, but the allowed difference with rtol=1.3e-06 and atol=1e-05 is only 1.4278052!

----------------------------------------------------------------------
Ran 159 tests in 85.431s

FAILED (failures=1, skipped=67)

Generating XML reports...
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCuda-20210429235051.xml
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCudaComm-20210429235051.xml
Traceback (most recent call last):

This comment was automatically generated by Dr. CI.

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
@IvanYashchuk added the module: linear algebra label (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul) Apr 16, 2021
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 19, 2021
@IvanYashchuk removed the request for review from antocuni April 20, 2021 16:50
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 20, 2021
@mruberry (Collaborator) commented:

@lezcano would you sanity check this?

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 26, 2021
@lezcano lezcano self-requested a review April 26, 2021 10:35
@antocuni (Contributor) left a comment:

LGTM.
If I understand it correctly, this PR by itself is mostly a no-op: it introduces _linalg_qr_out_helper but never uses it apart from re-implementing the old _linalg_qr_helper. I assume it will be used by subsequent PRs in the ghstack.

TORCH_INTERNAL_ASSERT(Q.sizes().equals(expected_Q_shape));

// Q tensor must be in batched column major order (Fortran contiguous)
TORCH_INTERNAL_ASSERT(Q.transpose(-2, -1).is_contiguous());
@antocuni (Contributor):

This has nothing to do with this PR; it's just a random idea that came to mind while reading it.
Should we have something like is_fortran_contiguous()? And maybe even introduce at::MemoryFormat::FortranContiguous, which would be useful when creating tensors.
I have seen similar patterns all over the linalg code, and having first-class support for Fortran-style memory would probably simplify a lot of it.
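
For illustration, a minimal sketch of what such a helper could look like (is_batched_fortran_contiguous is a hypothetical name, not an existing ATen function); it simply wraps the check this PR already performs:

#include <ATen/ATen.h>

// Hypothetical helper, not part of ATen or this PR: checks the "batched
// column-major" layout the linalg code expects, i.e. swapping the last two
// dimensions yields a C-contiguous tensor.
bool is_batched_fortran_contiguous(const at::Tensor& t) {
  TORCH_INTERNAL_ASSERT(t.dim() >= 2);
  return t.transpose(-2, -1).is_contiguous();
}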

@IvanYashchuk (Collaborator, Author):

In the linalg code we only need the last two dimensions to be Fortran contiguous, while all the other dimensions stay C contiguous. A truly Fortran-style memory layout would put the matrix in the first two dimensions and the batches at the end, which is not useful to have.
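
For reference, a minimal sketch of the usual idiom for allocating such a tensor (column-major in the last two dimensions, C-contiguous across the batch dimensions), assuming input is the batched matrix argument; this is the same empty-plus-transpose pattern used further down in this PR:

// Create the tensor with the last two dimensions swapped, then swap them
// back in place, so only those two dimensions end up column-major.
at::Tensor result = at::empty(input.transpose(-2, -1).sizes(), input.options())
                        .transpose_(-2, -1);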

@lezcano (Collaborator) left a comment:

It looks very good. I just found one part that was a bit difficult to follow. I made a proposal for a simpler way to handle it, which I think should save one allocation.

If this sounds reasonable, the handling of the "unpacking" of the tensors after the geqrf call should be changed as well.

// geqrf requires m x n workspace input that is modified in-place
// if m > n and reduced==true we use Q tensor for storing the result of geqrf operation
// otherwise R tensor is used
Tensor QR = R;
@lezcano (Collaborator):

This logic is a bit difficult to follow; it's hard to see that all the cases are covered. I think it'd be better to initialise it with a lambda and an if/else structure that spells out all the else branches:

const Tensor QR = [&]{
  if (m <= n) {
    return R;
  } else {
    if (compute_q) {
      return reduced_mode ? Q : R;
    } else {
      return at::empty(input.transpose(-2, -1).sizes(), input.options()).transpose_(-2, -1);
    }
  }
}();

Note that the current implementation allocates a new tensor whenever compute_q == false, which is not necessary.

@lezcano (Collaborator):

An analogous if/else (without the lambda) should be written below to handle the unpacking of the result.
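
For concreteness, a hypothetical skeleton of what that mirrored if/else could look like (the actual copies depend on the helper's code; m, n, compute_q, reduced_mode and QR are the ones from the snippet above):

// Sketch only: move the factors out of the geqrf workspace QR into Q and R,
// mirroring the branches that chose the workspace above.
if (m <= n) {
  // the workspace was R itself
} else {
  if (compute_q) {
    // the workspace was Q (reduced mode) or R (full mode)
  } else {
    // the workspace was a scratch tensor; only R needs to be filled
  }
}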

@IvanYashchuk (Collaborator, Author):

the current implementation allocates a new tensor whenever compute_q == false, which is not necessary

Right, I'll fix that.

@IvanYashchuk (Collaborator, Author):

I changed the code to use your conditions.

@mruberry (Collaborator) commented:

@lezcano would you take a look at this PR?

@IvanYashchuk (Collaborator, Author) commented:

If I understand it correctly, this PR by itself is mostly a no-op: it introduces _linalg_qr_out_helper but never uses it apart from re-implementing the old _linalg_qr_helper. I assume it will be used by subsequent PRs in the ghstack.

The previous helper was allocating memory on the go, with narrowing and resizes, which is not a good strategy. Here we first allocate memory of the correct size and then fill it with the result.
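
As a standalone illustration of the difference (hypothetical names, not the helper's actual code): the outputs are created at their final size and layout up front, so the LAPACK routines can write into them directly without any later resize or narrow.

#include <ATen/ATen.h>
#include <algorithm>

// Hypothetical sketch of the "allocate first, then fill" strategy for the
// reduced QR of a single m x n matrix; names and shapes are illustrative.
void qr_preallocate_sketch(const at::Tensor& input, at::Tensor& Q, at::Tensor& R) {
  const int64_t m = input.size(-2);
  const int64_t n = input.size(-1);
  const int64_t k = std::min(m, n);
  // Allocate the outputs at their final sizes in column-major order.
  Q = at::empty({k, m}, input.options()).transpose_(-2, -1);  // m x k, Fortran contiguous
  R = at::empty({n, k}, input.options()).transpose_(-2, -1);  // k x n, Fortran contiguous
  // ... geqrf/orgqr would then fill Q and R in place ...
}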

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 29, 2021
@lezcano (Collaborator) left a comment:

LGTM!

@mruberry (Collaborator) left a comment:

Stamped

@facebook-github-bot (Contributor) commented:

@mruberry merged this pull request in d5e1cac.

@facebook-github-bot deleted the gh/ivanyashchuk/15/head branch May 4, 2021 14:16
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary: Pull Request resolved: pytorch#56254

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960151

Pulled By: mruberry

fbshipit-source-id: 4067afed0dcca3f32d0fa153e50a268a850817b2

Labels

cla signed, Merged, module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul), open source
