
Conversation

@thenumberouscode (Contributor) commented Sep 15, 2025

This change may also resolve #161789, though verification is still needed.

PR #130472 introduced a problem where the same address could be freed without its metadata being cleaned up. Per the discussion below, this PR reverts it.

@pytorch-bot bot commented Sep 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162950

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9a7bd01 with merge base 1c16c18:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@thenumberouscode (Contributor Author)

@ngimel Please take a look when you have a moment.

@thenumberouscode (Contributor Author)

@pytorchbot label "release notes:cuda"

@pytorch-bot bot commented Sep 15, 2025

Didn't find following labels among repository labels: release notes:cuda

@thenumberouscode (Contributor Author)

@pytorchbot label "release notes: cuda"

@pytorch-bot bot added the release notes: cuda label Sep 15, 2025
  {
    const std::lock_guard<std::mutex> lock(allocator_mutex_);
-   allocation_metadata_.emplace(r, _AllocationMetadata(size, device, stream));
+   auto result = allocation_metadata_.try_emplace(r, size, device, stream);
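
(Aside for readers of this hunk: a tiny standalone snippet, generic C++ rather than PyTorch code, with a made-up map and address, showing how emplace and try_emplace behave when the key is already present; neither call overwrites the existing entry.)

#include <cassert>
#include <cstddef>
#include <unordered_map>

int main() {
    std::unordered_map<void*, size_t> sizes;
    void* p = reinterpret_cast<void*>(0x1000);   // stand-in for a reused device address

    assert(sizes.emplace(p, 1535).second);       // first insert succeeds
    assert(!sizes.emplace(p, 40).second);        // key exists: nothing inserted, 1535 kept
    assert(!sizes.try_emplace(p, 40).second);    // same outcome; try_emplace just avoids
                                                 // constructing the unused value
    assert(sizes[p] == 1535);
    return 0;
}
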
Collaborator

How can the previous allocation with the same address not have been erased during free?

@thenumberouscode (Contributor Author) Sep 16, 2025

> How can the previous allocation with the same address not have been erased during free?

Oh, because the same address was not erased before it was allocated again: it was deallocated through the deleter context, and the context did not clear the corresponding key in allocation_metadata_.

void CUDAPluggableAllocatorDeleterContext::free() {
  free_fn_(data_, size_, device_, stream_);
  delete this;
}
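
(Aside: a standalone toy model, not the actual PyTorch source and with made-up names, of the two free paths being contrasted in this thread: an allocator-driven free that erases its bookkeeping entry versus a context-driven free that calls the user callback directly and leaves the map untouched.)

#include <cstdio>
#include <unordered_map>

// Toy stand-ins for the allocator's bookkeeping map and the user's free callback.
static std::unordered_map<void*, size_t> metadata;

static void user_free(void* p, size_t size) {
    std::printf("free %p size=%zu\n", p, size);
}

// Allocator-driven path: look up the recorded size, free, and erase the record.
static void free_via_allocator(void* p) {
    auto it = metadata.find(p);
    user_free(p, it->second);
    metadata.erase(it);
}

// Context-driven path: the context carries its own size and never touches `metadata`.
static void free_via_context(void* p, size_t context_size) {
    user_free(p, context_size);
}

int main() {
    void* p = reinterpret_cast<void*>(0x1000);
    metadata.emplace(p, 1535);
    free_via_context(p, 1535);   // memory "freed", but the 1535 record stays behind
    std::printf("entries left: %zu\n", metadata.size());  // prints 1, not 0
    return 0;
}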

Collaborator

So that's the original sin: raw_delete behavior is different from the context deleter's. The context deleter has to also clean up the metadata.

@thenumberouscode (Contributor Author) Sep 17, 2025

> So that's the original sin: raw_delete behavior is different from the context deleter's. The context deleter has to also clean up the metadata.

I think we should not clean up metadata in the context, since that is not the design goal of the context; see PR #130472, where the author wanted the context to free the memory without depending on an allocator, and the metadata is a private member of the allocator. For the same reason, we should not record metadata again in the allocate function for the context path, because the context itself already records that information. Removing the metadata recording may be a better way.
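
(Aside: a compile-only sketch of the proposal in this comment, using hypothetical simplified types rather than the real CUDAPluggableAllocator members; note the PR ultimately took a different route and reverted #130472 entirely.)

#include <cuda_runtime.h>
#include <mutex>
#include <unordered_map>

// Hypothetical, simplified shapes; not the real CUDAPluggableAllocator members.
struct ToyMetadata { size_t size; int device; cudaStream_t stream; };

struct ToyPluggableAllocator {
    void* (*alloc_fn_)(size_t, int, cudaStream_t) = nullptr;
    std::mutex allocator_mutex_;
    std::unordered_map<void*, ToyMetadata> allocation_metadata_;

    // The proposal: when the allocation is owned by a deleter context, skip the
    // bookkeeping here, because the context already carries size/device/stream
    // and the context-based free path never erases allocation_metadata_ entries.
    void* allocate_for_context(size_t size, int device, cudaStream_t stream) {
        void* r = alloc_fn_(size, device, stream);
        // Previously something like:
        //   std::lock_guard<std::mutex> lock(allocator_mutex_);
        //   allocation_metadata_.emplace(r, ToyMetadata{size, device, stream});
        return r;
    }
};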

Collaborator

Ok, then not recording the metadata sounds like a better way.

Collaborator

So essentially the context as implemented in #130472 is bc-breaking? Nice! If we stop recording metadata, that is a bc-breaking change. If we stop cleaning up metadata, metadata is useless. cc @syed-ahmed

…to prevent free failures when deallocating the same memory address with different sizes.
@thenumberouscode force-pushed the bugfix_cuda_pluggable_allocator_malloc branch from 3a1093f to 2c11b85 on September 17, 2025 08:40
@thenumberouscode changed the title from "[CUDA] use try_emplace instead of emplace to prevent insert failed when memory address is same." to "[CUDA] Remove metadata recording in CUDAPluggableAllocator::allocate to prevent free failures when deallocating the same memory address with different sizes." on Sep 17, 2025
@ngimel (Collaborator) commented Sep 17, 2025

Actually, after talking to @syed-ahmed we've decided that the proper fix is to just revert #130472; it's not doing anything useful. @thenumberouscode can you please try that?

@syed-ahmed (Collaborator)

I'm verifying locally if reverting #130472 fixes the problem.

Recording our conversation here for posterity, #130472 was written to allow usage of multiple CUDAPluggableAllocator in the same program and at that time I thought we would need that functionality to enable MemPools. However, when MemPools were implemented, we actually didn't end up needing that functionality.

change_current_allocator makes sure that the user can't change the allocator again, once it's called. It doesn't prevent one from creating multiple pluggable allocators which we need for mempools. We just can't set multiple allocators as current, and that's fine, because we never need that for mempools.
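
(Aside: a hedged sketch of this point using the same C++ entry points that appear in the test program later in the thread, createCustomAllocator and changeCurrentAllocator; the callbacks are placeholders, and the comments restate what this comment describes rather than anything verified here.)

#include <cuda_runtime.h>
#include <torch/csrc/cuda/CUDAPluggableAllocator.h>

// Placeholder callbacks, defined only so the sketch is complete.
void* sketch_malloc(size_t size, int device, cudaStream_t stream) {
    void* ptr = nullptr;
    cudaMalloc(&ptr, size);
    return ptr;
}

void sketch_free(void* ptr, size_t size, int device, cudaStream_t stream) {
    cudaFree(ptr);
}

void configure_allocators() {
    // Creating several pluggable allocators is fine...
    auto alloc_a = torch::cuda::CUDAPluggableAllocator::createCustomAllocator(sketch_malloc, sketch_free);
    auto alloc_b = torch::cuda::CUDAPluggableAllocator::createCustomAllocator(sketch_malloc, sketch_free);

    // ...but only one can be installed as the current allocator, and per the
    // comment above that choice cannot be changed again afterwards.
    torch::cuda::CUDAPluggableAllocator::changeCurrentAllocator(alloc_a);

    // alloc_b is never made current; it could still back a MemPool, which is
    // the use case described above.
}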

@mikaylagawarecki added the triaged label Sep 18, 2025
@thenumberouscode changed the title from "[CUDA] Remove metadata recording in CUDAPluggableAllocator::allocate to prevent free failures when deallocating the same memory address with different sizes." to "[CUDA] revert 130472" on Sep 18, 2025
@pytorch-bot bot added the ci-no-td label Sep 18, 2025
@thenumberouscode (Contributor Author) commented Sep 18, 2025

> Actually, after talking to @syed-ahmed we've decided that the proper fix is to just revert #130472; it's not doing anything useful. @thenumberouscode can you please try that?

Yes, I had already reverted PR 130472 in my local development environment, and local testing showed that the issue was resolved. I've updated this PR's code to revert PR 130472. cc @syed-ahmed

@thenumberouscode changed the title from "[CUDA] revert 130472" to "[CUDA] revert PR 130472" on Sep 18, 2025
@thenumberouscode force-pushed the bugfix_cuda_pluggable_allocator_malloc branch from 6ef5a02 to ee549fd on September 18, 2025 04:23
cudaStream_t stream_{};
};

#if defined(USE_ROCM)
Contributor Author

@ngimel @syed-ahmed this line should not be reverted to #if defined(TORCH_HIP_VERSION), because it was introduced by another bug-fix PR, a19b667.

@ngimel closed this Sep 18, 2025
@ngimel reopened this Sep 18, 2025
@ngimel (Collaborator) commented Sep 18, 2025

Thank you @thenumberouscode

@syed-ahmed (Collaborator) left a comment

LGTM. Thank you @thenumberouscode!

@thenumberouscode (Contributor Author)

@syed-ahmed I cleaned all of the build cache with the python setup.py clean command and rebuilt the PyTorch project from the latest commit of this branch. I still could not reproduce the bug. Here are my test code and output.

#include <torch/csrc/cuda/CUDAPluggableAllocator.h>
#include <torch/torch.h>

#include <iostream>
#include <unordered_map>

std::unordered_map<void*, ssize_t> allocation_sizes;

#define EXPECT_EQ(actual, expected) \
    do { \
        auto actual_val = (actual); \
        auto expected_val = (expected); \
        if (actual_val != expected_val) { \
            std::cerr << "FAIL: EXPECT_EQ(" << #actual << ", " << #expected << ")" << std::endl; \
            std::cerr << "  Expected: " << expected_val << std::endl; \
            std::cerr << "  Actual: " << actual_val << std::endl; \
            std::cerr << "  At: " << __FILE__ << ":" << __LINE__ << std::endl; \
            return 1; /* nonzero exit status signals failure from main() */ \
        } \
    } while (0)

void* logging_malloc(size_t size, int device, cudaStream_t stream) {
    void* ptr;
    cudaMalloc(&ptr, size);
    std::cout << "alloc ptr=" << ptr << " size=" << size << " device=" << device
              << " stream=" << stream << std::endl;
    allocation_sizes[ptr] = size;   
    return ptr;
}

void logging_free(void* ptr, size_t size, int device, cudaStream_t stream) {
    std::cout << "free  ptr=" << ptr << " size=" << size << " device=" << device
              << " stream=" << stream << std::endl;
    // Print out any frees that don't match the allocation sizes
    if (allocation_sizes.find(ptr) != allocation_sizes.end()) {
        if (allocation_sizes[ptr] != size) {
            std::cout << "*** ERROR: free mismatch: " << ptr << " size=" << size
                      << " expected=" << allocation_sizes[ptr] << std::endl;
        }
    } else {
        std::cout << "WARNING: free of unknown ptr=" << ptr << std::endl;
    }
    cudaFree(ptr);
    allocation_sizes.erase(ptr);
}

// TEST(TestTorchUnique, UniqueComparisonTest) {
int main() {
    auto custom_allocator =
        torch::cuda::CUDAPluggableAllocator::createCustomAllocator(logging_malloc, logging_free);
    torch::cuda::CUDAPluggableAllocator::changeCurrentAllocator(custom_allocator);
    // Run the command 3 times; the first 2 will pass and the third invocation will have
    // different sizes in alloc and free
    for (int i = 0; i < 3; ++i) {
        std::cout << "\n Starting test " << i << std::endl;
        // Initialize simple sorted tensor with repeats
        torch::Tensor sorted_tensor =
            torch::tensor({0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 5},
                          torch::TensorOptions().dtype(torch::kFloat32).device(at::kCUDA));

        std::cout << "Starting unique_consecutive" << std::endl;
        // This operation will call malloc/free with different sizes on the same pointer
        auto unique_dim_result = torch::unique_consecutive(sorted_tensor, false, true, 0);
        std::cout << "Finished unique_consecutive" << std::endl;
        // Everything below is only there to validate correct results
        auto unique_dim_values = std::get<0>(unique_dim_result);
        auto unique_dim_counts = std::get<2>(unique_dim_result);

        // Check tensor sizes
        EXPECT_EQ(unique_dim_values.size(0), 5);
        EXPECT_EQ(unique_dim_counts.size(0), 5);

        // Copy to CPU before accessing elements
        torch::Tensor cpu_values = unique_dim_values.cpu();
        torch::Tensor cpu_counts = unique_dim_counts.cpu();

        // Use accessors on the CPU tensors
        auto values_accessor = cpu_values.accessor<float, 1>();
        auto counts_accessor = cpu_counts.accessor<int64_t, 1>();

        // Check individual values using accessors
        EXPECT_EQ(values_accessor[0], 0.0f);
        EXPECT_EQ(values_accessor[1], 1.0f);
        EXPECT_EQ(values_accessor[2], 2.0f);
        EXPECT_EQ(values_accessor[3], 3.0f);
        EXPECT_EQ(values_accessor[4], 5.0f);

        // Check count values using accessors
        EXPECT_EQ(counts_accessor[0], 3);
        EXPECT_EQ(counts_accessor[1], 2);
        EXPECT_EQ(counts_accessor[2], 1);
        EXPECT_EQ(counts_accessor[3], 4);
        EXPECT_EQ(counts_accessor[4], 1);
        std::cout << "Finished test " << i << " \n" << std::endl;
    }
}

output:

Starting test 0
alloc ptr=0x762f79600000 size=44 device=0 stream=0
Starting unique_consecutive
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f79600200 size=88 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f79600400 size=96 device=0 stream=0
alloc ptr=0x762f79600600 size=1535 device=0 stream=0
free  ptr=0x762f79600600 size=1535 device=0 stream=0
alloc ptr=0x762f79600600 size=40 device=0 stream=0
alloc ptr=0x762f79600800 size=255 device=0 stream=0
free  ptr=0x762f79600800 size=255 device=0 stream=0
free  ptr=0x762f79600400 size=96 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f79600400 size=20 device=0 stream=0
free  ptr=0x762f79600200 size=88 device=0 stream=0
Finished unique_consecutive
Finished test 0 

free  ptr=0x762f79600400 size=20 device=0 stream=0
free  ptr=0x762f79600600 size=40 device=0 stream=0
free  ptr=0x762f79600000 size=44 device=0 stream=0

 Starting test 1
alloc ptr=0x762f72800000 size=44 device=0 stream=0
Starting unique_consecutive
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f72800200 size=88 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f72800400 size=96 device=0 stream=0
alloc ptr=0x762f72800600 size=1535 device=0 stream=0
free  ptr=0x762f72800600 size=1535 device=0 stream=0
alloc ptr=0x762f72800600 size=40 device=0 stream=0
alloc ptr=0x762f72800800 size=255 device=0 stream=0
free  ptr=0x762f72800800 size=255 device=0 stream=0
free  ptr=0x762f72800400 size=96 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f72800400 size=20 device=0 stream=0
free  ptr=0x762f72800200 size=88 device=0 stream=0
Finished unique_consecutive
Finished test 1 

free  ptr=0x762f72800400 size=20 device=0 stream=0
free  ptr=0x762f72800600 size=40 device=0 stream=0
free  ptr=0x762f72800000 size=44 device=0 stream=0

 Starting test 2
alloc ptr=0x762f72800000 size=44 device=0 stream=0
Starting unique_consecutive
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f72800200 size=88 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f72800400 size=96 device=0 stream=0
alloc ptr=0x762f72800600 size=1535 device=0 stream=0
free  ptr=0x762f72800600 size=1535 device=0 stream=0
alloc ptr=0x762f72800600 size=40 device=0 stream=0
alloc ptr=0x762f72800800 size=255 device=0 stream=0
free  ptr=0x762f72800800 size=255 device=0 stream=0
free  ptr=0x762f72800400 size=96 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x762f72800400 size=20 device=0 stream=0
free  ptr=0x762f72800200 size=88 device=0 stream=0
Finished unique_consecutive
Finished test 2 

free  ptr=0x762f72800400 size=20 device=0 stream=0
free  ptr=0x762f72800600 size=40 device=0 stream=0
free  ptr=0x762f72800000 size=44 device=0 stream=0

@thenumberouscode (Contributor Author)

@syed-ahmed I've used the main branch to reproduce the bug, and it occurs there again. So I guess the revert may work; please let me know if I am wrong.
The output from the build with the main branch:

./consecutive 

 Starting test 0
alloc ptr=0x7d1f03600000 size=44 device=0 stream=0
Starting unique_consecutive
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1f03600200 size=88 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1f03600400 size=96 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1f03600600 size=1535 device=0 stream=0
free  ptr=0x7d1f03600600 size=1535 device=0 stream=0
alloc ptr=0x7d1f03600600 size=40 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
WARNING: free of unknown ptr=0
alloc ptr=0x7d1f03600800 size=255 device=0 stream=0
free  ptr=0x7d1f03600800 size=255 device=0 stream=0
free  ptr=0x7d1f03600400 size=96 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1f03600400 size=20 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
free  ptr=0x7d1f03600200 size=88 device=0 stream=0
Finished unique_consecutive
Finished test 0 

free  ptr=0x7d1f03600400 size=20 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
WARNING: free of unknown ptr=0
free  ptr=0x7d1f03600600 size=40 device=0 stream=0
free  ptr=0x7d1f03600000 size=44 device=0 stream=0

 Starting test 1
alloc ptr=0x7d1efa800000 size=44 device=0 stream=0
Starting unique_consecutive
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800200 size=88 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800400 size=96 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800600 size=1535 device=0 stream=0
free  ptr=0x7d1efa800600 size=1535 device=0 stream=0
alloc ptr=0x7d1efa800600 size=40 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
WARNING: free of unknown ptr=0
alloc ptr=0x7d1efa800800 size=255 device=0 stream=0
free  ptr=0x7d1efa800800 size=255 device=0 stream=0
free  ptr=0x7d1efa800400 size=96 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800400 size=20 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
free  ptr=0x7d1efa800200 size=88 device=0 stream=0
Finished unique_consecutive
Finished test 1 

free  ptr=0x7d1efa800400 size=20 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
WARNING: free of unknown ptr=0
free  ptr=0x7d1efa800600 size=40 device=0 stream=0
free  ptr=0x7d1efa800000 size=44 device=0 stream=0

 Starting test 2
alloc ptr=0x7d1efa800000 size=44 device=0 stream=0
Starting unique_consecutive
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800200 size=88 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800400 size=96 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800600 size=1535 device=0 stream=0
free  ptr=0x7d1efa800600 size=40 device=0 stream=0
*** ERROR: free mismatch: 0x7d1efa800600 size=40 expected=1535
alloc ptr=0x7d1efa800600 size=40 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
WARNING: free of unknown ptr=0
alloc ptr=0x7d1efa800800 size=255 device=0 stream=0
free  ptr=0x7d1efa800800 size=255 device=0 stream=0
free  ptr=0x7d1efa800400 size=96 device=0 stream=0
alloc ptr=0 size=0 device=0 stream=0
alloc ptr=0x7d1efa800400 size=20 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
free  ptr=0x7d1efa800200 size=88 device=0 stream=0
Finished unique_consecutive
Finished test 2 

free  ptr=0x7d1efa800400 size=20 device=0 stream=0
free  ptr=0 size=0 device=0 stream=0
WARNING: free of unknown ptr=0
free  ptr=0x7d1efa800600 size=40 device=0 stream=0
free  ptr=0x7d1efa800000 size=44 device=0 stream=0

@ngimel (Collaborator) commented Sep 19, 2025

Can we add the code from #161789 as a test (or, actually, a specifically constructed example that would try to allocate different-sized buffers at the same address)? Custom allocators can be constructed with load_inline.

@ngimel added the ciflow/trunk label Sep 19, 2025
@ngimel (Collaborator) commented Sep 19, 2025

@syed-ahmed are you ok with landing this PR as is? Even if it doesn't fix all the failures (I doubt it, tbh; @thenumberouscode's examples show that it actually fixes the previous failure) it's a strict improvement.

@syed-ahmed (Collaborator)

Yes, landing this as is sounds good to me. Let's keep the original bug open. @thenumberouscode thank you for the PR and the follow-ups!

@syed-ahmed (Collaborator)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@ngimel (Collaborator) commented Sep 19, 2025

@syed-ahmed @thenumberouscode can one of you take an AI to add the test as a follow-up, and to verify if original bug(s) are fixed, because currently @syed-ahmed sees failures even on top of this, whereas @thenumberouscode doesn't?

@ngimel (Collaborator) commented Sep 19, 2025

@pytorchbot cherry-pick --onto release/2.9 -c regression

pytorchbot pushed a commit that referenced this pull request Sep 19, 2025
This change may also resolve #161789, though verification is still needed.

PR #130472 would introduced the problem of  freeing the same address without clean metadata. according to the below discussion, reverted it.
Pull Request resolved: #162950
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/syed-ahmed

(cherry picked from commit 4a160da)
@pytorchbot (Collaborator)

Cherry picking #162950

The cherry pick PR is at #163379 and it is recommended to link a regression cherry pick PR with an issue. The following tracker issues are updated:


@syed-ahmed (Collaborator)

> currently @syed-ahmed sees failures even on top of this, whereas @thenumberouscode doesn't?

I believe @thenumberouscode is also seeing failures, per their last comment. I'll take the AI and will post the test on Monday.

@ngimel (Collaborator) commented Sep 20, 2025

If I'm reading his comments correctly, the bug doesn't repro with the revert for him, and it repros on the main branch.

@thenumberouscode (Contributor Author) commented Sep 22, 2025

> If I'm reading his comments correctly, the bug doesn't repro with the revert for him, and it repros on the main branch.

@ngimel Yes, I cannot reproduce the bug with the reverted code, but I can reproduce it on the main branch without the revert. cc @syed-ahmed

@thenumberouscode (Contributor Author)

> @syed-ahmed @thenumberouscode can one of you take an AI to add the test as a follow-up, and to verify if original bug(s) are fixed, because currently @syed-ahmed sees failures even on top of this, whereas @thenumberouscode doesn't?

@ngimel I'm willing to follow up and add the unit test as well. I can do this if @syed-ahmed is too busy to write it.

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
This change may also resolve pytorch#161789, though verification is still needed.

PR pytorch#130472 would introduced the problem of  freeing the same address without clean metadata. according to the below discussion, reverted it.
Pull Request resolved: pytorch#162950
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/syed-ahmed
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
This change may also resolve pytorch#161789, though verification is still needed.

PR pytorch#130472 would introduced the problem of  freeing the same address without clean metadata. according to the below discussion, reverted it.
Pull Request resolved: pytorch#162950
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/syed-ahmed
@ngimel (Collaborator) commented Sep 22, 2025

@thenumberouscode yes, please add the test, greatly appreciated!

@syed-ahmed (Collaborator) commented Sep 22, 2025

Added a test; I need to make sure it's running in the CI and clean up the logging if needed: #163575

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
This change may also resolve pytorch#161789, though verification is still needed.

PR pytorch#130472 would introduced the problem of  freeing the same address without clean metadata. according to the below discussion, reverted it.
Pull Request resolved: pytorch#162950
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/syed-ahmed
atalman pushed a commit that referenced this pull request Sep 29, 2025
[CUDA] revert PR 130472 (#162950)

This change may also resolve #161789, though verification is still needed.

PR #130472 would introduced the problem of  freeing the same address without clean metadata. according to the below discussion, reverted it.
Pull Request resolved: #162950
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/syed-ahmed

(cherry picked from commit 4a160da)

Co-authored-by: thenumberouscode <[email protected]>

Labels

ci-no-td (Do not run TD on this PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
release notes: cuda (release notes category)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Development

Successfully merging this pull request may close these issues.

"cudaErrorIllegalAddress" in unique_consecutive (with return_inverse and RMM)

7 participants