conditionally enable hipsparse const descriptors for version >= 2.4.0, backport to release/1.13 #1233
Merged
jeffdaily merged 4 commits into release/1.13 on Jun 1, 2023
cuSPARSE v12.0 has started to use const pointers for the descriptors. From `cusparse.h` (the documentation is incorrect):
```cpp
typedef struct cusparseSpVecDescr const* cusparseConstSpVecDescr_t;
typedef struct cusparseDnVecDescr const* cusparseConstDnVecDescr_t;
typedef struct cusparseSpMatDescr const* cusparseConstSpMatDescr_t;
typedef struct cusparseDnMatDescr const* cusparseConstDnMatDescr_t;
```
The function signatures of the corresponding destructors also changed to accept a const pointer. This PR adds `ConstCuSparseDescriptorDeleter`, which works with `cusparseStatus_t (*destructor)(const T*)`.

Some algorithm enums were deprecated during CUDA 11 and removed in CUDA 12. I replaced the following occurrences:
```
CUSPARSE_CSRMM_ALG1 -> CUSPARSE_SPMM_CSR_ALG1
CUSPARSE_COOMM_ALG1 -> CUSPARSE_SPMM_COO_ALG1
CUSPARSE_COOMM_ALG2 -> CUSPARSE_SPMM_COO_ALG2
```
Pull Request resolved: pytorch#90765
Approved by: https://github.com/cpuhrsch
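To illustrate the deleter idea from the commit above, here is a minimal sketch of a `std::unique_ptr` deleter parameterized on a destructor that takes a pointer-to-const. It does not use cuSPARSE: `Status`, `FakeDescr`, `destroyFakeDescr`, and `ConstDescriptorDeleter` are hypothetical stand-ins, and the actual `ConstCuSparseDescriptorDeleter` in the PR may differ in detail.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for cusparseStatus_t and an opaque descriptor struct.
enum Status { kSuccess = 0, kFailure = 1 };
struct FakeDescr { int id; };

// Counts destructor calls so the behavior can be observed.
int g_destroyed = 0;

// Mimics a CUDA 12-style destroy function that accepts a pointer-to-const.
inline Status destroyFakeDescr(const FakeDescr* d) {
  ++g_destroyed;
  delete d;  // deleting through a pointer-to-const is valid C++
  return kSuccess;
}

// Deleter templated on the descriptor type and a destructor taking const T*.
template <typename T, Status (*destructor)(const T*)>
struct ConstDescriptorDeleter {
  void operator()(const T* x) const {
    if (x != nullptr) {
      Status s = destructor(x);
      assert(s == kSuccess);
      (void)s;  // avoid an unused-variable warning when NDEBUG is set
    }
  }
};

// Owning handle to a const descriptor; the deleter runs on scope exit.
using ConstFakeDescrPtr =
    std::unique_ptr<const FakeDescr,
                    ConstDescriptorDeleter<FakeDescr, &destroyFakeDescr>>;

// Creates a descriptor, lets it go out of scope, and reports the call count.
inline int destroyedAfterScope() {
  {
    ConstFakeDescrPtr p(new FakeDescr{7});
    assert(p->id == 7);
  }
  return g_destroyed;
}
```

Passing the destructor as a template argument keeps the deleter stateless, so the smart pointer stays the size of a raw pointer.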
…pytorch#90897)

[CUDA 12] Fix the endif guard position for cusparse const descriptors

Related pytorch#90765
Pull Request resolved: pytorch#90897
Approved by: https://github.com/IvanYashchuk
See pytorch#91122

Summary: Some APIs are deprecated in newer versions of CUDA.

* cudaGraphInstantiate:
  From:
  ```
  cudaGraphInstantiate ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, cudaGraphNode_t* pErrorNode, char* pLogBuffer, size_t bufferSize )
  ```
  To:
  ```
  __host__ cudaError_t cudaGraphInstantiate ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, unsigned long long flags = 0 )
  ```
* cudaProfilerInitialize: deprecated in CUDA 11 and removed in CUDA 12

Test Plan: GH CI
Differential Revision: D41469051
Pull Request resolved: pytorch#91050
Approved by: https://github.com/jianyuh
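A signature change like the `cudaGraphInstantiate` one above is typically absorbed with a version guard at the call site. The sketch below shows that pattern without depending on the CUDA toolkit: `SKETCH_CUDA_VERSION`, `FakeGraph`, `FakeGraphExec`, `fakeGraphInstantiate`, and `instantiateGraph` are all hypothetical stand-ins, not the names used in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the CUDA_VERSION macro (12000 would mean 12.0).
#ifndef SKETCH_CUDA_VERSION
#define SKETCH_CUDA_VERSION 12000
#endif

// Hypothetical stand-ins for the graph and executable-graph handles.
struct FakeGraph {};
struct FakeGraphExec { int api_used = 0; };

#if SKETCH_CUDA_VERSION >= 12000
// Mimics the newer three-argument form: (exec, graph, flags).
inline int fakeGraphInstantiate(FakeGraphExec* exec, const FakeGraph& /*graph*/,
                                unsigned long long /*flags*/ = 0) {
  exec->api_used = 12;
  return 0;
}
#else
// Mimics the older five-argument form with error node and log buffer.
inline int fakeGraphInstantiate(FakeGraphExec* exec, const FakeGraph& /*graph*/,
                                void* /*errorNode*/, char* /*logBuffer*/,
                                std::size_t /*bufferSize*/) {
  exec->api_used = 11;
  return 0;
}
#endif

// Single call site that compiles against either signature.
inline int instantiateGraph(FakeGraphExec* exec, const FakeGraph& graph) {
#if SKETCH_CUDA_VERSION >= 12000
  return fakeGraphInstantiate(exec, graph, 0);
#else
  return fakeGraphInstantiate(exec, graph, nullptr, nullptr, 0);
#endif
}
```

Keeping the guard inside one wrapper function means the rest of the code never needs to know which toolkit version it was built against.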
…#1217)

* conditionally enable hipsparse const descriptors
* update hipsparse const API version condition to 2.4.0
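The version condition in the two bullets above can be pictured as a compile-time switch between const and non-const descriptor pointer types. This is only a sketch under assumptions: the `SKETCH_*` macros, the `major * 100000 + minor * 100 + patch` encoding, and `FakeDnVecDescr` are hypothetical and are not taken from the hipSPARSE headers or from the PR diff.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical version components; a real build would take these from the
// library's version header.
#ifndef SKETCH_SPARSE_MAJOR
#define SKETCH_SPARSE_MAJOR 2
#define SKETCH_SPARSE_MINOR 4
#define SKETCH_SPARSE_PATCH 0
#endif

// Assumed encoding: major * 100000 + minor * 100 + patch, so 2.4.0 -> 200400.
#define SKETCH_SPARSE_VERSION \
  (SKETCH_SPARSE_MAJOR * 100000 + SKETCH_SPARSE_MINOR * 100 + SKETCH_SPARSE_PATCH)

// Enable the const-descriptor code path only for version >= 2.4.0.
#if SKETCH_SPARSE_VERSION >= 200400
#define SKETCH_USE_CONST_DESCRIPTORS 1
#else
#define SKETCH_USE_CONST_DESCRIPTORS 0
#endif

// Hypothetical opaque descriptor type.
struct FakeDnVecDescr {};

// Argument type that call sites use; const-qualified only when enabled.
#if SKETCH_USE_CONST_DESCRIPTORS
using DnVecDescrArg = const FakeDnVecDescr*;
#else
using DnVecDescrArg = FakeDnVecDescr*;
#endif

// Reports at compile time which variant was selected.
constexpr bool usesConstDescriptors() {
  return std::is_const<std::remove_pointer<DnVecDescrArg>::type>::value;
}
```

With the default values above the const variant is selected; lowering the minor version to 3 would select the non-const alias instead.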
jithunnair-amd approved these changes on Jun 1, 2023
CI is still running, but the build completed successfully, and the build was the critical piece of this. The test2 suite hung during test_meta.py, which reminds me of other hangs there. Merging since the build passed.
akashveramd pushed a commit that referenced this pull request on Jun 13, 2025
This PR implements a core 'real' training loop in that it runs the deepseekv2 model using a number of Titan components to train on real (C4) data with adamW and displays initial training loop metrics. There is a lot more to be done, but the goal here is to get a true training loop going that additional PRs can then improve upon.

<img width="1192" alt="Screenshot 2025-05-29 at 7 41 01 PM" src="https://github.com/user-attachments/assets/36ae2ff1-aa99-42c9-8b97-1e0a1ef8376e" />

A couple of key highlights:
a - the model is now controllable via toml or cmd line just like Titan main. Note that the expert parallel control is waiting for PR pytorch/torchtitan#1244 to land; atm it just manually sets ep to 2.
b - we use the HF deepseek tokenizer, and as a result I had to make a wrapper to deal with the bos and eos params passed by Titan.
c - loss metrics, tps, etc. are displaying, but MFU and tflops need to be updated.

A lot more improvements will come shortly, but for now I want to land this to ensure our base deepseek training loop is available to iterate on.
Backporting #1217 to release/1.13 branch. Required cherry-picking some upstream changes adding CUDA 12.0 support. Fixes SWDEV-403604.