conditionally enable hipsparse const descriptors for version >= 2.4.0, backport to release/1.13#1233

Merged
jeffdaily merged 4 commits into release/1.13 from release/1.13-hipsparse-const
Jun 1, 2023

Conversation


@jeffdaily (Collaborator) commented on Jun 1, 2023:

Backporting #1217 to release/1.13 branch. Required cherry-picking some upstream changes adding CUDA 12.0 support. Fixes SWDEV-403604.

IvanYashchuk and others added 4 commits June 1, 2023 17:52
cuSPARSE v12.0 started using const pointers for the descriptor types; from `cusparse.h` (the documentation is incorrect):
```cpp
typedef struct cusparseSpVecDescr const* cusparseConstSpVecDescr_t;
typedef struct cusparseDnVecDescr const* cusparseConstDnVecDescr_t;
typedef struct cusparseSpMatDescr const* cusparseConstSpMatDescr_t;
typedef struct cusparseDnMatDescr const* cusparseConstDnMatDescr_t;
```
The function signatures of the corresponding destructors also change to accept a const pointer. This PR adds `ConstCuSparseDescriptorDeleter`, which works with destructors of type `cusparseStatus_t (*)(const T*)`.

Some algorithm enums were deprecated in CUDA 11 and removed in CUDA 12; I replaced the following occurrences:
```
CUSPARSE_CSRMM_ALG1 -> CUSPARSE_SPMM_CSR_ALG1
CUSPARSE_COOMM_ALG1 -> CUSPARSE_SPMM_COO_ALG1
CUSPARSE_COOMM_ALG2 -> CUSPARSE_SPMM_COO_ALG2
```
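The replacements above are mechanical substitutions at call sites. An alternative pattern for keeping the old spellings compiling on new toolkits is a compatibility shim like the following (a sketch only; it is not what the PR does, and it assumes the `CUSPARSE_VERSION` macro from `cusparse.h`):

```cpp
// Compatibility-shim sketch: alias the legacy algorithm names, which were
// removed in CUDA 12, to their generic-API replacements. (Assumption:
// CUSPARSE_VERSION is 12000 or greater on a CUDA 12 toolkit.)
#include <cusparse.h>

#if CUSPARSE_VERSION >= 12000
#define CUSPARSE_CSRMM_ALG1 CUSPARSE_SPMM_CSR_ALG1
#define CUSPARSE_COOMM_ALG1 CUSPARSE_SPMM_COO_ALG1
#define CUSPARSE_COOMM_ALG2 CUSPARSE_SPMM_COO_ALG2
#endif
```

Editing call sites directly, as the PR does, is cleaner because it avoids redefining identifiers that older headers still declare.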

Pull Request resolved: pytorch#90765
Approved by: https://github.com/cpuhrsch
See pytorch#91122
Summary:
Some APIs are deprecated in newer versions of CUDA.
* cudaGraphInstantiate:
From:
```
cudaGraphInstantiate ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, cudaGraphNode_t* pErrorNode, char* pLogBuffer, size_t bufferSize )
```
To:
```
__host__​cudaError_t cudaGraphInstantiate ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, unsigned long long flags = 0 )
```
* cudaProfilerInitialize: deprecated in CUDA 11 and removed in CUDA 12
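A version-guarded call site for the signature change above could look like this (a sketch; it assumes the `CUDART_VERSION` macro from `cuda_runtime.h`, and the wrapper name is hypothetical):

```cpp
// Sketch of a version-guarded cudaGraphInstantiate call. The CUDA 12
// signature drops the error-node/log-buffer parameters in favor of a
// flags argument.
#include <cuda_runtime.h>

cudaError_t instantiateGraph(cudaGraphExec_t* exec, cudaGraph_t graph) {
#if CUDART_VERSION >= 12000
  return cudaGraphInstantiate(exec, graph, 0ULL);
#else
  return cudaGraphInstantiate(exec, graph, nullptr, nullptr, 0);
#endif
}
```

The pre-12 error-reporting parameters were rarely useful in practice, which is presumably why passing nulls for them, as above, was already the common idiom.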

Test Plan: GH CI

Differential Revision: D41469051

Pull Request resolved: pytorch#91050
Approved by: https://github.com/jianyuh
…#1217)

* conditionally enable hipsparse const descriptors

* update hipsparse const API version condition to 2.4.0
@jeffdaily (Collaborator, Author) commented:

CI is still running, but the build completed successfully; the build was the critical piece of this. The test2 suite hung during test_meta.py, which reminds me of other hangs there.

Merging since build passed.

@jeffdaily jeffdaily merged commit b4468f2 into release/1.13 Jun 1, 2023
akashveramd pushed a commit that referenced this pull request Jun 13, 2025
This PR implements a core 'real' training loop: it runs the deepseekv2 model using a number of Titan components to train on real (C4) data with AdamW, and displays initial training-loop metrics.

There is a lot more to be done but the goal here is to get a true
training loop going from which additional PRs will then improve upon it.

![Screenshot 2025-05-29 at 7 41 01 PM](https://github.com/user-attachments/assets/36ae2ff1-aa99-42c9-8b97-1e0a1ef8376e)

A couple of key highlights:
a - The model is now controllable via toml or the command line, just like Titan main. Note that expert parallel control is waiting for PR pytorch/torchtitan#1244 to land; at the moment it just manually sets ep to 2.
b - We use the HF deepseek tokenizer, and as a result I had to make a wrapper to deal with the bos and eos params passed by Titan.
c - Loss metrics, tps, etc. are displaying, but MFU and tflops need to be updated.

A lot more improvements will come shortly, but for now I want to land this to ensure our base deepseek training loop is available to iterate on.