Deprecate cub::FpLimits in favor of cuda::std::numeric_limits #3635
Conversation
🟨 CI finished in 1h 58m: Pass: 94%/89 | Total: 2d 15h | Avg: 42m 54s | Max: 1h 12m | Hits: 145%/10896
| Project | Modified? | Project or dependencies modified? |
|---|---|---|
| CCCL Infrastructure | | |
| libcu++ | | |
| CUB | +/- | +/- |
| Thrust | | +/- |
| CUDA Experimental | | |
| python | | +/- |
| CCCL C Parallel Library | | +/- |
| Catch2Helper | +/- | +/- |
🏃 Runner counts (total jobs: 89)
| # | Runner |
|---|---|
| 65 | linux-amd64-cpu16 |
| 8 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |
Force-pushed cf55ae5 to 8e44ac4.
🟩 CI finished in 1h 50m: Pass: 100%/90 | Total: 2d 17h | Avg: 43m 29s | Max: 1h 17m | Hits: 177%/12730
| Project | Modified? | Project or dependencies modified? |
|---|---|---|
| CCCL Infrastructure | | |
| libcu++ | | |
| CUB | +/- | +/- |
| Thrust | | +/- |
| CUDA Experimental | | |
| python | | +/- |
| CCCL C Parallel Library | | +/- |
| Catch2Helper | +/- | +/- |
🏃 Runner counts (total jobs: 90)
| # | Runner |
|---|---|
| 65 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |
```diff
 _LIBCUDACXX_BEGIN_NAMESPACE_STD
 template <>
-struct CUB_NS_QUALIFIER::FpLimits<bfloat16_t>
+struct __is_extended_floating_point<bfloat16_t> : true_type
```
question: Can you help me understand where this is used? My mental model of our custom bfloat16_t and half_t is that we use them to "emulate" the native extended fp types in their absence. The reason I am asking is that I would like to make sure we're not promoting these "emulated" wrapper types to be a "real extended fp type" in places where they aren't.
It's used by cuda::is_floating_point and cuda::std::numeric_limits. Adding bfloat16_t here allows __numeric_limits_impl to be chosen internally by cuda::std::numeric_limits. It also ensures that since cuda::is_floating_point<__nv_bfloat16> is true, cuda::is_floating_point<bfloat16_t> is also true. So in short, it aligns the traits and limits for __nv_bfloat16 and its wrapper type.
The same goes for __half and half_t.
Force-pushed 8e44ac4 to 3a81d5f.
Force-pushed 3a81d5f to d2562b3.
🟩 CI finished in 1h 52m: Pass: 100%/90 | Total: 2d 17h | Avg: 43m 27s | Max: 1h 21m | Hits: 177%/12730
| Project | Modified? | Project or dependencies modified? |
|---|---|---|
| CCCL Infrastructure | | |
| libcu++ | | |
| CUB | +/- | +/- |
| Thrust | | +/- |
| CUDA Experimental | | |
| python | | +/- |
| CCCL C Parallel Library | | +/- |
| Catch2Helper | +/- | +/- |
🏃 Runner counts (total jobs: 90)
| # | Runner |
|---|---|
| 65 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |
(cherry picked from commit d85c66a)
Successfully created backport PR for #3658.
…#3658) (cherry picked from commit d85c66a) Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Pulled out of #3384.