CUDA default threads per block is unintentionally set to 512 instead of 1024

#62 (!) introduced a preprocessor check to allow Caffe to run on CUDA devices of compute capability < 2 by setting `CAFFE_CUDA_NUM_THREADS` (the number of threads per block used for most of Caffe's CUDA kernels) to 512 if needed.

According to my recent check and my reading of the CUDA programming guide, this check is not correct. `__CUDA_ARCH__` is not defined in host code (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#application-compatibility), so the condition is always false on the host and the 512 fallback is always selected. (Note that the preprocessor doesn't care that the macro isn't defined; inside `#if` it just treats it as zero, see https://gcc.gnu.org/onlinedocs/cpp/If.html.)
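To illustrate the failure mode, here is a minimal sketch compiled as plain host C++. It assumes the check has roughly the shape `#if __CUDA_ARCH__ >= 200`; the exact code in Caffe may differ, and `kNumThreads` and `defaultThreadsPerBlock` are hypothetical names used only for this example.

```cpp
// Assumed shape of the check; the real Caffe code may differ.
// __CUDA_ARCH__ is only defined while device code is being compiled.
// In a host compilation the identifier is undefined, and the preprocessor
// evaluates an undefined identifier inside #if as 0.
#if __CUDA_ARCH__ >= 200
const int kNumThreads = 1024;  // hypothetical stand-in for CAFFE_CUDA_NUM_THREADS
#else
const int kNumThreads = 512;   // branch taken in every host compilation
#endif

// Hypothetical helper: the value host code would use when launching kernels.
int defaultThreadsPerBlock() {
  return kNumThreads;
}
```

Since kernel launch configurations are computed in host code, the value seen there is the fallback regardless of which architecture the device code is built for.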

I don't know if this has any performance implications; it might be fine to just leave threads/block at 512 and remove the dead code.

I'm noting this here since I don't have time to send a patch right now; feel free to send one, or I'll try to get to it in a few days.


CUDA default threads per block is unintentionally set to 512 instead of 1024 #3281
