Hi Guys,
I am new to CUDA. I am currently doing a performance comparison of dynamic parallelism.
I have three kernels, and I compared their performance when launched from the host and when launched from the device (dynamic parallelism).
The dynamic parallelism parent kernel runs as a single thread:
grid - (1, 1, 1)
block - (1, 1, 1)
The dimensions of each child kernel are listed below. Each child kernel is launched after the previous one has completed (not recursively). I have set `cudaLimitDevRuntimeSyncDepth` to 2 and `cudaLimitDevRuntimePendingLaunchCount` to 1024 * 128.
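For anyone who wants to reproduce the setup, here is a minimal sketch of it. The kernel bodies (`mapKernel`, `reduceKernel`, `sortKernel`) and the helpers `parentKernel` and `runDeviceLaunches` are hypothetical stand-ins, not the original kernels. The parent is launched with one thread and issues the three child launches in order; ordering comes from stream semantics (launches by the same thread into the same implicit stream execute in order) rather than from a device-side `cudaDeviceSynchronize()`, which has been deprecated since CUDA 11.6. The build flags in the first comment are an assumption for a GTX 980 (compute capability 5.2).

```cuda
// Build (assumed): nvcc -arch=sm_52 -rdc=true dp_sketch.cu -o dp_sketch -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the map kernel: doubles every element.
__global__ void mapKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical stand-in for the reduce kernel: one partial sum per block
// (shared-memory tree reduction; blockDim.x must be a power of two).
__global__ void reduceKernel(const float *in, float *partial, int n) {
    extern __shared__ float s[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    s[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = s[0];
}

// Hypothetical stand-in for the sort kernel: odd-even transposition sort
// of blockDim.x elements inside a single block.
__global__ void sortKernel(float *a) {
    extern __shared__ float s[];
    int t = threadIdx.x, n = blockDim.x;
    s[t] = a[t];
    __syncthreads();
    for (int phase = 0; phase < n; ++phase) {
        if (t % 2 == phase % 2 && t + 1 < n && s[t] > s[t + 1]) {
            float tmp = s[t];
            s[t] = s[t + 1];
            s[t + 1] = tmp;
        }
        __syncthreads();
    }
    a[t] = s[t];
}

// Parent kernel, launched <<<1, 1>>>. All three children go into the same
// implicit stream from the same thread, so they run one after another.
__global__ void parentKernel(float *data, float *partial, int n) {
    mapKernel<<<n / 1024, 1024>>>(data, n);
    reduceKernel<<<n / 1024, 1024, 1024 * sizeof(float)>>>(data, partial, n);
    sortKernel<<<1, 512, 512 * sizeof(float)>>>(partial);
}

// Runs the three kernels via dynamic parallelism and copies the first 512
// (sorted) partial sums into hostOut. Returns 0 on success.
int runDeviceLaunches(float *hostOut) {
    const int n = 1024 * 1024;
    // Sync depth only matters for device-side cudaDeviceSynchronize(); newer
    // toolkits may reject this limit, so a failure is reported, not fatal.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, 2);
    if (err != cudaSuccess) {
        printf("SyncDepth limit: %s\n", cudaGetErrorString(err));
        cudaGetLastError();  // clear the recorded error
    }
    if (cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 1024 * 128) != cudaSuccess)
        return 1;

    std::vector<float> h(n, 1.0f);
    float *data = nullptr, *partial = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMalloc(&partial, (n / 1024) * sizeof(float));
    cudaMemcpy(data, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    parentKernel<<<1, 1>>>(data, partial, n);
    err = cudaGetLastError();
    // The parent grid is complete only once all of its children are complete.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, partial, 512 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(data);
    cudaFree(partial);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    float out[512];
    if (runDeviceLaunches(out) != 0) return 1;
    printf("first partial sum: %f\n", out[0]);
    return 0;
}
```

With every input set to 1.0f, each block's partial sum is 1024 * 2.0f, so the sorted output is easy to sanity-check.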
The host-launched kernels use the same dimensions as the child kernels.
Below are the kernel dimensions and the execution times for host launching and device launching:
| Kernel | Calculation Type | Grid Dimension | Block Dimension | Host Launch | Device Launch |
|---|---|---|---|---|---|
| Kernel 1 | Map operation | 1024 | 1024 | 52.7 us | 119.2 us |
| Kernel 2 | Reduce operation | 1024 | 1024 | 183.7 us | 334.9 us |
| Kernel 3 | Sort operation | 1 | 512 | 221.7 us | 383.3 us |
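One way to make sure the two columns measure the same thing is to time both paths with CUDA events on the host, after an untimed warm-up launch, and average over several launches. The sketch below does this for a map-style kernel only; `mapKernel`, `parentMap`, `launchOnce` and `timeLaunches` are hypothetical names, and the iteration count is an arbitrary assumption. Because a parent grid only completes after its children, the event recorded after the parent launch covers the child's execution too.

```cuda
// Build (assumed): nvcc -arch=sm_52 -rdc=true time_launch.cu -o time_launch -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the map kernel.
__global__ void mapKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent that issues exactly the launch the host would otherwise issue.
__global__ void parentMap(float *data, int n) {
    mapKernel<<<(n + 1023) / 1024, 1024>>>(data, n);
}

static void launchOnce(bool fromDevice, float *data, int n) {
    if (fromDevice)
        parentMap<<<1, 1>>>(data, n);
    else
        mapKernel<<<(n + 1023) / 1024, 1024>>>(data, n);
}

// Average milliseconds per launch, measured with CUDA events after one
// untimed warm-up launch. Both paths include their own launch overhead.
float timeLaunches(bool fromDevice, int n, int iters) {
    float *data = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    launchOnce(fromDevice, data, n);  // warm-up, excluded from timing
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) launchOnce(fromDevice, data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // a parent grid finishes only after its child

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(data);
    return ms / iters;
}

int main() {
    const int n = 1024 * 1024, iters = 100;
    printf("host launch:   %.1f us\n", 1000.0f * timeLaunches(false, n, iters));
    printf("device launch: %.1f us\n", 1000.0f * timeLaunches(true, n, iters));
    return 0;
}
```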
I found some more details in this presentation: http://users.ece.gatech.edu/~sudha/academic/class/ece8823/Lectures/Module-6-Microarchitecture/cuda-dyn-par.pdf
The presentation explains that dynamic parallelism has some synchronization overhead, and says that the kernel execution time itself should be the same.
However, I observe that with dynamic parallelism the execution time is higher than with host launching (roughly 1.7x to 2.3x in the table above).
Are these results expected, or am I doing something wrong?
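To separate launch and synchronization overhead from the kernels' actual execution time, one option is to time launches of an empty kernel from the host and from a single device thread. This is a sketch under assumptions: `emptyKernel`, `parentEmpty` and `emptyLaunchMs` are hypothetical names, and the count of 1000 is chosen to stay below the documented default pending-launch limit of 2048. If the per-launch difference here is close to the gap in the table, the extra time is overhead rather than slower kernel execution.

```cuda
// Build (assumed): nvcc -arch=sm_52 -rdc=true launch_overhead.cu -o launch_overhead -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: any measured time is launch and scheduling overhead.
__global__ void emptyKernel() {}

// One device thread issues `count` empty child launches.
__global__ void parentEmpty(int count) {
    for (int i = 0; i < count; ++i) emptyKernel<<<1, 1>>>();
}

// Total milliseconds for `count` empty launches issued from the host
// (fromDevice == false) or from a single parent thread (fromDevice == true).
float emptyLaunchMs(bool fromDevice, int count) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Untimed warm-up of both code paths.
    emptyKernel<<<1, 1>>>();
    parentEmpty<<<1, 1>>>(1);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    if (fromDevice) {
        parentEmpty<<<1, 1>>>(count);
    } else {
        for (int i = 0; i < count; ++i) emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int count = 1000;  // below the default pending-launch limit (2048)
    printf("host:   %.2f us per launch\n", 1000.0f * emptyLaunchMs(false, count) / count);
    printf("device: %.2f us per launch\n", 1000.0f * emptyLaunchMs(true, count) / count);
    return 0;
}
```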
Test environment:
GPU - GeForce GTX 980
OS - Red Hat Enterprise Linux Server release 6.6 (Linux k7-1 2.6.32-504.el6.x86_64)
CPU - Intel(R) Core™ i7-4770 CPU @ 3.40GHz
The timings are taken from the second iteration of each run.
Thank you in advance.
Vishwa