JakobSachs/toySIMT

toySIMT

A very basic implementation of the simplified (single-loop approximation) SIMT core described in General-Purpose Graphics Processor Architectures by Aamodt et al. (2018).

Features:

  • RISC-style ISA with only the basic instructions needed to demo divergence/reconvergence handling
    • divergence/reconvergence is handled via explicitly encoded reconvergence points (normally inserted by the compiler), similar to NVIDIA's barrier-branch mechanism, but simpler
  • a small warp/wave size of 8 to keep the output readable
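
The masking idea behind divergence and reconvergence can be sketched in a few lines of Python. This is an illustration only, not code from this repository: the names `WARP_SIZE`, `split_mask`, and `format_mask` are hypothetical, and the bit layout (thread 0 in the least significant bit) is an assumption.

```python
WARP_SIZE = 8  # warp size stated in the feature list above


def split_mask(active, taken_lanes):
    """Split an active mask into (taken, not_taken) masks at a branch.

    `active` is an int bitmask; bit i set means thread i is active.
    `taken_lanes` lists the thread ids whose branch condition is true.
    """
    predicate = 0
    for lane in taken_lanes:
        predicate |= 1 << lane
    taken = active & predicate
    not_taken = active & ~predicate
    return taken, not_taken


def format_mask(mask):
    """Render a mask as a WARP_SIZE-wide bit string, thread 0 rightmost."""
    return format(mask, f"0{WARP_SIZE}b")


full = (1 << WARP_SIZE) - 1
# Hypothetical branch condition: thread id > 3.
taken, not_taken = split_mask(full, [t for t in range(WARP_SIZE) if t > 3])
print(format_mask(taken))              # lanes that take the branch
print(format_mask(not_taken))          # lanes that fall through
print(format_mask(taken | not_taken))  # both groups rejoined at the reconvergence point
```

The two masks are disjoint, and their union is the mask that was active before the branch, which is what a reconvergence point restores.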

Example program:

TEST_PROG executes something akin to this pseudo-CUDA program:

__device__ void foo() {
    if (threadIdx.x <= 3) {
        printf("%d\n",threadIdx.x);
    }
}

This demonstrates the basics of masking, diverging, and reconvergence.


Actual instructions:

LID 0 // load thread id into register 0
LI 1, 0x3 // load 0x3 into register 1
BGT 0, 1, 0x10, 0x10 // jumps over the PRINT if register 0 > register 1
PRINT 0
NOP
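
To make the control flow concrete, here is a minimal Python sketch of a warp stepping through these five instructions. It is not the repository's implementation, and several details are assumptions inferred from the simulation output in this README: instructions are taken to be 8 bytes wide (`INSTR_BYTES`), the last two `BGT` operands are read as PC-relative offsets to the branch target and to the reconvergence point, a single `rpc` register is used instead of a full SIMT stack, and PCs past the end of the program are treated as `NOP`. All names (`run`, `PROGRAM`, `pending`) are hypothetical.

```python
WARP_SIZE = 8
INSTR_BYTES = 8  # assumption: inferred from the PC stride in the output
FULL_MASK = (1 << WARP_SIZE) - 1

# The five instructions listed above, as (opcode, operands...) tuples.
PROGRAM = [
    ("LID", 0),
    ("LI", 1, 0x3),
    ("BGT", 0, 1, 0x10, 0x10),
    ("PRINT", 0),
    ("NOP",),
]


def run(program, max_steps=5):
    regs = [[0, 0] for _ in range(WARP_SIZE)]  # two registers per thread
    active, pc, rpc = FULL_MASK, 0, 0
    pending = 0  # lanes waiting at the reconvergence point
    trace, printed = [], []
    for _ in range(max_steps):
        # Reconverge: waiting lanes rejoin once the PC reaches the RPC.
        if pending and pc == rpc:
            active |= pending
            pending = 0
        index = pc // INSTR_BYTES
        # Assumption: anything past the end of the program behaves like NOP.
        op, *args = program[index] if index < len(program) else ("NOP",)
        trace.append((format(active, f"0{WARP_SIZE}b"), pc, rpc, op))
        lanes = [t for t in range(WARP_SIZE) if (active >> t) & 1]
        next_pc = pc + INSTR_BYTES
        if op == "LID":
            for t in lanes:
                regs[t][args[0]] = t
        elif op == "LI":
            for t in lanes:
                regs[t][args[0]] = args[1]
        elif op == "BGT":
            a, b, target_off, rpc_off = args
            # This sketch only covers branches that skip straight to the RPC.
            assert target_off == rpc_off, "sketch only handles skip-style branches"
            taken = sum(1 << t for t in lanes if regs[t][a] > regs[t][b])
            rpc = pc + rpc_off
            if taken == active:
                next_pc = pc + target_off  # no divergence: everyone jumps
            else:
                pending = taken   # taken lanes wait at the RPC
                active &= ~taken  # remaining lanes run the fall-through path
        elif op == "PRINT":
            for t in lanes:
                printed.append((t, regs[t][args[0]]))
        pc = next_pc
    return trace, printed


trace, printed = run(PROGRAM)
for mask, pc, rpc, op in trace:
    print(f"ACTIVE {mask} PC {pc:04X} RPC {rpc:04X} {op}")
```

Under these assumptions the mask drops to `00001111` for the `PRINT` and returns to `11111111` once the PC reaches the reconvergence point. A general design would also need to run the taken path when the branch target differs from the reconvergence point, which this sketch deliberately leaves out.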

Running the simulation outputs:

CORE RUNNING:
         ACTIVE 11111111 PC 0000 RPC 0000       LID 0
         ACTIVE 11111111 PC 0008 RPC 0000       LI 1, 0x3
         ACTIVE 11111111 PC 0010 RPC 0000       BGT 0, 1, 0x10, 0x10
         ACTIVE 00001111 PC 0018 RPC 0020       PRINT 0
Thread 0: r0 = 0
Thread 1: r0 = 1
Thread 2: r0 = 2
Thread 3: r0 = 3
         ACTIVE 11111111 PC 0020 RPC 0020       NOP
         ACTIVE 11111111 PC 0028 RPC 0020       NOP
         ACTIVE 11111111 PC 0030 RPC 0020       NOP
...
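
When reading longer traces, it can help to decode each line into its fields. The helper below is a hypothetical sketch, not part of the repository. It assumes the line layout shown above, that the PC and RPC columns are hexadecimal, and that the rightmost mask bit belongs to thread 0 (consistent with threads 0 to 3 printing under the mask `00001111`).

```python
import re

# Assumed layout: "ACTIVE <bits> PC <hex> RPC <hex> <instruction text>"
TRACE_RE = re.compile(
    r"ACTIVE\s+([01]+)\s+PC\s+([0-9A-Fa-f]+)\s+RPC\s+([0-9A-Fa-f]+)\s+(.*)"
)


def parse_trace_line(line):
    """Return the decoded fields of a trace line, or None for other lines."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    mask, pc, rpc, instr = match.groups()
    # Assumption: the rightmost bit of the mask is thread 0.
    threads = [i for i, bit in enumerate(reversed(mask)) if bit == "1"]
    return {
        "threads": threads,
        "pc": int(pc, 16),
        "rpc": int(rpc, 16),
        "instr": instr.strip(),
    }


row = parse_trace_line("         ACTIVE 00001111 PC 0018 RPC 0020       PRINT 0")
print(row)
print(parse_trace_line("Thread 1: r0 = 1"))  # per-thread PRINT output is not a trace line
```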
