This repository contains a simulation of PreSIT built on gem5 full-system (FS) mode. PreSIT predicts the cryptographic results in SGX-style Integrity Trees (SIT) to reduce the performance overhead that SIT introduces. Two kinds of computation are predicted: AES decryption and Message Authentication Code (MAC) computation. More details are given in our IEEE TCAD 2025 paper:
Xinrui Wang, Lang Feng, Zhongfeng Wang, PreSIT: Predict Cryptography Computations in SGX-Style Integrity Trees, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 3, pp. 882-896, March 2025.
The following sections will introduce the details of the implementation.
Contact: Xinrui Wang ([email protected]) and Lang Feng ([email protected]).
- Folder `gem5`: contains the gem5 simulator. The primary implementation of PreSIT is in `gem5/src/mem/mem_ctrl.cc`.
- Folder `programs`: contains the benchmarks that run on PreSIT.
- Folder `qemu`: empty; QEMU must be installed by the user (introduced later).
- Folder `recordRest`: records the meta results of the benchmarks.
- Folder `linux_checkpoint`: stores the Linux checkpoints taken after booting the system under different configurations, such as 2 or 4 cores or a 32GB memory size.
- Folder `riscv-toolchain`: empty; the user must install the RISC-V GCC toolchain here (introduced later).
- Folder `riscv64-sample`: essential for the Linux system.
- Folder `disk-image`: stores the Ubuntu Linux image, which is currently absent.
- `README_UbuntuImage.md`: records how to get the Linux image for multicore.
- `README_BBL.md`: records how to get the Linux image for single core.
- `riscv_disk`: the Linux disk image, which stores the benchmark executables for gem5 full-system (FS mode) simulation.
- `ATT1/2.log`: records the results of the two simulated attacks mentioned in the paper.
- `saveUbuntuImage.sh`: a shell script that stores the Linux boot checkpoint.
- `TEE-paper-help.zip`: contains the data and the Python plotting scripts for the figures in the paper.
After you download this repo, the first step is to replace every occurrence of `YOUR_PATH` with the absolute path of your copy of the repo, e.g. replace `YOUR_PATH` with `/home/your_name/PreSIT-master`. This string occurs in many files, so use a shell command or VS Code to find and replace it.
PreSIT must be simulated in FS mode, and an enclave memory region must be reserved to store the integrity tree. To achieve this, we follow a website (or `README_BBL.md` in this repo) to build the system. A brief introduction to the build follows; for details, please refer to the website.
- Install the RISC-V GCC toolchain in `riscv-toolchain`. The detailed information is given on the website. After installation, you should see `riscv-toolchain/bin/riscv64-unknown-linux-gnu-*` in this repo.
- Build the `bbl` following the instructions. Store `bbl` in `riscv64-sample/riscv-pk/build{memSize}G/bbl`, where `memSize` can be 8, 16, or 32; this means you must rebuild the bbl whenever you change the memory size. NOTE that we provide prebuilt bbl files in the right place, which you can use directly.
- Build the `riscv_disk` and put it in the repo root directory. NOTE that this is enough for simulating the single-core configuration.
- Download the Ubuntu image to `disk-image/`. The link is https://gem5.googlesource.com/public/gem5-resources/+/8786f809bbbbcfb9aff4e4df0d2d21848442d706/src/riscv-ubuntu/README.md (also stored in this repo as `README_UbuntuImage.md`). Please follow the necessary steps in this link. This is essential for simulating multicore configurations.
Now you have all the requirements for the Linux system. If you only want to simulate the 1-core configuration, there is no need to get the Ubuntu image.
The reason for storing a checkpoint after booting is simple: booting the Linux system takes a long time every time, about 30 minutes on a regular PC even under the TimingCPU model. Therefore, we boot it once and then restore the checkpoint to simulate our programs. NOTE that we have already built the checkpoints for the different configurations and stored them in `linux_checkpoint`; you can simply unzip the files in this folder and use them. You can also build your own checkpoint. This requires some special tricks, so we list them below:
- Change line 70, `#define ENC_PROTECT true`, of `gem5/src/mem/mem_ctrl.hh` to `false`, which disables the integrity tree operations. Also change `line 109` and `line 110` to the right configuration, as well as `memSize` in `line 79` of `gem5/configs/example/gem5_library/riscv-fs.py`; details are shown in Section 5.1. Remember to rebuild gem5 after you modify the C code.
- Enter the `gem5` folder and run the shell script `build_gem5.sh`.
- If you want a single-core checkpoint, run `saveUbuntuImage.sh 1`. After the system boots, enter your username and password (mentioned in `README_BBL.md`).
- If you want a multi-core checkpoint, run `saveUbuntuImage.sh n` directly (n is the number of cores). You will then see the Linux system with Ubuntu booting. Please first complete the operations in `README_UbuntuImage.md` (the Linux username and password are also mentioned in that document); for this you also need to install `qemu`.
- After the system boots, you can connect to it via `telnet localhost <port>`. The key step is then to execute `m5 checkpoint` and `m5 exit`. You will then find the latest checkpoint stored in `gem5/m5out/checkpoint2/`.
NOTE that there is a key file programs/SPEC/enclave.cc, which takes responsibility for initializing the enclave memory region.
All the compile scripts of this benchmark are stored in `programs/SPEC`. You need to set the path of SPEC in every `compile_xxx.sh`, where xxx could be 502/505/.... After that, enter `programs/SPEC/` and run the shell script `compile_all.sh`. You will then get 11 cases from SPEC CPU 2017.
Enter programs/GAP/ and run the shell compile.sh. Then, you will get the execution files.
This benchmark is very hard to download, so we directly provide the compiled version in programs/Parsec. The user can directly enter the folder and unzip the files. If users want to build by themselves, please download the source code and imitate the SPEC compilation.
After you get all the compiled files, please copy them to the disk, i.e., riscv_disk or ubuntu.image. You can further refer to README_BBL.md or README_UbuntuImage.md for how to copy the execution files to the disk.
NOTE that we support 16GB + 1core, 16GB + 2cores, 16GB + 4cores, 4GB + 1core, and 32GB + 1core. Please remember to check these settings every time you change the configuration, as described in the following operations.
After getting the execution files and preparing the linux system, you can run the benchmarks. We first introduce the different configurations involved in our code. The different configurations can be done by setting the macro in gem5/src/mem/mem_ctrl.hh. We list all the supported configurations below:
// this is file "gem5/src/mem/mem_ctrl.hh"
#define ENC_PROTECT true //line 70
#define TURNON_PRE true //line 83
#define TURNON_PRE_AES false //line 84
#define INPROG true //line 95
#define PDA true //line 97
#define APA true //line 98
#define COREMEM 162 //line 109
#define _MEM 16 //line 110
#define ATT1 false //line 134
#define ATT2 false //line 135
#define ONLY_4KB false //line 137

- `ENC_PROTECT` indicates whether the integrity tree operations are enabled; set it to `false` when you want to obtain a checkpoint.
- `TURNON_PRE` indicates whether `PreSIT-BASIC` mentioned in the paper is enabled, which means we predict MAC + 1 OTP.
- `TURNON_PRE_AES` indicates whether `OTPO` mentioned in the paper is enabled, which means we predict MAC + 2 OTPs.
- `INPROG` indicates whether the program we want to evaluate has loaded its checkpoint. Remember to set this value to `true` when you restore the corresponding benchmark's checkpoint. Otherwise, set it to `false`.
- `PDA` indicates whether the `Prefetch Deciding Algorithm` mentioned in the paper is enabled, which decides each time whether to predict by using the recorded history GAP (LFIFO).
- `APA` indicates whether the `Address Predicting Algorithm` mentioned in the paper is enabled, i.e. whether we use the address prediction algorithm or predict randomly.
- `COREMEM` indicates the configuration of the core number and memory size. `161` means 16GB + 1core.
- `_MEM` indicates the memory size used to calculate the address mask. `16` means 16GB.
- `ATT1` indicates whether we are simulating the rowhammer attack.
- `ATT2` indicates whether we are simulating the replay attack.
- `ONLY_4KB` indicates whether we only check the beginning address of the 4KB region.
And please further manually modify the memSize in line 79 of gem5/configs/example/gem5_library/riscv-fs.py.
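Because `COREMEM`, `_MEM`, and `memSize` must agree, a small compile-time check can help catch a mismatch before a long simulation starts. The following is a minimal sketch, not code from this repository: `decodeCoreMem` and `configIsConsistent` are hypothetical helpers, and the encoding `COREMEM = memoryGB * 10 + coreCount` is an assumption inferred from the example "`161` means 16GB + 1core".

```cpp
// Hypothetical helpers, not part of the repository.
// ASSUMPTION: COREMEM encodes memoryGB * 10 + coreCount,
// e.g. 161 -> 16GB + 1 core, 162 -> 16GB + 2 cores.
struct CoreMemConfig {
    unsigned memGB;
    unsigned cores;
};

// Splits a COREMEM value into its memory size and core count.
constexpr CoreMemConfig decodeCoreMem(unsigned coreMem) {
    return CoreMemConfig{coreMem / 10, coreMem % 10};
}

// True when COREMEM and _MEM describe the same memory size.
constexpr bool configIsConsistent(unsigned coreMem, unsigned memGB) {
    return decodeCoreMem(coreMem).memGB == memGB;
}

// The values shown in the listing above (COREMEM 162, _MEM 16) pass the check.
static_assert(configIsConsistent(162, 16), "COREMEM and _MEM disagree");
static_assert(decodeCoreMem(162).cores == 2, "expected a 2-core configuration");
```

If the assumed encoding holds, a `static_assert` of this form next to the macro definitions would turn a forgotten `_MEM` update into a build error. It cannot check `memSize` in `riscv-fs.py`, which still has to be changed by hand.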
As mentioned in the paper, we first jump to the beginning segment of each benchmark, so we need to execute the program and take a checkpoint for each case. After that, we can enter the real simulation for performance measurement or other tests. Set `ENC_PROTECT` to `false`, set `COREMEM` and `_MEM` in the C code to the right values, and then build gem5. Remember to modify `memSize` in `riscv-fs.py`. Then enter `programs` and run:
# for single core configuration
./saveCprPerbench.sh
# for 2 cores
./saveCprPerbench4Core.sh 2
# for 4 cores
./saveCprPerbench4Core.sh 4

Then, after getting the checkpoints of all the benchmarks, manually set `COREMEM` and `_MEM` to the right values for your current configuration. Also remember to set `INPROG` to `true`, which indicates that the program is already being executed. You can then enter `programs` and run:
# get 1-core results
./runAllSpec.sh [configuration] [maxInstruction]
# get 2-core results
./runAll2core.sh [configuration] [maxInstruction]
# get 4-core results
./runAll4core.sh [configuration] [maxInstruction]

The scripts listed above automatically change `TURNON_PRE` and `TURNON_PRE_AES` and build gem5.
The basic flow of running an attack is simple. The first step is to compile the attack programs:
cd programs/Attack1-gradient
make
cd ../Attack2-matrix
make

Then copy the executables to the disk file, in the same way as for the benchmarks in Section 4.
Next, set `ATT1` or `ATT2` to `true`, `_MEM` to `16`, and `COREMEM` to `161` in `mem_ctrl.hh`.
cd gem5
./build_gem5.sh
./gem5-riscv/gem5/build/RISCV/gem5.opt -d ./gem5-riscv/gem5/m5out/checkpoint2 ./gem5-riscv/gem5/configs/example/gem5_library/riscv-fs.py --cptPath xx -timing

You will then see the Linux shell window. Enter the directory of ATT1/2 and run the executable. You will get the results of the rowhammer/replay attack simulation.
NOTE that because we only evaluate the performance of SIT and the performance improvement of our proposed PreSIT, we focus only on the latency of AES and HMAC2 (both 40ns) when evaluating SIT performance.
The main code of the SIT parallel operation flow and PreSIT is located in `gem5/src/mem/mem_ctrl.cc`, and some critical data structures and functions are defined in `gem5/src/mem/mem_ctrl.hh`. In this section, we give a rough introduction to the idea and operations of the implementation. Following it should make the code much easier to understand, but the details still require careful reading. The introduction follows the flow of a memory write or read. All line numbers mentioned in this appendix refer to `mem_ctrl.cc`.
//memory controller receives the memory request from LLC
MemCtrl::recvTimingReq(PacketPtr pkt) //line 2760
{
//...
//clear the data structure for this round's processing
splitOriPkt_cnt = 0; //line 3058
//...
//use the EncEvent to handle SIT operations
schedule(EncEvent, curTick()); //line 3084
//...
//use the
if(... hasResponseRead && pkt->isRead()) //line 3088
{
//...
}
}

When the LLC needs to send a request, the request is captured by `recvTimingReq` in line 2760 of `mem_ctrl.cc`. The user can read this code to learn more. The data structures for SIT operations and PreSIT are then cleared from line 3058. In line 3085, a critical event, `EncEvent`, is scheduled to handle the SIT operations (gem5 is event-driven). Then, from line 3088, we handle a special situation: the read request hits the write buffer. As the user can see, this function directly executes `return true;`, which means the SIT operation runs partly in parallel with the LLC read/write request.
After `EncEvent` is scheduled, the function `void MemCtrl::EncStateMachine()` is called back to handle the event. Put simply, scheduling `EncEvent` means calling `EncStateMachine` at a certain time. The function is designed as a state machine that changes state by rescheduling itself (`schedule(EncEvent, [time])`).
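The self-rescheduling pattern described above can be sketched in isolation. The code below is a minimal illustration that does not use the gem5 API: `TinyEventQueue`, `DemoStateMachine`, the three demo states, and the latencies of 10 and 40 ticks are all hypothetical stand-ins chosen for the example.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Minimal stand-in for an event-driven simulator (NOT the gem5 API):
// callbacks are ordered by the tick at which they should fire.
class TinyEventQueue {
  public:
    using Tick = std::uint64_t;

    void schedule(std::function<void()> callback, Tick when) {
        events_.push(Event{when, nextId_++, std::move(callback)});
    }

    // Runs events in tick order until none are left; returns the final tick.
    Tick run() {
        while (!events_.empty()) {
            Event ev = events_.top();
            events_.pop();
            now_ = ev.when;
            ev.callback();
        }
        return now_;
    }

    Tick now() const { return now_; }

  private:
    struct Event {
        Tick when;
        std::uint64_t id;  // keeps insertion order for equal ticks
        std::function<void()> callback;
        bool operator>(const Event& other) const {
            return when != other.when ? when > other.when : id > other.id;
        }
    };
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events_;
    Tick now_ = 0;
    std::uint64_t nextId_ = 0;
};

// Hypothetical states; the real ones are the StatesEnc values listed below.
enum class DemoState { ReadCounter, UpdateCounter, Done };

// Each call handles one state, picks the next one, and schedules itself
// again after that state's (made-up) latency.
struct DemoStateMachine {
    explicit DemoStateMachine(TinyEventQueue& q) : queue(q) {}

    void step() {
        visited.push_back(state);
        if (state == DemoState::ReadCounter) {
            state = DemoState::UpdateCounter;
            queue.schedule([this] { step(); }, queue.now() + 10);
        } else if (state == DemoState::UpdateCounter) {
            state = DemoState::Done;
            queue.schedule([this] { step(); }, queue.now() + 40);
        }
        // DemoState::Done schedules nothing, so the queue drains.
    }

    TinyEventQueue& queue;
    DemoState state = DemoState::ReadCounter;
    std::vector<DemoState> visited;
};
```

Seeding the queue with one call to `step()` at tick 0 and then calling `run()` visits the three states in order and finishes at tick 50 (10 + 40). This mirrors how one scheduled `EncEvent` can drive a whole chain of states.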
MemCtrl::EncStateMachine() //line 620
{
//...
if (stateEnc == StatesEnc::SPLITAG) //line 632
//...
if (stateEnc == StatesEnc::RDCNT) //line 658
{
+ vault_engine.getCounterTrace(splitOriPkt[splitOriPkt_cnt] & ENCALVE_MASK); //line 725
+ bool evict = encCache.EncRead(counterAddr,holdCnt); //line 766
+ EncGenPacket(pkt_encBuild,req0,counterAddr,64,encCache.RdMiss->blk,0); //line 931
}
//...
if (stateEnc == StatesEnc::CNTOP) //line 1056
{
+ vault_engine.counterPropragate(writeHashNum); //line 1098
}
//...
if (stateEnc == StatesEnc::CNTSANITY) //line 1176 Deprecated
//...
if (stateEnc == StatesEnc::WBLIMB) //line 1222
{
+ updateTreeMAC(); //line 1233
+ bool ifwb = encCache.EncWrite() ... //line 1243
+ hasReset |= vault_engine.resetRecord[i]; //line 1271
}
//...
if (stateEnc == StatesEnc::DORESETTREE) //line 1335
{
... // RstSate; from line 1335 to 1818
}
//...
if (stateEnc == StatesEnc::WUPMAC) //line 1821
{
+ EncGenPacket(pkt_encBuild,req0,macCache.Evicted_ptr.addr,64,...); //line 1867
}
//...
if (stateEnc == StatesEnc::RRDMAC) //line 1883
{
+ bool evict = macCache.EncRead(MACbeginAddr,chunk); //line 1901
}
//...
if (stateEnc == StatesEnc::RRDMAC_WB) //line 1974
//...
if (stateEnc == StatesEnc::RRDMAC_REFILL) //line 1961
//...
}

Details of the operations in the above states are given below; RW means that both memory reads and writes need the operation:
- `SPLITAG` handles the situation where a large packet needs to be split into several packets. This can be ignored under RISC-V.
- `RDCNT` (RW both) reads the SIT counters from `encCache` (which is dedicated to caching counters; its definition is in `mem_ctrl.hh`) or from memory, depending on whether the counter hits. (1) This state first gets the address of the counters in line 725 through the data structure `vault_engine` (details are shown in `mem_ctrl.hh`), which is designed to handle the general operations needed by SIT, such as getting the addresses of counters and updating the counter values. (2) Then, based on the counter address, `encCache` is accessed in line 766. (3) If the counter misses, the state machine generates a new memory read request to fetch the counter in line 931.
- `CNTOP` (W only) handles the SIT counter increment for a memory write. The counter is increased based on the write address by `vault_engine.counterPropragate()` in line 1098.
- `WBLIMB` (W only) first updates the calculated Tree MAC (please distinguish this from the Data MAC by reading `Fig 1` of the paper) and hash via `updateTreeMAC()`, then writes them back in line 1243. At the same time, `vault_engine` also records whether there will be a reset in line 1271.
- `DORESETTREE` (W only) handles counter reset, which includes many operations managed by another sub-state machine named `RstSate`. If the user is interested, more details are shown from line 1335 to 1818.
- `WUPMAC` (W only) writes the fresh/calculated Data MAC to memory. Note that the operations in `P.2`, including writing back the ciphertext, counter, and MAC, proceed in parallel because of the write buffer of the memory controller.
- `RRDMAC` (R only) reads the Data MAC for verification when a memory read happens. Note that the Data MAC is also cached by `macCache`, which is defined in `mem_ctrl.hh`.
- `RRDMAC_WB` and `RRDMAC_REFILL` (R only) handle the situation where the MAC misses or a replacement happens in `macCache`.
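The three steps of `RDCNT` (locate the counter, look it up in the cache, fetch it from memory on a miss) can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: `CounterCacheSketch` and `RdCntSketch` are hypothetical stand-ins for `encCache` and the state logic, and the data-to-counter address mapping (one 64-byte counter block per 4KB of data) is purely illustrative rather than what `vault_engine.getCounterTrace` computes.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical stand-in for encCache: counter address -> cached counter value.
struct CounterCacheSketch {
    std::unordered_map<Addr, std::uint64_t> lines;

    // Returns true on a hit and copies the cached counter into `value`.
    bool read(Addr counterAddr, std::uint64_t& value) const {
        auto it = lines.find(counterAddr);
        if (it == lines.end()) return false;
        value = it->second;
        return true;
    }
};

struct RdCntSketch {
    CounterCacheSketch cache;
    std::vector<Addr> pendingMemoryReads;  // fetches issued on a miss

    // ILLUSTRATIVE mapping only: one 64-byte counter block per 4KB of data,
    // laid out contiguously from `counterBase`.
    static Addr counterAddrFor(Addr dataAddr, Addr counterBase) {
        return counterBase + (dataAddr >> 12) * 64;
    }

    // (1) locate the counter, (2) look it up in the cache,
    // (3) on a miss, queue a memory read for the counter block.
    bool readCounter(Addr dataAddr, Addr counterBase, std::uint64_t& value) {
        const Addr counterAddr = counterAddrFor(dataAddr, counterBase);
        if (cache.read(counterAddr, value)) return true;
        pendingMemoryReads.push_back(counterAddr);
        return false;
    }
};
```

In the real controller the miss path builds a packet with `EncGenPacket` and the state machine waits for the response; the sketch only records the address that would be fetched.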
Besides, another function is also modified to process the next request (function processNextReqEvent from line 3885 to 4408) because we need to schedule or reschedule during SIT or prediction operations in EncEvent.
//line 3885 to 4408
void
MemCtrl::processNextReqEvent(MemInterface* mem_intr,
MemPacketQueue& resp_queue,
EventFunctionWrapper& resp_event,
EventFunctionWrapper& next_req_event,
bool& retry_wr_req) {
...
}

MemCtrl::EncStateMachine() //line 620
{
//....
if (stateEnc == StatesEnc::RDONE) //line 2006
{
if (hasCorrectPrediction) //line 2044
{
stateEnc = StatesEnc::DATATRAN; //line 2054
}
else if(hasToWaitCal) //line 2056
{
stateEnc = StatesEnc::DATATRAN; //line 2080
}
else
{
if (overlapHashRDMAC < gapPre.getAvgPrefetch()) //line 2097
{
if (isChecking) //line 2102
{
stateEnc = StatesEnc::DATATRAN;
}
else
{
if ((overlapHashRDMAC + gapPre.getAvgGap()) < gapPre.getAvgPrefetch()) //line 2112
{
stateEnc = StatesEnc::DATATRAN;
}
else
{
stateEnc = StatesEnc::PREDICTION_PREFETCH_DECIDE;
}
}
}
else
{
stateEnc = StatesEnc::PREDICTION_PREFETCH_DECIDE;
}
}
}
}

Next we introduce the code for prediction. The first step for PreSIT is to decide, in the state `RDONE`, whether to enable prediction this time. The rough code for this deciding algorithm is shown above, and there are several situations:
- line 2044: If `hasCorrectPrediction` is true, a correct prediction has already been obtained, so we do not predict this time; we jump to `DATATRAN` and do nothing for prediction.
- line 2056: If `hasToWaitCal` is true, the right MAC is being calculated. As in line 2044, we do nothing for prediction and wait for the calculation to end.
- line 2097: The remaining time for prediction after overlapping the MAC read and the MAC verification is `overlapHashRDMAC`. If this value is at least the time for prefetching (line 2097), we directly enable the prediction. Otherwise, the following conditions are analyzed.
- line 2102: If another memory read/write request is coming into the memory controller, indicated by `isChecking` in line 2102, we cancel the prediction because dealing with the new request is more urgent (for better performance).
- line 2112: This condition further checks whether the controller has enough time to predict, as introduced in `Fig 8` of the paper.
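The situations listed above can be condensed into one pure function, which makes the branch order easy to test in isolation. This is a sketch that mirrors the rough `RDONE` code shown earlier; the `DecideInputs` struct, the `decideAfterRdone` name, and the use of `std::int64_t` for the timing values are assumptions made for illustration (in the repository these values are members of `MemCtrl` and results of `gapPre` calls).

```cpp
#include <cstdint>

enum class NextState { DataTran, PredictionPrefetchDecide };

// Hypothetical bundle of the values the RDONE state inspects.
struct DecideInputs {
    bool hasCorrectPrediction;       // a correct prediction already exists
    bool hasToWaitCal;               // the right MAC is still being calculated
    bool isChecking;                 // another request is entering the controller
    std::int64_t overlapHashRDMAC;   // time left after overlapping MAC read/verify
    std::int64_t avgPrefetch;        // stands in for gapPre.getAvgPrefetch()
    std::int64_t avgGap;             // stands in for gapPre.getAvgGap()
};

// Same branch order as the RDONE listing: skip prediction when it is not
// needed, predict directly when there is enough time, otherwise check for a
// pending request and for the extra slack given by the average gap.
NextState decideAfterRdone(const DecideInputs& in) {
    if (in.hasCorrectPrediction) return NextState::DataTran;
    if (in.hasToWaitCal) return NextState::DataTran;
    if (in.overlapHashRDMAC >= in.avgPrefetch) {
        return NextState::PredictionPrefetchDecide;
    }
    if (in.isChecking) return NextState::DataTran;
    if (in.overlapHashRDMAC + in.avgGap < in.avgPrefetch) {
        return NextState::DataTran;
    }
    return NextState::PredictionPrefetchDecide;
}
```

For example, with `overlapHashRDMAC = 5` and `avgPrefetch = 10`, an average gap of 2 is not enough slack and the function returns `DataTran`, while an average gap of 5 returns `PredictionPrefetchDecide`.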
//...
if (stateEnc==StatesEnc::PREDICTION_PREFETCH_DECIDE) //line
{
+ bool hasFindGroup = teePrediction.findGroupRange(preDataAddr,groupPreTInx,true); //line 2258
}
//...
if (stateEnc==StatesEnc::PREDICTION_PREFETCH_DO) //line 2409
//...
if (stateEnc == StatesEnc::PREDICTION_PREFETCH_WAIT) //line 2507
//...
if (stateEnc == StatesEnc::PREDICTION_CAL) //line 2512
{
+ schedule(EncEvent, curTick() + (overlapHashRDMAC-PrefetchCycle)*CYCLETICK ); //line 2563
}

If the controller decides to predict, there are mainly four steps.
- `PREDICTION_PREFETCH_DECIDE` decides the address to prefetch and predict. At the same time, it accesses the prediction table (PT in the paper), which is named `teePrediction` in the code. The group and entry are also allocated in this state.
- `PREDICTION_PREFETCH_DO` and `PREDICTION_PREFETCH_WAIT` prefetch the data and wait for the response from the memory side.
- `PREDICTION_CAL` precomputes the MAC by scheduling the event to simulate the time consumed (line 2563).
As we can see, SIT operations or prediction operations need to access the memory. Therefore, we also need to handle the memory response after sending the memory access request. The function to deal with the response is processRespondEvent. This part is relatively simple, so the user can directly read the code.
// line 3222 to 3547
void
MemCtrl::processRespondEvent(MemInterface* mem_intr,
MemPacketQueue& queue,
EventFunctionWrapper& resp_event,
bool& retry_rd_req)