x86 virtual machine with unlimited registers
Requirements:
- CMake (2.8.12.2+)
- LLVM (3.8, local build)
- Google Test (libgtest) (1.7.0+)
Build:
mkdir build
cd build
cmake .. -G Ninja   # drop "-G Ninja" to generate Makefiles instead
ninja nolimix86     # or "make nolimix86" with Makefiles
The nolimix86 binary is built in build/bin.
Usage: nolimix86 [options] <input file>
- nolimix86 --help lists all the available options.
- nolimix86 -A dumps the parsed program on stdout.
- nolimix86 -Y dumps the CPU's state at the end of the program, in YAML format.
The goal of nolimix86 is to simulate a virtual machine with unlimited
registers, called temporaries or temps (virtual registers in LLVM terms).
The input is therefore a pseudo x86 assembly in which %t0, %t1, etc. are
valid registers.
The virtual machine runs the code directly from the AST. It cannot encode
the code and map it into memory, since there is no valid machine encoding
for %t# registers.
The syntax is based on AT&T x86 assembly and is parsed using a modified version of LLVM's X86AsmParser.
The following modifications were applied:
- X86AsmParser.cpp: handle %t# registers in the lexer.
- X86Operand.h: handle %t# registers as general purpose registers, so that general purpose registers and temporaries can be used interchangeably.
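The effect of these two patches can be pictured with a small C++ sketch: a single register operand type that holds either an x86 general purpose register or a temporary, and a name parser that accepts both %eax-style and %t#-style names. The names gpr, temp, reg and parse_register are hypothetical and illustrative only; this is not the code of the modified LLVM parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical operand model: x86 general purpose registers and
// temporaries (%t#) are two alternatives of one register operand type.
enum class gpr { eax, ebx, ecx, edx, esi, edi, ebp, esp };
struct temp {
  unsigned index;  // the N in %tN
};
using reg = std::variant<gpr, temp>;

// Parse "%eax" or "%t42"; returns std::nullopt for anything else.
std::optional<reg> parse_register(const std::string& text) {
  if (text.size() < 2 || text[0] != '%') return std::nullopt;
  const std::string name = text.substr(1);

  // Temporaries: 't' followed by at least one digit (overflow not handled).
  if (name[0] == 't' && name.size() > 1) {
    unsigned index = 0;
    for (std::size_t i = 1; i < name.size(); ++i) {
      if (!std::isdigit(static_cast<unsigned char>(name[i]))) return std::nullopt;
      index = index * 10 + static_cast<unsigned>(name[i] - '0');
    }
    return temp{index};
  }

  // General purpose registers.
  static const std::pair<const char*, gpr> gprs[] = {
      {"eax", gpr::eax}, {"ebx", gpr::ebx}, {"ecx", gpr::ecx},
      {"edx", gpr::edx}, {"esi", gpr::esi}, {"edi", gpr::edi},
      {"ebp", gpr::ebp}, {"esp", gpr::esp}};
  for (const auto& [spelling, r] : gprs)
    if (name == spelling) return r;
  return std::nullopt;
}
```

Because both kinds share one type, any code that accepts a register operand accepts a temporary too, which is the compatibility the X86Operand.h change aims for.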
Here is a simple example of valid pseudo x86 assembly code:
.text
l0:
movl $1, %t1
movl %t1, %t2
addl $2, %t2
push $42
pop %eax
push $101
ret
Here is the output of nolimix86 -Y, dumping the state of the CPU at the end
of the program:
---
cpu: x86
registers:
  - name: t2
    value: 3
  - name: t1
    value: 1
  - name: ebp
    value: 0
  - name: esi
    value: 0
  - name: esp
    value: 4
  - name: eax
    value: 42
  - name: ebx
    value: 0
  - name: ecx
    value: 0
  - name: edi
    value: 0
flags:
  - name: pf
    value: 0
  - name: af
    value: 0
  - name: cf
    value: 0
  - name: of
    value: 0
  - name: sf
    value: 0
  - name: zf
    value: 0
...
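To make the register results of the example concrete, here is a toy C++ interpreter. It assumes a register file keyed by name, so any %t# register is created on its first write. It only models movl, addl, push and pop (not ret), uses a std::vector in place of the VM's stack memory, and every name in it (toy_vm, insn, run_example) is hypothetical rather than part of nolimix86.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical toy model of the example program, not nolimix86's VM.
using word_t = std::int32_t;

struct imm { word_t value; };          // $42
struct regname { std::string name; };  // %t1, %eax (without the '%')
using operand = std::variant<imm, regname>;

struct insn {
  std::string op;             // "movl", "addl", "push" or "pop"
  std::vector<operand> args;  // AT&T order: source first, destination last
};

struct toy_vm {
  // Registers keyed by name, so any number of temporaries can exist.
  // Assumption: a register that was never written reads as 0.
  std::unordered_map<std::string, word_t> regs;
  std::vector<word_t> stack;  // stands in for the stack in VM memory

  word_t read(const operand& o) {
    if (const imm* i = std::get_if<imm>(&o)) return i->value;
    return regs[std::get<regname>(o).name];
  }
  word_t& reg(const operand& o) { return regs[std::get<regname>(o).name]; }

  void run(const std::vector<insn>& program) {
    for (const insn& i : program) {
      if (i.op == "movl") {
        word_t v = read(i.args[0]);
        reg(i.args[1]) = v;
      } else if (i.op == "addl") {
        word_t v = read(i.args[0]);
        reg(i.args[1]) += v;
      } else if (i.op == "push") {
        stack.push_back(read(i.args[0]));
      } else if (i.op == "pop") {
        word_t v = stack.back();
        stack.pop_back();
        reg(i.args[0]) = v;
      }
    }
  }
};

// The first five instructions of the example (push $101 and ret are not modeled).
toy_vm run_example() {
  toy_vm vm;
  vm.run({
      {"movl", {imm{1}, regname{"t1"}}},
      {"movl", {regname{"t1"}, regname{"t2"}}},
      {"addl", {imm{2}, regname{"t2"}}},
      {"push", {imm{42}}},
      {"pop", {regname{"eax"}}},
  });
  return vm;
}
```

After run_example(), t1, t2 and eax hold 1, 3 and 42, matching the dump. With a map keyed by name, supporting unlimited temporaries costs nothing extra: %t1000 is just another key.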
Supported instructions:
- Implemented: add, call, cmp, imul, ja, jae, jb, jbe, je, jg, jge, jl, jle, jmp, jne, js, leave, mov, pop, push, ret, sub
- Not implemented yet in the VM: idiv, lea, neg, sal, sar, sete, test
- Not implemented yet in the AST or the VM: inc, dec, or, and, xor
Supported registers: eax, ebp, ebx, ecx, edi, edx, esi, esp, and the temporaries t[0-9]+ (%t0, %t1, ...).
To build with a sanitizer enabled, pass the corresponding flag to CMake; each one produces its own binary:
- -DNOLIMIX86_ASAN for ASAN -> ./bin/nolimix86-asan
- -DNOLIMIX86_MSAN for MSAN -> ./bin/nolimix86-msan
- -DNOLIMIX86_UBSAN for UBSAN -> ./bin/nolimix86-ubsan
- -DNOLIMIX86_LSAN for LSAN -> ./bin/nolimix86-lsan
Parsing:
- Parsing is done using llvm::X86AsmParser.
- A subclass of llvm::MCELFStreamer creates the AST, using various hacks to detect the correct x86 instruction to create. Most of the instruction opcodes are declared in src/x86/instructions.hh, which allows a range of MC opcodes to bind to a specific instruction regardless of the operand types.
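A minimal sketch of the opcode-range idea, assuming the MC opcodes that share one instruction (for example the register, immediate and memory forms of a 32-bit add, such as LLVM's ADD32rr, ADD32ri and ADD32rm) can be grouped into contiguous numeric ranges. The opcode numbers and the names insn_kind, opcode_range and classify are made up for illustration and do not mirror src/x86/instructions.hh.

```cpp
#include <cassert>
#include <optional>

// Hypothetical instruction kinds, one per AST instruction class.
enum class insn_kind { add, mov, push };

// An inclusive range of MC opcodes that all map to the same instruction,
// whatever their operand types (register, immediate, memory...).
struct opcode_range {
  unsigned first;
  unsigned last;
  insn_kind kind;
};

// Made-up opcode numbers, for illustration only.
constexpr opcode_range opcode_table[] = {
    {100, 109, insn_kind::add},
    {200, 215, insn_kind::mov},
    {300, 303, insn_kind::push},
};

// Returns the instruction kind bound to an MC opcode, if any.
std::optional<insn_kind> classify(unsigned opcode) {
  for (const opcode_range& r : opcode_table)
    if (opcode >= r.first && opcode <= r.last) return r.kind;
  return std::nullopt;
}
```

The AST builder then only needs one case per instruction kind instead of one per operand-type variant.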
AST:
- The AST is a basic class hierarchy that is visitable through the accept member function.
- All AST walks are done with visitors that use default_visitor as a base class.
- The apply visitor works around the fact that virtual methods cannot be templates. It takes a class defining a visit function and calls it with every node, which lets that visitor be generic over all the instructions, depending on their operand count.
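Here is a hypothetical C++ sketch of that visitor setup: nodes dispatch through accept, default_visitor supplies no-op visits, and apply<F> overrides every virtual visit and forwards to F's templated visit, so one generic function can handle every instruction through its operand count. The node types and count_operands are illustrative, not nolimix86's classes; in particular, the real default_visitor may walk the tree rather than do nothing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct push_insn;
struct add_insn;

// Abstract visitor: one virtual visit per concrete node type.
struct visitor {
  virtual ~visitor() = default;
  virtual void visit(const push_insn&) = 0;
  virtual void visit(const add_insn&) = 0;
};

// Base class for walks: empty visits, so walks override only what they need.
struct default_visitor : visitor {
  void visit(const push_insn&) override {}
  void visit(const add_insn&) override {}
};

struct node {
  virtual ~node() = default;
  virtual void accept(visitor& v) const = 0;
};

// Instructions parameterized by operand count (hypothetical layout).
template <unsigned N>
struct insn : node {
  static constexpr unsigned operand_count = N;
  std::string operands[N];
};

struct push_insn final : insn<1> {
  void accept(visitor& v) const override { v.visit(*this); }
};
struct add_insn final : insn<2> {
  void accept(visitor& v) const override { v.visit(*this); }
};

// Virtual functions cannot be templates, so apply overrides each visit
// and forwards to F's (possibly templated) visit.
template <typename F>
struct apply final : default_visitor {
  F f;
  void visit(const push_insn& i) override { f.visit(i); }
  void visit(const add_insn& i) override { f.visit(i); }
};

// A generic visit written once for every instruction type.
struct count_operands {
  unsigned total = 0;
  template <typename Insn>
  void visit(const Insn&) { total += Insn::operand_count; }
};
```

Adding a node type means adding one override to visitor, default_visitor and apply, while generic walks such as count_operands need no change.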
VM:
- The VM is an AST visitor. Since the final AST is a list of instructions, the VM uses an iterator as the instruction pointer, called eip on x86.
- The VM is a template class containing a CPU, which handles all memory, register and flag accesses.
- All memory accesses go through the cpu::mmu template class, which translates between host and VM addresses, depending on the CPU.
- The stack and the heap are wrappers around one large allocated block of memory, accessed through the cpu::mmu.
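A stripped-down sketch of an iterator used as the instruction pointer, assuming a flattened std::list of instructions and a label table that maps names to iterators. It leaves out the visitor dispatch and uses toy opcodes (inc and jmp on a counter), so it only illustrates the eip idea, not the VM's actual interface; instruction, program_t and run are hypothetical names.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

struct instruction {
  std::string label;   // empty when the instruction has no label
  std::string op;      // toy opcodes: "inc" or "jmp"
  std::string target;  // jump target, used by "jmp"
};

using program_t = std::list<instruction>;

// Runs the program and returns how many "inc" were executed.
int run(const program_t& program) {
  // Resolve labels to iterators once, before execution.
  std::unordered_map<std::string, program_t::const_iterator> labels;
  for (auto it = program.cbegin(); it != program.cend(); ++it)
    if (!it->label.empty()) labels.emplace(it->label, it);

  int counter = 0;
  auto eip = program.cbegin();  // the instruction pointer
  while (eip != program.cend()) {
    const instruction& current = *eip;
    ++eip;  // point at the next instruction before executing this one
    if (current.op == "inc")
      ++counter;
    else if (current.op == "jmp")
      eip = labels.at(current.target);  // a jump just reassigns the iterator
  }
  return counter;
}
```

Advancing eip before executing the current instruction mirrors x86, where eip already refers to the next instruction while the current one runs; a call can then save that iterator as its return address.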
The CPU owns a stack and a heap, both pointing into a global memory:
- Registers -> std::unordered_map<reg_t, word_t>.
- Flags -> std::unordered_map<flag_t, word_t>.
- Globals -> std::unordered_map<std::string, word_t> (global symbols).
- Memory -> char[].
- MMU -> cpu::mmu.
- Stack -> cpu::stack, going through the MMU to access the Memory.
- Heap -> cpu::heap, going through the MMU to access the Memory.
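The relationship between the memory, the MMU and the stack can be sketched as follows, assuming a VM address is a fixed base plus an offset into one host buffer and a 32-bit stack that grows downward. The classes mmu and vm_stack and their methods are hypothetical stand-ins for cpu::mmu and cpu::stack, whose real interfaces may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>

// Hypothetical MMU: a VM address is vm_base plus an offset into one host buffer.
class mmu {
 public:
  mmu(std::size_t size, std::uint32_t vm_base)
      : memory_(new char[size]()), size_(size), vm_base_(vm_base) {}

  // VM -> host translation, rejecting accesses outside the buffer.
  char* to_host(std::uint32_t vm_addr, std::size_t len) const {
    if (vm_addr < vm_base_ || vm_addr - vm_base_ + len > size_)
      throw std::out_of_range("vm address out of range");
    return memory_.get() + (vm_addr - vm_base_);
  }

  // Host -> VM translation.
  std::uint32_t to_vm(const char* host) const {
    return vm_base_ + static_cast<std::uint32_t>(host - memory_.get());
  }

  // memcpy avoids alignment and strict-aliasing problems.
  std::uint32_t read32(std::uint32_t vm_addr) const {
    std::uint32_t word;
    std::memcpy(&word, to_host(vm_addr, sizeof word), sizeof word);
    return word;
  }
  void write32(std::uint32_t vm_addr, std::uint32_t word) {
    std::memcpy(to_host(vm_addr, sizeof word), &word, sizeof word);
  }

 private:
  std::unique_ptr<char[]> memory_;
  std::size_t size_;
  std::uint32_t vm_base_;
};

// Hypothetical x86-style stack: grows downward, every access goes through the MMU.
class vm_stack {
 public:
  vm_stack(mmu& memory, std::uint32_t top) : mmu_(memory), esp_(top) {}
  void push(std::uint32_t word) {
    esp_ -= 4;
    mmu_.write32(esp_, word);
  }
  std::uint32_t pop() {
    std::uint32_t word = mmu_.read32(esp_);
    esp_ += 4;
    return word;
  }
  std::uint32_t esp() const { return esp_; }

 private:
  mmu& mmu_;
  std::uint32_t esp_;
};
```

Because the stack only ever sees VM addresses, the program can observe esp values in the VM's own address space while the bytes live in the host buffer.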
Testing:
- Most of the testing goes through parsing -> dumping -> diff.
- Behaviour testing uses the -Y option, which dumps the CPU state at the end of the program. This allows checking the expected final state of a particular program.
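For behaviour tests, a checker only needs the name/value pairs from a -Y dump. The hypothetical parse_dump helper below handles only the flat layout of the example output shown earlier (it is not a general YAML parser), which is enough to assert, for instance, that eax ends at 42. How nolimix86's own test suite compares dumps may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: collect "name"/"value" pairs (registers and flags)
// from a -Y style dump. Only handles the simple layout of the example output.
std::unordered_map<std::string, long> parse_dump(const std::string& yaml) {
  std::unordered_map<std::string, long> values;
  std::istringstream in(yaml);
  std::string line, name;
  while (std::getline(in, line)) {
    // Skip indentation and the "- " sequence marker.
    const auto first = line.find_first_not_of(" -");
    if (first == std::string::npos) continue;
    const std::string field = line.substr(first);
    if (field.rfind("name: ", 0) == 0)
      name = field.substr(6);
    else if (field.rfind("value: ", 0) == 0 && !name.empty())
      values[name] = std::stol(field.substr(7));
  }
  return values;
}
```

A test can then run the program with -Y, feed the output to parse_dump, and assert only the values it cares about, which keeps tests independent of the order in which registers are dumped.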