Open Source Ghidra
The First Few Months
emteere
ghidracadabra
Recon MTL 2019
Outline
Ghidra Overview
New for 9.1: System Call Decompilation
New for 9.1: Sleigh Development Tools
Community Interaction
Ghidra Overview
• Full-featured SRE framework created by NSA Research.
• In development for ∼20 years.
• Primarily written in Java.
  ◦ Some C/C++.
  ◦ Can write scripts in Python.
• Designed for customizability and extensibility.
• Ghidra 9.0 publicly released March 2019.
• Source code released on GitHub April 2019.
• www.ghidra-sre.org
• https://github.com/NationalSecurityAgency/ghidra
Disassembler
Function Graph
Decompiler
P-code: Ghidra’s IR
Specified Using SLEIGH Language
Connected Tools
Scripting in Java and Python
Eclipse Integration
Multi-user Server with Version Control
Batch Processing with the Headless Analyzer
support]$ ./analyzeHeadless ghidra://localhost/repo \
    -import /usr/bin/* -recursive \
    -postScript MyScript.py
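For driving batch runs from another program, the invocation above can be assembled as an argument list. A minimal sketch: it only builds the command and runs nothing. The helper name and the "support" directory path are hypothetical; the flags are the ones shown on the slide, with the directory /usr/bin passed in place of the shell glob.

```python
import shlex
from pathlib import Path


def build_headless_command(support_dir, repo_url, import_paths, post_script,
                           recursive=True):
    """Assemble the argv list for one analyzeHeadless run (nothing is executed)."""
    argv = [str(Path(support_dir) / "analyzeHeadless"), repo_url, "-import"]
    argv.extend(import_paths)            # one or more files or directories
    if recursive:
        argv.append("-recursive")        # descend into imported directories
    argv.extend(["-postScript", post_script])
    return argv


command = build_headless_command(
    support_dir="support",               # hypothetical path to Ghidra's support directory
    repo_url="ghidra://localhost/repo",
    import_paths=["/usr/bin"],
    post_script="MyScript.py",
)
print(shlex.join(command))               # shell-quoted form, for logging
```

The list can be handed to subprocess.run(command, check=True) once the paths point at a real Ghidra install.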
Version Tracking Tool
New Features for 9.1
Decompiling System Calls (syscalls)
• System calls are a way for a program to request a service
from the operating system.
• Services include process control, file management, device
management,...
• Typical implementation includes a native instruction and a
register, which we’ll call the system call register.
• When the instruction is executed, the value in the system call
register determines which function is called.
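The dispatch described above can be modeled as a table lookup keyed on the system call register. A minimal sketch for x86-64 Linux, where that register is RAX; the table holds only a handful of entries, and the function name and the fallback naming scheme are illustrative assumptions.

```python
# A few x86-64 Linux system call numbers; the real table is much longer.
X64_LINUX_SYSCALLS = {
    0: "read",
    1: "write",
    2: "open",
    3: "close",
    59: "execve",
    60: "exit",
}


def resolve_syscall(registers):
    """Return the name of the system call selected by the value in RAX."""
    number = registers["RAX"]
    # Unknown numbers get a placeholder name instead of raising.
    return X64_LINUX_SYSCALLS.get(number, "syscall_%#x" % number)


print(resolve_syscall({"RAX": 60, "RDI": 0}))
```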
x64 Linux syscall
System Calls as User-defined Operations
• In this example, the syscall instruction is implemented with a
pcodeop/CALLOTHER.
• Such operators certainly have their uses, but they are not very
satisfying in this case.
Desired Behavior
• We’d like to see the correct function call in the decompiler:
  ◦ Correct name.
  ◦ Correct signature.
  ◦ Correct calling convention.
• We’d also like to get cross-references.
• Dataflow analysis is needed to determine the value in the syscall register.
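A toy version of that dataflow step, assuming straight-line code and a made-up (mnemonic, destination, source) instruction tuple. It is not how Ghidra does it: it walks backwards from the syscall, follows register-to-register copies, and gives up on anything it does not understand. Register aliasing (EAX vs. RAX) and control flow are ignored.

```python
def syscall_register_value(instructions, syscall_index, register="RAX"):
    """Walk backwards from the syscall to find the constant held in `register`.

    Returns the constant, or None if it cannot be determined.
    """
    tracked = register
    for mnemonic, dest, src in reversed(instructions[:syscall_index]):
        if dest != tracked:
            continue                      # instruction does not write the tracked register
        if mnemonic == "MOV" and isinstance(src, int):
            return src                    # constant load: done
        if mnemonic == "MOV" and isinstance(src, str):
            tracked = src                 # copied from another register: follow it
            continue
        if mnemonic == "XOR" and src == dest:
            return 0                      # xor reg, reg zeroes the register
        return None                       # any other write: give up
    return None


block = [
    ("MOV", "RDI", 0),
    ("MOV", "RCX", 60),
    ("MOV", "RAX", "RCX"),
    ("SYSCALL", None, None),
]
print(syscall_register_value(block, syscall_index=3))
```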
• The value in the syscall register is not necessarily the syscall number
defined in the system header file.
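One case where the two differ, offered only as an illustration since the slide does not name a platform: on x86-64 macOS the register value carries a class field in its upper bits, so the header's number for write (4) appears in the register as 0x2000004. The helper name below is hypothetical.

```python
SYSCALL_CLASS_SHIFT = 24                              # class lives above bit 24
SYSCALL_NUMBER_MASK = (1 << SYSCALL_CLASS_SHIFT) - 1  # low 24 bits: header number
SYSCALL_CLASS_UNIX = 2                                # BSD/Unix system call class


def split_register_value(value):
    """Split a raw register value into (class, number-as-in-the-header)."""
    return value >> SYSCALL_CLASS_SHIFT, value & SYSCALL_NUMBER_MASK


syscall_class, number = split_register_value(0x2000004)
print(syscall_class == SYSCALL_CLASS_UNIX, number)
```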
Additional Issues
• The choice of system call register can be an OS decision; it is not
necessarily specified by the ISA.
• System call numbers can change based on the OS
version/service pack.
• System calls might have their own calling convention.
• There can be more than one native instruction used to make
system calls (e.g., syscall and int 2e).
• Might not use a dedicated native system call instruction, e.g.,
system calls via CALL GS:[0x10].
Where to Put Them?
• In general, the code for system call targets is not in the
program’s address space.
• Where to put them in Ghidra?
• The OTHER space is used to store data from a binary that is
not loaded into memory.
  ◦ E.g., the .comment section of an ELF file.
• In 9.1, we’ve made the decompiler aware of the OTHER space.
• Recommendation for system calls:
  ◦ System call targets should be in overlay(s) of the OTHER space.
  ◦ Use the system call number as the address in the overlay.
How to Get There?
• OK, great, we have a place for system call targets.
• How do you get there?
• New feature: Overriding References.
• Basically, this allows you to intercept certain Pcode ops on
their way to the decompiler and modify them.
  ◦ Change CALLOTHER ops to CALL ops and set destination.
  ◦ Change CALLIND ops to CALL ops and set destination.
  ◦ (plus a few others)
• See ResolveX86or64LinuxSystemCallsScript.java for an
example.
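The effect of an overriding reference can be pictured with a small standalone model. This is a sketch of the idea only, not the Ghidra API: PcodeOp here is a toy dataclass, and the overlay name "syscall" and the "space::offset" address format are assumptions made for the example. The override table maps a call-site address to a destination whose offset is the system call number.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PcodeOp:
    address: int                    # address of the native instruction
    opcode: str                     # e.g. "COPY", "CALLOTHER", "CALLIND", "CALL"
    target: Optional[str] = None    # call destination, once known


def overlay_address(syscall_number, overlay="syscall"):
    """Address of a syscall target: the syscall number as offset in the overlay."""
    return "%s::%08x" % (overlay, syscall_number)


def apply_overrides(ops, overrides):
    """Rewrite CALLOTHER/CALLIND ops that have an override into direct CALLs."""
    rewritten = []
    for op in ops:
        destination = overrides.get(op.address)
        if destination is not None and op.opcode in ("CALLOTHER", "CALLIND"):
            op = replace(op, opcode="CALL", target=destination)
        rewritten.append(op)
    return rewritten


ops = [PcodeOp(0x401000, "COPY"), PcodeOp(0x401005, "CALLOTHER")]
overrides = {0x401005: overlay_address(60)}   # exit on x86-64 Linux
for op in apply_overrides(ops, overrides):
    print(op)
```

Ops without an override pass through unchanged, so the rewrite stays local to the call sites that were resolved.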
x64 Linux syscall with Overriding Reference
Functions in an Overlay of the OTHER Space
x64 Linux syscall Decompilation
Ghidra 9.1 (after running script)
Future Work
• We’d like an analyzer to be able to do this (mostly)
automatically.
• Ghidra has a notion of per-processor configuration
(.pspec files) and per-compiler configuration (.cspec files).
• System call data doesn’t quite fit this model.
• Ideally all the system call related configuration would be in
one place.
• Working on a notion of an OS/environment configuration.
• This will have other applications in Ghidra as well.
Sleigh Development Tools
• Sleigh
• SleighEditor
• Sleigh P-Code Tests
• Additional Techniques
• General Sleigh Development
Sleigh Processor Models
• Memory model
• Registers
• Display (printpiece)
• Decode patterns
• Semantics (Pcode)
Build it and the tools just work
Disassembly, Assembler (patch), Decompiler, Analysis, ...
Sleigh Processors
• Currently Included - evolving list
X86 16/32/64, ARM/AARCH64, PowerPC 32/64/VLE, MIPS 16/32/64/micro
68xxx, Java / DEX bytecode, PA-RISC, PIC 12/16/17/18/24, Sparc 32/64
CR16C, Z80, 6502, 8051, MSP430, AVR8, AVR32, and variants.
• Full Processor Contributions
Tricore, MCS-48
• Extensions, Improvements, and Bugs
ARM, PPC, 68xxx, AVR, PIC-16F, 6502, golang
• Seen in Development
SH-2, WebAssembly, Hexagon, Toshiba MeP-c4, Pic16F153xx, Arm4t-gba,
NVIDIA Falcon, PowerPC 750CL/CXe, WDC-65816, RISC-V, TI TMS9900
Sleigh Files
• LDEF
• PSPEC
• CSPEC
• SLASPEC
• SLA
• Java Files
• Manual Index
• Pattern Files
• Emulator (new)
• Sleigh P-Code Tests (new)
Sleigh Editor
• Syntax Coloring
• Hover
• Navigation
• Code Formatting
• Validation
• Quick Fixes
• Renaming
• Find References
• Content Assist
• Sleigh Compiler Error Navigation
Xtext - DSL Framework for Eclipse
Eclipse IDE for Java and DSL Developers - 2019-03
Setting up Sleigh Editor - Xtext project
• Eclipse: Help > Install New Software
  ◦ Add Archive: GhidraSleighEditor.zip
• Convert GhidraScript to Xtext project
  ◦ Allows for multi-file navigation
  ◦ Good for casual browsing
  ◦ Problem: all variables will be available (6502, PPC);
    quick-fixes will be slower
• Best: Import as new Java Project - Ghidra/Processors/6502
• Large Sleigh projects can be slow - AARCH64 - 85K LOC
• Use separate Eclipse
Sleigh Editor
Quick Demo
After Edit - ReloadSleigh Script
Only works for some changes
No structural changes - register, memory, pcodeop, ...
Sleigh Editor - Future Features
• Better project integration
• Code-Mining - auto-comment
• Navigation from Ghidra to SleighEditor in Eclipse
• Templates of common idioms
• More Hovers
• Conversion of number to different formats
• Syntax coloring in the printpiece
• Refactoring:
Extract common patterns to sub-constructor
• Instruction Pattern Match Documentation
Sleigh P-Code Tests - Sleigh Testing Framework
• C code compiled for processor
• Small tests with known result
• General coverage of instructions emitted by C compilers
• Verifies core constructs - Addressing Modes, Registers
• Pcode Emulator to Execute and Verify
• Repeatable - regression testing
• Extendable - needs more cowbell
• Special case code - Assembly
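The test pattern above (small program, known result, emulate, compare) can be shown with a toy stand-in. This is not the Ghidra emulator or test framework: the sketch uses a few real P-code opcode names over a plain register dictionary, with 32-bit wraparound assumed, and every helper name is hypothetical.

```python
MASK32 = 0xFFFFFFFF   # assume 32-bit registers


def emulate(ops, state):
    """Execute (opcode, output, inputs) triples over a copy of the register state."""
    regs = dict(state)

    def value(operand):
        # Integers are constants; strings name registers or temporaries.
        return operand if isinstance(operand, int) else regs[operand]

    for opcode, out, inputs in ops:
        a = value(inputs[0])
        b = value(inputs[1]) if len(inputs) > 1 else None
        if opcode == "COPY":
            result = a
        elif opcode == "INT_ADD":
            result = a + b
        elif opcode == "INT_AND":
            result = a & b
        elif opcode == "INT_LEFT":
            result = a << b
        else:
            raise NotImplementedError(opcode)
        regs[out] = result & MASK32
    return regs


def run_test(name, ops, initial, expected):
    """Return (name, failures); failures maps register -> (actual, expected)."""
    final = emulate(ops, initial)
    failures = {reg: (final.get(reg), want)
                for reg, want in expected.items() if final.get(reg) != want}
    return name, failures


# r0 = (r1 + 1) << 2
shift_add = [("INT_ADD", "tmp", ("r1", 1)), ("INT_LEFT", "r0", ("tmp", 2))]
print(run_test("shift_add", shift_add, {"r1": 5}, {"r0": 24}))
```

An empty failures dictionary means the test passed; rerunning the same table after a Sleigh change is what makes the approach useful for regression testing.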
Sleigh P-Code Tests - Tricore in Eclipse
State of All Processors
All Passing
Sleigh P-Code Tests - Example - Tricore
• Contribution - mumbel
• Surprisingly well written
• Call Context Save/Restore
• TRICORE O0 EmulatorTest
• EmulateInstructionStateModifier
Sleigh P-Code Tests - Debugging Sleigh
Debug One Failing test - lots of output
Directory - test-output / cache, logs, results
Sleigh P-Code Tests - Debugging tests
Match Unique to Read/Write
0x1f/0x1b should be 0x3f/0x1a, not extracting enough
InstructionInfo - Locating problems
External Disassembly Field
• binutils wrapper gdis
  ◦ Acts as a server
• Other Disassemblers
  ◦ dump/scrape
  ◦ Code Composer Studio
• Verify, Debug, Mine
Script - CompareSleighExternal
Script - DebugSleighInstructionParse
Developing a Sleigh Module - What’s Good Enough?
• Disassembly
  ◦ Decode, Display, Flow instructions
• References
  ◦ Addressing modes
• Decompilation
  ◦ All Data Flow, pseudoOp In/Out, Logic, Math
• Emulation - EmulateInstructionStateModifier
• Theorem Proving - Detailed effects
• Partial languages - OK
  ◦ Use unimpl, BadInstruction(), pseudoOp
• Speed up the Process - Automate it
  ◦ Scraping disassembly / PDF
  ◦ Parse disassembly tables, XML descriptions
Developing a Sleigh Module - Now What?
• Tune for decompilation - calling convention
• Load format
  ◦ ELF, .opinion for magic machineID
• Tune for emulation - Sleigh P-Code Tests
• Analyzers
  ◦ Stock constant reference propagation can work well
  ◦ Write specialized register propagation - Page register
• Pattern Files - recognize common patterns or key functions
• Variants - Pointer checking, Control Flow Guard
  ◦ Decompiler Pcode UserOp injection
  ◦ Use context, Define, variants with Slaspec
• FID Files - Static library pattern matching
Contacting Us
• The Ghidra team is on GitHub.
• @NSAGov on Twitter announces new releases.
• The Ghidra team is not on Twitter, Reddit, Slashdot,
VKontakte, ...
Reporting Bugs
• Please report bugs!
• The perfect bug report includes:
1. Source code.
2. Relevant bytes from the binary.
3. XML Debug Function Decompilation from decompiler.
4. Stack trace if there is one.
• Often we need an entire function and surrounding instructions.
• Pictures work, but can limit triage.
• We reserve the right to ignore sketchy binaries :)
www.ghidra-sre.org Stats (June 25)
• 9.0.0: 302k downloads
• 9.0.1: 36k downloads
• 9.0.2: 100k downloads
• 9.0.4: 42k downloads
• Site views: 10.6M
• Video hits: 751k
GitHub Stats (June 25)
• 16145 stars
• 2019 forks
• 718 watching
• 608 issues, 272 open
• 111 pull requests, 35 open
References
• Xtext - itemis.com, https://www.eclipse.org/Xtext/
• mumbel - https://github.com/mumbel/ghidra/tree/tricore
• SleighEditor README.html, build README.txt
Questions?