Messing with Android Apps
Mobile Systems and Smartphone Security
(MOBISEC 2020)
Prof: Yanick Fratantonio
EURECOM
This class
- Learn how we go from source code (Java, C/C++) to APK
- Actual goals
- Learn the tech details on what's going on under the hood
- Learn how to go from APK back to higher-level representations
The Compilation Process
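- Java source code (.java) ~> Java compiler (javac) ~> Java bytecode (.class, .jar)
- Kotlin source code (.kt, .kts) ~> Kotlin compiler (kotlinc) ~> Java bytecode (.class, .jar)
- Java bytecode is executable by the JVM
- Java bytecode ~> DEX compiler (dx) ~> Dalvik bytecode (.dex)
- Dalvik bytecode is executable by the DVM
- C/C++ source code (.c, .cpp, .h) ~> Machine code (.so)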
Android Application Package (APK)
- An APK is a zip file (kinda)
- $ unzip app.apk
- Content
- AndroidManifest.xml (binary XML, not plain text)
- classes.dex (raw Dalvik bytecode)
- resources.arsc (compiled resource table)
- res/*.xml (binary XML, not plain text)
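Since an APK is a zip container, Python's standard zipfile module can list what is inside. A minimal sketch: the helper names list_apk_entries and build_fake_apk are made up for illustration, and the demo runs on a tiny in-memory archive with placeholder contents instead of a real app.apk.

```python
import io
import zipfile


def list_apk_entries(apk):
    """Return (name, compressed_size, uncompressed_size) for each zip entry.

    `apk` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(apk) as zf:
        return [(i.filename, i.compress_size, i.file_size) for i in zf.infolist()]


def build_fake_apk():
    """Build a tiny in-memory zip that mimics the APK layout (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("AndroidManifest.xml", b"placeholder manifest bytes")
        zf.writestr("classes.dex", b"dex\n035\x00")
        zf.writestr("resources.arsc", b"placeholder resource table")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, csize, size in list_apk_entries(build_fake_apk()):
        print(f"{name}: {size} bytes ({csize} compressed)")
```

On a real device or build output, the same call would be list_apk_entries("app.apk").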
Java/Dalvik bytecode
- It cannot be run by your processor
- It's code that can be run by a Virtual Machine
- Java VM / JVM
- Dalvik VM / DVM
- DVM
- It's a program (which your machine can run) that takes as input Dalvik
bytecode and somehow "executes" the intended behavior
Dalvik bytecode, it looks like this:
.method foo(ILcom/mobisec/Peppa;)I
.registers 5
invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
move-result v0
add-int v1, v3, v0
return v1
.end method
Why a Virtual Machine?
- The same Dalvik bytecode (classes.dex) can run across
multiple devices / architectures
- The DVM is, instead, "custom" for each architecture
- Security benefit: since everything is run in a virtual
machine, the app's code is isolated
Dalvik bytecode verifier
- Before the Dalvik bytecode is run, it is processed by a
component called "Bytecode Verifier"
- The verifier checks that the bytecode is well-formed
- You can't do weird tricks ~> Dalvik is easy to disassemble
- Rephrased: If you want to invoke a method, no way you can hide an
"invoke-*" instruction + which method you are invoking
- Are the bad guys out of luck?
Bad guys' tricks
- Obfuscation tricks!
- Reflection (target methods are specified via "strings")
- Dynamic Code Loading
- Native code
- Reliable disassembling of arbitrary native code is an open problem
- There is no such thing as a "native code verifier"
- Well, there is something called "Google Native Client" (NaCl) but it doesn't apply here
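Because every call has to appear as an explicit "invoke-*" instruction, a first triage pass over disassembled code can simply list the invoke targets and flag the APIs behind the tricks above. A rough sketch, not a real analysis tool: the regex only covers the common smali syntax, and SAMPLE, the SUSPICIOUS table and the function names are made up for illustration (the flagged framework methods themselves are real).

```python
import re

# Matches e.g.:  invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
INVOKE_RE = re.compile(
    r"^[ \t]*(invoke-[\w/]+)[ \t]+\{[^}]*\},[ \t]*(L[^;]+;)->([^\s(]+)(\(\S*)[ \t]*$",
    re.MULTILINE,
)

# (class, method) pairs that hint at the tricks listed on the slide.
SUSPICIOUS = {
    ("Ljava/lang/Class;", "forName"): "reflection",
    ("Ljava/lang/reflect/Method;", "invoke"): "reflection",
    ("Ldalvik/system/DexClassLoader;", "<init>"): "dynamic code loading",
    ("Ljava/lang/System;", "loadLibrary"): "native code",
}

# Hand-written smali fragment, for demonstration only.
SAMPLE = """
    invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
    move-result v0
    const-string v1, "com.mobisec.Hidden"
    invoke-static {v1}, Ljava/lang/Class;->forName(Ljava/lang/String;)Ljava/lang/Class;
    move-result-object v1
"""


def invoked_methods(smali_text):
    """Return (opcode, class, method, prototype) for every invoke-* line."""
    return INVOKE_RE.findall(smali_text)


def flag_suspicious(smali_text):
    """Return (label, class, method) for invoke targets found in SUSPICIOUS."""
    return [
        (SUSPICIOUS[(cls, name)], cls, name)
        for _, cls, name, _ in invoked_methods(smali_text)
        if (cls, name) in SUSPICIOUS
    ]


if __name__ == "__main__":
    for hit in flag_suspicious(SAMPLE):
        print(hit)
```

Note the limit this illustrates: the scan sees the call to Class.forName, but the actual target ("com.mobisec.Hidden") is only a string.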
Dalvik Bytecode
- Dalvik knows about OO concepts
- Classes, methods, fields, "object instances"
- Destination-first syntax
- E.g., "move v3, v2" means v2 → v3
- Types
- Built-in: V (void), B (byte), S (short), C (char), I (int), Z (boolean), ...
- Actual Classes (syntax: L<fullyqualifiedclassname>;)
- Landroid/content/Intent;
- Lcom/mobisec/Peppa;
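To make the type syntax concrete, here is a small sketch that converts a Java class name into its descriptor and decodes a method signature such as "(ILcom/mobisec/Peppa;)I". In the bytecode, package components are separated by "/". The function names are made up for illustration; the single-letter codes are the standard ones.

```python
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def to_descriptor(java_class_name):
    """'com.mobisec.Peppa' -> 'Lcom/mobisec/Peppa;'"""
    return "L" + java_class_name.replace(".", "/") + ";"


def _read_type(desc, i):
    """Decode one type starting at index i; return (readable_name, next_index)."""
    dims = 0
    while desc[i] == "[":          # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":             # class type: L<name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # built-in type: a single letter
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_method_descriptor(desc):
    """'(ILcom/mobisec/Peppa;)I' -> (['int', 'com.mobisec.Peppa'], 'int')"""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    close = desc.index(")")
    params, i = [], 1
    while i < close:
        type_name, i = _read_type(desc, i)
        params.append(type_name)
    return_type, _ = _read_type(desc, close + 1)
    return params, return_type


if __name__ == "__main__":
    print(to_descriptor("android.content.Intent"))
    print(parse_method_descriptor("(ILcom/mobisec/Peppa;)I"))
```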
Dalvik Bytecode
- The bytecode is nicely split in "methods"
- A .smali file for each class
- Dalvik is register-based
- Each method has its own register "frame"
- Methods' args are placed in the last registers of the frame
- If a method is non-static, the first argument is "this"
Register Frame
- Consider a method s.t.
- it takes 3 arguments
- its register frame has 6 registers
- The method will use
- Registers v0, v1, v2, v3, v4, v5
- Arguments are placed in v3, v4, v5
Register Model
- Very different model than CPU's registers
- They are NOT shared across methods
- But registers can contain values (for built-in types) and
references to objects
- Each object is stored in ONE register
- Including "complex" objects: you just store the reference
- Exception: LONG / DOUBLE, they take TWO *contiguous* registers
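The register rules (arguments in the last registers, "this" first for non-static methods, two registers for long / double) can be turned into a small calculator. A sketch under those rules only; the function name argument_registers is made up for illustration.

```python
WIDE = {"J", "D"}  # long and double occupy two contiguous registers


def argument_registers(total_registers, param_types, is_static):
    """Map each argument (and 'this', if any) to the register(s) it occupies.

    param_types is a list of type descriptors, e.g. ["I", "Lcom/example/Peppa;"].
    """
    slots = [] if is_static else [("this", 1)]
    slots += [(t, 2 if t in WIDE else 1) for t in param_types]
    needed = sum(width for _, width in slots)
    if needed > total_registers:
        raise ValueError("frame too small for the arguments")
    reg = total_registers - needed  # arguments fill the END of the frame
    layout = []
    for name, width in slots:
        layout.append((name, [f"v{reg + k}" for k in range(width)]))
        reg += width
    return layout


if __name__ == "__main__":
    # 3 arguments in a 6-register frame: they land in v3, v4, v5
    print(argument_registers(6, ["I", "I", "I"], is_static=True))
    # a long takes two registers
    print(argument_registers(5, ["J", "I"], is_static=False))
```

Whatever registers come before the first argument (v0..v2 in the 6-register case) are free for locals.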
Example (Java)
class Peppa {
    int pig(int x) {
        return 2*x;
    }

    static int foo(int a, Peppa p) {
        int b = p.pig(a);
        return a+b;
    }
}
Example (Dalvik bytecode)
.method pig(I)I
    .registers 3
    mul-int/lit8 v0, v2, 0x2
    return v0
.end method

Java source:
int pig(int x) {
    return 2*x;
}

- Why *3* registers? Because of "this"!
- v0 holds the result, v1 holds "this", v2 holds x
Example (Dalvik bytecode)
.method static foo(ILcom/example/Peppa;)I
    .registers 4
    invoke-virtual {v3, v2}, Lcom/example/Peppa;->pig(I)I
    move-result v0
    add-int v1, v2, v0
    return v1
.end method

Java source:
static int foo(int a, Peppa p) {
    int b = p.pig(a);
    return a+b;
}
Example of Dalvik instructions (doc)
- Moving constants/immediates/registers into registers
- const v5, 0x123
- move v4, v5
- Math-related operations (many, many variants)
- add-int v1, v3, v0
- mul-int/lit8 v0, v2, 0x2
Example of Dalvik instructions
- Method invocation
- invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
- invoke-static ...
- invoke-{direct, super, interface} ...
- Getting return value
- invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
- move-result v5
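A toy sketch of how a register-based VM could step through invoke-virtual / move-result, using the Peppa example (static foo with 4 registers, pig with 3). This is not how the DVM is implemented: the tuple-based instruction encoding, the run function and the METHODS table are made up for illustration, and there is no real virtual dispatch or branching.

```python
# name -> (number of registers, list of instructions)
METHODS = {
    # int pig(int x) { return 2*x; }      v1 = this, v2 = x
    "pig": (3, [
        ("mul-int/lit8", 0, 2, 2),        # v0 = v2 * 2
        ("return", 0),
    ]),
    # static int foo(int a, Peppa p)      v2 = a, v3 = p
    "foo": (4, [
        ("invoke-virtual", [3, 2], "pig"),  # call pig with this=v3, x=v2
        ("move-result", 0),                 # v0 = value returned by pig
        ("add-int", 1, 2, 0),               # v1 = v2 + v0
        ("return", 1),
    ]),
}


def run(methods, name, args):
    """Execute one method in its own register frame and return its result."""
    registers, code = methods[name]
    if len(args) > registers:
        raise ValueError("more arguments than registers")
    frame = [None] * registers
    frame[registers - len(args):] = args   # arguments go in the LAST registers
    result = None                          # slot read by move-result
    for ins in code:
        op = ins[0]
        if op == "mul-int/lit8":
            _, dst, src, lit = ins
            frame[dst] = frame[src] * lit
        elif op == "add-int":
            _, dst, a, b = ins
            frame[dst] = frame[a] + frame[b]
        elif op in ("invoke-virtual", "invoke-static"):
            _, arg_regs, target = ins
            # the callee gets a fresh frame: registers are NOT shared
            result = run(methods, target, [frame[r] for r in arg_regs])
        elif op == "move-result":
            frame[ins[1]] = result
        elif op == "return":
            return frame[ins[1]]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return None


if __name__ == "__main__":
    print(run(METHODS, "foo", [5, "peppa-object"]))
```

The string "peppa-object" stands in for an object reference; it only needs to occupy one register.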
Example of Dalvik instructions
- Set/get values from fields
- iget, iget-object, ...
- iput, iput-object, ...
- sget, sput ... (for static fields)
- Instantiate new object
- new-instance v2, Lcom/mobisec/Peppa;
Example of Dalvik instructions
- Conditionals / control flow redirection
- if-ne v0, v1, :label_a
...
:label_a
...
- goto :label_b
- Meta-instructions that contain "data"
- fill-array-data (the array contents live in a data payload)
Which component is actually executing Dalvik?
- In the past (up to Android 4.4)
- DVM, libdvm.so
- When about to execute a method, compile it and run
- Compile process: "Dalvik bytecode ~> machine code"
- Rephrasing: compilation is done "on demand"
- We refer to this as Just-In-Time compilation (JIT)
- Compiled code is stored in a cache
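The "compile on demand, then cache" idea can be sketched in a few lines. This is only an analogy: Python's built-in compile() stands in for "Dalvik bytecode ~> machine code", and the ToyJit class and its method table are made up for illustration.

```python
class ToyJit:
    """Compile a method the first time it is called, then reuse the result."""

    def __init__(self, methods):
        self.methods = methods    # name -> source of an expression over x
        self.code_cache = {}      # name -> compiled code object
        self.compilations = 0

    def call(self, name, x):
        if name not in self.code_cache:
            # first call: pay the compilation cost once
            self.compilations += 1
            self.code_cache[name] = compile(self.methods[name], name, "eval")
        # later calls go straight to the cached code
        return eval(self.code_cache[name], {"x": x})


if __name__ == "__main__":
    jit = ToyJit({"pig": "2 * x"})
    print(jit.call("pig", 3), jit.call("pig", 21), jit.compilations)
```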
Then, Android ART
- ART stands for Android Run-Time
- It replaced the old DVM
- It was introduced in Android 4.4 as optional, mandatory in Android 5
- Ahead-Of-Time compilation
- Compilation happens at app installation time
ART vs DVM
- Pro: The app's boot and execution are MUCH faster
- Because everything is already compiled
- Cons: ART takes more space on RAM & disk
- Major cons:
- Installation takes MUCH longer
- Bad repercussion on system upgrades, could take ~15 minutes
New Version of ART
- Profile-guided JIT/AOT
- Introduced in Android 7
- ART profiles an app and precompiles only the
"hot" methods, the ones most likely to be used
- Other parts of the app are left uncompiled
New Version of ART
- It is pretty smart...
- It automatically precompiles methods that are
likely to be used soon
- Precompilation only happens when the device is
idle and charging
- Biggest Pro: quick path to install / upgrade
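The profile-guided scheme can be sketched as a counter plus a background step. A toy model only, assuming a simple call-count threshold: the class name, the hot_threshold value and the "interpreted" / "compiled" labels are made up for illustration and do not reflect ART's actual heuristics.

```python
from collections import Counter


class ToyProfileGuidedRuntime:
    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # hypothetical cut-off for "hot"
        self.profile = Counter()            # method -> number of invocations
        self.compiled = set()               # methods that were precompiled

    def invoke(self, method):
        """Record the call in the profile and report how the method would run."""
        self.profile[method] += 1
        return "compiled" if method in self.compiled else "interpreted"

    def background_compile(self, idle, charging):
        """Precompile hot methods, but only while the device is idle and charging."""
        if not (idle and charging):
            return []
        hot = sorted(
            m for m, calls in self.profile.items()
            if calls >= self.hot_threshold and m not in self.compiled
        )
        self.compiled.update(hot)
        return hot


if __name__ == "__main__":
    rt = ToyProfileGuidedRuntime()
    for _ in range(3):
        rt.invoke("pig")
    rt.invoke("foo")
    print(rt.background_compile(idle=True, charging=True))
    print(rt.invoke("pig"), rt.invoke("foo"))
```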
DVM JIT vs ART AOT vs ART JIT/AOT
Metric                DVM JIT    ART AOT    ART JIT/AOT
App boot time         slowest    fastest    trade-off
App speed             slowest    fastest    trade-off
App install time      fastest    slowest    trade-off
System upgrade time   fastest    slowest    trade-off
RAM/disk usage        lowest     highest    trade-off
ODEX: Optimized DEX
- DEX → dexopt → ODEX
- It is optimized DEX: faster to boot and to run
- Most (all?) system apps that start at boot are ODEXed
- Note: ODEX is an additional file, next to an APK
- Cons
- ODEX files take space
- Device-dependent (note: it is still bytecode)
The analog of ODEX for ART is tricky...
- The new Android Run-Time uses two formats
- The ART format (.art files)
- It contains pre-initialized classes / objects
- The OAT files
- Bytecode compiled to machine code, wrapped in an ELF file
- It can contain one or more DEX files (the actual Dalvik bytecode)
- Obtained with dex2oat (usually run at install time)
The analog of ODEX for ART is tricky...
- The confusing part: you still have .odex files!
- Now .odex files are OAT-formatted files!
When are these two formats used?
- ART format:
- Only one file: boot.art
- It contains the pre-initialized memory for most of the Android framework
- Huge optimization trick
- OAT format:
- One important file: boot.oat
- It contains the most important Android framework libraries, pre-compiled
- All the "traditional" ODEX files are OAT files
- You can inspect them with Android-provided oatdump
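Since the same ".odex" extension can hide different formats, a quick look at the first bytes tells what you are dealing with before reaching for oatdump. A minimal sketch: the magic values are the standard ones for ELF, DEX, legacy Dalvik ODEX and ZIP, while the function name and the returned labels are made up for illustration.

```python
def identify_container(data):
    """Guess the container format from the first four bytes of a file."""
    head = data[:4]
    if head == b"\x7fELF":
        return "elf"          # OAT files are wrapped in ELF
    if head == b"dex\n":
        return "dex"          # plain Dalvik bytecode
    if head == b"dey\n":
        return "legacy-odex"  # dexopt output from the DVM era
    if head == b"PK\x03\x04":
        return "zip"          # APK / JAR
    return "unknown"


if __name__ == "__main__":
    print(identify_container(b"\x7fELF\x01\x01\x01\x00"))
    print(identify_container(b"dex\n035\x00"))
```

With a real file: identify_container(open(path, "rb").read(4)).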
When a new app is starting
- All app processes are created by forking Zygote
- Zygote can be seen as the "init" of Android
- A "template" process for each app
- Optimization trick
- boot.oat is already mapped in memory
- No need to re-load the framework!
The Big Picture
(Figure: diagram taken from stackoverflow)
More information
- "Dalvik and ART" slides: link
- Write-up for an old CTF challenge I solved a while ago
- It involves ART / OAT / etc.
- Here it is: link
Tools time!
Unpacking APKs
- unzip app.apk
- AndroidManifest.xml (binary XML)
- classes.dex
- resources (compiled / binary XML)
smali/baksmali
- $ baksmali classes.dex -o output
- Disassemble DEX files
- Output: a .smali file for each class
- Dalvik bytecode in "smali" format
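- $ smali output -o patched.dex
- Assembler for DEX files (it produces a .dex, not an APK)
- Note: smali/baksmali 2.2+ use subcommands ("baksmali d", "smali a")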
apktool
- apktool is awesome
- It embeds baksmali/smali
- It unpacks / packs APKs, including resources and
manifest files
- $ apktool d app.apk -o output
- $ apktool b output -o patched.apk
Signing apps
$ keytool -genkey -v -keystore debug.keystore \
    -alias androiddebugkey -keyalg DSA -sigalg SHA1withDSA \
    -keysize 1024 -validity 10000
$ jarsigner -keystore <path to debug.keystore> -verbose \
    -storepass android -keypass android -sigalg SHA1withDSA \
    -digestalg SHA1 app.apk androiddebugkey
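The unpack / patch / rebuild / sign loop is repetitive, so it is handy to script. A sketch that only builds the command lines, using the apktool and jarsigner options shown on the slides (minus -verbose); the function name repack_commands and the file names are made up for illustration.

```python
import shlex


def repack_commands(apk, workdir, patched_apk, keystore, alias="androiddebugkey"):
    """Return the argv lists for: unpack, rebuild, sign (nothing is executed)."""
    return [
        ["apktool", "d", apk, "-o", workdir],
        # ... edit the .smali files under workdir between these two steps ...
        ["apktool", "b", workdir, "-o", patched_apk],
        ["jarsigner", "-keystore", keystore,
         "-storepass", "android", "-keypass", "android",
         "-sigalg", "SHA1withDSA", "-digestalg", "SHA1",
         patched_apk, alias],
    ]


if __name__ == "__main__":
    for cmd in repack_commands("app.apk", "output", "patched.apk", "debug.keystore"):
        print("$", shlex.join(cmd))
```

To actually run a step, pass the list to subprocess.run(cmd, check=True) once the tools are on your PATH.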
Disassembly vs. Decompilation
- Disassembly
- classes.dex binary file ~> Dalvik bytecode "smali" representation
- machine code bytes ~> assembly representation (mov eax, edx)
- Decompilation
- Go from assembly/bytecode to source code-level representation
- Dalvik bytecode ~> Java source code
How to decompile
- All-in-one tools
- JEB (commercial, VERY expensive)
- BytecodeViewer (pretty good one)
- jadx
- NEW: Ghidra, open source tool developed by NSA
- Using a Java decompiler (Java bytecode ~> Java)
- Dalvik bytecode ~> Java bytecode
- dex2jar
- Java bytecode ~> Java source code
- JD-GUI
Decompilation
- Decompiling Dalvik bytecode is usually simple
- Packing techniques and obfuscation tricks try to make
decompilers' lives very difficult
- When decompilers fail, you gotta read the bytecode
aapt
- It comes with Android SDK
- <sdk>/build-tools/26.0.2/aapt
- It takes an APK as input
- It can dump tons of useful info
- Package name, components, main activity, permissions
- strings, resources, ...
- $ aapt dump badging <apk_path>
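The badging dump is line-oriented text, so pulling out the package name, main activity and permissions takes only a few lines. A sketch that assumes the common key: name='...' layout; the exact output differs between aapt versions, and SAMPLE_BADGING is hand-written for illustration, not captured from a real run.

```python
import re

# Hand-written sample in the usual "aapt dump badging" style (illustrative only).
SAMPLE_BADGING = """\
package: name='com.mobisec.testapp' versionCode='1' versionName='1.0'
sdkVersion:'21'
uses-permission: name='android.permission.INTERNET'
uses-permission: name='android.permission.READ_CONTACTS'
launchable-activity: name='com.mobisec.testapp.MainActivity'  label='Test' icon=''
"""

NAME_RE = re.compile(r"name='([^']*)'")


def parse_badging(text):
    """Extract package name, main activity and permissions from badging output."""
    info = {"package": None, "main_activity": None, "permissions": []}
    for line in text.splitlines():
        match = NAME_RE.search(line)
        if not match:
            continue  # line without a name='...' attribute
        if line.startswith("package:"):
            info["package"] = match.group(1)
        elif line.startswith("launchable-activity:"):
            info["main_activity"] = match.group(1)
        elif line.startswith("uses-permission:"):
            info["permissions"].append(match.group(1))
    return info


if __name__ == "__main__":
    print(parse_badging(SAMPLE_BADGING))
```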
adb
- Tool to interact with apps and devices/emulators
- $ adb devices
- $ adb install app.apk
- $ adb uninstall com.mobisec.testapp # package name
adb
- $ adb logcat
- $ adb push file.txt /sdcard/file.txt # push to device
- $ adb pull /sdcard/file.txt file.txt # pull from device
adb
- $ adb shell
- Get a shell on the device
- $ adb shell ls
- Execute "ls" on the device
- $ adb shell am start -n <pkgname>/<component>
- $ adb shell pm grant <pkgname> <permission>
- $ adb shell dumpsys
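When several devices or emulators are attached, it helps to wrap the adb invocations in small helpers. A sketch that only builds argument lists for the commands above, plus adb's -s option to pick a device; the helper names, the ".MainActivity" component and the serial "emulator-5554" are illustrative assumptions.

```python
def adb(*args, serial=None):
    """Build an adb argv list, optionally targeting one device via -s."""
    cmd = ["adb"]
    if serial is not None:
        cmd += ["-s", serial]
    return cmd + list(args)


def start_component(pkgname, component, serial=None):
    """adb shell am start -n <pkgname>/<component>"""
    return adb("shell", "am", "start", "-n", f"{pkgname}/{component}", serial=serial)


def grant_permission(pkgname, permission, serial=None):
    """adb shell pm grant <pkgname> <permission>"""
    return adb("shell", "pm", "grant", pkgname, permission, serial=serial)


if __name__ == "__main__":
    print(start_component("com.mobisec.testapp", ".MainActivity"))
    print(grant_permission("com.mobisec.testapp",
                           "android.permission.READ_CONTACTS",
                           serial="emulator-5554"))
```

Each list can be handed to subprocess.run(cmd, check=True) to execute it.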
Is this stuff actually useful?
- News from (3 days + 2 years) ago
- One guy analyzed an app for vending machines
- They were storing the "amount of money" in a content
provider INSIDE the app
- https://hackernoon.com/how-i-hacked-modern-vending-machines-43f4ae8decec