GAPB 2024 AAOS 101, Day 3: Performance Analysis / Tuning


Performance Analysis / Tuning 101

Google Automotive Partner Bootcamp 2023 | Google confidential and proprietary | Do not distribute
CONFIDENTIALITY REMINDER

Everything shared in this presentation is under NDA

Ethan Lee, Software Engineer

Agenda
01 A Scientific Approach to Performance Analysis
02 Getting Started with Perfetto
03 Anatomy of a Trace
04 Perfetto Pitfalls
05 Trace Analysis Walkthrough
06 SQL Queries
07 Making Debugging Easier
08 Performance Tuning

A Scientific Approach to Performance Analysis

What is Performance Analysis?
● Performance issues require a systematic process to uncover
their root cause.
● The right tools need to be identified to gather insights into
critical parts of complex systems.

● There are a number of techniques which engineers can use to delve deeper into the execution of a system.

Performance Analysis

What is Performance Analysis?

There are two techniques that are widely used for performance analysis: tracing and profiling.

Tracing
● Tracing involves collecting highly detailed data about system execution.
● Traces contain enough detail to build a timeline of events.
● Traces give us insight into what a program does over time (e.g. which functions are being run) and context about execution (e.g. function call parameters).

Profiling
● Profiling involves sampling a program's usage of a resource.
● The most common types are memory profiling and CPU profiling.
● Memory profiling surfaces information about heap memory allocation.
● CPU profiling gathers information about the call stack running on a CPU over time.

Why Choose Perfetto?
Profiling and tracing have different use cases:

Why use profiling over tracing?

● Traces, while detailed, are impractical for capturing high-frequency events like every function call due to the sheer volume of data involved.

● Profilers address this limitation through sampling, selectively recording data points to drastically reduce storage requirements.

Why use tracing over profiling?

● Profilers offer valuable insights into where resources are consumed within a program's call stack, but they lack the ability to explain the underlying reasons behind those resource allocations.

● For instance, a profiler might reveal that function foo() called malloc numerous times and allocated X bytes, but it cannot tell us why foo() was making those calls.

● Traces fill this gap by combining application and kernel events, providing in-depth context to understand the root cause of resource consumption.

Perfetto addresses this by supporting the collection, analysis and visualization of both tracing and profiling.
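To make the sampling trade-off concrete, here is a minimal Python sketch of Poisson byte sampling, the general idea behind sampling heap profilers. Instead of recording every allocation, it places sample points at exponentially distributed byte distances (mean `interval_bytes`) and credits each hit with `interval_bytes`. The function name and the `(call_site, size)` input format are hypothetical, and this is not heapprofd's actual implementation.

```python
import random
from collections import defaultdict


def sample_allocations(allocs, interval_bytes, seed=0):
    """Hypothetical helper: estimate bytes allocated per call site from samples.

    allocs: iterable of (call_site, size_bytes) in allocation order.
    interval_bytes: mean number of allocated bytes between two samples.
    Returns (estimated_bytes_per_site, number_of_samples_recorded).
    """
    estimate = defaultdict(int)
    if interval_bytes <= 1:
        # No sampling: every allocation is recorded with its exact size.
        count = 0
        for site, size in allocs:
            estimate[site] += size
            count += 1
        return dict(estimate), count

    rng = random.Random(seed)
    # Byte distance from the start of the current allocation to the next sample
    # point. Exponential gaps make the sample points a Poisson process over bytes.
    next_point = rng.expovariate(1.0 / interval_bytes)
    samples = 0
    for site, size in allocs:
        while next_point < size:
            # A sample point falls inside this allocation: credit one interval.
            estimate[site] += interval_bytes
            samples += 1
            next_point += rng.expovariate(1.0 / interval_bytes)
        # Carry the remaining distance over to the next allocation.
        next_point -= size
    return dict(estimate), samples
```

With `interval_bytes=1` every allocation is recorded exactly. With a large interval, the number of stored samples drops to roughly total_bytes / interval_bytes, while the per-site totals remain unbiased estimates. This is the storage saving that lets a profiler run where a full trace would be too large.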

How to Use Perfetto Effectively
How do Perfetto and our performance analysis flow fit into our goals?

● This approach allows one to easily compare the delta of a potential regression. To achieve this, one should have an established baseline to compare against.
● Perfetto enables one to gather insights beyond surface-level observations. It is imperative that we can translate user-perceptible signals into measurable metrics that can be tested.
● Using Perfetto in this approach means that multiple iterations will need to be collected. To establish reliable metrics, it is necessary to gather information on a large enough population to capture reproducible issues.

How to Use Perfetto Effectively
How do Perfetto and our performance analysis flow fit into our goals?

Flowchart: Start → Select the CUJ to profile → Record a Trace → Requires more instrumentation?
○ Yes: Instrument the CUJ, then record a new trace.
○ No: Implement Fix → Performance Issue Fixed? Yes: End. No: record a new trace.

Perfetto: A Feature Rich Tool

● Ease of Use: Perfetto provides an end-to-end solution to capture Android system traces quickly to identify issues in critical user flows.
● Flexibility: Via Perfetto trace configs, users are able to modify tracing behavior via buffers or data sources. For example, one can easily change data sources to capture various ftrace events or atrace events.
● Trace Analysis: Perfetto provides a comprehensive trace viewer web UI that empowers one to inspect, visualize, and analyze the collected data.
● Data Mining: One can leverage SQL-like syntax to query the trace data, making complex analysis easier.

How does Perfetto Work? https://perfetto.dev/docs

Diagram: Perfetto's architecture in three stages.
● Record traces:
○ System tracing: Linux ftrace, /proc pollers, heap profilers and other Linux/Android data sources, collected by the tracing daemon over a UNIX socket.
○ Chrome tracing: Chrome-specific data sources, collected by the tracing service running as a Mojo service.
○ In-app tracing: app-specific data sources via the Track event library (TRACE_EVENT(...)), collected by an in-process thread.
○ All three build on the Tracing C++ Library (Android / Linux / MacOS / Windows).
● Analyze traces: the Trace Processor (Android / Linux / MacOS / Win), with trace importers (Protobuf, JSON, systrace), an SQL query engine based on SQLite, and trace-based metrics (JSON / Protobuf / CSV).
● Visualize traces: the Perfetto UI (HTML / JS), which runs the Trace Processor as WebAssembly, can record via ADB over WebUSB for Android, and works offline after the first visit.

How does Perfetto Work? Record traces
System Wide Tracing for Android and Linux

● Kernel tracing is enabled via Linux ftrace, which allows kernel events such as scheduling events and syscalls to be recorded.
● /proc pollers allow the sampling of process-wide CPU and memory counters over a time period.
● Heap profilers also enable capturing information for the native and Java heaps.

https://perfetto.dev/docs

How does Perfetto Work? Analyze traces
Trace Analysis

● The Trace Processor is a C++ library that takes in raw trace data and surfaces it through an SQL interface (an SQL query engine based on SQLite) for straightforward querying.
● Trace importers allow simple ingestion of multiple formats (Protobuf, JSON, systrace).
● Trace-based metrics create pre-formatted, extensible queries that provide trace summaries (e.g. CPU usage at different frequency states), with output in JSON, Protobuf or CSV.

https://perfetto.dev/docs

How does Perfetto Work? Visualize traces
Trace Visualization

● A trace visualizer is instrumental for analysis. The Perfetto UI (HTML / JS) is powered by WebAssembly, which runs the Trace Processor in the browser.
● The Perfetto UI works fully offline after it is first opened.
● It can also record traces from Android devices using ADB over WebUSB.

https://perfetto.dev/docs

Getting Started with Perfetto

Quick Start: Collecting a Perfetto Trace
After defining an appropriate trace configuration, one can run the trace collection.
1. Download the recording script using the command below:

$ curl -O https://raw.githubusercontent.com/google/perfetto/master/tools/record_android_trace
$ chmod u+x record_android_trace

2. Start tracing, passing the trace config file with -c:

$ ./record_android_trace -o <trace-name>.trace -c <trace-config>.pbtxt

3. Run the desired CUJ or experiment.
4. End the trace by pressing Ctrl+C in the terminal running the command from step 2.
5. The trace will automatically be opened in the browser once collection has completed.

Quick Start: Viewing a Trace
To open an existing trace file, navigate to ui.perfetto.dev and open the trace there:

Quick Start: Viewing a Trace
Once the trace is open, one can also generate a shareable permalink to it:

Demo Video
An example video of a trace being
collected from beginning to end.

Special Case: Collect Boot Time Tracing
In Android TM+ (Android 13 and later), the trace can be collected as seen previously. However, the following setting must be enabled before the device is restarted:

adb shell setprop persist.debug.perfetto.boottrace 1

In Android SC- (Android 12 and earlier), the following steps are required to set up the device:
1. Boot tracing in Android SC- requires SELinux to be set to permissive.
2. The following .rc file must be created:

cat >> perfetto_boot.rc << 'EOF'
service perfetto_boot /system/bin/perfetto --txt -c /data/misc/perfetto-traces/boottrace.pbtxt -o /data/misc/perfetto-traces/boottrace.perfetto-trace
    class late_start
    disabled
    user shell
    group nobody
    oneshot
    seclabel u:object_r:perfetto_exec:s0
    stdio_to_kmsg
    capabilities DAC_READ_SEARCH

on property:persist.perfetto.boottrace=1
    rm /data/misc/perfetto-traces/boottrace.perfetto-trace
    start perfetto_boot
EOF

3. Run adb root && adb remount to remount the device.
4. Use the following commands to push the .rc and config files to the device:

adb push perfetto_boot.rc /etc/init/perfetto_boot.rc
adb push perfetto_trace_config.textproto /data/misc/perfetto-traces/boottrace.pbtxt

Because the .rc trigger watches persist.perfetto.boottrace, that property also needs to be set to 1 (for example, adb shell setprop persist.perfetto.boottrace 1) before rebooting.

Special Case: Collect Boot Time Tracing
The following steps are required to collect the trace:
1. Reboot the device using adb reboot.
2. Stop perfetto and pull the trace:

adb shell pkill perfetto
adb pull /data/misc/perfetto-traces/boottrace.perfetto-trace

Trace Anatomy

Trace Config Setup
Selecting the right trace config will allow one to collect the necessary data
from the system.

● Perfetto provides granular control over data collection. Unlike always-on logging systems
(e.g., Linux's rsyslog, Android's logcat), its tracing data sources start in an idle state.
● The TraceConfig is a protobuf message that controls your Perfetto tracing session. It
outlines:
● System-wide Settings:
○ Maximum trace duration.
○ Number and size of memory buffers.
○ Maximum output file size.
● Data Source Specifications:
○ For kernel tracing, which ftrace events to enable.
○ For the heap profiler, the target process name and sampling rate.
● Data Routing: Specifies which buffer each data source should write into
Note: a sample config can be found at perfetto.dev/docs/concepts/config
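As a rough illustration of how these parts fit together, the hypothetical Python helper below renders a small TraceConfig in text (textproto) form. It emits the system-wide settings (duration_ms), one buffers entry per buffer, and one data_sources entry per source with its target_buffer. It only uses field names that appear elsewhere in this deck and omits per-source sub-configs such as ftrace_config. The helper itself is an assumption for illustration and is not part of Perfetto.

```python
def render_trace_config(duration_ms, buffers_kb, sources):
    """Hypothetical helper: render a minimal Perfetto TraceConfig as textproto.

    duration_ms: maximum trace duration.
    buffers_kb: list of buffer sizes in KB; buffer i is referenced by target_buffer: i.
    sources: list of (data_source_name, target_buffer_index) tuples.
    """
    lines = [f"duration_ms: {duration_ms}"]
    for size_kb in buffers_kb:
        lines += [
            "buffers: {",
            f"    size_kb: {size_kb}",
            "    fill_policy: RING_BUFFER",
            "}",
        ]
    for name, target in sources:
        # Data routing: every data source must point at a declared buffer.
        if not 0 <= target < len(buffers_kb):
            raise ValueError(f"{name}: target_buffer {target} has no matching buffer")
        lines += [
            "data_sources: {",
            "    config {",
            f'        name: "{name}"',
            f"        target_buffer: {target}",
            "    }",
            "}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a 10 s trace with a 4 MB buffer and a 16 MB buffer.
    print(render_trace_config(10000, [4096, 16384],
                              [("linux.ftrace", 0), ("android.heapprofd", 1)]))
```

The output can be saved to a file and passed to record_android_trace with -c. In practice most configs are written by hand, so treat this only as a way to see the structure.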

Perfetto Trace Config
How the Tracing Service Uses the TraceConfig
● The tracing service (traced) is your config manager. When you start a tracing session, the service:
○ Reads System Settings: It determines its behavior based on the TraceConfig's outer section (duration, buffers, etc.).
○ Activates Data Sources: It finds Producers that match the data sources listed in the config. Then, it starts each Producer and provides the relevant DataSourceConfig settings.

Diagram: a trace config (Duration: 10s; Buffers: #0: 4 MB, #1: 16 MB) is handed to the traced tracing service, which sets up both buffers and starts data source 1 (e.g. ftrace) and data source 2 (e.g. heap profiler) with the following settings:

data_source: "linux.ftrace"
ftrace_config {
    ftrace_events: "sched_switch"
    ftrace_events: "sched_wakeup"
}

data_source: "android.heapprofd"
heapprofd_config {
    sampling_interval_bytes: 1
    process_cmdline: "adbd"
    continuous_dump_config {
        dump_phase_ms: 10000
        dump_interval_ms: 10000
    }
}

Perfetto Trace Config
Defining Buffers:
● This section defines the number, size and fill policy of in-memory buffers owned by the tracing service.
● Fill Policy:
○ A RING_BUFFER (default) fill policy wraps over when full and replaces the oldest trace data in the buffer.
○ A DISCARD fill policy stops accepting data once the buffer is full.
Dynamic Buffer Mapping:
● The target_buffer field of a data source config indicates which buffer that data source writes into; buffers are indexed from 0 in the order they are declared.

# Defining several buffers
buffers: {
    size_kb: 4096
    fill_policy: RING_BUFFER
}
buffers: {
    size_kb: 4096
    fill_policy: RING_BUFFER
}
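The difference between the two fill policies can be modelled with a toy buffer that counts capacity in whole events rather than bytes. This is a simplification of what traced actually stores. The class below is a hypothetical illustration of the semantics only.

```python
from collections import deque


class ToyTraceBuffer:
    """Hypothetical toy model of a trace buffer holding at most `capacity` events."""

    def __init__(self, capacity, fill_policy="RING_BUFFER"):
        if fill_policy not in ("RING_BUFFER", "DISCARD"):
            raise ValueError(f"unknown fill policy: {fill_policy}")
        self.fill_policy = fill_policy
        self.capacity = capacity
        # A bounded deque evicts its oldest item when a new one is appended while full.
        self.events = deque(maxlen=capacity) if fill_policy == "RING_BUFFER" else []
        self.dropped = 0

    def write(self, event):
        if self.fill_policy == "RING_BUFFER":
            if len(self.events) == self.capacity:
                self.dropped += 1  # the oldest event is overwritten
            self.events.append(event)
        elif len(self.events) < self.capacity:
            self.events.append(event)
        else:
            self.dropped += 1  # DISCARD: new data is rejected once full
```

Writing the events 0 to 4 into a three-slot buffer leaves [2, 3, 4] under RING_BUFFER and [0, 1, 2] under DISCARD. A ring buffer therefore keeps the end of a long CUJ, while DISCARD keeps its beginning.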

Perfetto Trace Config
Defining Data Sources (Logcat)
This data source will enable Android logcat messages to be shown:

data_sources: {
    config {
        name: "android.log"
        android_log_config {
        }
    }
}

Perfetto Trace Config
Defining Data Sources (CPU Frequency)
Various CPU frequency stats can be collected with the following data sources:
○ Enabling the power/cpu_frequency ftrace event
○ Setting cpufreq_period_ms > 0 in linux.sys_stats (Note: only works on Android SC-V2 (Android 12L) and above)

data_sources: {
    config {
        name: "linux.sys_stats"
        target_buffer: 1
        sys_stats_config {
            cpufreq_period_ms: 500
        }
    }
}

data_sources: {
    config {
        name: "linux.ftrace"
        ftrace_config {
            ftrace_events: "power/cpu_frequency"
            ftrace_events: "power/cpu_idle"
            ftrace_events: "power/suspend_resume"
        }
    }
}
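Once power/cpu_frequency is collected, a common follow-up question is how long each CPU spent at each frequency. The Python sketch below computes this from a list of `(timestamp_ns, cpu, freq_khz)` samples. Each sample is treated as holding until that CPU's next sample, or until the end of the trace for the last one. The tuple layout and function name are assumptions for illustration; in a real workflow the samples would come out of the trace, for example through Trace Processor.

```python
from collections import defaultdict


def time_in_frequency(samples, trace_end_ns):
    """Hypothetical helper: return {cpu: {freq_khz: ns_spent}}.

    samples: iterable of (ts_ns, cpu, freq_khz) frequency-change samples.
    A sample holds until the next sample on the same CPU, or until
    trace_end_ns for the last one.
    """
    by_cpu = defaultdict(list)
    for ts, cpu, freq in samples:
        by_cpu[cpu].append((ts, freq))
    result = {}
    for cpu, points in by_cpu.items():
        points.sort()
        spent = defaultdict(int)
        # Pair each sample with the next one (or the trace end) to get its duration.
        for (ts, freq), (next_ts, _) in zip(points, points[1:] + [(trace_end_ns, None)]):
            spent[freq] += next_ts - ts
        result[cpu] = dict(spent)
    return result
```

For example, a CPU that runs at 300000 kHz from 0 to 100 ns, at 1800000 kHz until 250 ns, and at 300000 kHz again until a trace end of 400 ns spends 250 ns at 300000 kHz and 150 ns at 1800000 kHz.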

Perfetto Trace Config
Defining Data Sources (Jankiness)
Jankiness can be examined with the frame timeline data source.

data_sources: {
    config {
        name: "android.surfaceflinger.frametimeline"
        target_buffer: 2
    }
}

Perfetto Trace Config
Defining Data Sources (linux.process_stats)
● The linux.process_stats data source gathers per-process statistics from the /proc/<pid>/status and /proc/<pid>/oom_score_adj files on Linux systems:
○ Process memory usage (RSS, VMSize, etc.)
○ Open file descriptors
○ Out-of-memory (OOM) score (indicates how likely the kernel is to terminate the process when memory is low)

data_sources: {
    config {
        name: "linux.process_stats"
        process_stats_config {
            scan_all_processes_on_start: true
            proc_stats_poll_ms: 1000
        }
    }
}
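To see what these fields look like at the source, the Python sketch below parses the text of /proc/<pid>/status and /proc/<pid>/oom_score_adj into values similar to those linux.process_stats reports. It works on strings so it can run anywhere. On a Linux host you could pass it the contents of the real files, e.g. `open(f"/proc/{pid}/status").read()`. The function is hypothetical and does not reflect how traced implements the data source.

```python
def parse_proc_status(status_text, oom_score_adj_text):
    """Hypothetical helper: parse /proc/<pid>/status and oom_score_adj contents.

    Returns a dict with the process name, every "... kB" memory field as an
    int in kB (e.g. VmRSS, VmSize), and oom_score_adj (-1000..1000, where a
    higher value makes the process a more likely OOM-kill target).
    """
    info = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.endswith(" kB"):
            info[key] = int(value[:-3])  # drop the " kB" suffix
        elif key == "Name":
            info["Name"] = value
    info["oom_score_adj"] = int(oom_score_adj_text.strip())
    return info
```

Sampling these values periodically (as proc_stats_poll_ms does) turns them into per-process counter tracks, which is what makes memory growth or OOM-score changes visible over the course of a CUJ.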

Perfetto Trace Config
Defining Data Sources (linux.sys_stats)
● The linux.sys_stats data source gathers a range of system-level statistics from Linux. The following counters can be collected:
○ Stat Counters (/proc/stat):
■ STAT_CPU_TIMES
● user: Time spent running in user mode
● nice: Time spent running niced user processes
● system: Time spent in system (kernel) mode
● idle: Time spent in the idle task
○ Mem Info Counters (/proc/meminfo):
■ Provides information such as free memory and anonymous memory.
○ VM Stat Counters (/proc/vmstat):
■ Provides information on virtual memory, such as page faults and pages paged in and out.
○ Note: cpufreq_period_ms is only available on Android SC-V2 (Android 12L) and above.
■ Otherwise, the following error will be encountered: No field named "cpufreq_period_ms" in proto SysStatsConfig.

data_sources: {
    config {
        name: "linux.sys_stats"
        target_buffer: 1
        sys_stats_config {
            stat_period_ms: 500
            stat_counters: STAT_CPU_TIMES
            meminfo_period_ms: 1000
            meminfo_counters: MEMINFO_ACTIVE_ANON
            meminfo_counters: MEMINFO_ACTIVE_FILE
            meminfo_counters: MEMINFO_INACTIVE_ANON
            meminfo_counters: MEMINFO_INACTIVE_FILE
            meminfo_counters: MEMINFO_KERNEL_STACK
            meminfo_counters: MEMINFO_MLOCKED
            meminfo_counters: MEMINFO_SHMEM
            meminfo_counters: MEMINFO_SLAB
            meminfo_counters: MEMINFO_SLAB_UNRECLAIMABLE
            meminfo_counters: MEMINFO_VMALLOC_USED
            meminfo_counters: MEMINFO_MEM_FREE
            meminfo_counters: MEMINFO_SWAP_FREE
            vmstat_period_ms: 1000
            vmstat_counters: VMSTAT_PGFAULT
            vmstat_counters: VMSTAT_PGMAJFAULT
            vmstat_counters: VMSTAT_PGFREE
            vmstat_counters: VMSTAT_PGPGIN
            vmstat_counters: VMSTAT_PGPGOUT
            vmstat_counters: VMSTAT_PSWPIN
            vmstat_counters: VMSTAT_PSWPOUT
            vmstat_counters: VMSTAT_PGSCAN_DIRECT
            vmstat_counters: VMSTAT_PGSTEAL_DIRECT
            vmstat_counters: VMSTAT_PGSCAN_KSWAPD
            vmstat_counters: VMSTAT_PGSTEAL_KSWAPD
            vmstat_counters: VMSTAT_WORKINGSET_REFAULT
            # Below field not available on < Android SC-V2 releases.
            cpufreq_period_ms: 500
        }
    }
}
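STAT_CPU_TIMES counters are cumulative, so utilization over an interval comes from the difference between two samples. The Python sketch below shows that arithmetic on the aggregate `cpu` line of /proc/stat, whose first four fields are user, nice, system and idle (in clock ticks): busy fraction = (Δuser + Δnice + Δsystem) / (Δuser + Δnice + Δsystem + Δidle). It deliberately ignores iowait, irq and the remaining fields, which is a simplifying assumption. The function names are hypothetical.

```python
def cpu_times(stat_text):
    """Hypothetical helper: (user, nice, system, idle) ticks from the 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            user, nice, system, idle = (int(x) for x in fields[1:5])
            return user, nice, system, idle
    raise ValueError("no aggregate 'cpu' line found")


def utilization(before_text, after_text):
    """Fraction of non-idle time between two /proc/stat snapshots (0.0 to 1.0)."""
    u0, n0, s0, i0 = cpu_times(before_text)
    u1, n1, s1, i1 = cpu_times(after_text)
    busy = (u1 - u0) + (n1 - n0) + (s1 - s0)
    total = busy + (i1 - i0)
    return busy / total if total else 0.0
```

For instance, if user grows by 60 ticks, system by 40 and idle by 100 between two snapshots, utilization is 100 / 200 = 0.5. Perfetto records the raw counters at stat_period_ms, so the same delta logic applies between consecutive samples in a trace.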

Perfetto Trace Config
Defining Data Sources (linux.sys_stats)

Perfetto Trace Config
Defining Data Sources (ftrace)
Capturing ftrace events gives developers insight into kernel code. They are useful for analyzing latency or performance issues outside of userspace, such as:
● Memory Events
● Low Memory Killer Events
● Sched Events

data_sources: {
    config {
        name: "linux.ftrace"
        target_buffer: 2
        ftrace_config {
            # Memory events
            ftrace_events: "power/suspend_resume"
            ftrace_events: "mm_event/mm_event_record"
            ftrace_events: "kmem/rss_stat"
            ftrace_events: "ion/ion_stat"
            ftrace_events: "dmabuf_heap/dma_heap_stat"
            ftrace_events: "kmem/ion_heap_grow"
            ftrace_events: "kmem/ion_heap_shrink"
            # LMKD events
            ftrace_events: "lowmemorykiller/lowmemory_kill"
            ftrace_events: "oom/oom_score_adj_update"
            ftrace_events: "oom/mark_victim"
            # sched events
            ftrace_events: "sched/sched_process_exit"
            ftrace_events: "sched/sched_process_free"
            ftrace_events: "sched/sched_switch"
            ftrace_events: "sched/sched_wakeup"
            ftrace_events: "sched/sched_wakeup_new"
            ftrace_events: "sched/sched_waking"
        }
    }
}

Perfetto Trace Config
Defining Data Sources (ftrace)
In order to capture CPU scheduling events, ftrace_events: "sched/sched_switch" needs to be added to the linux.ftrace data source.
With this enabled, the following can be captured:
○ Threads scheduled per CPU
○ Why a thread got descheduled (e.g. preemption, blocking on a mutex)
○ When a thread becomes runnable (pinpointing wakeups also relies on the sched/sched_wakeup or sched/sched_waking events)
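Combining wakeup events with sched_switch makes it possible to measure scheduling latency: how long a thread stayed runnable before it actually got a CPU. The Python sketch below computes this from a simplified, hypothetical list of `(ts_ns, kind, tid)` tuples. Here kind is "waking" when the thread is woken and "switch_in" when sched_switch puts it on a CPU. Real sched_switch events carry more fields (prev/next pid, prev_state), so this layout is an assumption for illustration only.

```python
def wakeup_latencies(events):
    """Hypothetical helper: return [(tid, latency_ns)] for waking -> switch_in pairs.

    events: iterable of (ts_ns, kind, tid) with kind in {"waking", "switch_in"}.
    A switch_in without a preceding waking (e.g. after preemption) is ignored.
    """
    pending = {}  # tid -> timestamp of its earliest unmatched wakeup
    latencies = []
    for ts, kind, tid in sorted(events):
        if kind == "waking":
            pending.setdefault(tid, ts)
        elif kind == "switch_in" and tid in pending:
            latencies.append((tid, ts - pending.pop(tid)))
    return latencies
```

Long latencies while CPUs sit idle typically point at affinity or priority problems. Long latencies while every CPU is busy point at contention.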

Perfetto Trace Config
Atrace Categories:
Predefined groups of trace events that make it easier to enable tracing for specific areas of the system.

Fine-grained Process Tracing:
The atrace_apps functionality in Perfetto enables selective tracing of specific applications on Android. It allows you to capture trace data only from the processes of interest.

data_sources: {
    config {
        name: "linux.ftrace"
        target_buffer: 2
        ftrace_config {
            # atrace categories
            atrace_categories: "aidl"
            atrace_categories: "am"
            atrace_categories: "dalvik"
            atrace_categories: "binder_lock"
            atrace_categories: "binder_driver"
            atrace_categories: "disk"
            atrace_categories: "freq"
            atrace_categories: "idle"
            atrace_categories: "gfx"
            atrace_categories: "hal"
            atrace_categories: "pm"
            atrace_categories: "power"
            atrace_categories: "rro"
            # atrace apps
            atrace_apps: "lmkd"
            atrace_apps: "system_server"
            atrace_apps: "com.android.systemui"
            atrace_apps: "com.google.android.gms"
            atrace_apps: "com.google.android.gms.persistent"
            atrace_apps: "android:ui"
            atrace_apps: "com.google.android.apps.maps"
        }
    }
}

Perfetto Trace Config
Writing to a Trace Output File:
If no recording time limit is specified, one will have to manually terminate the tracing session.
If duration_ms is specified, the trace will terminate automatically.
If write_into_file is true, Perfetto will periodically stream results into a trace file.
file_write_period_ms defines the drain period. A shorter period means a smaller userspace buffer is required; however, it increases the performance intrusiveness of tracing.
max_file_size_bytes is used to cap the size of a trace file.
flush_period_ms is used to periodically issue a Flush() to all data sources, forcing them to commit their data into the tracing service.

# No recording time limit (press CTRL+C to stop recording).
# Alternatively: uncomment the line below to set a time limit.
#duration_ms: 1800000
write_into_file: true
file_write_period_ms: 5000
max_file_size_bytes: 100000000000
flush_period_ms: 5000

Anatomy of a Trace: Binder Transactions
There are two types of binder transactions:
1. Unidirectional: Using the oneway keyword in the AIDL language,
these transactions do not wait for a reply after sending a parcel.
2. Bidirectional: The transmitting end is blocked until it receives a reply.
Note: This is only available from UDC (Android 14) onwards.

Anatomy of a Trace: Bidirectional Transactions
Bidirectional Transaction:
Identified by a corresponding binder transaction and binder reply pair.

Anatomy of a Trace: Unidirectional Transactions
Unidirectional Transaction:
Indicated by an arrow in a Perfetto trace.
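The two transaction types can be told apart with a small matching step. A bidirectional transaction is closed by its reply, while a oneway transaction has nothing to wait for. The Python sketch below pairs hypothetical `(ts_ns, kind, txn_id, oneway)` records, where kind is "transaction" or "reply" and a reply reuses its transaction's id, and reports how long each synchronous caller was blocked. The record layout is an assumption for illustration. In a real trace, the underlying data comes from binder events such as those enabled by the binder_driver category.

```python
def binder_blocking_times(records):
    """Hypothetical helper: pair binder transactions with their replies.

    records: iterable of (ts_ns, kind, txn_id, oneway) where kind is
    "transaction" or "reply" and a reply reuses its transaction's txn_id.
    Returns (dict txn_id -> ns the caller was blocked, list of oneway txn_ids).
    """
    open_txns = {}
    waited = {}
    oneway_ids = []
    for ts, kind, txn_id, oneway in sorted(records):
        if kind == "transaction":
            if oneway:
                oneway_ids.append(txn_id)  # caller does not block; no reply expected
            else:
                open_txns[txn_id] = ts  # caller blocks until the reply arrives
        elif kind == "reply" and txn_id in open_txns:
            waited[txn_id] = ts - open_txns.pop(txn_id)
    return waited, oneway_ids
```

Summing the blocked time per calling thread is one way to quantify a finding like "an excess of costly binder transactions" later in this walkthrough.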

Common Pitfalls

Common Pitfalls
External Process Interference
One reason a trace may be empty is that another process is using ftrace. Run the following command to check whether current_tracer is set to nop:
> adb shell cat /sys/kernel/tracing/current_tracer
> adb shell cat /sys/kernel/debug/tracing/current_tracer  # older kernels may still use debugfs
If it is not, run the following (as root, e.g. after adb root) to reset current_tracer to nop and stop the existing tracing:
> adb shell "echo nop > /sys/kernel/tracing/current_tracer"
> adb shell "echo 0 > /sys/kernel/tracing/tracing_on"

Common Pitfalls
Insufficient Buffer Size
Another common pitfall is insufficient buffer size. Increasing the buffer size may alleviate scenarios in which key CUJ data is dropped.

Excessive Event Collection
If too many events are being collected, some of them can be dropped to avoid trampling the output trace file. For example, including sys_enter and sys_exit will lead to all system calls being logged. The trace below demonstrates this: the tracks do not terminate.

data_sources: {
    config {
        name: "linux.ftrace"
        target_buffer: 2
        ftrace_config {
            # Do not include the below:
            ftrace_events: "raw_syscalls/sys_enter"
            ftrace_events: "raw_syscalls/sys_exit"
        }
    }
}
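A back-of-the-envelope estimate helps decide whether a buffer is large enough. The buffer must hold roughly event rate × average event size × the time window it has to cover. Without write_into_file, that window is the whole period you need to keep. With write_into_file, it is the drain period (file_write_period_ms). The Python sketch below does that arithmetic. The function name, the headroom factor and the example rate and event size are made-up placeholders, not measurements, so measure your own trace before relying on the numbers.

```python
import math


def required_buffer_kb(events_per_sec, avg_event_bytes, window_ms, headroom=1.5):
    """Hypothetical helper: rough buffer size (KB) to hold `window_ms` of data.

    window_ms: the whole period to keep (ring buffer, no write_into_file),
    or file_write_period_ms when write_into_file is enabled.
    headroom: assumed safety factor for bursts; tune as needed.
    """
    total_bytes = events_per_sec * avg_event_bytes * (window_ms / 1000) * headroom
    return math.ceil(total_bytes / 1024)


if __name__ == "__main__":
    # Hypothetical workload: 50k events/s at ~64 bytes each over a 10 s CUJ.
    print(required_buffer_kb(50_000, 64, 10_000))  # 46875 KB
```

The same workload streamed with a 5 s drain period needs about half of that, which is why shortening file_write_period_ms allows a smaller buffer.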

Trace Analysis

Trace Analysis Key Steps
Summary:
1. Narrow the search space: Determine the beginning and ending points via Android system logs or atrace logs.

2. Inspect CPU, memory tracks, etc.: This helps identify symptoms of a regression so that the analysis window can be tightened.

3. Understand context: After capturing a smaller window, it is possible to understand what actions are being performed (e.g. what is occurring at this point in the user switch lifecycle?).

4. Identify culprit process: Given context, it is possible to pinpoint offending processes in the trace. Adding more logging via atrace also allows one to trace points in a codepath.

5. Analyze thread-level interactions: Looking at markers such as thread state and binder transactions during the window allows one to make informed hypotheses.

Trace Analysis Summary
Sample flow illustrating:
1. CUJ profiling
2. Trace recording
3. CUJ instrumenting
4. Performance Fixes

Flowchart: Start → Select the CUJ to profile → Record a Trace → Requires more instrumentation?
○ Yes: Instrument the CUJ, then record a new trace.
○ No: Implement Fix → Performance Issue Fixed? Yes: End. No: record a new trace.

Trace Analysis Walkthrough
Looking at a trace can be overwhelming. The following key steps help narrow down the problem area and root-cause a performance issue. In this walkthrough, the trace captured a user switch from user 10 to user 11.
1. Narrow the search space: Use Android logs to identify key starting and stopping points. In this case, UserController.startUser-11-fg-start-mode-1 and onCompletedEventUser 11 act as the start and stop points, respectively.
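Step 1 amounts to finding two marker timestamps and discarding everything outside them. The Python sketch below does this for a hypothetical list of `(ts_ns, dur_ns, name)` slices, using the two marker names from this walkthrough. The slice layout and helper name are assumptions for illustration. In practice the same filter would usually be expressed as an SQL query against the trace, as shown later in this deck.

```python
def narrow_window(slices, start_marker, end_marker):
    """Hypothetical helper: return (window_start, window_end, slices in window).

    slices: iterable of (ts_ns, dur_ns, name). The window runs from the start of
    the first `start_marker` slice to the end of the first `end_marker` slice
    that begins at or after it.
    """
    ordered = sorted(slices)
    start = next((ts for ts, _dur, name in ordered if name == start_marker), None)
    if start is None:
        raise ValueError(f"start marker {start_marker!r} not found")
    end = next((ts + dur for ts, dur, name in ordered
                if name == end_marker and ts >= start), None)
    if end is None:
        raise ValueError(f"end marker {end_marker!r} not found after start")
    # Keep only slices that lie entirely inside the window.
    inside = [(ts, dur, name) for ts, dur, name in ordered
              if ts >= start and ts + dur <= end]
    return start, end, inside
```

Everything that follows in the walkthrough (CPU tracks, thread state, binder activity) is then examined only inside [window_start, window_end].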

Trace Analysis Walkthrough
2. Inspect CPU and memory tracks: In this case, it is apparent that there are big and gold cores being underutilized during the user switch.

Big and Gold Cores Idle
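The idle-core observation in step 2 can be quantified by clipping each CPU's running intervals to the analysis window and dividing by the window length. The sketch below assumes a hypothetical list of `(cpu, ts_ns, dur_ns)` intervals during which a non-idle thread ran, similar in spirit to the per-CPU scheduling data that sched_switch produces. It also assumes that intervals on one CPU do not overlap. The function name is illustrative.

```python
from collections import defaultdict


def cpu_utilization(sched_slices, window_start, window_end, num_cpus):
    """Hypothetical helper: return {cpu: busy fraction of [window_start, window_end)}.

    sched_slices: iterable of (cpu, ts_ns, dur_ns) intervals during which a
    non-idle thread ran on that CPU (assumed non-overlapping per CPU).
    """
    length = window_end - window_start
    busy = defaultdict(int)
    for cpu, ts, dur in sched_slices:
        # Clip the interval to the window; intervals entirely outside add nothing.
        overlap = min(ts + dur, window_end) - max(ts, window_start)
        if overlap > 0:
            busy[cpu] += overlap
    return {cpu: busy[cpu] / length for cpu in range(num_cpus)}
```

CPUs whose fraction stays near zero during the user-switch window correspond to the idle big and gold cores highlighted in this trace.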

Trace Analysis Walkthrough
After identifying the area of interest, it is necessary to zoom in on events occurring during this underutilization.
This can be achieved by zeroing in on log events or processes that have significant activity during that period.

Area of Interest

Trace Analysis Walkthrough
3. Understand Context: Here, only 4 out of 8 cores are being utilized, which contributes to a prolonged user switch. This idleness appears early in the user switch, while user 11 is being started. SystemServiceManager is responsible for starting system services during user initialization and waits until all services are created. com.android.role.RoleService is the last service to be initialized and also takes the most time.

com.android.role.RoleService
onUserStarting method returns

Trace Analysis Walkthrough
4. Identify Culprit Process: Android logs show that the com.google.android.permissioncontroller process is largely responsible for starting the RoleService.
Zooming in further, the RoleControllerService thread handles the majority of the initialization.

Trace Analysis Walkthrough
5. Look for thread-level interactions: Inspecting the com.google.android.permissioncontroller process and its threads may reveal further details about thread state. For example, a long period of uninterruptible sleep could indicate heavy I/O usage. In this case, nothing out of the ordinary stands out.

Trace Analysis Walkthrough
5. Analyzing the RoleControllerService thread reveals an excess of costly binder transactions.
These inter-process communication transactions are the cause of the slowdown.

Advanced Topics: SQL Queries

Data Mining Using SQL Queries
Beyond visually inspecting system issues via the Perfetto UI, it is also possible to gain a deeper understanding through SQL queries.
SQL queries can be run from the query interface shown below:

Data Mining Using SQL Queries
One common use is computing the CPU time of slices. The first step is to build a table that links each slice with the states of its thread.

-- Map each slice to the thread (utid) whose track it is on.
DROP VIEW IF EXISTS slice_with_utid;

CREATE VIEW slice_with_utid AS
SELECT
  ts,
  dur,
  slice.name AS slice_name,
  slice.id AS slice_id,
  utid,
  thread.name AS thread_name
FROM slice
JOIN thread_track ON thread_track.id = slice.track_id
JOIN thread USING (utid);

-- Split each slice at the boundaries of the thread states that overlap it.
DROP TABLE IF EXISTS slice_thread_state_breakdown;

CREATE VIRTUAL TABLE slice_thread_state_breakdown
USING SPAN_LEFT_JOIN(
  slice_with_utid PARTITIONED utid,
  thread_state PARTITIONED utid
);
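To make the SPAN_LEFT_JOIN step concrete, the sketch below re-implements its effect in plain Python for one slice table and one thread-state table. Each slice is cut at the boundaries of the overlapping thread states on the same utid. Any part of a slice not covered by a state is kept with a None state, which is what makes it a left join. The sketch assumes that thread states never overlap within one utid, as is the case for scheduler states. Span and span_left_join are hypothetical names, not Perfetto APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    ts: int      # start timestamp
    dur: int     # duration
    utid: int    # partition key (thread)
    label: str   # slice name or thread state


def span_left_join(slices: list[Span], states: list[Span]) -> list[tuple[int, int, int, str, Optional[str]]]:
    """Return (ts, dur, utid, slice_name, state) rows, one per slice/state overlap."""
    out = []
    for s in slices:
        s_end = s.ts + s.dur
        cursor = s.ts
        overlaps = sorted(
            (st for st in states
             if st.utid == s.utid and st.ts < s_end and st.ts + st.dur > s.ts),
            key=lambda st: st.ts,
        )
        for st in overlaps:
            start = max(st.ts, s.ts)
            end = min(st.ts + st.dur, s_end)
            if start > cursor:
                # Gap with no thread state: kept, with a NULL-like state.
                out.append((cursor, start - cursor, s.utid, s.label, None))
            out.append((start, end - start, s.utid, s.label, st.label))
            cursor = end
        if cursor < s_end:
            out.append((cursor, s_end - cursor, s.utid, s.label, None))
    return out
```

Summing the `dur` of the rows whose state is 'Running' gives the slice's CPU time. That is what the GROUP BY query on slice_thread_state_breakdown computes.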

Data Mining Using SQL Queries
From the previous table, the CPU time of each slice (the time its thread spent in the Running state) can be listed.

-- dur is the length of each slice / thread-state overlap.
SELECT slice_id, slice_name, SUM(dur) AS cpu_time
FROM slice_thread_state_breakdown
WHERE state = 'Running'
GROUP BY slice_id;
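SPAN_LEFT_JOIN is specific to Perfetto's trace processor. To check the per-slice CPU time in stock SQLite, the sketch below computes the same Running time with an ordinary join on overlapping intervals. Each overlap's length is MIN(end) - MAX(start). The two tables are minimal stand-ins with hypothetical data.

```python
import sqlite3

# Running time per slice without SPAN_LEFT_JOIN: join each slice to the
# thread states of the same utid that overlap it, then sum the overlap lengths.
CPU_TIME_SQL = """
SELECT s.slice_id,
       s.slice_name,
       SUM(MIN(s.ts + s.dur, t.ts + t.dur) - MAX(s.ts, t.ts)) AS cpu_time
FROM slice_with_utid AS s
JOIN thread_state AS t
  ON t.utid = s.utid
 AND t.ts < s.ts + s.dur
 AND t.ts + t.dur > s.ts
WHERE t.state = 'Running'
GROUP BY s.slice_id, s.slice_name
ORDER BY s.slice_id
"""


def demo_cpu_time_per_slice() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE slice_with_utid (ts INTEGER, dur INTEGER, slice_name TEXT,
                                      slice_id INTEGER, utid INTEGER);
        CREATE TABLE thread_state (ts INTEGER, dur INTEGER, utid INTEGER, state TEXT);
        INSERT INTO slice_with_utid VALUES (0, 100, 'inflate', 1, 1), (50, 50, 'bind', 2, 2);
        INSERT INTO thread_state VALUES
          (10, 30, 1, 'Running'), (40, 20, 1, 'S'), (60, 60, 1, 'Running'),
          (0, 200, 2, 'S');
        """
    )
    return conn.execute(CPU_TIME_SQL).fetchall()
```

Slice 1 gets 30 + 40 = 70 units of Running time. Slice 2 never runs, so it produces no row, just as it would be filtered out by `WHERE state = 'Running'` in the Perfetto version.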

Making Debugging Easier

Make App Debugging Easier
Performance Analysis Flow:
1. CUJ (critical user journey) profiling
2. Trace recording
3. CUJ instrumenting
4. Performance Fixes

Figure: Performance analysis flowchart. Start, Select the CUJ to profile, Record a Trace, Requires more instrumentation? (Yes: Instrument the CUJ; No: Implement Fix), Performance Issue Fixed? (Yes: End).

Make App Debugging Easier
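A powerful debugging aid is adding atrace logs (trace sections), which appear in Perfetto.
Java code can add trace sections using android.os.Trace. Apps use the public Trace.beginSection() / Trace.endSection() methods, while platform code can use Trace.traceBegin() / Trace.traceEnd() with a trace tag, as in the snippet below.
Native code can add trace sections and counters using ATrace_beginSection() / ATrace_endSection() / ATrace_setCounter(), declared in <android/trace.h>.

Trace.traceBegin(TRACE_TAG, "Class#method");
try {
    // Code to be traced.
} finally {
    Trace.traceEnd(TRACE_TAG);  // Always close the section, even if an exception is thrown.
}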

Make App Debugging Easier
Before:

After:

Performance Tuning

Performance Tuning After Boot
What is Performance Tuning?
With the aid of Perfetto, it is possible to identify performance issues
and analyze their root cause. The next step is to implement solutions
to these issues. One way to achieve this is by iteratively tuning the
performance of the system.

Post Boot Tuning


One opportunity for performance tuning is the post-boot period.
After boot completes, there is heavy resource contention as multiple
applications try to initialize at the same time. This is a classic
instance of the thundering herd problem: many processes compete for
limited system resources, and performance degrades. There is an
opportunity to improve both memory and CPU usage during the critical
window after boot completes.

Memory Tuning

Kernel kswapd
Kswapd watermark levels
● kswapd is a kernel thread that manages available free memory.
● The kernel uses 3 watermarks per memory zone to track memory pressure:
  ○ Min, Low, and High.
● When free memory <= Low and > Min:
  ○ kswapd performs asynchronous (indirect) memory reclaim.
● When free memory <= Min:
  ○ Allocating tasks perform synchronous (direct) memory reclaim themselves.
  ○ The system can become unstable.
● When free memory >= High:
  ○ kswapd temporarily enters the sleep state.
  ○ It periodically checks the memory pressure.

Reclaim types
● Indirect memory reclaim
  ○ Increases kswapd CPU usage.
  ○ May slow down other processes, depending on the CPU and memory pressure.
● Direct memory reclaim
  ○ New allocations are blocked until enough memory has been reclaimed to bring free memory back above the min watermark.

Figure: Watermark levels as a percentage of total memory. Above High: sleep & check. High watermark: indirect reclaim. Low watermark: aggressive indirect reclaim. Min watermark: direct reclaim.
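The watermark rules above can be summarized as a small decision function. This is a simplified single-zone model that ignores allocation flags and orders. Watermarks and reclaim_mode are hypothetical names used for illustration. The model also captures the hysteresis behind the rules: once woken, kswapd keeps reclaiming until free memory reaches the High watermark, and only then sleeps.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Watermarks:
    min_kb: int
    low_kb: int
    high_kb: int


def reclaim_mode(free_kb: int, wm: Watermarks, kswapd_awake: bool) -> tuple[str, bool]:
    """Return (reclaim mode, whether kswapd is awake afterwards) for a free-memory level."""
    if free_kb <= wm.min_kb:
        # Allocating tasks reclaim synchronously (direct reclaim) and stall meanwhile.
        return "direct", True
    if free_kb <= wm.low_kb:
        # kswapd is woken to reclaim in the background (indirect reclaim).
        return "kswapd", True
    if free_kb >= wm.high_kb:
        # Enough free memory: kswapd goes back to sleep.
        return "sleep", False
    # Between Low and High: kswapd continues only if it was already reclaiming.
    return ("kswapd", True) if kswapd_awake else ("none", False)


def simulate(free_levels: list[int], wm: Watermarks) -> list[str]:
    """Feed a sequence of free-memory readings through the model."""
    awake = False
    modes = []
    for free_kb in free_levels:
        mode, awake = reclaim_mode(free_kb, wm, awake)
        modes.append(mode)
    return modes
```

With watermarks of 100/125/150 kB, the readings 200, 140, 120, 140, 90 and 160 give sleep, none, kswapd, kswapd, direct and sleep. The second 140 reading still runs kswapd because kswapd was woken by the earlier drop below Low.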

kswapd tuning
Kernel knobs for tuning kswapd behavior:
● /proc/sys/vm/swappiness - Defines how aggressively the kernel swaps out memory pages of inactive processes.
  ○ Range: 0-100. Default: 60.
  ○ High values can cause the kernel to swap out processes even when enough memory is available.
  ○ Low values can cause the kernel not to swap out processes even when available memory is low.
  ○ Recommendation:
    ■ Devices with high physical memory: use lower swappiness values.
    ■ Devices with low physical memory: use higher swappiness values.
● /proc/sys/vm/min_free_kbytes - Amount of free memory kept in reserve at all times. Defines the min watermark across all memory zones.
  ○ Recommended value range: 1% - 2% of total system memory.
  ○ High values can scale up the watermark buffer spaces, leading to:
    ■ kswapd freeing more memory than needed.
    ■ Frequent kswapd invocations, causing CPU contention to spike.
  ○ Low values can cause kswapd not to free up enough memory, leading to:
    ■ System slowdown
    ■ Hangs/crashes
    ■ Memory fragmentation
● /proc/sys/vm/watermark_scale_factor - Scales the buffer spaces between memory zone watermarks.
  ○ Range: 0 - 1000. Default: 10.
    ■ 10 means the buffer space is 0.1% of available memory.
    ■ 1000 means the buffer space is 10% of available memory.
  ○ Low values can cause too much direct reclaim, or kswapd not freeing up enough memory in a single pass.
  ○ High values can cause kswapd to free up more memory than needed.
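How min_free_kbytes and watermark_scale_factor interact can be worked out from the gap formula in Linux's __setup_per_zone_wmarks(). The distance between adjacent watermarks is the larger of min / 4 and managed_memory × watermark_scale_factor / 10000. The sketch below applies this formula to a single zone that holds all memory, and works in kB rather than pages. Real kernels split min_free_kbytes across zones in proportion to their size, so treat the results as approximate. zone_watermarks_kb is a hypothetical helper.

```python
from __future__ import annotations


def zone_watermarks_kb(min_free_kbytes: int, managed_kb: int,
                       watermark_scale_factor: int) -> dict[str, int]:
    """Approximate min/low/high watermarks (kB) for a single zone holding all memory."""
    wm_min = min_free_kbytes
    # Gap between adjacent watermarks: max(min / 4, managed * factor / 10000).
    gap = max(wm_min // 4, managed_kb * watermark_scale_factor // 10000)
    return {"min": wm_min, "low": wm_min + gap, "high": wm_min + 2 * gap}
```

Assume 4 GiB (4194304 kB) of managed memory; the slides do not state the device's memory size. With 144 MiB and a factor of 1, the gap is max(36864, 419) = 36864 kB. So min_free_kbytes / 4 dominates, and the scale factor has no effect. This is consistent with the example below, where each of the Low and High bands (0.88%) is about a quarter of the Min band (3.51%). With 60 MiB and a factor of 109, the scale-factor term dominates instead: 45717 kB, about 1.09% of managed memory.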

kswapd tuning example
Before:
● /proc/sys/vm/watermark_scale_factor - 1
● /proc/sys/vm/min_free_kbytes - 144 MiB
● /proc/sys/vm/swappiness - 60

After:
● /proc/sys/vm/watermark_scale_factor - 109
● /proc/sys/vm/min_free_kbytes - 60 MiB
● /proc/sys/vm/swappiness - 60

Watermark Level Size Relative to Total Physical Memory

Level        Before    After
Remaining    94.72%    96.57%
High          0.88%     0.95%
Low           0.88%     0.95%
Min           3.51%     1.51%

Min is the memory below the min watermark. Low is the band from the min watermark up to the low watermark, and High is the band from there up to the high watermark. Remaining is the memory above the high watermark.

kswapd tuning example

Charts (Before vs. After): kswapd CPU time (ms), working set refaults (file), and pages stolen by kswapd.

Dex Optimization

Dex Optimization
What is Dex Optimization?
By default in Android, apps are executed in interpreted mode. Dex
optimization compiles selected code paths ahead of time (AOT) to
machine code, which accelerates code execution.

Why is this Important?


Background dex optimization already occurs regularly as users interact
with their apps. However, after initial installation, boot time
performance may be degraded if all apps run in interpreted mode.
There is an opportunity to compile key apps ahead of time to greatly
reduce boot times.

Which Apps should be Dex Pre-Optimized?


When apps run in interpreted mode, they also undergo JIT
compilation. Post-boot, heavy JIT compilation leads to a high degree
of CPU contention, further degrading startup times. Processes with
high JIT CPU time therefore point to apps that can be
dex pre-optimized, largely forgoing JIT.

Dex Optimization
Heavy JIT Compilation in Perfetto
JIT compilation activity can be visualized via Perfetto.

Within a specific process, one can identify the JIT thread pool tracks. In this example,
Car Assistant is displaying a large number of JIT compilation events.

Dex Optimization
Perfetto Query for Top Processes with the Most JIT CPU Time
The following query produces a table of processes ranked by the CPU time their threads spend running JIT compilation:

INCLUDE PERFETTO MODULE slices.slices;

-- Top-level (depth 0) slices with their thread and process information.
DROP VIEW IF EXISTS interesting_slices_d0;
CREATE VIEW interesting_slices_d0 AS
SELECT id AS slice_id, ts, dur, name, track_id, track_name, thread_name, utid, tid,
  process_name, upid, pid
FROM _slice_with_thread_and_process_info
WHERE depth = 0;

-- Split each slice by the thread states that overlap it.
DROP TABLE IF EXISTS slice_thread_state_breakdown;
CREATE VIRTUAL TABLE slice_thread_state_breakdown
USING SPAN_LEFT_JOIN(
  interesting_slices_d0 PARTITIONED utid,
  thread_state PARTITIONED utid
);

-- Prefixes: slice name up to the first ' ' (or ','), process name up to the first ':'.
-- substr(x, 1, pos - 1) keeps names that have no delimiter intact.
SELECT
  SUM(dur) AS total_duration,
  COUNT(*) AS instances,
  CASE
    WHEN instr(name, ' ') > 0 THEN substr(name, 1, instr(name, ' ') - 1)
    WHEN instr(name, ',') > 0 THEN substr(name, 1, instr(name, ',') - 1)
    ELSE name
  END AS prefix_name,
  CASE
    WHEN instr(process_name, ':') > 0 THEN substr(process_name, 1, instr(process_name, ':') - 1)
    ELSE process_name
  END AS process_name_prefix
FROM slice_thread_state_breakdown
WHERE state = 'Running' AND prefix_name = 'JIT'
GROUP BY prefix_name, process_name_prefix
ORDER BY total_duration DESC;
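Perfetto's SQL engine is built on SQLite, so prefix-extraction expressions like the ones above can be checked with Python's sqlite3 module. The sketch runs a JIT aggregation that takes prefixes with substr(x, 1, pos - 1). It uses a small stand-in for slice_thread_state_breakdown, with hypothetical process names and durations. It also shows a SQLite quirk worth knowing when writing these expressions: substr(x, 0, n) returns only n - 1 characters. As a result, substr(x, 0, length(x)) silently drops the last character of a name.

```python
import sqlite3

# JIT CPU time per process: slice-name prefix before the first ' ' or ',',
# process-name prefix before the first ':'.
JIT_SQL = """
SELECT
  SUM(dur) AS total_duration,
  COUNT(*) AS instances,
  CASE
    WHEN instr(name, ' ') > 0 THEN substr(name, 1, instr(name, ' ') - 1)
    WHEN instr(name, ',') > 0 THEN substr(name, 1, instr(name, ',') - 1)
    ELSE name
  END AS prefix_name,
  CASE
    WHEN instr(process_name, ':') > 0
      THEN substr(process_name, 1, instr(process_name, ':') - 1)
    ELSE process_name
  END AS process_name_prefix
FROM slice_thread_state_breakdown
WHERE state = 'Running' AND prefix_name = 'JIT'
GROUP BY prefix_name, process_name_prefix
ORDER BY total_duration DESC
"""

ROWS = [
    ("JIT compiling void a.B.c()", "com.example.assistant", "Running", 300),
    ("JIT compiling int a.B.d()", "com.example.assistant:bg", "Running", 200),
    ("JIT compiling void m.N.o()", "com.example.maps", "Running", 100),
    ("JIT compiling void m.N.p()", "com.example.maps", "S", 999),   # not running
    ("Inflate", "com.example.maps", "Running", 500),                # not JIT
]


def jit_cpu_time() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE slice_thread_state_breakdown "
                 "(name TEXT, process_name TEXT, state TEXT, dur INTEGER)")
    conn.executemany("INSERT INTO slice_thread_state_breakdown VALUES (?, ?, ?, ?)", ROWS)
    return conn.execute(JIT_SQL).fetchall()


def substr_from_zero(text: str) -> str:
    # SQLite treats start 0 as one position before the first character.
    conn = sqlite3.connect(":memory:")
    return conn.execute("SELECT substr(?1, 0, length(?1))", (text,)).fetchone()[0]
```

In this stand-in data, the ':bg' subprocess is folded into com.example.assistant, so that package ranks first with 500 units of JIT CPU time.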

Dex Optimization Configuration
How to Configure Dex Pre-Optimization
Add packages to the following makefile configuration:

PRODUCT_DEXPREOPT_SPEED_APPS += \
    MapsCarPrebuilt \

For more details refer to
https://source.android.com/docs/core/runtime/configure#build_options.

Dex Optimization Configuration
How to Verify that an App is Dex Pre-Opted
Run the following ADB command:

$ adb shell pm art dump com.google.android.apps.maps

# Older releases may need to use this command instead:
$ adb shell dumpsys package dexopt | grep -i -A 2 com.google.android.apps.maps

The following output indicates that Google Maps is executed in interpreted mode:

[com.google.android.apps.maps]
  path: /system/product/priv-app/MapsCarPrebuilt/MapsCarPrebuilt.apk
    x86_64: [status=verify] [reason=prebuilt]

The following output indicates that Google Maps was dex pre-opted:

[com.google.android.apps.maps]
  path: /product/priv-app/MapsCarPrebuilt/MapsCarPrebuilt.apk
    x86_64: [status=speed] [reason=prebuilt] [primary-abi]
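When checking many packages, the status field can be extracted from the dump text programmatically. This minimal sketch assumes the "[status=<compiler filter>]" format shown above. The set of filters treated as AOT-compiled is an assumption based on ART's compiler filter names (space, speed, everything and their -profile variants). The profile-guided variants compile only hot code.

```python
import re

# ART compiler filters that produce AOT-compiled code (assumed list).
COMPILED_FILTERS = {
    "space-profile", "space",
    "speed-profile", "speed",
    "everything-profile", "everything",
}

STATUS_RE = re.compile(r"\[status=([A-Za-z-]+)\]")


def dexopt_statuses(dump: str) -> list[str]:
    """Return every compiler-filter status found in the dump, one per ABI line."""
    return STATUS_RE.findall(dump)


def is_aot_compiled(dump: str) -> bool:
    """True if at least one status was found and all of them are compiled filters."""
    statuses = dexopt_statuses(dump)
    return bool(statuses) and all(s in COMPILED_FILTERS for s in statuses)
```

Applied to the two outputs above, the "verify" dump is reported as not compiled and the "speed" dump as compiled.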

Further Materials / Important Links
Summary of useful information per section:
Trace Configuration:
https://perfetto.dev/docs/reference/trace-config-proto
How to Collect a Perfetto Trace:
https://perfetto.dev/docs/quickstart/android-tracing
Android Boot Tracing:
https://perfetto.dev/docs/case-studies/android-boot-tracing
CPU Tracks:
https://perfetto.dev/docs/data-sources/cpu-scheduling
Memory Tracks:
https://perfetto.dev/docs/data-sources/memory-counters
Atrace Logging:
https://perfetto.dev/docs/data-sources/atrace

Thank you

