The Ultimate Guide to
Java Performance Tuning
Ender Aydin Orak
koders.co
1 INTRODUCTION
WHAT YOU WILL LEARN
• Application performance principles & methods
• JVM structure and internals regarding application performance
• Garbage collection types and when to use which
• Monitoring, profiling, tuning and troubleshooting JVM applications
• Using OS and JVM tools for better application performance
• Applying performance best practices
• Java language-level tips & tricks
YOU WILL PRACTICE ON:
• Deadlocks • Collections
• Memory leaks • Locks
• Lock contention • Multithreading
• CPU utilization • Best practices
Performance Approaches
• Top-Down: Focus on the top-level application first
• Used by application developers (our approach)
• Bottom-Up: Focus on the lowest level first: the CPU
• Used by performance specialists
Performance Tuning Steps
1. Monitoring
2. Profiling
3. Tuning
JVM Overview &
2 INTERNALS
Objectives
• JVM Runtime & Architecture
• Command Line Options
• VM Life Cycle
• Class Loading
JAVA PROGRAMMING LANGUAGE
• Object oriented, Garbage collected*
• Class based
• .java files (source) compiled into .class files (bytecode)
• JVM executes platform independent bytecodes
“All problems in computer science can be
solved by another level of indirection”
–DAVID WHEELER
JVM Overview
• JVM: Java Virtual Machine
• A specification (JCP, JSR)
• Can have multiple implementations
• OpenJDK, HotSpot*, JRockit (Oracle), IBM J9 and many
more
• Platform independent: “Write once, run anywhere”
“All non-trivial abstractions, to some
degree, are leaky.”
–JOEL SPOLSKY
HOTSPOT VM ARCHITECTURE
COMMAND LINE OPTIONS
• Standard: Required by JVM specification, standard
on all implementations (-server, -classpath)
• Nonstandard: JVM implementation dependent. (Start
with -X)
• Developer Options: Non-stable, JVM implementation
dependent options for specific cases (Start with -XX in
HotSpot VM)
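As a rough illustration of the three option groups above, the sketch below lists the options the running JVM was started with and labels each by its prefix. The class name JvmOptions and the classify helper are hypothetical names chosen for this example; the prefix rule is a simplification of the slide, not an official classification.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmOptions {
    /** Labels a JVM argument by prefix, following the grouping on the slide. */
    static String classify(String arg) {
        if (arg.startsWith("-XX:")) {
            return "developer (-XX)";
        }
        if (arg.startsWith("-X")) {
            return "nonstandard (-X)";
        }
        return "standard";
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (e.g. -Xmx512m), not to main()
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            System.out.println(classify(arg) + ": " + arg);
        }
    }
}
```

Running it with, for example, -Xmx256m and -XX:+UseSerialGC on the command line prints one labeled line per option.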
JVM LIFE CYCLE
1. Parse command line options
2. Establish heap sizes and JIT compiler (if not specified)
3. Establish environment variables (CLASSPATH, etc.)
4. Fetch Main-Class from Manifest (if not specified)
5. Create HotSpot VM (JNI_CreateJavaVM)
6. Load Main-Class and get main method attributes
7. Invoke main method passing provided command line arguments
PERFORMANCE
3 Overview
Objectives
• Key concepts regarding application performance
• Common performance problems and principles
• Methodology to follow in solving problems
QUESTIONS & Expectations
• Expected throughput?
• Acceptable latency per request?
• How many concurrent users/tasks?
• Acceptable garbage collection latency?
Terminology
• CPU Utilization: Percentage of the CPU usage
(user+kernel)
• User CPU Utilization: the percent of time the application
spends in application code
TERMINOLOGY
• Memory Utilization: Memory usage percentage
(ram/swap)
• Swapping should be avoided at all times.
TERMINOLOGY
• Lock Contention: The case where a thread or process
tries to acquire a lock held by another process or
thread.
• Prevents concurrency and utilization. Should be avoided as
much as possible.
TERMINOLOGY
• Network & Disk I/O Utilization: The amount of data
sent and received via network and disk.
• Should be traced and used carefully.
Performance
• Aspects of performance:
• Responsiveness
• Throughput
• Memory Footprint
• Startup Time
• Scalability
RESPONSIVENESS
• Ability of a system to complete assigned tasks within
a given time
• Critical for most modern software applications
(Web, Desktop, CRUD apps, Web services)
• Long pause times are not acceptable
• The focus is on responding in short periods of time
THROUGHPUT
• The amount of work done in a specific period of time.
• Critical for some specific application types
(e.g. Data analysis, Batch operations, Report generation)
• High pause times are acceptable
• Focus is on how much work is getting done over a longer
period of time
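To make the difference between responsiveness and throughput concrete, here is a minimal sketch that measures both for the same task: the worst single-call latency and the number of operations per second. LoadMeter and its members are hypothetical names for this example; a naive timing loop like this ignores JIT warm-up, so treat the numbers as indicative only.

```java
final class LoadMeter {
    /** Outcome of one measurement run. */
    static final class Result {
        final int operations;
        final long totalNanos;
        final long maxLatencyNanos;

        Result(int operations, long totalNanos, long maxLatencyNanos) {
            this.operations = operations;
            this.totalNanos = totalNanos;
            this.maxLatencyNanos = maxLatencyNanos;
        }

        /** Throughput: work done per unit of time. */
        double throughputPerSecond() {
            return operations * 1_000_000_000.0 / totalNanos;
        }
    }

    static Result measure(Runnable task, int iterations) {
        long maxLatency = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            task.run();
            // Responsiveness: track the slowest single call
            maxLatency = Math.max(maxLatency, System.nanoTime() - t0);
        }
        long total = Math.max(1, System.nanoTime() - start);
        return new Result(iterations, total, maxLatency);
    }

    public static void main(String[] args) {
        Result r = measure(() -> Math.sqrt(42), 100_000);
        System.out.println("ops/sec:        " + r.throughputPerSecond());
        System.out.println("worst call (ns): " + r.maxLatencyNanos);
    }
}
```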
Memory Footprint
• The amount of main memory used by the application
• How much memory?
• How does the usage change over time?
• Does the application use any swap space?
• Dedicated or shared system?
STARTUP TIME
• The time taken for an application to start
• Important for both the server and client applications
• “Time ‘till performance”
SCALABILITY
• How well an application performs as the load on it
increases
• A huge topic that shapes modern software architectures
• Cost of handling load should grow linearly, not exponentially
• Can be measured on different layers in a complex system
Scalability
Focus areas
• Java application performance
• Tuning JVM for throughput or responsiveness
• Discovery, troubleshooting and tuning JVM
Performance Methodology
• Our steps to follow
1. Monitoring
2. Profiling
3. Tuning
Performance Monitoring
• Non-intrusively collecting and observing performance
data
• Early detection of possible problems
• Essential for production environments
• Early stage for troubleshooting problems
• OS and JVM tools
PERFORMANCE PROFILING
• Collecting and observing performance data using
special tools
• More intrusive & has an effect on performance
• Narrower focus to find problems
• Not suitable for production environments
PERFORMANCE TUNING
• Changing configuration, parameters or even source
code for optimizing performance
• Follows monitoring and profiling
• Targets responsiveness or throughput
Development PROCESS
PERFORMANCE PROCESS
JVM AND GARBAGE
4 COLLECTION
Objectives
• What garbage collection is and what it does
• Types of garbage collectors
• Differences and basic use cases of different garbage
collectors
• Garbage collection process
Garbage Collection
• In computer science, garbage collection (GC) is a
form of automatic memory management.
• The garbage collector attempts to reclaim memory
occupied by objects that are no longer in use by the
program.
Garbage Collection
• Main tasks of GC
• Allocating memory for new objects
• Keeping live (referenced) objects in memory
• Removing dead (unreferenced) objects and reclaiming
memory used by them
GC Steps: MARKING
GC Steps: DELETION [normal]
GC Steps: DELETION [COMPACTING]
GENERATIONAL GC
• The HotSpot JVM heap is split into generations
WHY GENERATIONAL GC ?
• Object life patterns in OO languages:
• Most objects “die young”
• Older objects rarely reference young ones
GENERATIONAL GC
GC STEPS: YOUNG GC
OLD & PERMANENT GENERATIONS
GARBAGE
5 COLLECTORS
Objectives
• Garbage collection performance metrics
• Garbage collection algorithms
• Types of garbage collectors
• JVM ergonomics
GC PERFORMANCE METRICS
• There are mainly 3 ways to measure GC
performance:
• Throughput
• Responsiveness
• Memory footprint
FOCUS: Throughput
• Mostly long-running, batch processes
• High pause times can be acceptable
• Responsiveness per process is not critical
FOCUS: RESPONSIVENESS
• Priority is on servicing all requests within a predefined
time interval
• High GC pause times are not acceptable
• Throughput is secondary
GC ALGORITHMS
• Serial vs Parallel
• Stop-the-world vs Concurrent
• Compacting vs Non-Compacting vs Copying
Serial vs Parallel
STOP-THE-WORLD vs CONCURRENT
• STW: Simpler, more pause time,
memory need is less, simpler to
tune
• CC: Complicated, harder to tune,
memory footprint is larger,
less pause time
COMPACTING vs NON-COMPACTING
TYPES OF GC
• Serial Collector
• Parallel Collector
• Young (Parallel Collector)
• Young & Old (Parallel Compacting Collector)
• Concurrent Mark-Sweep Collector
• G1 Collector
SERIAL / Parallel Collector
SERIAL COLLECTOR
• Serial collection for both young and old generations
• Default for client-style machines
• Suitable for:
• Applications that do not have low pause requirements
• Platforms that do not have many resources
• Can be explicitly enabled with: -XX:+UseSerialGC
PARALLEL COLLECTOR
• Two options with parallel collectors:
• Young (-XX:+UseParallelGC)
• Young and Old (-XX:+UseParallelOldGC - Compacting)
• Throughput is important
• Suitable for
• Machines with large memory, multiple processors & cores
CMS COLLECTOR
• Focus: Responsiveness
• Low pause times are required
• Concurrent collector
CMS COLLECTOR
G1 COLLECTOR
G1 COLLECTOR [REGIONS]
G1: YOUNG GC
G1: YOUNG GC [END]
G1: PHASES
1. Initial Mark (stop-the-world)
2. Root region scanning
3. Concurrent marking
4. Remark (stop-the-world)
5. Cleanup (stop-the-world & concurrent)
* Copying (stop-the-world)
G1: PHASES [INITIAL MARK]
G1: PHASES [CONCURRENT MARK]
G1: PHASES [REMARK]
G1: PHASES [COPYING/CLEANUP]
G1: PHASES [AFTER COPYING]
COMMAND LINE
6 Monitoring
Objectives
• Using JVM command line tools
• jps, jcmd, jstat
• Monitor JVMs
• Identify running JVMs
• Monitor GC & JIT activity
MONITORING
• First step to observe & identify (possible) problems
MONITORING
WHAT TO MONITOR
• Parts of interest
• Heap usage & Garbage collection
• JIT compilation
• Data of interest
• Frequency and duration of GCs
• Java heap usage
• Thread counts & states
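The data of interest listed above can also be read from inside the JVM through the standard java.lang.management beans. The following is a minimal sketch; GcSnapshot and describe are hypothetical names for this example, and the collector names printed depend on which garbage collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class GcSnapshot {
    /** Returns a short text report: GC counts and times, heap usage, thread count. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        // Frequency and accumulated duration of collections, per collector
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        // Java heap usage
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append("heap used=").append(heap.getUsed())
          .append(" committed=").append(heap.getCommitted())
          .append('\n');
        // Live thread count
        sb.append("live threads=")
          .append(ManagementFactory.getThreadMXBean().getThreadCount())
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```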
JDK COMMAND LINE TOOLS
• jps
• jcmd
• jstat
JIT COMPILATION
• JIT compiler: just-in-time, optimizing compiler
• Command line tools to monitor
• -XX:+PrintCompilation (~2% CPU)
• jstat
• Data of interest
• Frequency, duration, opt/de-opt cycles, failed compilations
INTERFERING JIT COMPILER
• .hotspot_compiler file
• Turns off JIT compilation for specified methods/classes
• Very rarely used
• Opt/de-opt cycles, failure or possible bug in JVM
INTERFERING JIT COMPILER
• Via .hotspot_compiler file:
• exclude Package/to/Class method
• exclude java/lang/String toString
• Via command line:
• -XX:CompileCommand=exclude,java/lang/String,toString
Monitoring OS
7 Performance
Objectives
• Monitor CPU usage
• Monitor processes
• Monitor network & disk & swap I/O
• On Linux (+Windows)
Terminology
• CPU Utilization: Percentage of the CPU usage
(user+kernel)
• User CPU Utilization: the percent of time the application
spends in application code
TERMINOLOGY
• Memory Utilization: Memory usage percentage and
whether all the memory used by process reside in
physical (ram) or virtual (swap) memory.
• Swapping (using disk space as virtual memory) is very
expensive and should be avoided at all times.
TERMINOLOGY
• Lock Contention: The case where a thread or process
tries to acquire a lock held by another process or
thread.
• Prevents concurrency and utilization. Should be avoided as
much as possible.
TERMINOLOGY
• Network & Disk I/O Utilization: The amount of data
sent and received via network and disk.
• Should be traced and used carefully.
Monitoring CPU Usage
• Monitor general and process based CPU usage
• Key definitions & metrics
• User (usr) time
• System (sys) time
• Voluntary context switch (VCX)
• Involuntary context switch (ICX)
MONITORING CPU
• Key points
• CPU utilization
• High sys/usr time
• CPU scheduler run queue
Monitoring CPU Usage
• Tools to use (Linux)
• top • prstat (Solaris)
• htop • gnome-system-monitor
• vmstat
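A rough view of the CPU scheduler run queue is also available from Java itself: OperatingSystemMXBean reports the system load average, which can be compared with the number of processors. This is a minimal sketch; CpuLoad and loadPerProcessor are hypothetical names, and the load average is reported as a negative value on platforms that do not provide it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class CpuLoad {
    /** Load average divided by processor count; -1 if the platform reports none. */
    static double loadPerProcessor(double loadAverage, int processors) {
        if (loadAverage < 0) {
            return -1.0;
        }
        return loadAverage / processors;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();   // last-minute average
        int cpus = os.getAvailableProcessors();
        // Values well above 1.0 per processor suggest threads are queuing for CPU
        System.out.println("load per processor: " + loadPerProcessor(load, cpus));
    }
}
```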
MONITORING MEMORY
• Key points
• Memory footprint
• Change in usage of memory
• Virtual memory usage
MONITORING MEMORY
• Tools to use (Linux)
• free
• vmstat
MONITORING DISK I/O
• Key points
• Number of disk accesses
• Disk access latencies
• Virtual memory usage
MONITORING DISK I/O
• Tools to use (Linux)
• iostat
• lsof
• iotop
MONITORING NETWORK I/O
• Key points
• Connection count
• Connection statistics & states
• Total network traffic
MONITORING NETWORK I/O
• Tools to use (Linux)
• netstat • iftop
• iptraf • monitorix
• tcpdump
USING
8 Visual Tools
Objectives
• Monitor Java applications using visual tools:
• JConsole
• VisualVM
• Mission Control
JConsole
• Ships with the JDK
• Lets you monitor and
control the JVM
• CPU, Memory,
Classloading, Threads
• Demo
VISUALVM
• Graphical monitoring,
profiling, troubleshooting
tool
• Has Profiling and
Sampling capabilities
• Has plugin support
(Visual GC, BTrace and
more)
• Demo
MISSION CONTROL
• Comprehensive
application
• Better UI
• Lots of useful information
• Monitor,
operate, manage, profile
Java applications
• Demo
JMX - MANAGED BEANS
• JMX: Java Management Extensions
• Used to monitor & manage JVM
• Managed Beans (MBeans)
• Objects used to manage Java resources
• Managed by JMX agents
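The visual tools read their data from MBeans, and the same MBeans can be queried directly. The sketch below asks the platform MBean server for two attributes of the built-in java.lang MBeans. JmxQuery and its method names are hypothetical; the object names and attribute names used are the standard platform ones.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

final class JmxQuery {
    /** Reads the "used" field of the Memory MBean's HeapMemoryUsage attribute. */
    static long heapUsedBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            throw new IllegalStateException("JMX query failed", e);
        }
    }

    /** Reads the ThreadCount attribute of the Threading MBean. */
    static int threadCount() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName threading = new ObjectName("java.lang:type=Threading");
            return (Integer) server.getAttribute(threading, "ThreadCount");
        } catch (JMException e) {
            throw new IllegalStateException("JMX query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("threads:           " + threadCount());
    }
}
```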
PROFILING JAVA
9 APPLICATIONS
Objectives
• Profiling Java applications using:
• jmap and jhat
• Java VisualVM
• Java Flight Recorder
JMAP and JHAT
• JVM command line tools
• jmap: Creates heap profile data
• jhat: Presents the data in a browser (basic UI)
• Demo
VISUALVM
• Sampling & profiling
abilities
• Sampling: less intrusive
• Demo
10 Profiling
Performance Issues
Objectives
• Profiling Java applications to troubleshoot and
optimize
• Detecting memory leaks
• Detecting lock contentions
• Identifying anti-patterns in heap profiles
HEAP PROFILING
• Necessary when:
• Observing frequent garbage collections
• The application needs a larger heap
• Tuning the application for better performance & hardware
utilization
HEAP PROFILING: TIPS
• What to look for ?
• Objects with
• a large amount of bytes being allocated
• a high number of object allocations
• Stack traces where
• large amounts of bytes are being allocated
• large number of objects are being allocated
HEAP PROFILING: TOOLS
• jmap and jhat
• Snapshot of the application
• Top consumers & Allocation stack traces
• Compare multiple snapshots
MEMORY LEAK
• Refers to the situation where an object unintentionally
stays reachable in memory and thus cannot be collected by the GC.
• Frequent garbage collection
• Poor application performance
• Application failure (OutOfMemoryError)
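A typical shape of such a leak is a long-lived collection that only grows. The sketch below contrasts a leaky static list with a bounded history; RequestLog, its fields and the limit of 100 entries are hypothetical choices for this example, and the code is deliberately not thread-safe to keep it short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class RequestLog {
    // Leak: every payload stays reachable from a static field forever,
    // so the GC can never reclaim it.
    private static final List<byte[]> LEAKY = new ArrayList<>();

    // One possible fix: keep only the most recent MAX_ENTRIES payloads.
    private static final int MAX_ENTRIES = 100;
    private static final ArrayDeque<byte[]> BOUNDED = new ArrayDeque<>();

    static void handleLeaky(byte[] payload) {
        LEAKY.add(payload);
    }

    static void handleBounded(byte[] payload) {
        if (BOUNDED.size() == MAX_ENTRIES) {
            BOUNDED.removeFirst();   // oldest entry becomes unreachable
        }
        BOUNDED.addLast(payload);
    }

    static int leakySize() {
        return LEAKY.size();
    }

    static int boundedSize() {
        return BOUNDED.size();
    }
}
```

In a heap profile the leaky variant shows up as a steadily growing byte[] count between snapshots, while the bounded variant levels off.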
MEMORY LEAK: TOOLS
• Visual VM
• Flight Recorder
• jmap and jhat
MEMORY LEAK: TIPS
• Monitor running application
• Look for memory changes, survivor generations
• Profile applications, compare snapshots
• Look for object count changes, top growers
• Always use -XX:+HeapDumpOnOutOfMemoryError
parameter on production
LOCK CONTENTION
• Use of synchronization utilities (synchronized,
locks, concurrent collections, etc.) causes threads to wait or
perform worse.
• Should be kept to a minimum.
LOCK CONTENTION: MONITOR
• Things to observe:
• High number of voluntary context switches
• Thread states and state changes (Visual VM, Flight
Recorder)
• Possible deadlocks (jstack, Visual Tools)
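Besides jstack and the visual tools, deadlocks can be checked programmatically through ThreadMXBean, for example from a health check. This is a minimal sketch; DeadlockCheck and deadlockedThreadNames are hypothetical names for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class DeadlockCheck {
    /** Returns a description of each deadlocked thread; empty if none are found. */
    static String[] deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks where supported;
        // both methods return null when there is no deadlock.
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = threads.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            // An entry is null if the thread ended in the meantime
            names[i] = infos[i] == null
                    ? "<terminated>"
                    : infos[i].getThreadName() + " waiting on " + infos[i].getLockName();
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = deadlockedThreadNames();
        System.out.println(names.length == 0 ? "no deadlock detected" : String.join("\n", names));
    }
}
```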
PROFILING ANTI-PATTERNS
• Frequent garbage collections
• Overallocation of objects
• High number of threads
• High volume of lock contention
• Large number of exception objects
GARBAGE COLLECTION
11 Tuning
Objectives
• Learning to tune GC by setting generation sizes
• Comparing and selecting suitable GC for
performance requirements
• Monitor and understand GC outputs
Garbage Collection
• Main tasks of GC
• Allocating memory for new objects
• Keeping live (referenced) objects in memory
• Removing dead (unreferenced) objects and reclaiming
memory used by them
JVM Heap Size Options
-Xmx<size> : Maximum size of the Java heap
-Xms<size> : Initial heap size
-Xmn<size> : Sets initial and max New (young) sizes to the same value
-XX:MaxPermSize=<size> : Max Perm size
-XX:PermSize=<size> : Initial Perm size
-XX:MaxNewSize=<size> : Max New size
-XX:NewSize=<size> : Initial New size
-XX:NewRatio=<n> : Ratio of Tenured to Young space (2 means Tenured is twice Young)
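The effect of these options can be checked from inside the application. The sketch below prints the heap limits the JVM actually applied and works out the young generation size implied by a NewRatio value. HeapInfo, toMb and youngBytes are hypothetical names; the NewRatio arithmetic is an approximation, since the JVM aligns the real sizes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

final class HeapInfo {
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Approximate young size for a given heap: tenured = newRatio x young. */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):       " + toMb(rt.maxMemory()));    // bounded by -Xmx
        System.out.println("committed heap (MB): " + toMb(rt.totalMemory()));  // starts near -Xms
        System.out.println("free (MB):           " + toMb(rt.freeMemory()));
        // Pool names show the generations used by the active collector
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getType() + " pool: " + pool.getName());
        }
    }
}
```

For a 3 GB heap with -XX:NewRatio=2, youngBytes gives about 1 GB for the young generation and leaves about 2 GB for tenured.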
GARBAGE COLLECTORS
• Serial Collector
• Parallel (Throughput) Collector
• Concurrent Mark-Sweep (CMS) Collector
• Garbage First (G1) Collector
SERIAL COLLECTOR
• Single-threaded young generation collector
• Single-threaded old generation collector
• Parameter: -XX:+UseSerialGC
SERIAL COLLECTOR: TIPS
• Not suitable for applications with high performance
requirements
• Can be suitable for client applications with limited
hardware resources
• More suitable for platforms that have less than 256
MB of memory for the JVM and do not have multiple cores
PARALLEL COLLECTOR
• Multi-threaded young generation collector
• Multi-threaded old generation collector
• Parameters:
• -XX:+UseParallelGC (Parallel Young, Single-Threaded Old)
• -XX:+UseParallelOldGC (Young&Old BOTH MultiThreaded)
PARALLEL COLLECTOR: TIPS
• Suitable for applications that target throughput rather
than responsiveness
• Suitable for platforms that have multiple processors &
cores
• -XX:ParallelGCThreads=[N] can be used to specify GC
thread count
• default: derived from Runtime.availableProcessors() (JDK 7+)
• Consider reducing it when multiple JVMs run on the same machine
CMS COLLECTOR
• Multi-threaded young generation collector
• Single-threaded concurrent old generation collector
• Parameter: -XX:+UseConcMarkSweepGC
CMS COLLECTOR: GOOD TO KNOW
• CMS targets responsiveness and runs concurrently.
And it doesn’t come for free.
• More memory (~20%) and CPU resources needed
• Memory fragmentation
• It can lose the race. (Concurrent mode failure)
CMS COLLECTOR: GOOD TO KNOW
• CMS has to start earlier to collect not to lose the race
• -XX:CMSInitiatingOccupancyFraction=n (default 60%, J8)
• n: Percentage of tenured space size
CMS COLLECTOR: TIPS
• Size young generation as large as possible
• Small young generation puts pressure on old generation
• Consider heap profiling
• Consider tuning survivor spaces
• Enable class-unloading if needed (appservers, etc.)
-XX:+CMSClassUnloadingEnabled, -XX:+CMSPermGenSweepingEnabled
G1 Collector
• Parallel and concurrent young generation collector
• Single-threaded old generation collector
• Parameter: -XX:+UseG1GC
• Expected to replace CMS (J9)
G1 Collector: GOOD TO KNOW
• Concurrent, responsiveness-oriented collector like CMS.
Suitable for multiprocessor platforms and heap sizes
of 6GB or more.
• Targets to stay within specified pause-time
requirements.
• Suitable when stable and predictable GC pauses of 0.5 seconds or
below are required.
G1 COLLECTOR: TIPS
• G1 optimizes itself to meet pause-time requirements.
• Do not set the size of young generation space
• Set the pause-time goal to a value met 90% of the time or more,
instead of using average response time (ART)
• A lower pause-time goal makes the GC work harder, so
throughput decreases
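The difference between an average and a 90% goal is easy to see on a small sample. The sketch below computes both for a set of made-up pause times in milliseconds; PauseStats and its methods are hypothetical names, the sample data is invented for illustration, and the percentile uses the simple nearest-rank definition.

```java
import java.util.Arrays;

final class PauseStats {
    static double average(long[] pausesMs) {
        long sum = 0;
        for (long p : pausesMs) {
            sum += p;
        }
        return (double) sum / pausesMs.length;
    }

    /** Nearest-rank percentile: smallest sample with at least percent% of samples <= it. */
    static long percentile(long[] pausesMs, int percent) {
        long[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100;   // ceil(percent * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Invented sample: nine short pauses and one outlier
        long[] pauses = {20, 25, 30, 22, 28, 24, 26, 21, 400, 23};
        System.out.println("average: " + average(pauses));        // 61.9, skewed by the outlier
        System.out.println("90%:     " + percentile(pauses, 90)); // 30
    }
}
```

Here a single 400 ms outlier triples the average, while the 90% value shows that nine out of ten pauses stay at or below 30 ms, which is the more useful number when choosing a pause-time goal.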
Language-Level
12 TIPS & TRICKS
Objectives
• Object allocation best practices
• Java reference types and differences between them
• Usage of finalizers
• Synchronization tips & tricks & best practices
OBJECTS: BEST PRACTICES
• The problem is not the object allocation, nor the
reclamation
• Not expensive: ~10 native instructions in common case
• Allocating small objects for intermediate results is fine
OBJECTS: BEST PRACTICES
• Use short-lived immutable objects instead of long-
lived mutable objects.
• Functional programming is on the rise!
• Use clearer, simpler code with more allocations
instead of more obscure code with fewer allocations
• KISS: Keep It Simple, Stupid
• “Premature optimization is the root of all evil” - Donald Knuth
OBJECTS: BEST PRACTICES
• Large Objects are expensive !
• Allocation
• Initialization
• Different sized large objects can cause fragmentation
• Avoid creating large objects
JAVA REFERENCE TYPES
REFERENCES: SOFT REFERENCE
• “Clear this object if you don’t have enough memory, I
can handle that.”
• get() returns the object if it is not reclaimed by GC.
• -XX:SoftRefLRUPolicyMSPerMB=[n] can be used to
control lifetime of the reference (default 1000 ms)
• Use case: Caches
REFERENCES: WEAK REFERENCE
• “Consider this reference as if it doesn’t exist. Let me
access it if it is still available.”
• get() returns the object if it is not reclaimed by GC.
• Use case: Thread pools
REFERENCES: PHANTOM REFERENCE
• “I just want to know if you have deleted the object or
not”
• get() always returns null.
• Use Case: Finalize actions
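The get() behaviour of the three reference types can be observed while the object is still strongly reachable. This is a minimal sketch; ReferenceDemo and observe are hypothetical names. What happens after the strong reference is dropped depends on when the GC runs, so the example does not try to show that.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceDemo {
    /**
     * Returns three flags for a strongly reachable value:
     * soft.get() == value, weak.get() == value, phantom.get() == null.
     */
    static boolean[] observe(Object value) {
        SoftReference<Object> soft = new SoftReference<>(value);
        WeakReference<Object> weak = new WeakReference<>(value);
        // A phantom reference must be registered with a queue; the GC enqueues it
        // once the object has become phantom reachable.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(value, queue);
        return new boolean[] {
            soft.get() == value,     // still there: value is strongly reachable
            weak.get() == value,     // still there for the same reason
            phantom.get() == null    // always null, by design
        };
    }

    public static void main(String[] args) {
        boolean[] flags = observe(new Object());
        System.out.println("soft returns object:  " + flags[0]);
        System.out.println("weak returns object:  " + flags[1]);
        System.out.println("phantom returns null: " + flags[2]);
    }
}
```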
FINALIZERS
• Finalizers are not the equivalent of C++ destructors
• Finalize methods have almost no practical and
meaningful use case
• Finalize methods are run by a dedicated finalizer thread after the GC finds the object unreachable.
• Finalizable objects are handled differently from other objects and put pressure on the GC
• Time-consuming finalizers delay memory reclamation
• Not guaranteed to be called
LANGUAGE TIPS: STRINGS
• Strings are immutable
• String “literals” are cached in String Pool
• Avoid creating Strings with “new”
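A short sketch of what the String pool means in practice: equal literals share one instance, while new always creates a separate object. StringPoolDemo and sameInstance are hypothetical names; the == comparison is used here only to show identity and is not how Strings should be compared in application code.

```java
final class StringPoolDemo {
    /** Identity check: true only if both variables point to the same object. */
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal1 = "jvm";
        String literal2 = "jvm";
        String copy = new String("jvm");   // extra object outside the pool

        System.out.println(sameInstance(literal1, literal2));       // true: both from the pool
        System.out.println(sameInstance(literal1, copy));           // false: separate object
        System.out.println(literal1.equals(copy));                  // true: same content
        System.out.println(sameInstance(literal1, copy.intern()));  // true: intern() returns the pooled instance
    }
}
```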
LANGUAGE TIPS: STRINGS
• Avoid repeated String concatenation (especially in loops)
• Use StringBuilder with appropriate initial size
• Not StringBuffer (avoid synchronization)
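The two approaches side by side, as a minimal sketch: concatenation in a loop creates a new String on every pass, while a pre-sized StringBuilder appends into one buffer. CsvLine and its method names are hypothetical; both methods produce the same result.

```java
final class CsvLine {
    /** Concatenation in a loop: allocates a new String on each iteration. */
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    /** StringBuilder sized up front: one buffer, no resizing while appending. */
    static String joinWithBuilder(String[] parts) {
        int capacity = 0;
        for (String part : parts) {
            capacity += part.length() + 1;   // +1 for the separator
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithConcat(parts));    // a,b,c,
        System.out.println(joinWithBuilder(parts));   // a,b,c,
    }
}
```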
LANGUAGE TIPS: USE PRIMITIVES
• Use primitives whenever possible, not wrapper
objects.
• Auto Boxing and Unboxing are not free of cost.
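The cost of boxing is easiest to see in a loop. In this minimal sketch the only difference between the two methods is the type of the accumulator; BoxingDemo and its method names are hypothetical.

```java
final class BoxingDemo {
    /** Wrapper accumulator: each += unboxes, adds, and boxes the result again. */
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Primitive accumulator: plain arithmetic, no wrapper objects. */
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));       // 499500
        System.out.println(sumPrimitive(1000));   // 499500
    }
}
```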
LANGUAGE TIPS: AVOID EXCEPTIONS
• Exceptions are very expensive objects
• Avoid creating them for
• non-exceptional cases
• flow control
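As an illustration of avoiding exceptions for expected cases, the sketch below parses a port number in two ways: by catching NumberFormatException, and by validating the input first. PortParser and its methods are hypothetical names; the validating version accepts only up to nine ASCII digits, which always fits in an int.

```java
final class PortParser {
    /** Anti-pattern for frequent bad input: an exception object is created per failure. */
    static int parseWithException(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Validates first, so malformed input never reaches the exception path. */
    static int parseWithCheck(String text, int fallback) {
        if (text == null || text.isEmpty() || text.length() > 9) {
            return fallback;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return fallback;
            }
        }
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(parseWithCheck("8080", -1));       // 8080
        System.out.println(parseWithCheck("80a", -1));        // -1
        System.out.println(parseWithException("80a", -1));    // -1
    }
}
```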
THREADS
• Avoid excessive use of synchronized
• Increases lock contention, leads to poor performance
• Can cause deadlocks
• Minimize the synchronization
• Only for the critical section
• As short as possible
• Use other locks, concurrent collections whenever suitable
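A minimal sketch of keeping the critical section short: the message is built outside the lock, and only the update of the shared list is synchronized. EventRecorder and its members are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class EventRecorder {
    private final Object lock = new Object();
    private final List<String> events = new ArrayList<>();

    void record(String user, String action) {
        // No shared state is touched here, so no lock is needed
        String message = user.trim() + ":" + action;
        synchronized (lock) {
            // Critical section: only the shared list mutation
            events.add(message);
        }
    }

    int size() {
        synchronized (lock) {
            return events.size();
        }
    }

    /** Runs several writer threads against one recorder and returns the final size. */
    static int runConcurrently(int threadCount, int perThread) {
        EventRecorder recorder = new EventRecorder();
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    recorder.record(" user ", "click");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return recorder.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000));   // 4000
    }
}
```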
Threads: TIPS
• Favor immutable objects
• No need for synchronization
• Embrace functional paradigm
• Do not use threads directly
• Hard to maintain and program correctly
• Use Executors, thread pools
• Use concurrent collections and tune them properly
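The tips above can be combined in a few lines: a fixed thread pool from Executors runs the tasks, and a ConcurrentHashMap collects results without explicit locking. This is a sketch under simple assumptions; WordCounter, the pool size and the 10-second timeout are hypothetical choices for this example.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class WordCounter {
    static ConcurrentHashMap<String, Integer> count(List<String> words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String word : words) {
                // merge() updates the entry atomically, so no synchronized block is needed
                pool.execute(() -> counts.merge(word, 1, Integer::sum));
            }
        } finally {
            pool.shutdown();   // stop accepting tasks; queued tasks still run
        }
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a", "b", "a", "c", "a"), 3));
    }
}
```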
CACHING
• Caching is a common source of memory leaks
• Avoid when possible
• Avoid creating large objects in the first place
• Mind when to remove any object added to cache
• Make sure it happens, in any condition
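One way to make sure cached entries are always removed is to bound the cache so that eviction happens on every insert. The sketch below builds a small least-recently-used cache on LinkedHashMap; LruCache and maxEntries are hypothetical names, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);   // true = order entries by access, not insertion
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put: drop the least recently used entry when over the limit
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", 3);   // evicts "b"
        System.out.println(cache.keySet());   // [a, c]
    }
}
```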
That’s all folks!
Congrats!
Ender Aydin Orak
koders.co