How we found a race condition in AOSP, exposed by the Android emulator, that affected the amount of heap space available to apps.
Rainforest supports testing native mobile applications on Android using the official Android emulator from Google. Using emulators instead of real physical devices provides a bunch of benefits:
- being able to reproduce issues locally (hard to do if you don’t have the same hardware device, but trivial if you can run the same emulator);
- better isolation (no need to wipe a real device or worry about data leaking on it: we can just throw the entire emulator away and make a new one);
- faster turnaround (we are not limited by the number of physical devices);
- some nice debugging features that the emulator supports (location spoofing, virtual camera support, etc.).
One of our customers was experiencing periodic crashes of their application when testing in our Android 10 emulator. Initial investigation suggested that the application was exhausting the available heap space, which led to the crash. In the runs where it crashed, the log messages showed that the heap was only 16 MiB. This was quite surprising, since our emulators have ~4 GiB of memory. It was also very odd because further investigation (running `adb shell getprop dalvik.vm.heapsize`) showed that the heap size should have been `576m`. So why was the application crashing after only using 16 MiB?
How Android’s jvm is configured
To get to the bottom of this we needed to understand how the Android Java virtual machine is configured. The short version is that on boot a process called `zygote` is started. This process launches a jvm and preloads some common Android classes into it. When any application is started, the `zygote` process is forked and the application starts running on the pre-initialized jvm instance. This saves the cost of initializing a new jvm instance every time an app starts.
There is a great post about how all of this works if you’re looking for more detail.
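The fork-after-initialize model can be sketched with plain POSIX `fork()`. This is not zygote’s implementation. `g_max_heap_mib`, `InitRuntime` and `AppInheritsHeapMib` are hypothetical names, and a single `int` stands in for the whole preloaded runtime. The sketch shows the property that matters for this bug: a child process inherits whatever configuration the parent had when it forked. So the heap ceiling chosen when zygote starts carries over to every app forked from it.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for runtime configuration (such as the -Xmx value)
// that zygote applies once, before any app process is forked.
static int g_max_heap_mib = 0;

// Hypothetical "zygote startup": configure the runtime exactly once.
void InitRuntime(int max_heap_mib) { g_max_heap_mib = max_heap_mib; }

// Forks a child that plays the role of an app. The child never calls
// InitRuntime; it checks the value it inherited from the parent and reports
// the result through its exit status. Returns false if fork or wait fails.
bool AppInheritsHeapMib(int expected_mib) {
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    // Child: the parent's already-initialized state is reused as-is.
    _exit(g_max_heap_mib == expected_mib ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If `InitRuntime(16)` runs in the parent, every forked "app" sees 16. Nothing the child does afterwards changes what it was started with.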
The important thing here is that `zygote` controls the jvm runtime configuration, including the jvm heap size. Since Android is open source, we can go look at the source to see how the jvm is configured.
In frameworks/base/core/jni/AndroidRuntime.cpp:770 we find:
```cpp
/*
 * The default starting and maximum size of the heap. Larger
 * values should be specified in a product property override.
 */
parseRuntimeOption("dalvik.vm.heapsize", heapsizeOptsBuf, "-Xmx", "16m");
```
This argument should look familiar if you’ve worked with Java in the past: `-Xmx` is the standard way to set the maximum heap size of a Java application. This line of code sets `-Xmx` to the value of the `dalvik.vm.heapsize` property, with a default of `16m` if that property doesn’t exist.
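Put differently, the option zygote hands to the runtime is either `-Xmx` plus the property’s value, or `-Xmx` plus the hard-coded default. Below is a simplified model of that fallback, not the real `parseRuntimeOption`. It returns a `std::string` instead of writing into a buffer (`heapsizeOptsBuf` in the call above), and a `std::map` stands in for the property service. `PropertyMap` and `BuildRuntimeOption` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for Android's system property store.
using PropertyMap = std::map<std::string, std::string>;

// Simplified model of the fallback: look up `property`, use `default_value`
// if it is missing (this sketch also treats an empty value as missing),
// and prefix the runtime flag such as "-Xmx".
std::string BuildRuntimeOption(const PropertyMap& props,
                               const std::string& property,
                               const std::string& runtime_arg,
                               const std::string& default_value) {
  auto it = props.find(property);
  const std::string& value =
      (it != props.end() && !it->second.empty()) ? it->second : default_value;
  return runtime_arg + value;
}
```

With `dalvik.vm.heapsize` absent this yields `-Xmx16m`. With the property set to `576m` it yields `-Xmx576m`.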
It also turns out that the zygote process logs all the jvm arguments on startup, so we can check which value is being set on boot. In our case we were seeing `-Xmx16m` as one of the arguments:
```
zygote : option[0]=-Xzygote
zygote : option[1]=-Xcheck:jni
zygote : option[2]=exit
zygote : option[3]=vfprintf
zygote : option[4]=sensitiveThread
zygote : option[5]=-verbose:gc
zygote : option[6]=-Xms4m
zygote : option[7]=-Xmx16m
zygote : option[8]=-Xusejit:true
```
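Spotting `-Xmx16m` by eye works, but the same check is easy to automate. The sketch below scans zygote option lines like the ones above for the `-Xmx` option and converts its value to MiB. `SizeToMiB` and `MaxHeapMiBFromLog` are hypothetical helpers, and only sizes with a `k`, `m` or `g` suffix are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Converts a size such as "16m", "576m", "4g" or "4096k" to MiB.
// Returns std::nullopt for values this sketch does not understand
// (for example a bare byte count with no suffix).
std::optional<std::uint64_t> SizeToMiB(const std::string& size) {
  if (size.size() < 2) return std::nullopt;
  const std::string digits = size.substr(0, size.size() - 1);
  if (digits.find_first_not_of("0123456789") != std::string::npos) {
    return std::nullopt;
  }
  const std::uint64_t n = std::stoull(digits);
  switch (size.back()) {
    case 'k': case 'K': return n / 1024;
    case 'm': case 'M': return n;
    case 'g': case 'G': return n * 1024;
    default: return std::nullopt;
  }
}

// Scans zygote log lines (e.g. "zygote : option[7]=-Xmx16m") for the
// -Xmx option and returns the maximum heap size in MiB, if present.
std::optional<std::uint64_t> MaxHeapMiBFromLog(
    const std::vector<std::string>& lines) {
  const std::string flag = "=-Xmx";
  for (const std::string& line : lines) {
    const auto pos = line.find(flag);
    if (pos != std::string::npos) {
      return SizeToMiB(line.substr(pos + flag.size()));
    }
  }
  return std::nullopt;
}
```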
Finding the race
After the Android emulator finishes booting, we know `dalvik.vm.heapsize` has the correct value. But we know from the logs that when `zygote` initializes, the heap size is being set to `16m`. This means that at that moment the property is either `16m`, or it’s unset and zygote is falling back to the default.
System properties
The `init` binary loads system properties from a few different files before it starts executing the init scripts (see system/core/init/property_service.cpp:876 for the specific details). In the case of the Android 10 emulator, the only relevant one is `/system/build.prop`. Checking this file reveals that it specifies no value for the `dalvik.vm.heapsize` property. Searching the filesystem for other property files that set this property comes up empty as well.
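Checking `build.prop` by hand boils down to parsing `key=value` lines and looking up one key. Here is a minimal sketch, assuming the usual build.prop layout: one `key=value` per line, with `#` starting a comment line. `ParseBuildProp` is a hypothetical name, and this is not how init itself loads property files.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parses build.prop-style text: one key=value per line, '#' starts a comment
// line, blank and malformed lines are skipped, whitespace is trimmed.
// In this sketch a later duplicate key overwrites an earlier one.
std::map<std::string, std::string> ParseBuildProp(const std::string& text) {
  auto trim = [](const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return std::string();
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
  };
  std::map<std::string, std::string> props;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    line = trim(line);
    if (line.empty() || line[0] == '#') continue;
    const auto eq = line.find('=');
    if (eq == std::string::npos) continue;  // not a key=value line
    props[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
  }
  return props;
}
```

A `props.count("dalvik.vm.heapsize") == 0` check on the parsed file is the programmatic version of what we found: the property simply isn’t there.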
Android Init
Inside system/core/init/init.cpp:648, a function called `process_kernel_cmdline()` parses the kernel command line and creates Android properties out of what it finds there. These properties are created with a `ro.kernel.` prefix. This way of setting properties is really only useful to the Android emulator: it lets the emulator offer users some settings knobs that change these values inside the emulated device. Since this only sets `ro.kernel.qemu.dalvik.vm.heapsize`, we’re still left wondering how `dalvik.vm.heapsize` eventually gets set to the proper value.
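This step can be modeled as splitting the command line on whitespace and turning each `key=value` token into a `ro.kernel.`-prefixed property. The sketch below is a simplification, not `process_kernel_cmdline()` itself, which does more than this. The sample command line assumes the emulator passes the heap size as `qemu.dalvik.vm.heapsize=576m`. That is inferred from the resulting property name and used only for illustration. `KernelCmdlineToProps` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Simplified model: every whitespace-separated key=value token on the kernel
// command line becomes a property named "ro.kernel." + key. Tokens without
// an '=' (or with an empty key) are skipped in this sketch.
std::map<std::string, std::string> KernelCmdlineToProps(
    const std::string& cmdline) {
  std::map<std::string, std::string> props;
  std::istringstream in(cmdline);
  std::string token;
  while (in >> token) {
    const auto eq = token.find('=');
    if (eq == std::string::npos || eq == 0) continue;
    props["ro.kernel." + token.substr(0, eq)] = token.substr(eq + 1);
  }
  return props;
}
```

Note what the result does *not* contain: a plain `dalvik.vm.heapsize` entry. Only the `ro.kernel.` copy exists at this point.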
Digging into the `init` scripts, we discover a call to `setprop` in vendor/etc/init/hw/init.ranchu.rc:36-37 that copies the value from the `ro.kernel` property to the real one. In the main `/init.rc` we find that `zygote` is asked to start before this. Simplified, the relevant parts of the init scripts look something like this:
```
# in /init.rc
on late-init
    trigger zygote-start
    trigger boot

on zygote-start
    start zygote

# in vendor/etc/init/hw/init.ranchu.rc
on boot
    setprop dalvik.vm.heapsize ${ro.kernel.qemu.dalvik.vm.heapsize}
```
It’s important to understand how init processes these scripts; thankfully this is described in detail in the AOSP source code. To save you some reading, the important part is that `start zygote` is not synchronous (and even if it were, `zygote` does not immediately read the system property). The race should now be clear.
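init’s own ordering is deterministic. The uncertainty comes from the process that `start` launches. A toy model of init’s event queue illustrates this. It is not init’s real implementation and is simplified to one command list per event. In the model, `trigger` only queues an event, so commands run in a fixed order and `start zygote` always executes before the `setprop` in `on boot`. What the model cannot order is when the launched zygote process gets around to reading `dalvik.vm.heapsize`. `RunInit` is a hypothetical name.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Toy model of init's event queue: each event maps to a list of commands.
// "trigger X" only enqueues X; X's commands run later, in FIFO order.
// Returns the non-trigger commands in the order they were executed.
std::vector<std::string> RunInit(
    const std::map<std::string, std::vector<std::string>>& actions,
    const std::string& first_event) {
  const std::string prefix = "trigger ";
  std::vector<std::string> executed;
  std::deque<std::string> pending{first_event};
  while (!pending.empty()) {
    const std::string event = pending.front();
    pending.pop_front();
    auto it = actions.find(event);
    if (it == actions.end()) continue;
    for (const std::string& cmd : it->second) {
      if (cmd.rfind(prefix, 0) == 0) {
        pending.push_back(cmd.substr(prefix.size()));  // queued, runs later
      } else {
        // A real "start" spawns a process and returns immediately; this
        // model only records that the command ran.
        executed.push_back(cmd);
      }
    }
  }
  return executed;
}
```

Fed the simplified scripts above, starting from `late-init`, the model returns `start zygote` followed by the `setprop` line. The setprop is guaranteed to come *after* zygote is launched, never before.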
The race in action:
- Android boots, and eventually the main `init` binary starts.
- `init` parses the kernel cmdline, which sets `ro.kernel.qemu.dalvik.vm.heapsize` to the correct value the emulator provides (`576m`).
- `init` starts running the `rc` init scripts.
- `on late-init` runs, which triggers the `zygote-start` event.
- `on zygote-start` runs `start zygote`, which launches the `zygote` process asynchronously.
- `on late-init` continues running, and eventually triggers `boot`.
- The race is lost:
  - `zygote` reads the `dalvik.vm.heapsize` property, finds it unset, and defaults to `16m`.
  - `on boot` from `init.ranchu.rc` runs, and calls `setprop dalvik.vm.heapsize ${ro.kernel.qemu.dalvik.vm.heapsize}`.
- The race is won:
  - `on boot` runs its `setprop` before `zygote` reads the property.
  - `zygote` reads the `dalvik.vm.heapsize` property, which contains the correct value, `576m`.
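Both outcomes can be reproduced with a small threaded simulation. One thread plays zygote, reading `dalvik.vm.heapsize` with a `16m` fallback. The other plays the `on boot` handler running `setprop`. A `std::promise` forces each ordering so that both outcomes are deterministic. On a real device, the scheduler decides. `PropertyStore` and `SimulateBoot` are hypothetical names, not Android APIs.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical thread-safe stand-in for the system property store.
class PropertyStore {
 public:
  void Set(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mu_);
    props_[key] = value;
  }
  // Returns the value, or `fallback` if the key is missing or empty.
  std::string Get(const std::string& key, const std::string& fallback) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = props_.find(key);
    return (it == props_.end() || it->second.empty()) ? fallback : it->second;
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, std::string> props_;
};

// Replays one boot with a forced interleaving and returns the -Xmx option
// zygote would build. zygote_first == true is the "race is lost" case.
std::string SimulateBoot(bool zygote_first) {
  PropertyStore props;
  // Set by init from the kernel command line before any rc script runs.
  props.Set("ro.kernel.qemu.dalvik.vm.heapsize", "576m");

  std::promise<void> first_done;
  std::future<void> gate = first_done.get_future();
  std::string xmx;

  std::thread zygote([&] {
    if (!zygote_first) gate.wait();  // let "on boot" go first
    xmx = "-Xmx" + props.Get("dalvik.vm.heapsize", "16m");
    if (zygote_first) first_done.set_value();
  });
  std::thread on_boot([&] {
    if (zygote_first) gate.wait();  // let zygote go first
    props.Set("dalvik.vm.heapsize",
              props.Get("ro.kernel.qemu.dalvik.vm.heapsize", ""));
    if (!zygote_first) first_done.set_value();
  });
  zygote.join();
  on_boot.join();
  return xmx;
}
```

`SimulateBoot(true)` gives `-Xmx16m`, the lost race. `SimulateBoot(false)` gives `-Xmx576m`, the won race.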
Making sure we always win the race
Now that we understand what is going wrong, we need to figure out a fix. Fortunately it ends up being very simple in our case, since we have root inside the emulator and can change any files we’d like. We just statically configure `dalvik.vm.heapsize` in `/system/build.prop` (for example, `dalvik.vm.heapsize=576m`) instead of relying on it to come through as a kernel command line argument. This file is loaded early in the init process, so the property is present before `zygote` launches and needs it. It’s important to make sure that this value matches the value the emulator sets on the kernel command line. Otherwise the `setprop` in `on boot` still overwrites it with the kernel value, and depending on which side wins the race, `zygote` may start with either value.
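A deterministic sketch of a boot after the fix shows why the two values need to match. `build.prop` seeds `dalvik.vm.heapsize` before zygote starts, and `on boot` later copies the kernel value over it. The function reports which value zygote read and the property’s final value for either ordering. `BootOutcome` and `BootWithBuildProp` are hypothetical names. Treating an empty build.prop value as "not set" is a simplification.

```cpp
#include <cassert>
#include <string>

// What one simulated boot ends with.
struct BootOutcome {
  std::string zygote_heapsize;  // value zygote read when it started
  std::string final_property;   // dalvik.vm.heapsize after boot completes
};

// build_prop_value: dalvik.vm.heapsize from /system/build.prop, loaded by
//   init before any rc script runs (empty means "not set").
// kernel_value: ro.kernel.qemu.dalvik.vm.heapsize from the kernel cmdline.
// zygote_first: whether zygote reads the property before "on boot" runs.
BootOutcome BootWithBuildProp(const std::string& build_prop_value,
                              const std::string& kernel_value,
                              bool zygote_first) {
  std::string property = build_prop_value;  // loaded early by init
  std::string zygote_read;
  auto zygote_reads = [&] {
    zygote_read = property.empty() ? std::string("16m") : property;
  };
  auto on_boot_setprop = [&] { property = kernel_value; };
  if (zygote_first) {
    zygote_reads();
    on_boot_setprop();
  } else {
    on_boot_setprop();
    zygote_reads();
  }
  return {zygote_read, property};
}
```

With matching values (say `576m` in both places), both orderings give zygote `576m`. With mismatched values, zygote reads the build.prop value if it wins and the kernel value if it loses, while the property always ends up holding the kernel value.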