<Edit 2>I managed to get Mono running natively on top of Android's libc, Bionic! This resulted in another 50% or so speed improvement; I am guessing this is because Bionic is compiled for the Thumb instruction set, which is supposedly faster. The post has been updated once again with the new results.</Edit>
<Edit 1>I've updated the performance scores with Mono as pulled from the trunk (I was using 2.0.1) and using eglib. It's gotten faster and uses even less memory.</Edit>
<Edit 0>The original article was only about Dalvik vs Mono, and begged the question, why didn't Google leverage Mono's existing open source technology? However, I found that Sun has an ARM version of their Java Runtime Environment available as a 90 day trial, so I ran the tests against that as well. Sun's ARM JRE is around as fast as Mono, but at a greater memory cost. However, the Sun Java Runtime Environment is not open source, unlike Mono. So, it is not a viable runtime for the Open Handset Alliance's platform.</Edit>
Google has had a little egg on its face recently. They wrote up a 40 page comic touting the awesomeness of Chrome V8's performance, only to be thoroughly trounced by TraceMonkey and Squirrelfish Extreme in comparative benchmarks (It's ok guys, I still prefer Chrome as my browser though).
So after that embarrassing showing, I was naturally a little skeptical about the supposed benefits that Dalvik provided for mobile devices. To better understand Dalvik's goals and inner workings, I watched an hour long presentation starring its creator, Dan Bornstein. The two line summary is that Dalvik is designed to minimize memory usage by maximizing shared memory. What Dalvik shares are the common framework dex files and application dex files (dex is the byte code the Dalvik interpreter runs).
The first thing that bugged me about this design is that sharing the code segments of dex files would be completely unnecessary if the applications were purely native. In Linux, the code segments of libraries are shared by all processes anyways. So, realistically, there is no benefit in doing this. In fact, Mono's managed assemblies also reap these same benefits of multiple processes sharing the same code segment in memory.
The second thing that bugged me about this presentation was that Dan starts out talking about how battery life is not scaling with Moore's law, which is certainly true. But if the battery is the primary constraint on the device, why is Dalvik so concerned with minimizing memory usage? I am by no means a VM design guru, as I'm sure he is, but I can say the following with certainty:
- Total memory usage has absolutely no impact on battery life. The chips are being powered regardless of how much of their memory is being used. Increasing the total memory available on a device will also only cause a marginal increase in battery drain. Memory is not something that taxes the battery compared to other components of the system.
- Battery life is primarily affected by how much you tax the processor and the other hardware components of the device: especially the use of 3G/EDGE and WiFi radios.
- Interpreting byte code will tax the processor and thus the battery much more than native/JIT code.
- Modern (Dream/iPhone comparable) hardware running Windows Mobile is rarely memory constrained, and they don't have a fancy memory sharing runtime. Memory constraints (in my mobile experience) become an issue on Windows Mobile when several applications are running at the same time. And this problem can be solved at the application framework level; such as how the Android Application life cycle is implemented. If all applications can suspend and restore at the system's whim, then memory consumption is trivialized. However, the application framework is not tied to the Dalvik runtime. (I.e., it can be ported to work with native code, Mono/.NET, JVM, whatever)
- Generally in applications, the code's memory footprint is trivial compared to the application data memory footprint (images, text, video, etc). Dalvik is overly concerned with optimizing the memory size of dex files and sharing memory. Dan's presentation did a comparison between the Browser's Java .class files versus the Dalvik .dex files (the .dex file is around 250k, around half the size of the .class files). My reaction to that is whoopity-shit. What happens when you start up the Browser? You head to your favorite webpage, it loads up a half dozen images which decompress to a raw R5G6B5 format, which then clocks in at several megabytes. That really trivializes the few hundred kilobytes that Dalvik is trying to save.
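To put rough numbers on that last point, here is a back-of-the-envelope sketch. R5G6B5 is 16 bits (2 bytes) per pixel and "half a dozen images" comes from the text above; the 640x480 image size is purely my assumption for illustration, and the class and method names are made up for this example.

```java
// Rough arithmetic only. The image dimensions are assumed, not measured.
final class DecodedImageFootprint {
    // R5G6B5 packs 5 + 6 + 5 = 16 bits per pixel, i.e. 2 bytes.
    static final int BYTES_PER_PIXEL_R5G6B5 = 2;

    // Size in bytes of one image once decompressed to raw R5G6B5.
    static long decodedBytes(int width, int height) {
        return (long) width * height * BYTES_PER_PIXEL_R5G6B5;
    }

    public static void main(String[] args) {
        int imageCount = 6;   // "half a dozen images"
        int width = 640;      // hypothetical image width
        int height = 480;     // hypothetical image height

        long total = imageCount * decodedBytes(width, height);
        long dexSavings = 250L * 1024; // the ~250k saved over .class files

        System.out.println("Decoded images: " + total + " bytes");
        System.out.println("Dex savings:    " + dexSavings + " bytes");
        System.out.println("Ratio:          " + (double) total / dexSavings);
    }
}
```

With those assumed dimensions, the decoded images come to a few megabytes, an order of magnitude more than the dex savings. Bigger or more numerous images only widen the gap.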
This leads me to believe that Google committed a classic performance optimization mistake: they are optimizing an aspect of the system that is trivial in the grand scheme. To poke a little nerdy fun at a portion of Dan's presentation, it is akin to tweaking your for loops to iterate downwards for better performance. And all the while the loops are being used to perform an inefficient selection sort.
Regardless, all speculations and theories aside, let's let real world scenarios speak for themselves. The T-Mobile G1, aka HTC Dream, has terrible battery life when compared to its siblings of the Windows Mobile variety. (I own or have owned a Dream, Touch, Touch Cruise, and Touch Diamond in the past year)
Runtime Memory Usage
My first test was to create a simple hello world program for both runtimes. Hello World would be printed to the screen, and then the thread would sleep for 30 seconds, allowing me to peek at the process' memory usage.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 14500 43.0 3.8 14788 3816 pts/0 Rl+ 01:35 0:01 ./mono MonoTests.exe ...
root 1605 49.0 4.8 34940 4812 pts/2 Sl+ 20:10 0:00 /system/bin/dalvikvm ...
root 4464 64.0 6.7 180388 6740 pts/1 Sl+ 18:21 0:02 java -jar JavaTests ...
Ok, so this surprised me a bit. Mono needs around half the memory to start up? Using pmap on the dalvikvm process shows that it is referencing a lot more "base" system libraries than Mono. I suppose at the end of the day, it doesn't matter, because on Linux, libraries are loaded and shared between processes. I also took a pmap snapshot of Mono and Java for those interested (Sun's ARM JRE is quite bloated...).
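For reference, the hello world test described above needs very little code. This is a minimal sketch of the Java side, not the exact program I ran; the class and method names are placeholders. The sleep is only there to keep the process alive long enough to inspect it with ps and pmap.

```java
// Minimal sketch of the "print, then sleep" memory test. Names are placeholders.
final class HelloSleep {
    // Prints the greeting, sleeps for the given time, and returns the elapsed milliseconds.
    static long helloThenSleep(long millis) {
        long start = System.nanoTime();
        System.out.println("Hello World");
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fall through; the sleep just ends early.
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 30 seconds, as in the test, to leave time to look at the process from another shell.
        helloThenSleep(30_000L);
    }
}
```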
I'll be the first to admit, these comparisons aren't fair at all. No interpreter will ever run as fast as native code. But, I'll test it anyway. These tests purposely steer clear of calling into underlying libraries. The goal is to benchmark the memory usage and performance of the runtimes themselves by way of very simple applications. Click here to view the code for the Java and the C# tests.
Selection Sort Test
This test creates a reverse sorted array of integers between 0 and 1000 and sorts them into increasing order (and does it 10 times, excluding the results of the first). Lower numbers are better. Results:
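The shape of that test can be sketched as follows. This is a reconstruction for illustration rather than the linked test code: I am assuming 1000 elements, a fresh reverse sorted array on every run, and a summed time over the runs that count. The class and method names are placeholders.

```java
// Sketch of the selection sort benchmark. Array size and timing policy are assumptions.
final class SelectionSortBench {
    // Builds an array holding count-1, count-2, ..., 0 (reverse sorted).
    static int[] reverseSorted(int count) {
        int[] a = new int[count];
        for (int i = 0; i < count; i++) {
            a[i] = count - 1 - i;
        }
        return a;
    }

    // Plain selection sort into increasing order.
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    // Sorts a fresh array `runs` times and returns the summed time in ms,
    // leaving out the first run so warm-up costs are not counted.
    static long timeRuns(int count, int runs) {
        long totalNanos = 0;
        for (int r = 0; r < runs; r++) {
            int[] data = reverseSorted(count);
            long start = System.nanoTime();
            selectionSort(data);
            long elapsed = System.nanoTime() - start;
            if (r > 0) {
                totalNanos += elapsed;
            }
        }
        return totalNanos / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("Selection sort: " + timeRuns(1000, 10) + " ms");
    }
}
```

A reverse sorted input makes the comparison count the same on every run, so any difference between runtimes comes from how fast they execute the loops.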
|Java SE for Embedded||895|
Class (and Structure) method call Test
This test instantiates an array of 10000 FibContainer instances. FibContainer is a simple class:
public void Compute(FibContainer previous, FibContainer beforePrevious)
public void Compute(FibContainerStruct previous, FibContainerStruct beforePrevious)
public int getValue()
public void setValue(int value)
public void Compute(FibContainer previous, FibContainer beforePrevious)
It then iterates over the array and calculates and stores the Fibonacci series. The test notes 3 things: total memory in use by the runtime after allocating the array, the time to allocate the array, and the time to calculate the Fibonacci series (the method calls are intentional). Note that I also performed this same test on Mono with a feature not available in the Java language: I used a struct instead of a class. Smaller numbers are better in all cases. This test was run 50 times (the first excluded):
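Here is a sketch of what the Java side of this test could look like. The method signatures match the ones listed above, but the method bodies, the seed values (0 and 1), the FibBench harness, and the way memory is sampled are my assumptions for illustration, not the linked test code.

```java
// Sketch of the class method call test. Bodies and harness are assumed.
final class FibContainer {
    private int value;

    public int getValue() {
        return value;
    }

    public void setValue(int value) {
        this.value = value;
    }

    // Stores the sum of the two preceding containers, going through the accessors on purpose.
    public void Compute(FibContainer previous, FibContainer beforePrevious) {
        setValue(previous.getValue() + beforePrevious.getValue());
    }
}

final class FibBench {
    // Allocates the array and one FibContainer per slot.
    static FibContainer[] allocate(int count) {
        FibContainer[] items = new FibContainer[count];
        for (int i = 0; i < count; i++) {
            items[i] = new FibContainer();
        }
        return items;
    }

    // Fills the array with the Fibonacci series, seeded with 0 and 1.
    // Note: int arithmetic wraps around after the 46th term; the test measures
    // method call cost, so the wrapped values do not matter here.
    static void fill(FibContainer[] items) {
        if (items.length > 0) items[0].setValue(0);
        if (items.length > 1) items[1].setValue(1);
        for (int i = 2; i < items.length; i++) {
            items[i].Compute(items[i - 1], items[i - 2]);
        }
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        long allocStart = System.nanoTime();
        FibContainer[] items = allocate(10000);
        long allocNanos = System.nanoTime() - allocStart;

        // Heap in use after allocation; only a rough figure, since the GC may run at any time.
        long usedBytes = rt.totalMemory() - rt.freeMemory();

        long calcStart = System.nanoTime();
        fill(items);
        long calcNanos = System.nanoTime() - calcStart;

        System.out.println("Memory (bytes): " + usedBytes);
        System.out.println("Allocation time (ms): " + allocNanos / 1_000_000.0);
        System.out.println("Calculation time (ms): " + calcNanos / 1_000_000.0);
    }
}
```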
|Memory (bytes)||Allocation Time (ms)||Calculation Time (ms)|
Equivalent C++ code would allocate the amount of memory shown above. So, as you can see, Mono has around 33% less overhead when allocating classes. It is also around 8% faster at doing those allocations, and the calculation time completely blows Dalvik out of the water. And by way of intelligent usage of structs in Mono, you can leverage near bare metal memory usage. (Not to mention that arrays of structs containing blittable types are themselves blittable. This is very friendly for processor/memory caches. It also provides for easier interaction with native calls, such as sending an array of vertices to OpenGL. But I digress.)
Long story short: from my initial, limited, and naive testing, Mono is faster and uses less memory than Dalvik. And it is not even designed to run on mobile devices. So it begs the question, why didn't Google just convert the .class files to CIL and use the Mono runtime? That way they wouldn't have alienated Java developers, would have had access to the open source Java libraries they so covet, would have wooed .NET developers, and wouldn't have needed to invent their own sub-par runtime!
I also want to test out the performance of the two Garbage Collectors as well as native function invocation (P/Invoke and JNI), but I'm hungry and will do that in a later post. Until next time friends!