Java vs C on ARM

Posted: July 11, 2012 in linaro, open_source

To go hand in hand with my initial Hadoop experiments, one of the questions I’ve had on my mind is the state of Java on ARM. Specifically, how does OpenJDK perform?

Measurements

I’ve picked SciMark2, which has both a C version and a Java implementation. This allows us to compare the two for each algorithm implemented in SciMark2. We should be able to measure the ratio of C results to Java results on the same architecture and then compare those ratios between the two architectures. If the ratio is greater on ARM than on, say, Intel, then we know that the implementation of Java on ARM needs improvement.
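As a rough sketch of the comparison described above, the Java snippet below computes the C-to-Java ratio for one machine and then divides the ARM ratio by the x86_64 ratio. The class name, method names, and the scores in `main` are placeholders I made up for illustration. They are not part of SciMark2 and they are not measured results.

```java
// Sketch only: names and numbers are illustrative, not taken from SciMark2.
final class RatioComparison {
    // Ratio of C throughput to Java throughput (both in MFLOPS) on one machine.
    static double cToJavaRatio(double cMflops, double javaMflops) {
        if (javaMflops <= 0.0) {
            throw new IllegalArgumentException("Java score must be positive");
        }
        return cMflops / javaMflops;
    }

    // How many times larger the C:Java gap is on ARM than on x86_64.
    static double relativeGap(double armRatio, double x86Ratio) {
        if (x86Ratio <= 0.0) {
            throw new IllegalArgumentException("x86_64 ratio must be positive");
        }
        return armRatio / x86Ratio;
    }

    public static void main(String[] args) {
        // Placeholder scores, NOT measurements from this post.
        double arm = cToJavaRatio(40.0, 10.0);
        double x86 = cToJavaRatio(500.0, 500.0);
        System.out.printf("ARM C:Java = %.2f, x86_64 C:Java = %.2f, relative gap = %.2f%n",
                arm, x86, relativeGap(arm, x86));
    }
}
```

A relative gap well above 1 would point at the Java implementation on ARM rather than at the hardware, since the C numbers on each machine already account for the raw speed difference.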

In the case of SciMark2 there are 5 components to the benchmark, all measured in MFLOPS.

  1. FFT (1024)
  2. SOR (100×100)
  3. MonteCarlo
  4. Sparse matmult (N=1000, nz=5000)
  5. LU (100×100)

I ran 10 measurements of the C version and the Java version on ARM, and then the same on Intel x86_64. The C version is built with -O3. In the case of Java, we trust that the JIT will do the right thing.

The JDK measured on intel x86_64 is:

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

The JDK measured on ARM is:

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3)
OpenJDK Zero VM (build 20.0-b12, mixed mode)

gcc on ARM is version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

gcc on Intel x86_64 is version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

The Intel x86_64 machine is an i5 with 16 GB of memory running Ubuntu 12.04. The ARM machine is an OMAP4 TI PandaBoard ES with 1 GB of memory.

Results 

The spreadsheet includes calculations for the standard deviation, the 95% confidence interval using Student’s t-distribution with 9 degrees of freedom, and finally the ratios of C to Java performance on each platform.
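For readers who want to reproduce those statistics outside a spreadsheet, here is a minimal Java sketch. It assumes the sample standard deviation (dividing by n - 1), which may or may not match what the spreadsheet does, and it hard-codes 2.262, the two-sided 95% critical value of Student’s t for 9 degrees of freedom, so it only applies to sets of exactly 10 runs. The class and method names are hypothetical, and the scores in `main` are placeholders rather than measurements.

```java
// Sketch only: class and method names are illustrative.
final class RunStats {
    // Two-sided 95% critical value of Student's t with 9 degrees of freedom.
    static final double T_95_DF9 = 2.262;

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); an assumption on my part.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    // Half-width of the 95% confidence interval for the mean of exactly 10 runs.
    static double halfWidth95(double[] xs) {
        if (xs.length != 10) {
            throw new IllegalArgumentException("critical value assumes 10 runs");
        }
        return T_95_DF9 * sampleStdDev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        // Placeholder scores in MFLOPS, NOT measurements from this post.
        double[] runs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.printf("mean = %.3f, sd = %.3f, 95%% CI = +/- %.3f%n",
                mean(runs), sampleStdDev(runs), halfWidth95(runs));
    }
}
```

The interval is the mean plus or minus the half-width. If the C and Java intervals for a kernel overlap on one platform, the ratio for that kernel should be read with some caution.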

Given the large difference in the ratios observed between ARM and Intel, I believe the evidence from this benchmark supports the assertion that OpenJDK on ARM needs work. For each of the tests on Intel, the ratio between C and Java performance is reasonably close to 1:1, that is, about the same performance, yet on ARM the same code yields C-to-Java ratios ranging from 3.6x to 8.9x.

Comments
  1. I faced a similar problem in an embedded project: the Java runtime did not perform as expected. It seems the versions available were not optimized for all ARM families.

    We reimplemented the web services in C and it works like a charm.

  2. Romain Perier says:

    The C version on the intel i5 is sometimes 60% faster than the java version and sometimes it’s 30% slower… and for you it’s “reasonably close to 1:1 or the same performance” ? seriously ? :O

    • tgallfoo says:

      Hi Romain,

      I think you’re missing the point of the exercise. It’s to get a rough idea of the Intel JIT versus the ARM JIT. (And yes, there is an ARM JIT in JDK 6; it’s not zeroshark.) So the Intel Java JIT landing within roughly 1x of C in either direction, compared to the ARM JIT coming in roughly 3x to 9x slower than C, is a pretty clear result.

      This exercise doesn’t go down to comparing the generated instruction streams or the cycles used to generate them, nor did I go so far as to validate that the benchmarked algorithms aren’t being “defeated” by the compiler, such that calculations in the code are turned into precomputed numbers.

      For a rough swag I do think it is reasonable.

  3. tgallfoo says:

    Building the latest OpenJDK 7, things don’t appear to be much improved on ARM. This is the Thumb-2 JIT, not the Shark JIT. The Shark JIT fails to build. I fixed a few of the compile-time errors, but it was getting late so I’ve left that for another day.

    tgall@proteus:~/java/icedtea/icedtea-2.1.1/openjdk.build$ bin/java -classpath ./scimark2lib.jar jnt.scimark2.commandline

    SciMark 2.0a

    Composite Score: 35.1878531800801
    FFT (1024): 24.171847997229058
    SOR (100×100): 74.2736274231726
    Monte Carlo : 9.123010397936557
    Sparse matmult (N=1000, nz=5000): 33.355050880430554
    LU (100×100): 35.0157292016317

  4. tgallfoo says:

    Here’s a link to a comparison that Bob Vandette at Oracle did between Java SE Embedded and Dalvik in Android 2.2 in 2010. While dated, the Tegra 2 SciMark2 numbers in particular got my attention, as they would directly compare to what I’ve measured with OpenJDK.
    https://blogs.oracle.com/javaseembedded/entry/how_does_android_22s_performance_stack_up_against_java_se_embedded

    It does make one wonder how much Jelly Bean’s Dalvik Java implementation might have improved since the days of Froyo.

  5. Trevor Robinson says:

    You might be interested in this 2-part blog post, which compares Oracle’s EJRE client VM for ARM, the new server VM, and various OpenJDK VMs:

    Comparing JVMs on ARM/Linux
    https://blogs.oracle.com/jtc/entry/comparing_jvms_on_arm_linux
    https://blogs.oracle.com/jtc/entry/part_deux_comparing_jvms_on

    Clearly, OpenJDK does not yet represent the state of the art of Java on ARM, either in terms of stability or performance.

    My own tests with the Oracle EJRE server VM on ARMv7 (e.g. Pandaboard ES) have shown an ARM vs x86 performance ratio that is roughly the same between C and Java. My Java focus has been on SPECjvm2008, but this SciMark approach of comparing C vs Java directly is very interesting. I’ll have to give it a try when I get a chance.
