skia_bench image decode on Android ICS, now ~100% faster.

Posted: January 11, 2012 in android, linaro, open_source

skia_bench on Android measures a number of things. One portion of the benchmark measures libjpeg performance, specifically for the 565 and 8888 image types that are specific to Android.

Android (including the latest Ice Cream Sandwich release) uses the old and quite crusty libjpeg library. This library, while functional, is missing a great deal of optimization. The libjpeg-turbo project (http://libjpeg-turbo.org) is a souped-up and, more importantly, drop-in, API-compatible replacement for libjpeg. It is compatible with both the 6.x and 8 versions that are in wide use across many a distro. Because libjpeg doesn’t have SIMD (single instruction, multiple data) optimizations, such as NEON on ARM, distributions have been dropping libjpeg in favor of libjpeg-turbo.

At Linaro, our Android Ice Cream Sandwich build, for instance, replaces libjpeg with libjpeg-turbo. Likewise, we recently worked with Ubuntu, and as a result Precise (version 12.04) now includes libjpeg-turbo.

So what kind of performance bump can one see by switching from libjpeg to libjpeg-turbo?

https://wiki.linaro.org/TomGall/LibJpeg8 includes lots of raw numbers on Intel and ARM machines. In short, performance improvements measured by tjbench are on the order of 200% to 300%.

libjpeg-turbo doesn’t include support for Android. So earlier this year, we ported libjpeg-turbo so that it would support Android’s dependency on the aforementioned nonstandard 8888 and 565 formats. This was a good first step; however, no work had been done to optimize for 565 or 8888. Thus the performance for 565 and 8888 was about the same for libjpeg and libjpeg-turbo.
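For readers who haven't met these formats, here is a rough sketch of what converting decoded RGB pixels to 565 and 8888 involves. This is plain scalar C for illustration only, not the code from the port or the NEON-optimized version. The function names are made up, and the R, G, B, A byte order for 8888 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one 8-bit-per-channel RGB pixel into a 16-bit 565 value:
 * top 5 bits of red, top 6 bits of green, top 5 bits of blue. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a row of packed RGB (3 bytes per pixel) to 565
 * (one 16-bit value per pixel). Hypothetical helper. */
static void rgb_row_to_565(const uint8_t *src, uint16_t *dst, size_t width)
{
    assert(src != NULL && dst != NULL);
    for (size_t x = 0; x < width; x++) {
        dst[x] = pack_rgb565(src[3 * x], src[3 * x + 1], src[3 * x + 2]);
    }
}

/* Convert a row of packed RGB to 8888 (4 bytes per pixel) with an
 * opaque alpha channel. Byte order R, G, B, A is assumed here. */
static void rgb_row_to_8888(const uint8_t *src, uint8_t *dst, size_t width)
{
    assert(src != NULL && dst != NULL);
    for (size_t x = 0; x < width; x++) {
        dst[4 * x + 0] = src[3 * x + 0];
        dst[4 * x + 1] = src[3 * x + 1];
        dst[4 * x + 2] = src[3 * x + 2];
        dst[4 * x + 3] = 0xFF; /* JPEG has no alpha, so fully opaque */
    }
}
```

Per-pixel loops like these run once for every pixel of every decoded image, which is why they are natural candidates for SIMD work.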

I’d like to see Android switch to libjpeg-turbo, so this week it was time to do some optimization that would give credence to that desire. Optimizations that can be measured with skia_bench would be the way to go.

Hack. Hack. Hack.

The results? A comparison of Android’s libjpeg, libjpeg-turbo without 565 or 8888 optimizations, and libjpeg-turbo with 565 and 8888 optimizations can be found at https://wiki.linaro.org/TomGall/SkiaBenchNumbers . Smaller numbers are better.

Half the time. Not bad! Or, put another way, the newly improved libjpeg-turbo is 2 times faster than the old libjpeg! I’m sure you’d like that improvement on your Android phone or tablet! I would!
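To spell out how "half the time", "2 times faster" and "~100% faster" are the same claim, here is a small sketch of the arithmetic. The function names are made up for illustration, and any timings you pass in are your own; the real measurements are on the wiki page linked above.

```c
#include <assert.h>

/* How many times faster the new decode is: old time / new time. */
static double speedup_factor(double old_ms, double new_ms)
{
    assert(new_ms > 0.0);
    return old_ms / new_ms;
}

/* The same result expressed as "percent faster": (old / new - 1) * 100. */
static double percent_faster(double old_ms, double new_ms)
{
    assert(new_ms > 0.0);
    return (old_ms / new_ms - 1.0) * 100.0;
}
```

With a made-up old time of 10 ms and a new time of 5 ms, the speedup factor is 2 and the percent-faster figure is 100.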

Now we get to another part of the story, one that is important and that also reflects another aspect of what makes Linaro an important leader in ARM for Linux and Android. You see, the optimization code for 565 and 8888 already sort of existed. It was sitting out in a git archive, more or less gathering dust. The essential step of putting it together with the already optimized libjpeg-turbo hadn’t been done. The code also hadn’t been pushed upstream to the communities that would most benefit. In Linaro we want Linux and Android on ARM to shine.

It’s all put together now and it works, and that’s great. Now comes the most important step: getting Google’s Android engineers (and perhaps CyanogenMod too!) to accept it so that all might benefit. That’s the bar for success we aim for at Linaro, and we will succeed.

Code : git://git.linaro.org/people/tomgall/libjpeg-turbo/libjpeg-turbo.git

Pull from the android branch: git checkout -b android origin/android

Have fun!

Comments
  1. pip says:

    “In short performance improvements [by adding NEON SIMD] measured by tjbench are on the order of 200% to 300%.”

    “In Linaro we want Linux and Android on ARM to shine.”

    great, its good to see all these NEON SIMD optimisations to all the app’s being done so publicly thanks.

    now given that Apple also optimise all their iOS libraries for Neon and Altivec before that, to gain the extra speed benefit from even app’s that haven’t been SIMD optimised yet but do use lots of routines inside these SIMD optimised …

    will/does linaro also have blueprint for also assessing the generic ARM libraries glib etc for any and all potential NEON optimisations?

    as i recall the old PPC linux distro devs never bothered to do their OS libs SIMD optimisation work and so lost a lot of speed advantages to the apple folks and their app’s , i hope the greater ARM infrastructure ecosystem devs today don’t fall into that old PPC dev mindset of good enough so no/not much OS SIMD code here frame of mind…

    Oh and id like to see a lot more Linaro before/after timed speed benchmark results with SIMD optimisation code so as to gauge improvements over time like the x264/ffmpeg devs do on each routine they improve using their “checkasm bench” if Linaro can add something that to their test suite even better

  2. bruno says:

    Hi, how about skia_bench comparisons of libjpeg-turbo and the use of jpeg acceleration hardware found in many SoCs? how one can test skia_bench in both?
