Archive for the ‘Uncategorized’ Category

I ran across an interesting article today. Entitled “Why would anybody buy an Apple Watch?”, the article asks an interesting question through the lens of history. In 2007, many predicted that the iPhone would fail and had plenty of data to back up their stories. All of these people were right: based on the data available at the time, it should have been a complete and utter failure. None of this data took into account the human condition, the experience of being exposed to a mobile device with converged functionality and a multi-touch display. People liked it, and smartphones across the board evolved in a new direction. How many years did it take Apple to get to that point?

Next week Linaro Connect begins. Many experts within the ARM ecosystem will assemble in Burlingame, California to interact and set plans for the next 6 months of engineering activity. Our collective job is not just to predict the future; it is to implement it.

At the heart of Open Source development is the mantra “release early, release often.” Apple does not do this. They work and work and work and work some more and eventually release something. Open Source, on the other hand, iterates quickly. We strive to reach the stage where the human condition can be exposed to a design and implementation as soon as possible, and we subject our work to the rigors of many eyes so that evolutionary dead ends don’t last long.

The longer you wait to release something, the larger your risk.

Member companies that join Linaro are at an advantage. Through their membership they live at the nexus point of good, fast, iterative upstream engineering united with technical leadership. Failure happens. The faster you can fail by exposing the code to experts, the more you lower your risk and the sooner, through iteration, you get onto the right track. Our members in turn are the first to receive the fruits of those labors for their future products.

On a website called Kickstarter, inventors bring their ideas and expose them to a marketplace where people evaluate and fund the promising inventions.

Linaro is like Kickstarter, but better for our member companies. Ideas flow in from our members and engineering teams and are discussed at Connect and outside of it; great engineering happens, and the promising becomes the next great thing. At Kickstarter you don’t get to influence the design; in Linaro a member company does.

See you at Connect. It’s going to be a great week.

Back to Gentoo

Posted: July 12, 2014 in Uncategorized

Back in 2003 I became a Gentoo developer. I had been using Gentoo prior to that as my Linux distro, since it had good amd64 hardware support pretty much out of the gate. I had pieced together an amd64 box, and at the time I thought trying out a new Linux distro was a good idea.

Then I worked really hard on getting ppc64 up and running. At that time, while you could run 64-bit kernels on Power and ppc64 hardware, user space was pretty much all 32-bit.

Gentoo today, in 2014, is still in my opinion a good distro. There are essentially two modes of operation: either you build a package at the time you install it, or you install from binaries via

As an open source developer I treasure the ability to easily install and test anything from source. Further, I very much enjoy being able to change compilation options, fiddling with -O3, -mtune and so on, to test out new compilers and see how performance improvements in codegen are coming along. I find it a much better environment than OpenEmbedded.

I’ve been adding arm64 support to Gentoo, and this will be my primary focus in my “copious spare time.”

Both the Samsung Gear Live and the LG G Watch Android Wear watches are first-generation hardware and software implementations. I don’t have either one. They cost about as much as a dev board, so in the grand scheme of things it’s not necessarily hard for a developer to justify the cost of leaping in and getting involved.

Judging from the WSJ review by Joanna Stern, it feels like we as an industry had best roll up our sleeves and get to work optimizing:

Performance wise, the Samsung edged out the LG, which tended to stutter and lag. And for their bulk, both watches’ battery lives should be better. They had to be charged at least once a day in proprietary charging cradles.

Really, when you think about it, this is far more than just a wearable problem: we’ve got to evolve mobile devices so that a daily charge cycle isn’t the norm.


Android64 on ARM’s Juno

Posted: July 2, 2014 in Uncategorized

I’m very pleased to point to the announcement of the initial Android64 release by Linaro for ARM “Juno” hardware.

The Linaro Android team has been working very hard on this for some time, and very big congratulations are due to them.

It speaks volumes about what companies working together as a team can achieve. Linaro is a very special player in the ARM ecosystem and I’m very pleased to be a part of it.

What other fun things might be running on Juno? 😀 Stay tuned.

The last session has passed. As I write this, it’s something of a situation out of the Twilight Zone: I’ve managed to break my glasses. I’m fairly nearsighted, but my vision isn’t good enough for my screen to be in focus at an average distance.

On the last day of Connect we had two sessions. Friday is a tough day to run a session, since people are tired. Numbers suffer, and at times it seems like we are all subject to a -20 IQ modifier in the technical discussion.

Friday Sessions

The ION session covered the current work in progress, with John Stultz from the Android team and Sumit Semwal tag-team presenting. Since Plumbers there’s been a good amount of activity. Colin Cross updated this code a fair amount as it was reviewed. He created a number of tests, which John Stultz ported outside of Android; a dummy driver has been put together for testing on graphics stacks that aren’t ION-enabled; and the 115+ patch set was pushed up into staging. As of right now these patches build and run on ARM, x86_64 and ARM64. There’s more to do: the team is working to get the tests running in LAVA, and a number of design issues are yet to be worked out on the dma-buf side of things. There need to be constraint-aware dma-buf allocation helper functions. dma-buf needs to try to reuse some of the ION code so that both use the same heaps. Then there needs to be a set of functions within dma-buf that will examine heap flags and allocate memory from the correct ION heap. It all boils down to having more common infrastructure between dma-buf and ION, so that the ION and dma-buf interfaces rest nicely on what is there instead of being two separate and divergent things.

Benjamin Gaignard presented on Wayland / Weston. He reviewed the current status of the future replacement of X, its status on ARM and how people can use it today. He covered current efforts to address some of the design failings of Wayland/Weston that assume all systems have graphics architectures like Intel’s. The use of hardware compositors instead of GPUs on ARM reflects a view of the world quite different from Intel’s, which just assumes everyone has a GPU and will want to use things like EGLImage. This is a common theme that we must raise time and time again with various developer communities who have limited ARM experience. At this point Benjamin’s focus has been more on trying to introduce into Wayland/Weston the ability to take advantage of dma-buf, promoting the sharing of buffers over copying. It’s slow going, especially without the user space dma-buf helpers that were the subject of yesterday’s session. Wayland/Weston is viable for use on ARM. It’s not perfect, and we anticipate more work in this space, first with dma-buf and then to take advantage of the compositing hardware often found on ARM SoCs.

Summary for the week

Media & Lib Team

I’m pleased that the media team was able to formulate a list of the next media libraries to port and optimize for AARCH64. We synced with the ARM team, which is also working on optimization in this area. This is vital so that we don’t accidentally duplicate efforts.

Display Team

Bibhuti and I were able to sit down and discuss the hwcomposer project. We’ve set the milestones and we’ll set the schedule. I think it’s more than fair to say we’ll be showing code at the next Connect. The next steps for Mali driver support on the graphics LSK branch include further boards, depending on their kernel status, as well as some discussion about trying to formally upstream the Mali kernel driver even in the face of likely community opposition. We had great discussions with the LHG, and no doubt we’ll be working together to support LHG like we do with other groups.

UMM Team

As discussed above, the UMM team is heads down on creating their initial PoC for connecting the heap allocators to provide map-time, post-attach allocation. This is code in progress and a very important step in knitting the ION and dma-buf worlds closer together.

GPGPU Team

This was more of a quiet Connect for GPGPU, since projects are mid-stream. GPGPU is more in a “sprint”-like mode than a “Connect”-like mode. We did release the GPGPU whitepaper to the Friends of OCTO.

Thursday featured the UMM user space allocator helper discussion and the GPGPU status talk.

The UMM User Space Allocators discussion was given by Sumit Semwal and Benjamin Gaignard. The problem is that user space needs to allocate and work with memory that is shared between devices. Consider a video pipeline, or a web camera whose output is being rendered to the screen. This work will help achieve a zero-copy design without user space having to know hardware details such as memory ranges and other device constraints.

Gil Pitney and I gave the GPGPU talk, which covered the current efforts of the GPGPU subteam. Gil is working on Shamrock, which is the old Clover project evolved. He’s upgraded it to use current top-of-tree LLVM and MCJIT for codegen. There’s still testing to do, but these are excellent steps forward, as getting off the old JIT was important. Shamrock provides a CPU-only OpenCL implementation, which is great for those who don’t want to implement their own drivers but still want to provide at least the basic functionality. In addition, there will be a Shamrock driver for TI DSP hardware. This is also quite a great step forward. Via this route, everyone can collaborate on the open source portion, which takes care of the base library, and this leaves just the driver/codegen to be created by the board creator.

The other part of the talk was about accelerating SQLite with OpenCL. There was a past project that accomplished something similar but with CUDA. I’m working on this and it’s quite the enjoyable project. I’m only at the stage of implementing the OpenCL kernels, so there is a ways to go. It will serve as a good reference for what can be accomplished on ARM SoC systems that have OpenCL drivers. These typically don’t have as many shader units as modern desktop PCIe solutions in the Intel universe. I do find it encouraging that the SQLite design is quite flexible and fits well with this kind of experiment.

I also attended the Ne10 session and the one on Chromium optimizations for Cocos2D/Cocos2D-HTML5. These are ARM projects. Ne10 is essentially a library that sits above Neon intrinsics to give easier access to that functionality. Cocos is a cross-platform engine that is particularly popular in the Android world for 2D UIs and game creation. The ARM team did some nice optimization work for Chromium in and around various drawing primitives that ends up helping Cocos.

Thursday included the first bit of quiet time I had all week to actually write some code. It didn’t last long, but it did feel good, as I’m in a very fun part of implementing the optimized SQLite with OpenCL and it was hard to set that work aside while Connect was on.

Our first Graphics Working Group session of this Connect was today. We reviewed the ongoing efforts to optimize various media libraries for AARCH64. James Yu and Ragesh Radhakrishnan talked about their work involving libpng, libjpeg-turbo, libvpx and pixman. The libjpeg-turbo work features a refresh of the Android port. This allows libjpeg-turbo to be a drop-in replacement for libjpeg, which is currently part of AOSP. The performance difference is clear. libjpeg-turbo has received quite a lot of optimization work over the past few years, while libjpeg has languished for a variety of unfortunate political reasons. The result? libjpeg-turbo is better than twice as fast as libjpeg.

James talked about his libvpx work, which of course includes VP8 and VP9 support. He’s essentially replaced the past hand-coded assembler with a version that uses Neon intrinsics. Comparing hand-coded assembler vs Neon intrinsics on ARMv7, there’s a bit of a performance degradation, nearly 10%, that needs to be looked into. In prior efforts with libpng, a similar switch yielded no statistically significant change in performance.

On the good news side, progress continues on porting previously optimized libraries so that they are also optimized for AARCH64 in preparation for the arrival of real hardware.

The other important goal of the session was to gather input on the next set of libraries that should be optimized for AARCH64. We’ve a limited amount of time. We’ve a limited number of hands. We won’t have everything optimized by the time real hardware shows up. We want to optimize first what is viewed as most important.

Unfortunately we didn’t get a lot of direction. There seems to be a feeling that codecs and such that are in Android should be the first priority, though there was some rightful dissent on that from those interested in traditional Linux.

Besides leaning toward Android, the other sentiment seemed to be that we should give video priority over audio. This makes sense: while the default fallbacks for audio use more CPU, they generally won’t completely consume a CPU or lose quality, whereas a video codec will suffer in frame rate and quality.

libav was discussed as being potentially important to 3rd party application developers. There don’t appear to be solid numbers yet on how much libav is used by 3rd party application developers on Android. I would presume there must be some usage, as libav is a good choice for “odd” formats.

In the afternoon we reviewed our internal list of codecs and media libs for porting and optimizing for AARCH64, and with some other attendees from Connect we assembled our list of the “next” libs to put time and attention into. We’ll be reviewing this with others first, but generally speaking it looks approximately like this:

  1. mpeg4
  2. webrtc (audio portion)
  3. aac  (from AOSP)
  4. flac
  5. vorbis
  6. mp3
  7. h265
  8. speex

libav is still a discussion point, so it might very well find its way onto the list.

LCA14 GWG Day 2

Posted: March 4, 2014 in Uncategorized

Tuesday was yet another day with no sessions hosted by the Graphics Working Group. Wednesday, Thursday and Friday are our big days. So for us it was a day of our own meetings and attending other working groups’ sessions.

I went to the 64-bit toolchain status meeting. It was good to hear that LLVM’s MCJIT is at least able to pass its tests on AARCH64. This is important for future versions of OpenCL on AARCH64, for instance. I do wish there were a bit more focus on LLVM for AARCH64, though. With significant projects like Chromium seriously considering a move to LLVM, it seems wise.

The next session I went to was the one on SQLite optimization. SQLite is the database at the heart of, and in very common use across, Android, iOS, Linux and so on. It’s an important foundation block, so its optimization can be valuable. Applying the cortex strings work to SQLite on Android yielded significant results for the Linaro Android team (20-35%). The Android use case is an interesting one in that the databases tend to be smaller, so while speed is one factor to optimize for, space and battery usage are also important. This work approached it only from the speed angle. It’s still a work in progress, so stay tuned.

Related to that, as part of our GPGPU efforts I’m in the midst of optimizing SQLite with OpenCL. I’m at the stage of writing the OpenCL kernels, so it’s too early to start talking about microbenchmark results. I’m also aiming at a slightly different use case: larger databases and the kinds of common operations found when SQLite is used as part of a LAMP or LAMP-like stack. I’ll be talking about this effort as part of the GPGPU session on Thursday.

During the afternoon hacking time, I was in meetings solid. This is the life of a tech lead sometimes. While it would be great to be heads down in vi this week, doing so would mean missing out on many wonderful opportunities to talk with people.

On Tuesday about half of GWG and the LHG got together and talked about a variety of topics, mostly at an architectural level, including some of the factors that go into making good technology choices for video playback and so on. As LHG gets off the ground they have a number of interesting (and fun!) challenges ahead of them.

The Media and Libs subteam got together and we held a project review. This is a perfect activity for Connect; it is quite good to perform detailed reviews from time to time. The good news is that the libvpx (VP8 & VP9), libjpeg-turbo and pixman optimization efforts for AARCH64 are coming along quite well, with patches flowing upstream. Today (Wednesday) we’ll be meeting to set the next list of libs that we will optimize for AARCH64.

Monday was the first day of Linaro Connect here in Macau. It was also a somewhat lighter day for the Graphics Working Group, as our team wasn’t hosting any sessions that day.

I attended the ARM VM standards meeting in the morning. It’s a proposal to put together a standard or whitepaper to give guidance on VMs on ARM systems. What drew me in was the apparent possibility that the document would take a position on graphics drivers within an ARM VM. Over the course of the meeting it became clear that individual VM implementations such as KVM or Xen would not be subject to any mandate regarding graphics drivers. So all’s well.

I also attended the RDK overview. That was an informational meeting about the RDK, which Linaro had helped move over to OE as a basis. They did a great job of restructuring the RDK to use the layer system within OE effectively. 70% of the packages in the resulting new OE-based RDK come from OE infrastructure. It was good to hear about and see the success.

The afternoon was the first hacking session of the week. We had a short team meeting, as I needed to go and present the engineering status of the Graphics Working Group to the Linaro TSC.

Our team goals for the week include:

1 – Display Subteam

Set hwcomposer milestones and dates

Further LSK board support discussions with members

Meet with LHG, come up to speed on LHG directions

2 – Media & Libs Subteam

Set direction for next 6-12 months on which libraries to optimize for AARCH64 next based on member and attendee input

Sync with the ARM team, which is also doing work in this space

What tools is the ARM team using in the course of their work?

3 – UMM Subteam

Interactive design and feedback on the user space allocator helper, and vetting it against the real-world Wayland/Weston zero-copy case

gralloc discussion

4 – GPGPU Subteam

Heads down working out bugs with Shamrock, discuss further feature implementation 

GPGPU whitepaper release

Going crazy with hardware

Posted: February 3, 2014 in Uncategorized

So with the release of the AMD A10-7850K Kaveri, I thought I would jump in and get one. My current “server” is a mere System76 i5 from about 3 years ago, so it’s starting to get a bit old… The bigger reason, though, is that I wanted an HSA-compliant system as well as something I could drop an R9 290 into and use for OpenCL, for a whole heck of a lot of GPU goodness. That’s two sets of GPUs, one on the processor die and one on the graphics card. Instant environment for some benchmarks 🙂 comparing OpenCL on the card vs OpenCL with page tables shared with the CPU vs, eventually, HSA.

So what did I pick up?

Well, first, the processor: an AMD A10-7850K.

The motherboard it’ll go into is an Asus ATX A88X-Pro, which is able to overclock memory to 2400 MHz.

The memory I picked up is two sticks of Corsair Dominator DDR3 PC3-19200.

For cooling I got a Corsair Hydro H100i. It’ll be interesting to see how well that works. I very much enjoy the ARM world, where we don’t have to think about this class of cooling. Having to drop $50-$100 on cooling is less than fun.

Then, for a case (and I honestly thought about just letting it sit on the desk), a 550D mid tower, also by Corsair.

All in all it should be a pretty sweet system. I have a couple of SSD drives I’ll be dropping into it, so it should rock out pretty well. It’s not a Mac Pro… which, *sigh*, I would really like, but for a Linux box it should do well.

After it’s assembled I’ll post again.