Thursday featured the UMM user space allocator helper discussion and the GPGPU status talk.
The UMM User Space Allocators discussion was given by Sumit Semwal and Benjamin Gaignard. The problem involves the need from user space to allocate and work with memory for sharing between devices. Consider a video pipeline or a web camera that is rendering to the screen. This work will help achieve a zero copy design without user space having to know hardware details such as memory ranges, and other device constraints.
Gil Pitney and I gave the GPGPU talk which covered the current efforts involving the GPGPU subteam. Gil is working on Shamrock which is the old Clover project evolved. He’s upgraded it to use current top of tree llvm and MCJIT for code gen. There’s still testing to do but these are excellent steps forward as getting off the old JIT was important. Shamrock provides a CPU only OpenCL implementation which is great for those that don’t want to implement their own drivers but still want to provide at least the basic functionality. In addition there will be via Shamrock a driver for TI DSP hardware. This is also quite a great step forward. Via this route, everyone can collaborate on the open source portion which takes care of the base library and this leave just the driver/codegen to be something that needs to be created by the board creator.
The other part of the talk was about accelerating SQLite with OpenCL. There was a past project that accomplished something similar but with CUDA. I’m working on this and it’s quite the enjoyable project. I’m just implementing OpenCL kernels so there is a ways to go. It will serve as a good reference for what can be accomplished on ARM SoC systems which have OpenCL drivers. We typically don’t have as many shader units as modern desktop PCIe solutions in the intel universe. I do find it encouraging that the SQLite design is quite flexible and fits well with this kind of experiment.
I did also attend the Ne10 and Chromium optimizations for Cocos2D/Cocos2D-HTML5. This are ARM projects. Ne10 is essentially a library that sits above Neon intrinsics to give easier access to that functionality. Cocos is a popular cross platform engine that is particularly popular in the Android world for 2D UIs and game creation. There was some nice optimization work in and around various drawing primitives done by the ARM team for Chromium that end up helping Cocos.
Thursday included the first bit of quiet time I had all week to actually write some code. It didn’t last long but it did feel good as I’m in a very fun portion of implementing the optimized SQLite with OpenCL and it was hard to set that work aside while Connect is on.