CMOS and Rolling Shutter Artifacts / Double Precision Compute

While I finish up a more interesting CS-style post (SIMD in financial calculations), just a couple of interesting news items I thought worth sharing.

In a prior entry on optical versus electronic image stabilization I noted rolling shutter artifacts (a distortion where moving subjects, or the entire frame during camera motion, appear skewed) and their negative impact on electronic stabilization.

During video capture, especially under less-than-ideal conditions, rolling shutter is a significant cause of distortion that often goes unnoticed until you stabilize the frames.

Sony announced a CMOS sensor with a data buffering layer that allows it to approximate a global shutter (Canon previously announced something similar). While their press release focuses on fast-moving subjects in stabilized situations, the same property dramatically reduces rolling shutter skew when capturing video in motion. It also offers some high-speed capture options, which are enticing.
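To make the geometry concrete, here is a minimal sketch of why rolling shutter skews a panning shot: each sensor row is read out later than the one above it, so the bottom of the frame is captured later (and thus shifted further) than the top. The numbers below are purely illustrative and not tied to any specific sensor.

```cpp
// Toy model of rolling shutter skew during a horizontal pan.
// frame_readout_s: time to scan all rows top to bottom.
// pan_speed_px_per_s: how fast the scene moves across the frame.

// Horizontal displacement between the first and last row.
double skew_pixels(double frame_readout_s, double pan_speed_px_per_s) {
    return frame_readout_s * pan_speed_px_per_s;
}

// Displacement of an individual row relative to the top row.
double row_offset_px(int row, int total_rows,
                     double frame_readout_s, double pan_speed_px_per_s) {
    double row_time = frame_readout_s * row / (total_rows - 1);
    return row_time * pan_speed_px_per_s;
}
```

With a 30 ms readout and a 1000 px/s pan, the bottom row lands 30 px away from where the top row saw it, producing the familiar slant. A global (or buffered, near-global) shutter drives the effective readout time toward zero, and the skew vanishes with it.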

Sony sensors are used in almost every mobile device now, so the technology will likely see very rapid adoption across many vendors.

Another bit of compelling news for the week is the upcoming release of the GP100, a Pascal-based workstation GPU/compute device from NVIDIA (they previously released the P100 server-based devices).

It offers double-precision performance upwards of 5 teraFLOPS (for comparison, a 72-core/AVX-512 Knights Landing Xeon Phi 7290 offers about 3.4 teraFLOPS of double-precision performance, while any traditional processor will be some two orders of magnitude lower even when leveraging full SIMD such as AVX2). Traditionally these workstation cards massively compromised double-precision throughput, so this update brings them into much greater utility for a wide array of uses (notably the financial and scientific worlds, where the limited significant digits of single precision made them unworkable).
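A quick illustration of why single precision is unworkable for financial accumulation: a float carries only about seven significant decimal digits, so penny-sized increments start rounding away once the running total reaches the tens of thousands. This is a generic sketch of the failure mode, not tied to any particular workload.

```cpp
#include <cmath>

// Sum n penny-sized increments (0.01) in each precision.
// Illustrative only: shows the accumulation error of single
// precision, not how any real financial system computes.

float sum_pennies_float(long n) {
    float total = 0.0f;
    for (long i = 0; i < n; ++i) total += 0.01f;
    return total;
}

double sum_pennies_double(long n) {
    double total = 0.0;
    for (long i = 0; i < n; ++i) total += 0.01;
    return total;
}
```

After ten million additions the true total is 100,000.00. On a typical IEEE 754 implementation the double total is accurate to well under a cent, while the float total drifts off by a large margin, because once the accumulator is big enough, 0.01 is smaller than the spacing between adjacent representable floats.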


I’m okay.

Months of incredible levels of stress, coupled with weeks of little sleep and the recent sudden passing of my brother Darrell, yielded some deeply irrational, illogical thinking that I of course regret, but can’t erase from my timeline. For any sort of intellectual exercise, like software development, no sleep+stress = a recursive loop of ineffectiveness that generates more stress and even less sleep.

I apologize to those I caused distress, from family to remote individuals who I’ve never met but who cared. This is my first opportunity to post anything on this, and the lack of communications wasn’t an intentional act of dramatic exercise.

In any case, a quick set of thanks-

  • thanks to the many people who cared
  • thanks to the Halton Regional Police (the many fantastic officers, and one profoundly talented K9), who literally saved my life
  • thanks to the remarkable staff of Joseph Brant Hospital
  • thanks to the patients/community of 1 West @ JBH, who opened my eyes to the battles that so many are fighting, and offered friendship and support during a rough period

Not to make light of this ridiculous and very serious situation, but this event yielded a few life happenings that I never expected to have in my biography-

  • Chased by a number of officers through a frigid river
  • Taken down/bitten by a police K9 (who was a model of K9 professionalism, and is a beautiful, extraordinarily well-trained canine officer)
  • Committed involuntarily under the mental health act (i.e. escorted by a guard, a locked ward, bed checks, restricted to a hospital gown for several days)

It was an interesting experience that I don’t plan on repeating. The week+ in the hospital gave me the quietest, most reflective period that I’ve had since…forever.

It was the first time I’ve ever truly, successfully meditated. The real, lotus-pose, mind-at-ease meditation that went on for tens of minutes.

I learned an enormous amount about myself as a result (and got various health tests that were long overdue, being conveniently located and all), and came out of it a much better person. Finally dealt with some pretty severe social anxiety that has always been a problem for me.

And for those concerned, various other things got resolved to a good outcome at the same time. The stress+overwhelming tiredness clouded my eyes to options that were available, and everything else is in a much better place.


Gallus / Lessons In Developing For Android

I recently unpublished Gallus — the first and only gyroscope stabilizing hyperlapse app on Android, capturing and stabilizing 4K/60FPS video in real time with overlaid instrumentation like route, speed, distance, etc — from the Google Play Store. It had somewhere in the range of 50,000 installs at the time of unpublishing, with an average rating just under 4.

It was an enjoyable project to work on, primarily because it was extremely challenging (75% of the challenge being dealing with deficiencies of Android devices), and in the wake of the change a previous negotiation to sell the technology/source code, with possible ongoing involvement, was revived. That discussion had been put on indefinite hold at an advanced stage when the buyer was acquired by a larger company, which quite reasonably led them to regroup and analyze how their technologies overlapped and what their ongoing strategy would be.

I put far too many eggs into that deal going smoothly, leading to significant personal costs. It taught me a lesson about contingencies. I’m pleased it is coming to fruition now.

I never had revenue plans for the free app by itself (no ads, no data collection, no coupling with anything else…just pure functionality); ultimately the technology sale was my goal.

Nonetheless, the experience was illuminating, so I’m taking a moment to document my observations for future me to search up later-

  • Organic search is useless as a means of getting installs. Use a PR-style approach. 50,000 is two orders of magnitude below my expectation.
  • No matter how simple and intuitive your interface is, people will misunderstand it. When a product is widely used we naturally think “I must be doing something wrong” with even the worst interfaces, putting in the time and effort to learn their idioms and standards. When a product isn’t widely used we instead think “this product is doing something wrong” and blame the product. I abhor the blame-the-user mentality, but at the same time it was eye-opening how little time people would spend on the most rudimentary of investigations before giving up.
  • The higher a user’s expectations of your product (the more they want to leverage it and see it adding value to their life), the more vigorous and motivated their anger and disappointment is if for some reason they can’t. Some of the feedback I got over the app’s time on the market was extraordinary: because it wouldn’t work on some strange brand of device I’d never heard of, with a chipset I’d never seen before, there was a sort of “you owe me this working perfectly on my device” tone.
  • The one-star-but-will-five-star-if-you… users are terrible people. There’s simply no other way to put it. Early on I played along, but quickly learned that they’ll be back with a one star in the future with a new demand.
  • I have spoken about the exaggeration of Android fragmentation on here many times before: For most apps it is a complete non-issue. Most apps can reasonably target 4.4 and above with little market impact. Cutting out obsolete devices will often just save you from negative feedback later — it is Pyrrhic to reach back and support people’s abandoned, obsolete devices — but if you do try to reach back for 100% coverage it’s made easier with the compatibility support library.
    At the same time, though, if you touch the camera pipeline it is nothing but a world of hurt. The number of defects, non-conformances, broken codecs, and fragile subsystems that I encountered on devices left me shell-shocked: wrong pixel formats, codecs that die in certain combinations with other codecs, and the complete and utter mystery of what a given device can handle (resolution, bitrate, i-frame interval, frame rate, etc). For a free little app I certainly couldn’t maintain a suite of test devices (a given Samsung Galaxy S model often has dozens of variations, each with unique quirks), and could only base my work on the six or so devices I do have. It ends up being a game of “if someone complains, just block the device”, but then I’d notice that months earlier someone on a slight variation of the same device had given a five-star review and reported great results.
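The "block the device" fallback can be sketched as a simple quirk table keyed on device identity. Everything below is hypothetical, including the device names and quirk flags; on Android the real lookup would key on identifiers like Build.MODEL and what MediaCodecInfo reports, but the shape of the workaround is the same.

```cpp
#include <map>
#include <string>

// Illustrative device-quirk blocklist. The model names and quirks
// here are invented for the sketch, not real devices.
enum Quirk { NONE = 0, BAD_PIXEL_FORMAT = 1, MAX_1080P = 2 };

int quirks_for(const std::string& model) {
    static const std::map<std::string, int> blocklist = {
        {"HypotheticalPhone-X1", BAD_PIXEL_FORMAT},
        {"HypotheticalTab-7",    BAD_PIXEL_FORMAT | MAX_1080P},
    };
    auto it = blocklist.find(model);
    return it == blocklist.end() ? NONE : it->second;
}

// Gate a feature on the quirk table: unknown devices get the
// full feature set, complained-about devices get restrictions.
bool allow_4k(const std::string& model) {
    return (quirks_for(model) & MAX_1080P) == 0;
}
```

The obvious weakness, as noted above, is granularity: one entry blankets every variation of a model, punishing the variants that would have worked fine.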

I enjoyed the project, and finally it looks like it will pay off financially, but the experience dramatically changed my future approach to Android development.

Programming An Itch Away

My eldest son’s rig consists of a gaming PC as his main device, with one of my old laptops on the side for ancillary use.

The audio output on the desktop’s motherboard failed mysteriously (no driver or BIOS fix resolved the issue; the hardware of that subsystem had seemingly failed), and swapping out the motherboard isn’t something I’m keen to do given the activation issues, despite having a spare available.

With the holiday season, getting a wireless headset or USB sound card wouldn’t be a speedy venture. And anyway, it was an opportunity for a fun software distraction.

Visual Studio powered up, a C++ solution pursued, and an hour later a solution was built: capture the low-latency 44100 Hz 32-bit floating point stereo master mix on the desktop, resample it to 48000 Hz 16-bit integer audio, compress it with Opus, and send it over UDP to an argument-driven target, where the process reverses and yields an extraordinarily high fidelity reproduction of the source audio. The resampling is courtesy of Opus accepting only a fixed set of sample rates (8, 12, 16, 24, or 48 kHz), while Opus itself is used because both machines are on wifi, often under heavy network traffic, and keeping the packets tiny improves the chances of speedy delivery even in high-saturation situations.
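Two steps of that pipeline can be sketched without any platform APIs: converting the captured 32-bit float mix to 16-bit integers, and resampling 44100 Hz up to 48000 Hz. The real project used the OS capture APIs and libopus; this is only the sample-format plumbing, written from scratch for illustration (a naive linear interpolator rather than whatever filter the actual code used).

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Clamp and convert one 32-bit float sample [-1, 1] to int16.
int16_t float_to_i16(float s) {
    if (s > 1.0f) s = 1.0f;
    if (s < -1.0f) s = -1.0f;
    return static_cast<int16_t>(std::lrintf(s * 32767.0f));
}

// Naive linear-interpolation resampler for one channel.
std::vector<float> resample_linear(const std::vector<float>& in,
                                   int from_rate, int to_rate) {
    size_t out_len = in.size() * static_cast<size_t>(to_rate) / from_rate;
    std::vector<float> out(out_len);
    double step = static_cast<double>(from_rate) / to_rate;
    for (size_t i = 0; i < out_len; ++i) {
        double pos = i * step;
        size_t j = static_cast<size_t>(pos);
        double frac = pos - j;
        float a = in[j];
        float b = (j + 1 < in.size()) ? in[j + 1] : in[j];
        out[i] = static_cast<float>(a + (b - a) * frac);
    }
    return out;
}

// Sanity check: a constant signal must survive resampling unchanged.
bool resample_preserves_constant() {
    std::vector<float> in(441, 0.5f);            // 10 ms at 44100 Hz
    auto out = resample_linear(in, 44100, 48000); // 480 samples out
    if (out.size() != 480) return false;
    for (float s : out)
        if (std::fabs(s - 0.5f) > 1e-6f) return false;
    return true;
}
```

The resampled 16-bit frames then go to the Opus encoder in 10 or 20 ms chunks, which is what keeps each UDP packet small enough to slip through a saturated wifi link.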

The audio delay is less than 10 ms, which is imperceptible, and is an order of magnitude or more below the delay of many Bluetooth headsets.

So he games and the audio from his desktop plays on his laptop, which he has a set of high quality wired headphones connected to. It works well for now, until a long term solution is implemented (probably the motherboard swap). I could jimmy up an Android solution in minutes, having already done some Opus codec/UDP transport projects.

Those are some of the most rewarding projects. Even if I indulged the facile exercise of billing out those hours in fantasy and declaring the total an opportunity cost, these fun projects expose us to new technologies and avenues, and the educational value alone pays for them. Doing strange and interesting side projects is the vehicle for my most interesting ideas (as is misinterpreting or making assumptions about descriptions of products, then discovering that my assumptions or guesses are vastly off the mark, but have merit as a novel invention).

It’s a pretty trivial need and solution, but having an itch, slamming out a solution, and having the bulk of the code simply work with minimal issue on first run is a glorious feeling. In this case, despite using APIs I’d never touched before and a considerable amount of bit mangling and buffer management code (the traditional shoot-yourself-in-the-foot quagmires), the single defect across the sender and receiver was a transposition of loop variables in nested loops.

Such a great feeling of satisfaction comes from doing something like that. It is quite a nice change from the large-scale projects we generally ply, where rewards come slowly, if at all, diluted in the effluence of time.

Of course doing a microproject like this yields the sort of tab hilarity that we often endure when we’re dealing with technologies or APIs we don’t normally use.

And for the curious, there are some products that do what I described (send the mixed master audio from a PC to other devices), but each one we tested yielded a quarter-second or more of latency, even over a direct twisted-pair connection, which made it useless for the purpose. And even if a suitable solution existed, I really just wanted to build something, so I would have unfairly discarded it regardless.


3D XPoint is Pretty Cool

Five years ago I wrote a post arguing that SSDs/flash storage were a necessary ingredient of most modern build-outs, and that holdouts were wasting time and effort by not adopting them in their stack. While that is profoundly obvious now, at the time there was a surprising amount of resistance from many in the industry who were accustomed to their racks of spinning rust, RAID levels, and so on: people who had banked their careers on deep knowledge of optimizing against extremely slow storage systems (a considerable factor in the enthusiasm for NoSQL), so FUD ruled the day.

Racks of spinning rust still have a crucial role in our infrastructure, though now often treated as almost nearline storage (data you seldom touch, and when you do, performance expectations are relaxed). But many of our online systems are worlds improved, with latencies in microseconds instead of milliseconds courtesy of flash. It changed the entire industry.

In a related piece I noted that “Optimizing against slow seek times is an activity that is quickly going to be a negative return activity.” This turned out to be starkly true, and many efforts that were undertaken to engineer around glacially slow magnetic and EBS IOPS ended up being worse than useless.

We’re coming upon a similar change again, and it’s something that every developer / architect should be considering because it’s about to be real in a very big way.

3D XPoint, co-developed by Micron and Intel (Intel’s site has some great infographics and explanatory videos), is a close-to-RAM-speed, flash-density, non-volatile storage/memory technology (with significantly higher write endurance than flash, though marketing claims vary from 3x to as high as 1000x), and it’s just about to start hitting the market. Initially it will appear as very high performance, non-volatile caches atop slower storage: for example, a 2TB TLC NVMe drive with 32GB of 3D XPoint non-volatile cache (better devices currently have SLC flash serving the same purpose), offering extraordinary throughput and IOPS/latency while still offering large capacities.

Over a slightly longer period it will appear in DRAM-style, byte-accessible form (circumventing the overhead of even NVMe). Not literally as main memory, which DRAM still outclasses in pure performance, but as an engineered storage tier that our databases and solutions directly and knowingly leverage in the technology stack.
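The programming model that byte-accessible form implies is worth a sketch: rather than issuing block I/O, the application maps persistent storage into its address space and mutates it with ordinary loads and stores. Below, an ordinary file stands in for the NVM region (POSIX, so Linux or similar); with real persistent memory the same idea runs over a DAX-mounted filesystem, and the flush would be a cache-line write-back rather than msync. The path is arbitrary, and this is a toy under those assumptions, not a production persistence scheme.

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map a "persistent" region, mutate it with plain stores, flush,
// then remap and verify the bytes survived. An ordinary file stands
// in for byte-addressable NVM here.
bool store_and_reload(const char* path) {
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return false;
    if (ftruncate(fd, 4096) != 0) { close(fd); return false; }
    char* mem = static_cast<char*>(mmap(nullptr, 4096,
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    if (mem == MAP_FAILED) { close(fd); return false; }

    std::strcpy(mem, "persisted");   // a plain store, no write() call
    msync(mem, 4096, MS_SYNC);       // flush to the backing store
    munmap(mem, 4096);
    close(fd);

    // Reopen and map again to confirm durability.
    fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    char* again = static_cast<char*>(mmap(nullptr, 4096,
                      PROT_READ, MAP_SHARED, fd, 0));
    bool ok = (again != MAP_FAILED) &&
              std::strcmp(again, "persisted") == 0;
    if (again != MAP_FAILED) munmap(again, 4096);
    close(fd);
    return ok;
}
```

The appeal for databases is exactly this directness: no serialization through a block layer, no page cache double-buffering, just durable structures addressed like memory.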

2017 will be interesting.