On Specializing AKA Databases Are Easy Now

After hanging out my shingle in the independent consulting business (trying to pay the bills while exploring entrepreneurial efforts[1]), one of my original specialties was high performance database systems.

I pitched my expertise primarily in large-scale, high-demand database systems, providing solutions that would dramatically improve operational performance and relieve pain points. Given my network of contacts, this was usually in the financial industry.

It made for relatively easy work: Many groups don’t understand how to properly leverage their database products, their data needs, or their query profile, so I could often promise at least a 50x performance improvement in some particular pain point on existing hardware, and achieve that and more with ease.

I started backing away from the database world, however, because many database products have dramatically improved and reduced the ability to shoot one’s own feet, but more significantly hardware has exponentially improved.

Massive memory now caches the entirety of all but the most enormous databases, and flash storage (and now Optane storage) makes even the most inefficient of access patterns effectively instant. Many have gone from a big RAID array that could barely manage 1000 IOPS, on a platform with maybe 32GB of RAM, to monster servers with many millions of IOPS and sometimes TBs of very high speed memory. Even the bottom-feeder, lowest single-instance AWS services are usually fairly adequately equipped.

It has gotten hard to build a poorly performing database system, at least at the scale that many firms operate at. Expert implementations can still yield significant speedups, but to many potential customers a 100ms query is effectively identical to a 1ms query, with loading low enough that hardware saturation isn’t a concern (or the hardware so inexpensive that a cluster of ten huge machines is fine, despite being significant overkill for a properly implemented system). The most trivial of caching and even a modicum of basic knowledge (e.g. rudimentary indexes) is usually enough to achieve workable results.
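As a rough illustration of how far a rudimentary index goes, here is a minimal sketch using Python's built-in sqlite3 module. The trades table, its columns, and the index name are all hypothetical, invented for the example. It only shows the query plan changing from a full scan to an index search; it is not a benchmark.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan lines SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO trades (account, amount) VALUES (?, ?)",
    [(f"acct{i % 1000}", float(i)) for i in range(10000)],
)

lookup = "SELECT SUM(amount) FROM trades WHERE account = ?"

# Without an index the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, ("acct42",))

# A single rudimentary index on the filtered column changes the plan.
conn.execute("CREATE INDEX idx_trades_account ON trades (account)")
plan_after = query_plan(conn, lookup, ("acct42",))

total = conn.execute(lookup, ("acct42",)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("total: ", total)
```

The exact wording of the plan lines varies between SQLite versions, but the first plan should report a scan of trades and the second a search using idx_trades_account.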

Egregious inefficiency is operable. Which is a great thing in many cases, and is similar to our move to less efficient runtimes and platforms across the industry. There are still the few edge situations demanding extreme efficiency, but as with assembly programming it just isn’t enough to demand a focus for many.

The possible market for database optimization just isn’t as large or as lucrative. And when things are awry on that millions of IOPS system with 2TB of RAM and 96 cores, it’s usually so catastrophically, fundamentally broken that it isn’t really a fun pursuit, usually necessitating a significant rebuild.

For the past half year I’ve moved primarily to cryptocurrencies and their related technologies (e.g. public blockchain implementations and standards, as well as payment systems). Not chasing a “whale” (or having anything to do with the enormous bubble of some currencies), or hoping to ride a bubble to great riches (I own zero coins of any cryptocurrency, and see the current market as incredibly dangerous), but rather becoming completely knowledgeable of every emergent standard, and of the currently dominant platforms and their source code, and offering that knowledge to individuals and firms looking to integrate, to leverage, or to bet. It has been rewarding.

All the while I continue with the entrepreneurial efforts. My latest is with the iOS ARKit, which has been a fun experience. I should announce something on that soon.

Otherwise I continue with fun projects. I still have a contour mapping application on the go. It started as a real estate sales project, but that went so quickly to great success that it became just an interesting project to explore Kotlin, undertow, targeting iOS and Android and then learning about pressure propagation, etc. Then WebGL mapping, millisecond time synchronization (I use four current smartphones in concert), etc. I have a mostly complete posting on that, long promised, that I’ll finish soon.

[1] – I hit some financial roadblocks, leading to a crisis, when I put aside all revenue production to try to focus entirely on entrepreneurial efforts that seemed, for so long, on the absolute cusp of success. Lessons learned.

Summer Draws To A Close

I hope those of you in the Northern Hemisphere had a glorious summer. And for those in the Southern, I hope a great summer awaits.

Now that the summer break draws to a close, it’s time to get back on projects and roll out some deliverables. Two personal projects I’ve committed to delivering in the near future are a video app (for gradual, trickle monetization[1] reasons) and a multi-device contour mapping/property mapping app leveraging reference barometers, plus GPS and GLONASS where available. That one is just for fun. These sit alongside various professional things (I’m available for your projects). And I’ll start spinning up content on here.

Otherwise it’s been a period without much to talk about on here.

One of the few really interesting things over the summer has been the adoption of Kotlin as a first-class language in Android Studio 3. Kotlin is a product of JetBrains, the creators of the excellent IntelliJ IDE (which Android Studio is based upon), and it’s a language I didn’t pay attention to previously: I veer away from tools and languages that require less common dependencies when handed off to other teams, as they can be a problem during technical due diligence. Now that Kotlin is more accessible for a major platform, I finally took the dive into learning and adopting it for those projects where the JVM or similar runtimes (e.g. Android) are a part of the solution.

And it’s actually a really compelling language. The sort of fluid, intuitive programming that is similar to Go programming for me. It dramatically reduces the enormous trove of boilerplate that Java often demands.

Ultimately programming languages are largely interchangeable. Almost anything can be implemented in just about any language. The number of lines will vary, the readability will fluctuate, etc, but in the end you don’t ever have to change languages. But there is something almost indescribable about languages like Go, and now Kotlin, where the implementation is so fluid with your thought process that it simply makes implementing more complex solutions effortless. I am a big fan of Go and readily acknowledge that it has an enormous litany of deficiencies, but something about it just makes great solutions appear. Other languages have great features on paper, yet nothing of consequence ever seems to be created with them. Kotlin is unique in that not only does it bring that power, it also has a pretty compelling set of modern features.

It is hardly perfect, of course: Its construction was clearly bounded by the limits of the JVM (though you can target native code and several other platforms), so it doesn’t have the greenfield benefits of something like Rust, but it is an enormous improvement over Java when I have to work in that domain.

[1] – A big change in focus of my efforts is that I’m going to focus far more on sustainable recurring income (both of the hired help and the built-apps-for-monetization variety), versus “shoot for the moon” type initiatives. Technology sales are unbelievably long, drawn out, and risky, with a process that is almost impossible unless you’re willing to be personally “acquired” in the transaction, committing to moving in the process (which I am not willing to do: short-term on-site stints are fine, but changing countries is not). Thousands of hours go in, and in the end the roadblocks end up being the most trivial of things, all of it distracting from other income sources. Consulting efforts have their own gamut of problems, but some recurring revenue is far better than remote odds of an occasional large jackpot.

Updates: Pay Apps / Date-Changing Posts / Random Projects

Recently a couple of readers noticed that some posts seemed to be reposted with contemporary dates. The explanation might be broadly interesting, so here goes.

I host this site on Amazon’s AWS, as that’s where I’ve done a lot of professional work, I trust the platform, etc. It’s just a personal blog so I actually host it on spot instances — instances that are bid upon and can be terminated at any moment — and there was a dramatic spike late in the week on the pricing of c3 instances, exceeding my bid maximum. My instance was terminated with extreme prejudice. I still had the EBS volume, and could easily have replicated the data on the new instance for zero data loss (just a small period of unavailability), however I was just heading out so I just ramped up an AMI image that I’d previously saved, posted a couple of the lost posts from Google cache text, and let it be. Apologies.

Revisiting Gallus

Readers know I worked for a while on a speculative app called Gallus — a gyroscope-stabilized video solution with a long litany of additional features. Gallus came close to being sold as a complete technology twice, and was the source of an unending amount of stress.

Anyways, I recently wanted the challenge of frame-v-frame image stabilization and achieved some fantastic results. The motivation was my Galaxy S8, which features OIS (for which it provides no developer-accessible metrics), but given the short range of in-camera OIS it can yield an imperfect result. The idea would be a combination of EIS and OIS, and the result of that development benefits everything. I rolled it into Gallus to augment the existing gyroscope feature, coupling both for fantastic results (it gets rid of the odd gyro mistiming issue, but still has the benefit that it fully stabilizes with highly dynamic and complex scenes). Previously I pursued purely a big pop outcome (I only wanted a tech purchase, and came perilously close), but this time it’s a much more sedate part of my life and my hopes are relaxed. Nonetheless it will return as a pay app, with a dramatically simplified and optimized API. I am considering restricting it only to devices I directly test on first hand. If there are zero or a dozen installs that’s fine, as it’s a much different approach and expectation.

Project with my Son

Another project approaching release is a novelty app with my son, primarily to acclimate him to “team” working with git. Again expectations are amazingly low and it’s just for fun, but it might make for the source of some content.

Small Optimizations

A couple of months ago I posted an entry titled Rust Faster Than C? Not So Fast. It was a simple analysis of the k-nucleotide benchmark from the benchmarks game, my interest piqued by Rust taking the lead over C.

After a brief period of profiling I proposed some very simple changes to the C entrant, or rather to the hash code that it leverages (klib) that would allow C to retake the lead. It was pursued purely in a lighthearted fashion, and certainly wasn’t a dismissal of Rust, but instead was just a curiosity given how simple that particular benchmark was (versus other benchmarks like the regex benchmark where Rust’s excellent regular expression library annihilates the Boost regex implementation, which itself eviscerates the stdlib regex implementation in most environments).

The changes were simple, and I take absolutely no credit for the combined work of others, but given that a number of people have queried why I didn’t publish it in a normal fashion, here it is.

Original Performance
3.9s clock time. 11.0s user time.

[on the same hardware Rust yields 3.6s clock time, 9.8s of user time, taking the lead]

Remove a single-line pre-check that, while ostensibly there for speed-ups, was actually a significant net negative.

Updated Performance
3.3s clock time. 9.55s user time.

Such a trivial change, removing what was assumed to be a performance improvement, yielded a 15% performance improvement. This particular change has zero negative impact so I have submitted a pull request.

Switch the flags from bit-packing into 32-bit integers to fixed-position byte storage

Updated Performance
2.9s clock time. 7.4s user time.

As mentioned in the prior post, the implementation used a significant amount of bit packing, necessitating significant bitshifting. By moving to a byte storage it remains incredibly efficient for a hash table, but significantly reduces high-usage overhead.
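To make the trade-off concrete, here is a minimal sketch of the two layouts in Python (the actual change was to the C hash code). It assumes two flag bits per bucket, packed sixteen to a 32-bit word; that width and every function name here are assumptions for illustration, not the library's actual code. The point is only that the packed form needs a shift and mask on every access, while the byte form is a plain index.

```python
FLAG_BITS = 2                    # assumed width of each bucket's flags
PER_WORD = 32 // FLAG_BITS       # 16 buckets share one 32-bit word
MASK = (1 << FLAG_BITS) - 1      # 0b11


def packed_get(words, i):
    """Read bucket i's flags out of the packed 32-bit words."""
    shift = (i % PER_WORD) * FLAG_BITS
    return (words[i // PER_WORD] >> shift) & MASK


def packed_set(words, i, value):
    """Write bucket i's flags: clear the old bits, then OR in the new ones."""
    shift = (i % PER_WORD) * FLAG_BITS
    cleared = words[i // PER_WORD] & ~(MASK << shift)
    words[i // PER_WORD] = cleared | ((value & MASK) << shift)


def byte_get(flags, i):
    """Read bucket i's flags from fixed-position byte storage."""
    return flags[i]


def byte_set(flags, i, value):
    """Write bucket i's flags: a single indexed store, no shifting."""
    flags[i] = value & MASK


n_buckets = 64
words = [0] * (n_buckets // PER_WORD)   # 4 words, i.e. 16 bytes of flags
flags = bytearray(n_buckets)            # 64 bytes of flags

for i in range(n_buckets):
    packed_set(words, i, i % 4)
    byte_set(flags, i, i % 4)

same = all(packed_get(words, i) == byte_get(flags, i) for i in range(n_buckets))
print("layouts agree:", same)
print("packed bytes:", len(words) * 4, "byte-storage bytes:", len(flags))
```

Under these assumptions the byte layout spends four times the memory on flags in exchange for dropping the shift-and-mask work on every probe.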

If the first pull request is accepted, C will take the lead again. Does it matter? Not at all. But it’s the fun of the spirit of the game.
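For anyone who wants to check the arithmetic, the clock times quoted above work out as follows. This is a small sketch; the variable names are mine, and the numbers are the ones reported in this post.

```python
def reduction_pct(before, after):
    """Percentage reduction in time going from `before` to `after`."""
    return (before - after) / before * 100.0


# Wall-clock seconds as reported above.
c_original = 3.9
c_no_precheck = 3.3
c_byte_flags = 2.9
rust = 3.6

first_change = reduction_pct(c_original, c_no_precheck)
both_changes = reduction_pct(c_original, c_byte_flags)
c_leads_after_first = c_no_precheck < rust

print(f"pre-check removed:   {first_change:.1f}% less clock time")
print(f"plus byte flags:     {both_changes:.1f}% less clock time")
print(f"C ahead of Rust after the first change: {c_leads_after_first}")
```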

Regardless, as I finally start contributing to public projects it was an opportunity to share the notion of profiling and the high impact of incredibly simple changes.

Sleepless Nights for Software Developers

A recent Ask HN raised the question “What do you use Machine Learning for?”, and the top answer, by ashark, is golden:

I use it as something to worry about not knowing how to use, and how that might make me unemployable in a few years, while also having no obvious need for it at all, and therefore no easy avenue towards learning it, especially since it requires math skills which have completely rusted over because I also never need those, so I’d have to start like 2 levels down the ladder to work my way back up to it.

I’ve found it very effective in the role of “source of general, constant, low-level anxiety”.

This is an accurate assessment for most of us. We anxiously watch emerging technologies, patterns and practices, trying to decide where to focus some of our limited time. Worried about missing something and finding ourselves lost in the past with a set of obsolete skills. So we endlessly watch the horizon, trying to separate the mirages from the actual, deciding what to dive into.

I recently started casually diving into TensorFlow / machine learning. TensorFlow is the machine learning toolset released by Google, and represents the edge of the current hype field (ousting the industry trying to fit all problems into blockchain-shaped solutions, which itself relegated NoSQL to the trough of disillusionment). It’s just layers and layers of Things I Don’t Know.

gRPC. Bazel. SciPy. Even my Python skills are somewhat rusty. I’m still fairly adept at most of the maths, and have a handle on CUDA and the Intel MKL, but getting TensorFlow to build on Windows is itself a remarkably painful process (at least in my experience, where having VS2017 and VS2015 on the same machine is a recipe for troubles; while you can just install the binaries, I am a fan of working with a build so that I can dive into specific operations), and it yields such an enormous base of dependencies that it gives the feeling of operating some enormously complex piece of machinery. It was much easier to build on Ubuntu, but it still represents layers and layers of enormously complex code and systems.

It’s intimidating. It’s the constant low-level stress that we all endure.

And the truth is that the overwhelming majority of tasks in the field will never need to directly use something like TensorFlow. They might use a wrapped, pre-trained and engineered model for a specific task, but most of us will never see a payoff from in-depth knowledge of such a tool, beyond satisfying intellectual curiosity.

But we just don’t know. So we stay awake at night browsing through the tech sites trying to figure out the things we don’t currently know.

EDIT: I should add the disclaimer that I’m not actually losing sleep over this. I’m actually calm as a Hindu cow regarding technology, and probably am a little too enthused about new things to learn. But it still presents an enormous quandary when my son asks, for instance, what he should build the service layer with for a game he’s building. “Well….”