Sleepless Nights for Software Developers

A recent Ask HN raised the question “What do you use Machine Learning for?”, and the top answer, by ashark, is golden-

I use it as something to worry about not knowing how to use, and how that might make me unemployable in a few years, while also having no obvious need for it at all, and therefore no easy avenue towards learning it, especially since it requires math skills which have completely rusted over because I also never need those, so I’d have to start like 2 levels down the ladder to work my way back up to it.

I’ve found it very effective in the role of “source of general, constant, low-level anxiety”.

This is an accurate assessment for most of us. We anxiously watch emerging technologies, patterns, and practices, trying to decide where to focus some of our limited time, worried about missing something and finding ourselves lost in the past with a set of obsolete skills. So we endlessly watch the horizon, trying to separate the mirages from the real, deciding what to dive into.

I recently started casually diving into TensorFlow / machine learning. TensorFlow is the machine learning toolset released by Google, and represents the edge of the current hype cycle (ousting the industry’s attempts to fit all problems into blockchains, which itself relegated NoSQL to the trough of disillusionment). It’s just layers and layers of Things I Don’t Know.

gRPC. Bazel. SciPy. Even my Python skills are somewhat rusty. Most of the maths I’m still fairly adept at, and I have a handle on CUDA and the Intel MKL, but getting TensorFlow to build on Windows is itself a remarkably painful process (at least in my experience, where having VS2017 and VS2015 on the same machine is a recipe for trouble; you can just install the binaries, but I’m a fan of working from a build so I can dive into specific operations). It yields such an enormous base of dependencies that it gives the feeling of operating some enormously complex piece of machinery. It was much easier to build on Ubuntu, but it still represents layers and layers of enormously complex code and systems.

It’s intimidating. It’s the constant low-level stress that we all endure.

And the truth is that the overwhelming majority of tasks in the field will never need to directly use something like TensorFlow. They might use a wrapped, pre-trained and engineered model for a specific task, but most of us will never see a payoff from in-depth knowledge of such a tool, beyond satisfying intellectual curiosity.

But we just don’t know. So we stay awake at night browsing through the tech sites trying to figure out the things we don’t currently know.

EDIT: I should add the disclaimer that I’m not actually losing sleep over this. I’m actually calm as a Hindu cow regarding technology, and probably am a little too enthused about new things to learn. But it still presents an enormous quandary when my son asks, for instance, what he should use to build a service layer for a game he’s building. “Well….”

Windows Is Already Dead, It Just Doesn’t Know It Yet

Microsoft has released their competitor to the Chromebook in the guise of a $999 laptop running a pared-down, crippled version of Windows, with the option to pay a premium to enable normal functionality. Some partners have released more cost-effective variations (sans the fabric keyboard).

Only it’s too late. Windows is already dead, lurching forward in a zombie state. It is a reminder of what once was, rolling from the inertia that launched it decades ago.

Let me back up for a second.

I’m typing this into a WordPress textarea on a Windows 10 box. My daughter’s PC runs Windows 10. My sons’ PC runs Windows 10. The other computer runs macOS 10.12. My laptop runs Ubuntu 17.04. My smartphone runs Android 7.1.2. Another runs iOS 10.3. My wife’s smartphone runs Android. Samsung, Huawei, Apple, HTC, home built, Dell…a lot of companies are represented. I develop in Visual Studio 2017, but also Android Studio and Xcode and IntelliJ and gcc and clang and the Intel compiler, among others. I’ve built systems on SQL Server, pgsql, MySQL, and others. I’ve hosted on Apache, IIS, and nginx. I used to live on .NET, and continue to keep an eye on the viability of the .NET Core initiative.

We’re all over the place as a family. I don’t wave flags or declare affiliations. I don’t clutch onto something worried that some hard won skills will become obsolete (woe but for my wonderful COM skills). Cases and situations merit different choices.

Adapt. Darwin. I Ching. Whatever man, we gotta roll with it.

Mega corporations with tens of thousands of very smart employees are shifting the world, and we evolve and leverage their work to propel ourselves and the people we work with and for ahead. Or we swim against the current.

When a possible shift comes along we pause to evaluate and consider the impact, determining if it should change our focus and the application of our limited time. Is it time to consider Metro apps (since rebranded “Windows apps”)?

Analyses and situations vary, but from an initial investigation this represents no new life for Windows. No renewed purpose for the Windows Store. It is unlikely to have an impact on the market beyond some purchased, highly publicized wins. The same sort of vague puffery that we saw with the disastrous failure of Windows Phone is being used to prop up the future potential of this entrant (“Microsoft is so big and has so much cash it’s a sure thing”, we heard again and again).

Only Office isn’t the beachhead it once was, and is another legacy hangover[1]. The Windows Store is an embarrassment. Neither is a deeply compelling justification.

The Windows management foundation is a hindrance, not an empowerment. Most IT orgs are grossly out of their depth with the current stack of Windows tech, countless firms still desperately clutching onto Windows 7. The various expansions added over the years, from management automation interfaces to scripting objects to a mountain of DCOM and COM objects, have left it in a dangerous place where the base attack surface is simply enormous, and the only way most firms will get a handle on it — preventing the growing instances of ransomware attacks — is to implode their whole system and start anew on a simple foundation.

I like Windows. I like a lot of the moves Microsoft is making recently. But Windows Phone is dead (as it has been for years), and Windows is just another option, with this having every hint of being another failed attempt at throwing goodwill and money towards a fading market.

The glory days of Windows — those days when Microsoft dictated the direction of technology — are long gone. Many millions of PCs will continue to run Windows, but as a platform it has marginal relevance.

-post note-

I write low-effort, lazy contemplation pieces like this occasionally, and they seldom earn me fans or new readers. Some people get offended because they disagree (or become defensive, which is unfortunately all too common in this industry), while others cheer it on because they already agree. Responses bias towards “that is the most obvious thing in the world…did you write this five years ago?”, or “this is delusional! Have you seen Microsoft’s share prices lately? And I know an accountant that uses Quickbooks on Windows XP, so this can’t be true”.

It’s just a thought piece from my current perspective, and a career over which the industry went from Microsoft being the behemoth that set the tone and direction, to a company now following the IBM arc: cashing in while it can on areas it entrenched itself in a decade or more ago, while trying desperately to get traction wherever competitors are succeeding.

There are huge numbers of Windows devices out there. It clearly is alive. But it has become essentially a replaceable commodity, and holds extraordinarily little influence.

1 – Not just due to alternative office suites, both web and native, but because automation is impacting everything, including the utility of Office. Many of the tasks once put together in Office, and the calculations/scenarios done in Excel, managed over countless person-hours, have been either eliminated or vastly reduced. In the financial markets, tasks that once had legions of employees spinning Excel spreadsheets and sending off carefully created Word docs have been turned into simple web apps and scheduled tasks. The number of jobs that rely on or even leverage Office has collapsed.

Social Anxiety and the Software Developer

A brief little diversionary piece that I hope will prove useful for someone out there, either in identifying their own situation, or in understanding it in others. This is a very selfish piece — me me me — but I hope the intent can be seen in a good light. I suspect that the software development field draws in a lot of people suffering from social anxiety.

This piece is in the spirit of talking openly and honestly about mental health, which is something that we as a community and a society don’t do enough.

A couple of months ago I endured (and caused others to endure) a high stress event. I certainly haven’t tried to strike it from memory (the internet never forgets), and in many ways a lot of positives have come from it; it has been a profound period of personal growth since.

One positive is that I finally faced a lifelong burden of social anxiety, both pharmacologically and behaviorally, a big part being simply realizing that it was a significant problem. I know from emails responding to my previous mention of enduring this that it struck some readers as perplexing: I’ve worked in executive, lead, and senior positions at a number of organizations. I have a domain under my own name and put myself out there all the time[1]. I’m seemingly very self-confident, if not approaching arrogance at times.

That isn’t just a facade: I am very confident in my ability to face an intellectual or technical challenge and defeat it. In the right situation I am forceful with my perspective (not because it’s an opinion strongly held, but because I think it’s right; I’ll effortlessly abandon it when convinced otherwise).

Confidence isn’t a solution to social anxiety, however. It’s possible, if not probable, for them to live in excess alongside each other. In many ways I think a bloated ego is a prerequisite.

Many choices — as trivial as walking the dog — were made under the umbrella of avoiding interactions. Jobs were avoided if they had a multi-step recruitment process. Investments were shunned if they weren’t a singular solution to everything, and even then I would avoid the interactions necessary to get to a resolution.

I succeeded professionally and personally entirely in spite of these handicaps, purely on the back of lucking into a skillset at a perfect time in history. I am utterly convinced that at any other time in history this would have been devastating to any success. Be good at something and people overlook a lot.

And it was normalized. One of the things about this reflective period is that suddenly many of the people who I know and love realized “Hey, that was pretty strange…”. It seemed like a quirk, or like being shy (which we often treat as a desirable trait), but in reality it was debilitating, and had been since my formative years.

There are treatments for it. I’m two months into this new perspective and I can say that the results are overwhelming. I will never be a gregarious extrovert, but life is so much less stressful just living without dreading encountering a neighbour, or getting a phone call, etc.

1 – The online existence is almost abstract to me, and I’ve always kept it that way. I have always dreaded people who I know in “real life” visiting this blog (sometimes family or coworkers have mentioned a piece and it has made me go silent for months, hoping to lose their interest), reading any article I’ve written or anything written about me, etc. That is too real, and was deeply uncomfortable to me. Nonetheless there have been times I’ve realized I said something in error and a cold sweat overcame me, changing all plans to get to a workstation and fix the error.

Floating-Point Numbers / An Infinite Number of Mathematicians Enter A Bar

tl;dr: The bits of a floating-point number have shifting exponents, with negative exponents holding the fractional portion of real numbers.

The Basics

Floating-point numbers aren’t always a precise representation of a given real or integer value. This is CS101 material, repeated on programming boards regularly: 0.1 + 0.2 != 0.3, etc. (Less known: as single-precision values, 16,777,220 == 16,777,219, and 4,000,000,000 == 4,000,000,100.)
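To make those claims concrete, here’s a quick Python sketch of mine (not from the original post). Python’s own floats are binary64, so it round-trips values through binary32 using the standard library’s struct module:

```python
import struct

def to_f32(x: float) -> float:
    """Round-trip a Python float (binary64) through binary32."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Near 2^24 the spacing between representable binary32 values is 2,
# so these two integers collapse into the same stored value.
print(to_f32(16_777_219) == to_f32(16_777_220))        # True

# Near 4e9 the spacing is 256.
print(to_f32(4_000_000_000) == to_f32(4_000_000_100))  # True
```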

Most of us are aware of this limitation, yet many don’t understand why, whether they missed that class or forgot it since. Throughout my career I’ve been surprised at the number of very skilled, intelligent peers – brilliant devs – who have treated this as a black box, or held an incorrect idea of how these numbers work. More than one has believed such numbers store a fraction as two integers, for instance.

So here’s a quick visual explanation of floating-point numbers, in this case the dominant IEEE 754 variants: binary32 and binary64, with a quick mention of binary16. I decided to author this as a lead-in to an upcoming entry about financial calculations, where these concerns become critical and I need something to refer back to.

I’m going to start with a relevant joke (I’d give credit if I knew the origin)-

An infinite number of mathematicians walk into a bar. The first orders a beer, the second orders half a beer, the third orders a quarter of a beer, and so on. The bartender pours two beers and says “know your limits.”

In the traditional unsigned integer that we all know and love, each successively more significant bit is worth double the amount of the one before (powers of 2) when set, going from 2^0 (1) for the least significant bit, to 2^(bit size - 1) for the most significant bit (e.g. 2^31 for a 32-bit unsigned integer). If the integer were signed the top bit would be reserved for the negative flag (which entails a discussion about two’s-complement representations that I’m not going to enter into, so I’ll veer clear of negative integers).

An 8-bit integer might look like the following (in the original post this is an interactive demo in which the bits can be toggled, for no particular reason beyond keeping the attention of readers)-
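Since that interactive demo doesn’t translate here, a small Python stand-in of mine that prints each bit’s weight:

```python
# The weight of each bit in an 8-bit unsigned integer, MSB (2^7) to LSB (2^0).
value = 0b10110101  # 181

for bit in range(7, -1, -1):
    weight = 1 << bit  # 2^bit
    print(f"bit {bit}: weight {weight:3d}, set: {bool(value & weight)}")

print(value == 128 + 32 + 16 + 4 + 1)  # True: the sum of the set bits' weights
```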

Why stop at 2^0 (1) for the LSB? What if we used 16 bits and reserved the last 8 bits for negative exponents? For those who remember basic math, a negative exponent n^-x is equal to 1/n^x, e.g. 2^-3 = 1/2^3 = 1/8.

Behold, a binary fixed-point number (in the original demo, clicking some of the negative exponents yields a real number; a triangle demarcates the whole exponents from the fractional ones). In this case the number has a precision of 1/256, and a max magnitude of 1/256 under 256 (i.e. 255 + 255/256).
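Here’s a minimal sketch of the same idea in Python, assuming a hypothetical 8.8 format (8 whole bits, 8 fractional bits) rather than any standard library type:

```python
FRACTION_BITS = 8
SCALE = 1 << FRACTION_BITS  # 2^8 = 256

def encode(value: float) -> int:
    """Store a real number as a 16-bit integer scaled by 256."""
    return round(value * SCALE) & 0xFFFF

def decode(raw: int) -> float:
    return raw / SCALE

print(decode(encode(3.140625)))  # 3.140625: 3 + 36/256 is exactly representable
print(decode(1))                 # 0.00390625: 1/256, the format's precision
print(decode(0xFFFF))            # 255.99609375: 256 - 1/256, the max magnitude
```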

We’re halfway there.

A floating-point number has, as the name states, a floating point. This entails storing a separate value detailing the shift of the exponents (thus defining where the point lies — the separation between the exponent 0 and negative exponents, if any).

Before we get into that, one basic about floating-point numbers: they have an implicit leading binary 1. If a floating-point value had only 3 fraction bits and they were set to 000, the effective significand would be binary 1.000 courtesy of this implicit leading bit.

To explain the structure of a floating-point number: a binary32 — aka single-precision — floating-point number has 23 mantissa bits (the actual value, sometimes called the fraction) and an implicit additional top bit of 1 as mentioned, ergo 24 bits defining the value. These are the bottom 23 bits in the value: bits 0-22.

The exponent shift of a single-precision value occupies 8 bits, and IEEE 754 encodes it with a bias of 127 (i.e. 01111111 = 0), such that the exponent shift = stored value - 127 (below 127 is incrementally negative, above incrementally positive). A value of 127 indicates that the binary point (the separation between the 2^0 and negative exponents) lies directly after the implicit leading 1, while values above 127 move it successively to the right, and values below 127 move it to the left. The exponent bits sit above the mantissa, occupying bits 23-30.

At the very top — the most significant bit — lies a flag indicating whether the value is negative. Unlike the two’s-complement representation used for pure integers, with floating-point numbers this single bit simply negates the value. This is bit 31.

“But how can a floating-point value hold 0 if the high bit of the value/mantissa/fraction is always 1?”

If all bits are set to 0 — the flag, exponent shift and the value — it represents the value 0, and if just the flag is 1 it represents -0. If the exponent shift is all 1s, it indicates either NaN or Inf, depending upon whether any fraction bits are set. (An all-0s exponent shift with a non-zero fraction denotes the tiny subnormal range, which I’ll leave aside here.) Those are the magic numbers of floating points.
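The layout is easy to verify. A sketch (again mine, not the original’s interactive widget) that unpacks the three fields of a binary32 with Python’s struct module:

```python
import struct

def f32_fields(x: float):
    """Split a binary32 into its sign, biased exponent, and fraction fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign     = bits >> 31            # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 23-30, biased by 127
    fraction = bits & 0x7FFFFF       # bits 0-22
    return sign, exponent, fraction

print(f32_fields(1.0))    # (0, 127, 0): shift of 0, just the implicit leading 1
print(f32_fields(-2.5))   # (1, 128, 2097152): -(1.25 * 2^1)
print(f32_fields(0.0))    # (0, 0, 0): all zeroes means zero
print(f32_fields(float("inf")))  # (0, 255, 0): exponent all 1s, no fraction
print(f32_fields(float("nan")))  # (0, 255, 4194304) on typical platforms
```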

Let’s look at a floating-point number, starting with one holding the integer value 65535, with no fractional component.

With this sample (interactive in the original post) you can change the exponent shift — the 8-bit shift integer of the single-precision floating point — to see the impact. Note that if you were going to use this shift in an actual single-precision value, you would need to add the 127 bias (e.g. 10 would become 137, and -10 would become 117).

The red-bordered box in the demo indicates the implicit bit that isn’t actually stored in the value. In the default state it’s notable that with a magnitude of 65535 — the integer portion occupying 15 stored bits and the 1 implicit bit — the max precision is 1/256.

If instead we stored 255, the precision jumps to 1/65536. The precision is dictated by the magnitude of the value.

To present an extreme example, what if we represented the population of the Earth (roughly 7.5 billion)-

Precision has dropped to 2^9 = 512. Only increments of 512 can be stored.
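The precision at any magnitude falls out of the encoding directly: the spacing between adjacent values is 2^(exponent - 23), since there are 23 stored fraction bits. A small sketch, reusing the struct round-trip trick from above:

```python
import struct

def f32_spacing(x: float) -> float:
    """Spacing between adjacent binary32 values at this magnitude."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = ((bits >> 23) & 0xFF) - 127  # remove the 127 bias
    return 2.0 ** (exponent - 23)           # 23 stored fraction bits

print(f32_spacing(65535.0))          # 0.00390625 (1/256)
print(f32_spacing(255.0))            # 1.52587890625e-05 (1/65536)
print(f32_spacing(7_500_000_000.0))  # 512.0: only increments of 512
```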

More recently the industry has seen an interest in half-precision (binary16) floating point values, particularly in compute and neural net calculations.

Half-precision floating point values offer a very limited range, but fit in just 16 bits: 1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits.
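Python’s struct module gained a binary16 format code (“e”, in Python 3.6+), which makes those limits easy to poke at; a quick sketch:

```python
import struct

def to_f16(x: float) -> float:
    """Round-trip a Python float through binary16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_f16(1.0 / 3.0))  # 0.333251953125: roughly 3 decimal digits survive
print(to_f16(2049.0))     # 2048.0: above 2048 the spacing is already 2
print(to_f16(65504.0))    # 65504.0: the largest finite binary16 value
```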

That’s the basic operation of floating point numbers: a set of value bits whose exponent range can be shifted. Double-precision (binary64) floating points up the value/fraction storage to 53 bits (52 stored, plus the 1 implicit bit) and the exponent shift to 11 bits, offering far greater precision and/or magnitude, coming closer to the infinite number of mathematicians at representing small-scale numbers. I am not going to simulate such a number here as it would exceed the bounds of reader screens.

Hopefully this has helped.

Basic Things You Should Know About Floating-Point Numbers

  • Single-precision floating point numbers can precisely hold whole numbers from -16,777,215 to 16,777,215, with zero ambiguity. Many users derive the wrong assumption from the fractional variances that all numbers are approximations, but many values (including fractional representations that fit within the magnitude — e.g. 0.25, 0.75, 0.0042572021484375, etc) can be precisely stored. The key is that the number be the decimal representation of a fraction whose denominator is a power of 2, and that it lie within the precision band of the given magnitude.
  • Double-precision floating point numbers can precisely hold whole numbers from -9,007,199,254,740,991 to 9,007,199,254,740,991. You can very easily calculate the precision allowed for a given magnitude (e.g. if the magnitude is between 1 and 2, the precision is within 1/4503599627370496, which for the vast majority of uses is well within any reasonable bounds); a quick sketch after this list demonstrates this.
  • Every number in JavaScript is a double-precision floating point. Your counter, “const”, and other seeming integers are DPs. If you use bitwise operations the engine will temporarily treat the value as a 32-bit integer as a hack. Some decimal libraries layer atop this to present very inefficient, but sometimes necessary, decimal representations.
  • Decimal types can be mandated in some domains, but represent a dramatic speed compromise (courtesy of the reality that our hardware is extremely optimized for floating-point math). With some analysis of the precision for a given task, and intelligent rounding rules, double-precision is more than adequate for most purposes. There are scenarios where you can pursue a hybrid approach: in an extremely high Internal Rate of Return calculation I use SP to get to an approximate solution, and then decimal math to get an absolutely precise solution (the final, smallest leg).
  • On most modern processors double-precision calculations run at approximately half the speed of single-precision calculations (presuming that you’re using SIMD, where an AVX unit may do 8 DP calculations per cycle, or 16 SP calculations per cycle). Half-precision calculations, however, do not offer any speed advantage beyond reducing the memory bandwidth and footprint necessary; the instructions to pack and unpack binary16 (e.g. F16C on x86) are a relatively new addition.
  • On most GPUs, double-precision calculations are dramatically slower than single-precision calculations. While most processors have floating point units that perform single-precision calculations on double-precision hardware or greater, most offering SIMD to do many calculations at once, GPUs were built for single-precision calculations and use entirely different hardware for double-precision calculations, hardware that is often in short supply (most GPUs offer 1/24 to 1/32 the number of DP units). On the flip side, many GPUs use SIMD on single-precision hardware to do paired half-precision calculations, offering the best performance of all.
  • Some very new compute-focused devices offer spectacular DP performance. The GP100 from Nvidia offers 5 TFLOPS of DP calculations, about 10 TFLOPS of SP calculations, and 20 TFLOPS of half-precision calculations. These are incredible new heights.
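As promised above, a quick sketch of the double-precision bounds, runnable in Python since its floats are binary64 (the same representation as every JavaScript number); math.ulp requires Python 3.9+:

```python
import math

# 2^53 - 1 is the largest whole number with guaranteed unit precision.
print(9_007_199_254_740_991.0 + 1.0)  # 9007199254740992.0: still exact
print(9_007_199_254_740_992.0 + 1.0)  # 9007199254740992.0: past 2^53, +1 is lost

# Spacing of representable doubles for a magnitude between 1 and 2:
print(math.ulp(1.0))                          # 2.220446049250313e-16
print(math.ulp(1.0) == 1 / 4503599627370496)  # True: 1/2^52
```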

Revisiting Technical Debt

A recent post, Strive For Technical Debt [edit: It was recent at the time I originally authored this, but then I got sidetracked and had never hit publish until coming across this old post while readying a new post], netted a couple of emails questioning whether I was encouraging or accepting bad coding practices.

Absolutely not.

Learn your tools (and use the right tools, always reassessing and upgrading platforms and skills as beneficial). Abstract appropriately. Build with a consideration for the future. Use best practices (small, self-contained functions, SOLID, maximize testability, model data intelligently, etc).

Do the best you can in the real-world constraints.

There is a middle ground, though, and it’s in that compromised, musty middle where delivered solutions actually get built. But here’s where the disconnect between the idealized and the reality lies, and I mentioned it in the prior post to some confusion: the countless time-wasting, best-intentions, no-code-debt initiatives that never actually delivered anything have long been forgotten, having zero legacy. Instead many of us are surrounded by sub-optimal solutions, lamenting the hodge-podge of systems and solutions that power our organizations.

We decry the bad practices by focusing on the faults of projects that actually delivered. To exaggerate the effect, the more important and critical the project, the more we study it and hold it up for scrutiny.

Tell me again about the problems with the codebase of Linux or Firefox or MySQL, or that data process that is the foundation of your organization?

If you are in the privileged position of lamenting technical debt, the alternate timeline is overwhelmingly one filled with failure. Having technical debt to complain about is often the best possible outcome, and has a strong correlation with a project having succeeded.

This whole line of thought came up as I recently reconnected with a coworker from almost two decades ago. At the time I was a junior in a small firm, and was given the schlub work of taking in some sloppy, error-filled data from remote monitoring sites and generating reports. The customer (who was also an investing partner) didn’t particularly care about this task, and paid a minimal stipend for these ancillary reports.

This was a process done in Excel. Someone had previously created some VBScript automations that would do extremely rudimentary data checking and cleaning (the source data often had gaps, invalid and out-of-reasonable-bounds values, etc — the physical collection process was failure prone), and then we’d put it in another spreadsheet with a report page. It was a large accumulation of “technical debt”, but I had the benefit of starting with a self-documented process that was accepted by the business and its partners despite many faults and omissions, so I knew the boundaries of what I could do without drawn-out business and planning meetings.

The existing code gave me the framework of a starting point. I never knew who made what came before, but was thankful that they laid the original path.

I did know that I didn’t want to be hand mangling a bunch of Excel nonsense every day — a minefield of manual steps in the process making recurring errors inevitable — so like all lazy developers the world over I automated it. This was completely unsanctioned, and was just a way of clearing my docket of the mind numbing work so I could focus on the fun stuff.

I created a data model (originally in an MS Access database, later upgraded to SQL Server because it happened to be on the MSDN install discs we got quarterly) to hold the data, and then a simple Windows service that monitored a directory for files, importing them as they became available, applying all of the rules exactly as originally coded, and auditing all changes and errors found. I hashed out a quick web solution to allow on-demand report availability checking and generation.

It was rushed (the core quite literally built in two days), very specialized, and the code was just a big ball of mud. Virtually all of the data cleaning was done in a giant function that was essentially a transcoding of the script in the original spreadsheet.

Over time I’d do small fixes for edge conditions that weren’t expected (on just about every daylight savings time change — for some reason the source data was logged in local time — the data would be malformed in some fun way), or add new data rules based upon feedback from the client, and I’d use those small opportunities to refactor the code. I remember we would laughingly use the term “AI” to describe the rules, when at best it was an expert system: doing a probability-of-correctness analysis on the various correlated values (e.g. exhaust, fuel flow, temperature, RPMs, etc) based upon our understanding of the domain, determining which to trust more, and cascade-correcting as a result.

It worked. For years it dutifully churned through those files daily with minimal change, generating automated reports that it would send out to the client, with a basic but at the time way ahead of the curve web application for on-demand reporting that became the base of other functions. The client was so impressed by the outcome of something that they had delegated as garbage work that large new technology projects and funding started rolling in, the project growing into real time monitoring and control of power generation stations, among other things.

The team grew as a result. As the outcome of a desire by a developer to automate some manual processes, people gained employment and an enterprise grew.

And this code from years earlier kept churning through files, generating outputs and value. Every now and then some random real or imagined deficiency would be noted. I remember being grilled on the “scalability” of the solution: it was literally serving a single person at a single client — a massive energy company — on the occasional daily and weekly report runs, with low-end hardware vastly outpacing our needs for the data volumes encountered or even hypothetically planned, and it could easily have been scaled out via a single-tenancy model. We had to essentially invent deficiencies to find faults.

By this point I had moved on to bigger and better things when I got a query from my old boss: everyone was at a loss to understand what the old code did, you see, and they wanted to rewrite it using current best practices, on a current silver-bullet technology stack. They wanted me to write a document detailing the process flow and steps.

From an old friend who still worked at the firm I knew that internally the discussion was much less generous: it was a group in quagmire, lamenting, with much ululation, how they were stuck in the mud for months if not years on end, held down by this drowning burden of technical debt. The debt that was, ironically, the origin of their employment.

They had tried to replace it a couple of times over the years, every initiative heading to failure courtesy of second-system syndrome: After so much boastful talk for years, simply replacing with improved technology and leveraging new processes and patterns couldn’t possibly be sufficient.

For those who aren’t accustomed to the term second-system syndrome (or effect), or who haven’t lived through it or dealt with its corruption: when developers look to replace a system, they often really want to justify replacing it through overly broad ambitions and expectations. That is easy when the existing solution is failing and a heroic outcome can be readily achieved, but trebly difficult when the project has been a success for years and the advantages are largely theoretical. We can’t simply do the small, manageable task (transcoding and refactoring an existing project quietly and transparently towards a better level of fitness, in most cases a trivial, easily pursued undertaking); instead we need to have broad ambitions about a new project that is going to be everything to everyone.

Replacing a simple, task-specific data importing and fixed reporting process? Well the replacement had better have a declarative data modeling engine that can accommodate any possible form and format of data, into an infinitely versatile data model that is scaled across machines, is database agnostic, uses the coolest sounding new tech, etc.

It is virtually assured product failure. Rinse and repeat. Wave one’s arms around to blame the past for the present.

This isn’t petty or spiteful — I am proud of that project, but absolutely acknowledge the many problems in its implementation — but it’s an interesting dilemma: there is little glory in replacing functional systems, so a lot of bad blood can be the result.