Revisiting Technical Debt

A recent post, Strive For Technical Debt [edit: It was recent at the time I originally authored this, but then I got sidetracked and had never hit publish until coming across this old post while readying a new post], netted a couple of emails questioning whether I was encouraging or accepting bad coding practices.

Absolutely not.

Learn your tools (and use the right tools, always reassessing and upgrading platforms and skills as beneficial). Abstract appropriately. Build with a consideration for the future. Use best practices (small, self-contained functions, SOLID, maximize testability, model data intelligently, etc).

Do the best you can in the real-world constraints.

There is a middle ground, though, and it’s in that compromised, musty middle where delivered solutions actually get built. But here’s where the disconnect between the ideal and the reality lies, and I mentioned it in the prior post to some confusion: the countless time-wasting, best-intentions, no-code-debt initiatives that never actually delivered anything have long been forgotten, leaving zero legacy. Instead, many of us are surrounded by sub-optimal solutions, lamenting the hodge-podge of systems and solutions that power our organizations.

We catalogue bad practices by focusing on the faults of projects that actually delivered. Compounding the effect, the more important and critical the project, the more we study it and hold it up for scrutiny.

Tell me again about the problems with the codebase of Linux or Firefox or MySQL, or that data process that is the foundation of your organization?

If you are in the privileged position of lamenting technical debt, the alternate timeline is overwhelmingly one filled with failure. Having technical debt to complain about is often the best possible outcome, and it correlates strongly with a project having succeeded.

This whole line of thought came up as I recently reconnected with a coworker from almost two decades ago. At the time I was a junior in a small firm, and was given the schlub work of taking in some sloppy, error-filled data from remote monitoring sites and generating reports. The customer (who was also an investing partner) didn’t particularly care about this task, and paid a minimal stipend for these ancillary reports.

This was a process done in Excel. Someone had previously created some VBScript automations that would do extremely rudimentary data checking and cleaning (the source data often had gaps, invalid values, readings outside reasonable bounds, and so on — the physical collection process was failure-prone), and then we’d put the results into another spreadsheet with a report page. It was a large accumulation of “technical debt”, but I had the benefit of starting with a self-documented process that was already accepted by the business and its partners despite its many faults and omissions, so I knew the boundaries of what I could do without drawn-out business and planning meetings.

The existing code gave me the framework of a starting point. I never knew who made what came before, but was thankful that they laid the original path.

I did know that I didn’t want to be hand-mangling a bunch of Excel nonsense every day — a minefield of manual steps making recurring errors inevitable — so, like all lazy developers the world over, I automated it. This was completely unsanctioned, and was just a way of clearing my docket of the mind-numbing work so I could focus on the fun stuff.

I created a data model (originally in an MS Access database, later upgraded to SQL Server because it happened to be on the MSDN install discs we got quarterly) to hold the data, and then a simple Windows service that monitored a directory for files, importing them as they became available, applying all of the rules exactly as originally coded, and auditing every change and error found. I hashed out a quick web solution to allow on-demand report availability checking and generation.
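For a sense of the shape of that service, here is a minimal sketch rather than the original: the real thing was a Windows service writing to Access and later SQL Server, while this uses Python and SQLite purely to stay self-contained, and the directory names, schema, and bounds are hypothetical stand-ins.

```python
import sqlite3
import time
from pathlib import Path

INCOMING = Path("incoming")      # hypothetical drop directory for the daily files
PROCESSED = Path("processed")    # imported files are moved here
DB_PATH = "monitoring.db"        # stand-in for the Access / SQL Server database


def clean_value(raw):
    """Rudimentary checks in the spirit of the original rules: gaps, invalid and out-of-bounds readings."""
    if raw.strip() == "":
        return None, "missing value"
    try:
        value = float(raw)
    except ValueError:
        return None, f"invalid value: {raw!r}"
    if not 0.0 <= value <= 10_000.0:   # bounds are illustrative only
        return None, f"out of bounds: {value}"
    return value, None


def import_file(conn, path):
    """Parse one file, apply the cleaning rules, and audit every rejection."""
    with path.open() as f, conn:
        for line_no, line in enumerate(f, start=1):
            site, timestamp, raw = line.rstrip("\n").split(",")
            value, note = clean_value(raw)
            conn.execute(
                "INSERT INTO readings (site, ts, value) VALUES (?, ?, ?)",
                (site, timestamp, value),
            )
            if note:
                conn.execute(
                    "INSERT INTO audit (file, line, note) VALUES (?, ?, ?)",
                    (path.name, line_no, note),
                )


def watch(poll_seconds=30):
    """Poll the drop directory and import new files as they become available."""
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS readings (site TEXT, ts TEXT, value REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS audit (file TEXT, line INTEGER, note TEXT)")
    INCOMING.mkdir(exist_ok=True)
    PROCESSED.mkdir(exist_ok=True)
    while True:
        for path in sorted(INCOMING.glob("*.csv")):
            import_file(conn, path)
            path.rename(PROCESSED / path.name)
        time.sleep(poll_seconds)


if __name__ == "__main__":
    watch()
```

In practice the rules were far more involved and the import ran against the real database, but the loop of watch, import, and audit was essentially it.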

It was rushed (the core quite literally built in two days), very specialized, and the code was just a big ball of mud. Virtually all of the data cleaning was done in a giant function that was essentially a transcoding of the script in the original spreadsheet.

Over time I’d do small fixes for edge conditions that weren’t expected (on just about every daylight savings time change — for some reason the source data was logged in local time — the data would be malformed in some fun way), or add new data rules based upon feedback from the client, and I’d use those small opportunities to refactor the code. I remember we would laughingly use the term “AI” to describe the rules, when at best it was an expert system: doing a probability-of-correctness analysis on the various correlated values (exhaust, fuel flow, temperature, RPMs, and so on) based upon our understanding of the domain, determining which to trust more, and cascading corrections as a result.
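For a flavour of what those rules looked like, here is an illustrative sketch rather than the actual logic (which was a transcoding of the spreadsheet script); the channel names, expected ratios, and weights are invented stand-ins for the real domain knowledge.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    rpm: float
    fuel_flow: float
    exhaust_temp: float


def plausibility(reading, history):
    """Score each channel by how well it agrees with its correlated neighbours.
    The expected ratios and scaling factors here are invented for illustration."""
    scores = {}
    # fuel flow should roughly track RPM on this imaginary engine
    expected_fuel = reading.rpm * 0.002
    scores["fuel_flow"] = 1.0 / (1.0 + abs(reading.fuel_flow - expected_fuel))
    # exhaust temperature should roughly track fuel flow
    expected_temp = 200.0 + reading.fuel_flow * 40.0
    scores["exhaust_temp"] = 1.0 / (1.0 + abs(reading.exhaust_temp - expected_temp) / 50.0)
    # RPM is trusted more when it is close to the recent trend
    recent = sum(r.rpm for r in history) / len(history) if history else reading.rpm
    scores["rpm"] = 1.0 / (1.0 + abs(reading.rpm - recent) / 100.0)
    return scores


def cascade_correct(reading, history):
    """Re-derive the least trusted channel from the better-trusted ones."""
    scores = plausibility(reading, history)
    worst = min(scores, key=scores.get)
    if worst == "fuel_flow":
        reading.fuel_flow = reading.rpm * 0.002
    elif worst == "exhaust_temp":
        reading.exhaust_temp = 200.0 + reading.fuel_flow * 40.0
    elif worst == "rpm" and history:
        reading.rpm = sum(r.rpm for r in history) / len(history)
    return reading, worst


if __name__ == "__main__":
    history = [Reading(1500, 3.1, 330.0), Reading(1520, 3.0, 325.0)]
    # the fuel flow reading here is implausible given RPM and exhaust temperature
    print(cascade_correct(Reading(1510, 9.9, 328.0), history))
```

The real version of this lived in that one giant function, with years of accumulated rules from client feedback layered in.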

It worked. For years it dutifully churned through those files daily with minimal change, generating automated reports that it would send out to the client, with a basic but, at the time, well-ahead-of-the-curve web application for on-demand reporting that became the base of other functions. The client was so impressed by the outcome of something they had delegated as garbage work that large new technology projects and funding started rolling in, the project growing into real-time monitoring and control of power generation stations, among other things.

The team grew as a result. Out of one developer’s desire to automate some manual processes, people gained employment and an enterprise grew.

And this code from years earlier kept churning through files, generating outputs and value. Every now and then some random real or imagined deficiency would be noted: I remember being grilled on the “scalability” of the solution. It was literally serving a single person at a single client — a massive energy company — for the occasional daily and weekly report runs; low-end hardware vastly outpaced our needs for the data volumes encountered or even hypothetically planned, and it could easily have been scaled out via a single-tenancy model. We had to essentially invent deficiencies to find faults.

At this point I had moved on to bigger and better things when I got a query from my old boss: everyone was at a loss to understand what the old code did, you see, and they wanted to rewrite it using current best practices, on a current silver-bullet technology stack. They wanted me to write a document detailing the process flow and steps.

From an old friend who still worked at the firm I knew that internally the discussion was much less generous: it was a group in a quagmire, lamenting, with much ululation, how they were stuck in the mud for months if not years on end, held down by this drowning burden of technical debt. The debt that was, ironically, the origin of their employment.

They had tried to replace it a couple of times over the years, every initiative heading to failure courtesy of second-system syndrome: after so much boastful talk for years, simply replacing it with improved technology and leveraging new processes and patterns couldn’t possibly be sufficient.

For those who aren’t accustomed to the term second-system syndrome (or effect), or who haven’t lived through it or dealt with its corruption: when developers look to replace a system, they often want to justify the replacement through overly broad ambitions and expectations. That is easy when the existing solution is failing and a heroic outcome is within easy reach, but trebly difficult when the project has been a success for years and the advantages of a replacement are largely theoretical. We can’t simply do the small, manageable task — transcoding and refactoring an existing project quietly and transparently towards a better level of fitness, which in most cases is a trivial, easily pursued effort — instead we need broad ambitions about a new project that is going to be everything to everyone.

Replacing a simple, task-specific data importing and fixed reporting process? Well, the replacement had better have a declarative data-modeling engine that can accommodate any possible form and format of data, feeding an infinitely versatile data model that is scaled across machines, is database agnostic, uses the coolest-sounding new tech, and so on.

Product failure is virtually assured. Rinse and repeat. Wave one’s arms around and blame the past for the present.

This isn’t petty or spiteful — I am proud of that project, but I absolutely acknowledge the many problems in its implementation — but it’s an interesting dilemma: there is little glory in replacing functional systems, so a lot of bad blood can be the result.