Floating-Point Numbers / An Infinite Number of Mathematicians Enter A Bar

tldr; The bits of a floating-point number have shifting exponents, with negative exponents holding the fractional portion of real numbers.

The Basics

Floating-point numbers aren’t always a precise representation of a given real or integer value. This is CS101 material, repeated on programming boards regularly: 0.1 + 0.2 != 0.3, etc. (less well known: as single-precision values, 16,777,220 == 16,777,219 and 4,000,000,000 == 4,000,000,100).
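
A minimal C sketch of those surprises, using the same literals as above (any IEEE 754 compliant compiler should print the same results):

/* A minimal sketch of the rounding surprises above. */
#include <stdio.h>

int main(void)
{
    /* 0.1, 0.2 and 0.3 have no exact binary representation, so the sum
       of the first two is not bit-identical to the third. */
    double sum = 0.1 + 0.2;
    printf("0.1 + 0.2 == 0.3 ? %s (sum = %.17f)\n",
           sum == 0.3 ? "true" : "false", sum);

    /* Above 2^24 a single-precision float can no longer represent every
       integer, so neighbouring integers collapse to the same value. */
    printf("16777219.0f == 16777220.0f ? %s\n",
           16777219.0f == 16777220.0f ? "true" : "false");
    printf("4000000000.0f == 4000000100.0f ? %s\n",
           4000000000.0f == 4000000100.0f ? "true" : "false");
    return 0;
}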

Most of us are aware of this limitation, yet many don’t understand why, whether they missed that class or have forgotten it since. Throughout my career I’ve been surprised at the number of very skilled, intelligent peers – brilliant devs – who have treated this as a black box, or held an incorrect mental model of how these numbers work. More than one has believed, for instance, that such numbers store a fraction as two integers.

So here’s a quick visual and interactive explanation of floating-point numbers, in this case the dominant IEEE 754 variants binary32 and binary64, with a quick mention of binary16. I decided to author this as a lead-in to an upcoming entry about financial calculations, where these concerns become critical and I need something to reference.

I’m going to start with a relevant joke (I’d give credit if I knew the origin)-

An infinite number of mathematicians walk into a bar. The first orders a beer, the second orders half a beer, the third orders a quarter of a beer, and so on. The bartender pours two beers and says “know your limits.”

In the traditional unsigned integer that we all know and love, each successively more significant bit is worth double the one before (powers of 2) when set, going from 2^0 (1) for the least significant bit to 2^(bitsize-1) for the most significant bit (e.g. 2^31 for a 32-bit unsigned integer). If the integer were signed, the top bit would indicate the sign (which entails a discussion about two’s-complement representations that I’m not going to enter into, so I’ll steer clear of negative integers).

An 8-bit integer might look like (the bits can be toggled for no particular reason beyond keeping the attention of readers)-

Why stop at 2^0 (1) for the LSB? What if we used 16 bits and reserved the bottom 8 bits for negative exponents? For those who remember basic math, a negative exponent n^-x is equal to 1/n^x, e.g. 2^-3 = 1/2^3 = 1/8.

Behold, a binary fixed-point number: Click on some of the negative exponents to yield a real number. The triangle demarcates between whole and fractional exponents. In this case the number has a precision of 1/256, and a max magnitude of 1/256th under 256.
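
In code, this 8.8 fixed-point scheme is just an integer with an agreed-upon scale. A quick sketch (the helper name is mine, not a standard API):

/* Interpret a raw 16-bit value as 8.8 fixed point: the top 8 bits are
   weighted 2^7..2^0, the bottom 8 bits 2^-1..2^-8. */
#include <stdint.h>
#include <stdio.h>

static double fixed_8_8_to_double(uint16_t raw)
{
    return raw / 256.0;   /* dividing by 2^8 places the binary point */
}

int main(void)
{
    uint16_t raw = (3 << 8) | 0x40;              /* 3 + 64/256 = 3.25 */
    printf("raw %u -> %f\n", (unsigned)raw, fixed_8_8_to_double(raw));
    printf("precision: %f (1/256)\n", fixed_8_8_to_double(1));
    printf("max value: %f (1/256 under 256)\n", fixed_8_8_to_double(UINT16_MAX));
    return 0;
}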

We’re halfway there.

A floating-point number has, as the name states, a floating point. This entails storing a separate value detailing the shift of the exponents (thus defining where the point lies — the separation between the exponent 0 and negative exponents, if any).

Before we get into that, one basic fact about floating-point numbers: they have an implicit leading binary 1. If a floating-point value had only 3 value/fraction bits and they were set to 000, the significand would actually be binary 1.000 courtesy of this implicit leading bit.

To explain the structure of a floating-point number, a binary32 — aka single-precision — floating-point number has 23 mantissa bits (the actual value, sometimes called the fraction) plus the implicit additional top bit of 1 as mentioned, ergo 24 bits defining the value. These are the bottom 23 bits in the value: bits 0-22.

The exponent shift of a single-precision value occupies 8 bits, stored in a biased encoding where 127 (i.e. 01111111) = 0, such that the exponent shift = stored value – 127 (below 127 is incrementally negative, above is incrementally positive). A value of 127 indicates that the binary point [the separation between the exponent 0 and negative exponents] lies directly after the implicit leading 1, while values above 127 move it successively to the right and values below 127 move it to the left. The exponent bits sit above the mantissa, occupying bits 23-30.

At the very top — the most significant bit — lies a flag indicating whether the value is negative. Unlike the two’s-complement encoding seen in pure integers, with floating-point numbers this single bit simply negates the value. This is bit 31.

“But how can a floating-point value hold 0 if the high bit of the value/mantissa/fraction is always 1?”

If all bits are set to 0 — the flag, exponent shift and the value — it represents the value 0, and if just the flag is 1 it represents -0. If the exponent shift is all 1s, it indicates either NaN or Inf depending upon whether the fractional portion has any bits set (all 0s is Inf, anything else is NaN). Those are the magic numbers of floating points.
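
Putting the pieces together, here’s a hedged sketch of pulling the three fields out of a binary32 and flagging the special cases. The helper is illustrative only, not any standard API:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Dump the sign, biased exponent, and fraction fields of a binary32. */
static void dump_binary32(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* reinterpret the 32 bits */

    uint32_t sign     = bits >> 31;           /* bit 31 */
    uint32_t exponent = (bits >> 23) & 0xFF;  /* bits 23-30, bias of 127 */
    uint32_t fraction = bits & 0x7FFFFF;      /* bits 0-22, implicit leading 1 */

    printf("%-14g sign=%u exponent=%3u (shift %4d) fraction=0x%06X",
           f, sign, exponent, (int)exponent - 127, fraction);

    if (exponent == 0xFF)                     /* all-ones exponent: Inf or NaN */
        printf("  -> %s\n", fraction ? "NaN" : "Inf");
    else if (exponent == 0 && fraction == 0)  /* all bits (bar the sign) zero */
        printf("  -> %szero\n", sign ? "-" : "");
    else
        printf("\n");
}

int main(void)
{
    float zero = 0.0f;
    dump_binary32(65535.0f);
    dump_binary32(1.0f);
    dump_binary32(0.0f);
    dump_binary32(-0.0f);
    dump_binary32(1.0f / zero);   /* Inf */
    dump_binary32(zero / zero);   /* NaN */
    return 0;
}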

Let’s look at a floating-point number, starting with one holding the integer value 65535, with no fractional part.

With this sample you have the ability to change the exponent shift — the 8-bit shift integer of the single-precision floating point — to see the impact. Note that if you were going to use this shift in an actual single-precision value, you would need to add 127 to the value (e.g. 10 would become 137, and -10 would be 117).

The red bordered box indicates the implicit bit that isn’t actually stored in the value. In the default state it’s notable that with a magnitude of 65535 — the integer portion occupying 15 stored bits plus the 1 implicit bit — the max precision is 1/256.

If instead we stored 255, the precision jumps to 1/65536. The precision is dictated by the magnitude of the value.

To present an extreme example, what if we represented the population of the Earth-

Precision has dropped to 2^9 = 512. Only increments of 512 can be stored.
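
The pattern (precision halving every time the magnitude crosses a power of two) can be confirmed with nextafterf, which returns the next representable float above a value. A quick sketch, with the Earth-population figure only approximate:

/* Print the step to the next representable single-precision value. */
#include <math.h>
#include <stdio.h>

static void show_precision(float value)
{
    float step = nextafterf(value, INFINITY) - value;
    printf("at %.0f the precision is %g\n", value, step);
}

int main(void)
{
    show_precision(255.0f);         /* 1/65536 */
    show_precision(65535.0f);       /* 1/256   */
    show_precision(7500000000.0f);  /* ~Earth's population: steps of 512 */
    return 0;
}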

More recently the industry has seen an interest in something called half-precision floating point values, particularly in compute and neural net calculations.

Half-precision floating point values offer a very limited range, but fit in just 16-bits.

That’s the basic operation of floating point numbers: a set of value bits where the exponent range can be shifted. Double-precision (binary64) floating points up the value/fraction storage to 53 bits (52 bits stored, plus the 1 implicit bit), and the exponent to 11 bits, offering far greater precision and/or magnitude, coming closer to the infinite number of mathematicians at representing small scale numbers. I am not going to simulate such a number on here as it would exceed the bounds of reader screens.

Hopefully this has helped.

Basic Things You Should Know About Floating-Point Numbers

  • Single-precision floating point numbers can precisely hold whole numbers from -16,777,215 to 16,777,215, with zero ambiguity. Many users derive from the fractional variances the wrong assumption that all numbers are approximations, but many — including fractional representations that fit within the magnitude, e.g. 0.25, 0.75, 0.0042572021484375, etc — can be precisely stored. The key is that the number be the decimal representation of a fraction whose denominator is a power of 2 and that lies within the precision band of the given magnitude.
  • Double-precision floating point numbers can precisely hold whole numbers from -9,007,199,254,740,991 to 9,007,199,254,740,991. You can very easily calculate the precision allowed for a given magnitude (e.g. if the magnitude is between 1 and 2, the precision is within 1/4,503,599,627,370,496, which for the vast majority of uses is well within any reasonable bounds); a sketch after this list demonstrates these boundaries.
  • Every number in JavaScript is a double-precision floating point. Your counter, “const”, and other seeming integers are DPs. If you use bitwise operations it will temporarily treat the value as a 32-bit integer as a hack. Some decimal libraries layer atop this to present very inefficient, but sometimes necessary, decimal representations.
  • Decimal types can be mandated in some domains, but represent a dramatic speed compromise (courtesy of the reality that our hardware is extremely optimized for floating-point math). With some analysis of the precision for a given task, and intelligent rounding rules, double-precision is more than adequate for most purposes. There are scenarios where you can pursue a hybrid approach: in an extremely high Internal Rate of Return calculation I use SP to get to an approximate solution, and then decimal math to get an absolutely precise solution (the final, smallest leg).
  • On most modern processors double-precision calculations run at approximately half the speed of single-precision calculations (presuming that you’re using SIMD, where an AVX unit may do 8 DP calculations per cycle, or 16 SP calculations per cycle). Half-precision calculations, however, do not offer any speed advantage beyond reducing the memory footprint and bandwidth required. The instructions to pack and unpack binary16 are a relatively new addition.
  • On most GPUs, double-precision calculations are dramatically slower than single-precision calculations. While most processors have floating point units that perform single-precision calculations on double-precision (or wider) hardware, most offering SIMD to do many calculations at once, GPUs were built for single-precision calculations and use entirely different hardware for double-precision calculations, hardware that is often in short supply (most GPUs offer 1/24th to 1/32nd the number of DP units). On the flip side, most GPUs use SIMD on single-precision hardware to do multiple half-precision calculations, offering the best performance of all.
  • Some very new compute-focused devices offer spectacular DP performance. The GP100 from nvidia offers 5 TFLOPS of DP calculations, about 10 TFLOPS of SP calculations, and 20 TFLOPS of half-precision calculations. These are incredible new heights.
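
As referenced in the double-precision bullet above, a minimal sketch confirming the 2^53 integer boundary and the step size just above 1.0 (any IEEE 754 double behaves this way):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* 2^53 = 9,007,199,254,740,992: beyond this, not every integer is
       representable, so adding 1 is silently lost. */
    double limit = 9007199254740992.0;
    printf("2^53 == 2^53 + 1 ? %s\n",
           limit == limit + 1.0 ? "true" : "false");

    /* For magnitudes between 1 and 2 the step between adjacent doubles
       is 2^-52 = 1/4,503,599,627,370,496. */
    double step = nextafter(1.0, INFINITY) - 1.0;
    printf("step just above 1.0 = %.20g (2^-52 = %.20g)\n",
           step, pow(2.0, -52.0));
    return 0;
}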

Revisiting Technical Debt

A recent post, Strive For Technical Debt [edit: It was recent at the time I originally authored this, but then I got sidetracked and had never hit publish until coming across this old post while readying a new post], netted a couple of emails questioning whether I was encouraging or accepting bad coding practices.

Absolutely not.

Learn your tools (and use the right tools, always reassessing and upgrading platforms and skills as beneficial). Abstract appropriately. Build with a consideration for the future. Use best practices (small, self-contained functions, SOLID, maximize testability, model data intelligently, etc).

Do the best you can in the real-world constraints.

There is a middle ground, though, and it’s in that compromised musty middle where delivered solutions actually get built. But here’s where the disconnect between the idealized and the reality lies, and I mentioned it in the prior post to some confusion: The countless time-wasting, best-intentions no-code-debt initiatives that never actually delivered anything have long been forgotten, having zero legacy. Instead many of us are surrounded by sub-optimal solutions, lamenting the hodge-podge of systems and solutions that power our organizations.

We decry the bad practices by focusing on the faults of projects that actually delivered. To exaggerate the effect, the more important and critical the project, the more we study it and hold it up for scrutiny.

Tell me again about the problems with the codebase of Linux or Firefox or MySQL, or that data process that is the foundation of your organization?

If you are in the privileged position of lamenting technical debt, the alternate timeline is overwhelmingly one filled with failure. Having technical debt to complain about is often the best possible outcome, and has a strong correlation with a project having succeeded.

This whole line of thought came up as I recently reconnected with a coworker from almost two decades ago. At the time I was a junior in a small firm, and was given the schlub work of taking in some sloppy, error-filled data from remote monitoring sites and generating reports. The customer (who was also an investing partner) didn’t particularly care about this task, and paid a minimal stipend for these ancillary reports.

This was a process done in Excel. Someone previously had created some VBScript automations that would do extremely rudimentary data checking and cleaning (the source data often had gaps, invalid and out of reasonable bounds values, etc — the physical collection process was failure prone), and then we’d put it in another spreadsheet with a report page. It was a large accumulation of “technical debt”, but I had the benefit of starting with a self-documented process that was currently accepted by the business and its partners despite many faults and omissions, so I knew the boundaries of what I could do without drawn out business and planning meetings.

The existing code gave me the framework of a starting point. I never knew who made what came before, but was thankful that they laid the original path.

I did know that I didn’t want to be hand mangling a bunch of Excel nonsense every day — a minefield of manual steps in the process making recurring errors inevitable — so like all lazy developers the world over I automated it. This was completely unsanctioned, and was just a way of clearing my docket of the mind numbing work so I could focus on the fun stuff.

I created a data model (originally to an MS Access database, later upgrading to SQL Server because it happened to be on the MSDN install discs we got quarterly) to hold the data, and then a simple Windows service that monitored a directory for files, importing them as they became available, applying all of the rules exactly as originally coded and auditing all changes and errors found. I hashed out a quick web solution to allow on-demand report availability checking and generation.

It was rushed (the core quite literally built in two days), very specialized, and the code was just a big ball of mud. Virtually all of the data cleaning was done in a giant function that was essentially a transcoding of the script in the original spreadsheet.

Over time I’d do small fixes to edge conditions that weren’t expected (on just about every daylight savings time change — for some reason the source data was logged in local time — the data would be malformed in some fun way), or add new data rules based upon feedback from the client, and use those small opportunities to refactor the code. I remember we would laughingly use the term “AI” to describe the rules, when at best it was an expert system, doing a probability correctness analysis on the various correlated values (e.g. exhaust, fuel flow, temperature, RPMs, etc) based upon our understanding of the domain, determining which to trust more, and cascade correcting as a result.

It worked. For years it dutifully churned through those files daily with minimal change, generating automated reports that it would send out to the client, with a basic but at the time way ahead of the curve web application for on-demand reporting that became the base of other functions. The client was so impressed by the outcome of something that they had delegated as garbage work that large new technology projects and funding started rolling in, the project growing into real time monitoring and control of power generation stations, among other things.

The team grew as a result. As the outcome of a desire by a developer to automate some manual processes, people gained employment and an enterprise grew.

And this code from years earlier kept churning through files, generating outputs and value. Every now and then some random real or imagined deficiency would be noted: I remember being grilled on the “scalability” of the solution: It was literally serving a single person at a single client — a massive energy company — on the occasional daily and weekly report runs, low-end hardware vastly outpacing our needs for the data volumes encountered or even hypothetically planned, and could easily be scaled out via a single tenancy model, but we had to essentially invent deficiencies to find faults.

At this point I had moved on to bigger and better things when I got a query from my old boss: Everyone was at a loss to understand what the old code did, you see, and they wanted to rewrite it using the current best practices, on a current silver-bullet technology stack. They wanted me to write a document detailing the process flow and steps.

From an old friend who still worked at the firm I knew that internally the discussion was much less generous: it was a group in a quagmire, lamenting, with much ululation, how they were stuck in the mud for months if not years on end, held down by this drowning burden of technical debt. The debt that was, ironically, the origin of their employment.

They had tried to replace it a couple of times over the years, every initiative heading to failure courtesy of second-system syndrome: After so much boastful talk for years, simply replacing with improved technology and leveraging new processes and patterns couldn’t possibly be sufficient.

For those who aren’t accustomed to the term second-system syndrome (or effect), or who haven’t lived through it or dealt with its corruption: when developers look to replace a system they often want to justify the replacement through overly broad ambitions and expectations. That is easy when a solution is failing and a heroic outcome can be achieved, but trebly difficult when the project has been a success for years and the advantages are largely theoretical. We can’t simply do a small, manageable task, transcoding and refactoring an existing project quietly and transparently towards a better level of fitness (which in most cases is a trivial, easily pursued task); instead we need to have broad ambitions about a new project that is going to be everything to everyone.

Replacing a simple, task-specific data importing and fixed reporting process? Well the replacement had better have a declarative data modeling engine that can accommodate any possible form and format of data, into an infinitely versatile data model that is scaled across machines, is database agnostic, uses the coolest sounding new tech, etc.

The result is virtually assured product failure. Rinse and repeat. Wave one’s arms around to blame the past for the present.

This isn’t petty or spiteful — I am proud of that project, but absolutely acknowledge the many problems in its implementation — but it’s an interesting dilemma that there is little glory in replacing functional systems, so a lot of bad blood can be the result.

Mindful Software Development

This is a minor distraction piece while I complete the promised article on some fun high performance SIMD financial calculations across several programming languages.

What Is Mindfulness?

Skip this section if you just want to get right to the part speaking directly to software development.

The topic of mindfulness often segues to meditation, which is a rewarding activity that unfortunately falls in Venn diagrams of various superstitions and spiritual beliefs: I’m not pitching any of those. While I’ve long pursued the practice of meditation (finally having some success as of late [1]), my interest is philosophical and psychological: Gaining a tool of mental relaxation and focus, both critical abilities in the software development/technology field. This piece is more an observation of personal discoveries rather than advice, so consider skeptically, implement appropriately, and adapt according to your own results.

Mindfulness is something we all experience [2], and is common when we encounter changing or unexpected conditions.

The first snowfall of the season is often a mindful experience. The crisp air buffeting against your face. The gentle murmur of snowflakes blanketing the ground, absorbing and dulling the ambiance of the normal environmental sounds. Everything feeling remote and less real, and a little more magical. We note the warmth and humidity when returning indoors, with a new awareness of every unique scent.

We enjoy the crackle of the fireplace, and the smell of burning wood. The water gurgling to a boil over the wood stove, and the ebbs and flows of patterns created when poured into a cup with hot chocolate mix. The billowing steam dancing atop the cup. The comfort of a warm drink.

We have a heightened sense of mental clarity.

That is mindfulness. It is being completely involved with the present – by being truly present – absorbing and interpreting and enjoying all of the senses.

My dog, Piper, hanging out while I work

Put in a computer science perspective, mindfulness is like assigning the process responsible for interpreting the now a high priority, preventing it from being pushed to a background role. Mindful periods are often recalled as if time had slowed down (reported especially when we experience emergency mindfulness, like in a car accident or fall), which from a computer science perspective has a rational explanation: Normally our senses, which are the significant arbiter of our perception of time, are a low priority background thread, consuming some hypothetically low percentage of our thought capacity, but in a period of mindfulness it gets far more of the cycles, experiencing much more in a given period of time.

Some experience mindfulness with chemical assistance, where we might become fascinated by things that we normally overlook. A glass of wine in and I find myself captivated by diced onions sizzling in a pan as they dance atop a thin layer of hot oil, overwhelmed with the beauty of the aromas. I become fascinated as slivers of steam wisp off to their escape.

People’s use of various illicit substances, often at enormous human cost, is oft pursued as chemically assisted mindfulness. To enjoy and experience the present, people often sacrifice their future.

Meditation can be used to exercise mindfulness, gaining a better handle on safely achieving it on demand. A common exercise is mindful breathing, where you sit in a comfortable position (e.g. one where you’re safe and secure and not in distracting discomfort, usually in a quiet location) and focus on the senses involved with breathing: the rush of the air into your nose; the rise of your chest; the fall as you breathe out of your mouth. Or you might mindfully focus your attention on parts of your body, “scanning” the various feelings that ordinarily you completely disregard.

Mindfulness is being fully involved. Of being completely invested and aware of your existence at that moment. Not caught in the past or the future, or planning or reconsidering or reliving, or endlessly going through internal narratives and planning about social interactions and obligations, anxiously worried about the meeting tomorrow or reliving the meeting from yesterday, but being entirely focused on the moment and the beauty and complexity of the world around us.

That’s mindfulness.

It seems obvious and trivial, and seemingly available on demand, but actively engaging in mindfulness is a difficult task, our minds having an evolutionary tendency to move anything routine or expected to a background process, clearing capacity like an over-eager paging algorithm in case something unexpected comes along. Noisy contemplation about the past and the future fills the vacuum.

From a survival perspective, where historically threats were everywhere and our continued existence was a moment-to-moment gamble, having a mental priority model and IPC that focused on deviations from the norm would be an advantage: Ignore everything not deemed a threat or critical, and only apprise me consciously of changes while I worry about tigers in the area and contemplate how they almost got me a week ago and ways to avoid making that mistake again, predicting how they might get me this time. Taking in the trivia of a moment isn’t an advantage or a luxury to be enjoyed when threats are everywhere and many activities have a very rapid risk/reward return.

Survival dictates that you only stop and smell the roses when they’re novel and you’re figuring out if you can eat them or if somehow they’ll eat you. From then on they’re irrelevant.

We now live in a world where threats are rare, and our survival (or rather prosperity) usually relies upon long term activities, often with a long payback period. Where increasingly careers depend upon intense, sustained focus and attention to very small details, but we often live incredibly repetitive lives, with the same sensory inputs day after day, performing close to identical tasks. Driving the same roads. Drinking the same coffee. Making the same small talk. Having the same arguments. Doing many of the same tasks in slightly different ways.

A copy of a copy of a copy.

The software development field isn’t immune to repetition, and many if not most developers have spent decades doing minor variations on a theme. Most social news sites see the bulk of their traffic during the audience’s work days as we look for something to try to add some novelty to our days, rapidly jumping into and out of links on social news, rationalizing that it’s somehow making us better at our jobs.

[note: try creating a list of “how social news made me better today” at the end of each day. Most days it will be entirely empty, because the truth is that most content is extremely low value, and the minority of enriching content is often quickly scanned as we look for things to find wrong – particularly on programming related venues – or just to confirm our longstanding assumptions]

We pass time making grand plans for the future, and anxiously reflecting on the many plans that drifted into the past uncompleted.

Mindfulness is spending some time in the present. It makes life more enjoyable. It makes it more interesting, and makes us better witnesses of our environment, where we are surrounded by incredible beauty (cue the oft parodied plastic bag scene from American Beauty).

Being mindful adds enjoyment to life, and gives us more control over our monkey brain. Controlled mindfulness is the ability to appreciate the routine: that coffee that was just like every other coffee you’ve drunk for the past three years is arguably better than one that the greatest king of the Middle Ages could have enjoyed. As is almost every food you enjoy.

We are surrounded by incredible luxuries, and enormous beauty, that the process of acclimation leaves us jaded and dulled about.

On Demand Mindful Programming

Software development is a mental exercise. We have problems that we focus our neural powers upon and generate results.

Optimal software development is a balance between planning for the future, incorporating lessons from the past, and implementing in the now. The now is the part where we work in the IDE, fingers to the keyboard, grinding out beautiful code and designs. Or it’s the part where we work on a whiteboard and plan for the future. Or it’s the part where we analyze the defects of the past and plan the changes that improve the situation.

Whatever embarrassment, pride, shame, or arrogance we have about the past, and whatever worries about scaling or security or re-usability we might have for the future, the now is when we make a difference.

So what does mindfulness have to do with programming?

Mindfulness is the key to optimal effectiveness in this field.

The zone — the magical mental state where one becomes intensely focused on their effort — is applied mindfulness. The ability to actively control or at least maximize the likelihood of a mental flow state is very beneficial.

I’m not going to describe how to meditate on here – it is a very well worn subject – but it is worth an investment of some time. Meditation is the longer term approach, but given all that I’ve written above what could we draw as lesson for a non-practitioner today? How could you better achieve mindfulness today?

I have a couple of suggestions to give a try. The hope is to force your monkey brain to focus on the now, which can be used to bridge to a flow state.

Mix up your environment and routine.

Change the background color of your IDE. Change the font color. Change the font (a well known trick in the proof-reading world). Sit at a different desk. Sit in a different building (e.g. a library or a cafe, or the cafeteria, tailoring to your ability to ignore distractions). Sit on a bouncy ball. Turn your monitor to portrait mode. Switch to a laptop. Put your mouse on the left side. Skip breakfast, or skip lunch. Use a standing desk one day and a sitting desk the next. Start at 6am one day and noon the next (obviously contingent on ability to do so). Take different routes to work, or stop at different shops. Take different doors. Experience different scents. Talk to different people. Work in the dark. Work outside.

Change things up daily.

This notion of changing one’s environment goes completely contrary to most advice in this industry, where the dominant advice is to find the conceptually perfect setup with the perfect colors and fonts and seating and situation, and then follow a routine regimen. The foundation of that approach is that you focus on only the code, all of the other distractions removed by making them blend into the background.

It’s wrong, or at least wrong for most people. The net result of this consistency is that life becomes a grey and dull monotony with a detachment from the now. We skitter around looking for something to try to draw us in (we look for a funny cat picture to jar us into conscious existence), and lose the ability to focus or solve difficult problems.

This notion of focusing better after mixing things up has been demonstrated countless times, but people usually draw the wrong conclusion.

Switching to a standing desk likely didn’t make you more productive. Change itself made you more productive. The same can be said of the countless blog posts on going out to lunch, staying in for lunch, sitting on a yoga ball, using a new chair, working in cafes, getting up extra early, working in notepad (in digital or paper form) or a different IDE, etc. People often euphorically report on the increased productivity seen by a change, never attributing it to change itself. There are seldom follow-up reports on these silver bullets as the new and novel turns into the old and mundane.

People experience change and report magnificent results. We often become more capable when a new member joins our team and there’s that period of the new, and just about everyone reports how fantastic it is when they first start a new job, or move to that cool new platform or language. That renewed sense of focus and purpose and achievement.

There is a well known effect that’s oft referenced called the Hawthorne effect, which in a nutshell is the belief that being monitored makes people (temporarily) more productive (demonstrating the observer effect). In that original experiment the observers were trying to measure if lighting changes would improve productivity, so they changed the lighting and productivity improved, slowly fading back to normal. They changed it again and productivity improved, returning to normal. Then they changed it to the original lighting and productivity improved yet again, seemingly undermining the entire study. There have been many similar experiments, including some where the subjects weren’t aware that they were being studied, and again a change of conditions yielded improved productivity.

The reason seems obvious, doesn’t it? While we all strive for consistency under the notion of removing distractions, for most of us consistency is the greatest distraction of all. Monotony makes the effort of mentally focusing on the now a herculean task, in absolute conflict with the intended effect. (Monotony can be embraced as essentially meditation of a sort, where the intent is to deeply force a focus on the mundane, as in focused breathing or even something like a Japanese tea ceremony; this isn’t really usable when it comes to the environment around your work, however.)

Switch your world up. See how it impacts your focus on the code. Look into meditation.

[the pictures in this post were taken minutes before hitting publish on this post — my dog Piper who always hangs in here when I’m working in the home office, and a few out the back as spring makes its presence known]

1 – I’ve started meditating while exercising on the elliptical. I love the elliptical because it’s smooth and inflicts very little abuse on the body, and allows me to be mentally free. I have found it very useful to practice meditation while doing this, killing two birds with one stone as it were. And yes, you can meditate doing almost anything that doesn’t demand your attention, and the rhythm and blood flow of the elliptical makes the meditation extra immersive.

2 – As always, there’s a caveat about terminology: Get in a discussion with a serious Zen aficionado and mindfulness will often demand a more restrictive set of conditions, and some demand that it mean being essentially free of thought. In this piece I am referring to mindfulness as truly being wholly and completely present, which is something that we seldom truly experience.

Rust Faster Than C? Not So Fast…

On Hacker News a few days ago –

That submission links to the Computer Language Benchmark Game leaderboard for k-nucleotide, where the top performer is a Rust entrant at the time of this post.

Which is impressive. Rust is a relatively young language (although it heavily leverages LLVM on the back-end, standing on the shoulders of a giant), and is making waves while bringing a lot of needed concurrency and memory safety benefits.

For some classes of problems, Rust is proving itself a compelling option even in performance critical code.

This is a very simple test of rudimentary functionality, however, so if C is lagging comparatively, some part of its implementation is sandbagging or focusing on another optimization.

Firing Up VTune And Profiling the Laggards

A quick profile identified the overwhelming outlier as the set of instructions devoted to bit mangling to test and set the occupied and deleted flags: in an effort to optimize memory usage, khash stores flags — one for empty or used, and another to indicate deletion — for each hash bucket in a bit array, 16 buckets sharing a single 32-bit integer courtesy of bit packing.

While memory efficient, the process of determining the host integer for a bucket, and then the bit offsets for each constituent, adds significant overhead to the process: It’s a tiny cost by itself, but when iterated hundreds of millions of times becomes substantial. This was made worse by the code style (shoehorning generic-type functionality in a C header file) making it difficult for the compiler to optimize to bit test instructions.

This sort of optimization is an interesting one because the overhead of bit mangling could theoretically be balanced out by a better cache hit rate — more data in less memory — though this is unlikely to hold for an open addressing, double-hash implementation, as they’re generally cache unfriendly.

Curious about the impact, I switched the flag type to an unsigned char and eliminated the bit shifting (there is still fixed bit testing for the first and second bits). The trivial modification to khash.h can be found here (edit: I updated that version slightly to add another small optimization and to improve the readability of the flags).
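
For illustration, the shape of the two flag schemes looks roughly like the following. These are simplified stand-ins rather than khash’s actual macros: the packed form needs a shift and mask per lookup to find a bucket’s two flag bits inside a shared 32-bit word, while the byte-per-bucket form is a single indexed load that compilers readily turn into a bit test.

#include <stdint.h>

/* Packed: 2 flag bits per bucket, 16 buckets per 32-bit word. Locating a
   bucket's flags requires finding the host word and its slot within it. */
static inline int packed_is_empty(const uint32_t *flags, uint32_t i)
{
    return (flags[i >> 4] >> ((i & 0xFu) << 1)) & 2;
}

/* Byte per bucket: more memory (6 wasted bits per bucket), but the lookup
   is a plain load and mask. */
static inline int byte_is_empty(const unsigned char *flags, uint32_t i)
{
    return flags[i] & 2;
}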

C Takes The Lead Again

This yielded identical calculations/results, and the same max memory usage (130,592 per time -v), but completed some 32% faster than the original khash implementation, the hashing being significantly faster with the lagging points moving to input parsing and other overhead of the test. My test device is obviously not the same as the one they’re running (though the results are very similar), but assuming the same speedup it would push the C implementation to 4.32 seconds —  a sizable lead — courtesy of a trivial, and arguably optimal change. Memory overhead is slightly worse, with 6 wasted bits per bucket, but the change dramatically reduces the overhead of bucket operations.

GCC Yielded Faster Code Than Clang

As an aside, several of the comments on that discussion questioned why clang wasn’t used, the hypothesis being that it’d yield better performance courtesy of more optimization. In my quick tests, including generating profile outputs and using profile-guided optimizations, GCC generated better optimized code, always winning the performance battle for this test. This was between GCC 6.2.0 and clang 3.9.1. And as an interesting aside, the GCC and clang built binaries were significantly faster running in a VirtualBox Linux VM [1] than the binary built with either Visual Studio 2015 or 2017, running in the host environment, even when factoring out the slow stdin handling of Windows.

The C/C++ compiler in Visual Studio has gotten very good (and surprises me with its standards compliance, once its biggest weakness), but falls behind in this test.

Conclusion

There is no conclusion: Rust is great, C is great, and profilers are an awesome tool to lean on. It was a fun exercise in leveraging a performance profiler (in this case VTune). It took seconds to identify the “culprit”, which was a memory optimization that comes at the cost of a significant number of CPU cycles. Shifting the focus to performance, and away from optimizing the memory profile, and performance improves significantly.

These game/toy benchmarks don’t mean a tremendous amount, and no nights should go without sleep over them, but they are a fun exercise and an entertaining distraction.

1 – Totally irrelevant for this test (this is not a candidate for vectorization), but it’s notable that you can enable AVX2 availability in VirtualBox VMs via-

VBoxManage setextradata "$vm_name" VBoxInternal/CPUM/IsaExts/AVX2 1

CMOS and Rolling Shutter Artifacts / Double Precision Compute

EDIT (2017-03-01) – It’s been a bit quiet on here, but that will change soon as some hopefully compelling posts are finished. I’ve taken a couple of weeks to get into a daily fitness and relaxation routine (I would call it a meditation if that term wasn’t loaded with so much baggage), organize life better, etc. Then it’s back to 100% again with these new habits and behaviors.

While I finish up a more interesting CS-style post (SIMD in financial calculations across a variety of programming languages), just a couple of interesting news items I thought worth sharing.

In a prior entry on optical versus electronic image stabilization I noted rolling shutter artifacts (an image distortion where moving subjects, or the entire frame during motion, can be skewed and distorted) and their negative impact on electronic stabilization.

During video capture, especially under less than ideal conditions, it is a significant cause of distortion that often goes unnoticed until you stabilize the frames.

Sony announced a CMOS sensor with a data buffering layer that allows it to have something approximating a global shutter (Canon previously announced something similar). While their press release focuses on moving subjects in stabilized-type situations, the same benefit dramatically reduces the rolling shutter skew during motion of video capture. It also offers some high-speed capture options, which are enticing.

Sony sensors are used by almost every mobile device now, so it’ll likely see very rapid adoption across many vendors.

EDIT: Sony is already showing off a device with a memory-layer equipped CMOS, so it’s going to become prevalent quickly.

Another bit of compelling news for the week is the upcoming release of the GP100, Pascal-based workstation GPU/compute device by nvidia (they previously released the P100 server based devices).

Double-precision performance of upwards of 5 TeraFLOPS (for comparison, a 72-core/AVX-512 Knights Landing Xeon Phi 7290 offers about 3.4 TeraFLOPS of DP performance, while any traditional processor will be some two orders of magnitude lower even when leveraging full SIMD such as AVX2). Traditionally these workstation cards massively compromised double-precision calculations, so this update brings them into much greater utility for a wide array of uses (notably the financial and scientific world, where the limited significant digits of single precision made it unworkable).