Email Addresses Need A Checksum

I get other people’s email.

I grabbed a Google account very early, in the invitation days, and got the first.last@gmail.com gold standard (which by Google rules means I also have firstlast@gmail.com, fir.st.l.a.s.t@gmail.com, etc.). These derivatives can be powerful, but often they just confuse people.

Since then I’ve gotten thousands of emails intended for other people. From grocery stores. Art dealers. Hairdressers. Car rental agencies. Hoteliers. Flight itineraries. School newsletters and personal appeals. Square receipts. Alumni groups.

Where possible, when email is sent by a real human being and not a black-hole noreply source, I try to alert people to update their addresses, though it’s surprising how often the issue repeats anyways.

All of these were presumably intended for people sharing variations of my name (e.g. Denis), or with the same name but who had to resort to some sort of derivative such as firstMlast@gmail.com.

Many of the errant emails have privileged or time sensitive information, and a lot of them are actionable.

Square receipts allowing me to rate the retailer and leave feedback, alongside some CC details. Hotel reservations that allow me to cancel or change the reservation with absolutely no checks or controls beyond that the email is in hand. Rewards cards through which I can redeem or transfer points.

Some have highly personal, presumably confidential information.


In many if not most of these cases the email address was likely transmitted verbally [1]. To the retailer, grocery store clerk, or over a reservation phone line to a travel agent or hotel representative. Alternatively it might have been entered on some second-screen device (my iCloud account receives the email for more than one stranger’s Facebook account).

For a vanity domain a mistyped address usually goes to some ignored catch-all, but on a densely populated host like Gmail it yields deliveries of possibly sensitive data to the wrong people, as almost every variation is occupied.

Email addresses should have a checksum. A simple mechanism through which human beings can confirm that information was conveyed properly. Even the most trivial of checksums would provide value, eliminating the vast majority of simple mistakes.

For instance, take the CRC32 of the email address and display two base32 digits: one for the bottom 5 bits of the most significant byte, then one for the bottom 5 bits of the least significant byte. (Base32 has 32 digits, so each digit carries 5 bits, whereas the 32 in CRC32 refers to bits. The choice of bits is totally arbitrary, but sound. This is extremely trivial in a world where launch vehicles are landing on floating barges.) For a variety of email address derivatives this would yield:

first.last@gmail.com EW
first.lst@gmail.com 6U
firstMlast@gmail.com XM
frst.last@gmail.com ZS
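The scheme described above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not a reconstruction of how the values in the table were produced: the name email_checksum, the RFC 4648 base32 alphabet, the zlib flavour of CRC32, and the lowercasing and trimming step are all my own choices here, so the output may not match the table.

```python
import zlib

# Assumption: the RFC 4648 base32 alphabet (A-Z, then 2-7).
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def email_checksum(address: str) -> str:
    """Return a two-character base32 checksum for an email address."""
    # Assumption: trim whitespace and lowercase before hashing, since
    # most email infrastructure treats addresses case-insensitively.
    normalized = address.strip().lower()
    crc = zlib.crc32(normalized.encode("utf-8")) & 0xFFFFFFFF
    high = (crc >> 24) & 0x1F  # bottom 5 bits of the most significant byte
    low = crc & 0x1F           # bottom 5 bits of the least significant byte
    return BASE32_ALPHABET[high] + BASE32_ALPHABET[low]


if __name__ == "__main__":
    for addr in (
        "first.last@gmail.com",
        "first.lst@gmail.com",
        "firstMlast@gmail.com",
        "frst.last@gmail.com",
    ):
        print(addr, email_checksum(addr))
```

Any stable hash would do in place of CRC32. The only real requirement is that everyone computes it the same way.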

“My email address is f  i  r  s  t   period   l a s t @ g m a i l . c o m”

“Okay, got it. 6U?”

“Nope, I must have misspoken. Let me restate that – … ”

“Okay, got it. EW?”

“Perfect!”

(And of course every user would quickly know and remember their checksum. This wouldn’t be something the user calculates on demand.)

When I’m forced to use my atrophied handwriting to chicken scratch an email address on a form, a simple two-digit checksum should yield a “go / no go” processing of the email address: If it isn’t a valid combination (whether because the email address or the checksum isn’t being interpreted correctly), contact me to verify, and certainly don’t start sending sensitive information.
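A go / no go check along those lines might look like the sketch below. The function names (email_checksum, triage), the base32 alphabet, and the normalization are illustrative assumptions, not anything an existing form processor offers.

```python
import zlib

# Assumption: RFC 4648 base32 alphabet.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def email_checksum(address: str) -> str:
    """Two base32 digits taken from the CRC32 of the normalized address."""
    crc = zlib.crc32(address.strip().lower().encode("utf-8")) & 0xFFFFFFFF
    return BASE32_ALPHABET[(crc >> 24) & 0x1F] + BASE32_ALPHABET[crc & 0x1F]


def triage(address: str, written_checksum: str) -> str:
    """Return "go" if the transcribed pair is consistent, else "verify"."""
    if email_checksum(address) == written_checksum.strip().upper():
        return "go"
    # Either the address or the checksum was misread. Hold all mail
    # until a human has confirmed the address some other way.
    return "verify"
```

The processor never needs to know which half was misread. A mismatch alone is enough reason to hold off.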

Two digits of base32 yield 10 bits of entropy, or 1024 variations. Obviously this is useless against intentional collisions, but against accidental data “corruption” it would catch errors 99.9%+ of the time, since a random mistake has only about a 1 in 1024 chance of landing on the same checksum.
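Spelled out, the arithmetic behind that figure is below. It assumes a mistyped address lands on each of the possible checksums with roughly equal probability, which is an idealization rather than a guarantee for any particular hash.

```latex
\begin{align*}
\text{bits} &= 2 \times \log_2 32 = 2 \times 5 = 10 \\
\text{checksum values} &= 2^{10} = 1024 \\
P(\text{error undetected}) &\approx \frac{1}{1024} \approx 0.098\% \\
P(\text{error detected}) &\approx 1 - \frac{1}{1024} \approx 99.9\%
\end{align*}
```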

Technical Aside: Email addresses theoretically can contain mixed case, but in practice the vast majority of the email infrastructure is case-insensitive.

The Pragmatic Footer

Gmail and the other vendors aren’t going to start displaying email address checksums. Forms and retailers and Square aren’t going to start changing their apps and forms to capture or display email data entry checksums.

As with prior “improve the system” exercises, this is more of a theoretical exercise for discussing concepts that we regularly encounter. While it doesn’t work for telephone exchanges, more data transfer should be happening via NFC or temporary QR codes rather than being verbally relayed.

It’s a fun thought exercise to go back and think of how the system could have been improved from the outset, given the reality that information transfer is often human and thus imperfect. For instance, all email addresses could have had a standardized checksum suffix: first.last+EW@gmail.com. Or whatever.
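As a sketch of how such a suffix might be produced and checked: the helper names with_suffix and split_suffix, the base32 alphabet, and the rule that the checksum covers the address without its suffix are all hypothetical choices for illustration.

```python
import zlib

# Assumption: RFC 4648 base32 alphabet.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def email_checksum(address: str) -> str:
    """Two base32 digits taken from the CRC32 of the normalized address."""
    crc = zlib.crc32(address.strip().lower().encode("utf-8")) & 0xFFFFFFFF
    return BASE32_ALPHABET[(crc >> 24) & 0x1F] + BASE32_ALPHABET[crc & 0x1F]


def with_suffix(address: str) -> str:
    """Turn local@domain into local+CHECKSUM@domain."""
    local, _, domain = address.strip().rpartition("@")
    return f"{local}+{email_checksum(address)}@{domain}"


def split_suffix(suffixed: str):
    """Return (base_address, is_valid) for a local+CHECKSUM@domain string."""
    text = suffixed.strip()
    local, at, domain = text.rpartition("@")
    base_local, plus, suffix = local.rpartition("+")
    if not at or not plus:
        # No suffix present, so there is nothing to validate against.
        return text, False
    base = f"{base_local}@{domain}"
    return base, email_checksum(base) == suffix.upper()
```

A receiving system would call split_suffix on whatever was written down and only trust the base address when the second value is True.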

If you develop a system where humans verbally or imperfectly transmit information, and it’s important that it is stated and understood correctly, consider a checksum.

 

1 – I had a speech impediment as a young child, courtesy of a Jamie Oliver-esque mega tongue that was trying to escape the confines of my mouth. This made me more aware of the general sloppiness of verbal data transmissions as a problem, later noticing that it’s a fairly universal issue.