Last Post Before the Switchover

A few final updates before I switch over the content of this site (a topic that I’ll detail later in this piece).

Life has kept me very busy, so the switchover has been delayed a bit.

The Office

Recently I’ve been spending some time looking for office space in the limited time between project work. Not only do some engagements and situations benefit from it, but I’m also seeking more frequent changes of scenery and dynamics, and the ability to handle growth.

Office space is a fascinating, deeply subjective area where the priorities vary dramatically from person to person. In this case I’m able to decide based upon my own priorities, while considering future peers.

Commute and accessibility. Dynamics of the area. View. Neighbours. Coffee shops. Restaurants. A bakery and maybe some grocery options to grab a couple of things before heading home.

Commute matters a lot to me. I once spent a few years commuting over an hour each way and it was a trying experience. This varies dramatically by the person, with some enjoying that part of their day. I do enjoy the peace and quiet of a drive or a trip on the GO, listening to select music or podcasts, etc. It can be a nice daily time out.

An hour each way is just way too much, however, leaving too little of the day remaining.

[Photo: imageaday 2018-08-17 by Dennis Forbes, on 500px.com]

Dynamics…for a few years I worked in the core of Toronto, in the heart of Bay Street. I spent lunches eating street food (miserable choices in Toronto) at Nathan Phillips Square, trying out restaurants, walking down to the waterfront, etc. I enjoyed it, but it quickly grew less novel. I learned that there’s a threshold beyond which additional options have declining value. The same held when working in NYC. I love both cities, and if my life had me living there of course I’d love to work down there, but when it necessitates a grueling commute the payoff isn’t as worthwhile.

Given a giant plate of options, we winnow it down to a tiny set of food options. A tiny set of coffee options. A tiny set of relaxation options.

For that reason I’m fine being in an exurb town. It poses potential problems for future growth, given that it would require commuting from the broader area (though I am a big believer in telecommuting, with the office being an occasional converging point or option for others — the exception and not the norm), but that just isn’t as limiting as it once was.

But I want to be in a dynamic area, and am limiting my choices accordingly: where there are a number of lunch and coffee options; where there are areas to run, or bike, or just sit in a park; where there are libraries and other public gathering places within walking distance (whether for hosting or participating in a workshop, volunteering, etc.); and where there are layers of access options, including buses.

I’m also limiting to an area with a view of something. Not a gray highway. Not an industrial wasteland. Some sort of interesting view.

There are a good number of options across the area, in the various nearby cores. It has made the choices more difficult than I imagined.

On How This Site Will Be Rebuilt

While the revised blog engine is still built in Go (with autocert, utilizing HTTP push, and a panoply of “cool shiny things”), I am going to abandon the blog format and its trappings.
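
Since the engine itself isn’t the topic here, just a minimal sketch of what that combination looks like in Go; the hostname, cache directory, and pushed asset below are placeholders rather than this site’s actual configuration.

package main

import (
   "net/http"

   "golang.org/x/crypto/acme/autocert"
)

func main() {
   // autocert obtains and renews Let's Encrypt certificates automatically.
   m := &autocert.Manager{
      Prompt:     autocert.AcceptTOS,
      HostPolicy: autocert.HostWhitelist("example.com"), // placeholder host
      Cache:      autocert.DirCache("certcache"),        // persist certificates across restarts
   }

   http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
      // HTTP/2 server push, where the connection supports it.
      if pusher, ok := w.(http.Pusher); ok {
         pusher.Push("/static/site.css", nil) // placeholder asset
      }
      w.Write([]byte("<html>...</html>"))
   })

   srv := &http.Server{
      Addr:      ":https",
      TLSConfig: m.TLSConfig(),
   }
   srv.ListenAndServeTLS("", "") // certificates come from the autocert manager
}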

No RSS. No feigned promise of regular updates.

No easy thought or opinion pieces of the sort I too often resort to.

Instead just technical, “long form” pieces, each of which I’ve spent considerable time on.

I already have a number in various states of completion, each of which felt wasted when intermixed with facile passing pieces, or when placed in a blog format with its implied time-sensitivity.

[Photo: Image A Day - 2018-08-14 by Dennis Forbes, on 500px.com]

I have a piece on height mapping a property precisely with smartphones. Another related one on 3D representation of OpenStreetMap data with public elevation data. Another on a greenfield deep learning implementation. Another is just an interpretive, and arguably artistic, expression on technology.

And I’m just going to wash away every prior entry. It is a clean slate. An about page, a page of opinions, and then an index of long-form technical pieces that I’m proud of.

That transition is coming soon.

I absolutely adore the people I have as readers, and hope you enjoy some of the new pieces.

-Dennis

The Mysterious Redirected Web Request

I was sitting at the kitchen table working on a project a while back when a commercial came on advertising “quick-cook weekday eggs”.

It’s an ad campaign from the Egg Farmers of Canada (this country uses supply management for the primary staple-type items — cheese, chicken, dairy, etc — which means each has fairly robust advocacy groups and has a healthy state without the enormous agricultural subsidies you see in the US), presumably to remind people that eggs are a speedy cook even when time is limited.

I was curious whether they really sold egg cartons with this weekday branding on them, so I pulled up Google and typed in “weekday eggs”. It suggested the autocomplete “weekday eggs real” and, to satisfy my curiosity about whether people really wondered, I chose that.

The top link was to “Introducing the new Weekday Eggs – Cossette”, a non-TLS http result on the responsible ad agency’s website. I clicked it.

I was greeted with a “YOUR FIREFOX BROWSER IS EXPLOITED” etc page. The classic scam page with blinking text, bold colours, and alerts on navigation exhorting you to pay for a solution.

Weird.

I killed the tab and went through the process again, but this time I got the ad agency site. In many recreations of this process since, I’ve never gotten the scam page again.

Paranoia rises. What was the source of this misdirection? Was the call coming from inside the house?

I immediately began auditing every piece of software on the laptop (Firefox itself being the latest version, with a very minimal set of add-ons including uBlock Origin), then evaluating anything that could possibly be intercepting HTTP traffic. I am very cautious with the software that comes into my life, so it’s a fairly achievable audit.

The laptop is a Lenovo Yoga 720, from a company notorious for their Superfish debacle. Could some of the laptop software be responsible for periodically intercepting legitimate connections? A thorough analysis, including with targeted debug sessions to see the entire call stack from Firefox through the operating system, seemed to exclude this possibility.

Then I had to look at my Asus router (VPNFilter is making the rounds, and while my router is behind a router behind a router, there is always a possibility), and then at the cable-company-provided router. Did either of those interfere with normal traffic? I set up web tests to load a variety of HTTP resources around the net, verifying them for adulteration and logging every redirection.
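
For the curious, those web tests amounted to something like the following: periodically fetch a handful of plain-HTTP resources, hash the bodies, and log any redirects or unexpected changes. The target URLs and the check below are illustrative placeholders, not the actual harness.

package main

import (
   "crypto/sha256"
   "io"
   "log"
   "net/http"
   "time"
)

// Illustrative targets only; the real tests hit a wider variety of resources.
var targets = []string{
   "http://example.com/",
   "http://neverssl.com/",
}

func main() {
   // Log redirects rather than silently following them.
   client := &http.Client{
      CheckRedirect: func(req *http.Request, via []*http.Request) error {
         log.Printf("redirect: %s -> %s", via[len(via)-1].URL, req.URL)
         return nil
      },
   }

   baseline := map[string][32]byte{}
   for {
      for _, url := range targets {
         resp, err := client.Get(url)
         if err != nil {
            log.Printf("%s: %v", url, err)
            continue
         }
         body, _ := io.ReadAll(resp.Body)
         resp.Body.Close()

         // Legitimate content changes will also trip this, so treat it as a
         // prompt for manual inspection rather than proof of tampering.
         sum := sha256.Sum256(body)
         if prev, seen := baseline[url]; seen && prev != sum {
            log.Printf("%s: response body changed (%x...)", url, sum[:4])
         }
         baseline[url] = sum
      }
      time.Sleep(time.Minute)
   }
}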

Nothing.

I have no answers. The mystery persists.

And in the end it could be malicious software on the other side. The site seems to be hosted on a bank of IPs that feature a number of other basic static sites, and it could be a shady revenue scheme to redirect a low enough percentage of requests that it could always be attributed to other things and waved away. At this point that seems the most likely scenario. Either that or my internet provider, or someone in between, is interfering with traffic.

So this post has no payoff in the end. I just had to document a mystery that still bothers me, returning to my mind-space more often than it should. It reminds me why TLS everywhere is so critical.

Embrace AMP or AMP Wins

I’ve written about AMP (Accelerated Mobile Pages) on here a few times. To recap, it’s a standard of sorts and a coupled set of code.

If you publish a page through AMP, you are limited to a very narrow set of HTML traits and behaviors, and a limited, curated set of JavaScript providing basic functionality, ad networks, video hosting, and metrics, with scripts hosted by the Google-owned-and-operated cdn.ampproject.org. You also implicitly allow intermediaries to cache your content.

If you search Google using a mobile device, links with a little ⚡ icon are AMP links that will be loaded from the Google cache, and by rule (which is verified and enforced) live within the limitations of AMP. You can’t claim AMP conformance and then resort to traditional misbehavior.

The news carousel is populated via AMP links.

Many publishers have gotten on the AMP bandwagon. Even niche blogs have exposed AMP content via a simple plug-in.

AMP is winning.

But it has significant deficiencies, for which it has earned a large number of detractors. There are technical, privacy and web centralization issues that remain critical faults in the initiative.

Anti-AMP advocacy has reached a fever pitch. And that negative advocacy is accomplishing exactly nothing. It is founded in a denial that is providing a clear road for AMP to achieve world domination.

Because in the end it is a better user experience. Being on a mobile device and seeing the icon ⚡ is an immediate assurance that not only will the page load almost instantly, it won’t have delay-load modal overlays (Subscribe! Like us on Facebook!), it won’t throw you into redirect hell, it won’t have device-choking scripts doing spurious things.

Publishers might be embracing a Pyrrhic victory that undoes them in the end, but right now AMP makes the web a more trustworthy, accessible place for users. It is a better experience, and helps avoid the web becoming a tragedy of the commons, where short-sighted publishers desperate for a short-term metric create such a miserable experience that users stay within gated communities like Facebook or Apple News.

We could do better, but right now everyone has exactly the wrong approach in confronting AMP.

“We don’t need AMP: We have the powerful open web, and publishers can make their pages as quick loading and user-friendly as AMP…”

This is a losing, boorish argument that recurs in every anti-AMP piece. It is akin to saying that the EPA isn’t necessary because industry just needs to be clean instead. But they won’t. AMP isn’t an assurance for the publisher, it’s an assurance to the user.

AMP isn’t a weak, feel-good certification. To publish via AMP you allow caching because that cache host validates and forcefully guarantees to users that your content lives within the confines of AMP. You can’t bait and switch. You can’t agree to the standard and then do just this one small thing. That is the power of AMP. Simply saying “can’t we all just do it voluntarily” misses the point that there are many bad actors who want to ruin the web for all of us.

But the argument that, as a subset of what the web already allows, it therefore isn’t needed — missing the point entirely — is self-defeating, because that argument has short-circuited any ability to talk about the need that AMP addresses, and about how to make a more palatable, truly open, and truly beneficial solution.

We need an HTML Lite. Or HTMLite to be more slogan-y.

The web is remarkably powerful. Too powerful.

We have been hoisted by our own petard, as the saying goes.

With each powerful new innovation in web technologies we enable those bad actors among us who degrade the experience for millions. For classic textual content of the sort that we all consume in volumes, that power is destructive to the web’s own long-term health. Many parties (including large players like Apple and Facebook) have introduced alternatives that circumvent the web almost entirely.

A subset of HTML and scripting. A request header passed by the browser that demands HTMLite content, with the browser and caching agents enforcing those limits on publishers (rejecting the content wholesale if it breaks the rules).
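
Nothing like this exists today, so purely as a hypothetical sketch of how the negotiation half might look on the server, with the header and profile names invented for illustration:

package main

import "net/http"

// Hypothetical header a browser would send to demand the restricted profile.
const liteHeader = "Accept-HTMLite" // invented for illustration, not a real standard

func liteAware(full, lite http.Handler) http.Handler {
   return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      if r.Header.Get(liteHeader) != "" {
         // The response claims conformance; a validating browser or
         // caching agent would be the one actually enforcing the limits.
         w.Header().Set("Content-Profile", "htmlite") // also invented
         lite.ServeHTTP(w, r)
         return
      }
      full.ServeHTTP(w, r)
   })
}

func main() {
   full := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      w.Write([]byte("<html><body><script>/* the rich version */</script></body></html>"))
   })
   lite := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      w.Write([]byte("<html><body><p>content only</p></body></html>"))
   })
   http.ListenAndServe(":8080", liteAware(full, lite))
}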

We need to embrace the theory of AMP while rejecting the centralized control and monitoring that it entails.

This isn’t simply NoScript or other hacked solutions; it needs to be a holistic reconsideration of the basics of what we’re trying to achieve. Our web stack has become enormously powerful, from GL to SVG to audio and video and conferencing and locations and notifications, and that is just gross overkill for what we primarily leverage it for. We need to fork this vision before it becomes a graffiti-coated ghetto where only the brave tread, the userbase corralled off into glittery alternatives.

AMP Isn’t All Bad

AMP (Accelerated Mobile Pages) is generally reviled in the tech community. Highly critical pieces have topped Hacker News a number of times over the past couple of months. One such piece, from The Register, ends with the declaration “If we reject AMP, AMP dies.”, which you can ironically read in AMP form.

The complaint is that AMP undermines or kills the web. A lesser complaint is that it has poor usability (though not all criticism has held up).

Web Developers Can’t Stop Themselves

Facebook has Instant Articles. Apple has News Format. Google has AMP.

Everyone can leverage AMP, whether producer or consumer. Bing already makes use of AMP in some capacity, as can any other indexer or caching tier. AMP, if available, is publicly announced on the source page (via a link rel=”amphtml” tag) and available to all, versus the other formats that are fed directly into a silo. A quick survey of the Hacker News front page found almost half of the entries had AMP available variants, made possible given that exposing AMP is often nothing more than a simple plug-in on your content management system (and would be a trivial programming task even on a greenfield project).
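
To make that announcement mechanism concrete, here is a rough sketch of how an indexer or caching tier might discover a page’s AMP variant, using the golang.org/x/net/html tokenizer; error handling is kept minimal.

package main

import (
   "fmt"
   "net/http"
   "os"

   "golang.org/x/net/html"
)

// ampURL returns the AMP variant a page advertises via <link rel="amphtml" href="...">.
func ampURL(pageURL string) (string, error) {
   resp, err := http.Get(pageURL)
   if err != nil {
      return "", err
   }
   defer resp.Body.Close()

   z := html.NewTokenizer(resp.Body)
   for {
      switch z.Next() {
      case html.ErrorToken:
         return "", fmt.Errorf("no amphtml link found")
      case html.StartTagToken, html.SelfClosingTagToken:
         tok := z.Token()
         if tok.Data != "link" {
            continue
         }
         var rel, href string
         for _, a := range tok.Attr {
            switch a.Key {
            case "rel":
               rel = a.Val
            case "href":
               href = a.Val
            }
         }
         if rel == "amphtml" {
            return href, nil
         }
      }
   }
}

func main() {
   if len(os.Args) != 2 {
      fmt.Fprintln(os.Stderr, "usage: ampurl <page-url>")
      os.Exit(1)
   }
   u, err := ampURL(os.Args[1])
   if err != nil {
      fmt.Fprintln(os.Stderr, err)
      os.Exit(1)
   }
   fmt.Println(u)
}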

The impetus for these varied formats is the harsh reality that the web has been abused, and is flexible to the point of immolation. This is especially prevalent on mobile where users are less likely to have content blockers or the ability to easily identify and penalize abusive behaviors.

Auto-play videos, redirects (back capture), abusive ads, malicious JavaScript even on reputable sites, modal dialogs (subscribe! follow us on Facebook!), content reflowing that happens dozens of times for seconds on end (often due to simple excessive complexity, but other times an intentional effort to solicit accidental ad clicks as content moves). Every site asking to send desktop notifications or access your location. Gigantic video backgrounds filling the above-the-fold header for no particular reason.

In an ideal world web properties would refrain from such tragedy of the commons behaviors, worried about offending users and on their best behavior. The prevalent usage doesn’t motivate that, however: many simply see whatever tops Hacker News or Reddit or trending on Facebook and jump in and out of content sources, each site having incredibly little stickiness. The individual benefit of good behavior for any particular site declines.

Bad behavior worsens. Users become even less of a check on practices. The good emergent sites suffer, with everyone sticking to a tiny selection of sites that they visit daily. It parallels the Windows software download market: where once we freely adopted whatever was new and interesting, after pages of toolbars and daemons and malware many now just install the basics and take no risks, new entrants finding no market for their efforts.

AMP (and the other options) is the natural outcome of the wild web. It represents padded walls that constrain bad behavior, giving the content priority. It isn’t appropriate for rich web apps, or even marginally interactive pieces like my bit on floating point numbers, but for the vast majority of media it is a suitable compromise, pairing the power of HTML with constraints that yield a speedily rendering, low-resource-utilization solution. Most AMP pages render extraordinarily quickly, with absolutely minimal CPU and network usage. Yes, sites could just optimize their content without being forced to, but somehow we’ve been moving in exactly the opposite direction for years. A simple cooperative effort will never be fruitful.

Google thus far has stated that they do not prioritize AMP content in search results, and given how fervently the SEO industry watches their rankings this isn’t as cloaked as one might imagine. They do, however, have a news carousel for trending topics (e.g. “news”), and most if not all of those entries in the carousel are AMP pages on mobile.

The news carousel has merited significant criticism. For instance a given carousel has a selection of items on a trending topic (e.g. Trump), and swiping within one of the articles brings you to the next in the carousel. As a user, this is fantastic. As a publisher, it’s an attack on non-consumption, easily giving users a trustworthy, speedy mechanism for consuming their content (and seeing their ads and their branding, etc).

Other criticism is more subtle. For instance all AMP pages load a script at https://cdn.ampproject.org/v0.js, which of course is served up by a surrogate of Google. This script has private caching and is mandated by the spec, and is utilized for metrics/tracking purposes. Ideally this script would be long cached, but currently it is held for just 50 minutes. If a browser were leveraging AMP it could theoretically keep a local copy for all AMP content, disregarding the caching rules.
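
Verifying that caching claim takes a single request; a minimal sketch that asks the CDN directly (whatever lifetime it reports now may differ from the roughly 50 minutes observed here):

package main

import (
   "fmt"
   "net/http"
)

func main() {
   // The AMP runtime that every AMP page must load.
   resp, err := http.Head("https://cdn.ampproject.org/v0.js")
   if err != nil {
      panic(err)
   }
   defer resp.Body.Close()

   // A short max-age means every AMP page periodically re-validates
   // the runtime against Google's CDN.
   fmt.Println("Cache-Control:", resp.Header.Get("Cache-Control"))
   fmt.Println("Age:", resp.Header.Get("Age"))
}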

And most criticisms are just entirely baseless. Claims that it renders content homogeneous and brand-less, for instance, when each site can drop in a header with a link to their site, just as they always could. The Register, for instance, does so in the initially linked piece, with a logo and link to the homepage. And then there’s simple user confusion, like the blogger who claimed that Google was “stealing” his traffic after he enabled AMP and discovered that yes, AMP implies allowing caching.

Be Charitable

The root of most toxicity on online conversation boards is a lack of charity: assuming that everyone who disagrees or does something different is an idiot, is malicious, has ill intent, or is part of a conspiracy. I could broaden that out and say that the root of most toxicity throughout humanity comes from the same source. If people realized that others just made mistakes when they made a dumb move on the roadway — the foibles of humanity — instead of taking it as a personal affront that must be righted, we’d all have less stressful lives.

This applies to what businesses do as well. We can watch moves like AMP and see only the negatives, and only malicious, ad-serving intentions, or we can see possible positive motives that could potentially benefit the web. Google has developed AMP openly and clearly, and has been responsive to criticism, and the current result is something that many users, I suspect, strongly benefit from.

I’d take this even further and say that the model should be carried to a “HTML Lite” that rolls back the enormous flexibility of HTML5 to a content-serving subset, much like AMP but geared for the desktop or other rich clients. If we could browse in HTML Lite on the majority of sites, enabling richness only for those few that make a credible case, it would be fantastic for the web at large.

Micro-benchmarks as the Canary in the Coal Mine

I frequent a number of programming social news style sites as a morning ritual: You don’t have to chase every trend, but being aware of happenings in the industry, learning from other people’s discoveries and adventures, is a useful exercise.

A recurring source of content is micro-benchmarks of some easily understood sliver of our problem space, the canonical example being trivial web implementations in one’s platform of choice.

A Hello World for HTTP.

package main

import (
   "fmt"
   "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
   fmt.Fprintf(w, "Hello world!")
}

func main() {
   http.HandleFunc("/", handler)
   http.ListenAndServe(":8080", nil)
}

Incontestable proof of the universal superiority of whatever language is being pushed. Massive numbers of meaningless requests served by a single virtual server.

As an aside that I should probably add as a footnote, I still strongly recommend that static and cached content be served from a dedicated platform like nginx (use lightweight unix sockets to the back end if on the same machine), itself very likely fronted by a CDN. This sort of trivial stuff should never be in your own code, nor should it be a primary focus of optimization.
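
As a sketch of that arrangement, a Go backend sitting behind nginx on the same machine might listen on a unix socket rather than a TCP port; the socket path here is arbitrary and would need to match the nginx proxy_pass directive.

package main

import (
   "fmt"
   "net"
   "net/http"
   "os"
)

func main() {
   const sock = "/tmp/app.sock" // arbitrary path; nginx proxies to it, e.g. proxy_pass http://unix:/tmp/app.sock:/;

   os.Remove(sock) // clear a stale socket left by a previous run
   ln, err := net.Listen("unix", sock)
   if err != nil {
      panic(err)
   }

   http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
      fmt.Fprintf(w, "Hello from behind nginx!")
   })

   // nginx terminates client connections and serves static/cached content;
   // only dynamic requests reach this process over the socket.
   http.Serve(ln, nil)
}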

Occasionally the discussion will move to a slightly higher level and there’ll be impassioned debates about HTTP routers (differentiating URLs, pulling parameters, etc, then calling the relevant service logic), everyone optimizing the edges. There are thousands of HTTP routers on virtually every platform, most distinguished by tiny performance differences.
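
For anyone who hasn’t written one, the whole job being optimized is roughly this (a deliberately naive toy, not how any particular library does it):

package main

import (
   "fmt"
   "net/http"
   "strings"
)

// route differentiates URLs, pulls out a parameter, and dispatches to the
// relevant handler; real routers differ mostly in how fast and how flexibly
// they do exactly this.
func route(w http.ResponseWriter, r *http.Request) {
   parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
   switch {
   case len(parts) == 2 && parts[0] == "users": // /users/{id}
      userHandler(w, r, parts[1])
   case len(parts) == 1 && parts[0] == "health":
      fmt.Fprintln(w, "ok")
   default:
      http.NotFound(w, r)
   }
}

func userHandler(w http.ResponseWriter, r *http.Request, id string) {
   fmt.Fprintf(w, "user %s\n", id)
}

func main() {
   http.ListenAndServe(":8080", http.HandlerFunc(route))
}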

People once cut their teeth by making their own compiler or OS, but now everyone seems to start by making an HTTP router. Focus moves elsewhere.

In a recent discussion of one such micro-benchmark (used to promote a pre-alpha platform), a user said, in regard to Go (one of the lesser alternatives compared against)-

“it’s just that the std lib is coded with total disregard for performance concerns, the http server is slow, regex implementation is a joke”

Total disregard. A joke. Slow.

On a decently capable server, that critiqued Go implementation, if you’re testing it in isolation and don’t care about doing anything actually useful, could serve more requests than are seen by the vast majority of sites on these fair tubes of ours. With an order of magnitude or two to spare.

Hundreds of thousands of requests per second is simply enormous. It wasn’t that long ago that we were amazed at 100 requests per second for completely static content cached in memory. Just a few short years ago most frameworks tapped out at barely double-digit requests per second (twas the era of synchronous IO and blocking a thread for every request).

As a fun fact, a recent implementation I spearheaded attained four million fully robust web service financial transactions per second. This was on a seriously high-end server, and used a wide range of optimizations such as a zero-copy network interface and secure memory sharing between service layers, and ultimately was just grossly overbuilt unless conquering new worlds, but it helped a sales pitch.

Things improve. Standards and expectations improve. That really was a poor state of affairs, and not only were users given a slow, poor experience, it often required farms of servers for even modest traffic needs.

Choosing a high performance foundation is good. The common notion that you can just fix the poor performance parts after the fact seldom holds true.

Nonetheless, the whole venture made me curious what sort of correlation trivial micro-benchmarks hold to actual real-world needs. Clearly printing a string to a TCP connection is an absolutely minuscule part of any real-world solution, and once you’ve layered in authentication and authorization and models and abstractions and back-end microservices and ORMs and databases, it becomes a rounding error.

But does it indicate choices behind the scenes, or a fanatical pursuit of performance, that pays off elsewhere?

It’s tough to gauge because there is no universal web platform benchmark. There is no TPC for web applications.

The best we have, really, are the TechEmpower benchmarks. These are a set of relatively simple benchmarks that vary from absurdly trivial to mostly trivial-

  • Return a simple string (plaintext)
  • Serialize an object (containing a single string) into a JSON string and return it (json)
  • Query a value from a database, and serialize it (an id and a string) into a JSON string and return it (single query)
  • Query multiple values from a database and serialize them (multiple queries)
  • Query values from a database, add an additional value, and serialize them (fortunes)
  • Load rows into objects, update the objects, save the changes back to the database, serialize to json (data updates)

It is hardly a real-world implementation of the stacks of dependencies and efficiency barriers in an application, but some of the tests are worlds better than the trivial micro-benchmarks that dot the land. It also gives developers a visible performance reward, just as SunSpider led to enormous JavaScript performance improvements.
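
For a sense of scale, the json test barely goes beyond the hello-world above; a Go version looks roughly like this (the official TechEmpower implementations differ in details):

package main

import (
   "encoding/json"
   "net/http"
)

type message struct {
   Message string `json:"message"`
}

// Roughly the TechEmpower "json" test: serialize a one-field object and
// return it with the appropriate content type.
func jsonHandler(w http.ResponseWriter, r *http.Request) {
   w.Header().Set("Content-Type", "application/json")
   json.NewEncoder(w).Encode(message{Message: "Hello, World!"})
}

func main() {
   http.HandleFunc("/json", jsonHandler)
   http.ListenAndServe(":8080", nil)
}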

So here’s the performance profile of a variety of frameworks/platforms against the postgres db on their physical test platform, each clustered in a sequence of plaintext (blue), JSON (red), Fortune (yellow), Single Query (green), and Multiple Query (brown) results. The vertical axis has been capped at 1,000,000 requests per second to preserve detail, and only frameworks having results for all of the categories are included.

When I originally decided that I’d author this piece, my intention was to show that you shouldn’t trust micro-benchmarks because they seldom have a correlation with the more significant tasks that you’ll face in real life. While I’ve long argued that such optimizations often indicate a team that cares about performance holistically, in the web world it has often been the case that products that shine at very specific things are very weak in more realistic use.

But in this case my core assumption was only partly right. The correlation between the trivial micro-benchmark speed — simply returning a string — and the more significant tasks that I was sure would be drowned out by underlying processing (when you’re doing queries at a rate of 1000 per second, an overhead of 0.000001s is hardly relevant), is much higher than I expected (a sketch of how to compute these correlations follows the list).

  • 0.75 – Correlation between JSON and plaintext performance
  • 0.58 – Correlation between Fortune and plaintext performance
  • 0.646 – Correlation between Single query and plaintext performance
  • 0.21371 – Correlation between Multiple query and plaintext performance
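
These are straightforward to reproduce with a plain Pearson correlation over the per-framework results; a minimal sketch follows (the inline numbers are illustrative, not the actual benchmark data):

package main

import (
   "fmt"
   "math"
)

// pearson returns the Pearson correlation coefficient of two equal-length series.
func pearson(x, y []float64) float64 {
   n := float64(len(x))
   var sx, sy, sxx, syy, sxy float64
   for i := range x {
      sx += x[i]
      sy += y[i]
      sxx += x[i] * x[i]
      syy += y[i] * y[i]
      sxy += x[i] * y[i]
   }
   return (n*sxy - sx*sy) / (math.Sqrt(n*sxx-sx*sx) * math.Sqrt(n*syy-sy*sy))
}

func main() {
   // Illustrative requests-per-second figures only.
   plaintext := []float64{420000, 310000, 150000, 90000, 30000}
   jsonTest := []float64{350000, 260000, 140000, 70000, 35000}
   fmt.Printf("correlation: %.2f\n", pearson(plaintext, jsonTest))
}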

As more happens in the background, outside of the control of the framework, invariably the raw performance advantage is lost, but my core assumption was that there would be a much smaller correlation.

So in the end this is simply a “well, that’s interesting” post. It certainly isn’t a recommendation for one framework or another — developer aptitude and suitability for task reign supreme — but I found it interesting.