Eat Your Brotli / Revisiting Why You Should Use Nginx In Your Solutions

Google recently deployed brotli lossless transport compression in the Canary and Dev channels of Chrome. This is the compression algorithm that they introduced late last year, hyping its advantages over competing algorithms.

If your Chrome variant is equipped, you can enable it via (in the address bar)-

chrome://flags/#enable-brotli

It is limited to HTTPS-only currently, presumably to avoid causing issues with poorly built proxies.

Brotli is already included in the stable releases of Chrome and Firefox, albeit only to support the new, more compressible WOFF 2.0 web font standard. The dev channel updates just extend the use a bit, allowing the browser to declare a new Accept-Encoding option, br (it was originally “bro”, but this was changed for obvious reasons). Google has also authored support for servers to serve up brotli compressed data in the form of an nginx module (itself a very lightweight wrapper around the brotli library. Nginx really is a study in elegant design).
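To make the negotiation concrete, here is a minimal sketch of how a server might choose a content coding from the request's Accept-Encoding header. The function names and the server preference order (br first, then gzip) are my own assumptions for illustration, not anything taken from Chrome or the nginx module.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding header into a {coding: q-value} dict."""
    codings = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        q = 1.0  # a coding listed without a q-value is fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def choose_encoding(header, supported=("br", "gzip")):
    """Return the first server-preferred coding the client accepts, else 'identity'."""
    accepted = parse_accept_encoding(header)
    wildcard_q = accepted.get("*", 0.0)
    for coding in supported:
        if accepted.get(coding, wildcard_q) > 0:
            return coding
    return "identity"


if __name__ == "__main__":
    print(choose_encoding("gzip, deflate, sdch, br"))  # a brotli-capable client
    print(choose_encoding("gzip, deflate"))            # an older client
```

A browser that never sends br simply keeps getting gzip, which is the "no ill effect" property of this kind of opt-in extension.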

One of the great things about these on-demand extensible web standards is that they enable incremental progress without disruption — you aren’t cutting anyone out by supporting them (browsers that don’t support this can remain oblivious, with no ill effect), but you can enhance the experience for users on capable devices. This is true for both HTTP/2 and brotli.

Overhyped Incremental Improvements

Most of the articles about the new compression option are over the top-

“Google’s new algorithm will make Chrome run much faster” exclaims The Verge. “Google Chrome Is Getting a Big Speed Boost” declares Time.

Brotli will not reduce the size of the images. It will not reduce the size of the auto-play video. It can reduce the size of the various text-type resources (HTML, JavaScript, CSS), however the improvement over the already widely entrenched deflate/gzip is maybe 20-30%. Unless your connection is incredibly slow, in most cases the difference will likely be imperceptible. It will help with data caps, but once again it’s unlikely that the text-based content is really what’s ballooning usage, and instead it’s the meaty videos and images and animated GIFs that eat up the bulk of your transfer allocation.

Other articles have opined that it’ll save precious mobile device space, but again brotli is for transport compression. Every single browser that I’m aware of caches files locally in file-native form (e.g. a PNG at rest stays compressed with deflate because that’s a format-specific internal static compression, just as most PDFs are internally compressed, but that HTML page or JavaScript file transport compressed with brotli or gzip or deflate is decompressed on the client and cached decompressed).

In the real world, it’s unlikely to make much difference at all to most users on reasonably fast connections, beyond those edge type tests where you make an unrealistically tiny sample fit in a single packet. But it is a small incremental improvement, and why not.

One “why not” might be if compression time is too onerous, and many results have found that the compression stage is much slower than existing options. I’ll touch on working around that later regarding nginx.

But still it’s kind of neat. New compression formats don’t come along that often, so brotli deserves a look.

Repetitions == Compressibility

Brotli starts with LZ77, which is a “find and reference repetitions” algorithm seen in every other mainstream compression algorithm.

LZ77 implementations work by looking some window (usually 32KB) back in the file to see if any runs of data have repeated, and, if they have, replacing the repetitions with much smaller references to the earlier data. Brotli is a bit different in that every implementation lugs along a 119KB static dictionary of phrases that Google presumably found were most common across the world of text-based compressible documents. So when it scans a document for compression, it not only looks for duplicates in its look-back window, it also uses the static dictionary as a source of matches. They enhanced this a bit by adding 121 “transforms” on each of those dictionary entries (which in the code looks incredibly hack-ish. Things like checking for matches with dictionary words and the suffix “ and”, for instance, or for capitalization variations of the dictionary words).
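A toy version of that matching step might look like the sketch below. It is emphatically not brotli's actual algorithm or bitstream: the three-entry STATIC_DICT, the MIN_MATCH threshold, and the token tuples are all hypothetical, and the brute-force search is only there to show the idea of "look back in the window, and also look in a built-in dictionary".

```python
WINDOW = 32 * 1024  # the "usually 32KB" look-back distance mentioned above
MIN_MATCH = 4       # hypothetical threshold: shorter matches stay literals

# Hypothetical stand-in for the built-in dictionary (the real one is ~119KB).
STATIC_DICT = [b"the Netherlands", b"background:url(", b"Holy Roman Emperor"]


def longest_window_match(data, pos):
    """Return (distance, length) of the longest earlier occurrence, or (0, 0)."""
    best_dist, best_len = 0, 0
    for cand in range(max(0, pos - WINDOW), pos):
        length = 0
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_dist, best_len = pos - cand, length
    return best_dist, best_len


def longest_dict_match(data, pos):
    """Return (entry_index, length) of the longest dictionary entry at pos, or (-1, 0)."""
    best_idx, best_len = -1, 0
    for idx, word in enumerate(STATIC_DICT):
        if len(word) > best_len and data.startswith(word, pos):
            best_idx, best_len = idx, len(word)
    return best_idx, best_len


def tokenize(data):
    """Greedily turn data into literal, window-copy, and dictionary tokens."""
    tokens, pos = [], 0
    while pos < len(data):
        dist, wlen = longest_window_match(data, pos)
        idx, dlen = longest_dict_match(data, pos)
        if wlen >= MIN_MATCH and wlen >= dlen:
            tokens.append(("copy", dist, wlen))
            pos += wlen
        elif dlen >= MIN_MATCH:
            tokens.append(("dict", idx, dlen))
            pos += dlen
        else:
            tokens.append(("lit", data[pos:pos + 1]))
            pos += 1
    return tokens


def detokenize(tokens):
    """Rebuild the original bytes from the token stream."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        elif tok[0] == "dict":
            out += STATIC_DICT[tok[1]]
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte so overlapping copies work
    return bytes(out)
```

Feeding it b"the Netherlands and the Netherlands" gives a dictionary token for the first phrase (nothing to look back at yet), a handful of literals, and then a window copy for the repeat.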

As a quick detour, Google has for several years heavily used another compression algorithm – Shared Dictionary Compression for HTTP. SDCH is actually very similar to Brotli, however instead of having a 119KB universal static dictionary, SDCH allows every site to define their own, domain-specific dictionary (or dictionaries), and then uses that as the reference dictionary. For instance a financial site might have a reference dictionary loaded with financial terminology, disclaimers, clauses, etc.
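You can get a feel for why a shared dictionary helps using nothing more than zlib's preset dictionary support. To be clear, this is an analogy and not SDCH itself, which uses a different encoding on the wire. SITE_DICT is a made-up example of what a financial site's dictionary might contain, and both sides have to hold the same dictionary for decompression to work.

```python
import zlib

# Hypothetical domain-specific dictionary for a financial site.
SITE_DICT = (
    b"Past performance is not indicative of future results. "
    b"quarterly earnings dividend yield"
)


def compress_with_dict(payload, zdict):
    """Deflate payload, allowing back-references into the shared dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(payload) + comp.flush()


def decompress_with_dict(blob, zdict):
    """Inflate a blob produced by compress_with_dict using the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    payload = b"Past performance is not indicative of future results."
    plain = zlib.compress(payload, 9)
    shared = compress_with_dict(payload, SITE_DICT)
    assert decompress_with_dict(shared, SITE_DICT) == payload
    print(len(payload), len(plain), len(shared))
```

A short disclaimer has almost no internal repetition, so plain deflate can't do much with it, while the dictionary version collapses it to a reference. That is the same effect brotli gets from its universal dictionary on small documents.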

However SDCH requires some engineering work and saw extremely little uptake outside of Google. The only other major user is LinkedIn.

So Brotli is like SDCH without the confusion (or flexibility) of server-side dictionary generation.

The Brotli dictionary makes for a fascinating read. Remember that this is a dictionary that is the basis for potentially trillions of data exchanges, and that sits at rest on billions of devices.

Here are a couple of examples of phrases that Brotli can handle exceptionally well-

the Netherlands
the most common
background:url(
argued that the
scrolling="no"
included in the
North American
the name of the
interpretations
the traditional
development of
frequently used
a collection of
Holy Roman Emperor
almost exclusively
" border="0" alt="
Secretary of State
culminating in the
CIA World Factbook
the most important
anniversary of the
style="background-
<li><em><a href="/
the Atlantic Ocean
strictly speaking,
shortly before the
different types of
the Ottoman Empire
under the influence
contribution to the
Official website of
headquarters of the
centered around the
implications of the
have been developed
Federal Republic of

Thousands of basic words across a variety of languages, and then collections of words and phrases such as the examples above, comprise the Brotli standard dictionary. With the transforms previously mentioned, it supports any of these in variations such as pluralization, varied capitalization, suffixed with words like “ and” or “ for”, and a variety of punctuation variations.
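Here is a rough sketch of how transforms multiply the reach of each dictionary entry. The six transforms below are an illustrative handful I picked to mirror the kinds described above; they are not the real list of 121, and the function name and return shape are hypothetical.

```python
# Illustrative subset only; the real transform list is longer and differs in detail.
TRANSFORMS = [
    ("identity", lambda w: w),
    ("uppercase_first", lambda w: w[:1].upper() + w[1:]),
    ("uppercase_all", lambda w: w.upper()),
    ("suffix_and", lambda w: w + " and"),
    ("suffix_for", lambda w: w + " for"),
    ("suffix_comma", lambda w: w + ", "),
]


def match_with_transforms(text, pos, dictionary):
    """Return (entry_index, transform_name, length) for the longest match, or None."""
    best = None
    for idx, word in enumerate(dictionary):
        for name, transform in TRANSFORMS:
            candidate = transform(word)
            if text.startswith(candidate, pos):
                if best is None or len(candidate) > best[2]:
                    best = (idx, name, len(candidate))
    return best


if __name__ == "__main__":
    entries = ["the Ottoman Empire", "different types of"]
    print(match_with_transforms("The Ottoman Empire fell", 0, entries))
    print(match_with_transforms("the Ottoman Empire and more", 0, entries))
```

One stored phrase ends up covering its capitalized and suffixed forms too, so an encoder only has to name an entry and a transform rather than spell out the variant.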

So if you’re talking about the Federal Republic of the Holy Roman Emperor against the Ottoman Empire, Brotli has your back.

For really curious readers, I’ve made the dictionary available in 7z-compressed (fun fact – 7z uses LZMA) text file format if you don’t want to extract it from the source directly.

Should You Use It? And Why I Love Nginx

One of the most visited prior entries on here is Ten Reasons You Should Still Use Nginx from two+ years ago. In that I exclaim how I love having nginx sitting in front of solutions because it offers a tremendous amount of flexibility, at very little cost or risk: It is incredibly unlikely that the nginx layer, even if acting as a reverse proxy across a heterogeneous solution that might be built in a mix of technologies (old and new), is a speed, reliability or deployment weakness, and generally it will be the most robust, efficient part of your solution.

The nginx source code is a joy to work with as well, and the Google nginx module (a tiny wrapper around Mozilla’s brotli library, itself a wrapper around the Google brotli project) is a great example of the elegance of extending nginx.

In any case, another great benefit of nginx is that it often gains support for newer technologies very rapidly, in a manner that can be deployed on almost anything with ease (e.g. IIS from Microsoft is a superb web server, but if you aren’t ready to upgrade to Windows Server 2016 across your stack, you aren’t getting HTTP/2. The coupling of web servers with OS versions isn’t reasonable).

Right now this server that you’re hitting is running HTTP/2 for users who support it (which happens to be most), improving speeds while actually reducing server load. This server also supports brotli because…well it’s my plaything so why not. It supports a plethora of fun and occasionally experimental things.

Dynamic brotli compression probably isn’t a win, though. As Cloudflare found, the extra compression time required for brotli nullifies the reduced transfer times in many situations — if the server churns for 30ms that could have been transfer milliseconds, it’s a wash. Not to mention that under significant load it can seriously impair operations.
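A back-of-the-envelope sketch shows where the break-even sits. Apart from the 30ms figure above, every number here is an assumption I chose for illustration (a 100KB resource, gzip at 25% of original size in 2ms, brotli at 20% of original size), not a measurement.

```python
def total_time_ms(size_bytes, ratio, compress_ms, bandwidth_mbps):
    """Time to compress on the fly and then transfer the compressed body."""
    compressed_bits = size_bytes * ratio * 8
    bits_per_ms = bandwidth_mbps * 1000  # 1 Mbps is 1000 bits per millisecond
    return compress_ms + compressed_bits / bits_per_ms


if __name__ == "__main__":
    size = 100_000  # assumed 100KB text resource
    for mbps in (10, 1):
        gzip_ms = total_time_ms(size, 0.25, 2, mbps)     # assumed gzip cost
        brotli_ms = total_time_ms(size, 0.20, 30, mbps)  # assumed brotli cost
        print(f"{mbps} Mbps: gzip {gzip_ms:.0f} ms, brotli {brotli_ms:.0f} ms")
```

Under these assumptions gzip comes out ahead at 10 Mbps (roughly 22ms against 46ms) and brotli only pulls ahead on the 1 Mbps link (roughly 190ms against 202ms), which fits the "it's a wash" conclusion for most connections.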

Where brotli makes a tonne of sense, however, and this holds for deflate/gzip as well, is when static resources are precompressed in advance on the server, often with the most aggressive compression possible. At rest the JavaScript file might sit in native, gzip, and brotli forms, with the server streaming whichever one suits the client’s capabilities. Nginx of course supports this for gzip, and the Google brotli module fully supports this static-variation option as well. No additional computation on the server at all, the bytes start being delivered instantly, and if anything it reduces server IO. Just about every browser supports gzip at a minimum, so this static-at-rest-compressed strategy is a no-brainer. The limited downsides are the redundant storage of a file in multiple forms and the administrative work of ensuring that when you update these files you update the variations as well.
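As a sketch of the at-rest strategy, the snippet below precompresses a file with gzip at the maximum level and then picks which stored variant to send for a given Accept-Encoding header. It only produces the .gz variant because gzip ships with Python; a .br sibling is assumed to come from a separate brotli tool. The function names and the simplified header handling (q-values are ignored) are my own, and this is not how nginx implements it internally.

```python
import gzip
import os
import shutil


def precompress_gzip(path):
    """Write path + '.gz' at maximum gzip level and return the new file's path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def pick_variant(path, accept_encoding):
    """Return (file_to_send, content_encoding); encoding is None for the raw file."""
    # Simplification: q-values are ignored, so "br;q=0" would still count as accepted.
    accepted = {part.split(";")[0].strip().lower() for part in accept_encoding.split(",")}
    for coding, suffix in (("br", ".br"), ("gzip", ".gz")):
        if coding in accepted and os.path.exists(path + suffix):
            return path + suffix, coding
    return path, None
```

The compression cost is paid once at deploy time, so the request path is just a file existence check and a send. The catch is the one already mentioned: rerun the precompression step whenever the source file changes, or clients will be served stale variants.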

Win win win. Whether brotli or gzip, a better experience for everyone.