MySQL Cluster Edition — Web Scale

I’ve been casually building this site out for two reasons:

  1. Because I really believe in the core idea and the value it can bring to users. I actually think it will make the world a slightly better place.
  2. As a technical activity to exercise my knowledge of various tools and techniques I wouldn’t otherwise deal with enough, professionally, to maintain competency.

If it weren’t for #2, the site would be a co-located IIS 7.5-hosted web farm running a C# ASP.NET presentation layer against a WCF C# service layer atop SQL Server instances using flash storage on a high-reliability failover cluster.

I could easily architect such a site for multi-site redundancy, with performance to handle even the most exaggerated projections of use.

But that would be the easy path. The easy path isn’t very interesting.

So instead I’ve gone the AWS multi-availability-zone route: nginx and node.js (the former for static content and as a reverse proxy, the latter serving separate instance roles as both a service layer and a presentation layer) on Ubuntu 12.04 EC2 instances over EBS, hooked together with heavily secured IPsec tunnels.
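The nginx side of that layout can be sketched in a minimal config fragment (hostnames, paths, and ports here are placeholders for illustration, not my actual configuration):

```nginx
server {
    listen 80;
    server_name example.com;

    # Static content served straight from disk by nginx
    location /static/ {
        root /var/www/site;
        expires 7d;
    }

    # Everything else reverse-proxied to a local node.js instance
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

The same pattern repeats per instance role, with the upstream address pointing at whichever node.js layer that box fronts.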

Powerful. High performance. Incredibly scalable. Cheap.

The data layer, however, has been a very difficult choice. As detailed in a prior post, one of the continuing surprises of many NoSQL solutions is a profound lack of basic performance: while they scale out (some with much more difficulty than others), the cost per transaction is sometimes unreasonable, even if the core product itself comes gratis. It is absurd to read of relatively low-volume sites with token income running across dozens of servers.

No thanks.

The one solution that has come out of nowhere to become my primary choice (making me sure that I must be missing some profound failing) is MySQL Cluster Edition: shared-nothing horizontal scalability, redundancy and reliability, and performance and functionality with few compromises.

I trialed the memcached interface and, while it requires the dev branch of memcached 1.6, the performance is just incredible: 30,000 retrievals per second on a small virtual instance (6x better than I got out of Riak, bizarrely with half the CPU usage). It should be noted that this is a write-back implementation, meaning that for appropriate uses the memcached instance serves not only as a memory cache and simple API, but also as a hyper-speed interface to the actual source data itself.

They really seem to have gotten it right. It is simply a much more elegant implementation than the classic “check the cache; if it isn’t there, load it from the database and then push it to the cache.” In the MySQL CE + memcached world, you check the “cache,” which pulls from the database on a miss, and when you push a value to the cache it is written directly back to the database. And you still retain the full capacity to run distributed SQL on the cluster against that “NoSQL”-served data. Using NDB there are, of course, extra costs in querying and communicating data between nodes, but recent iterations have added data-node intelligence to perform basic SQL tasks at the source, dramatically improving general query performance.
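The contrast between the two patterns can be sketched in a few lines of Python. This is purely illustrative, with in-memory dicts standing in for memcached and the cluster’s data nodes; it is not the NDB API, just the shape of the two access patterns:

```python
# Classic cache-aside: the application juggles two stores by hand.
cache = {}                          # stand-in for a memcached instance
database = {"user:42": "Alice"}     # stand-in for the SQL database

def cache_aside_get(key):
    if key in cache:                # 1. check the cache
        return cache[key]
    value = database[key]           # 2. on a miss, load from the database
    cache[key] = value              # 3. push the value into the cache
    return value

def cache_aside_set(key, value):
    database[key] = value           # write to the database...
    cache.pop(key, None)            # ...and manually invalidate the cache

# MySQL CE + memcached style: one interface, backed by the data nodes.
class ClusterCache:
    """A get/set facade where the 'cache' is itself the path to the
    source data, so reads and writes go straight through."""
    def __init__(self, store):
        self.store = store          # the data nodes; here, just a dict

    def get(self, key):
        return self.store[key]      # a miss is served from the data itself

    def set(self, key, value):
        self.store[key] = value     # a write lands directly in the database
```

In the first pattern, stale reads and invalidation bugs are the application’s problem; in the second, there is nothing to keep in sync because the “cache” and the store share one source of truth.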

Just brilliant. I love everything about it. It needed the prodding of the various NoSQL upstarts to make it progress to where it is, but the result is superlative.
