Premature Scaling

Those who’ve read my missives over the years know that I’ve attempted initiatives with a variety of “NoSQL” technologies.

Voldemort, Membase(d), Cassandra, MongoDB, and most recently Riak.

While every prior attempt has ended in maddening failure — products grossly oversold and underdelivering — I am working really hard at getting Riak to deliver.

Riak is yielding single-node read performance of roughly 5,000 to 6,000 small key retrievals per second on a well-resourced virtual machine, completely saturating the CPU. The example case is user authentication: given a username, retrieve the user’s credentials. The benchmark is admittedly naive (20 workers using the protocol buffers interface), but it is exactly the way I will use the product, so it suits my purposes and is entirely relevant.
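To make that kind of measurement concrete, here is a minimal sketch of a naive read benchmark in the same shape: a fixed number of worker threads (20 by default) issue key lookups for a fixed duration, and the aggregate lookups per second are reported. The `lookup` function and the in-memory `USERS` dict are hypothetical stand-ins; in a real run `lookup` would wrap a Riak protocol buffers client call or a SQL query, neither of which is shown here.

```python
import threading
import time

# Hypothetical in-memory stand-in for the store under test: username -> credentials.
USERS = {f"user{i}": {"username": f"user{i}", "pw_hash": f"hash{i}"} for i in range(10_000)}


def lookup(username):
    """Hypothetical key lookup: given a username, return its credentials (or None).

    In a real benchmark this would wrap a network client call instead.
    """
    return USERS.get(username)


def run_benchmark(lookup_fn, keys, workers=20, duration_s=1.0):
    """Run `workers` threads that call `lookup_fn` repeatedly for `duration_s` seconds.

    Returns the aggregate throughput in lookups per second.
    """
    counts = [0] * workers  # each thread writes only its own slot

    def worker(idx):
        n = 0
        i = idx  # stagger starting keys so threads don't hit the same key in lockstep
        while time.perf_counter() < deadline:
            lookup_fn(keys[i % len(keys)])
            i += workers
            n += 1
        counts[idx] = n

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    start = time.perf_counter()
    deadline = start + duration_s  # read by the worker closure once threads start
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return sum(counts) / elapsed


if __name__ == "__main__":
    ops = run_benchmark(lookup, list(USERS), workers=20, duration_s=1.0)
    print(f"{ops:,.0f} lookups/s")
```

Because of CPython’s global interpreter lock, the threads help only when `lookup` blocks on network I/O, as a real client call would; against the in-memory stand-in the number mostly reflects interpreter overhead, not any database.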

SQL Server, given the same resources, delivers ~15,000 key retrievals per second at relatively low CPU usage, with TDS and execution-plan costs forming the critical path that limits the top end. That is while carrying all of the overhead, functionality, and flexibility of a full RDBMS, including secured objects and sets, implicit transactions, and so on.

I am not seriously proposing SQL Server as the alternative. I simply note that at a task a full-scale RDBMS is not at all targeted at (simple KV lookups), it is more efficient, which will likely surprise many.

Riak is a very impressive project. Its Dynamo-style scale-out potential is incredible, and so is its reliability potential. Alas, I don’t want to be forced into premature or excessive scale-outs, which lead to the classic problem: functionality is delayed or simplified because the computing resources grow too large to be cost-effective or manageable.

I want to do the most with the least. I’d rather have a reliable setup with 2 machines than one with 6, 12, or 24.

When so many computing resources go toward doing so little (KV lookups, something decent libraries do at around 5 million ops/s on a single core in a managed runtime), I have to question the efficiency of the implementation. While I appreciate Riak’s remarkable many-node capacity and its built-in replication and conflict resolution, they shouldn’t add so much overhead to such simple lookups.

I am in love with Riak the theory. It is a simply gorgeous architecture.

But I can’t ignore the cost and inefficiency that it, perhaps, represents. Needing three or more times the machines, VM or not, to serve the same load is not reasonable.
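The machine-count argument comes down to simple arithmetic: nodes needed = ceil(target load / per-node throughput). A minimal sketch using the single-node figures quoted in this post, with a hypothetical target of 30,000 lookups per second (not a figure from the post). The `headroom` parameter is an assumed planning knob for not running each node at its saturated peak, and replication is deliberately left out.

```python
import math


def nodes_needed(target_ops_per_s, per_node_ops_per_s, headroom=1.0):
    """Return how many nodes are needed to serve `target_ops_per_s`.

    `per_node_ops_per_s` is a measured single-node peak. `headroom` (0 < h <= 1)
    is a hypothetical safety factor: 0.7 means plan to run each node at no more
    than 70% of that peak. Replication and failover are ignored here.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return math.ceil(target_ops_per_s / (per_node_ops_per_s * headroom))


if __name__ == "__main__":
    target = 30_000  # hypothetical load, lookups per second
    for name, per_node in [("Riak (low)", 5_000), ("Riak (high)", 6_000), ("SQL Server", 15_000)]:
        print(name, nodes_needed(target, per_node), nodes_needed(target, per_node, headroom=0.7))
```

At that hypothetical load the arithmetic gives 5 to 6 Riak nodes against 2 for SQL Server, a 2.5x to 3x gap before any replication factor multiplies both sides.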

Riak is still the number one candidate, but first I have to eliminate MySQL Cluster Edition from contention. There are benchmarks showing it hitting a million+ KV lookups per second on a single instance. I cannot discount such a differential, even if it does have the maligned “SQL” in its name.
