Everything You Read About Databases Is Obsolete

Six and a half years ago I wrote (in a piece about NoSQL) –

Optimizing against slow seek times is an activity that is quickly going to be a negative return activity.

This time has long since passed, yet much of the industry's dogma remains the same as it was back when our storage tier consisted of 100 IO/second magnetic drives. Many of our solutions still have query engines absolutely crippled by this assumption (including pgsql; mssql has improved, but for years using it on fast storage was an exercise in waste, as it endlessly made the wrong assumptions about conserving IOs).

There are now TB+ consumer storage solutions with 300,000 IO/second (3000x the classic magnetic drive, while also offering sequential rates above 3.5GB/s…yes, big G) for under $600. There are enterprise solutions serving 10,000,000 IOPS.

That’s if your solution even needs to touch the storage tier. Memory is so inexpensive now, even on shared hosting like AWS, that all but the largest databases sit resident in memory much of the time. My smartphone has 3GB, and could competently host the hot area of 99.5%+ of operational databases in memory.

For self-hosted hardware, TBs of memory is now economical for a small business, while tens of GBs is operationally inexpensive on shared hosting.

I totally made up that 99.5% stat, but it’s amazing how relatively tiny the overwhelming bulk of databases I encounter in my professional life are, yet how much so many fret about them.

Obviously writes still have to go through to storage, yet when you abandon the self-defeating tactic of pre-optimizing to minimize read IO (eliminating denormalization, combined storage such as document-oriented designs, materializations and trigger-rendered precomputations, excessive indexes, and so on), in most cases writes narrow to a tiny trickle¹.

When writes shrink, not only does practical write performance increase (you’re writing much less per transaction, so the beneficial yield of each write increases), but the effectiveness of memory and tiered caches improves as the hot area shrinks, and very high performance storage options become far more attainable: it’s a lot more economical to buy a 60GB high-performance, reliable and redundant storage tier than a 6TB system, and as you scale up data volumes, performance is often sacrificed for raw capacity.

Backups shrink and systems become much more manageable. It’s easy to stream a replica across a low grade connection when the churn of change is minimized. It’s easy to keep backups validated and up to date when they’re GBs instead of TBs.

Normalize. Don’t be afraid of seeks. Avoid the common pre-optimizations that are, on the whole, destructive to virtually every dimension of a solution on modern hardware (write performance, long-term read performance, economics, maintainability, reliability). Validate assumptions.

Because almost everything written about databases, and consequently much of what you read, is perilously outdated. This post was inspired by seeing another database best-practices guideline make the rounds, most of its suggestions circling the very dated notion that every effort should be made to reduce IOs, the net result being an obsolete, overwrought solution out of the gate.

¹ – One of the most trying aspects of writing technical blog entries is people who counter with edge cases to justify positions: yes, Walmart has a lot of writes. So do Amazon, CERN and Google. The NY Taxi Commission logs loads of data, serving a metropolitan area of tens of millions.

There are many extremely large databases with very specialized needs. They don’t legitimize the choices you make, and they shouldn’t drive your technical needs.

But What Does That Practically Mean?

One of the principal advocacy points used to pitch many NoSQL solutions is the flexibility of the schema. Whether it’s an amorphous binary blob in a key-value database or a loosely structured JSON collection in a document-oriented database, the general benefit is that the structure is up to you, and it remains flexible for each and every database entry. Contrasted against traditional databases, where there is often significant up-front consideration of every table, column and relationship (choices that are very strictly enforced, reminding you of their presence each and every day), there is a compelling narrative that minus that up-front investment you’ve saved yourself time and effort.

But the selling point has never convinced me (just as it didn’t convince me when the same thought process was behind the advocacy of “database within a database” solutions, an sql_variant serving all needs).

How do you write an application with no data structure? How, practically, is the lack of a formal database schema a real benefit over the life cycle of an actual product, after the honeymoon period of “boy, did I save time not having to consider a schema!” has passed?

It generally isn’t a benefit. Instead the emergent schema gets tightly bound to the application layer, the database benefit reduced to facilitating less expensive data-model transitions (relative to the worst case of some relational models, with MySQL traditionally being a particularly bad implementation when you need to make schema changes on very large sets of data). The problem with this back-loaded model is that the database is often the longest-living part of the application: many modern applications run on a database descended from implementations done decades ago, while the GUI is rewritten wholesale, again and again, in completely different technology stacks. Yet you lose all of the benefits of constraints (strong typing, to use a general language term).
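To make that concrete, here is a minimal TypeScript sketch (the User shape, field names and checks are hypothetical, not taken from any particular product) of the validation code that tends to migrate into the application layer once the database no longer enforces structure:

```typescript
// Where the "schema-free" schema actually ends up living: in application code.

interface User {
  id: string;
  email: string;
  createdAt: Date;
}

// With no database-enforced schema, every consumer of the document store has
// to re-implement the checks a declared schema would have given for free.
function parseUser(doc: unknown): User {
  if (typeof doc !== "object" || doc === null) {
    throw new Error("not a document");
  }
  const d = doc as Record<string, unknown>;
  if (typeof d.id !== "string" || typeof d.email !== "string") {
    throw new Error("missing or mistyped required field");
  }
  const createdAt = new Date(String(d.createdAt));
  if (Number.isNaN(createdAt.getTime())) {
    throw new Error("invalid createdAt");
  }
  return { id: d.id, email: d.email, createdAt };
}
```

Every reader and writer of that collection has to agree on, and duplicate, these checks; a declared schema with constraints would have centralized them in the longest-lived part of the system.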

It is, more often than not, a foolish assumption that a temporary gain will yield long term benefits, when the opposite is generally true.

The database world is hardly alone in these sorts of superficially beneficial solutions. If you’ve worked on a development team for any length of time, you will have enjoyed the experience of having a participant who knows just enough to, essentially, be dangerous: the sort for whom vague, shapeless solutions cure all ills.

Mocks, Test-Driven Development, Agile, Waterfall, MVC, Magical Data Engine. It’s silver-bullet time!

It isn’t that these solutions are bad in isolation (they are fantastic at the right time, place and situation, just as schema-free NoSQL databases most certainly have a place), but the woefully ignorant application of them to the wrong problems renders them project sabotage: a distraction that is worse than a simple waste of time because it undermines meaningful efforts. This destroys projects and teams when every improvement faces the same barrage of inappropriate silver-bullet solutions.

In any case, it has been very odd seeing all of the coming-to-senses in the NoSQL space over the past year. I dare say that many of the posts I authored previously have been proven out (and it is worth noting that since then I’ve done more with NoSQL solutions than many of the people I debated with on those points).

MySQL Cluster Edition — Web Scale

I’ve been casually building out https://www.tewdew.com for two reasons:

  1. Because I really believe in the core idea and the value it can bring to users. I actually think that it will make the world a slightly better place
  2. As a technical activity to exercise my knowledge of various tools and techniques I wouldn’t deal with enough professionally to maintain competency

If it weren’t for #2, the site would be a co-located IIS 7.5-hosted web farm running a C# ASP.NET presentation layer against a WCF C# service layer atop SQL Server instances using flash-storage on a high-reliability failover cluster.

I could easily architect such a site for multi-site redundancy, with enough performance to host the most exaggerated probable usage.

But that would be the easy path. The easy path isn’t very interesting.

So instead I’ve gone the AWS multi-availability-zone route: nginx and node.js (the former for static content and as a reverse proxy, the latter in separate instance roles as both a service layer and a presentation layer) on Ubuntu 12.04 EC2 instances over EBS, hooked together with heavily secured IPSec tunnels.

Powerful. High performance. Incredibly scalable. Cheap.

The data layer, however, has been a very difficult choice. As detailed in a prior post, one of the continuing surprises of many NoSQL solutions is the profound lack of basic performance: while they scale out, some with much more difficulty than others, the cost per transaction is sometimes unreasonable, even if the core product itself comes gratis. It is absurd to read of relatively low-volume sites with token income running across dozens of servers.

No thanks.

The one solution that has come out of nowhere to become my primary choice (making me certain I must be missing some profound failing) is MySQL Cluster Edition: shared-nothing horizontal scalability, redundancy and reliability, and performance and functionality with few compromises.

I trialed the memcached interface and, while it requires the dev branch of 1.6, the performance is just incredible: 30,000 retrievals per second on a small virtual instance (6x better than I got out of Riak, bizarrely with half the CPU usage). That’s a write-back implementation, it should be noted, meaning that for the appropriate uses the memcached instance serves not only as a memory cache and simple API, but also as a hyper-speed interface to the actual source data itself.

They really seem to have gotten it right. It is simply a much more elegant implementation than the classic “check the cache, and if it isn’t there load it from the database and then push it to the cache”. In the MySQL CE + memcached world you check the “cache”, which pulls it from the database, and when you push it back to the cache it pushes back directly to the database. And you still have the full capacity to do distributed SQL on the cluster against that “NoSQL”-served data. Clearly, using NDB there are extra costs in querying and communicating data between nodes; however, recent iterations have added data-node intelligence to perform basic SQL tasks at the source, dramatically improving general query performance.
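As a rough illustration of the difference, here is a TypeScript sketch contrasting classic cache-aside with the direct style described above. The KVClient and Database interfaces are stand-ins for whatever client libraries are actually used, and the key naming is arbitrary:

```typescript
// Minimal sketch of the two access patterns. The interfaces are placeholders
// for a real memcached client and a real database driver; names are illustrative.

interface KVClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

interface Database {
  loadUser(id: string): Promise<string>;              // e.g. SELECT ... WHERE id = ?
  saveUser(id: string, value: string): Promise<void>;
}

// Classic cache-aside: the application keeps the cache and the database
// consistent, and every miss costs two round trips.
async function getUserCacheAside(cache: KVClient, db: Database, id: string): Promise<string> {
  const hit = await cache.get(`user:${id}`);
  if (hit !== null) return hit;
  const row = await db.loadUser(id);
  await cache.set(`user:${id}`, row);                  // best effort; can drift from the DB
  return row;
}

// NDB-memcached style: the "cache" endpoint is backed directly by the cluster,
// so a get reads the source data and a set writes to it. The application only
// sees the key/value API.
async function getUserDirect(ndbMemcached: KVClient, id: string): Promise<string | null> {
  return ndbMemcached.get(`user:${id}`);               // served from the memory-resident cluster data
}

async function saveUserDirect(ndbMemcached: KVClient, id: string, value: string): Promise<void> {
  await ndbMemcached.set(`user:${id}`, value);         // lands in the cluster tables themselves
}
```

The appeal is that the second pair of functions removes the entire class of cache-invalidation bugs that the first pair invites, while the same rows remain queryable with distributed SQL.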

Just brilliant. I love everything about it. It needed the prodding of the various NoSQL upstarts to make it progress to where it is, but the result is superlative.

Premature Scaling

Those who’ve read my missives over the years know that I’ve attempted initiatives with a variety of “NoSQL” technologies.

Voldemort, Membase(d), Cassandra, MongoDB, and most recently Riak.

While every prior attempt has ended in maddening failure — products grossly oversold and underdelivering — I am working really hard at getting Riak to deliver.

Riak is yielding single-node read performance of ~5,000-6,000 small key retrievals per second (an example case being user authentication: given a username, retrieve the user’s credentials) on a well-resourced virtual machine, completely saturating the CPU. While this is with a very naive benchmark (20 workers using the protocol buffers interface), it is exactly the sort of use that I will make of the product, so it perfectly suits my purposes and is entirely relevant.
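For anyone who wants to reproduce this sort of number against their own stack, the benchmark is nothing more sophisticated than the sketch below. fetchOne is a placeholder for whatever client call is under test (a Riak protocol buffers get, a memcached get, an SQL point lookup); the worker count, key space and duration are arbitrary:

```typescript
// Naive concurrent read benchmark: N workers hammer fetchOne until the
// deadline, then we report completed operations per second.

async function benchmark(
  fetchOne: (key: string) => Promise<unknown>,
  workers: number,
  secondsToRun: number,
): Promise<number> {
  const deadline = Date.now() + secondsToRun * 1000;
  let completed = 0;

  // Each worker issues requests back to back until the deadline passes.
  const worker = async (id: number) => {
    let i = 0;
    while (Date.now() < deadline) {
      await fetchOne(`user:${id}:${i++ % 10000}`); // cycle over a small hot key space
      completed++;
    }
  };

  await Promise.all(Array.from({ length: workers }, (_, id) => worker(id)));
  return completed / secondsToRun;
}

// Usage, with a dummy in-process map standing in for the real client:
const fake = new Map<string, string>([["user:0:0", "credentials"]]);
benchmark(async (k) => fake.get(k), 20, 5).then((ops) =>
  console.log(`${Math.round(ops)} ops/sec`),
);
```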

SQL Server with the same resources offers up ~15,000 key retrievals per second with relatively low CPU usage (TDS and execution-plan costs being the critical path that limits the top end). This is with all of the overhead, and all of the functionality and flexibility, that comes with a full RDBMS, including secured objects and sets, implicit transactions, and so on.

I am not seriously proposing SQL Server as the alternative, but simply note that for something a full-scale RDBMS is not at all targeted at (simple KV lookups), it does them more efficiently, which is likely a surprise to many.

Riak is a very impressive project. The Dynamo-style scale out potential is incredible. The reliability potential is fantastic. Alas, I don’t want to be forced into premature or excessive scale outs, leading to the classic issue that functionality is delayed or simplified because the computing resources grow too big to be cost effective or manageable.

I want to do the most with the least. I’d rather have a reliable setup with 2 machines rather than 6, 12, or 24.

When so many computing resources go to doing so little (KV lookups, something that decent libraries do in the 5 million ops/second range on a single core in a managed runtime), I have to question the efficiency of the implementation. While I appreciate the amazing many-node capacity of Riak and the built-in replication and conflict resolution, it shouldn’t add so much overhead on such simple lookups.
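That in-process figure is easy to sanity-check. The sketch below times plain Map lookups in Node; the absolute number depends entirely on hardware and runtime, but the order-of-magnitude gap against a networked KV request is the point:

```typescript
// Rough baseline for in-process hash lookups in a managed runtime.
const table = new Map<string, string>();
for (let i = 0; i < 100_000; i++) table.set(`user:${i}`, `credentials-${i}`);

const iterations = 5_000_000;
const start = process.hrtime.bigint();
let found = 0;
for (let i = 0; i < iterations; i++) {
  if (table.get(`user:${i % 100_000}`) !== undefined) found++;
}
const elapsedNs = Number(process.hrtime.bigint() - start);
console.log(`${found} hits, ~${Math.round(iterations / (elapsedNs / 1e9))} lookups/sec`);
```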

I am in love with Riak the theory. It is a simply gorgeous architecture.

But I can’t ignore the cost and inefficiency that it, perhaps, represents. Having three or more times the machines, VM or not, to service the same load is not reasonable.

Riak is still the number one candidate, but first I have to eliminate MySQL Cluster Edition from contention. There are benchmarks showing it hitting a million+ KV lookups per second on a single instance. I cannot discount such a differential, even if it does have the maligned “SQL” in its name.

Trouble with Securing Riak

Database systems like SQL Server and Oracle have long featured pervasive security.

You define the available logins, transports and encryption. You grant each login specific rights, then use ownership chaining to build a secure, only-what-is-absolutely-necessary data platform.

Security was never optional. It wasn’t bolted on later.

While this model was originally intended for client-server applications, where each user is authenticated directly by the database with all of the authorization that entails, in most modern architectures only service accounts hit the database. Even in that sort of service-oriented install, database authentication and authorization help you restrict each subsystem to only the actions its purpose demands.

The authentication service can only ever call the auth.PerformLogin stored procedure, for instance.

It’s a very good last defense. Security is maximized through layering.

The complete absence of security in many NoSQL solutions stands in stark contrast to those traditional solutions. Riak, for instance, simply has no security whatsoever. The security documentation is essentially just a placeholder.

The advice is that you should roll your own by layering security in other parts of the system. Install an Apache or nginx reverse proxy in front of it, for instance. Build a node.js application just to apply permission restrictions. Layer on some IPSec rules for good measure.
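As a sketch of the “node.js application just to apply permission restrictions” option, here is a tiny allow-list proxy in front of Riak’s HTTP interface. The bucket names, header names, shared secret, path layout and port are all assumptions for illustration, not a hardened design:

```typescript
// A minimal allow-list proxy: each calling service may only touch the buckets
// and HTTP methods it has been granted, everything else is rejected.
import * as http from "http";

const RIAK_HOST = "127.0.0.1";
const RIAK_PORT = 8098;                                  // assumed Riak HTTP port
const SHARED_SECRET = process.env.PROXY_SECRET ?? "change-me";

// Per-service allow list: which buckets each caller may touch, and how.
const allowed: Record<string, { buckets: string[]; methods: string[] }> = {
  "auth-service": { buckets: ["users"], methods: ["GET"] },
  "task-service": { buckets: ["tasks"], methods: ["GET", "PUT", "POST"] },
};

http.createServer((req, res) => {
  const caller = String(req.headers["x-service-name"] ?? "");
  const secret = String(req.headers["x-service-secret"] ?? "");
  const rule = allowed[caller];

  // Assumed key/value path layout: /buckets/<bucket>/keys/<key>
  const match = /^\/buckets\/([^/]+)\/keys\//.exec(req.url ?? "");
  const bucket = match ? match[1] : undefined;

  if (!rule || secret !== SHARED_SECRET || !bucket ||
      !rule.buckets.includes(bucket) || !rule.methods.includes(req.method ?? "")) {
    res.writeHead(403);
    res.end("forbidden");
    return;
  }

  // Forward the permitted request to Riak unchanged.
  const upstream = http.request(
    { host: RIAK_HOST, port: RIAK_PORT, method: req.method, path: req.url, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(res);
    },
  );
  req.pipe(upstream);
}).listen(8080);
```

Whatever the exact shape, the point stands: this is authorization logic the database could have provided, rebuilt as another moving part to deploy, secure and keep in sync with the cluster.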

I’m not faulting Riak; quite the opposite, my cost to commercially use what is a very impressive product is $0. Every other NoSQL system I’ve looked at has at most rudimentary security: a single root user, for instance, that can do anything.

Many try to see the best in these deficiencies, which is a bit reminiscent of the early days of MySQL, when every feature they had yet to get around to implementing was instead held up as an advantage: transactional integrity, stored procedures, robust storage systems, user types and so on were purportedly an unwanted extra burden, right up until they were implemented and their utility became obvious and valuable to all.

This lack of security adds friction and complexity to the implementation. How, for instance, do I maintain the incredible scalability of Riak (the reason I chose it as the key/value database of tewdew, for a unique need that it perfectly suits) while adopting one of these layered options? How do I ensure absolute security under varying network conditions (e.g. spinning up a new instance, or operating in another vendor’s virtual machine farm where I have minimal trust that the peers will always be controlled by me, yet need to dynamically facilitate adding and removing machines)?

I’m considering these questions now, and continue to research options. If any reader has thoughts on implementing credible security with Riak, I’d be very interested in hearing them. It’s a learning experience for me, though I feel quite certain that I’ll come up with a great solution, which I’ll document here.