Getting Started with Apache Cassandra (a “NoSQL” front-runner) on Windows

Cassandra has gotten a lot of hype lately, having recently been chosen as the nucleus of the Digg upgrade at the same time as Reddit is taking baby steps onto the platform. Digg is promoting its revised technology stack as enabling a “wicked fast” experience that is much more individualized, while Reddit is so far only using it as a drop-in key/value replacement for MemcacheDB.

And of course Cassandra is well known for its use by Facebook and Twitter.

Naturally, given the white-hot hype, most want to see what the big deal is. Emerging web technologies often require that you have a Linux box available (either a physical machine or a virtual instance in a product like VirtualBox). With just a couple of minor config changes and deviations from the docs, however, you can do a trial run and kick the tires with Windows as a first-class host, even if any production use would almost certainly see it deployed to Linux.

Cassandra is layered over Java, and of course one benefit of that platform is that it is inherently cross-platform.

  1. Download and install the latest Java JDK. After installing, make sure the JAVA_HOME environment variable points at the root of your JDK install (which on a 64-bit machine might be C:\Program Files (x86)\Java\jdk1.6.0_18).
  2. Download Apache Ant. Uncompress it to the folder C:\Apache\Ant (giving you files like C:\Apache\Ant\bin\ant.bat).
  3. Download Cassandra. Given that you’re probably going to be playing around for a while, go with the 0.6.0 beta 2 copy, downloading the bin version. Uncompress the package into C:\Apache\Cassandra.
  4. Open a command prompt and navigate to the Cassandra directory (e.g. after running C:, do a cd C:\Apache\Cassandra). Run the command C:\Apache\Ant\bin\ant ivy-retrieve. This will download Cassandra’s dependencies.
  5. Edit C:\Apache\Cassandra\conf\storage-conf.xml, updating lines 188 – 193 to replace each /var/lib/cassandra/ reference with C:\Apache\Cassandra\Files\.
  6. Copy the files from C:\Apache\Cassandra\build\lib\jars (which are the files that Ant downloaded) to C:\Apache\Cassandra\lib. It isn’t the most elegant solution, but it’s the most concise in point form.
  7. In a command prompt, after running C: and cd \Apache\Cassandra, run the command bin\cassandra. Cassandra should start up successfully (and, if applicable, the Windows firewall will ask whether it should make an exception). If it doesn’t start, you most likely missed one of the previous steps.
  8. In another console window, navigate to C:\Apache\Cassandra and run the command bin\cassandra-cli. At the prompt, run the command connect localhost/9160 and you should connect. You can now try out some of the simple set and get commands found in README.txt.
  9. Start reading up on the Thrift API, the basics of data storage, what a “super column” is, and so on.
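Step 1’s JAVA_HOME requirement is easy to get subtly wrong, for instance by pointing it at a JRE rather than the JDK root. Here is a minimal sketch of a sanity check, assuming a standard JDK layout with javac under bin. The class name JavaHomeCheck is hypothetical; this is not something Cassandra or the JDK ships.

```java
import java.io.File;

/** Hypothetical sanity check for step 1: does JAVA_HOME look like a JDK root? */
public class JavaHomeCheck {

    /** Returns a description of the problem, or null if javaHome looks like a JDK root. */
    static String problemWith(String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return "JAVA_HOME is not set";
        }
        File root = new File(javaHome);
        if (!root.isDirectory()) {
            return "JAVA_HOME does not point at a directory: " + javaHome;
        }
        File bin = new File(root, "bin");
        // javac ships with the JDK but not with a plain JRE, so it distinguishes the two.
        boolean hasJavac = new File(bin, "javac.exe").isFile() || new File(bin, "javac").isFile();
        return hasJavac ? null
                : "No bin\\javac under " + javaHome + "; is this a JRE rather than the JDK root?";
    }

    public static void main(String[] args) {
        String problem = problemWith(System.getenv("JAVA_HOME"));
        System.out.println(problem == null ? "JAVA_HOME looks like a JDK root" : problem);
    }
}
```

Run it from the same command prompt you will later use for bin\cassandra, since that is the environment whose JAVA_HOME matters.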
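Steps 5 and 6 are mechanical enough to script. The sketch below assumes the C:\Apache\Cassandra layout used above. The class and method names are hypothetical. It does a plain literal text substitution rather than parsing the XML, so it rewrites every /var/lib/cassandra/ reference in storage-conf.xml, not only those on lines 188 – 193. It also overwrites the file in place, so keep a copy of the original.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper automating steps 5 and 6 of the Windows setup. */
public class CassandraWindowsSetup {

    // Unix data root used in the stock storage-conf.xml, and its Windows replacement.
    static final String UNIX_ROOT = "/var/lib/cassandra/";
    static final String WINDOWS_ROOT = "C:\\Apache\\Cassandra\\Files\\";

    /** Step 5: literal (non-regex) replacement of every Unix data path. */
    static String rewriteDataPaths(String xml) {
        return xml.replace(UNIX_ROOT, WINDOWS_ROOT);
    }

    /** Reads the config file, rewrites the paths, and writes it back in place. */
    static void rewriteConfig(Path conf) throws IOException {
        String xml = new String(Files.readAllBytes(conf), StandardCharsets.UTF_8);
        Files.write(conf, rewriteDataPaths(xml).getBytes(StandardCharsets.UTF_8));
    }

    /** Step 6: copies every *.jar from the Ant download folder into lib; returns the count. */
    static int copyJars(Path from, Path to) throws IOException {
        Files.createDirectories(to);
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(from, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, to.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Optional first argument overrides the assumed install root.
        Path home = Paths.get(args.length > 0 ? args[0] : "C:\\Apache\\Cassandra");
        rewriteConfig(home.resolve("conf").resolve("storage-conf.xml"));
        int n = copyJars(home.resolve("build").resolve("lib").resolve("jars"), home.resolve("lib"));
        System.out.println("Rewrote storage-conf.xml and copied " + n + " jars");
    }
}
```

Run it after ant ivy-retrieve (step 4) has populated build\lib\jars.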
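Once bin\cassandra is running, a quick way to tell whether step 8 should work is to check whether anything accepts connections on port 9160, the port the connect command uses. Here is a minimal sketch using a plain TCP connect. The class name is hypothetical, and a successful connect only shows that a socket is listening, not that the Thrift service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical pre-flight check before running bin\cassandra-cli. */
public class ThriftPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", 9160, 2000);
        System.out.println(up ? "Port 9160 is accepting connections"
                              : "Nothing on port 9160; is bin\\cassandra running?");
    }
}
```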
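Before digging into the Thrift docs, it can help to picture a super column family as nested sorted maps: row key, then super column, then column, then value. The sketch below is a toy in-memory model, not Cassandra’s API. The class, the set/get/slice methods and the sample data are all hypothetical, and it ignores timestamps, replication and everything else that makes Cassandra a database.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a super column family: row key -> super column -> column -> value. */
public class SuperColumnSketch {
    // TreeMaps keep names sorted, standing in for the column family's comparator.
    private final Map<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    /** Hypothetical "set": creates the row and super column on first write. */
    void set(String rowKey, String superColumn, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(superColumn, k -> new TreeMap<>())
            .put(column, value);
    }

    /** Hypothetical "get": returns null when the row, super column, or column is missing. */
    String get(String rowKey, String superColumn, String column) {
        TreeMap<String, String> sc = rows.getOrDefault(rowKey, new TreeMap<>()).get(superColumn);
        return sc == null ? null : sc.get(column);
    }

    /** A copy of all columns under one super column, in sorted name order. */
    Map<String, String> slice(String rowKey, String superColumn) {
        TreeMap<String, String> sc = rows.getOrDefault(rowKey, new TreeMap<>()).get(superColumn);
        return sc == null ? new TreeMap<>() : new TreeMap<>(sc);
    }

    public static void main(String[] args) {
        SuperColumnSketch userPosts = new SuperColumnSketch();
        // Row "jsmith" holds one super column per post; each post holds its own columns.
        userPosts.set("jsmith", "post-0002", "title", "Cassandra on Windows");
        userPosts.set("jsmith", "post-0001", "title", "Hello");
        userPosts.set("jsmith", "post-0001", "body", "First!");
        System.out.println(userPosts.get("jsmith", "post-0001", "title")); // Hello
        System.out.println(userPosts.slice("jsmith", "post-0001"));       // {body=First!, title=Hello}
    }
}
```

Sorted maps are used because Cassandra keeps the columns within a row ordered by name according to the column family’s comparator, which is what makes reading a contiguous range of columns natural.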

I’ve been playing around with various NoSQL* solutions for some time. However, given the incredible hype, which is strangely coupled with a complete lack of any objective measure, I’ve decided to put it to the test. In the next couple of days a high-performance SSD will arrive, and I’ll gather some objective metrics, because the message being sold doesn’t technically pass the B.S. test.

* – A better name than “NoSQL” is desperately needed. Backronyms and revisionist history — seriously, guys, “Not Only SQL?” — don’t solve the problem that the name is incendiary, inaccurate, and a little ridiculous. KVDBMS works for some of the products, but isn’t quite applicable to richer solutions like Cassandra.

Cassandra is a very, very cool product, and I immediately see lots of interesting uses for it, but what is most striking is what is missing from the product. It is intensely bare-bones at the moment, which is exactly how MySQL made its inroads. When MySQL first became the darling of many of the same people and sites that now herald NoSQL (the same people who almost without fail rallied behind PHP, which…well…enough said), it was almost comically deficient as a database product, but as it gained those features it grew away from its core contingent.

Exciting times regardless. The technology space has many niches, each calling for its own appropriate solution, so it is always worth keeping an open mind.