Wednesday, August 22, 2012

A compendium of database-y stuff

I found myself wandering through some rather random database-y stuff recently; here's some of the fun stuff I've been reading this week:

  • Cassandra query performance. Aaron Morton, a Cassandra committer, posted a slide deck entitled: Cassandra SF 2012 - Technical Deep Dive: query performance. It's a very interesting presentation, although by the time you're a few dozen slides into the presentation, and he's described 8 or 10 different ways to try to tweak the Cassandra commit and cache algorithms, will you be as freaked out as I was?
  • OpenLDAP MDB. I think they're positioning this as an embedded data store:
    MDB is an ultra-fast, ultra-compact key-value data store. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and is only limited to the size of the virtual address space.
    Read more at: The MDB site.

    It's cool to look at what people do to make an embedded data store as absolutely blindingly fast as possible.

  • Distributed transactions for Google App Engine. If I'm understanding this correctly, these guys built a consistent distributed data store on top of a collection of independent local data stores.

    It's an extremely interesting paper, and a 1-hour video presentation by the author is also available from that page.

    Our contribution is that we provide transactional semantics without restriction (1): we create a kind of transaction that works across objects not in the same Entity Group. We call these transactions "Distributed Transactions" (DTs) in order to distinguish them from the original GAE "Local (to one Entity Group) Transactions".

    When using our Distributed Transactions the set of objects operated upon must be specified directly by their Keys, as one does with an object store, not by predicates on their properties, as does with a general relational query.

  • HBase Replication: Operational Overview. Available here.

    HBase's replication support continues to evolve. It still looks like a very complex system that is quite hard to monitor and diagnose. The basic tool appears to be to do some sort of massive data diff on your running system(s):

    A standard way to verify is to run the verifyrep mapreduce job, that comes with HBase. It should be run at the master cluster and require slave clusterId and the target table name. One can also provide additional arguments such as start/stop timestamp and column families. It prints out two counters namely, GOODROWS and BADROWS, signifying the number of replicated and unreplicated rows, respectively.
  • MemSQL Architecture. Yet another in-memory database that claims to be the fastest one of all. In a recent blog entry describing their benchmarking, the developers say:
    MemSQL is an in-memory database that stores all the contents of the database in RAM but backs up to disk. MongoDB and MySQL store their data on disk, though can be configured to cache the contents of the disk in RAM and asynchronously write changes back to disk. This fundamental difference influences exactly how MemSQL, MongoDB and MySQL store their data-structures: MemSQL uses lock-free skip lists and hash tables to store its data, whereas MongoDB and MySQL use disk-optimized B-trees.

    Todd Hoff reports his notes from interviewing the MemSQL team here, saying:

    On the first hearing of this strange brew of technologies you would not be odd in experiencing a little buzzword fatigue. But it all ends up working together. The mix of lock-free data structures, code generation, skip lists, and MVCC makes sense when you consider the driving forces of data living in memory and the requirement for blindingly fast execution of SQL queries.
  • PVLAN, VXLAN and cloud application architectures. I don't know very much about modern networking technologies, so most of this page went way over my head, but it sounds like the evolution in the networking world is as fast as it is in the data world.
    In short – you need multiple isolated virtual network segments with firewalls and load balancers sitting between the segments and between the web server(s) and the outside world.

    VXLAN, NVGRE or NVP, combined with virtual appliances, are an ideal solution for this type of application architectures. Trying to implement these architectures with PVLANs would result in a total spaghetti mess of isolated and community VLANs with multiple secondary VLANs per tenant.

  • Why BTrees beat Hashing for sharding. I've written about the Galaxy project before; it's new but quite intriguing. In this post, they try to do some benchmarking as well as some analytical reasoning.
    not only have we got O(1) (amortized) worst-case performance in terms of network roundtrips, but the actual constant is much less than 1 (because C>>2b), which is what we'd get when using a distributed hash-table. This is perfect scalability, and it is a direct result of the properties of the B+-tree (and other similar tree data structures), and is not true for all data structures.

No comments:

Post a Comment