Tuesday, December 7, 2010

Progress in logging systems

Twenty years ago, I made my living writing the transaction logging components of storage subsystems for database systems. This is a specialty within a specialty within a specialty:

  • Database systems, like operating systems, file systems, compilers, networking protocols, and the like, are a type of "systems software". Systems software are very low-level APIs and libraries, on top of which are built higher-level middleware and applications.

  • Inside a database system, there are various different components, such as SQL execution engines, query optimizers, and so forth. My area of focus was the storage subsystem, which is responsible for managing the underlying physical storage used to store the data on disk: file and index structures, concurrency control, recovery, buffer cache management, etc.

  • Inside the storage subsystem, I spent several years working just on the logging component, which is used to implement the "write ahead log". The basic foundation of a recoverable database is that, everytime a change is made to the data, a log record which describes that change is first written to the transaction log; these records can later be used to recover the data in the event of a crash.



When I was doing this work, in the 80's and 90's, I was further sub-specializing in a particular type of logging system: namely, a position-independent shared-memory multi-processor implementation. In this implementation, multiple processes attach to the same pool of shared memory at different virtual addresses, and organize the complex data structures within that shared memory using offsets and indexes, rather than the more commonly used pointer based data structures. (As a side note, I learned these techniques from a former co-worker, who is now once again a current co-worker, though we're working on completely different software now. The world turns.)

Anyway, I became somewhat disconnected from logging systems over the years, so I was fascinated to stumble across this paper in this summer's VLDB 2010 proceedings. The Aether team are investigating the issues involved with logging systems on multi-core and many-core systems, where ultra-high concurrency is a main goal.

As the paper points out, on modern hardware, logging systems have become a significant bottleneck for the scalability of database systems, and several databases have coped in rather painful ways, providing hideous "solutions" such as COMMIT_WRITE = NOWAIT which essentially discards the "write ahead" aspect of write ahead logging in search of higher performance.

The authors are pursuing a different strategy, leveraging the several decades worth of work in lock-free shared data structures to investigate how to build massively concurrent logging systems which don't become a bottleneck on a many-core platform. These techniques include things such as Nir Shavit's Diffracting Trees and Elimination Trees.

I think this is fascinating work; it's great to see people approaching logging systems with a fresh eye, and rather than trying to avoid them, as so many of the modern "no sql" systems seem to want to do, instead they are trying to improve and strengthen this tried-and-true technology, addressing its flaws rather than tossing it out like last century's bathwater.

Ultra-high-concurrency data structures are a very interesting research area, and have been showing great progress over the last decade. I think this particular sub-field is far from being played out, and given that many-core systems appear to be the most likely future, it's worth investing time to understand how these techniques work, and where they can be most effectively applied. I'll have more to say about these topics in future posts, but for now, have a read of the Aether paper and let me know what you think!

No comments:

Post a Comment