Thursday, December 30, 2010

Some situations where power efficiency isn't the desired answer

Here's an interesting collection of interrelated notes about people who were surprised to find their spiffy new computers running substantially slower than their old ones:



The bottom line is that modern CPUs are incredibly sophisticated: they dynamically adjust their speed in response to their workload, running faster (and using more power, generating more heat, etc.) when they have lots of work to do, and automatically throttling themselves down when they aren't busy.
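If you're curious to watch this happening on a Linux box (a side note of my own; the articles above are about Windows Server and Mac OS X), the kernel exposes the current frequency governor and clock speed through the cpufreq files in sysfs. Here's a minimal sketch, assuming your kernel provides that interface:

// CpuFreqCheck.java -- a Linux-only sketch; it assumes the cpufreq sysfs
// files exist, which isn't true for every kernel or virtual machine.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CpuFreqCheck {
    static String read(String path) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            return r.readLine();
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String base = "/sys/devices/system/cpu/cpu0/cpufreq/";
        // The governor ("ondemand", "powersave", "performance", ...) decides
        // how aggressively the CPU throttles itself when it looks idle.
        System.out.println("governor: " + read(base + "scaling_governor"));
        System.out.println("current:  " + read(base + "scaling_cur_freq") + " kHz");
        System.out.println("maximum:  " + read(base + "cpuinfo_max_freq") + " kHz");
    }
}

Run it once while the machine is idle and once while it's compiling something, and you can watch the clock speed move; if it doesn't move, you may have found the same problem the articles describe.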

However, as Jeff Atwood and his team at StackOverflow found, sometimes this automatic speedup/slowdown functionality doesn't work right, and the only thing that you might notice is that your brand new server is running slower than your old one. Gotta love this perspective on the behavior:

My hat is off to them for saving the planet by using less power, but my pants are down to them for killing performance without informing the users. In the last few weeks, I’ve seen several cases where server upgrades have resulted in worse performance, and one of the key factors has been throttled-down CPUs. In theory, the servers should crank up the juice according to demand, but in reality, that’s rarely the case. Server manufacturers are hiding power-saving settings in the BIOS, and Windows Server ships with a default power-saving option that throttles the CPU down way too often.


Jeff Atwood is an incredibly alert and aware programmer; I have to wonder how many other users out there are being bitten by this behavior and are completely unaware that it is occurring.

It looks like there is a CPUID tool available from Intel for Mac OS X: MacCPUID. It seems to work on my system, though it's hard to compare it to the CPU-Z screen shots from the articles above. Is there a better tool to run on a Mac OS X system?

The American healthcare system is annoying

Groan. It's things like this that make me almost willing to move to England, or New Zealand, or Denmark, or someplace that has a decent view of the importance of healthcare to a society.

Apparently, the 2010 U.S. healthcare reform act was very delicately worded when it came to the much-ballyhooed change enabling parents to keep their children on their coverage through age 26. The insurance industry evidently got the bill worded so that only medical insurance plans must cover such overage dependents.

But the legislation does not cover dental insurance.

Nor vision insurance.

As though your teeth weren't part of your health.

Or your eyesight.

Stupid politicians.

So my grown son, who has been trying for 14 months now to find a job, and who is subsisting on temporary employment through a staffing agency, is at least finally covered by my health plan.

But not by my dental plan. Nor by my vision plan.

Meanwhile, what did the dental insurance company report last month?


While economic conditions continue to influence parts of our business, steady consumer demand for our insurance and retirement products has contributed to consistent sales and positive aggregate net flow results.


Well, good for them. I hope they sleep better with those fine aggregate net flow results.

Wednesday, December 29, 2010

Where set theory meets topical humor

It's topical! It's set theory! It's humorous! I won't be the first person to link to this, but even if you can barely spell "Venn diagram" you'll enjoy reading this nice short essay.

Learning from the Skype outage

I haven't been much of a Skype user recently, but in the past I used it quite a bit; it's a great service!

So I wasn't much impacted by last week's Skype system outages, but I was still interested, because Skype is a big complex system and I love big complex systems :)

If, like me, you're fascinated by how these systems are built and maintained, and what we can learn from the problems of others, you'll want to dig into some of what's been written about the Skype outage:


Building immense, complicated distributed systems is incredibly hard; I've been working in the field for 15 years, and I'm painfully aware of how little I really know about this.

It's wonderful that Skype is being so forthcoming about the problem, what caused it, what was done to fix it, and how it could be avoided in the future. I am always grateful when others take the time to write up information like this -- post-mortems are great, so thanks, Skype!

Tuesday, December 28, 2010

The 2010 One Page Dungeon Contest

A nice posting over at Greg Costikyan's Play This Thing alerted me to the 2010 One Page Dungeon contest.

There are a lot of entrants, and I'm not really much of a tabletop RPG player, so most of the entries went over my head, but I did rather enjoy reading through the nicely formatted PDF collection of the 2010 winners.

Personally, I liked "Trolls will be Trolls", "Velth, City of Traitors", and "Mine! Not yours!" the best, though all of them were quite nice.

Monday, December 27, 2010

Fish Fillets

Oh my, I am completely 100% addicted to Fish Fillets! I've always loved puzzle games, and this one is superb.

Thank you ALTAR Interactive for allowing the world to continue to enjoy your delightful game!

Saturday, December 25, 2010

Traditional DBMS techniques in the NoSQL age

Nowadays the so-called "NoSQL" techniques are all the rage. Everywhere you look it's Dynamo, Cassandra, BigTable, MongoDB, etc. Everybody seems to want to talk about relaxed consistency, eventual consistency, massive scalability, approximate query processing, and so on.

There's clearly a lot of value in these new paradigms, and it's indeed hard to see how Internet-scale systems could be built without them.

But, for an old-time DBMS grunt like me, raised on the work of people like Gray, Stonebraker, Mohan, Epstein, Putzolu, and so forth, it's a breath of extremely fresh air to come across a recent Google paper: Large-scale Incremental Processing Using Distributed Transactions and Notifications.

Google, of course, are pioneers and leaders in Internet-scale data management, and their systems, such as BigTable, Map/Reduce, and GFS, are well known. But this paper is all about how traditional database techniques still have a role to play in Internet-scale data management.

The authors describe Percolator and Caffeine, systems for performing incremental consistent updates to the Google web indexes:

An ideal data processing system for the task of maintaining the web search index would be optimized for incremental processing; that is, it would allow us to maintain a very large repository of documents and update it efficiently as each new document was crawled. Given that the system will be processing many small updates concurrently, an ideal system would also provide mechanisms for maintaining invariants despite concurrent updates and for keeping track of which updates have been processed.


They describe how they use ideas from traditional DBMS implementations, such as transaction isolation, and two-phase commit, to provide certain guarantees that make new approaches to maintaining Google's multi-petabyte indexes feasible:

By converting the indexing system to an incremental system, we are able to process individual documents as they are crawled. This reduced the average document processing latency by a factor of 100, and the average age of a document appearing in a search result dropped by nearly 50 percent.


Since Google have for many years been the poster child for Internet-scale data management, it's an event of significant importance in this age of NoSQL architectures and CAP-theorem analysis to read a paragraph such as the following from Google's team:

The transaction management of Percolator builds on a long line of work on distributed transactions for database systems. Percolator implements snapshot isolation by extending multi-version timestamp ordering across a distributed system using two-phase commit.
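To make the flavor of that concrete, here's a toy sketch of my own -- emphatically not code from the paper -- of the core snapshot isolation idea: each transaction reads from a multi-versioned store as of its start timestamp, and at commit time it aborts if some other transaction has committed a conflicting write in the meantime. Percolator does this on top of BigTable, with locks and two-phase commit; this little in-memory class ignores all of that machinery and just shows the timestamp bookkeeping:

// Toy snapshot isolation over a multi-versioned in-memory map. Each key maps
// to a chain of (commit timestamp -> value) versions; a transaction reads as
// of its start timestamp and uses first-committer-wins conflict detection.
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

class SnapshotStore {
    private final AtomicLong oracle = new AtomicLong();   // the timestamp oracle
    private final Map<String, TreeMap<Long, String>> versions =
            new HashMap<String, TreeMap<Long, String>>();

    class Txn {
        final long startTs = oracle.incrementAndGet();
        final Map<String, String> writes = new HashMap<String, String>();

        String get(String key) {
            if (writes.containsKey(key)) return writes.get(key);   // read your own writes
            synchronized (SnapshotStore.this) {
                TreeMap<Long, String> chain = versions.get(key);
                if (chain == null) return null;
                Map.Entry<Long, String> e = chain.floorEntry(startTs);  // newest version <= startTs
                return e == null ? null : e.getValue();
            }
        }

        void put(String key, String value) { writes.put(key, value); }

        boolean commit() {
            synchronized (SnapshotStore.this) {
                // Abort if any key we wrote was committed by someone else after we started.
                for (String key : writes.keySet()) {
                    TreeMap<Long, String> chain = versions.get(key);
                    if (chain != null && chain.lastKey() > startTs) return false;
                }
                long commitTs = oracle.incrementAndGet();
                for (Map.Entry<String, String> w : writes.entrySet()) {
                    TreeMap<Long, String> chain = versions.get(w.getKey());
                    if (chain == null) {
                        chain = new TreeMap<Long, String>();
                        versions.put(w.getKey(), chain);
                    }
                    chain.put(commitTs, w.getValue());
                }
                return true;
            }
        }
    }
}

The hard part that Percolator actually solves -- making those commits atomic when the versions live in many BigTable tablets on many machines -- is exactly where the two-phase commit and the lock bookkeeping come in.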


What goes around comes around. Reading the paper, I was reminded of the days when I first got interested in DBMS technology. In the late 1970's, data processing tended to be done using what was then called "batch" techniques: During the day, the system provided read-only access to the data, and accumulated change requests into a separate spooling area (typically, written to 9-track tapes); overnight, the day's changes would be run through a gigantic sort-merge-apply algorithm, which would apply the changes to the master data, and make the system ready for the next day's use. Along came some new data processing techniques, and systems could provide "online updates": operators could change the data, and the system could incrementally perform the update while still making the database available for queries by other concurrent users.
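If it's been a while since you've seen one of those overnight runs, here's a tiny sketch of the sort-merge-apply idea (my own toy code, not any real system's): both the master file and the day's changes are sorted by key, and one sequential pass -- much like merging two tapes -- produces the new master:

// Each record is just {key, value}; both input lists are sorted by key.
// A change whose key is already in the master replaces that record; deletes
// and error handling are left out to keep the sketch short.
import java.util.ArrayList;
import java.util.List;

class BatchMerge {
    static List<String[]> apply(List<String[]> master, List<String[]> changes) {
        List<String[]> out = new ArrayList<String[]>();
        int m = 0, c = 0;
        while (m < master.size() || c < changes.size()) {
            if (c >= changes.size()) out.add(master.get(m++));
            else if (m >= master.size()) out.add(changes.get(c++));
            else {
                int cmp = master.get(m)[0].compareTo(changes.get(c)[0]);
                if (cmp < 0) out.add(master.get(m++));         // master record unchanged
                else if (cmp > 0) out.add(changes.get(c++));   // brand new record
                else { out.add(changes.get(c++)); m++; }       // change replaces master record
            }
        }
        return out;
    }
}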

Now it's more than 30 years later, and the same sort of changes are still worth doing. The authors report that the introduction of Percolator and Caffeine provided a revolutionary improvement to the Google index:

In our previous system, each day we crawled several billion documents and fed them along with a repository of existing documents through a series of 100 MapReduces. Though not all 100 MapReduces were on the critical path for every document, the organization of the system as a series of MapReduces meant that each document spent 2-3 days being indexed before it could be returned as a search result.

The Percolator-based indexing system (known as Caffeine) crawls the same number of documents, but we feed each document through Percolator as it is crawled. The immediate advantage, and main design goal, of Caffeine is a reduction in latency: the median document moves through Caffeine over 100x faster than the previous system.


The paper is very well written, thorough, and complete. If you are even tangentially involved with the world of "Big Data", you'll want to carve out an afternoon and spend it digging through the paper, chasing down the references, studying the pseudocode, and thinking about the implications. Thanks, Google, for publishing these results; I found them very instructive!

Wednesday, December 22, 2010

Sterling on Assange

I've been mostly baffled by the WikiLeaks saga; I didn't know what it meant, and I've been waiting for someone capable of doing a "deep reading", as they say in literature classes.

Today, along comes the world's best writer on technology and culture, Bruce Sterling, and his essay on Julian Assange and the Cablegate scandal is the best work I've yet seen to explain and interpret what's occurring:


That’s the real issue, that’s the big modern problem; national governments and global computer networks don’t mix any more. It’s like trying to eat a very private birthday cake while also distributing it. That scheme is just not working. And that failure has a face now, and that’s Julian Assange.


Sterling has both the experience and the brilliance to interpret these events in the light of all of modern culture, tying together banking scandals, MP3 file sharing, the Iraq war, the Clinton/Lewinsky scandal, the Velvet Revolution, and more, taking you back to 1947, and on to tomorrow. Sterling's essay does more than just take you through what's happened, and why it matters: it peers into the future, as the best writers can do, and opens your eyes to what may lie ahead:


For diplomats, a massive computer leak is not the kind of sunlight that chases away corrupt misbehavior; it’s more like some dreadful shift in the planetary atmosphere that causes ultraviolet light to peel their skin away. They’re not gonna die from being sunburned in public without their pants on; Bill Clinton survived that ordeal, Silvio Berlusconi just survived it (again). No scandal lasts forever; people do get bored. Generally, you can just brazen it out and wait for public to find a fresher outrage. Except.

It’s the damage to the institutions that is spooky and disheartening; after the Lewinsky eruption, every American politician lives in permanent terror of a sex-outing. That’s “transparency,” too; it’s the kind of ghastly sex-transparency that Julian himself is stuck crotch-deep in. The politics of personal destruction hasn’t made the Americans into a frank and erotically cheerful people. On the contrary, the US today is like some creepy house of incest divided against itself in a civil cold war. “Transparency” can have nasty aspects; obvious, yet denied; spoken, but spoken in whispers. Very Edgar Allen Poe.


It's a brilliant essay, every word of which is worth reading. If you've got the time, you won't regret spending it reading The Blast Shack.

Thursday, December 16, 2010

Insert Coin

Perhaps the best part of this delightful video homage to old video games is the end, where the two artists describe the behind-the-scenes techniques that they used to make the video.

Lights theory

OK, so where do I go for a basic introduction to the theory and practice of Christmas tree lights?

In particular, where can I find a self-help guide that covers topics such as:

  • When part, but not all, of a string isn't staying lit, what is causing that? How can I find and replace the one piece which is causing the problem?

  • When one or more lights in the string flash when they are supposed to stay lit steadily (or stay lit steadily when they are supposed to flash), what is causing that, and how can I find and replace the one piece that is causing the problem?

  • What configurations prolong or, conversely, reduce the life of the string of lights? Does connecting strings in certain orders end-to-end change their behavior? Why is that?



Surely there must be some resource that saves me from futilely manipulating the unlit bulbs for 15 minutes, then giving up in despair and switching to another string...

Wednesday, December 15, 2010

Apache Derby 10.7 has been released

The 10.7 release of the Apache Derby project is now live on the Apache website!

Congratulations to the Derby team, they continue to do wonderful work! I didn't have many direct contributions to this release, as I've been extremely busy with other projects and spending less time on Derby recently. However, several of my Google Summer of Code students made substantial contributions to this release:

  • Nirmal Fernando contributed the query plan exporting tool, which can export a captured Derby query plan as XML for further analysis, or format it for easier comprehension

  • Eranda Sooriyabandara contributed to the TRUNCATE TABLE effort.



I believe the Unicode database names support was also a GSoC contribution.

I hope to continue being part of the Derby community in the future. Even if I'm not directly contributing features and bug fixes, I still enjoy spending time on the mailing lists, learning from the work that others on the project do, answering questions and participating in the discussions, etc. It's been a great group of people to be involved with, and I'm pleased to be a member of the Derby community.

If you're looking for a low-footprint, reliable, high-quality database, and even more so if you're looking for one implemented in Java, check out Derby.
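Just to show how low the barrier to entry is, here's a throwaway embedded-Derby sketch (it assumes derby.jar is on your classpath, and the database name "demodb" is simply made up for the example); it also exercises the TRUNCATE TABLE work mentioned above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DerbyDemo {
    public static void main(String[] args) throws Exception {
        // Load the embedded driver; the database is created on first connect.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:demodb;create=true");
        Statement s = conn.createStatement();
        s.executeUpdate("CREATE TABLE guests (name VARCHAR(40))");
        s.executeUpdate("INSERT INTO guests VALUES ('Ada')");
        s.executeUpdate("INSERT INTO guests VALUES ('Grace')");
        s.executeUpdate("TRUNCATE TABLE guests");   // the TRUNCATE TABLE work mentioned above
        ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM guests");
        rs.next();
        System.out.println("rows after truncate: " + rs.getInt(1)); // prints 0
        conn.close();
    }
}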

Big progress on the San Francisco Bay Bridge

If you're a construction junkie (and what software engineer isn't?), it's been an exciting fall for the San Francisco Bay Bridge replacement project. This week, the crews

began hoisting the third of four sets of giant steel pieces that will make up the legs of the 525-foot-tall tower

Read more about the current events here.

But of course, if you're a construction junkie, just reading about the bridge isn't cool enough, so go:


Metaphors are of course crucial:

"The tower is like a stool with four legs," Caltrans spokesman Bart Ney said. "We hope to have the four legs in place by Christmas. Then we can put the seat on top."

It's like a giant present, for the entire San Francisco Bay Area!

Saturday, December 11, 2010

El Nino, La Nina, and the Atmospheric River

In California, around this time of year, we often hear the local weather forecasters discussing El Nino, La Nina, and how the long term forecast for the winter suggests this or that.

I've always been rather baffled by the discussion, because in the short time available to them, the forecasters rarely have enough time to really explain what they're observing, why it matters, and how they're deriving their conclusions.

But I've been reading the blog of a Seattle-area forecaster, Cliff Mass, and he's written several great posts this fall explaining how La Nina affects the weather of the West Coast of the USA.

Short summary: the changed ocean temperature affects the air patterns, and the "Atmospheric River" (often called the "Pineapple Express" by Bay Area forecasters when it sends warm, wet air from Hawaii streaming right at Northern California) shifts just a couple of degrees in direction, so that instead of being pointed at California it is pointed at Oregon and Washington. The result: very, very wet weather in Oregon and Washington, and rather drier-than-usual weather in California.

Here are some of Mass's recent essays on the subject:

And a bonus link to a short Science article on the subject: Rivers in the Sky are Flooding the World with Tropical Waters.

If you're looking for something a bit different to read, you could do much worse than tuning into Cliff Mass's blog from time to time. I particularly enjoy how he illustrates his articles with the charts and graphs of various forecasting tools, showing how these tools are used, and how forecasters continue to improve their technology in order to further their understanding of the world's weather.

Great essay on waiting in line

Via Cory Doctorow at BoingBoing comes a pointer to this marvelous article about the theory and practice of designing waiting lines for theme park attractions, specifically the waiting lines at Walt Disney World and its new Winnie-the-Pooh attraction.

You probably have no idea that an essay about lining up for a ride could be anywhere near this fascinating and absorbing, but it is:

It's simply a beautiful, expertly executed experience, and the real world seems to fade away slowly as we descend into the perfect dream state. The surrender is so complete that nobody ever seems to notice several significant logic gaps which the queue sees no reason to explain, but rather leaves mysterious. How, for example, do we end up in outer space? It's just there, at the end of a hallway, as if outer space could be on the other side of any ordinary door.


As the author points out, the special magic of doing this well is that the simple activity of waiting in line is part of what builds and reinforces the entire experience of the ride:

The Haunted Mansion, similarly, conjures up an ethereal "house" out of painted walls and suggestive darkness and so we think there's more there than there really is, but we believe the house is really there because we've seen its exterior. It's hard to not be fooled into believing that there is a real interior inside a solid-looking exterior house or facade, or a real room behind a solid-looking door.


About 15 years ago, I had my first experience with Disney's ride reservation system. This is the process by which you can reserve a time slot for one of the more popular rides (Indiana Jones, etc.), and then you simply show up at the appointed time and enter a special pathway which enables you to skip the majority of the line and go directly to the ride.

Ride reservations definitely resolved one of the bigger problems that Disney was having, and made it possible for visitors, with a bit of planning, to avoid spending all day waiting in line to ride only a handful of rides.

However, I recall distinctly remarking to my mystified family that one of the downsides of the new approach was that, for a lot of the rides, "waiting in the line was actually a lot of the fun". When you just walk right up, get on the ride, and walk back away again, somehow the ride isn't anywhere near as fun.

Find 10 minutes. Pour yourself a cup of coffee (or tea, soda, milk, etc.). Get comfortable, sit down, and read the article. It won't be wasted time, I promise.

Thursday, December 9, 2010

Jim Gettys on TCP/IP and network buffering

If you're at all interested in TCP/IP, networking programming, and web performance issues, you'll want to run, not walk, to this fantastic series of posts by the venerable Jim Gettys:


Here's a little taste to whet your whistle, and get you hankering for more:

You see various behavior going on as TCP tries to find out how much bandwidth is available, and (maybe) different kinds of packet drop (e.g. head drop, or tail drop; you can choose which end of the queue to drop from when it fills). Note that any packet drop, whether due to congestion or random packet loss (e.g. to wireless interference), is interpreted as possible congestion, and TCP will then back off how fast it will transmit data.

... and ...

The buffers are confusing TCP’s RTT estimator; the delay caused by the buffers is many times the actual RTT on the path. Remember, TCP is a servo system, which is constantly trying to “fill” the pipe. So by not signalling congestion in a timely fashion, there is *no possible way* that TCP’s algorithms can possibly determine the correct bandwidth it can send data at (it needs to compute the delay/bandwidth product, and the delay becomes hideously large). TCP increasingly sends data a bit faster (the usual slow start rules apply), reestimates the RTT from that, and sends data faster. Of course, this means that even in slow start, TCP ends up trying to run too fast. Therefore the buffers fill (and the latency rises).
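To put some rough numbers on that delay/bandwidth point, here's a bit of back-of-the-envelope arithmetic of my own (the figures are invented, not taken from Gettys's posts):

public class Bdp {
    public static void main(String[] args) {
        double bandwidth = 10 * 1000 * 1000;   // a 10 megabit/sec uplink
        double realRtt = 0.050;                // 50 ms actual round-trip time
        double bdpBytes = bandwidth * realRtt / 8;
        System.out.println("bandwidth-delay product: " + bdpBytes + " bytes");  // 62500

        double bufferBytes = 1024 * 1024;      // a 1 MB buffer in the modem or router
        double queueDelay = bufferBytes * 8 / bandwidth;
        System.out.println("RTT once that buffer fills: "
                + (realRtt + queueDelay) * 1000 + " ms");                       // roughly 890 ms
    }
}

The pipe only "needs" about 62 KB of data in flight, but the oversized buffer happily holds a full megabyte, so the measured round-trip time balloons toward a second, and TCP's estimates are built on that inflated number.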


Be sure to read not only the posts, but also the detailed discussions and commentary in the comment threads, as there has been lots of back-and-forth on the topics that Gettys raises, and the follow-up discussions are just as fascinating as the posts.

Be prepared, it's going to take you a while to read all this material, and I don't think that Gettys is done yet! There is a lot of information here, and it takes time to digest it.

At my day job, we spend an enormous amount of energy worrying about network performance, so Gettys's articles have been getting a lot of attention. They've provoked a number of hallway discussions, a lot of analysis, and some new experimentation and ideas. We think that we've done an extremely good job building an ultra-high-performance network architecture, but there's always more to learn, and so I'll be continuing to follow these posts to see where the discussion goes.

The ASF Resigns From the JCP Executive Committee

You can read the ASF's statement here.

Wednesday, December 8, 2010

The slaloming king

Even if you don't know much about chess, you should check out this game. At move 39, Black gives up a rook, realizing that, of his remaining pieces, only his other rook has legal moves. Therefore, Black believes he can draw the game via "perpetual check", for if White were to capture the remaining rook, it would be a stalemate.

However, White looks far enough ahead to see that he can manage to walk his king all the way across the board, to a safe square entirely on the opposite side of the board, at which point White will be able to capture the rook while simultaneously releasing the stalemate.

If you're not a great chess fan, just click on move 39 ("39. Rg4xg7") on the game listing on the right side, then step through the final 20 moves of the game to watch in delight as White's king wanders back-and-forth, "slaloming" up the chessboard to finally reach the safe location.

Delightful!

White Nose Syndrome report in Wired

The latest issue of Wired has a long and detailed report from the front lines of the battle against White Nose Syndrome. It's a well-written and informative article, but unfortunately not filled with much hope for those wishing to see an end to the bat die-off.

Tuesday, December 7, 2010

Mathematical doodling

This combines two of the best things in the entire world: mathematics, and doodling!

ALL mathematics classes should be like this!

Progress in logging systems

Twenty years ago, I made my living writing the transaction logging components of storage subsystems for database systems. This is a specialty within a specialty within a specialty:

  • Database systems, like operating systems, file systems, compilers, networking protocols, and the like, are a type of "systems software". Systems software consists of very low-level APIs and libraries, on top of which higher-level middleware and applications are built.

  • Inside a database system, there are various different components, such as SQL execution engines, query optimizers, and so forth. My area of focus was the storage subsystem, which is responsible for managing the underlying physical storage used to store the data on disk: file and index structures, concurrency control, recovery, buffer cache management, etc.

  • Inside the storage subsystem, I spent several years working just on the logging component, which is used to implement the "write-ahead log". The basic foundation of a recoverable database is that, every time a change is made to the data, a log record which describes that change is first written to the transaction log; these records can later be used to recover the data in the event of a crash. (A tiny sketch follows below.)
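
Here's that sketch -- my own toy code, not anything from a real product -- showing the write-ahead rule itself: the log record describing a change is forced to disk before the changed data page is ever overwritten in place:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ToyWal {
    private final RandomAccessFile log;

    ToyWal(File f) throws IOException {
        log = new RandomAccessFile(f, "rw");
        log.seek(log.length());                 // the log is append-only
    }

    // Returns the log sequence number (LSN) of the record just written.
    // Callers must do this BEFORE updating the data page; after a crash, the
    // before/after images let recovery undo or redo the change.
    synchronized long logUpdate(int pageId, byte[] before, byte[] after) throws IOException {
        long lsn = log.getFilePointer();
        log.writeInt(pageId);
        log.writeInt(before.length);
        log.write(before);                      // undo information
        log.writeInt(after.length);
        log.write(after);                       // redo information
        log.getFD().sync();                     // force the record to stable storage first
        return lsn;
    }
}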



When I was doing this work, in the 80's and 90's, I was further sub-specializing in a particular type of logging system: namely, a position-independent shared-memory multi-processor implementation. In this implementation, multiple processes attach to the same pool of shared memory at different virtual addresses, and organize the complex data structures within that shared memory using offsets and indexes, rather than the more commonly used pointer based data structures. (As a side note, I learned these techniques from a former co-worker, who is now once again a current co-worker, though we're working on completely different software now. The world turns.)
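If you've never run into that style of programming, here's a miniature illustration of the idea, with a plain Java ByteBuffer standing in for the shared-memory segment (so it deliberately dodges the actual sharing): structures are stitched together with offsets into the region rather than with pointers, so it doesn't matter at what virtual address each process happens to map the region:

import java.nio.ByteBuffer;

class OffsetList {
    static final int NIL = -1;
    private final ByteBuffer region;   // stand-in for the shared-memory segment
    private int headOffset = NIL;      // in real life, this would live in the region too
    private int nextFree = 0;          // naive bump allocator; no bounds checking

    OffsetList(int bytes) { region = ByteBuffer.allocate(bytes); }

    // Each node is 8 bytes: [ int value | int offset-of-next-node ].
    void push(int value) {
        int node = nextFree;
        nextFree += 8;
        region.putInt(node, value);
        region.putInt(node + 4, headOffset);   // the "next" link is an offset, not a pointer
        headOffset = node;
    }

    void dump() {
        for (int off = headOffset; off != NIL; off = region.getInt(off + 4)) {
            System.out.println(region.getInt(off));
        }
    }
}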

Anyway, I became somewhat disconnected from logging systems over the years, so I was fascinated to stumble across this paper in this summer's VLDB 2010 proceedings. The Aether team are investigating the issues involved with logging systems on multi-core and many-core systems, where ultra-high concurrency is a main goal.

As the paper points out, on modern hardware, logging systems have become a significant bottleneck for the scalability of database systems, and several databases have coped in rather painful ways, providing hideous "solutions" such as COMMIT_WRITE = NOWAIT, which essentially discards the "write-ahead" aspect of write-ahead logging in search of higher performance.

The authors are pursuing a different strategy, leveraging the several decades worth of work in lock-free shared data structures to investigate how to build massively concurrent logging systems which don't become a bottleneck on a many-core platform. These techniques include things such as Nir Shavit's Diffracting Trees and Elimination Trees.

I think this is fascinating work; it's great to see people approaching logging systems with a fresh eye: rather than trying to avoid them, as so many of the modern "NoSQL" systems seem to want to do, they are trying to improve and strengthen this tried-and-true technology, addressing its flaws rather than tossing it out like last century's bathwater.

Ultra-high-concurrency data structures are a very interesting research area, and have been showing great progress over the last decade. I think this particular sub-field is far from being played out, and given that many-core systems appear to be the most likely future, it's worth investing time to understand how these techniques work, and where they can be most effectively applied. I'll have more to say about these topics in future posts, but for now, have a read of the Aether paper and let me know what you think!

Monday, December 6, 2010

Ubuntu 10.10 kernel hardening, ptrace protection, and GDB attaching

Today I happened to use the gdb debugger to try to attach to an already-running process, and failed:

ptrace: Operation not permitted.


After a certain amount of bashing-of-head-against-wall and cursing-of-frustration-didn't-this-work-before activities, I did a bit of web searching, and found:



I'm not completely sure what to make of this, but the suggested workaround:

# echo 0 > /proc/sys/kernel/yama/ptrace_scope

(executed as root) seems to have done the trick, for now.

If this happened to be your particular nightmare as well, hopefully this saved you a few seconds of anguish...

Friday, December 3, 2010

Perforce 2010.2 enters Beta testing

I'm pleased to see that the 2010.2 version of the Perforce Server is now online and available for testing. If you aren't familiar with the notion of a beta test, here's the Wikipedia definition.

I'm excited about this release; I think it contains a number of features which Perforce sites will find useful. Perforce has devoted a lot of attention to the needs of high-end SCM installations recently, and this release contains a number of enhancements specifically targeted at administrators of large, complex Perforce installations.

I was pleased to be part of the team that delivered this release, and I'm looking forward to getting feedback from its users.

The software life cycle never ends, of course, and we're already busy gathering ideas and making plans for the next iteration!

If you get a chance to try out the beta of the new server, let me know what you think!