Tuesday, June 29, 2010

A classic paper on VMware

Some papers are classics, and can be read over and over again.

I recently returned to a paper I'd read more than five years ago, Carl Waldspurger's "Memory Resource Management in VMware ESX Server", and was amazed at how fresh, readable, and fascinating it remains.

If you've never explored the internals of VMware, or if you aren't familiar with terms like ballooning, content-based virtual memory page sharing, or the idle memory tax, you'll want to read this paper.

Every time I (re)read about the idea of ballooning, I'm struck by how simple, yet perfectly appropriate, this idea is. The paper introduces the problem as follows:

In general, a meta-level page replacement policy must make relatively uninformed resource management decisions. The best information about which pages are least valuable is known only by the guest operating system within each VM. Although there is no shortage of clever page replacement algorithms, this is actually the crux of the problem. A sophisticated meta-level page replacement algorithm is likely to introduce performance anomalies due to unintended interactions with native memory management policies in guest operating systems.


Suppose the meta-level policy selects a page to reclaim and pages it out. If the guest OS is under memory pressure, it may choose the very same page to write to its own virtual paging device. This will cause the page contents to be faulted in from the system paging device, only to be immediately written out to the virtual paging device.


A small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service. It has no external interface within the guest, and communicates with ESX Server via a private channel. When the server wants to reclaim memory, it instructs the driver to "inflate" by allocating pinned physical pages within the VM, using appropriate native interfaces. Similarly, the server may "deflate" the balloon by instructing it to deallocate previously-allocated pages.

The balloon is a program that performs an incredibly useful function, simply by allocating some memory when you ask it to, and then, later, deallocating that memory. It's brilliant!
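Here is a toy sketch in Python of that allocate-and-release idea. It is not VMware's implementation: the Guest and Balloon classes, the LRU eviction policy, and the frame numbering are all assumptions invented for illustration. The point it tries to show is that when the balloon inflates, the guest's own policy decides which pages get paged out, and the host simply takes the frames the balloon ends up holding.

```python
from collections import OrderedDict


class Guest:
    """Hypothetical guest OS: a fixed pool of page frames with LRU eviction."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # unused frame numbers
        self.resident = OrderedDict()       # frame -> data, oldest first
        self.swapped = []                   # data the guest paged out itself

    def take_frame(self):
        """Return a free frame, evicting the guest's oldest page if needed."""
        if not self.free:
            if not self.resident:
                raise MemoryError("no frames left to reclaim")
            # The guest, not the host, chooses the victim page.
            frame, data = self.resident.popitem(last=False)
            self.swapped.append(data)
            self.free.append(frame)
        return self.free.pop()

    def store(self, data):
        """An application stores data in a frame."""
        frame = self.take_frame()
        self.resident[frame] = data
        return frame


class Balloon:
    """Hypothetical balloon driver running inside the guest."""

    def __init__(self, guest):
        self.guest = guest
        self.pinned = []  # frames currently held by the balloon

    def inflate(self, count):
        """Pin `count` frames and report them so the host can reclaim them."""
        grabbed = [self.guest.take_frame() for _ in range(count)]
        self.pinned.extend(grabbed)
        return grabbed

    def deflate(self, count):
        """Give `count` pinned frames back to the guest's free list."""
        released = [self.pinned.pop() for _ in range(count)]
        self.guest.free.extend(released)
        return released


guest = Guest(num_pages=4)
for name in ["a", "b", "c", "d"]:
    guest.store(name)

balloon = Balloon(guest)
reclaimed = balloon.inflate(2)  # host asks for two frames back
print(reclaimed, guest.swapped, list(guest.resident.values()))
```

In this sketch the guest fills all four frames, so inflating by two forces it to page out its two oldest entries ("a" and "b") by its own choice, while "c" and "d" stay resident. Deflating later returns frames to the guest's free list without the host ever having to guess which pages mattered.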

Apparently, many of the ideas in this project came from experiences with the Disco project at Stanford in the mid-1990s, a project I hadn't paid much attention to at the time. I'll try to track down some of those references and see what I learn from them.
