2011 Poster Sessions : VM-based Auditing System for Kernel Development and Debugging

Student Name : Michael Chan
Advisor : David Cheriton
Research Areas: Computer Systems
Developing and debugging an operating system kernel is challenging, and state corruption bugs are notoriously hard to track down. For example, our TCP-SMO project extends the Linux kernel's TCP implementation to support multicast. Here it is insufficient to rely solely on end-to-end data transfer tests, because these may succeed even while the internal TCP state is corrupted. Instead, we need auditing functions that check for violations of TCP state machine invariants at key locations in the implementation. The Linux kernel provides some facilities for this purpose, such as printk statements and kernel state dumps via procfs and debugfs. These methods add significant debugging code to the kernel and require non-trivial integration with user-space test code. For instance, printk statements must be modified every time different state needs to be dumped, whereas debugfs tracing must be explicitly toggled by the user program. Moreover, because kernel code runs without address space protection, even a small change, such as an errant memory reference, can hang the kernel and force a reboot of the development system. This poses two significant problems. First, frequent reboots of physical machines are time-consuming and lower debugging efficiency. Second, a hard reboot may have side effects on other parts of the system, in particular the file system, so state dumped for post-mortem debugging may be lost or corrupted.

To address these issues, we present an auditing system based on KVM virtual machines for developing and debugging the Linux TCP implementation. VMs in this environment can be restarted in less than a minute. TCP state can be selectively dumped from arbitrary points in the kernel implementation into the host machine's file system. The state is then fed through a customizable, modular auditing pipeline, which implements the state machine invariant checks. We demonstrate that only a few lines of annotation in the kernel source code are needed to enable auditing, and that the auditing system is readily customized to track down different bugs in the Linux TCP implementation. Although our focus has been on protocol debugging in Linux, we believe this approach also applies to the development of other parts of any kernel, such as the file system and virtual memory system.
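The abstract does not describe how the auditing pipeline is implemented. As an illustration only, the following minimal sketch shows how a host-side pipeline of pluggable invariant checks over dumped TCP state might look. The record format (one line per dump with a connection id, state name, snd_una and snd_nxt), the function names, and the simplified transition table modeled on the RFC 793 state diagram are all assumptions made for this example, not the project's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# Simplified transition table after the RFC 793 state diagram (assumption:
# a real pipeline may track more, or Linux-specific, states).
LEGAL_TRANSITIONS = {
    "CLOSED": {"LISTEN", "SYN_SENT"},
    "LISTEN": {"SYN_RECV", "SYN_SENT"},
    "SYN_SENT": {"SYN_RECV", "ESTABLISHED"},
    "SYN_RECV": {"ESTABLISHED", "FIN_WAIT1", "LISTEN"},
    "ESTABLISHED": {"FIN_WAIT1", "CLOSE_WAIT"},
    "FIN_WAIT1": {"FIN_WAIT2", "CLOSING", "TIME_WAIT"},
    "FIN_WAIT2": {"TIME_WAIT"},
    "CLOSE_WAIT": {"LAST_ACK"},
    "CLOSING": {"TIME_WAIT"},
    "LAST_ACK": set(),
    "TIME_WAIT": set(),
}


@dataclass
class Record:
    """One dumped snapshot of a connection (hypothetical format)."""
    conn: str
    state: str
    snd_una: int
    snd_nxt: int


# A check sees the previous and current snapshot of one connection and
# returns a message describing the violation, or None if all is well.
Check = Callable[[Optional[Record], Record], Optional[str]]


def parse_record(line: str) -> Record:
    conn, state, snd_una, snd_nxt = line.split()
    return Record(conn, state, int(snd_una), int(snd_nxt))


def seq_leq(a: int, b: int) -> bool:
    """a <= b in 32-bit sequence space, tolerating wraparound."""
    return ((b - a) & 0xFFFFFFFF) < 0x80000000


def check_transition(prev: Optional[Record], cur: Record) -> Optional[str]:
    if prev is None or cur.state == prev.state or cur.state == "CLOSED":
        return None  # first sighting, no change, or reset/close to CLOSED
    if cur.state not in LEGAL_TRANSITIONS.get(prev.state, set()):
        return f"{cur.conn}: illegal transition {prev.state} -> {cur.state}"
    return None


def check_send_window(prev: Optional[Record], cur: Record) -> Optional[str]:
    if not seq_leq(cur.snd_una, cur.snd_nxt):
        return f"{cur.conn}: snd_una {cur.snd_una} is past snd_nxt {cur.snd_nxt}"
    return None


def audit(lines: Iterable[str], checks: List[Check]) -> List[str]:
    """Run every check over every dumped record, tracking state per connection."""
    last: Dict[str, Record] = {}
    violations: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        rec = parse_record(line)
        for check in checks:
            msg = check(last.get(rec.conn), rec)
            if msg:
                violations.append(f"line {lineno}: {msg}")
        last[rec.conn] = rec
    return violations


# Made-up dump for illustration; not data from the project.
SAMPLE_DUMP = [
    "c1 CLOSED 100 100",
    "c1 SYN_SENT 100 101",
    "c1 ESTABLISHED 101 101",
    "c1 LAST_ACK 101 101",   # skips CLOSE_WAIT
    "c1 LAST_ACK 105 101",   # snd_una ahead of snd_nxt
]

if __name__ == "__main__":
    for violation in audit(SAMPLE_DUMP, [check_transition, check_send_window]):
        print(violation)
```

In this sketch, customizing the audit for a different bug amounts to writing another function with the same signature and adding it to the list passed to audit, which is one plausible reading of "modular" in the description above.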

Michael Chan is a PhD candidate in the Distributed Systems Group at Stanford University's Computer Science department. His research interests are in networking and distributed systems. He is currently working on TCP extensions in the Linux kernel's networking stack to enhance transport-layer services for large-scale reliable distributed applications. He was previously involved in the OpenFlow project and was responsible for building kernel modules for client-side network mobility solutions.