Error Virtualization: Building a Reactive Immune System for Software Services
Speaker: Stelios Sidiroglou, PhD student in Computer Science at Columbia University.
Date: 14 June 2006 Time: 11:00
Location: "Mediterranean Studies" Seminar Room, FORTH, Heraklion, Crete.
Host: Euaggelos Markatos


We propose a reactive approach for handling a wide variety of software failures, ranging from remotely exploitable vulnerabilities to more mundane bugs that cause abnormal program termination (e.g., illegal memory dereference) or other recognizable bad behavior (e.g., computational denial of service). Our emphasis is on creating "self-healing" software that can protect itself against a recurring fault until a more comprehensive fix is applied.

Briefly, our system monitors an application during its execution using a variety of external software probes, trying to localize observed faults to specific code regions. In future runs of the application, the "faulty" region of code will be executed by an instruction-level emulator. The emulator will check for recurrences of previously seen faults before each instruction is executed. When a fault is detected, we recover program execution to a safe control flow. Using the emulator only for small pieces of code, as directed by the observed failure, allows us to minimize the performance impact on the immunized application. We discuss the overall system architecture and a prototype implementation for the x86 platform. We show the effectiveness of our approach against a range of attacks and other software failures in real applications such as Apache, sshd, and Bind.

In this talk, I will also present our ongoing work on a new technique for retrofitting legacy applications with exception handling techniques, which we call Autonomic Software Self-Healing Using Error Virtualization Rescue Points (ASSURE). ASSURE is a general software fault-recovery mechanism that uses operating system virtualization techniques to provide "rescue points" to which an application can recover execution in the presence of faults. When a fault occurs at an arbitrary location in the program, we restore program execution to a "rescue point" and imitate its observed behavior to propagate errors and recover execution.
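
As a rough illustration of the idea of mapping a detected fault onto a safe control flow, the following minimal Python sketch turns a fault inside a guarded function into that function's ordinary error return value, so that the caller's existing error handling takes over. It is not the system presented in the talk: the names error_virtualized, parse_length and handle_request are hypothetical, Python exceptions stand in for faults caught by the emulator, and the sketch performs no rollback of memory state.

```python
from functools import wraps


def error_virtualized(error_return, faults=(Exception,)):
    """Hypothetical decorator: treat a fault inside the guarded function
    as if the function had returned its normal error value."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except faults:
                # Fault detected: force an error return instead of crashing.
                # Assumption: the caller already checks for this value.
                return error_return
        return wrapper
    return decorate


@error_virtualized(error_return=-1, faults=(IndexError,))
def parse_length(buf, offset):
    # Stand-in for a previously observed faulty code region:
    # reading past the end of the buffer raises IndexError.
    return buf[offset]


def handle_request(buf, offset):
    # The caller's existing error path handles the virtualized fault.
    length = parse_length(buf, offset)
    if length < 0:
        return "rejected"
    return "ok"
```

With these definitions, a well-formed request is served as usual, while a request that triggers the fault is rejected through the normal error path and the program keeps running.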


Stelios Sidiroglou is a PhD student in Computer Science at Columbia University and a member of the Network Security Lab. He received his Masters and Bachelors in Electrical Engineering from Columbia University and WPI, respectively. His research interests include software survivability, large-scale system security, and privacy and anonymity.
