In recent years, multi-core computer systems have left the realm of high-performance computing, and virtually all of today's desktop computers and embedded computing systems are equipped with several processing cores. Still, no single parallel programming model has found widespread support, and parallel programming remains an art for the majority of application programmers. In addition, there exists a plethora of sequential legacy applications for which automatic parallelization is the only realistic hope to benefit from the potentially increased processing power of modern multi-core systems. In this talk we present a novel approach to extracting and exploiting parallelism from sequential applications. We use profiling to overcome the limitations of static data and control flow analysis, enabling more aggressive parallelization. A key contribution of this work is a whole-program representation that supports profiling, parallelism extraction and exploitation. We demonstrate how this enhances conventional parallelization by incorporating support for array and coupled reduction operations as well as multi-level loop partitioning and pipeline stage replication. We have applied our technique to two different forms of parallelism, namely data and pipeline parallelism. First, we demonstrate the effectiveness of our parallelization strategy in extracting data-level parallelism using the NAS and SPEC FP benchmarks. Our approach not only yields significant improvements when compared with state-of-the-art parallelizing compilers, but also comes close to, and sometimes exceeds, the performance of manually parallelized codes. Second, we present an enhanced code generation methodology which targets both pipeline and data parallelism. We have evaluated it on a set of multimedia and stream processing benchmarks and demonstrate speedups of up to 4.7 on an eight-core Intel Xeon machine.
Georgios Tournavitis is currently a PhD student at the Institute for Computing Systems Architecture (ICSA) of the University of Edinburgh. His research interests lie in the general areas of compilation and programming languages for parallel architectures. More specifically, he is interested in compiler-based and runtime techniques that enable compilers to extract high-level parallelization skeletons from sequential applications. Most recently, he has also started working on compiler-directed optimizations for saving static power in the cache hierarchy of Chip Multi-Processors. He holds an Engineering Diploma and an MSc in Computer Engineering from the University of Patras, Greece. As part of his MSc project, he designed and implemented a multi-threaded Software Distributed Shared Memory (SDSM) system for clusters of Multi-Processors.