This research has been supported by an NSF CAREER Award.
A UNIFIED FRAMEWORK FOR MULTILEVEL PARALLELIZATION ON DEEP COMPUTING SYSTEMS (2004–2009)
PROJECT SUMMARY
This research activity involves the design, development, and deployment of a programming framework for explicit multilevel parallelization using a global address space model. The framework targets the upcoming generation of Petaflop-class supercomputers, which are based on architectural substrates with multiple levels of on-chip and off-chip parallelism and deep memory hierarchies. The research addresses the need for increased productivity and utilization of national supercomputing power by attempting to close the gap between computer architecture innovation and parallel programming practices. The main goal of this activity is to reduce the programming effort required for mapping algorithmic parallelism to hierarchical hardware components with heterogeneous means for parallel execution, and to assist programmers in deriving balanced designs of layered parallel applications. The programming framework investigated in this research unifies parallel programming models and methodologies and enables faster adaptation of parallel code to new hardware platforms. Concurrently, it forms a basis for education and training of interdisciplinary student audiences in high performance programming.
The parallel programming component of this research is designed around standard C++ templates with notation for nested threads and iterators. The notations for parallelism are coupled with a templated representation of data, which allows for arbitrary partitioning, sharing, and coherence control at multiple levels of parallel execution constructs. While the programmer highlights nested parallelism, the orchestration and management of multigrain threads and data are delegated to the compiler and the runtime system. The research investigates novel methods for controlling the granularity of multilevel parallelism via vertical analysis of the program. Periodicity analysis and selective runtime tracing are used as means to derive effective data distribution and layout schemes without user intervention. Alongside runtime analysis, new resource-driven scheduling strategies and novel microprocessor features, including on-chip multithreading, on-chip SIMD parallelism, and speculative execution, are incorporated into the parallelization and program optimization processes.

INVESTIGATORS AND PROJECT MEMBERS
Principal Investigator
Dimitris Nikolopoulos, Associate Professor, University of Crete

Senior Collaborators
Christos Antonopoulos, Assistant Professor, University of Thessaly
Alexandros Stamatakis, Research Group Leader, Technical University of Munich
Andreas Stathopoulos, Professor, College of William and Mary

Junior researchers
Filip Blagojevic, Ph.D.
Matthew Curtis-Maury, Ph.D.
Richard Tran Mills, Ph.D.
Scott Schneider, Doctoral student
Jae-seung Yeom, Doctoral student
Benjamin Rose, M.Sc.

PUBLICATIONS
Scheduling Dynamic Parallelism on Accelerators.
Filip Blagojevic, Costin Iancu, Katherine A. Yelick, Dimitrios S. Nikolopoulos, Benjamin Rose and Matthew Curtis-Maury. Proceedings of the 6th ACM Conference on Computing Frontiers (CF), pages 161–170, Ischia, Italy, May 2009.
Scheduling Dynamic Parallelism on the Cell BE.
Filip Blagojevic, Costin Iancu, Katherine A. Yelick, Dimitrios S. Nikolopoulos, Benjamin Rose, and Matthew Curtis-Maury. Proceedings of the 15th Meeting of the IBM HPC Systems Scientific Computing User Group (SCICOMP), Barcelona, Spain, May 2009.
A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies.
Scott Schneider, Jae-seung Yeom, Benjamin Rose, John C. Linford, Adrian Sandu and Dimitrios S. Nikolopoulos. Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), pages 131–140, Raleigh, NC, February 2009.
Set-top Supercomputing: Scalable Software for Scientific Simulations.
Dimitrios S. Nikolopoulos. ERCIM News, Issue 74, July 2008.
Scheduling Asymmetric Parallelism on a PlayStation3 Cluster.
Filip Blagojevic, Matthew Curtis-Maury, Jae-Seung Yeom, Scott Schneider, and Dimitrios S. Nikolopoulos. Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGRID), pages 146–153, Lyon, France, May 2008.
Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell B/E.
Ashwin Aji, Filip Blagojevic, Wu-chun Feng, and Dimitrios S. Nikolopoulos. Proceedings of the 5th ACM International Conference on Computing Frontiers (CF), pages 13–22, Ischia, Italy, May 2008.
Unified Scheduling of Polymorphic Parallelism on the Cell Processor.
Dimitrios S. Nikolopoulos. Abstracts of the 2008 SIAM Conference on Parallel Processing for Scientific Computing, Miniworkshop on the Cell Processor, 1 pp., Atlanta, GA, March 2008.
Supporting I/O-intensive Workloads on the Cell Architecture.
Muhammad Mustafa Rafique, Ali R. Butt, and Dimitrios S. Nikolopoulos. Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST), short abstract, 2 pp., San Jose, CA, February 2008.
Modeling Multigrain Parallelism on Heterogeneous Multicore Processors: A Case Study of the Cell BE.
Filip Blagojevic, Xizhou Feng, Kirk Cameron, and Dimitrios S. Nikolopoulos. Proceedings of the 3rd International Conference on High-Performance Embedded Architectures and Compilers (HIPEAC), Lecture Notes in Computer Science Volume 4917, pages 38–52, Göteborg, Sweden, January 2008.
Runtime Scheduling of Dynamic Parallelism on Accelerator-Based Multi-core Systems.
Filip Blagojevic, Dimitrios S. Nikolopoulos, Alexandros Stamatakis, Christos D. Antonopoulos, and Matthew Curtis-Maury. Parallel Computing, 33(10-11):700–719, November 2007.
Synthesizing Parallel Programming Models for Asymmetric Multi-Core Systems.
Dimitrios S. Nikolopoulos and Kirk W. Cameron. Abstracts of the Eleventh Workshop on High Performance Embedded Computing (HPEC), 1 pp., Lexington, MA, September 2007.
System Software Challenges and Opportunities on Asymmetric Multicore Processors.
Dimitrios S. Nikolopoulos. Proceedings of the 2007 Fall Creek Falls Conference – Panel on Key Challenges Presented by Next Generation Hardware Systems, invited presentation, 1 pp., Nashville, TN, September 2007.
Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory.
Richard Tran Mills, Chuan Yue, Andreas Stathopoulos, and Dimitrios S. Nikolopoulos. Journal of Grid Computing, 5(2):213–234, June 2007.
RAxML-CELL: Parallel Phylogenetic Tree Construction on the Cell Broadband Engine.
Filip Blagojevic, Alexandros Stamatakis, Christos Antonopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the 21st IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), 10 pp., Long Beach, CA, March 2007.
Dynamic Multigrain Parallelization on the Cell Broadband Engine.
Filip Blagojevic, Dimitrios S. Nikolopoulos, Alexandros Stamatakis, and Christos Antonopoulos. Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), pages 90–100, San Jose, CA, March 2007. Best Paper Award.
Exploring Programming Models and Optimizations for the Cell Broadband Engine using RAxML.
Filip Blagojevic and Dimitrios S. Nikolopoulos. Abstracts of the 2006 Virginia Tech High-End Computing Challenge, 14 pp., September 2006.
Runtime Support for Memory Adaptation in Scientific Workloads via Local Disk and Remote Memory.
Chuan Yue, Richard Tran Mills, Andreas Stathopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing (HPDC), pages 183–194, Paris, France, June 2006. Best Paper Award Nominee (one of five papers).
Scalable Locality-Conscious Multithreaded Memory Allocation.
Scott Schneider, Christos D. Antonopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the 2006 ACM SIGPLAN International Symposium on Memory Management (ISMM), pages 84–94, Ottawa, Canada, June 2006.
Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors.
Scott Schneider, Christos D. Antonopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the Seventh IEEE International Conference on High Performance Computing and Communications (HPCC), Lecture Notes in Computer Science Volume 3726, pages 223–232, Sorrento, Italy, September 2005.
Integrating Multiple Forms of Multithreaded Execution on SMT Processors: A Quantitative Study with Scientific Workloads.
Matthew Curtis-Maury, Tanping Wang, Christos D. Antonopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the Second International Conference on the Quantitative Evaluation of Systems (QEST), pages 199–209, Torino, Italy, September 2005.
smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs.
Tanping Wang, Christos D. Antonopoulos, and Dimitrios S. Nikolopoulos. Proceedings of EuroPar'2005 (EUROPAR), Lecture Notes in Computer Science Volume 3648, pages 710–719, Lisbon, Portugal, August 2005.
An Evaluation of OpenMP on Current and Emerging Multithreaded Processors.
Matthew Curtis-Maury, Xiaoning Ding, Christos D. Antonopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the First International Workshop on OpenMP (IWOMP), Lecture Notes in Computer Science Volume 4315, pages 133–142, Eugene, OR, June 2005. Best Paper Award.
Runtime Support for Integrating Precomputation and Thread-Level Parallelism on Simultaneous Multithreaded Processors.
Tanping Wang, Filip Blagojevic, and Dimitrios S. Nikolopoulos. Proceedings of the 7th ACM SIGPLAN Workshop on Languages, Compilers and Runtime Support for Scalable Systems (LCR), Volume 81 of ACM International Conference Proceedings Series, 12 pp., Houston, TX, October 2004.
Adapting to Memory Pressure from within Scientific Applications on Multiprogrammed COWs.
Richard Tran Mills, Andreas Stathopoulos, and Dimitrios S. Nikolopoulos. Proceedings of the 18th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 10 pp., Santa Fe, NM, April 2004.
Dynamic Tiling for Effective Use of Shared Caches on Multithreaded Processors.
Dimitrios S. Nikolopoulos. International Journal of High Performance Computing and Networking, 2(1):22–35, 2004.

© Copyright Dimitrios S. Nikolopoulos.