CARV Laboratory
Motivation

Multicore architectures

Throughout the history of semiconductor technology, continuous advances in materials science and fabrication techniques have produced ever smaller and faster processors, as predicted by Moore's Law.  Recently, however, the industry has approached the physical limits of these materials, slowing this progress to a halt. To meet the ever-growing market demand for processing power, manufacturers have turned to multicore processor architectures.  Dual- and quad-core processors are currently the norm among home users, while high-end prototypes integrate as many as 48 (Intel SCC), 54 (Azul Vega3), or 64 (Tilera TILE64) high-performance, general-purpose cores.

Parallel programming

The advent of multicore processors has made parallel programming increasingly important, both for parallelizing existing software and for developing new software.  Traditionally, parallel programming was considered the domain of high-performance computing and was limited to expert programmers. The dominant paradigm of threads and lock-based synchronization requires the programmer to reason about the myriad implicit and explicit interactions of threads through shared memory and synchronization, making parallel programming difficult and error-prone.  This difficulty has spurred the growth of novel, high-level parallel programming models that aim to make multicore programming easier.
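
To illustrate the kind of reasoning this paradigm demands, consider the following minimal sketch (the SharedCounter class is hypothetical, written for this example): two threads increment a shared counter, and without the lock, increments are silently lost because both threads may read the same value and write back the same result.

    import java.util.concurrent.locks.ReentrantLock;

    public class SharedCounter {
        private long count = 0;
        private final ReentrantLock lock = new ReentrantLock();

        // Without this lock, both threads may read the same value of count
        // and write back the same result, silently losing increments.
        void increment() {
            lock.lock();
            try {
                count++;
            } finally {
                lock.unlock();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            SharedCounter c = new SharedCounter();
            Runnable work = () -> { for (int i = 0; i < 1_000_000; i++) c.increment(); };
            Thread t1 = new Thread(work), t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();
            System.out.println(c.count);  // 2000000 only because increments are locked
        }
    }

The bug in the unlocked version is invisible in the source code and manifests only intermittently at run time, which is precisely what makes lock-based programming error-prone.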

High-productivity programming in Java

High-productivity languages aim to reduce the expertise required and increase the productivity of non-expert programmers. Java is a representative high-productivity language: it provides high-level abstractions that allow non-expert programmers to write portable, complex applications, hiding low-level details such as memory management and offering a wide range of reusable library components. Java also targets parallel programming, supporting several programming models beyond threads, such as tasks, futures, and events.  Moreover, it includes a large library of efficient, concurrent implementations of popular data structures, providing reusable high-level building blocks for high-productivity programmers. Common concurrent data structures, such as stacks and queues, are the most widely used inter-thread communication structures, and thus major building blocks of parallel software.  The design of effective concurrent data structures is therefore of major importance for ease of programming, scalability, and power efficiency.
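
As a brief illustration of these building blocks, the following sketch uses the standard java.util.concurrent library: an ExecutorService runs a task, a Future retrieves its result, and a BlockingQueue serves as an inter-thread communication channel, with no explicit locking written by the programmer.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class HighLevelParallelism {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // A task with a future: the pool manages threads, not the programmer.
            Future<Integer> sum = pool.submit(() -> {
                int s = 0;
                for (int i = 1; i <= 100; i++) s += i;
                return s;
            });

            // A library-provided concurrent queue as an inter-thread channel.
            BlockingQueue<String> channel = new ArrayBlockingQueue<>(16);
            pool.submit(() -> {
                try {
                    channel.put("result ready");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            System.out.println(channel.take() + ": " + sum.get());  // prints 5050
            pool.shutdown();
        }
    }

All synchronization is encapsulated inside the library structures, which is exactly the productivity benefit the paragraph above describes.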

Cache coherency and shared memory

Java, like most high-productivity languages, expresses parallelism assuming a relaxed model of shared memory, which requires cache coherence across all processor caches.  However, cache coherence protocols do not scale well with the number of cores and the complexity of on-chip interconnection networks.  Thus, several state-of-the-art multicore architectures do not support cache coherence, as it limits scalability and energy efficiency.  Until recently, energy consumption was not a crucial factor in multicore architecture design; it is now becoming very important as core counts increase and energy becomes a limiting factor in processor performance and thermal behavior. With multicore processors finding use in a wide range of applications, from embedded low-power systems to massive cloud-computing datacenters, we expect this trend to continue.  More specifically, we assume that processor architectures with hundreds of cores will face difficulties supporting cache coherence across all cores in hardware; we believe a more promising approach is to support cache coherence only within islands of cores. In this approach, coherence can be implemented cost-effectively within each island, while explicit communication mechanisms, such as remote DMA, remote load-store instructions, or message passing, sustain scalability between islands.
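
Plain Java cannot express these hardware mechanisms directly, but the programming style they imply can be sketched. In the hypothetical example below, a BlockingQueue stands in for an explicit inter-island message channel: data crosses the island boundary only through explicit send and receive operations, never through implicitly coherent loads and stores.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class IslandMessaging {
        // Stand-in for a hardware message channel between two coherence islands.
        static final BlockingQueue<int[]> toIsland1 = new ArrayBlockingQueue<>(8);

        public static void main(String[] args) throws InterruptedException {
            Thread island0 = new Thread(() -> {
                int[] payload = {1, 2, 3};     // produced in island 0's memory
                try {
                    toIsland1.put(payload);    // explicit "send" across the boundary
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            island0.start();

            int[] received = toIsland1.take(); // explicit "receive" in island 1
            int sum = 0;
            for (int v : received) sum += v;
            System.out.println("island 1 received sum = " + sum);
            island0.join();
        }
    }

The key property is that all inter-island data movement is visible in the code, so the hardware never has to keep the two islands' caches coherent behind the programmer's back.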