The performance of multithreaded applications on multicore architectures depends largely on locality and communication. However, most performance analyses are architecture-dependent, so insights gleaned from an application's behavior on one platform may not apply when the application runs on another. In contrast, architecture-independent metrics allow a program's performance to be analyzed across a range of architectures without the overhead of repeated profiling and analysis. We propose multicore-aware reuse distance, which captures the inherent locality properties of an application along with the impact of inter-thread data interactions. We then show how statistical sampling and parallelization can speed up this analysis by orders of magnitude with minimal loss of accuracy; the approach enables privatized O(1) data structures, reduced synchronization, and sampling rates as low as one in a million.
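To make "reuse distance" concrete, here is a minimal sketch of the classic single-threaded metric. It is not the multicore-aware variant proposed in the talk, and it uses neither sampling nor parallelization. The reuse distance of an access is the number of distinct addresses touched since the previous access to the same address. The function names `reuse_distances` and `lru_misses` are hypothetical, and the naive LRU stack costs O(M) per access for M distinct addresses. The second function shows why the metric is architecture-independent: one profile predicts misses for a fully associative LRU cache of any size.

```python
from collections import OrderedDict
from math import inf


def reuse_distances(trace):
    """Return the reuse distance of each access in `trace` (hypothetical helper).

    A first-time access has infinite distance (a cold miss).
    """
    stack = OrderedDict()  # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            # Distinct addresses used more recently than `addr`.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(inf)
            stack[addr] = None
    return distances


def lru_misses(distances, cache_size):
    """Misses in a fully associative LRU cache holding `cache_size` lines."""
    return sum(1 for d in distances if d >= cache_size)


d = reuse_distances(["a", "b", "c", "a", "b", "b"])
print(d)                 # [inf, inf, inf, 2, 2, 0]
print(lru_misses(d, 2))  # 5
print(lru_misses(d, 3))  # 3
```

The same distance histogram answers the question for every cache size, so the trace need only be profiled once.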
Vijay S. Pai [https://engineering.purdue.edu/~vpai] received his PhD from Rice University in 2000. He joined the faculty of Purdue University in 2004 after serving as an assistant professor at Rice University and a senior developer at iMimic Networking. He received the NSF CAREER award in 2003 and the Wilfred "Duke" Hesselberth Award for Teaching Excellence in 2007. He was a primary developer and maintainer of the publicly available Rice Simulator for ILP Multiprocessors (RSIM), and has advised the creation and free public distribution of Spinach (network interface simulator), Toast (peer-to-peer video-on-demand system), and SpeakAll! (augmented communication iPad app for teachers of special-needs children).