Manufacturing processors with aggressive CMOS technologies paves the way to more efficient and powerful computers, but also introduces new challenges for the computer industry. As processor architectures evolve towards a massively multi-core paradigm, the efficient design and use of Networks-on-Chip (NoCs) is paramount, but the exploitation of such devices is hampered by increasing variability and defect rates. In the field of supercomputers, the energy consumption and cost of processor chips have decreased so much that the Off-Chip network now accounts for a very significant share of a supercomputer's cost, energy consumption, and application latency.
Inside chips, we propose the adoption of more complex routing algorithms to better exploit the processor resources in the presence of defects. This approach increases the interconnect throughput by up to a factor of 10 in the presence of 20% defects. We also propose a method that lets applications dynamically deploy their tasks across cores while avoiding faulty cores, avoiding less efficient cores, and minimizing the expected energy consumption. Finally, we describe a few elements of the NoC simulation model developed for this thesis, with a focus on its graphical visualization features.
Outside chips, we focus on topologies that improve performance while reducing the interconnect cost. To this end, the team at NII contributed several randomly generated topologies that take advantage of the small-world effect and the physical clustering of nodes to reduce both the network diameter and the cabling cost. While this approach is very effective for latency-bound applications, throughput-bound applications are generally best run on a meshed or hierarchical topology. Another issue is that supercomputers are often built in more than one stage, so the cabling should be incremental as well, which is far from trivial for high-radix topologies.
Hence, we propose a method for cabling multiple topologies in a more efficient and incremental way, thus enabling a more agile deployment of the Off-Chip interconnect in supercomputers.
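The small-world effect mentioned above for randomly generated topologies can be illustrated with a minimal sketch. It is not one of the topologies contributed in this work: it assumes a plain ring as the base network, adds uniformly random shortcut links, and ignores physical clustering and cabling cost entirely. The function names (`ring`, `add_random_shortcuts`, `diameter`) and the parameters (64 nodes, one shortcut attempt per node) are hypothetical choices made for illustration only.

```python
import random
from collections import deque


def ring(n):
    """Adjacency sets of an n-node ring (each node linked to its two neighbours)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_random_shortcuts(adj, per_node, rng):
    """Return a copy of adj with up to `per_node` random extra links per node.

    Assumption: shortcuts are drawn uniformly at random, with no notion of
    cable length or physical placement.
    """
    new = {u: set(vs) for u, vs in adj.items()}
    nodes = list(new)
    for u in nodes:
        for _ in range(per_node):
            v = rng.choice(nodes)
            if v != u:  # skip self-loops; duplicate links collapse in the set
                new[u].add(v)
                new[v].add(u)
    return new


def diameter(adj):
    """Longest shortest-path length in hops, found by BFS from every node."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


base = ring(64)
with_shortcuts = add_random_shortcuts(base, per_node=1, rng=random.Random(0))
print("ring diameter:", diameter(base))
print("ring + random shortcuts diameter:", diameter(with_shortcuts))
```

Because links are only ever added, the diameter with shortcuts can never exceed that of the base ring; in practice a handful of random links is usually enough to shrink it substantially, which is the property that latency-bound applications benefit from.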