This paper introduces packet chaining, a simple and effective method to
increase allocator matching efficiency and hence network performance,
particularly suited to networks with short packets and short cycle
times. Packet chaining chains together packets destined to the same
output, so that a new packet reuses the switch connection of a departing packet.
This lets an allocator build up an efficient matching over a number
of cycles, as incremental allocation does, but without being limited by packet length.
For a 64-node 2D mesh at maximum injection rate and with single-flit
packets, packet chaining increases network throughput by 15% compared to
a highly-tuned router using a conventional single-iteration separable
iSLIP allocator, and outperforms significantly more complex allocators.
Specifically, it outperforms multiple-iteration iSLIP allocators and
wavefront allocators by 10% and 6% respectively, and achieves throughput
comparable to that of an augmenting paths allocator.
Packet chaining achieves this performance with a cycle time comparable
to a single-iteration separable allocator. Packet chaining also reduces
average network latency by 22.5% compared to a single-iteration iSLIP
allocator. Finally, packet chaining increases IPC by up to 46% (16% on
average) for application benchmarks, because short packets are critical
in a typical cache-coherent chip multiprocessor.
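
The chaining idea in the abstract above can be illustrated with a small sketch. It is not the speaker's implementation. It is a toy model under stated assumptions: fixed-priority arbiters stand in for iSLIP's round-robin arbiters, a connection is chained only when the same input has another packet for the same output, and requests persist unchanged from cycle to cycle. The names `separable_pass` and `allocate_with_chaining` are hypothetical.

```python
def separable_pass(requests, taken_inputs, taken_outputs):
    """One input-first separable pass over the ports that are still free.

    Assumption: fixed-priority arbiters (lowest index wins) stand in for
    the round-robin arbiters a real iSLIP allocator would use.
    """
    proposals = {}  # output -> list of inputs that proposed to it
    for inp in sorted(requests):
        if inp in taken_inputs:
            continue
        candidates = sorted(requests[inp] - taken_outputs)
        if candidates:
            # Input arbiter: each free input proposes to exactly one output.
            proposals.setdefault(candidates[0], []).append(inp)
    # Output arbiter: each output grants exactly one proposing input.
    return {min(inputs): out for out, inputs in proposals.items()}


def allocate_with_chaining(requests, previous):
    """Return this cycle's input -> output matching.

    requests: dict input -> set of outputs wanted by packets waiting there.
    previous: dict input -> output, the connections used in the last cycle.
    """
    # Chaining step (simplified): reuse last cycle's connection when the
    # same input has another packet waiting for the same output.
    chained = {i: o for i, o in previous.items() if o in requests.get(i, set())}
    # Ordinary single-pass allocation for the ports left over.
    fresh = separable_pass(requests, set(chained), set(chained.values()))
    return {**chained, **fresh}


# Two inputs that both have packets for outputs 0 and 1.
requests = {0: {0, 1}, 1: {0, 1}}
cycle1 = allocate_with_chaining(requests, {})      # {0: 0}
cycle2 = allocate_with_chaining(requests, cycle1)  # {0: 0, 1: 1}
print(cycle1, cycle2)
```

In this toy case a single separable pass matches only one of the two inputs, because both propose to output 0. With the first connection chained, the second cycle's pass sees only the remaining ports and completes the matching, which is the sense in which a matching is built up over several cycles.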
Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks

20.12.2011
Speaker : George Michelogiannakis, Stanford University
Date : 20.12.2011
Time: 11:15 - 12:00
Location : "Stelios Orphanoudakis" Seminar Room, FORTH. Heraklion, Crete.
Host : Manolis Katevenis
Abstract:
Bio:
George Michelogiannakis is finishing his PhD studies at Stanford
University. His thesis focuses on energy-efficient flow control for
on-chip networks. It evaluates bufferless flow control and proposes
elastic buffer flow control to provide network buffering with minimal
cost and without the complications of bufferless networks, by using
pipeline flip-flops for storage. He has also investigated hierarchical
on-chip networks for large-scale chip multiprocessors. His most recent work
focuses on increasing allocation efficiency in network routers to match
or exceed that of wavefront and augmenting paths allocators, without extending the delay path.