Packet Switch Architecture R&D

Networks and the Internet are key components of all computer and communication systems. Routers form the basic infrastructure for all IP networks. Switches are an essential component of every router, as well as the basic building block for any high-performance (non-shared-medium) network. Switches make extensive use of specialized hardware engines, and their architecture differs radically from computer processor architectures. Among other topics, our Laboratory has conducted extensive R&D on packet switch and router architecture since 1985, as described below; for further details, follow the links at the topic headings.

Commodity Switches

Our current work is guided by the vision of Commodity Switches becoming a reality in the coming years. Commodity switches must be low-cost, high-performance, universal building blocks for switching and routing across the whole spectrum from WAN to MAN, LAN, system area, storage area, embedded system networks, (multi-)processor-memory interconnects, and networks-on-a-chip.

New markets for switches are opening up, now and over the next few years. LANs have migrated from bus-based to switch-based designs. Computer I/O is increasingly switch-based (SAN: system/storage area networks). Buses in high-performance computing servers are being replaced by switches. Embedded systems are increasingly based on multiple, networked processors. Systems-on-a-chip are moving to networks-on-a-chip (NoC) to get the performance that buses cannot offer. The volume of these markets will be substantially higher than that of telecommunication switches and routers. As switches enter this wider market, their architecture must be adapted accordingly, and their price must drop. This economy-of-scale effect may then alter the telecommunication (WAN) router market, in the same manner that PCs and workstations affected the supercomputer market: clusters of inexpensive, mass-produced, generic commodity components replaced expensive, special-purpose machines.

Contemporary switch architectures vary widely, evolve rapidly, often suffer from excessive complexity, and do not yet meet a number of objective goals, especially across the wide spectrum of application domains listed above. We must discover the unifying and simplifying concepts for switches at all of these scales, from Internet WANs down to the network-on-chip level. Discovering these will allow design reuse and great cost savings. By analogy to the wide spectrum of processor architectures before the mid-eighties, contemporary switch architectures are still in their "pre-RISC" stage: the "RISC architecture" for switches remains to be found.

1. Multi-Gigabit Switching Fabrics

Buffered Crossbars: crossbar switches are internally non-blocking, but require complex centralized schedulers and only work with fixed-size cells. However, by including small buffers (hundreds of bytes) at each crosspoint, operation with variable-size packets becomes feasible, and scheduling is dramatically simplified: distinct servers at each input and each output collectively but still independently schedule the set of flows and are loosely coordinated through backpressure signals. We showed (2001-02) that such distributed WFQ scheduling approximates very well the ideal weighted max-min fair allocation, and we studied the factors affecting convergence time. We are currently (2003- ) designing a variable-size packet buffered crossbar switch.
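
The ideal weighted max-min fair allocation that serves as the reference point above can be computed offline by the textbook progressive-filling procedure. The sketch below is an illustration only, not the Lab's simulator: the function name `weighted_max_min`, the unit link capacity, and the flow representation (a dictionary keyed by input/output port pairs) are all assumptions made for this example. Weights are assumed to be strictly positive.

```python
def weighted_max_min(flows, n_in, n_out, capacity=1.0):
    """Progressive filling on an n_in x n_out switch.

    flows: dict {(input, output): weight}, weights > 0 (hypothetical format).
    Returns dict {(input, output): rate}.
    """
    rate = {f: 0.0 for f in flows}
    active = set(flows)
    # Remaining capacity of every input link and every output link.
    remaining = {("in", i): capacity for i in range(n_in)}
    remaining.update({("out", j): capacity for j in range(n_out)})

    while active:
        # Total weight of the still-growing flows crossing each link.
        load = {}
        for (i, j) in active:
            w = flows[(i, j)]
            load[("in", i)] = load.get(("in", i), 0.0) + w
            load[("out", j)] = load.get(("out", j), 0.0) + w
        # Largest per-unit-weight increase before some link saturates.
        step = min(remaining[link] / load[link] for link in load)
        for (i, j) in active:
            inc = step * flows[(i, j)]
            rate[(i, j)] += inc
            remaining[("in", i)] -= inc
            remaining[("out", j)] -= inc
        # Flows crossing a saturated link stop growing.
        saturated = {link for link in load if remaining[link] <= 1e-12}
        active = {(i, j) for (i, j) in active
                  if ("in", i) not in saturated and ("out", j) not in saturated}
    return rate


# Three inputs contend for output 0; input 2 also sends to output 1.
rates = weighted_max_min({(0, 0): 1, (1, 0): 1, (2, 0): 1, (2, 1): 1}, 3, 2)
```

In this example output 0 saturates first, so its three flows each settle at one third of the link, and the flow from input 2 to output 1 then grows until input 2 is fully used.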

Multilane Backpressure (Credit-Based Flow Control) in Buffered Switching Fabrics: multi-stage switch fabrics allow us to scale packet switches to very large numbers of ports. Scalability requires distributed packet scheduling, which in turn implies internal buffering in the switching elements. Multilane backpressure in the fabric allows the switching elements to use only on-chip buffer memory, while the majority of the packets are buffered at the inputs, in virtual-output queues (VOQ), thus greatly reducing the cost of the fabric. We proposed the use of backpressure in 1987 (IEEE JSAC, Oct. 1987), and then used it in our Telegraphos (1993-95) and ATLAS I (1995-98) switches.
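
The credit mechanism described above can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the Telegraphos or ATLAS I design: the class name `CreditedLink`, its methods, and the fixed number of buffer slots per lane are hypothetical. Each credit stands for one free buffer slot downstream, and each lane is backpressured independently of the others.

```python
from collections import deque


class CreditedLink:
    """One link with per-lane credits (hypothetical model for illustration)."""

    def __init__(self, lanes, slots_per_lane):
        # The sender starts with one credit per downstream buffer slot.
        self.credits = [slots_per_lane] * lanes
        self.downstream = [deque() for _ in range(lanes)]

    def try_send(self, lane, cell):
        """Send a cell if the lane has a credit; otherwise hold it upstream."""
        if self.credits[lane] == 0:
            return False  # lane is backpressured; the cell waits, e.g. in a VOQ
        self.credits[lane] -= 1
        self.downstream[lane].append(cell)
        return True

    def drain(self, lane):
        """Downstream forwards one cell and returns its credit to the sender."""
        if not self.downstream[lane]:
            return None
        cell = self.downstream[lane].popleft()
        self.credits[lane] += 1
        return cell


link = CreditedLink(lanes=2, slots_per_lane=2)
link.try_send(0, "a")
link.try_send(0, "b")
blocked = link.try_send(0, "c")      # False: lane 0 has no credits left
unaffected = link.try_send(1, "x")   # True: lane 1 is not blocked by lane 0
```

Because credits are kept per lane, a congested lane never consumes the buffer space of another, which is what lets small on-chip buffers suffice while the bulk of the traffic waits at the inputs.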

Pipelined Memory is a novel, patented organization that we designed (1993-95) for the shared buffer and the associated switching and cut-through functions in a switch or router. It is simpler and smaller than alternative organizations, and is particularly suited to VLSI technologies; it was used in the Telegraphos and ATLAS I switches. FORTH owns U.S. patent 5,774,653 on the pipelined memory shared buffer switch.

ATLAS I, a 10 Gbit/s single-chip 16x16 ATM switch with backpressure: this 6-million-transistor 0.35-micron CMOS chip --a general-purpose building block for gigabit networking-- was designed in CARV, ICS-FORTH (1995-98) and was fabricated by ST Microelectronics. It provided credit-based flow control (multilane backpressure) with 32 thousand virtual channels, sub-microsecond cut-through latency, logical output queues in a shared buffer, 3 priority levels, multicasting, and load monitoring.

Benes Fabrics with Internal Backpressure: the Benes topology is a multi-stage fabric known to yield, for large N, the lowest-cost NxN non-blocking switches. We applied our buffered fabric architecture to this topology (2001-02) by combining per-flow backpressure, multipath routing (inverse multiplexing), and cell resequencing. Flow merging was needed to bring the cost of backpressure down to O(N) per switching element.
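
To give a feel for the cost advantage at large N, the sketch below counts the switching elements of a Benes network in its textbook form, built from 2x2 elements, where an NxN network has 2*log2(N) - 1 stages of N/2 elements each. The 2x2 element size is an assumption made for this example (the text above does not state the element size used), and the function name `benes_cost` is hypothetical.

```python
def benes_cost(n):
    """Return (stages, elements, crosspoints) for an n x n Benes network
    of 2x2 elements; n must be a power of two and at least 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, >= 2")
    log2_n = n.bit_length() - 1        # exact log2 for a power of two
    stages = 2 * log2_n - 1
    elements = stages * (n // 2)
    crosspoints = 4 * elements         # each 2x2 element has 4 crosspoints
    return stages, elements, crosspoints


stages, elements, crosspoints = benes_cost(1024)
crossbar_crosspoints = 1024 * 1024     # single-stage crossbar, for comparison
```

For N = 1024 this gives 19 stages and 9728 elements (38912 crosspoints), against 1048576 crosspoints for a single crossbar of the same size.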

2. Quality-of-Service (QoS)

Per-Flow Queueing. To provide QoS guarantees, advanced network systems must differentiate traffic into multiple flows and isolate them from each other by giving each flow its own queue. Managing that many queues (from hundreds or thousands up to possibly millions) at high speed typically requires the assistance of specialized hardware. We have worked on such multi-queue management implementations at different cost and performance levels.
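
One common way to keep a very large number of queues in a single buffer memory is to thread them as linked lists through a shared pool of slots, with a free list for unused slots. The following is a minimal software sketch of that general technique, not a description of the Lab's hardware; the class name `MultiQueue` and its methods are hypothetical, and the pool is assumed to hold at least one slot.

```python
class MultiQueue:
    """Many FIFO queues sharing one slot pool (hypothetical sketch)."""

    NIL = -1

    def __init__(self, num_slots):
        self.data = [None] * num_slots
        # Initially every slot is on the free list: 0 -> 1 -> ... -> NIL.
        self.next = list(range(1, num_slots)) + [self.NIL]
        self.free_head = 0
        self.head = {}   # flow id -> first slot of its queue
        self.tail = {}   # flow id -> last slot of its queue

    def enqueue(self, flow, pkt):
        slot = self.free_head
        if slot == self.NIL:
            return False                     # shared buffer is full
        self.free_head = self.next[slot]     # unlink slot from the free list
        self.data[slot] = pkt
        self.next[slot] = self.NIL
        if flow in self.head:
            self.next[self.tail[flow]] = slot  # append to a non-empty queue
        else:
            self.head[flow] = slot             # first packet of this flow
        self.tail[flow] = slot
        return True

    def dequeue(self, flow):
        slot = self.head.get(flow, self.NIL)
        if slot == self.NIL:
            return None                      # this flow's queue is empty
        pkt = self.data[slot]
        nxt = self.next[slot]
        if nxt == self.NIL:
            del self.head[flow]
            del self.tail[flow]
        else:
            self.head[flow] = nxt
        # Return the slot to the free list.
        self.data[slot] = None
        self.next[slot] = self.free_head
        self.free_head = slot
        return pkt
```

Every enqueue and dequeue touches a constant number of pointers regardless of how many flows exist, which is why this organization suits hardware with a fixed per-packet time budget.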

Weighted-Round-Robin Scheduling. After the competing flows have been isolated using per-flow queueing, fairly allocating the available bandwidth to them requires a weighted-round-robin scheduler. We have investigated in detail various methods to do this, at different cost and performance levels, starting in 1986 (IEEE JSAC, Oct. 1987), and continuing with various architectures, up to our pipelined heap manager (ICC'2001) for weighted fair queueing (WFQ) at rates of 20 to 40 Gbps, and our fast parallel comparator tree for WFQ at 40 Gbps and beyond under fast changes to the set of eligible flows.
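
The role of a heap in such a scheduler can be shown with a small software model. This sketch is a simplification and not the pipelined heap manager itself: it assumes fixed-size cells and flows that are always backlogged, and the function name `weighted_schedule` is hypothetical. Each flow carries a virtual finish time that advances by 1/weight per served cell, and the heap always yields the flow with the smallest finish time.

```python
import heapq


def weighted_schedule(weights, num_cells):
    """Return the service order of `num_cells` fixed-size cells.

    weights: dict {flow id: weight > 0}; all flows are assumed backlogged.
    """
    # Heap entries are (virtual finish time, flow id); smallest is served first.
    heap = [(1.0 / w, flow) for flow, w in weights.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(num_cells):
        finish, flow = heapq.heappop(heap)
        order.append(flow)
        # The flow's next cell finishes 1/weight later in virtual time.
        heapq.heappush(heap, (finish + 1.0 / weights[flow], flow))
    return order


order = weighted_schedule({"a": 2, "b": 1, "c": 1}, 8)
share_a = order.count("a") / len(order)   # flow "a" has half the total weight
```

Each scheduling decision costs one pop and one push on the heap, that is, O(log n) steps for n flows; a hardware heap manager can pipeline these steps across heap levels to sustain one decision per cell time.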

3. IP over ATM

Wormhole IP over ATM. To enable low-cost internetworking at gigabit rates, we were among the first to propose and analyze this technology (1998). Inspired by the wormhole-routing multiprocessor interconnection networks of the 1980s, it allows one to turn existing ATM networks into gigabit IP routers (in addition to serving the normal ATM traffic), with the mere addition of low-cost wormhole-IP devices; IP routing delay is minimized, owing to virtual cut-through forwarding of the ATM cells. We built and successfully tested (1999) an FPGA-based prototype for a bi-directional 155 Mbps link.

4. Home Networking, Network Processors, Security, Real-Time

The Lab has worked on, and remains active in, high-speed and low-cost network architectures for a wide range of applications, using modern technology: (i) home networking; (ii) network processor applications and architectures; (iii) crossbar scheduling ("FIRM", Serpanos, Infocom 2000); (iv) exploitation of emerging technologies, such as embedded systems and real-time operating systems; (v) security architectures and systems, etc.

5. Past work in hardware

  • parallel supercomputer architecture (1991-94);
  • high-speed UART macrocell (1991, chip & board implemented);
  • Sbus-to-TAXI interface (1992, chip design);
  • interleaved Rambus memory controller (1994, chip design);
  • Telegraphos I switch (1995, multi-FPGA board implemented);
  • Telegraphos II switch (1996, chip & test board implemented);
  • pipelined memory demonstrator (1995, full-custom chip implemented);
  • PCI/i960 based systems, and device drivers for them (1997-98);
  • SDRAM high-throughput buffer for switches (1998, FPGA board implemented).


© copyright ICS-FORTH, Crete, Greece.
Permission to make digital/hard copies of all or part of this material without fee is granted provided that the copies are made for personal use, they are not made or distributed for profit or commercial advantage, the FORTH copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of the Foundation for Research & Technology -- Hellas (FORTH). To copy otherwise, in whole or in part, to republish, to post on servers, or to redistribute to lists, requires prior specific written permission and/or a fee.

Last updated: Mar. 2003, by M. Katevenis.