The problem of cooperatively performing a set of t tasks in a decentralized computing environment subject to failures is one of the fundamental problems in distributed computing. The setting with partitionable networks is especially challenging, as algorithmic solutions must accommodate the possibility that groups of communicating processors become disconnected (and, perhaps, reconnected) during the computation. The efficiency of task-performing algorithms is often assessed in terms of work: the total number of tasks, counting multiplicities, performed by all of the processors during the computation. In general, a scenario where the processors are partitioned into g disconnected components forces any task-performing algorithm to incur work Ω(t·g), even if each group of processors performs no more than the optimal Θ(t) tasks.
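The Ω(t·g) scenario can be made concrete with a small Python sketch (illustrative only; the function name and setup are assumptions, not part of the talk): when g groups stay disconnected for the whole computation and cannot exchange results, each group must perform all t tasks on its own, so the total work, counted with multiplicities, is t·g.

```python
# Illustrative sketch: g disconnected groups each complete all t tasks
# themselves, since no results can be exchanged across the partition.
def work_under_partition(t: int, g: int) -> int:
    """Total work, counting multiplicities, when the network splits into
    g components and each component independently performs all t tasks."""
    performed = 0
    for group in range(g):
        # Each isolated group performs only the optimal Theta(t) tasks ...
        for task in range(t):
            performed += 1  # ... yet every task is redone in every group.
    return performed

# With t = 100 tasks and g = 4 disconnected groups, work is t * g = 400,
# matching the Omega(t * g) lower-bound scenario described above.
assert work_under_partition(100, 4) == 400
```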
Given that such pessimistic lower bounds apply to any scheduling algorithm, we pursue a competitive analysis. Specifically, in the talk I will present a simple randomized scheduling algorithm for p asynchronous processors, and compare its performance to that of an omniscient off-line algorithm with full knowledge of the future changes in the communication medium. I will describe the notion of computation width, which associates a natural number with a history of changes in the communication medium, and I will show both upper and lower bounds on work-competitiveness in terms of this quantity. Specifically, we will see that the simple randomized algorithm achieves competitive ratio (1 + cw/e), where cw is the computation width and e is the base of the natural logarithm (e = 2.7182...); this competitive ratio is tight. Finally, I will explain how the algorithm can be implemented on top of a Group Communication Service while maintaining its work-competitiveness.
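To convey the flavor of randomized task selection (a hypothetical sketch; the talk's actual algorithm and its computation-width analysis are not reproduced here), consider processors in one connected component each drawing a task uniformly at random from those not yet known to be done. Because asynchronous picks are made before results propagate, two processors may redo the same task, and the work measure charges for every execution:

```python
import random

# Hypothetical sketch of random task selection; all names and details are
# illustrative assumptions, not the algorithm from the talk.
def round_of_work(remaining: set, procs: int, rng: random.Random) -> int:
    """One asynchronous round: each processor picks a random task not yet
    known to be done. Returns the work (executions) charged this round."""
    picks = [rng.choice(sorted(remaining)) for _ in range(procs)]
    work = len(picks)        # every execution counts, multiplicities included
    remaining -= set(picks)  # distinct tasks completed propagate in the group
    return work

rng = random.Random(1)
remaining, total_work = set(range(10)), 0
while remaining:
    total_work += round_of_work(remaining, procs=4, rng=rng)
assert total_work >= 10  # work is at least t; collisions can push it higher
```

Redundant executions from colliding picks are exactly the overhead that the competitive analysis bounds against the omniscient off-line scheduler.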
Joint work with Alex A. Shvartsman and Alexander Russell.
The speaker completed his undergraduate studies at the University of Cyprus (B.Sc. in Mathematics, 1998) and his graduate studies at the University of Connecticut, U.S.A. (M.Sc. and Ph.D. in Computer Science and Engineering, 2002 and 2003, respectively). He has worked as a Teaching and Research Assistant at the University of Connecticut (1998-2003) and at the University of Cyprus (Visiting Lecturer 2004, Lecturer 2005-present). His research interests span the Theory and Practice of Distributed and Parallel Computing; in particular, Fault-tolerance and Dependability, Algorithms and Complexity, Communication Networks and Paradigms, Group Communication Systems, Distributed Atomic Objects, Cooperative and Non-cooperative Computations, and Dynamic Computing Environments. He has published in prestigious journals and conference proceedings in these areas.