We give a graph-theoretical model for off-line fault-tolerant scheduling of dataflow algorithms onto multiprocessor architectures with distributed memory. Our framework allows the modeling of both processor and channel failures of the ``fail silent'' type (either transient or permanent), and failure masking is achieved by replicating operations and data communications. We show that, in general, the graph representing a fault-tolerant scheduling may have circuits; hence, the classical computation of starting/ending times, based upon a topological ordering, is inapplicable. We then provide a notion of ``pseudo-topological ordering'' that permits the computation of the starting/ending times even in the case of cyclic graphs. We also derive algorithms for computing the timeouts that are used for failure detection.
"Proceedings of the Joint International Conference on Formal Modeling and Analysis of Timed Systems (FORMATS'04) and Formal Techniques for Real-Time and Fault-Tolerant Systems (FTRTFT'04), Grenoble, France, September 2004, Springer Verlag, Lecture Notes in Computer Science, vol. 3253, pages 215-230, 2004.
gzipped postscript file.