| |
Scalable Self-Stabilization via Composition
William Leal and Anish Arora
ICDCS 2004
Systems tend to be subject to transient faults, including hardware faults,
errors in program code, inappropriate composition of
components, wrong input, operator error, and so forth. In large or distributed
systems it is often impractical or impossible to mask
the effect of these faults. Self-stabilization is an approach that addresses the
problem by allowing a system to recover from
arbitrary faults.
Objections to the practical use of stabilization have centered around problems
of scale. Because of potential interferences
between actions, global reasoning over the entire system is in general
necessary. The complexity of this task increases dramatically as systems grow in
size. Alternatives to dealing with this complexity focus on reset and
composition. For reset, the problem is that any fault, no matter how minor, will
cause a complete system reset with potentially significant lack of availability.
For existing compositional alternatives, including compositional reset, severe
restrictions on candidate systems are imposed.
To address these issues, we give a framework for composition in which global
reasoning and detailed system knowledge are not
necessary, and which apply to a significantly wider range of systems than has
hitherto been possible. In this framework, we explicitly identify for each
component which other components it can corrupt. Additionally, the correction of
one component often depends on the prior correction of one or more other
components, constraining the order in which correction can take place. Given
appropriate component stabilizers such as detectors and correctors, we offer
several ways to coordinate system correction, depending on what is actually
known about the corruption and correction relations.
By reducing the design of and reasoning about stabilization to local activities
involving each component and the neighbors with which it interacts, the
framework is scalable. Reset is generally avoided by using the correction
relation to check and correct only where necessary. By including both correction
and corruption relations, the framework subsumes and extends other compositional
approaches.
Though not directly a part of this work, we mention tools and techniques that
can be used to help calculate the dependency and corruption relations and to
help create create the necessary stabilizers.
To illustrate the theory, we show how this framework has been applied in our
work in sensor networks. [pdf] [bibtex
entry]
[pdf] [bibtex
entry] Extended technical report version
|