Stochastic Models for Fault Tolerance: Restart, Rejuvenation and Checkpointing

Katinka Wolter

As modern society relies on the fault-free operation of complex computing systems, system fault tolerance has become an indispensable requirement. We therefore need mechanisms that guarantee correct service in cases where system components fail, be they software or hardware. Redundancy patterns are commonly used, providing either redundancy in space or redundancy in time.

Wolter’s book details methods of redundancy in time that must be triggered at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of choosing the right time for different fault-tolerance mechanisms such as restart, rejuvenation and checkpointing. Restart denotes a pure system restart, rejuvenation denotes the restart of the operating environment of a task, and checkpointing consists of saving the system state periodically and reinitializing the system at the most recent checkpoint when the system fails. Her presentation includes a brief introduction to the methods, their detailed stochastic description, and aspects of their efficient implementation in real-world systems.

The book is aimed at researchers and graduate students in system dependability, stochastic modeling and software reliability. Readers will find here an up-to-date overview of the key theoretical results, making this the single comprehensive text on stochastic models for restart-related problems.
