Distributed Systems. Instructor: Dr. Rajiv Misra, Department of Computer Science and Engineering, IIT Patna. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Distributed applications (distributed apps) are software that runs on multiple computers within a network at the same time and can be hosted on servers or in the cloud. This course provides an in-depth understanding of the fundamental principles and models underlying the theory, algorithms, and systems aspects of distributed computing. A few emerging topics, such as peer-to-peer computing, distributed hash tables, the Google File System, HDFS, Spark, sensor networks, and security in distributed systems, will also be covered. Upon completing this course, students will have a thorough understanding of how things work in a distributed environment.
Lecture 11 - Checkpointing and Rollback Recovery
This lecture covers the following topics: Concept of Checkpointing and Rollback Recovery; Preliminaries: Messages, Issues, Domino Effect, Problem of Livelock; Different Rollback Recovery Schemes: (i) Checkpoint-Based Recovery Schemes, (ii) Log-Based Rollback Recovery Schemes; Checkpointing and Recovery Algorithms.