Journaling File System - Rationale

Updating file systems to reflect changes to files and directories usually requires many separate write operations. As a result, an interruption (such as a power failure or system crash) between writes can leave the file system's data structures in an invalid intermediate state.

For example, deleting a file on a Unix file system involves two steps:

  1. Removing its directory entry.
  2. Marking space for the file and its inode as free in the free space map.

If a crash occurs between steps 1 and 2, the result is an orphaned inode and hence a storage leak. Conversely, if the order is reversed and only step 2 completes before the crash, the not-yet-deleted file's space is marked free and may be overwritten by something else.
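To make the crash window concrete, the following C sketch models the directory and the free-space map as two in-memory flags, so the orphaned-inode state becomes visible when the simulated crash lands between the two writes. All structures and names here are illustrative, not those of any real file system.

    #include <stdbool.h>
    #include <stdio.h>

    #define NINODES 8

    static bool dir_entry_present[NINODES]; /* directory: which inodes are referenced */
    static bool inode_in_use[NINODES];      /* free-space map: which inodes are allocated */

    static void delete_file(int ino, bool crash_between_steps)
    {
        dir_entry_present[ino] = false;     /* step 1: remove the directory entry */
        if (crash_between_steps)
            return;                         /* simulated power failure */
        inode_in_use[ino] = false;          /* step 2: mark the inode free */
    }

    int main(void)
    {
        int ino = 3;
        dir_entry_present[ino] = true;
        inode_in_use[ino] = true;

        delete_file(ino, true);             /* crash after step 1, before step 2 */

        if (!dir_entry_present[ino] && inode_in_use[ino])
            printf("inode %d orphaned: unreachable but still allocated\n", ino);
        return 0;
    }

Reversing the two assignments reproduces the opposite failure: the inode is marked free while a directory entry still points at it.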

Detecting and recovering from such inconsistencies normally requires a complete walk of the file system's data structures. This must typically be done before the file system is next mounted for read-write access. If the file system is large and I/O bandwidth is relatively limited, this can take a long time and extend downtime if it blocks the rest of the system from coming back online.

To prevent this, a journaled file system allocates a special area, the journal, in which it records the changes it will make ahead of time. After a crash, recovery involves reading the journal and replaying its changes until the file system is consistent again; a sketch of this replay step follows the list below. The changes are thus said to be atomic (indivisible) in that they either:

  • succeed (they completed originally, or are replayed in full during recovery), or
  • are not replayed at all (they are skipped because they had not been completely written to the journal before the crash).
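As a minimal sketch of that recovery rule, the C loop below replays a journal in which each record carries a commit marker: records whose marker reached the journal before the crash are redone, and incomplete records are skipped. The record layout and helper names are illustrative assumptions, not the on-disk format of any particular journaling file system.

    #include <stdbool.h>
    #include <stdio.h>

    struct journal_record {
        int  block;      /* which metadata block the change touches */
        int  new_value;  /* the value to (re)write into that block */
        bool committed;  /* set only after the record is fully in the journal */
    };

    static int metadata[4];  /* stand-in for on-disk metadata blocks */

    static void replay(const struct journal_record *log, int n)
    {
        for (int i = 0; i < n; i++) {
            if (!log[i].committed) {
                printf("record %d incomplete, skipping\n", i);
                continue;    /* never fully journaled: not replayed at all */
            }
            metadata[log[i].block] = log[i].new_value;  /* idempotent redo */
            printf("record %d replayed\n", i);
        }
    }

    int main(void)
    {
        /* Two committed changes and one cut short by the crash. */
        struct journal_record log[] = {
            { .block = 0, .new_value = 7, .committed = true  },
            { .block = 1, .new_value = 9, .committed = true  },
            { .block = 2, .new_value = 5, .committed = false },
        };
        replay(log, 3);
        return 0;
    }

Because replaying a committed record simply rewrites the same value, the replay is safe to repeat if recovery is itself interrupted.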
