This month, we mark the first anniversary of the "Serving With Linux" column.
In those 12 months, Linux has carved out a firm place as one of the top server operating systems. As we saw in May's column, clusters are one important piece of the puzzle in making Linux a complete offering for enterprise servers. The other important piece is the journaling file system.
Why are journaling file systems important? How do they work? Which journaling file systems exist for Linux?
Journaling file systems are safer than traditional file systems because they track changes to the disk's contents in a separate log. Each change is either committed or rolled back as a transaction, much like in an RDBMS.
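To make the idea concrete, here is a minimal sketch of the transactional write path in Python. Everything in it is hypothetical and purely illustrative (the journal file name, the record format, the helper function); it is not how any real journaling file system lays out its log, but it shows the essential ordering: describe the change in the journal first, apply it second, mark it committed last.

```python
import json
import os

JOURNAL = "journal.log"   # hypothetical journal file, separate from the data

def journaled_write(path, offset, data):
    """Write 'data' into 'path' at 'offset' as one transaction."""
    # 1. Describe the intended change in the journal, and force it to disk
    #    *before* touching the real file.
    record = {"path": path, "offset": offset, "data": data, "committed": False}
    with open(JOURNAL, "a") as j:
        j.write(json.dumps(record) + "\n")
        j.flush()
        os.fsync(j.fileno())        # the intent record must hit the disk first

    # 2. Apply the change in place (assumes 'path' already exists).
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data.encode())
        f.flush()
        os.fsync(f.fileno())

    # 3. Mark the transaction as committed.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"path": path, "offset": offset,
                            "committed": True}) + "\n")
        j.flush()
        os.fsync(j.fileno())
```

A crash at any point leaves the disk recoverable: before step 2 the change simply never happened; between steps 2 and 3 the intent record survives and the change can be replayed. The file system is never caught half-way with no record of what it was doing.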
Ext2 Is Just Not Enough
Every Linux distribution uses the ext2 file system by default at installation, although Linux in its various versions has long supported many other file systems as well. Among them are FAT, VFAT, HPFS (OS/2), the NT file system (NTFS), Sun's UFS, and many more.
The designers of ext2 were primarily concerned with efficiency and performance. Ext2 does not write a file's meta-data (information about the file, such as permissions, ownership and creation/access times) synchronously with the file's content. In other words, Linux writes the file's content first and only later, when it finds the time, the file's meta-data. If a power failure occurs after the file's content has been updated but before its meta-data has, the file system may be left inconsistent. This is a fine example of trading file system safety for performance.
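An application that cannot tolerate this window can narrow it itself. On Linux, the fsync() system call blocks until both a file's data and its meta-data have reached the disk; the short sketch below (the file name is hypothetical) shows the pattern, at an obvious cost in throughput.

```python
import os

# Without fsync(), ext2 is free to write the data blocks now and the
# meta-data (size, timestamps) whenever it gets around to it.
fd = os.open("balance.db", os.O_WRONLY | os.O_CREAT, 0o644)  # hypothetical file
try:
    os.write(fd, b"new account balance\n")
    os.fsync(fd)   # returns only once data AND meta-data are on disk
finally:
    os.close(fd)
```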
In environments with a high volume of file system operations (a free Web-based e-mail service such as Hotmail, for instance), this is a rather big risk, and exactly the kind of risk a journaling file system addresses.
Imagine now that you are updating a directory. You've just modified 23 file entries in the fifth block of some giant directory. Just as the disk is in the middle of writing this block, there is a power outage; the block is now incomplete, and therefore corrupt.
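The hazard is easy to simulate. In this toy model (all names are hypothetical), a "disk" writes a block one 512-byte sector at a time, and a crash in the middle leaves a block that is neither the old version nor the new one:

```python
SECTOR = 512

class TornWriteDisk:
    """A toy disk that can lose power in the middle of a block write."""
    def __init__(self, block_size=4096):
        self.block = bytearray(block_size)   # one directory block, initially old

    def write_block(self, new_block, crash_after_sector=None):
        # Real disks commit a block as a series of smaller sectors;
        # power can fail between any two of them.
        for i in range(0, len(new_block), SECTOR):
            if crash_after_sector is not None and i // SECTOR == crash_after_sector:
                raise RuntimeError("power failure mid-write")
            self.block[i:i + SECTOR] = new_block[i:i + SECTOR]

disk = TornWriteDisk()
old = bytes(disk.block)
new = bytes([0xAB]) * 4096               # the updated directory block
try:
    disk.write_block(new, crash_after_sector=4)
except RuntimeError:
    pass
# The block now begins with new data and ends with old data: corrupt.
assert bytes(disk.block) != old and bytes(disk.block) != new
```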
During reboot, Linux (like all Unix systems) runs a program called fsck (file system check) that steps through the entire file system, validating all entries and making sure that blocks are allocated and referenced correctly. It will find the corrupted directory block and attempt to repair it, but there is no certainty that it will succeed; quite often it does not. In a situation like the one described above, all of the directory's entries can be lost.
For large file systems, fsck can also take a very long time. On a machine with many gigabytes of files, it can run for 10 or more hours. During this time the system is obviously unusable, and for many shops that is an unacceptable amount of downtime. This is where journaling file systems help.
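A journaled volume recovers differently: at mount time the system reads only the journal, which is small and bounded, replays committed work and discards half-finished work. Recovery time is proportional to the journal, not to the disk. The sketch below reuses the hypothetical record format from the earlier journaled_write() example and, like it, is only an illustration of the principle:

```python
import json

def recover(journal_path):
    """Replay the journal after a crash: O(journal size), not O(disk size)."""
    pending = {}    # (path, offset) -> data, still awaiting a commit mark
    with open(journal_path) as j:
        for line in j:
            rec = json.loads(line)
            key = (rec["path"], rec["offset"])
            if rec["committed"]:
                pending.pop(key, None)        # change fully made it to disk
            else:
                pending[key] = rec["data"]    # may need replaying

    # Anything still pending was interrupted; replay it now. Replaying a
    # change that already reached the disk is harmless (it is idempotent),
    # so recovery never has to guess how far the crash got.
    for (path, offset), data in pending.items():
        with open(path, "r+b") as f:        # assumes the file exists
            f.seek(offset)
            f.write(data.encode())
```

Instead of scanning every block on a multi-gigabyte volume, recovery walks a few megabytes of log entries, and the machine is back in service in seconds.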