8.6 Code: logging

A typical use of the log in a system call looks like this:


    begin_op();
    ...
    bp = bread(...);
    bp->data[...] = ...;
    log_write(bp);
    ...
    end_op();

begin_op (kernel/log.c:127) waits until the logging system is not currently committing, and until there is enough unreserved log space to hold the writes from this call. log.outstanding counts the number of system calls that have reserved log space; the total reserved space is log.outstanding times MAXOPBLOCKS. Incrementing log.outstanding both reserves space and prevents a commit from occurring during this system call. The code conservatively assumes that each system call might write up to MAXOPBLOCKS distinct blocks.
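The admission test described above can be sketched in user space as follows. This is a simplified model, not the kernel code: `struct log_state`, `try_begin_op`, and the constant values are assumptions made for illustration, and where the kernel would sleep and retry, the sketch simply returns false.

```c
#include <stdbool.h>

/* Illustrative values only; the kernel defines its own constants. */
#define MAXOPBLOCKS 10               /* assumed max blocks one call may write */
#define LOGSIZE (3 * MAXOPBLOCKS)    /* assumed number of data blocks in the log */

/* Hypothetical, simplified view of the in-memory log state. */
struct log_state {
  int committing;   /* nonzero while a commit is in progress */
  int outstanding;  /* system calls that have reserved log space */
  int n;            /* blocks already recorded in the current transaction */
};

/* Returns true and reserves space if a new operation may start now.
   Returns false in the two cases where the kernel would wait. */
bool try_begin_op(struct log_state *lg)
{
  if (lg->committing)
    return false;                      /* a commit is in progress */
  if (lg->n + (lg->outstanding + 1) * MAXOPBLOCKS > LOGSIZE)
    return false;                      /* this call might exhaust the log */
  lg->outstanding++;                   /* reserve space, hold off commit */
  return true;
}
```

With these assumed values, three operations can be outstanding at once; a fourth must wait until an earlier one calls end_op.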

log_write (kernel/log.c:215) acts as a proxy for bwrite. It records the block’s sector number in memory, reserving it a slot in the log on disk, and pins the buffer in the block cache to prevent the block cache from evicting it. The block must stay in the cache until committed: until then, the cached copy is the only record of the modification; it cannot be written to its place on disk until after commit; and other reads in the same transaction must see the modifications. log_write notices when a block is written multiple times during a single transaction, and allocates that block the same slot in the log. This optimization is often called absorption. It is common that, for example, the disk block containing inodes of several files is written several times within a transaction. By absorbing several disk writes into one, the file system can save log space and can achieve better performance because only one copy of the disk block must be written to disk.
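Absorption amounts to a lookup before allocation. The sketch below models only the slot bookkeeping; `struct log_header`, `log_slot`, and the value of `LOGSIZE` are assumed names and values for illustration, and pinning the buffer in the cache is omitted.

```c
#define LOGSIZE 30  /* assumed log capacity in blocks, for illustration */

/* Hypothetical in-memory header: which home block each log slot holds. */
struct log_header {
  int n;               /* number of slots in use */
  int block[LOGSIZE];  /* block[i] = home block number stored in slot i */
};

/* Returns the log slot assigned to blockno, or -1 if the log is full.
   A block written earlier in the transaction gets its old slot back. */
int log_slot(struct log_header *lh, int blockno)
{
  int i;
  for (i = 0; i < lh->n; i++)
    if (lh->block[i] == blockno)
      return i;                 /* absorption: reuse the existing slot */
  if (lh->n >= LOGSIZE)
    return -1;                  /* transaction too big for the log */
  lh->block[lh->n] = blockno;   /* first write: take the next free slot */
  return lh->n++;
}
```

Writing block 33, then block 70, then block 33 again uses two slots rather than three, and the second write to block 33 lands in slot 0.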

end_op (kernel/log.c:147) first decrements the count of outstanding system calls. If the count is now zero, it commits the current transaction by calling commit(). There are four stages in this process. write_log() (kernel/log.c:179) copies each block modified in the transaction from the buffer cache to its slot in the log on disk. write_head() (kernel/log.c:103) writes the header block to disk: this is the commit point, and a crash after the write will result in recovery replaying the transaction’s writes from the log. install_trans (kernel/log.c:69) reads each block from the log and writes it to the proper place in the file system. Finally, commit() writes the log header again, this time with a count of zero; this has to happen before the next transaction starts writing logged blocks, so that a crash doesn’t result in recovery using one transaction’s header with the subsequent transaction’s logged blocks.
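The four stages can be modeled on a simulated disk. Everything in this sketch is an assumption made for illustration: the arrays `disk` and `cache`, the separate `disk_head` standing in for the on-disk header block, the tiny sizes, and the name `sim_commit`. It shows only the order of the writes, not locking or real I/O.

```c
#include <string.h>

#define BSIZE    16   /* tiny block size, for illustration only */
#define NBLOCKS  64   /* size of the simulated disk */
#define LOGSTART 1    /* assumed position of the header block */
#define LOGSIZE  8    /* assumed number of log data blocks */

struct log_header { int n; int block[LOGSIZE]; };

char disk[NBLOCKS][BSIZE];     /* simulated disk contents */
char cache[NBLOCKS][BSIZE];    /* simulated buffer cache, indexed by block */
struct log_header disk_head;   /* stands in for the on-disk header block */

/* Commits the transaction described by the in-memory header lh. */
void sim_commit(struct log_header *lh)
{
  int i;
  if (lh->n == 0)
    return;                    /* nothing was logged */
  /* Stage 1: copy each modified block from the cache into its log slot. */
  for (i = 0; i < lh->n; i++)
    memcpy(disk[LOGSTART + 1 + i], cache[lh->block[i]], BSIZE);
  /* Stage 2: write the header; this is the commit point. */
  disk_head = *lh;
  /* Stage 3: install each logged block at its home location. */
  for (i = 0; i < lh->n; i++)
    memcpy(disk[lh->block[i]], disk[LOGSTART + 1 + i], BSIZE);
  /* Stage 4: erase the transaction by writing a header with count zero. */
  lh->n = 0;
  disk_head = *lh;
}
```

A crash before stage 2 leaves `disk_head.n` at zero, so the partially written log is ignored; a crash after stage 2 leaves a nonzero count, so the log is replayed.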

recover_from_log (kernel/log.c:117) is called from initlog (kernel/log.c:55), which is called from fsinit (kernel/fs.c:42) during boot before the first user process runs (kernel/proc.c:535). It reads the log header, and mimics the actions of end_op if the header indicates that the log contains a committed transaction.
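Recovery can be modeled the same way: read the header, replay whatever it describes, then clear it. The simulated `disk`, the separate `disk_head`, the sizes, and the name `sim_recover` are all assumptions for illustration rather than the kernel's definitions.

```c
#include <string.h>

#define BSIZE    16   /* tiny block size, for illustration only */
#define NBLOCKS  64   /* size of the simulated disk */
#define LOGSTART 1    /* assumed position of the header block */
#define LOGSIZE  8    /* assumed number of log data blocks */

struct log_header { int n; int block[LOGSIZE]; };

char disk[NBLOCKS][BSIZE];     /* simulated disk contents */
struct log_header disk_head;   /* stands in for the on-disk header block */

/* Replays a committed transaction, if any. Returns the number of
   blocks copied from the log to their home locations. */
int sim_recover(void)
{
  struct log_header lh = disk_head;   /* read the header from "disk" */
  int i;
  for (i = 0; i < lh.n; i++)          /* a count of zero means no work */
    memcpy(disk[lh.block[i]], disk[LOGSTART + 1 + i], BSIZE);
  disk_head.n = 0;                    /* clear the log */
  return lh.n;
}
```

Replaying is idempotent: if the machine crashes again partway through recovery, the header still holds the same count, and the next boot copies the same blocks again.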

An example use of the log occurs in filewrite (kernel/file.c:135). The transaction looks like this:


    begin_op();
    ilock(f->ip);
    r = writei(f->ip, ...);
    iunlock(f->ip);
    end_op();

This code is wrapped in a loop that breaks up large writes into individual transactions of just a few sectors at a time, to avoid overflowing the log. The call to writei writes many blocks as part of this transaction: the file’s inode, one or more bitmap blocks, and some data blocks.
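The splitting loop can be sketched as below. The limit `MAX_TXN_BYTES` is a hypothetical value chosen for illustration (the kernel derives its own limit from the log size), and `split_write` is an assumed name; instead of performing I/O, it records the length that each transaction would pass to writei.

```c
/* Hypothetical per-transaction byte limit, for illustration only. */
#define MAX_TXN_BYTES 3072

/* Splits a write of n bytes into per-transaction chunks. Stores each
   chunk length in lens[] (up to max_chunks entries) and returns the
   number of transactions used. */
int split_write(int n, int lens[], int max_chunks)
{
  int i = 0;       /* bytes handled so far */
  int count = 0;   /* transactions issued so far */
  while (i < n && count < max_chunks) {
    int n1 = n - i;
    if (n1 > MAX_TXN_BYTES)
      n1 = MAX_TXN_BYTES;   /* cap this transaction's size */
    /* The kernel would run begin_op, ilock, writei, iunlock, end_op
       here, writing n1 bytes starting at offset i. */
    lens[count++] = n1;
    i += n1;
  }
  return count;
}
```

Under the assumed limit, a 7000-byte write becomes three transactions of 3072, 3072, and 856 bytes, each of which commits independently.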