8.6 Code: logging
A typical use of the log in a system call looks like this:
begin_op();
...
bp = bread(...);
bp->data[...] = ...;
log_write(bp);
...
end_op();
begin_op (kernel/log.c:127) waits until the logging system is not currently committing, and until there is enough unreserved log space to hold the writes from this call. log.outstanding counts the number of system calls that have reserved log space; the total reserved space is log.outstanding times MAXOPBLOCKS. Incrementing log.outstanding both reserves space and prevents a commit from occurring during this system call. The code conservatively assumes that each system call might write up to MAXOPBLOCKS distinct blocks.
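The admission rule in begin_op can be modeled as a small pure function. This is a minimal single-threaded sketch, not the kernel's code: struct logstate, can_begin, and try_begin_op are hypothetical names, the values of MAXOPBLOCKS and LOGSIZE are assumed for illustration, and where the real begin_op would sleep until the condition changes, this model simply returns 0.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define MAXOPBLOCKS 10
#define LOGSIZE (MAXOPBLOCKS * 3)

/* Hypothetical model of the in-memory log state that begin_op examines. */
struct logstate {
  int committing;   /* nonzero while a commit is in progress */
  int outstanding;  /* system calls between begin_op and end_op */
  int n;            /* blocks already recorded in the log header */
};

/* Returns 1 if a new system call may start now, 0 if it would have to wait. */
int can_begin(const struct logstate *log)
{
  assert(log->outstanding >= 0);
  if (log->committing)
    return 0;
  /* Reserve MAXOPBLOCKS for each outstanding call plus this new one. */
  if (log->n + (log->outstanding + 1) * MAXOPBLOCKS > LOGSIZE)
    return 0;
  return 1;
}

/* Non-blocking stand-in for begin_op: admit the call or report failure. */
int try_begin_op(struct logstate *log)
{
  if (!can_begin(log))
    return 0;
  log->outstanding += 1;  /* reserves log space and holds off commit */
  return 1;
}
```

With these assumed constants, three concurrent system calls fit (3 × 10 = 30 blocks reserved), and a fourth is refused until one of them finishes.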
log_write (kernel/log.c:215) acts as a proxy for bwrite. It records the block’s sector number in memory, reserving it a slot in the log on disk, and pins the buffer in the block cache to prevent the block cache from evicting it. The block must stay in the cache until committed: until then, the cached copy is the only record of the modification; it cannot be written to its place on disk until after commit; and other reads in the same transaction must see the modifications.
log_write notices when a block is written multiple times during a single transaction and allocates that block the same slot in the log. This optimization is often called absorption. For example, the disk block containing the inodes of several files is commonly written several times within one transaction. By absorbing several disk writes into one, the file system can save log space and can achieve better performance, because only one copy of the disk block must be written to disk.
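The bookkeeping that log_write performs, including absorption, can be sketched as follows. This is an illustrative model under stated assumptions: struct logheader, log_write_model, and the pins array (a stand-in for pinning a buffer in the block cache) are hypothetical, the LOGSIZE and NBLOCKS values are assumed, and the sketch tracks only block numbers, not buffer contents.

```c
#include <assert.h>

#define LOGSIZE 30    /* assumed number of log slots */
#define NBLOCKS 1024  /* assumed number of disk blocks */

/* Hypothetical in-memory log header: which home block each slot holds. */
struct logheader {
  int n;
  int block[LOGSIZE];
};

/* Hypothetical stand-in for pinning: pin count per block number. */
static int pins[NBLOCKS];

/* Records blockno in the header and returns the slot it occupies. */
int log_write_model(struct logheader *lh, int blockno)
{
  int i;
  assert(blockno >= 0 && blockno < NBLOCKS);
  for (i = 0; i < lh->n; i++) {
    if (lh->block[i] == blockno)  /* absorption: reuse the existing slot */
      break;
  }
  if (i == lh->n) {               /* first write of this block in the transaction */
    assert(lh->n < LOGSIZE);      /* transaction too big for the log */
    lh->block[i] = blockno;
    lh->n++;
    pins[blockno]++;              /* keep the cached buffer from being evicted */
  }
  return i;
}
```

Writing blocks 5, 7, and then 5 again uses only two log slots, and block 5 is pinned only once, because the second write is absorbed into its existing slot.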
end_op (kernel/log.c:147) first decrements the count of outstanding system calls. If the count is now zero, it commits the current transaction by calling commit().
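The decrement-then-commit logic of end_op can be modeled like this. It is a single-threaded sketch with hypothetical names (end_op_model, commit_model, and a commits counter that exists only to observe when a commit happens); a real kernel would also need the log's lock and would wake up system calls waiting in begin_op, both of which this model omits.

```c
#include <assert.h>

/* Hypothetical model of the log state touched by end_op. */
struct logstate {
  int outstanding;  /* system calls still inside the transaction */
  int committing;   /* nonzero while commit runs */
  int commits;      /* counts commits, only so the model can be observed */
};

/* Placeholder for write_log, write_head, install_trans, and clearing the header. */
static void commit_model(struct logstate *log)
{
  log->commits++;
}

void end_op_model(struct logstate *log)
{
  assert(log->outstanding > 0);
  assert(!log->committing);
  log->outstanding -= 1;
  if (log->outstanding == 0) {
    log->committing = 1;  /* new callers of begin_op must wait */
    commit_model(log);
    log->committing = 0;
  }
}
```

Only the last of several concurrent system calls to finish triggers the commit, so one commit covers the writes of all of them.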
There are four stages in this process. write_log() (kernel/log.c:179) copies each block modified in the transaction from the buffer cache to its slot in the log on disk. write_head() (kernel/log.c:103) writes the header block to disk: this is the commit point, and a crash after the write will result in recovery replaying the transaction’s writes from the log. install_trans (kernel/log.c:69) reads each block from the log and writes it to the proper place in the file system. Finally, end_op writes the log header with a count of zero; this has to happen before the next transaction starts writing logged blocks, so that a crash doesn’t result in recovery using one transaction’s header with the subsequent transaction’s logged blocks.
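The ordering of the four commit stages is what makes a transaction atomic, and a toy model with crash injection makes that visible. In this sketch the "disk" is a hypothetical struct in which one int stands in for a whole block, the budget parameter is an assumed mechanism that stops commit_model after a given number of disk writes to simulate a crash, and the header update is treated as a single atomic write; none of these names come from the kernel.

```c
#include <assert.h>

#define NDATA 8    /* assumed number of home blocks */
#define LOGSIZE 4  /* assumed number of log slots */

/* Hypothetical on-disk state; one int stands in for one block. */
struct disk {
  int home[NDATA];         /* blocks at their normal locations */
  int hdr_n;               /* log header: count of committed blocks */
  int hdr_block[LOGSIZE];  /* log header: home block number per slot */
  int logblk[LOGSIZE];     /* contents of the log slots */
};

/* Hypothetical in-memory transaction: modified blocks in the buffer cache. */
struct txn {
  int n;
  int block[LOGSIZE];
  int data[LOGSIZE];
};

/* Runs the four commit stages but performs at most `budget` disk writes;
   returning early simulates a crash at that point. */
void commit_model(struct disk *d, const struct txn *t, int budget)
{
  int i;
  assert(t->n >= 0 && t->n <= LOGSIZE);
  /* 1. write_log: copy each modified block into its log slot. */
  for (i = 0; i < t->n; i++) {
    if (budget-- == 0) return;
    d->logblk[i] = t->data[i];
  }
  /* 2. write_head: one atomic header write, the commit point. */
  if (budget-- == 0) return;
  d->hdr_n = t->n;
  for (i = 0; i < t->n; i++) {
    assert(t->block[i] >= 0 && t->block[i] < NDATA);
    d->hdr_block[i] = t->block[i];
  }
  /* 3. install_trans: copy each logged block to its home location. */
  for (i = 0; i < t->n; i++) {
    if (budget-- == 0) return;
    d->home[d->hdr_block[i]] = d->logblk[i];
  }
  /* 4. Write the header with a count of zero, erasing the transaction. */
  if (budget-- == 0) return;
  d->hdr_n = 0;
}
```

A crash before the header write leaves a header count of zero, so recovery would ignore the half-written log and the home blocks keep their old contents. A crash after the header write but before the final zero-count write leaves a nonzero count, so recovery would finish installing every block.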
recover_from_log (kernel/log.c:117) is called from initlog (kernel/log.c:55), which is called from fsinit (kernel/fs.c:42) during boot before the first user process runs (kernel/proc.c:535). It reads the log header and mimics the actions of end_op if the header indicates that the log contains a committed transaction.
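Recovery can be sketched against the same kind of toy disk. This minimal model is an illustration under assumptions: struct disk and recover_model are hypothetical names, the NDATA and LOGSIZE values are assumed, and one int again stands in for one block. The point it shows is that the header count is cleared only after every logged block has been copied home.

```c
#include <assert.h>

#define NDATA 8    /* assumed number of home blocks */
#define LOGSIZE 4  /* assumed number of log slots */

/* Hypothetical on-disk state; one int stands in for one block. */
struct disk {
  int home[NDATA];
  int hdr_n;
  int hdr_block[LOGSIZE];
  int logblk[LOGSIZE];
};

/* Replays a committed transaction, if the header records one. */
void recover_model(struct disk *d)
{
  int i;
  assert(d->hdr_n >= 0 && d->hdr_n <= LOGSIZE);
  /* A count of zero means no committed transaction: nothing to replay. */
  for (i = 0; i < d->hdr_n; i++) {
    assert(d->hdr_block[i] >= 0 && d->hdr_block[i] < NDATA);
    d->home[d->hdr_block[i]] = d->logblk[i];  /* like install_trans */
  }
  d->hdr_n = 0;  /* like the final zero-count header write */
}
```

If the machine crashes partway through recovery, the header still records the transaction, so the next boot performs the same copies again; repeating them produces the same result, which is why an interrupted replay is harmless. Log slots left over from an uncommitted transaction are ignored because the count is zero.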
An example use of the log occurs in filewrite (kernel/file.c:135). The transaction looks like this:
begin_op();
ilock(f->ip);
r = writei(f->ip, ...);
iunlock(f->ip);
end_op();
This code is wrapped in a loop that breaks up large writes into individual transactions of just a few sectors at a time, to avoid overflowing the log. The call to writei writes many blocks as part of this transaction: the file’s inode, one or more bitmap blocks, and some data blocks.
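The chunking loop around the filewrite transaction can be modeled without any file system at all. In this sketch, split_write is a hypothetical helper, BSIZE and MAX_CHUNK are assumed values, and the comment inside the loop marks where the real begin_op, writei, and end_op calls would go. A real limit would have to leave room within MAXOPBLOCKS for the inode, bitmap, and any indirect blocks a chunk touches, not just its data blocks.

```c
#include <assert.h>

#define BSIZE 1024             /* assumed block size in bytes */
#define MAX_CHUNK (3 * BSIZE)  /* hypothetical per-transaction byte limit */

/* Splits a write of n bytes into per-transaction chunks, storing each
   chunk's size in chunks[] and returning the number of transactions. */
int split_write(int n, int chunks[], int maxchunks)
{
  int i = 0, ntx = 0;
  while (i < n) {
    int n1 = n - i;
    if (n1 > MAX_CHUNK)
      n1 = MAX_CHUNK;
    assert(ntx < maxchunks);
    /* begin_op(); ilock(...); writei(..., n1); iunlock(...); end_op(); */
    chunks[ntx++] = n1;
    i += n1;
  }
  return ntx;
}
```

With these assumed constants, a 7000-byte write becomes three transactions of 3072, 3072, and 856 bytes, each small enough to fit in the log on its own.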