8.9 Code: Inodes
To allocate a new inode (for example, when creating a file),
xv6 calls
ialloc
(kernel/fs.c:199).
ialloc
is similar to
balloc
:
it loops over the inode structures on the disk, one block at a time,
looking for one that is marked free.
When it finds one, it claims it by writing the new
type
to the disk and then returns an entry from the inode table
with the tail call to
iget
(kernel/fs.c:213).
The correct operation of
ialloc
depends on the fact that only one process at a time
can be holding a reference to
bp
:
ialloc
can be sure that some other process does not
simultaneously see that the inode is available
and try to claim it.
iget
(kernel/fs.c:247)
looks through the inode table for an active entry (ip->ref
>
0
)
with the desired device and inode number.
If it finds one, it returns a new reference to that inode
(kernel/fs.c:256-260).
As
iget
scans, it records the position of the first empty slot
(kernel/fs.c:261-262),
which it uses if it needs to allocate a table entry.
Code must lock the inode using
ilock
before reading or writing its metadata or content.
ilock
(kernel/fs.c:293)
uses a sleep-lock for this purpose.
Once
ilock
has exclusive access to the inode, it reads the inode
from disk (more likely, the buffer cache) if needed.
The function
iunlock
(kernel/fs.c:321)
releases the sleep-lock,
which may cause any processes sleeping
to be woken up.
iput
(kernel/fs.c:337)
releases a C pointer to an inode
by decrementing the reference count
(kernel/fs.c:360).
If this is the last reference, the inode’s
slot in the inode table is now free and can be re-used
for a different inode.
If
iput
sees that there are no C pointer references to an inode
and that the inode has no links to it (occurs in no
directory), then the inode and its data blocks must
be freed.
iput
calls
itrunc
to truncate the file to zero bytes, freeing the data blocks;
sets the inode type to 0 (unallocated);
and writes the inode to disk
(kernel/fs.c:342).
The locking protocol in
iput
in the case in which it frees the inode deserves a closer look.
One danger is that a concurrent thread might be waiting in
ilock
to use this inode (e.g., to read a file or list a directory),
and won’t be prepared to find that the inode is no longer
allocated. This can’t happen because there is no way for
a system call to get a pointer to an in-memory inode if it has
no links to it and
ip->ref
is one. That one reference is the reference owned by the
thread calling
iput
.
The other main danger is that a concurrent call to
ialloc
might choose the same inode that
iput
is freeing.
This can happen only after the
iupdate
writes the disk so that the inode has type zero.
This race is benign; the allocating thread will politely wait
to acquire the inode’s sleep-lock before reading or writing
the inode, at which point
iput
is done with it.
iput()
can write to the disk. This means that any system call that uses the file
system may write to the disk, because the system call may be the last one having
a reference to the file. Even calls like
read()
that appear to be read-only, may end up calling
iput().
This, in turn, means that even read-only system calls
must be wrapped in transactions if they use the file system.
There is a challenging interaction between
iput()
and crashes.
iput()
doesn’t truncate a file immediately when the link count for the file
drops to zero, because some process might still hold a reference to the inode in
memory: a process might still be reading and writing to the file, because it
successfully opened it. But, if a crash happens before the last process closes
the file descriptor for the file, then the file will be marked allocated on disk
but no directory entry will point to it.
File systems handle this case in one of two ways. The simple solution is that on recovery, after reboot, the file system scans the whole file system for files that are marked allocated, but have no directory entry pointing to them. If any such file exists, then it can free those files.
The second solution doesn’t require scanning the file system. In this solution, the file system records on disk (e.g., in the super block) the inode inumber of a file whose link count drops to zero but whose reference count isn’t zero. If the file system removes the file when its reference count reaches 0, then it updates the on-disk list by removing that inode from the list. On recovery, the file system frees any file in the list.
Xv6 implements neither solution, which means that inodes may be marked allocated on disk, even though they are not in use anymore. This means that over time xv6 runs the risk that it may run out of disk space.