8.8 Inode layer
The term inode can have one of two related meanings. It might refer to the on-disk data structure containing a file’s size and list of data block numbers. Or “inode” might refer to an in-memory inode, which contains a copy of the on-disk inode as well as extra information needed within the kernel.
The on-disk inodes are packed into a contiguous area of disk called the inode blocks. Every inode is the same size, so it is easy, given a number n, to find the nth inode on the disk. In fact, this number n, called the inode number or i-number, is how inodes are identified in the implementation.
The on-disk inode is defined by a
struct dinode
(kernel/fs.h:32).
The
type
field distinguishes between files, directories, and special
files (devices).
A type of zero indicates that an on-disk inode is free.
The
nlink
field counts the number of directory entries that
refer to this inode, in order to recognize when the
on-disk inode and its data blocks should be freed.
The
size
field records the number of bytes of content in the file.
The
addrs
array records the block numbers of the disk blocks holding
the file’s content.
The kernel keeps the set of active inodes in memory
in a table called itable
;
struct inode
(kernel/file.h:17)
is the in-memory copy of a
struct
dinode
on disk.
The kernel stores an inode in memory only if there are
C pointers referring to that inode. The
ref
field counts the number of C pointers referring to the
in-memory inode, and the kernel discards the inode from
memory if the reference count drops to zero.
The
iget
and
iput
functions acquire and release pointers to an inode,
modifying the reference count.
Pointers to an inode can come from file descriptors,
current working directories, and transient kernel code
such as
exec
.
There are four lock or lock-like mechanisms in xv6’s
inode code.
itable.lock
protects the invariant that an inode is present in the inode table
at most once, and the invariant that an in-memory inode’s
ref
field counts the number of in-memory pointers to the inode.
Each in-memory inode has a
lock
field containing a
sleep-lock, which ensures exclusive access to the
inode’s fields (such as file length) as well as to the
inode’s file or directory content blocks.
An inode’s
ref
,
if it is greater than zero, causes the system to maintain
the inode in the table, and not re-use the table entry for
a different inode.
Finally, each inode contains a
nlink
field (on disk and copied in memory if in memory) that
counts the number of directory entries that refer to a file;
xv6 won’t free an inode if its link count is greater than zero.
A
struct
inode
pointer returned by
iget()
is guaranteed to be valid until the corresponding call to
iput()
;
the inode won’t be deleted, and the memory referred to
by the pointer won’t be re-used for a different inode.
iget()
provides non-exclusive access to an inode, so that
there can be many pointers to the same inode.
Many parts of the file-system code depend on this behavior of
iget()
,
both to hold long-term references to inodes (as open files
and current directories) and to prevent races while avoiding
deadlock in code that manipulates multiple inodes (such as
pathname lookup).
The
struct
inode
that
iget
returns may not have any useful content.
In order to ensure it holds a copy of the on-disk
inode, code must call
ilock
.
This locks the inode (so that no other process can
ilock
it) and reads the inode from the disk,
if it has not already been read.
iunlock
releases the lock on the inode.
Separating acquisition of inode pointers from locking
helps avoid deadlock in some situations, for example during
directory lookup.
Multiple processes can hold a C pointer to an inode
returned by
iget
,
but only one process can lock the inode at a time.
The inode table only stores inodes to which kernel code
or data structures hold C pointers.
Its main job is synchronizing access by multiple processes.
The inode table also happens to cache frequently-used inodes, but
caching is secondary; if an inode is used frequently, the buffer cache will probably
keep it in memory.
Code that modifies an in-memory inode writes it to disk with
iupdate
.