8.14 Code: System calls
With the functions that the lower layers provide, the implementation of most system calls is trivial (see kernel/sysfile.c). There are a few calls that deserve a closer look.
The functions sys_link and sys_unlink edit directories, creating or removing references to inodes. They are another good example of the power of using transactions.
sys_link (kernel/sysfile.c:124) begins by fetching its arguments, two strings old and new (kernel/sysfile.c:129). Assuming old exists and is not a directory (kernel/sysfile.c:133-136), sys_link increments its ip->nlink count. Then sys_link calls nameiparent to find the parent directory and final path element of new (kernel/sysfile.c:149) and creates a new directory entry pointing at old’s inode (kernel/sysfile.c:152). The new parent directory must exist and be on the same device as the existing inode: inode numbers only have a unique meaning on a single disk. If an error like this occurs, sys_link must go back and decrement ip->nlink.
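The increment-then-undo pattern described above can be sketched with a toy in-memory model. Everything here (the toy_ types, toy_dirlink, toy_link) is a hypothetical stand-in invented for illustration, not xv6's kernel code, and locking and logging are left out.

```c
#include <assert.h>
#include <string.h>

/* Toy in-memory stand-ins; NOT the kernel's types (assumption). */
enum { TOY_DIR = 1, TOY_FILE = 2, TOY_NENT = 4, TOY_NAMELEN = 14 };

struct toy_inode {
  int dev;    /* device the inode lives on */
  int inum;   /* inode number, meaningful only within dev */
  int type;   /* TOY_DIR or TOY_FILE */
  int nlink;  /* number of directory entries naming this inode */
};

struct toy_dirent {
  int inum;                      /* 0 marks a free slot */
  char name[TOY_NAMELEN + 1];
};

struct toy_dir {
  struct toy_inode ino;
  struct toy_dirent ents[TOY_NENT];
};

/* Add (name, inum) to dp; fail if the name exists or dp is full. */
static int toy_dirlink(struct toy_dir *dp, const char *name, int inum)
{
  int free_slot = -1;
  for (int i = 0; i < TOY_NENT; i++) {
    if (dp->ents[i].inum == 0) {
      if (free_slot < 0)
        free_slot = i;
    } else if (strncmp(dp->ents[i].name, name, TOY_NAMELEN) == 0) {
      return -1;
    }
  }
  if (free_slot < 0)
    return -1;
  dp->ents[free_slot].inum = inum;
  strncpy(dp->ents[free_slot].name, name, TOY_NAMELEN);
  dp->ents[free_slot].name[TOY_NAMELEN] = '\0';
  return 0;
}

/* Same order as the text: bump nlink, try to add the entry, undo on error. */
static int toy_link(struct toy_inode *ip, struct toy_dir *dp, const char *name)
{
  assert(ip != NULL && dp != NULL && name != NULL);
  if (ip->type == TOY_DIR)
    return -1;                       /* directories may not be linked */
  ip->nlink++;
  if (dp->ino.dev != ip->dev || toy_dirlink(dp, name, ip->inum) < 0) {
    ip->nlink--;                     /* go back and undo the increment */
    return -1;
  }
  return 0;
}
```

In this sketch a failed toy_link leaves nlink exactly as it found it, whether the failure comes from a device mismatch or from toy_dirlink.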
Transactions simplify the implementation: sys_link must update multiple disk blocks, but we don’t have to worry about the order in which the updates happen. Either all of them take effect or none do. For example, without transactions, updating ip->nlink before creating the link would put the file system temporarily in an unsafe state, and a crash in between could result in havoc. With transactions we don’t have to worry about this.
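To make the all-or-nothing property concrete, here is a deliberately simplified sketch. It assumes a toy log that stages writes in memory and applies them only at commit. The names (toy_disk, toy_log, toy_log_write, toy_commit, toy_crash) are hypothetical, and the sketch does not model a crash in the middle of commit, which a real log must also survive.

```c
#include <assert.h>

enum { TOY_NBLOCKS = 8, TOY_LOGSIZE = 4 };

struct toy_disk {
  int blocks[TOY_NBLOCKS];     /* pretend on-disk block contents */
};

struct toy_log {
  int n;                       /* number of staged writes */
  int blockno[TOY_LOGSIZE];
  int value[TOY_LOGSIZE];
};

/* Start a transaction with nothing staged. */
static void toy_begin(struct toy_log *lg) { lg->n = 0; }

/* Stage a write; the disk is not touched yet. */
static int toy_log_write(struct toy_log *lg, int blockno, int value)
{
  assert(lg != 0);
  if (blockno < 0 || blockno >= TOY_NBLOCKS)
    return -1;
  for (int i = 0; i < lg->n; i++) {
    if (lg->blockno[i] == blockno) {   /* already staged: overwrite */
      lg->value[i] = value;
      return 0;
    }
  }
  if (lg->n == TOY_LOGSIZE)
    return -1;                         /* transaction too large */
  lg->blockno[lg->n] = blockno;
  lg->value[lg->n] = value;
  lg->n++;
  return 0;
}

/* Commit: every staged write reaches the disk. */
static void toy_commit(struct toy_log *lg, struct toy_disk *d)
{
  for (int i = 0; i < lg->n; i++)
    d->blocks[lg->blockno[i]] = lg->value[i];
  lg->n = 0;
}

/* Simulated crash before commit: staged writes are simply lost. */
static void toy_crash(struct toy_log *lg) { lg->n = 0; }
```

If one staged write stands for the nlink update and another for the new directory entry, a toy_crash before toy_commit leaves the disk with neither, and a completed toy_commit leaves it with both, regardless of the order in which they were staged.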
sys_link creates a new name for an existing inode. The function create (kernel/sysfile.c:246) creates a new name for a new inode. It is a generalization of the three file creation system calls: open with the O_CREATE flag makes a new ordinary file, mkdir makes a new directory, and mknod makes a new device file.
Like sys_link, create starts by calling nameiparent to get the inode of the parent directory. It then calls dirlookup to check whether the name already exists (kernel/sysfile.c:256). If the name does exist, create’s behavior depends on which system call it is being used for: open has different semantics from mkdir and mknod. If create is being used on behalf of open (type == T_FILE) and the name that exists is itself a regular file, then open treats that as a success, so create does too (kernel/sysfile.c:260). Otherwise, it is an error (kernel/sysfile.c:261-262).
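The existing-name rule can be written as a small predicate. This is a sketch of the rule as stated in the text only. The function name and the TOY_ constants are hypothetical, and the kernel's actual test at the cited lines may accept further cases.

```c
#include <assert.h>

/* Hypothetical type tags for illustration (assumption). */
enum { TOY_T_DIR = 1, TOY_T_FILE = 2, TOY_T_DEVICE = 3 };

/*
 * Returns 1 when create should succeed by handing back the existing
 * inode, and 0 when an already existing name is an error.
 */
static int toy_existing_name_ok(int requested_type, int existing_type)
{
  assert(requested_type >= TOY_T_DIR && requested_type <= TOY_T_DEVICE);
  /* Only open(O_CREATE) of a name that is already a regular file succeeds. */
  return requested_type == TOY_T_FILE && existing_type == TOY_T_FILE;
}
```

Under this rule mkdir and mknod always fail on an existing name, while open with O_CREATE fails only when the existing name is not a regular file.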
If the name does not already exist, create now allocates a new inode with ialloc (kernel/sysfile.c:265). If the new inode is a directory, create initializes it with . and .. entries. Finally, now that the data is initialized properly, create can link it into the parent directory (kernel/sysfile.c:278).
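The ordering in this paragraph (allocate, initialize, and only then link into the parent) can be sketched with a toy inode table. All names below are hypothetical, inode numbers are just array indices, and locking, logging, and link counts for parent directories are omitted.

```c
#include <assert.h>
#include <string.h>

enum { TOY_T_DIR = 1, TOY_T_FILE = 2,
       TOY_NINODE = 8, TOY_NENT = 4, TOY_NAMELEN = 14 };

struct toy_dirent { int inum; char name[TOY_NAMELEN + 1]; };

struct toy_inode {
  int type;                          /* 0 means the slot is free */
  int nlink;
  struct toy_dirent ents[TOY_NENT];  /* used only by directories */
};

/* Index is the inode number; slot 0 is never handed out. */
static struct toy_inode toy_itable[TOY_NINODE];

/* Find a free slot, clear it, and mark it with the given type. */
static int toy_ialloc(int type)
{
  for (int inum = 1; inum < TOY_NINODE; inum++) {
    if (toy_itable[inum].type == 0) {
      memset(&toy_itable[inum], 0, sizeof toy_itable[inum]);
      toy_itable[inum].type = type;
      return inum;
    }
  }
  return 0;
}

/* Return the inode number bound to name in directory dir, or 0. */
static int toy_dirlookup(int dir, const char *name)
{
  for (int i = 0; i < TOY_NENT; i++) {
    struct toy_dirent *de = &toy_itable[dir].ents[i];
    if (de->inum != 0 && strncmp(de->name, name, TOY_NAMELEN) == 0)
      return de->inum;
  }
  return 0;
}

/* Bind name to inum in directory dir; fail on duplicates or a full dir. */
static int toy_dirlink(int dir, const char *name, int inum)
{
  if (toy_dirlookup(dir, name) != 0)
    return -1;
  for (int i = 0; i < TOY_NENT; i++) {
    struct toy_dirent *de = &toy_itable[dir].ents[i];
    if (de->inum == 0) {
      de->inum = inum;
      strncpy(de->name, name, TOY_NAMELEN);
      de->name[TOY_NAMELEN] = '\0';
      return 0;
    }
  }
  return -1;
}

/* Returns the new inode number, or 0 on failure. */
static int toy_create(int parent, const char *name, int type)
{
  assert(toy_itable[parent].type == TOY_T_DIR);
  if (toy_dirlookup(parent, name) != 0)
    return 0;                        /* existing-name handling omitted */
  int inum = toy_ialloc(type);
  if (inum == 0)
    return 0;
  toy_itable[inum].nlink = 1;
  if (type == TOY_T_DIR &&
      (toy_dirlink(inum, ".", inum) < 0 ||
       toy_dirlink(inum, "..", parent) < 0))
    goto fail;
  /* Only a fully initialized inode becomes visible in the parent. */
  if (toy_dirlink(parent, name, inum) < 0)
    goto fail;
  return inum;
fail:
  toy_itable[inum].type = 0;         /* release the slot */
  return 0;
}
```

Because the parent entry is written last, a lookup in the parent can never find a directory that lacks its . and .. entries.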
create, like sys_link, holds two inode locks simultaneously: ip and dp. There is no possibility of deadlock because the inode ip is freshly allocated: no other process in the system will hold ip’s lock and then try to lock dp.
Using create, it is easy to implement sys_open, sys_mkdir, and sys_mknod. sys_open (kernel/sysfile.c:305) is the most complex, because creating a new file is only a small part of what it can do. If open is passed the O_CREATE flag, it calls create (kernel/sysfile.c:320). Otherwise, it calls namei (kernel/sysfile.c:326). create returns a locked inode, but namei does not, so sys_open must lock the inode itself. This provides a convenient place to check that directories are only opened for reading, not writing. Assuming the inode was obtained one way or the other, sys_open allocates a file and a file descriptor (kernel/sysfile.c:344) and then fills in the file (kernel/sysfile.c:356-361). Note that no other process can access the partially initialized file since it is only in the current process’s table.
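The directory check and the step that fills in the file can be sketched as follows. The struct, the function name, and the flag values are assumptions made for this illustration. The kernel's file structure has more fields and its code differs in detail.

```c
#include <assert.h>

/* Hypothetical type tags and open-mode flags (assumptions). */
enum { TOY_T_DIR = 1, TOY_T_FILE = 2 };
enum { TOY_O_RDONLY = 0x0, TOY_O_WRONLY = 0x1, TOY_O_RDWR = 0x2 };

struct toy_file {
  int readable;
  int writable;
  int inum;      /* which inode this open file refers to */
  int off;       /* current read/write offset */
};

/*
 * Fill in *f for an inode of the given type opened with omode.
 * Returns -1 if a directory is opened for anything but reading.
 */
static int toy_open_fill(struct toy_file *f, int inum, int type, int omode)
{
  assert(f != 0);
  if (type == TOY_T_DIR && omode != TOY_O_RDONLY)
    return -1;
  f->inum = inum;
  f->off = 0;
  f->readable = !(omode & TOY_O_WRONLY);
  f->writable = (omode & TOY_O_WRONLY) || (omode & TOY_O_RDWR);
  return 0;
}
```

In this sketch a directory can only ever yield a file with readable set and writable clear, which is the property the check is meant to guarantee.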
Chapter 7 examined the implementation of pipes before we even had a file system. The function sys_pipe connects that implementation to the file system by providing a way to create a pipe pair. Its argument is a pointer to space for two integers, where it will record the two new file descriptors. Then it allocates the pipe and installs the file descriptors.
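From user space, the interface looks like this. The sketch below uses the POSIX pipe, write, read, and close calls on a Unix-like host rather than xv6 itself, and pipe_roundtrip is a hypothetical helper written for this example. It assumes the message is small enough to fit in the pipe's buffer so the write does not block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Send msg through a fresh pipe and read it back into buf.
 * Returns the number of bytes read, or -1 on error.
 */
static int pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
  int fds[2];                 /* space for the two new file descriptors */
  size_t len = strlen(msg);
  ssize_t n;

  assert(buflen > 0 && len < buflen);
  if (pipe(fds) < 0)          /* fds[0] is the read end, fds[1] the write end */
    return -1;
  if (write(fds[1], msg, len) != (ssize_t)len) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  close(fds[1]);              /* closing the write end lets the reader see EOF */
  n = read(fds[0], buf, buflen - 1);
  close(fds[0]);
  if (n < 0)
    return -1;
  buf[n] = '\0';
  return (int)n;
}
```

The two-integer array passed to pipe plays the role of the argument described above: the call records the two new descriptors in it.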