1.4 File system

The xv6 file system provides data files, which contain uninterpreted byte arrays, and directories, which contain named references to data files and other directories. The directories form a tree, starting at a special directory called the root. A path like /a/b/c refers to the file or directory named c inside the directory named b inside the directory named a in the root directory /. Paths that don’t begin with / are evaluated relative to the calling process’s current directory, which can be changed with the chdir system call. Both these code fragments open the same file (assuming all the directories involved exist):


    chdir("/a");
    chdir("b");
    open("c", O_RDONLY);

    open("/a/b/c", O_RDONLY);

The first fragment changes the process’s current directory to /a/b; the second neither refers to nor changes the process’s current directory.
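One way to check this claim on a POSIX host (not xv6 itself) is to open the same file through a relative path and through an absolute path, then compare the device and inode numbers that fstat reports. The sketch below assumes the POSIX field names st_dev and st_ino; the helper names same_file and relative_matches_absolute are made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns 1 if the two open descriptors refer to the same inode on the
// same device, 0 if they do not, and -1 if fstat fails.
int same_file(int fd1, int fd2)
{
  struct stat s1, s2;
  if (fstat(fd1, &s1) < 0 || fstat(fd2, &s2) < 0)
    return -1;
  return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}

// Opens `name` relative to directory `dir` (after chdir) and again through
// the absolute path `abs`, then reports whether both reach the same file.
// Side effect: the process's current directory is left set to `dir`.
int relative_matches_absolute(const char *dir, const char *name,
                              const char *abs)
{
  if (chdir(dir) < 0)
    return -1;
  int rel_fd = open(name, O_RDONLY);
  int abs_fd = open(abs, O_RDONLY);
  int result = (rel_fd < 0 || abs_fd < 0) ? -1 : same_file(rel_fd, abs_fd);
  if (rel_fd >= 0)
    close(rel_fd);
  if (abs_fd >= 0)
    close(abs_fd);
  return result;
}
```

With the directories from the text in place, relative_matches_absolute("/a/b", "c", "/a/b/c") would be expected to return 1. Note that the call changes the current directory as a side effect, just as the first fragment does.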

There are system calls to create new files and directories: mkdir creates a new directory, open with the O_CREATE flag creates a new data file, and mknod creates a new device file. This example illustrates all three:


    mkdir("/dir");
    fd = open("/dir/file", O_CREATE|O_WRONLY);
    close(fd);
    mknod("/console", 1, 1);

mknod creates a special file that refers to a device. Associated with a device file are the major and minor device numbers (the two arguments to mknod), which uniquely identify a kernel device. When a process later opens a device file, the kernel diverts read and write system calls to the kernel device implementation instead of passing them to the file system.
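The diversion described above can be pictured as a table of function pointers indexed by major device number. The following is a user-space toy model, not the kernel's code: the table size, the function signatures, and the console driver that always returns "hi" are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NDEV    10   // size of the device table (value assumed for this sketch)
#define CONSOLE 1    // major number, matching mknod("/console", 1, 1)

// One entry per major device number: the functions that implement read
// and write for that device. A NULL pointer means "no such device".
struct devsw {
  int (*read)(char *dst, int n);
  int (*write)(const char *src, int n);
};

static struct devsw devsw[NDEV];

// Hypothetical console driver: every read yields the bytes "hi".
static int console_read(char *dst, int n)
{
  if (n < 0)
    return -1;
  int len = n < 2 ? n : 2;
  memcpy(dst, "hi", (size_t)len);
  return len;
}

// Hypothetical console driver: pretends every byte was written.
static int console_write(const char *src, int n)
{
  (void)src;
  return n;
}

// Registers the console driver under its major number.
void devinit(void)
{
  devsw[CONSOLE].read = console_read;
  devsw[CONSOLE].write = console_write;
}

// What a read on an open device file boils down to in this model: look up
// the major number and call that device's read function, without ever
// consulting file content on disk.
int device_read(int major, char *dst, int n)
{
  if (major < 0 || major >= NDEV || devsw[major].read == NULL)
    return -1;
  return devsw[major].read(dst, n);
}
```

After devinit(), device_read(CONSOLE, buf, sizeof buf) is served by console_read, while a read on an unregistered major number fails with -1.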

A file’s name is distinct from the file itself; the same underlying file, called an inode, can have multiple names, called links. Each link consists of an entry in a directory; the entry contains a file name and a reference to an inode. An inode holds metadata about a file, including its type (file, directory, or device), its length, the location of the file’s content on disk, and the number of links to the file.

The fstat system call retrieves information from the inode that a file descriptor refers to. It fills in a struct stat, defined in stat.h (kernel/stat.h) as:


    #define T_DIR     1   // Directory
    #define T_FILE    2   // File
    #define T_DEVICE  3   // Device

    struct stat {
      int dev;     // File system's disk device
      uint ino;    // Inode number
      short type;  // Type of file
      short nlink; // Number of links to file
      uint64 size; // Size of file in bytes
    };
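As a rough illustration of how a program might use fstat, the sketch below opens a path, fetches the inode's metadata, and maps its type onto the three xv6 type codes. It is written against POSIX rather than xv6, so it assumes the POSIX struct stat (which encodes the type in st_mode instead of a type field) and the S_ISDIR, S_ISREG, S_ISCHR, and S_ISBLK macros; the helpers inode_type and path_type are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define T_DIR     1   // Directory
#define T_FILE    2   // File
#define T_DEVICE  3   // Device

// Translates the POSIX type bits in st_mode into the xv6 type codes.
// Returns 0 for POSIX types that have no xv6 counterpart (pipes, sockets).
int inode_type(const struct stat *st)
{
  if (S_ISDIR(st->st_mode))
    return T_DIR;
  if (S_ISREG(st->st_mode))
    return T_FILE;
  if (S_ISCHR(st->st_mode) || S_ISBLK(st->st_mode))
    return T_DEVICE;
  return 0;
}

// Opens path, asks the kernel for the inode's metadata through the file
// descriptor, and returns the type code, or -1 if open or fstat fails.
int path_type(const char *path)
{
  struct stat st;
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  int r = fstat(fd, &st);
  close(fd);
  return r < 0 ? -1 : inode_type(&st);
}
```

For example, path_type("/") returns T_DIR. On xv6 itself the same check is simply a comparison of st.type against T_DIR.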

The link system call creates another file system name referring to the same inode as an existing file. This fragment creates a new file named both a and b.


    open("a", O_CREATE|O_WRONLY);
    link("a", "b");

Reading from or writing to a is the same as reading from or writing to b. Each inode is identified by a unique inode number. After the code sequence above, it is possible to determine that a and b refer to the same underlying contents by inspecting the result of fstat: calling it on a descriptor opened through either name returns the same inode number (ino), and the nlink count is 2.
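The check just described can be sketched for a POSIX host as follows. The POSIX spellings O_CREAT, st_ino, and st_nlink stand in for xv6's O_CREATE, ino, and nlink, and the function name link_shares_inode is invented for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates file `a`, links it as `b`, and checks with fstat that both names
// lead to one inode whose link count is 2. Returns 1 if so, 0 if not, and
// -1 if any system call fails. Both names are removed before returning.
int link_shares_inode(const char *a, const char *b)
{
  struct stat sa, sb;
  int result = -1;

  // O_EXCL makes the call fail rather than reuse an existing file.
  int fda = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
  if (fda < 0)
    return -1;

  if (link(a, b) == 0) {
    int fdb = open(b, O_RDONLY);
    if (fdb >= 0) {
      if (fstat(fda, &sa) == 0 && fstat(fdb, &sb) == 0)
        result = sa.st_ino == sb.st_ino &&
                 sa.st_nlink == 2 && sb.st_nlink == 2;
      close(fdb);
    }
    unlink(b);
  }

  close(fda);
  unlink(a);
  return result;
}
```

Calling link_shares_inode("a", "b") in a writable directory is expected to return 1 on file systems that support hard links.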

The unlink system call removes a name from the file system. The file’s inode and the disk space holding its content are freed only when the file’s link count is zero and no file descriptors refer to it. Thus adding


    unlink("a");

to the last code sequence leaves the inode and file content accessible as b. Furthermore,


    fd = open("/tmp/xyz", O_CREATE|O_RDWR);
    unlink("/tmp/xyz");

is an idiomatic way to create a temporary inode with no name that will be cleaned up when the process closes fd or exits.
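A minimal POSIX sketch of this idiom is shown below. It assumes the POSIX flag name O_CREAT in place of xv6's O_CREATE, and the wrapper name open_anonymous is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

// Creates a file at `path`, removes its only name right away, and returns
// the still-open descriptor. The inode stays alive until the descriptor is
// closed. Returns -1 if the file cannot be created or unlinked.
int open_anonymous(const char *path)
{
  int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
  if (fd < 0)
    return -1;
  if (unlink(path) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

The returned descriptor can still be written and read even though no path leads to the file any more. On POSIX a program would typically call lseek(fd, 0, SEEK_SET) between writing and reading back.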

Unix provides file utilities callable from the shell as user-level programs, for example mkdir, ln, and rm. This design allows anyone to extend the command-line interface by adding new user-level programs. In hindsight this plan seems obvious, but other systems designed at the time of Unix often built such commands into the shell (and built the shell into the kernel).

One exception is cd, which is built into the shell (user/sh.c:161). cd must change the current working directory of the shell itself. If cd were run as a regular command, then the shell would fork a child process, the child process would run cd, and cd would change the child’s working directory. The parent’s (i.e., the shell’s) working directory would not change.
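The effect is easy to reproduce on a POSIX host. In the sketch below the child stands in for an external cd program; the function name and the 4096-byte path buffers are assumptions of this example, and getcwd and waitpid are POSIX calls rather than part of the xv6 interface.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that changes its own directory to `dir` and exits.
// Returns 1 if the parent's directory is unchanged afterwards, 0 if it
// changed, and -1 on error (including a failed chdir in the child).
int chdir_in_child_leaves_parent(const char *dir)
{
  char before[4096], after[4096];

  if (getcwd(before, sizeof before) == NULL)
    return -1;

  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    // Child: this is all an external "cd" program could do.
    _exit(chdir(dir) == 0 ? 0 : 1);
  }

  int status;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
    return -1;

  if (getcwd(after, sizeof after) == NULL)
    return -1;
  return strcmp(before, after) == 0;
}
```

The function returns 1 for any directory the child can enter, which is why the shell has to call chdir in its own process instead.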