3.3 Code: creating an address space
Most of the xv6 code for manipulating address spaces and page tables resides in vm.c (kernel/vm.c:1). The central data structure is pagetable_t, which is really a pointer to a RISC-V root page-table page; a pagetable_t may be either the kernel page table, or one of the per-process page tables. The central functions are walk, which finds the PTE for a virtual address, and mappages, which installs PTEs for new mappings. Functions starting with kvm manipulate the kernel page table; functions starting with uvm manipulate a user page table; other functions are used for both. copyout and copyin copy data to and from user virtual addresses provided as system call arguments; they are in vm.c because they need to explicitly translate those addresses in order to find the corresponding physical memory.
Early in the boot sequence, main calls kvminit (kernel/vm.c:54) to create the kernel's page table using kvmmake (kernel/vm.c:20). This call occurs before xv6 has enabled paging on the RISC-V, so addresses refer directly to physical memory. kvmmake first allocates a page of physical memory to hold the root page-table page. Then it calls kvmmap to install the translations that the kernel needs. The translations include the kernel's instructions and data, physical memory up to PHYSTOP, and memory ranges which are actually devices.
proc_mapstacks (kernel/proc.c:33) allocates a kernel stack for each process. It calls kvmmap to map each stack at the virtual address generated by KSTACK, which leaves room for the invalid stack-guard pages.
kvmmap (kernel/vm.c:132) calls mappages (kernel/vm.c:144), which installs mappings into a page table for a range of virtual addresses to a corresponding range of physical addresses. It does this separately for each virtual address in the range, at page intervals. For each virtual address to be mapped, mappages calls walk to find the address of the PTE for that address. It then initializes the PTE to hold the relevant physical page number, the desired permissions (PTE_W, PTE_X, and/or PTE_R), and PTE_V to mark the PTE as valid (kernel/vm.c:165).
walk (kernel/vm.c:86) mimics the RISC-V paging hardware as it looks up the PTE for a virtual address (see Figure 3.2). walk descends the page table one level at a time, using each level's 9 bits of virtual address to index into the relevant page-directory page. At each level it finds either the PTE of the next level's page-directory page, or the PTE of the final page (kernel/vm.c:92). If a PTE in a first- or second-level page-directory page isn't valid, then the required directory page hasn't yet been allocated; if the alloc argument is set, walk allocates a new page-table page and puts its physical address in the PTE. It returns the address of the PTE in the lowest layer in the tree (kernel/vm.c:102).
The above code depends on physical memory being direct-mapped into the kernel virtual address space. For example, as walk descends levels of the page table, it pulls the (physical) address of the next-level-down page table from a PTE (kernel/vm.c:94), and then uses that address as a virtual address to fetch the PTE at the next level down (kernel/vm.c:92).
main calls kvminithart (kernel/vm.c:62) to install the kernel page table. It writes the physical address of the root page-table page into the register satp. After this the CPU will translate addresses using the kernel page table. Since the kernel uses a direct mapping, the virtual address of the next instruction maps to the right physical memory address.
Each RISC-V CPU caches page table entries in a Translation Look-aside Buffer (TLB), and when xv6 changes a page table, it must tell the CPU to invalidate corresponding cached TLB entries. If it didn't, then at some point later the TLB might use an old cached mapping, pointing to a physical page that in the meantime has been allocated to another process, and as a result, a process might be able to scribble on some other process's memory. The RISC-V has an instruction sfence.vma that flushes the current CPU's TLB. Xv6 executes sfence.vma in kvminithart after reloading the satp register, and in the trampoline code that switches to a user page table before returning to user space (kernel/trampoline.S:89).
It is also necessary to issue sfence.vma before changing satp, in order to wait for completion of all outstanding loads and stores. This wait ensures that preceding updates to the page table have completed, and ensures that preceding loads and stores use the old page table, not the new one.
To avoid flushing the complete TLB, RISC-V CPUs may support address space identifiers (ASIDs) [16]. The kernel can then flush just the TLB entries for a particular address space. Xv6 does not use this feature.