4.7 Real world
The trampoline and trapframe may seem excessively complex. A driving
force is that RISC-V intentionally does as little as it can when
forcing a trap, to allow the possibility of very fast trap handling,
which turns out to be important. As a result, the first few
instructions of the kernel trap handler effectively have to execute in
the user environment, with the user page table and user register contents.
And the trap handler is initially ignorant of useful facts such as the
identity of the process that’s running or the address of the kernel
page table. A solution is possible because RISC-V provides protected
places in which the kernel can stash away information before entering
user space: the sscratch register, and user page table entries
that point to kernel memory but are protected by lack of PTE_U.
Xv6’s trampoline and trapframe exploit these RISC-V features.
The need for special trampoline pages could be eliminated if kernel
memory were mapped into every process’s user page table (with PTE_U
clear). That would
also eliminate the need for a page table switch when trapping from
user space into the kernel. That in turn would allow system call
implementations in the kernel to take advantage of the current
process’s user memory being mapped, allowing kernel code to directly
dereference user pointers. Many operating systems have used these ideas to
increase efficiency. Xv6 avoids them in order to reduce the chances of
security bugs in the kernel due to inadvertent use of user pointers,
and to reduce some complexity that would be required to ensure that
user and kernel virtual addresses don’t overlap.
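The difference between dereferencing a user pointer directly and translating it through the user page table can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the single-level struct toy_pagetable and the names toy_walkaddr and toy_copyin are hypothetical and are not xv6's code. Only the PTE_V and PTE_U bit positions follow the RISC-V convention, and the overall shape loosely mirrors a copyin-style helper that checks every page before copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define NPAGES 4          // size of the toy address space, in pages (assumption)
#define PTE_V  (1u << 0)  // mapping is valid
#define PTE_U  (1u << 4)  // user code may access the page

// Hypothetical single-level "page table": one entry per virtual page.
struct toy_pte {
  unsigned flags;
  unsigned char *pa;      // backing memory for the page, or NULL if unmapped
};

struct toy_pagetable {
  struct toy_pte pte[NPAGES];
};

// Translate a user virtual address. Returns NULL if the page is out of
// range, not valid, or lacks PTE_U (that is, it is kernel-only).
static unsigned char *
toy_walkaddr(struct toy_pagetable *pt, uint64_t va)
{
  uint64_t vpn = va / PGSIZE;
  if (vpn >= NPAGES)
    return NULL;
  struct toy_pte *p = &pt->pte[vpn];
  if ((p->flags & PTE_V) == 0 || (p->flags & PTE_U) == 0)
    return NULL;
  return p->pa + (va % PGSIZE);
}

// Copy len bytes from user virtual address srcva into dst, one page at a
// time. Returns 0 on success, or -1 as soon as a page fails the check
// (bytes from earlier pages may already have been copied).
int
toy_copyin(struct toy_pagetable *pt, void *dst, uint64_t srcva, size_t len)
{
  unsigned char *d = dst;
  while (len > 0) {
    unsigned char *src = toy_walkaddr(pt, srcva);
    if (src == NULL)
      return -1;
    size_t n = PGSIZE - (srcva % PGSIZE);  // bytes left in this page
    if (n > len)
      n = len;
    memcpy(d, src, n);
    len -= n;
    d += n;
    srcva += n;
  }
  return 0;
}
```

In this model a system call that receives a pointer into a kernel-only page gets -1 back instead of silently reading kernel memory. A kernel that shares one address space with user code must make the equivalent check explicitly before each direct dereference.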
Production operating systems implement copy-on-write fork, lazy allocation, demand paging, paging to disk, memory-mapped files, etc. Furthermore, production operating systems try to store something useful in all areas of physical memory, typically caching file content in memory that isn’t used by processes.
Production operating systems also give applications system calls to manage their own address spaces and to handle their own page faults (mmap, munmap, and sigaction), to pin memory in RAM (see mlock), and to advise the kernel how an application plans to use its memory (see madvise).
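As a rough illustration of how an application can combine these calls, the sketch below reserves a region with no access permissions, installs a SIGSEGV handler with sigaction, and enables each page on first touch with mprotect. It assumes a Linux-like system where touching a PROT_NONE page delivers SIGSEGV with the faulting address in si_addr. The names on_fault and lazy_region_demo are hypothetical. Calling mprotect from a signal handler is common practice for this technique but is not guaranteed to be async-signal-safe by POSIX, so treat this as a demonstration rather than a robust design.

```c
#define _DEFAULT_SOURCE   // for MAP_ANONYMOUS and madvise with glibc
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;                        // start of the demo mapping
static size_t region_len;                   // its length in bytes
static long pagesz;                         // system page size
static volatile sig_atomic_t fault_count;   // faults handled so far

// SIGSEGV handler: if the fault lies inside our region, make that page
// readable and writable and return, so the faulting access is retried.
static void
on_fault(int sig, siginfo_t *info, void *ctx)
{
  (void)sig;
  (void)ctx;
  char *addr = (char *)info->si_addr;
  if (addr < region || addr >= region + region_len)
    _exit(1);                               // a genuine crash, not ours
  char *page = (char *)((uintptr_t)addr & ~((uintptr_t)pagesz - 1));
  if (mprotect(page, (size_t)pagesz, PROT_READ | PROT_WRITE) != 0)
    _exit(1);
  fault_count++;
}

// Touch npages pages of an initially inaccessible mapping. Returns the
// number of faults handled, or -1 if setup failed.
int
lazy_region_demo(size_t npages)
{
  struct sigaction sa, old;

  pagesz = sysconf(_SC_PAGESIZE);
  region_len = npages * (size_t)pagesz;
  fault_count = 0;

  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = on_fault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, &old) != 0)
    return -1;

  region = mmap(NULL, region_len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED) {
    sigaction(SIGSEGV, &old, NULL);
    return -1;
  }

  // Purely advisory hint; a failure here is harmless and ignored.
  madvise(region, region_len, MADV_SEQUENTIAL);

  volatile char *p = region;
  for (size_t i = 0; i < npages; i++)
    p[i * (size_t)pagesz] = (char)i;        // first touch of each page faults

  // Pinning can fail under a low RLIMIT_MEMLOCK; that is not an error here.
  if (mlock(region, (size_t)pagesz) == 0)
    munlock(region, (size_t)pagesz);

  int handled = (int)fault_count;
  munmap(region, region_len);
  sigaction(SIGSEGV, &old, NULL);
  return handled;
}
```

Under the stated assumptions, calling lazy_region_demo(4) takes one fault per page. The same pattern underlies user-level schemes such as lazily populated heaps, where the handler would fill in the page's contents before returning.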