1.3 Pipes
A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Writing data to one end of the pipe makes that data available for reading from the other end of the pipe. Pipes provide a way for processes to communicate.
The following example code runs the program wc with standard input connected to the read end of a pipe.
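    int p[2];
    char *argv[2];

    argv[0] = "wc";
    argv[1] = 0;

    pipe(p);                 // p[0] is the read end, p[1] the write end
    if(fork() == 0) {
      close(0);              // free file descriptor 0
      dup(p[0]);             // dup picks the lowest free descriptor, so 0 now names the read end
      close(p[0]);
      close(p[1]);           // the child must not keep the write end open
      exec("/bin/wc", argv);
    } else {
      close(p[0]);
      write(p[1], "hello world\n", 12);
      close(p[1]);           // lets wc see end-of-file
    }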
The program calls pipe, which creates a new pipe and records the read and write file descriptors in the array p. After fork, both parent and child have file descriptors referring to the pipe. The child calls close and dup to make file descriptor zero refer to the read end of the pipe, closes the file descriptors in p, and calls exec to run wc. When wc reads from its standard input, it reads from the pipe. The parent closes the read side of the pipe, writes to the pipe, and then closes the write side.
If no data is available, a read on a pipe waits for either data to be written or for all file descriptors referring to the write end to be closed; in the latter case, read will return 0, just as if the end of a data file had been reached. The fact that read blocks until it is impossible for new data to arrive is one reason that it’s important for the child to close the write end of the pipe before executing wc above: if one of wc’s file descriptors referred to the write end of the pipe, wc would never see end-of-file.
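The end-of-file rule can be observed in a single process. The following is a minimal sketch, assuming a POSIX system rather than xv6; the helper name drain_after_close is hypothetical and not part of the xv6 sources. It writes a short message into a pipe, closes the write end, and reads until read returns 0.

```c
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not from xv6): write len bytes of msg into a new pipe,
// close the write end, then read until read() reports end-of-file.
// Returns the number of bytes read back, or -1 on error.
// Assumption: len is small enough to fit in the pipe's buffer, since nothing
// reads from the pipe while write() is running.
long drain_after_close(const char *msg, size_t len)
{
    int p[2];
    char buf[64];
    long total = 0;
    ssize_t n;

    if (pipe(p) < 0)
        return -1;

    if (write(p[1], msg, len) != (ssize_t)len) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    // This is the only descriptor for the write end. Once it is closed,
    // read() returns the buffered data and then 0.
    close(p[1]);

    while ((n = read(p[0], buf, sizeof buf)) > 0)
        total += n;

    close(p[0]);
    return n == 0 ? total : -1;
}
```

If the close(p[1]) call were removed, the loop would block after draining the buffered bytes, because a descriptor for the write end would still be open in the same process.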
The xv6 shell implements pipelines such as grep fork sh.c | wc -l in a manner similar to the above code (user/sh.c:101). The child process creates a pipe to connect the left end of the pipeline with the right end. Then it calls fork and runcmd for the left end of the pipeline and fork and runcmd for the right end, and waits for both to finish. The right end of the pipeline may be a command that itself includes a pipe (e.g., a | b | c), which itself forks two new child processes (one for b and one for c). Thus, the shell may create a tree of processes. The leaves of this tree are commands and the interior nodes are processes that wait until the left and right children complete.
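The two-child structure described above can be sketched for a single pipe as follows. This is an illustration under stated assumptions, not the code in user/sh.c: it targets a POSIX system, the function name run_pipeline is hypothetical, and it calls execvp directly where the xv6 shell calls runcmd.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from xv6): run "left | right" and wait for both.
// Each argument is a NULL-terminated argv array.
// Returns the exit status of the right-hand command, or -1 on error.
int run_pipeline(char *const left[], char *const right[])
{
    int p[2];
    int status = 0;
    pid_t l, r;

    if (pipe(p) < 0)
        return -1;

    l = fork();
    if (l == 0) {                  // left child: standard output goes to the pipe
        dup2(p[1], 1);
        close(p[0]);
        close(p[1]);
        execvp(left[0], left);
        _exit(127);                // reached only if execvp failed
    }

    r = (l < 0) ? -1 : fork();
    if (r == 0) {                  // right child: standard input comes from the pipe
        dup2(p[0], 0);
        close(p[0]);
        close(p[1]);
        execvp(right[0], right);
        _exit(127);
    }

    // The parent must close both ends; otherwise the right child would
    // never see end-of-file on its standard input.
    close(p[0]);
    close(p[1]);

    if (l > 0)
        waitpid(l, NULL, 0);
    if (r > 0)
        waitpid(r, &status, 0);
    if (l < 0 || r < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_pipeline with {"echo", "fork", NULL} on the left and {"grep", "-q", "fork", NULL} on the right should return 0 on a system where both commands are installed. A longer pipeline such as a | b | c would need the right-hand side to build a pipeline of its own, which is how the tree of processes arises.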
Pipes may seem no more powerful than temporary files: the pipeline

    echo hello world | wc

could be implemented without pipes as

    echo hello world >/tmp/xyz; wc </tmp/xyz

Pipes have at least three advantages over temporary files in this situation. First, pipes automatically clean themselves up; with the file redirection, a shell would have to be careful to remove /tmp/xyz when done. Second, pipes can pass arbitrarily long streams of data, while file redirection requires enough free space on disk to store all the data. Third, pipes allow for parallel execution of pipeline stages, while the file approach requires the first program to finish before the second starts.
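The second and third advantages can be seen together in a small experiment. The sketch below assumes a POSIX system, and the helper name stream_through_pipe is hypothetical. A child process writes a stream into a pipe while the parent reads it. When the stream is larger than the pipe's buffer, the transfer can only complete because the reader drains the pipe while the writer is still running.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from xv6): a child writes `total` bytes into a
// pipe while the parent reads them concurrently.
// Returns the number of bytes the parent received, or -1 on error.
long stream_through_pipe(long total)
{
    int p[2];
    char buf[4096];
    long got = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(p) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {                // writer
        long left = total;
        close(p[0]);
        memset(buf, 'x', sizeof buf);
        while (left > 0) {
            size_t chunk = left < (long)sizeof buf ? (size_t)left : sizeof buf;
            ssize_t w = write(p[1], buf, chunk);   // blocks while the pipe is full
            if (w <= 0)
                _exit(1);
            left -= w;
        }
        close(p[1]);
        _exit(0);
    }

    // Reader: close the unused write end so that end-of-file arrives
    // once the child is done.
    close(p[1]);
    while ((n = read(p[0], buf, sizeof buf)) > 0)
        got += n;
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n == 0 ? got : -1;
}
```

For example, stream_through_pipe(1048576) moves one mebibyte through the pipe without creating any file, and nothing is left behind to clean up afterward.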