1.3 Pipes

A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Writing data to one end of the pipe makes that data available for reading from the other end of the pipe. Pipes provide a way for processes to communicate.

The following example code runs the program wc with standard input connected to the read end of a pipe.


    int p[2];
    char *argv[2];

    argv[0] = "wc";
    argv[1] = 0;

    pipe(p);
    if(fork() == 0) {
      close(0);
      dup(p[0]);
      close(p[0]);
      close(p[1]);
      exec("/bin/wc", argv);
    } else {
      close(p[0]);
      write(p[1], "hello world\n", 12);
      close(p[1]);
    }

The program calls pipe, which creates a new pipe and records the read and write file descriptors in the array p. After fork, both parent and child have file descriptors referring to the pipe. The child calls close and dup to make file descriptor zero refer to the read end of the pipe, closes the file descriptors in p, and calls exec to run wc. When wc reads from its standard input, it reads from the pipe. The parent closes the read side of the pipe, writes to the pipe, and then closes the write side.

If no data is available, a read on a pipe waits either for data to be written or for all file descriptors referring to the write end to be closed; in the latter case, read returns 0, just as if the end of a data file had been reached. The fact that read blocks until it is impossible for new data to arrive is one reason that it’s important for the child to close the write end of the pipe before executing wc above: if one of wc’s file descriptors referred to the write end of the pipe, wc would never see end-of-file.

The xv6 shell implements pipelines such as grep fork sh.c | wc -l in a manner similar to the above code (user/sh.c:101). The child process creates a pipe to connect the left end of the pipeline with the right end. Then it calls fork and runcmd for the left end of the pipeline and fork and runcmd for the right end, and waits for both to finish. The right end of the pipeline may be a command that itself includes a pipe (e.g., a | b | c), in which case it forks two new child processes (one for b and one for c). Thus, the shell may create a tree of processes. The leaves of this tree are commands and the interior nodes are processes that wait until the left and right children complete.

Pipes may seem no more powerful than temporary files: the pipeline


    echo hello world | wc

could be implemented without pipes as


    echo hello world >/tmp/xyz; wc </tmp/xyz

Pipes have at least three advantages over temporary files in this situation. First, pipes automatically clean themselves up; with the file redirection, a shell would have to be careful to remove /tmp/xyz when done. Second, pipes can pass arbitrarily long streams of data, while file redirection requires enough free space on disk to store all the data. Third, pipes allow for parallel execution of pipeline stages, while the file approach requires the first program to finish before the second starts.