Part 3. More head (30 points)

In this part, you’re going to finish your implementation of head.

You should re-read the man page but this time focus on behavior. How does head behave when no files are passed to it? How does head behave when one file is passed to it? How does head behave when multiple files are passed to it? What exit code (called the exit status in the man page) does it return on success? What about on error?

One of the key things to notice about head is that sometimes it reads from stdin and sometimes it reads from a file. In either case, though, the operation it performs is the same: print out some number of lines or bytes.

What we would like is to be able to write a function like

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: ???, count: usize) {
    todo!();
}

and a similar print_bytes() function. But what type should we give input?

If we think about what operations we want to perform on input, that can help guide us. For print_lines(), we want to be able to read one line at a time from input. For print_bytes(), we just want to read some number of bytes. It turns out that what we want is a std::io::BufReader. Or more precisely, we want anything that implements the std::io::BufRead trait, which BufReader does.

Click on the documentation for BufRead and take a look at what methods are available. In particular, the read_line() method reads characters into a String until it reaches a newline character (the newline itself is included). The read_until() method is similar, except that it reads bytes into a Vec<u8> until it encounters the byte you specify. For a print_lines() function, you'll likely want to use one of read_line() or read_until() (or experiment with the lines() method!).
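To see these methods in action without creating a file, the sketch below reads from a byte slice (&[u8] implements BufRead, so it stands in for a file here). It illustrates three details that matter for print_lines(): read_line() and read_until() keep the newline, they append to the buffer rather than overwriting it, and they return Ok(0) at end of input. lines(), by contrast, strips the newline.

```rust
use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    // A byte slice implements BufRead, so it can stand in for a file.
    let mut input: &[u8] = b"alpha\nbeta\ngamma";

    // read_line() returns the number of bytes read and keeps the '\n'.
    let mut line = String::new();
    assert_eq!(input.read_line(&mut line)?, 6);
    assert_eq!(line, "alpha\n");

    // read_until() does the same with raw bytes.
    let mut buf = Vec::new();
    input.read_until(b'\n', &mut buf)?;
    assert_eq!(buf, b"beta\n");

    // Both methods append, so clear the buffer before reusing it.
    buf.clear();
    input.read_until(b'\n', &mut buf)?;
    assert_eq!(buf, b"gamma"); // The last line has no '\n', so none is added.

    // At end of input they return Ok(0).
    buf.clear();
    assert_eq!(input.read_until(b'\n', &mut buf)?, 0);

    // lines() strips the '\n' and yields io::Result<String> items.
    let text: &[u8] = b"x\ny\n";
    let lines: Vec<String> = text.lines().collect::<io::Result<_>>()?;
    assert_eq!(lines, ["x", "y"]);
    Ok(())
}
```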

For a print_bytes() function, none of these methods seem appropriate. Fortunately, any type that implements the BufRead trait also implements the std::io::Read trait which has a read() method.
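One detail worth knowing before writing print_bytes(): a single call to read() is allowed to return fewer bytes than the buffer can hold, even before end of input (stdin connected to a pipe is a common case). The sketch below uses a hypothetical helper, read_up_to(), that keeps calling read() until it has n bytes or reaches end of input. It also shows that read() can be called through a &mut dyn BufRead, because Read is a supertrait of BufRead.

```rust
use std::io::{self, BufRead, ErrorKind, Read};

/// Hypothetical helper: read up to `n` bytes from `input`,
/// stopping early only at end of input.
fn read_up_to(input: &mut dyn BufRead, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0; n];
    let mut filled = 0;
    while filled < n {
        // `read` comes from the `Read` trait, which `BufRead` requires.
        match input.read(&mut buf[filled..]) {
            Ok(0) => break, // End of input.
            Ok(k) => filled += k,
            // A signal interrupted the read; simply try again.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    buf.truncate(filled); // Drop the unused zero bytes.
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = "Rust 🦀!".as_bytes();
    let first = read_up_to(&mut input, 5)?;
    assert_eq!(first, b"Rust ");
    // Asking for more than remains just returns what is left.
    let rest = read_up_to(&mut input, 100)?;
    assert_eq!(rest, "🦀!".as_bytes());
    Ok(())
}
```

Because a byte count can end in the middle of a multi-byte character such as 🦀, the bytes should be written with write_all() rather than converted to a String.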

Example

Here’s an example of opening a file, wrapping it in a BufReader, and then reading from it using read_line(), read_until(), and read().

use std::fs::File;
use std::io::{self, BufReader, BufRead, Read, Write};

fn read_some() -> io::Result<()> {
    let file = File::open("/tmp/example.txt")?;
    let mut reader = BufReader::new(file);

    // Read a line into a string.
    let mut line = String::new();
    let length = reader.read_line(&mut line)?;

    if length == 0 {
        // End of file.
        return Ok(());
    }
    print!("First line: {line}");

    // Read a line as bytes into a `Vec<u8>`.
    let mut line = Vec::new();
    let length = reader.read_until(b'\n', &mut line)?;

    if length == 0 {
        return Ok(());
    }
    print!("Second line: ");
    io::stdout().write_all(&line)?;

    // Read at most 100 bytes into a `Vec<u8>`.
    let mut bytes = Vec::new();
    bytes.resize(100, 0); // Fill it with 100 zero bytes.
    let length = reader.read(&mut bytes)?;

    if length == 0 {
        return Ok(());
    }
    print!("Next {length} bytes: ");
    io::stdout().write_all(&bytes[..length])?;

    Ok(())
}

fn main() -> io::Result<()> {
    const EXAMPLE: &str = "Rust is fun!\nIts mascot is a 🦀.\nIt's also kind of a lot…\n";
    let mut file = File::create("/tmp/example.txt")?;
    file.write_all(EXAMPLE.as_bytes())?;
    read_some()?;
    std::fs::remove_file("/tmp/example.txt")?;
    Ok(())
}


Note that we needed a use statement for std::io::Write in order to call the write_all() method, which is defined by the Write trait. Likewise, importing std::io::Read is what lets us call read().

Tip

If you use

use std::io;

or include self in the list of items to use from std::io, like

use std::io::{self, BufReader};

you can refer to everything in std::io through the io prefix. The previous example did this for io::stdout() and io::Result<()>.

We can wrap a File in a BufReader, and we can also wrap the result of std::io::stdin() (which is a struct called, appropriately enough, std::io::Stdin) in a BufReader. But there's one hitch: the two kinds of BufReader have different types.

Just as Vec is parameterized by the type of element it holds (e.g., Vec<i32> and Vec<String>), a BufReader is parameterized by the underlying type it's reading from. The code snippet below is explicit about the types of the two readers. They're simply not the same.

use std::io::{self, Stdin, BufReader};
use std::fs::File;
fn main() -> io::Result<()> {
    let file_reader: BufReader<File> = BufReader::new(File::open("example.txt")?);
    let stdin_reader: BufReader<Stdin> = BufReader::new(io::stdin());
    Ok(())
}

Fortunately, Rust gives us a way to deal with this! We're going to wrap these readers in a special kind of Box. Recall that values in a Box are stored on the heap rather than on the stack. If we wrapped these types directly in a Box, i.e., Box<BufReader<File>> and Box<BufReader<Stdin>>, we'd still have two different types. Instead, we want a Box<dyn BufRead>. By writing the dyn keyword followed by a trait, we're saying that the value stored in the Box is some type that implements that trait (in this case, BufRead), without saying which type it is.
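Here is a small sketch of that idea. It uses in-memory readers instead of a file and stdin so that it runs anywhere; the three reader types are just stand-ins. Each value has a different concrete type, but once boxed they can all live in one Vec<Box<dyn BufRead>>, and a call such as read_line() is dispatched to the right implementation at run time.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

fn main() -> io::Result<()> {
    // Three different concrete reader types...
    let a: BufReader<&[u8]> = BufReader::new(&b"from a BufReader\n"[..]);
    let b: Cursor<Vec<u8>> = Cursor::new(b"from a Cursor\n".to_vec());
    let c: &[u8] = b"from a byte slice\n";

    // ...all become the same type once boxed as `Box<dyn BufRead>`.
    let readers: Vec<Box<dyn BufRead>> = vec![Box::new(a), Box::new(b), Box::new(c)];

    let mut first_lines = Vec::new();
    for mut reader in readers {
        let mut line = String::new();
        reader.read_line(&mut line)?; // Dispatched dynamically.
        first_lines.push(line);
    }
    assert_eq!(
        first_lines,
        ["from a BufReader\n", "from a Cursor\n", "from a byte slice\n"]
    );
    Ok(())
}
```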

Example

Let's create a function that takes a &Path and opens the file for reading. As mentioned previously, many command line utilities treat an input file path of - as meaning stdin, and we're going to do the same.

use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

fn main() {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
}

Let’s revisit the print_lines() function.

use std::io::{self, BufRead};

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
    let mut line = String::new();
    input.read_line(&mut line)?;
    print!("{line}");
    todo!()
}

We finally have a type for input! It's a mutable reference to some object whose type implements BufRead. Notice also that print_lines() now returns a std::io::Result<()>, so we can use ? to propagate errors.
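Because the parameter is a &mut dyn BufRead rather than a Box, the function does not care where its input lives: the caller can pass a mutable reference to a Box<dyn BufRead> (the box itself implements BufRead) or to any concrete reader. The sketch below uses a hypothetical first_line() function with the same parameter type to show both call shapes.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper with the same parameter type as `print_lines()`.
fn first_line(input: &mut dyn BufRead) -> io::Result<String> {
    let mut line = String::new();
    input.read_line(&mut line)?; // `?` hands any I/O error back to the caller.
    Ok(line)
}

fn main() -> io::Result<()> {
    // A boxed reader, like the one `open_input()` returns.
    let mut boxed: Box<dyn BufRead> = Box::new(BufReader::new(&b"boxed\n"[..]));
    assert_eq!(first_line(&mut boxed)?, "boxed\n");

    // A concrete reader works too; no box needed.
    let mut slice: &[u8] = b"plain\n";
    assert_eq!(first_line(&mut slice)?, "plain\n");
    Ok(())
}
```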

We can now combine this print_lines() with the example code for opening a file.

use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
    todo!()
}

fn main() {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
    if let Err(err) = print_lines(&mut reader, 10) {
        eprintln!("{err}");
    }
}

Your task

Your task is to finish implementing head. Your implementation must have the following features.

  • It must support all of the options described in Part 2.
  • If neither --bytes nor --lines (nor their short forms) is passed, default to printing the first 10 lines of each file or stdin.
  • If no files are passed as arguments, read from stdin, otherwise read from each of the files (see the hints below).
  • If - is given as a file, read from stdin in its place. So if
    $ cargo run -- -n 3 foo.txt - bar.txt
    
    is run, it should print the first three lines from foo.txt, then the first three lines from stdin, followed by the first three lines of bar.txt.
  • If multiple files are passed, each file's output should start with a header line. A blank line separates the last line of one file's output from the next file's header, so the last file has no trailing blank line. See the example below.
  • If a file cannot be opened, print a message to stderr and continue. The example code above shows how to write a message in the proper format to stderr using eprintln!() if open_input() fails. Your code should do the same thing, except that rather than exiting immediately by calling std::process::exit(1), it should simply remember that there was an error and, after all input files have been processed, exit with status 1 if there was an error.
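The header and blank-line rules can be captured in one small function. The sketch below is one possible approach, not a required design: a hypothetical write_header() helper writes to any std::io::Write, so the example can capture output in a Vec<u8> (real code would pass io::stdout()). Tracking whether a header has already been written, rather than using a file's position in the argument list, keeps the blank line correct when an earlier file fails to open. The caller would invoke it only when more than one file was given.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write the `==> name <==` header for `path`,
/// preceded by a blank line unless it is the first header written.
fn write_header(out: &mut dyn Write, first: &mut bool, path: &str) -> io::Result<()> {
    // `-` means stdin, which is labeled "standard input" in the header.
    let name = if path == "-" { "standard input" } else { path };
    if !*first {
        writeln!(out)?; // Blank line separating this file from the previous one.
    }
    *first = false;
    writeln!(out, "==> {name} <==")
}

fn main() -> io::Result<()> {
    // Capture output in memory; real code would write to io::stdout().
    let mut out: Vec<u8> = Vec::new();
    let mut first = true;
    for path in ["foo.txt", "-", "bar.txt"] {
        write_header(&mut out, &mut first, path)?;
        // Stand-in for the lines printed from each input.
        writeln!(out, "(output of {path})")?;
    }
    let expected = "==> foo.txt <==\n(output of foo.txt)\n\n\
                    ==> standard input <==\n(output of -)\n\n\
                    ==> bar.txt <==\n(output of bar.txt)\n";
    assert_eq!(String::from_utf8(out).unwrap(), expected);
    Ok(())
}
```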

Example

Here is some sample output showing multiple files, including stdin.

$ echo "hi there!" | cargo run -- -n 3 foo.txt - bar.txt
==> foo.txt <==
First line of foo.txt
Second line of foo.txt
Third line of foo.txt

==> standard input <==
hi there!

==> bar.txt <==
First line of bar.txt
Second line of bar.txt
Third line of bar.txt

Here’s an example when there’s an error with an input file.

$ cargo run -- -n 1 foo.txt no-such-file.txt bar.txt
==> foo.txt <==
First line of foo.txt
no-such-file.txt: No such file or directory (os error 2)

==> bar.txt <==
First line of bar.txt

Play around with the real head to see other examples.

If you’ve made it this far and you’ve implemented everything, you’re just about done! Just gotta submit it. Make sure you test that your code works correctly with 0, 1, and multiple files. Make sure it supports - as reading from stdin. Make sure when you have multiple files, it prints the header and the blank line. You can always test your behavior against the behavior of the real head.

If you’d like to make one small improvement to your implementation, read on. Otherwise, continue to the submission instructions.

Info

The command line utilities were designed when C was the dominant programming language. C's notion of strings is wildly different from Rust's notion of strings. In particular, C strings are just sequences of non-zero bytes followed by a zero byte. There are no other restrictions on the contents of a C string. In contrast, Rust strings are required to be valid, UTF-8 encoded strings of Unicode characters. These two different views of strings have two consequences for us:

  1. The real head utility works fine even if the files it's operating on don't contain UTF-8 encoded data. It simply separates the input into lines by detecting the newline character '\n'.
  2. The BufRead::read_line() function returns an error (of kind std::io::ErrorKind::InvalidData) if the input is not valid UTF-8.

If you would like to match the real head behavior, you may use the BufRead::read_until() function. An example of doing this was given in the read_some() function in an earlier example.
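The sketch below shows the difference on a made-up input containing the byte 0xE9 (é in Latin-1, which is not valid UTF-8 on its own): read_line() reports an InvalidData error, while read_until() returns the raw bytes, which can then be written unchanged with write_all() as in read_some().

```rust
use std::io::{self, BufRead, ErrorKind};

fn main() -> io::Result<()> {
    // Latin-1 "café\n": the lone byte 0xE9 is not valid UTF-8.
    let data: &[u8] = b"caf\xE9\nnext\n";

    // read_line() rejects the line with an InvalidData error.
    let mut reader = data;
    let mut line = String::new();
    let err = reader.read_line(&mut line).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::InvalidData);

    // read_until() splits on b'\n' without caring about the encoding.
    let mut reader = data;
    let mut bytes = Vec::new();
    reader.read_until(b'\n', &mut bytes)?;
    assert_eq!(bytes, b"caf\xE9\n");
    // These bytes could now be passed to io::stdout().write_all(&bytes).
    Ok(())
}
```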