Part 3. More head (30 points)
In this part, you’re going to finish your implementation of `head`.
You should re-read the man page, but this time focus on behavior. How does `head` behave when no files are passed to it? How does `head` behave when one file is passed to it? How does `head` behave when multiple files are passed to it? What exit code (called the exit status in the man page) does it return on success? What about on error?
One of the key things to notice about `head` is that sometimes it reads from `stdin` and sometimes it reads from a file. But in either case, the operation it should perform is the same: print out some number of lines or bytes.
What we would like is to be able to write a function like

```rust
/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: ???, count: usize) {
    todo!();
}
```
and a similar `print_bytes()` function. But what type should we give `input`?
If we think about what operations we want to perform on `input`, that can help guide us. For `print_lines()`, we want to be able to read one line at a time from `input`. For `print_bytes()`, we just want to read some number of bytes.
It turns out that what we want is a `std::io::BufReader`. Or, more precisely, we want anything that implements the `std::io::BufRead` trait, which `BufReader` does.
Click on the documentation for `BufRead` and take a look at what methods are available. In particular, the `read_line()` method will read characters into a `String` until it hits a newline character (the newline itself is appended to the `String` too). The `read_until()` method is similar, except it reads bytes into a `Vec<u8>` until it encounters the byte you specify. For a `print_lines()` function, you’ll likely want to use one of `read_line()` or `read_until()` (or experiment with the `lines()` method!).
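If you experiment with `lines()`, keep in mind that each `String` it yields has its line ending (`\n` or `\r\n`) removed, so a `head` built on it would have to add the newline back and could not tell whether the last line originally ended with one. The sketch below uses `lines().take(count)`; the helper name `first_lines` is just for illustration, and it reads from a `std::io::Cursor`, an in-memory type that implements `BufRead`, so it runs without a file.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: collect up to `count` lines from `input` using
/// `BufRead::lines()`. Each `String` comes back without its line ending.
fn first_lines(input: &mut dyn BufRead, count: usize) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    // `take(count)` stops the iterator after at most `count` lines.
    for line in input.lines().take(count) {
        // Each item is an `io::Result<String>`, so propagate errors with `?`.
        lines.push(line?);
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    // `Cursor` wraps an in-memory buffer and implements `BufRead`.
    let mut input = Cursor::new("one\ntwo\nthree\n");
    println!("{:?}", first_lines(&mut input, 2)?); // ["one", "two"]
    Ok(())
}
```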
For a `print_bytes()` function, none of these methods seem appropriate. Fortunately, any type that implements the `BufRead` trait also implements the `std::io::Read` trait, which has a `read()` method.
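One detail to watch: a single `read()` call may return fewer bytes than you asked for, even before the end of the input, so reading "N bytes" generally takes a loop. Here is a minimal sketch, assuming a hypothetical helper `copy_bytes()` that writes to any `Write` (rather than directly to `stdout`) so it is easy to test. It takes a `&mut dyn BufRead`, like the `print_lines()` signature used later in this part; `read()` is still callable on it because `Read` is a supertrait of `BufRead`.

```rust
use std::io::{self, BufRead, Read, Write};

/// Hypothetical helper: copy at most `count` bytes from `input` to `output`.
/// One `read()` call may return fewer bytes than requested, so keep reading
/// until `count` bytes are copied or `read()` returns 0 (end of input).
fn copy_bytes(input: &mut dyn BufRead, output: &mut dyn Write, count: usize) -> io::Result<()> {
    let mut buf = [0u8; 4096];
    let mut remaining = count;
    while remaining > 0 {
        // Never ask for more bytes than we still need.
        let want = remaining.min(buf.len());
        let n = match input.read(&mut buf[..want]) {
            Ok(0) => break, // End of input.
            Ok(n) => n,
            // A read interrupted by a signal can simply be retried.
            Err(err) if err.kind() == io::ErrorKind::Interrupted => continue,
            Err(err) => return Err(err),
        };
        output.write_all(&buf[..n])?;
        remaining -= n;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // `&[u8]` implements `BufRead`, so a byte string can stand in for a file.
    let mut input: &[u8] = b"Rust is fun!\n";
    let mut out: Vec<u8> = Vec::new();
    copy_bytes(&mut input, &mut out, 4)?;
    println!("{}", String::from_utf8_lossy(&out)); // prints "Rust"
    Ok(())
}
```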
Here’s an example of opening a file, wrapping it in a `BufReader`, and then reading from it using `read_line()`, `read_until()`, and `read()`.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufRead, Read, Write};

fn read_some() -> io::Result<()> {
    let file = File::open("/tmp/example.txt")?;
    let mut reader = BufReader::new(file);

    // Read a line into a string.
    let mut line = String::new();
    let length = reader.read_line(&mut line)?;
    if length == 0 {
        // End of file.
        return Ok(());
    }
    print!("First line: {line}");

    // Read a line as bytes into a `Vec<u8>`.
    let mut line = Vec::new();
    let length = reader.read_until(b'\n', &mut line)?;
    if length == 0 {
        return Ok(());
    }
    print!("Second line: ");
    io::stdout().write_all(&line)?;

    // Read at most 100 bytes into a `Vec<u8>`.
    let mut bytes = Vec::new();
    bytes.resize(100, 0); // Fill it with 100 zero bytes.
    let length = reader.read(&mut bytes)?;
    if length == 0 {
        return Ok(());
    }
    print!("Next {length} bytes: ");
    io::stdout().write_all(&bytes[..length])?;

    Ok(())
}

fn main() -> io::Result<()> {
    const EXAMPLE: &str = "Rust is fun!\nIts mascot is a 🦀.\nIt's also kind of a lot…\n";
    let mut file = File::create("/tmp/example.txt")?;
    file.write_all(EXAMPLE.as_bytes())?;
    read_some()?;
    std::fs::remove_file("/tmp/example.txt")?;
    Ok(())
}
```
Note that we needed to have a `use` statement for `std::io::Write` in order to call the `write_all()` method, which is defined by the `Write` trait.

If you write `use std::io;`, or include `self` in the list of items imported from `std::io` as in `use std::io::{self, BufReader};`, you can refer to everything in `std::io` through just `io`. The previous example did this for `io::stdout()` and `io::Result<()>`.
We can wrap a `File` in a `BufReader`, and we can also wrap the result of `std::io::stdin()` (which is a `struct` called, appropriately enough, `std::io::Stdin`) in a `BufReader`. But there’s one hitch: the two types of `BufReader` are different.
Just as `Vec` is parameterized by the type of element it holds (e.g., `Vec<i32>` and `Vec<String>`), a `BufReader` is parameterized by the underlying type it’s reading from. The code snippet below is explicit about the types of the two readers. They’re simply not the same.

```rust
use std::io::{self, Stdin, BufReader};
use std::fs::File;

fn main() -> io::Result<()> {
    let file_reader: BufReader<File> = BufReader::new(File::open("example.txt")?);
    let stdin_reader: BufReader<Stdin> = BufReader::new(io::stdin());
    Ok(())
}
```
Fortunately, Rust gives us a way to deal with this! What we’re going to do is wrap these readers in a special type of `Box`. Recall that values in `Box`es are stored on the heap rather than on the stack. If we wrapped these types directly in a `Box`, i.e., `Box<BufReader<File>>` and `Box<BufReader<Stdin>>`, we’d still have different types. Instead, we want to use a `Box<dyn BufRead>`. By using the `dyn` keyword followed by a trait, we’re saying that the value stored in the `Box` is some type that implements the trait (in this case, `BufRead`).
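To see that one `Box<dyn BufRead>` type really can hold readers of different concrete types, here is a small sketch that uses in-memory readers instead of files and `stdin`. The function `pick_reader` is hypothetical: it returns either a `Cursor<Vec<u8>>` or a `BufReader<&[u8]>`, and both end up in the same `Vec<Box<dyn BufRead>>`.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Hypothetical helper: returns a reader over one of two different
/// concrete types. Both branches produce a `Box<dyn BufRead>`, so the
/// `if` expression has a single type and compiles.
fn pick_reader(use_cursor: bool) -> Box<dyn BufRead> {
    if use_cursor {
        // Concrete type: Cursor<Vec<u8>>
        Box::new(Cursor::new(b"from a Cursor\n".to_vec()))
    } else {
        // Concrete type: BufReader<&'static [u8]>
        Box::new(BufReader::new(&b"from a BufReader\n"[..]))
    }
}

fn main() -> io::Result<()> {
    // One Vec holds both readers because its element type is Box<dyn BufRead>.
    let readers: Vec<Box<dyn BufRead>> = vec![pick_reader(true), pick_reader(false)];
    for mut reader in readers {
        let mut line = String::new();
        reader.read_line(&mut line)?;
        print!("{line}");
    }
    Ok(())
}
```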
Let’s create a function that takes a `&Path` and opens the file for reading. As mentioned previously, many command line utilities treat an input file path of `-` as meaning `stdin`, and we’re going to do the same.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

fn main() {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
}
```
Let’s revisit the `print_lines()` function.

```rust
use std::io::{self, BufRead};

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
    let mut line = String::new();
    input.read_line(&mut line)?;
    print!("{line}");
    todo!()
}
```
We finally have a type for `input`! It’s a mutable reference to some type of object that implements `BufRead`. And notice that `print_lines()` now returns a `std::io::Result<()>`, so we can use `?` to propagate errors.
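Because `read_line()` appends to the `String` you pass it and returns `Ok(0)` at end of input, a loop over lines usually clears its buffer on each pass and stops on 0. The sketch below shows that pattern in a hypothetical `count_lines()` helper (it counts lines rather than printing them), along with `?` propagation and passing a `Box<dyn BufRead>` where a `&mut dyn BufRead` is expected.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: count the lines in `input` by calling
/// `read_line()` until it reports end of input by returning 0.
fn count_lines(input: &mut dyn BufRead) -> io::Result<usize> {
    let mut line = String::new();
    let mut count = 0;
    loop {
        // `read_line()` appends to `line`, so start each line fresh.
        line.clear();
        // `?` returns early with the error if the read fails.
        let length = input.read_line(&mut line)?;
        if length == 0 {
            return Ok(count); // End of input.
        }
        count += 1;
    }
}

fn main() -> io::Result<()> {
    // `&mut reader` coerces from `&mut Box<dyn BufRead>` to `&mut dyn BufRead`.
    let mut reader: Box<dyn BufRead> = Box::new(Cursor::new("a\nb\nc"));
    println!("{}", count_lines(&mut reader)?); // prints 3
    Ok(())
}
```

Note that a last line without a trailing newline (`"c"` here) still counts, since `read_line()` returns the bytes it read before hitting end of input.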
We can now combine this `print_lines()` with the example code for opening a file.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
    todo!()
}

fn main() {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
    if let Err(err) = print_lines(&mut reader, 10) {
        eprintln!("{err}");
    }
}
```
Your task
Your task is to finish implementing `head`. Your implementation must have the following features.
- It must support all of the options described in Part 2.
- If neither `--bytes` nor `--lines` (nor their short forms) is passed, then default to printing the first 10 lines of each file or `stdin`.
- If no files are passed as arguments, read from `stdin`; otherwise, read from each of the files (see the hints below).
- If `-` is given as a file, read from `stdin` in its place. So running `cargo run -- -n 3 foo.txt - bar.txt` should print the first three lines from `foo.txt`, then the first three lines from `stdin`, followed by the first three lines of `bar.txt`.
- If multiple files are passed, each should start with a header line. A blank line should appear after the last line of output for each file, before the next header line (except for the last file, which doesn’t have a trailing blank line). See the example below.
- If a file cannot be opened, print a message to `stderr` and continue. The example code above shows how to write a message in the proper format to `stderr` using `eprintln!()` if `open_input()` fails. Your code should do the same thing, except that rather than exiting immediately by calling `std::process::exit(1)`, it should simply remember that there was an error and, after all input files have been processed, exit with the value 1 if there was an error.
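Much of the bookkeeping in this list (headers, blank lines, the deferred exit status) is independent of how lines are actually printed. Here is a sketch of just that part, under some loud assumptions: `fake_open()` is a hypothetical stand-in for `open_input()` that serves in-memory text and fails for the name `missing.txt`, and the hypothetical `head_all()` prints only the first line of each input to keep the example short. It shows headers only when there is more than one input, writes the blank separator just before each header after the first successfully opened input, labels `-` as `standard input`, and remembers errors instead of exiting right away.

```rust
use std::io::{self, BufRead, Cursor, Write};

/// Hypothetical stand-in for `open_input()`: the name "missing.txt" fails,
/// and every other name yields two lines of in-memory text.
fn fake_open(name: &str) -> io::Result<Box<dyn BufRead>> {
    if name == "missing.txt" {
        Err(io::Error::new(io::ErrorKind::NotFound, "No such file or directory"))
    } else {
        Ok(Box::new(Cursor::new(format!("contents of {name}\nmore\n"))))
    }
}

/// Hypothetical driver: print the first line of each input to `out`. With
/// more than one input, each section gets a `==> name <==` header and a
/// blank line separates sections. Returns `Ok(true)` if any input failed.
fn head_all(names: &[&str], out: &mut dyn Write) -> io::Result<bool> {
    let show_headers = names.len() > 1;
    let mut printed_any = false;
    let mut had_error = false;

    for &name in names {
        let mut reader = match fake_open(name) {
            Ok(reader) => reader,
            Err(err) => {
                // Report the problem, remember it, and move on.
                eprintln!("{name}: {err}");
                had_error = true;
                continue;
            }
        };
        if show_headers {
            if printed_any {
                writeln!(out)?; // Blank line before every header but the first.
            }
            let label = if name == "-" { "standard input" } else { name };
            writeln!(out, "==> {label} <==")?;
        }
        printed_any = true;

        let mut line = String::new();
        reader.read_line(&mut line)?;
        out.write_all(line.as_bytes())?;
    }
    Ok(had_error)
}

fn main() -> io::Result<()> {
    let had_error = head_all(&["foo.txt", "missing.txt", "bar.txt"], &mut io::stdout())?;
    // Only after every input has been processed do we choose the exit status.
    if had_error {
        std::process::exit(1);
    }
    Ok(())
}
```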
Here is some sample output showing multiple files, including `stdin`.

```
$ echo "hi there!" | cargo run -- -n 3 foo.txt - bar.txt
==> foo.txt <==
First line of foo.txt
Second line of foo.txt
Third line of foo.txt

==> standard input <==
hi there!

==> bar.txt <==
First line of bar.txt
Second line of bar.txt
Third line of bar.txt
```
Here’s an example of what happens when one of the input files can’t be opened.

```
$ cargo run -- -n 1 foo.txt no-such-file.txt bar.txt
==> foo.txt <==
First line of foo.txt
no-such-file.txt: No such file or directory (os error 2)

==> bar.txt <==
First line of bar.txt
```
Play around with the real `head` to see other examples.
If you’ve made it this far and you’ve implemented everything, you’re just about done! Just gotta submit it. Make sure you test that your code works correctly with 0, 1, and multiple files. Make sure it supports `-` as meaning `stdin`. Make sure that when you have multiple files, it prints the headers and the blank lines. You can always test your behavior against the behavior of the real `head`.
If you’d like to make one small improvement to your implementation, read on. Otherwise, continue to the submission instructions.
The command line utilities were designed when C was the dominant programming language. C’s notion of strings is wildly different from Rust’s. In particular, C strings are just sequences of non-zero bytes followed by a zero byte. There are no other restrictions on the contents of a C string. In contrast, Rust strings are required to be valid, UTF-8 encoded strings of Unicode characters. These two different views of strings have two consequences for us:
- The real `head` utility works fine even if the files it’s operating on don’t contain UTF-8 encoded data. It simply separates the input into lines by detecting the newline character `'\n'`.
- The `BufRead::read_line()` function will fail if the input is not valid UTF-8.
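The second point is easy to observe. This sketch (the helper name `first_read_line_error` is made up) feeds `read_line()` a line containing the byte `0xFF`, which never appears in valid UTF-8, and reports the error kind it gets back.

```rust
use std::io::{BufRead, Cursor, ErrorKind};

/// Hypothetical helper: read every line of `data` with `read_line()` and
/// return the kind of the first error, or `None` if every line decoded.
fn first_read_line_error(data: &[u8]) -> Option<ErrorKind> {
    let mut reader = Cursor::new(data);
    let mut line = String::new();
    loop {
        line.clear();
        match reader.read_line(&mut line) {
            Ok(0) => return None, // End of input with no errors.
            Ok(_) => {}
            Err(err) => return Some(err.kind()),
        }
    }
}

fn main() {
    // 0xFF can't appear in UTF-8, so the second line can't become a `String`.
    println!("{:?}", first_read_line_error(b"ok\n\xFFbad\n")); // Some(InvalidData)
    println!("{:?}", first_read_line_error(b"ok\nfine\n")); // None
}
```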
If you would like to match the real `head` behavior, you may use the `BufRead::read_until()` function instead. An example of doing this was given in the `read_some()` function earlier.
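Putting that together, here is a sketch of a line-copying loop built on `read_until()`. It assumes a hypothetical `copy_lines()` helper that writes to any `Write`, which a `print_lines()` could call with `stdout`. Because it copies raw bytes and never decodes UTF-8, input that isn’t valid UTF-8 passes through unchanged.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copy up to `count` lines from `input` to `output`
/// as raw bytes. A final line with no trailing '\n' is copied as-is.
fn copy_lines(input: &mut dyn BufRead, output: &mut dyn Write, count: usize) -> io::Result<()> {
    let mut line = Vec::new();
    for _ in 0..count {
        line.clear(); // `read_until()` appends, so start fresh each time.
        if input.read_until(b'\n', &mut line)? == 0 {
            break; // End of input.
        }
        output.write_all(&line)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // The second line contains 0xFF, which would make `read_line()` fail.
    let mut input: &[u8] = b"one\n\xFFtwo\nthree\n";
    copy_lines(&mut input, &mut io::stdout(), 2)?;
    Ok(())
}
```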