Part 1. Parsing redirections (60 points)

In this part, you’re going to start implementing redirections. Implementing redirections has two parts: parsing the input containing redirections; and (2) performing the redirection when the pipeline is run. In Part 2, you will implement performing the redirections. So let’s get started with parsing!

Recall that a simple command consists of words (the command and the options and arguments) and redirections interspersed. For example, the following commands (even the ridiculous-looking final one) are all equivalent.

  • $ wc -l <input.txt >output.txt
  • $ wc <input.txt -l >output.txt
  • $ wc <input.txt >output.txt -l
  • $ <input.txt >output.txt wc -l All of them run wc -l with stdin connected to input.txt and stdout connected to output.txt.

You’re going to implement support for redirecting input from files in the following cases

  • n>path where n is a digit. This creates a file at path (truncating it if it exists) and connects file descriptor n to the created file.
  • n>>path where n is a digit. This opens a file at path (creating if necessary) in append mode and connects file descriptor n to the opened file.
  • n<path where n is a digit. This opens the file at path (it’s an error if the file doesn’t exist) and connects file descriptor n to the opened file.
  • >path. The same as 1>path.
  • >>path. The same as 1>>path.
  • <path. The same as 0<path.

Your task

There are three basic types of redirections: redirecting input, redirecting output, and redirecting output by appending.

First, add some new variants to your InputToken enum to handle the three redirection cases. Note that in each case, a redirection needs two pieces of data: the file descriptor number (a u32), and a path (a String). For example, one of my variants is

enum InputToken {
    // ...
    RedirectInput(u32, String),
    // ...
}

The difficult part of parsing the input is modifying the state machine to parse the new redirections.

Note

Note that it isn’t sufficient leave the current parsing mechanism in split_tokens() alone and simply examine each of the Word tokens to see if it has the right form. Consider echo '>/dev/null' and echo >/dev/null. If you correctly implemented the input parsing in the previous lab, then the output of parsing these should be identical, e.g., [Word("echo"), Word(">/dev/null")]. The problem is these two commands do not behave the same way. The first prints out >/dev/null whereas the second connects echo’s stdout to /dev/null and echo is run with no arguments.

Recall that the state machine described in the hint in the previous lab has four states. To support redirections, you will need some more states.

Hint

There are many ways to implement this. Here’s one way.

Since redirections consist of an optional digit, a redirection operator, <, >, or >>, and a path, it would be nice to reuse the code you’ve already written to parse Words to parse the paths. You can do this by maintaining additional state about what type of redirection is currently being parsed.

#![allow(unused)]
fn main() {
enum RedirectionState {
    None,
    Input(u32),
    Output(u32),
    Append(u32),
}
}

The None variant is when a normal word is being parsed. The u32 keeps track of which file descriptor will be redirected.

In addition, you will need to be able to distinguish input 5>path from 5x. Since the state machine looks at the input one character at a time, you’ll need to modify State to have an additional variant, Digit which represents the case where a digit was read in the Blank state.

#![allow(unused)]
fn main() {
enum State {
    Blank,
    Unquoted,
    SingleQuoted,
    DoubleQuoted,
    Digit(u32),
}
}

Here’s an updated state machine.

split_tokens() state machine

What isn’t represented is how the RedirectionState changes.

The intended behavior is in the Blank state, if the next input character is a digit, transition to state Digit(num) where num is the number.

In the Digit(num) state, if the next input character is < or >, set the redirection state to RedirectionState::Input(num) or RedirectionState::Output(num) and change to the Unquoted state.

In the Digit(num) state, if the next input character is a quotation mark, write num into the current word and transition to the appropriate quoted state. This is handling input like 3"foo"

Otherwise, in the Digit(num) state, if the next input character is anything else, write num into the current word and transition to the Unquoted state. This handles input like 4bar.

The transitions from the Blank and Unquoted states need to be updated to handle the redirection state.

In the Blank state, if the next input character is < or >, the redirection state should be set to RedirectionState::Input(0) or RedirectionState::Output(1) and transition to the Unquoted state.

In the Unquoted state, if the next input character is a |, then you either need to insert a Word token or one of the Redirect* token into the results vector. And then you need to insert a Pipe token. Here’s some sample code for this.

    (State::Unquoted, '|') => {
        let token = match redirection_state {
            RedirectionState::None => InputToken::Word(current_word),
            _ if current_word.is_empty() => return Err("Empty redirection target".into()),
            RedirectionState::Input(num) => InputToken::RedirectInput(num, current_word),
            RedirectionState::Output(num) => InputToken::RedirectOutput(num, current_word),
            RedirectionState::Append(num) => InputToken::RedirectAppend(num, current_word),
        };
        result.push(token);
        result.push(InputToken::Pipe);
        current_word = String::new();
        state = State::Blank;
        redirection_state = RedirectionState::None;
    }

Note that if the current word is empty, and a redirection is being parsed, this is an error. I.e., the input 2>| is an error.

In the Unquoted state, if the next input character is whitespace, then this behaves similar to the | case except you don’t push a Pipe token.

In the Unquoted state, if the next input character is a < or a >, then based on the redirection state, you need to do one of the following:

  1. insert a new Word token
  2. insert a new Redirect* token
  3. change the redirection state from RedirectionState::Output(num) to RedirectionState::Append(num)
  4. throw an Error

The third option occurs when the redirection state is RedirectionState::Output(num) and the current word is empty. For example, when parsing input 3>>foo, the first > changes the redirection state to Output(3) and the state to Unquoted. The second > changes the redirection state to Append(3).

For example, the input >foo>>bar<baz should be parsed as three separate redirection tokens.

Finally, after all of the input has been consumed, based on the state and the redirection state, you may need to insert either a Word or a Redirect* token. You will also need to update your code to handle the Digit state.

Write unit tests. In fact, you should probably write your unit tests first. You should write at least two additional tests, beyond what is provided here. Here are a few to help you get started.

Example

#[test]
fn redirections() {
    use InputToken::*;

    let input = ": >foo >>bar <baz";
    let tokens = split_input(input).unwrap();
    assert_eq!(
        tokens,
        vec![
            Word(":".to_string()),
            RedirectOutput(1, "foo".to_string()),
            RedirectAppend(1, "bar".to_string()),
            RedirectInput(0, "baz".to_string()),
        ]
    );
}

#[test]
fn redirections_nospace() {
    use InputToken::*;

    let input = ":>foo>>bar<baz";
    let tokens = split_input(input).unwrap();
    assert_eq!(
        tokens,
        vec![
            Word(":".to_string()),
            RedirectOutput(1, "foo".to_string()),
            RedirectAppend(1, "bar".to_string()),
            RedirectInput(0, "baz".to_string()),
        ]
    );
}

#[test]
fn digit_redirections() {
    use InputToken::*;

    let input = ": 1word 2>foo 3>>bar 4<baz 56>abc";
    let tokens = split_input(input).unwrap();
    assert_eq!(
        tokens,
        vec![
            Word(":".to_string()),
            Word("1word".to_string()),
            RedirectOutput(2, "foo".to_string()),
            RedirectAppend(3, "bar".to_string()),
            RedirectInput(4, "baz".to_string()),
            Word("56".to_string()),
            RedirectOutput(1, "abc".to_string()),
        ]
    );
}

#[test]
fn empty_redirection_targets() {
    assert!(split_input("<").is_err());
    assert!(split_input(">").is_err());
    assert!(split_input(">>>foo").is_err());
    assert!(split_input("><foo").is_err());
    assert!(split_input("<>foo").is_err());
    assert!(split_input(">><foo").is_err());
}

The good news is this was the hard part! The rest is simpler. (Just for fun, check out the state machine definition for HTML. It has 80 states. And that’s just the first of two state machines used to parse HTML. The second has about 25 states but is even more complex.)