Part 1. Parsing redirections (60 points)
In this part, you’re going to start implementing redirections. Implementing redirections has two parts: parsing the input containing redirections; and (2) performing the redirection when the pipeline is run. In Part 2, you will implement performing the redirections. So let’s get started with parsing!
Recall that a simple command consists of words (the command and the options and arguments) and redirections interspersed. For example, the following commands (even the ridiculous-looking final one) are all equivalent.
$ wc -l <input.txt >output.txt
$ wc <input.txt -l >output.txt
$ wc <input.txt >output.txt -l
$ <input.txt >output.txt wc -l
All of them runwc -l
with stdin connected toinput.txt
and stdout connected tooutput.txt
.
You’re going to implement support for redirecting input from files in the following cases
n>path
wheren
is a digit. This creates a file atpath
(truncating it if it exists) and connects file descriptorn
to the created file.n>>path
wheren
is a digit. This opens a file atpath
(creating if necessary) in append mode and connects file descriptorn
to the opened file.n<path
wheren
is a digit. This opens the file atpath
(it’s an error if the file doesn’t exist) and connects file descriptorn
to the opened file.>path
. The same as1>path
.>>path
. The same as1>>path
.<path
. The same as0<path
.
Your task
There are three basic types of redirections: redirecting input, redirecting output, and redirecting output by appending.
First, add some new variants to your InputToken
enum to handle the three
redirection cases. Note that in each case, a redirection needs two pieces of
data: the file descriptor number (a u32
), and a path (a String
). For
example, one of my variants is
enum InputToken {
// ...
RedirectInput(u32, String),
// ...
}
The difficult part of parsing the input is modifying the state machine to parse the new redirections.
Note that it isn’t sufficient leave the current parsing mechanism in
split_tokens()
alone and simply examine each of the Word
tokens to see if
it has the right form. Consider echo '>/dev/null'
and echo >/dev/null
. If
you correctly implemented the input parsing in the previous lab, then the
output of parsing these should be identical, e.g., [Word("echo"), Word(">/dev/null")]
. The problem is these two commands do not behave the same
way. The first prints out >/dev/null
whereas the second connects echo
’s
stdout to /dev/null
and echo
is run with no arguments.
Recall that the state machine described in the hint in the previous lab has four states. To support redirections, you will need some more states.
There are many ways to implement this. Here’s one way.
Since redirections consist of an optional digit, a redirection operator, <
,
>
, or >>
, and a path, it would be nice to reuse the code you’ve already
written to parse Word
s to parse the paths. You can do this by maintaining
additional state about what type of redirection is currently being parsed.
#![allow(unused)] fn main() { enum RedirectionState { None, Input(u32), Output(u32), Append(u32), } }
The None
variant is when a normal word is being parsed. The u32
keeps track
of which file descriptor will be redirected.
In addition, you will need to be able to distinguish input 5>path
from 5x
.
Since the state machine looks at the input one character at a time, you’ll
need to modify State
to have an additional variant, Digit
which represents
the case where a digit was read in the Blank
state.
#![allow(unused)] fn main() { enum State { Blank, Unquoted, SingleQuoted, DoubleQuoted, Digit(u32), } }
Here’s an updated state machine.
What isn’t represented is how the RedirectionState
changes.
The intended behavior is in the Blank
state, if the next input character is
a digit, transition to state Digit(num)
where num
is the number.
In the Digit(num)
state, if the next input character is <
or >
, set the
redirection state to RedirectionState::Input(num)
or
RedirectionState::Output(num)
and change to the Unquoted
state.
In the Digit(num)
state, if the next input character is a quotation mark,
write num
into the current word and transition to the
appropriate quoted state. This is handling input like 3"foo"
Otherwise, in the Digit(num)
state, if the next input character is anything
else, write num
into the current word and transition to the Unquoted
state. This handles input like 4bar
.
The transitions from the Blank
and Unquoted
states need to be updated to
handle the redirection state.
In the Blank
state, if the next input character is <
or >
, the
redirection state should be set to RedirectionState::Input(0)
or
RedirectionState::Output(1)
and transition to the Unquoted
state.
In the Unquoted
state, if the next input character is a |
, then you either
need to insert a Word
token or one of the Redirect*
token into the
results
vector. And then you need to insert a Pipe
token. Here’s some
sample code for this.
(State::Unquoted, '|') => {
let token = match redirection_state {
RedirectionState::None => InputToken::Word(current_word),
_ if current_word.is_empty() => return Err("Empty redirection target".into()),
RedirectionState::Input(num) => InputToken::RedirectInput(num, current_word),
RedirectionState::Output(num) => InputToken::RedirectOutput(num, current_word),
RedirectionState::Append(num) => InputToken::RedirectAppend(num, current_word),
};
result.push(token);
result.push(InputToken::Pipe);
current_word = String::new();
state = State::Blank;
redirection_state = RedirectionState::None;
}
Note that if the current word is empty, and a redirection is being parsed,
this is an error. I.e., the input 2>|
is an error.
In the Unquoted
state, if the next input character is whitespace, then this
behaves similar to the |
case except you don’t push a Pipe
token.
In the Unquoted state, if the next input character is a < or a >, then based on the redirection state, you need to do one of the following:
- insert a new Word token
- insert a new Redirect* token
- change the redirection state from RedirectionState::Output(num) to RedirectionState::Append(num)
- throw an Error
The third option occurs when the redirection state is
RedirectionState::Output(num)
and the current word is empty. For example,
when parsing input 3>>foo
, the first >
changes the redirection state to
Output(3)
and the state to Unquoted
. The second >
changes the
redirection state to Append(3)
.
For example, the input >foo>>bar<baz
should be parsed as three separate
redirection tokens.
Finally, after all of the input has been consumed, based on the state and the
redirection state, you may need to insert either a Word
or a Redirect*
token. You will also need to update your code to handle the Digit
state.
Write unit tests. In fact, you should probably write your unit tests first. You should write at least two additional tests, beyond what is provided here. Here are a few to help you get started.
#[test]
fn redirections() {
use InputToken::*;
let input = ": >foo >>bar <baz";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
Word(":".to_string()),
RedirectOutput(1, "foo".to_string()),
RedirectAppend(1, "bar".to_string()),
RedirectInput(0, "baz".to_string()),
]
);
}
#[test]
fn redirections_nospace() {
use InputToken::*;
let input = ":>foo>>bar<baz";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
Word(":".to_string()),
RedirectOutput(1, "foo".to_string()),
RedirectAppend(1, "bar".to_string()),
RedirectInput(0, "baz".to_string()),
]
);
}
#[test]
fn digit_redirections() {
use InputToken::*;
let input = ": 1word 2>foo 3>>bar 4<baz 56>abc";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
Word(":".to_string()),
Word("1word".to_string()),
RedirectOutput(2, "foo".to_string()),
RedirectAppend(3, "bar".to_string()),
RedirectInput(4, "baz".to_string()),
Word("56".to_string()),
RedirectOutput(1, "abc".to_string()),
]
);
}
#[test]
fn empty_redirection_targets() {
assert!(split_input("<").is_err());
assert!(split_input(">").is_err());
assert!(split_input(">>>foo").is_err());
assert!(split_input("><foo").is_err());
assert!(split_input("<>foo").is_err());
assert!(split_input(">><foo").is_err());
}
The good news is this was the hard part! The rest is simpler. (Just for fun, check out the state machine definition for HTML. It has 80 states. And that’s just the first of two state machines used to parse HTML. The second has about 25 states but is even more complex.)