Preface
This website contains labs for CSCI 241. They are very much a work-in-progress. There are definitely errors, typos, misspellings, confusing portions, and probably things that are just plane wrong (like that use of “plane”). If you encounter anything confusing, please don’t hesitate to ask.
All corrections, big or small, are gladly welcome!
At the top of this page, you’ll find three icons. The left shows or hides the table of contents. The middle one lets you pick a theme for this book (go ahead and pick one you like). The right one lets you search the labs.
Once we get to the labs where you’ll be programming in Rust, there are a bunch of code samples. Many of these (but not all of them) are executable.
Here’s an example of runnable code.
fn main() {
    println!("Welcome to CSCI 241!");
}
Mouse over the block of code above and click the Run button to see the output.
The code you write in these labs is designed to run on the lab machines. There is also a virtual machine you may connect to remotely if needed. See the remote coding instructions for details.
Throughout the labs, you’ll find a number of colorful blocks like the example above. Here are a few you’ll encounter in the labs.
Example boxes contain important examples, usually of code that’s similar to the code you will be writing.
Tip or hint boxes contain important information that you’re almost certain to want to use in your implementations.
There are a variety of other boxes, including note, info, and caution. At this point, they have no fixed meaning. As the labs are improved, this may change.
Remote coding
All of the labs can be completed using the lab computers in King 137.
Additionally, there is a virtual machine, mcnulty (mcnulty.cs.oberlin.edu), that you can connect to remotely. The instructions for connecting depend on whether you are connecting from on campus or off campus.
In a terminal, you can ssh to mcnulty.cs.oberlin.edu. This gives you a shell on the virtual machine. You will have the same home directory as the lab machines. Any changes you make in the VM will be reflected on the lab machines, and vice versa.
You can also use Visual Studio Code to connect to mcnulty. This also uses SSH. This lets you program remotely using the standard VS Code interface. I find it pretty convenient. You can also access the shell on the remote machine using VS Code's Terminal window as usual.
On campus
From on campus, you can ssh to mcnulty by entering the following command into a terminal. (The $ indicates the prompt; don't type it yourself.)
$ ssh username@mcnulty.cs.oberlin.edu
where username is the username you use to log into the lab computers.
Off campus
From off campus, the mcnulty VM is not reachable directly. You need to tell SSH to "proxy jump" through occs.cs.oberlin.edu. Fortunately, this isn't much more difficult; you just need to pass an option to ssh:
$ ssh -J username@occs.cs.oberlin.edu username@mcnulty.cs.oberlin.edu
Visual Studio Code
To connect to mcnulty using VS Code, open a new window (File > New Window) and, in the welcome screen of the new window, click the Connect to… link. Select the option Connect to Host… Remote-SSH.
The first time you do this, you'll have to enter the SSH connection details by clicking + Add new SSH Host….
Then you need to enter the full ssh command (either the on-campus command or the off-campus command from above).
VS Code will then ask which ssh_config file to update and gives several options. On macOS and Linux, you want the one in your home directory, .ssh/config.
Once you've done this, click on the Connect to… link and then Connect to Host… Remote-SSH again. Select the host you just configured.
At this point, it will ask you to enter the password you use to log in to the lab machines. If you are off campus, you'll have to enter the password twice: once for occs, and once for mcnulty.
The documentation for using SSH with VS Code is available here.
The Visual Studio Code extensions you have installed locally, such as rust-analyzer, may not be installed on the remote machine. If this is the case, then after connecting to the remote machine, you can install the extensions as normal (click on the 4-boxes icon, select the extension you want, and then click Install in SSH: mcnulty).
Lab 0. Getting Started
Welcome to the 0th lab in CS 241! In this short lab, you’re going to
- Log in to a lab machine;
- Install Visual Studio Code extensions;
- Install Visual Studio Code extensions remotely; and
- Debug a short Bash program.
This lab is to be completed individually.
Setup
To get started, first log into the lab machine.
If the lab machine is currently running Windows, you’ll need to restart it. During the boot process, a text screen with different operating systems will pop up for 5 seconds. When this pops up, use the keyboard to select the Ubuntu option (which will probably be the second in the list, just under Windows).
Once you have logged in, open the Visual Studio Code application. To do so, click the Show Applications button in the bottom left corner of the desktop, and then open Visual Studio Code.
Move to Part 1 to install the extensions.
Part 1. Installing extensions
In this part, you're going to install four extensions:
- Remote - SSH — this lets you connect to remote servers using Secure SHell (SSH);
- ShellCheck — this checks your shell scripts for common errors;
- rust-analyzer — this greatly eases the task of programming in Rust; and
- CodeLLDB — this is the debugger we'll use to debug Rust programs.
Your task
Click on the 4-boxes icon on the left side of the Visual Studio Code window. This opens the Extensions Marketplace. In the text box at the top left, enter the name of the extension you want to install. Click on the matching extension below the text box and then click the Install button.
After installing an extension, it may ask you to reload Visual Studio Code. Go ahead and do so.
Install all four extensions listed above.
Part 2. Installing remote extensions
In this part, you're going to connect to a remote server, mcnulty, using Visual Studio Code and install the ShellCheck, rust-analyzer, and CodeLLDB extensions.
Your task
Start by reading the Remote coding page.
Follow the instructions for Visual Studio Code on the Remote coding page to connect to mcnulty.
At this point, follow the steps in Part 1 to install the three extensions. This time, rather than an Install button, you should see an Install in SSH button.
In the future, you'll be able to connect to mcnulty from an installation of Visual Studio Code on your own machine, even if you're not in the lab. All of the files on your lab machine should also be available on mcnulty.
Part 3. Debugging a shell script
In this final part, you're going to use the ShellCheck extension you just installed to debug a short Bash program.
Your task
Create a new file in Visual Studio Code by selecting New File… from the File menu. Name the file hello.sh and save it in your home directory.
Copy the short program below into hello.sh and save the file.
#!/bin/bash

if [[ $# -gt 0 ]]; then
  user="$1"
else
  user="$(whoami)"
fi

echo Hello ${user}, welcome to CS 241
The line that starts with #! is called a shebang and it tells the operating system what program to use to run the file, namely /bin/bash.
Next is a somewhat unusual-looking if statement that sets the variable user. If the user runs the script and passes foo as an argument like this
$ bash hello.sh foo
then user will be set to foo. If the user runs the script with no arguments like this
$ bash hello.sh
then user will be set to the name of the current user by calling the whoami program.
The final line of the program prints text. Notice that ${user} is used to get the value of the user variable. (You can also use $user without the braces, but there are situations where the braces are required, so I tend to use braces all the time.)
Open a terminal in VS Code by pressing Ctrl-` (control-backtick). Run the program a few times with different arguments and pay close attention to how the program behaves. (Note that the last command has several spaces between Stu and Dent.)
$ bash hello.sh
$ bash hello.sh Stu Dent
$ bash hello.sh "Stu Dent"
$ bash hello.sh "Stu    Dent"
Notice that $ bash hello.sh Stu Dent didn't include the Dent portion, whereas when the quotation marks were used, the full name appeared. Bash uses spaces to split the command into different parts. This is called word splitting. Stu and Dent became separate arguments passed to hello.sh. By using quotation marks, the whole string stays together as a single argument.
This will come up often when working with the terminal.
You may have noticed strange behavior with the final two examples. Namely, "Stu Dent" and "Stu    Dent" behaved identically:
$ bash hello.sh "Stu    Dent"
Hello Stu Dent, welcome to CS 241
Where did the extra spaces go?
Look in VS Code and you'll notice that ${user} has a squiggly line under it. Mouse over the line and you'll see that the ShellCheck extension has reported an error here. You can see a list of all problems by opening the Problems pane by selecting Problems from the View menu.
ShellCheck says, "Double quote to prevent globbing and word splitting. shellcheck(SC2086)." Actually, the ShellCheck extension merely runs the shellcheck program. If we run shellcheck ourselves using the terminal as shown below, it will give us some more detail:
$ shellcheck hello.sh
In hello.sh line 9:
echo Hello ${user}, welcome to CS 241
^-----^ SC2086 (info): Double quote to prevent globbing and word splitting.
Did you mean:
echo Hello "${user}", welcome to CS 241
For more information:
https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
In addition to showing the error, it also suggests a fix and provides a link for more information.
The underlying problem here is that ${user} expanded to Stu    Dent (extra spaces and all) but this underwent word splitting when evaluating the echo line. echo takes any number of arguments and prints them out in a line with a single space between them. So when executing the line
echo Hello ${user}, welcome to CS 241
it first expanded ${user} giving
echo Hello Stu    Dent, welcome to CS 241
This is split by spaces and thus executes exactly as if it had been written like this.
echo Hello Stu Dent, welcome to CS 241
Modify hello.sh by following the suggestion given by shellcheck and save your file. Run the program again. This time, any spaces should be preserved in the argument.
In general, when programming in Bash, you want to double-quote variables; otherwise, the variables undergo word splitting, and you almost never want that.
See the ShellCheck Wiki for error SC2086 for more explanation as well as a suggestion for a better way to use quotation marks in the echo line in hello.sh.
Part 4. Getting Ready For Class
In this course, we will use a number of different online resources. In this part of the lab, we will make sure you are set up and ready to go.
Gradescope
Go to gradescope.com and log in using your oberlin.edu email account. (You may need to create an account.) When you log in, you should see CSCI 241 listed under Your Courses. If you click on the course, you should see some Reading Question assignments. You will need to complete these before class on the date listed on them.
If you do not see CSCI 241 listed as one of your courses, let your instructor know.
Ed
You should have received an email inviting you to the CSCI 241 Ed forum. If you did not, you can sign up with your oberlin.edu email address using this link. Sign up, and post a response to the Introductory Thread.
Syllabus
Please take a minute to read through the syllabus for this course (linked from Blackboard).
Submitting
In the other labs, you will be using GitHub to submit your assignments.
For Lab 0, call over the professor or lab helper, introduce yourself, and show your completed program.
Once you’ve done this, the professor or lab helper will mark that you have completed the lab.
At this point, you’re all done. Don’t forget to log out of the lab machines!
Lab 1. Introduction to Bash
Due: 2024-09-16 at 23:59
In this lab, you will explore the Bash shell and learn to use some of the standard command line utilities to
- navigate the file system;
- list the contents of a directory;
- create, move/rename, and delete files and directories;
- download files from the internet;
- read manual pages to learn how to use these utilities; and
- upload files to GitHub.
This lab is to be completed individually.
Setup
Log in to one of the lab machines and open a terminal by clicking the Show Applications button in the lower left corner of the desktop and opening Terminal.
To get started, continue on to Part 1!
Part 1. Navigating the file system
The file system is a hierarchical arrangement of files, grouped into directories. The hierarchy forms a tree of the sort you learned about in CSCI 151. Each file and directory has a name, and we can refer to any file in the file system by giving a path through the tree, starting at the root directory and using / to separate the directories along the path.
Here are some example paths.
- /
  The root directory itself
- /bin/bash
  The program bash (this is the shell we're using right now!), which lives inside the directory bin, which lives inside the root directory
- /usr/bin/wget
  The program wget (this program will let us download files from the internet), which lives in the directory /usr/bin
Naming files by starting at the root / all the time is time-consuming. Each program has a notion of its current directory. We can also name files by giving a path from the current directory.
Here are some examples of relative paths.
- paper.pdf
  The file paper.pdf inside the current directory
- Music/swift.mp3
  The file swift.mp3 inside the Music directory inside the current directory
The terminal you opened launched the bash shell and it should be currently running. You should see a prompt in the terminal that looks like user@mcnulty:~$ (yours will have a different username and hostname—the name of the computer). From here on out, the prompt will be specified only as $. You should not type the $ character when entering input.
Bash has several commands for navigating the file system and listing the contents. The most commonly used commands are
- cd
  change current directory; $ cd /usr/bin changes the current directory to /usr/bin; with no arguments, cd changes the current directory to your home directory
- ls
  lists the contents of a directory; $ ls /tmp lists the contents of the directory for temporary files, /tmp; with no arguments, ls lists the contents of the current directory
- pwd
  prints the current directory (the name stands for "print working directory" where "working directory" just means the current directory).
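Putting those three together, a short session might look like this (the exact listing on your machine will likely differ):
$ cd /usr
$ pwd
/usr
$ ls
bin  games  include  lib  local  sbin  share  src
$ cd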
By convention, files and directories that start with a period are hidden from directory listings. If you want to see all files, including hidden ones, use $ ls -a.
Every directory has two directory entries, named . and .. . The single period, ., is the entry for the directory itself. Two periods, .., is the entry for the parent directory.
Perhaps the most common use of .. is to change to the parent directory using $ cd .. . For example, after changing to a directory like $ cd foo/bar/, I can return to the foo directory by running $ cd .. .
Your task
Create a file named task1.txt using a text editor. To do so, click the Show Applications button in the bottom left corner of the desktop, and then open Text Editor.
Using the shell commands cd and ls, find two different files or directories in each of the following directories:
- /
- /usr
- /usr/bin
- /etc
Write the names of the 8 different files (and which directory they're in) in task1.txt. Save this file in your home directory for now. We're going to move it later.
The shell expands a tilde, ~, into the path for your home directory. Thus, you can, for example, change back to your home directory using $ cd ~. You probably want to do so now.
You may construct paths relative to your home directory by starting the path with ~/. For example, the shell will expand ~/foo/bar into the path to the foo/bar file or directory inside your home directory.
Note that the shell treats an unquoted ~ differently than a quoted tilde, '~' or "~". Run the command $ echo ~ '~' "~" to see the difference.
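The output will look something like this, where /home/username stands in for the actual path of your home directory:
$ echo ~ '~' "~"
/home/username ~ ~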
Use $ ls ~ to print the contents of your home directory. You should see task1.txt in that directory listing.
Use $ cat ~/task1.txt to print the contents of the file to the terminal.
Return to your home directory by running $ cd (with no arguments) or, equivalently, $ cd ~.
Part 2. Manipulating files and directories
In Part 1, you created a file in your home directory using a text editor and you were able to see that file in the file system using the ls command. Now, we're going to do the opposite: we're going to manipulate files on the command line and examine the results in the GUI—the graphical user interface.
We’re going to be using some new commands.
- mkdir
  Creates a new directory
- cp
  Copies files or (with a suitable option) directories
- rm
  Removes files or (with a suitable option) directories
- mv
  Moves (or renames) files and directories
- rmdir
  Removes an empty directory (this is less commonly used than rm for this operation)
- man
  Shows a manual page, which provides a bunch of information about many of the shell commands and programs
- wget
  Downloads files from the Internet
Your task
Create the directory cs241/lab1 in your home directory using the mkdir command. (If your current directory is not your home directory, you can run $ cd or, equivalently, $ cd ~ to change to your home directory.)
If you try the “obvious” command, you’ll likely encounter a confusing error.
$ mkdir cs241/lab1
mkdir: cannot create directory ‘cs241/lab1’: No such file or directory
The problem here isn't that cs241/lab1 doesn't exist (which is good, because we're trying to create this directory); the problem is that the cs241 directory doesn't exist. It'd be great if the error message reflected which component of the path didn't exist, but alas, it does not.
In general, pay close attention to error messages but be aware that when the error message is about a file path, it could be about any component of that path.
To create the cs241/lab1 directory, you can use two invocations of mkdir: first to create cs241 and second to create lab1 within it.
Move the file task1.txt you created in Part 1 inside the cs241/lab1 directory using mv. The basic syntax for the mv command is
$ mv src dest
Keep in mind that when naming a file with a path, you either need to give it an absolute path starting with / or you need to give it a path from the current directory. Using ~ for your home directory can be a big time saver when the files you want to use are in your home directory (or in a directory in your home directory).
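As an example of the syntax (notes.txt here is just a made-up file, not part of the lab), this moves a file from the current directory into a directory under your home directory, renaming it along the way:
$ mv notes.txt ~/cs241/old-notes.txt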
Open your home directory using the Files application from the left sidebar in the desktop. The task1.txt file should not be there. Open the cs241/lab1 directory. This should contain task1.txt and nothing else.
Create a directory named books in the lab1 directory using mkdir (you should see the new directory appear in Files).
Run $ man wget to pull up the manual page for the wget program, which you can use to download files from the Internet. Press q to exit man when you're done. Run $ wget --help to see similar information.
cd into the books directory you created and use wget to download a copy of Bram Stoker's Dracula from https://www.gutenberg.org/cache/epub/345/pg345.txt.
Rename the file from pg345.txt to Stoker, Bram - Dracula.txt. Note that this name has spaces in it. You'll need to specify it as 'Stoker, Bram - Dracula.txt' or Stoker,\ Bram\ -\ Dracula.txt.
Use wget again to download James Joyce's Dubliners from https://www.gutenberg.org/files/2814/2814-0.txt. Rename it Joyce, James - Dubliners.txt.
Different versions of ls will print file names with spaces in them differently. Some versions simply print the file names with the spaces. Others print the file names with quotes around them. When you run ls inside the books directory, you may see something like this.
$ ls
'Joyce, James - Dubliners.txt' 'Stoker, Bram - Dracula.txt'
At this point, the files and directories in your cs241 directory should look like this.
cs241/
└── lab1
├── books
│ ├── Joyce, James - Dubliners.txt
│ └── Stoker, Bram - Dracula.txt
└── task1.txt
If you'd like to generate a similar picture of your files, try the tree command. tree directoryname will print a drawing like this of the directory contents to the terminal.
Part 3. Grepping
The grep command is used for searching text files for a "pattern" and printing out each line that matches the pattern. For example, $ grep vampire 'Stoker, Bram - Dracula.txt' prints out each line containing the word vampire (in lower case). Try it out!
Read grep's man page to figure out how to perform a case-insensitive search and run the command to print out all lines matching vampire, case insensitively. Hint: typing /case (and then hitting enter) while viewing a man page will search for case in the manual. While searching, you can press n/N to go to the next/previous instance.
Your task
Using a text editor, create the file task3.txt in the lab1 directory.
Use grep to print out a count of the lines matching vampire, case insensitively. (Search the man page again.) Write the command you used and its output in task3.txt.
Open the man page for grep one final time and figure out how to get grep to print the line numbers (and the lines themselves) that match Transylvania and then do that. Write the command you used and its output in task3.txt.
Find and use the command to print out a count of every word in both books. Use the apropos command to search manual pages for keywords. The -a (or --and) option to apropos lets you search for commands that involve all of the keywords, so $ apropos -a apple sauce will match commands whose descriptions contain both the words apple and sauce. So use apropos -a with appropriate keywords to find a command that produces a word count.
Run that command on all .txt files in the books directory using
$ cmd books/*.txt
where cmd is the command you found with apropos.
The * is called a glob and Bash replaces it with all files that match. In this case, books/*.txt will be replaced with paths to files in the books directory that end with .txt. Globs are powerful and make working with multiple files with similar names easy.
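A quick way to see what a glob expands to is to pass it to echo. Assuming you're in the lab1 directory with the two books downloaded above, you should see something like:
$ echo books/*.txt
books/Joyce, James - Dubliners.txt books/Stoker, Bram - Dracula.txt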
Put the command you ran and its output in task3.txt.
Finally, read the man page for the command to count words in files and find the option to only print line counts. Use the command on both files. Put the command and the output in task3.txt.
Submitting (5 points)
In this course, we’re going to be using GitHub. GitHub uses Git which is a version control system used by programmers. We’re going to learn a lot more about Git later, but for now, let’s use its most basic functionality.
Go to GitHub and create an account, if you don’t already have one. Many of the courses in the computer science major will require using GitHub so you’ll be well-served by getting comfortable with it now.
Go to the GitHub Classroom assignment page and accept the assignment. Wait a few seconds and then reload the page and you should see “You’re ready to go!” Click the link for the assignment repository.
The repository it created should be at a URL like https://github.com/systems-programming/lab1-accountname. The last component of the URL will be some combination of the lab name and your GitHub user name. You will need this in the cloning step below.
Next, you need to clone the assignment repository, which will download all of the files (if any) in the remote repository on GitHub to your computer. To do this, you'll need to use gh to authenticate to GitHub and then perform the clone.
gh is the command-line tool for working with GitHub. Most of the time, you will use plain Git commands, but for authentication and cloning repositories, you'll use gh.
In a terminal, run the command
$ gh auth login
and then, using the arrow keys, select GitHub.com and press enter. Select HTTPS for the preferred protocol and press enter. When it asks to authenticate Git with your GitHub credentials, type Y and press enter. When it asks how you'd like to authenticate, select Login with a web browser and press enter. Copy the one-time code it displays and then press enter to open the browser. If it cannot open the browser, you can click here.
In either case, paste the code you copied and log in (which should only be required if you have logged out of GitHub). Click the Authorize github button on the web page. At this point, you should see something like
✓ Authentication complete.
- gh config set -h github.com git_protocol https
✓ Configured git protocol
✓ Logged in as accountname
in the terminal.
Now you’re ready to clone your repository. (In future labs, you shouldn’t need to authenticate again unless you log out.) Run
$ gh repo clone systems-programming/lab1-accountname
where lab1-accountname should match the URL for the repository described above.
At this point, you should have a directory named lab1-accountname (where, again, the actual name is specific to your repository). This is your newly-created local repository.
Copy all of the files and directories from lab1 into the newly-created local repository's directory. Read the man page for cp to figure out how to recursively copy directories.
cd to your local repository directory. The contents of this directory (recursively) should look like this.
repo
├── books
│ ├── Joyce, James - Dubliners.txt
│ └── Stoker, Bram - Dracula.txt
├── task1.txt
└── task3.txt
We want to add the files to the repository, commit them, and then push them to GitHub. (We'll talk about why there are separate steps later. You can think of these three operations as "inform Git about the current state of the files," "have Git make a snapshot of the current state of the repository," and "tell GitHub about the changes we made in the local repository.")
These are the commands you need to run.
$ git add books task1.txt task3.txt
$ git commit -m 'Submitting lab 1'
$ git push
If you make changes to a file, you need to run $ git add again and pass it the path of the file you changed. Then you'll need to run $ git commit and finally $ git push.
The -m 'Submitting lab 1' causes Git to create a commit message with the contents Submitting lab 1. Normally, we'll want to create more descriptive messages, but this is fine for now.
At this point, you’re done! You can check that everything worked correctly by going back to your repository on GitHub and checking that all of the files and directories show up. You should see two directories and two files:
.github
books
task1.txt
task3.txt
The .github directory was added automatically by GitHub Classroom. So why didn't it show up in your local repo? Because its name starts with a period! Recall that files whose names start with a period are hidden. You can see the hidden files using ls -a. If you do this, you'll see a .git directory too. .git is where Git stores its local configuration.
Finally, you can remove your lab1 directory that we started with, now that you've copied everything over to your local repository for lab 1. Since lab1 isn't empty, $ rmdir lab1 won't work to remove it. And since it's a directory, $ rm lab1 won't work either! Go ahead and try both commands out.
Read the man page for rm to figure out how to remove directories recursively.
Following the instructions above regarding which files should appear in your repository and which must be removed is worth 5 points. Every subsequent lab also has 5 free points for following the submission directions.
Lab 2. Shell scripting
Due: 2024-09-23 at 23:59
Shell scripts show up all over the place from programs on your computer to building software to running test suites as part of continuous integration. Learning to write (often short) Bash scripts can be a massive productivity boost as you can automate many tedious tasks. For example, if you find yourself needing to run a sequence of shell commands multiple times, a shell script is often the perfect tool. No more forgetting an argument or accidentally running the commands out of order. Write it once, run it many times.
In this lab, you will learn
- to write Bash shell scripts;
- how to run programs multiple times with different arguments;
- how to use tools to analyze your scripts for common errors.
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. If you choose to work with a partner, you and your partner must complete the entire project together. Dividing the project up into pieces and having each partner complete a part of it on their own will be considered a violation of the honor code. Both you and your partner are expected to fully understand all of the code you submit.
Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.)
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working. But before you do, read the entire assignment and be sure to check out the expected coding style.
Be sure to ask any questions on Ed.
Coding style
For all of the shell scripts you write, I recommend you follow the Google Shell Style Guide. Look at the section on formatting. These rules are, in many ways, arbitrary. However, it’s good to stick to one style. The guide says to use two spaces to indent. Use whatever you’d like, but be consistent.
Once you open a shell script, the bottom of the window should show something like
Ln 1, Col 1  Spaces: 4  UTF-8  LF  Shell Script
You can change the number of spaces by clicking on Spaces: 4, then Indent Using Spaces, and then selecting 2 (to match the Shell Style Guide).
If you use NeoVim or Vim as your editor, you can include the line (called a modeline)
# vim: set sw=2 sts=2 ts=8 et:
at the bottom of each of your scripts to force Vim to indent by 2 spaces and to ensure that tabs will insert spaces (to match the Shell Style Guide). This requires "modeline" support to be enabled, which is disabled by default. You can enable it by adding the line
set modeline
to your ~/.vimrc file, creating it if necessary.
You can also set other options in your ~/.vimrc file. For example, on mcnulty, I have the following simple ~/.vimrc.
set background=dark
filetype plugin indent on
autocmd FileType sh setlocal shiftwidth=2 softtabstop=2 tabstop=8 expandtab
set modeline
The first line tells Vim to use colors suitable for a terminal with a dark background. The second line tells Vim to use file-type aware indenting. The third line tells Vim to set those options for shell script files. See the Vim wiki for more details. And the fourth enables modelines.
After you write the #!/bin/bash line at the top of your file (or a modeline at the bottom), you'll probably want to reopen the file so that Vim knows it's a Bash file and turns on the appropriate syntax highlighting and indentation.
If you use emacs, you’re kind of on your own. Feel free to ask on Ed, search StackOverflow, and read the Emacs Wiki.
Same with Nano. This might be useful.
Run time errors and return values
For each of the parts below that ask you to print out a usage message or an error message, this message should be printed to stderr. You can use code like this to print a usage message.
echo "Usage: $0 arguments" >&2
Any errors should cause the script to exit with a nonzero value (1 is a pretty good choice). Scripts that run successfully should exit with value 0. You can use exit 1 to exit with the value 1.
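Here's a minimal sketch of how those pieces fit together; the specific test on $# is just an example and will depend on what your script expects:
if [[ $# -ne 1 ]]; then
  echo "Usage: $0 arguments" >&2
  exit 1
fi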
Script warnings and errors
Make sure your scripts pass shellcheck without errors or warnings. If you disable a particular warning, you must leave a comment giving a very good reason why you did so.
Executable scripts
All of your scripts should start with the line
#!/bin/bash
and must be executable. Running $ chmod +x on each of the files before adding them to Git is sufficient.
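For example, before committing you might run:
$ chmod +x genfrac shellcheckall topfiletypes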
Script documentation
Each of your scripts should contain a comment at the top of the script (below the #!/bin/bash line) that contains usage information plus a description of what the program does.
Example.
#!/bin/bash
# Usage: shellcheckall [dir]
#
# The dir parameter should be the path to a directory. Shellcheckall will...
To get started, continue on to Part 1!
Part 1. Multiple executions (15 points)
One of the most common tasks when working with computers is running a program multiple times with different inputs. (As just a single example, in the final project for CSCI 210, you first write a program to simulate how a part of the processor operates. You will then run this program with different options on different inputs. This is an error-prone operation. Instead, you could write a simple shell script to run the program all of the different ways you need.)
In this part, you will be writing a shell script to run a program that is provided to you.
Clone the jfrac repository outside of your assignment repository. Do not commit the jfrac code to your assignment repository! The jfrac repository holds the source code for a simple program, written in the Rust programming language, that generates images of the Julia set fractal.
Inside the jfrac directory, run the following command.
$ cargo run -- default.png
The cargo tool is used for building (and in this case running) software written in Rust.
This will produce an image file that is 800 pixels wide and 800 pixels high named default.png. It looks like this.
The mathematics behind the Julia set are explained on the Wikipedia page, but are not important for us here. What is important is that by changing the complex number \(c\) in the equation \(z_{n+1} = z_n^2 + c\), we can drastically change the output fractal. The number \(c\) is specified by passing the --constant option to the program. For example, the default constant is \(c = -0.4 + 0.6i\), so running
$ cargo run -- --constant="-0.4 + 0.6i" explicit.png
will produce the file explicit.png which is identical to default.png shown above. The strange -- by itself is required here. Everything before the -- is an option or argument for cargo. Everything after the -- is an option or argument for the jfrac binary itself.
There's no man page for jfrac, but it comes with a short help message. Run
$ cargo run -- --help
to see the options.
Your task
First, play around with the --constant option to jfrac to produce images with different constants. Look at the examples given in the quadratic polynomials section of Wikipedia for inspiration. You'll likely want to select values of \(c\) with small real and imaginary parts.
Find 5 values of \(c\) that you like.
Your task is to write a Bash script named genfrac which will run the jfrac command several times (by using cargo run as shown above) using two nested for loops.
Your script must generate each of the 5 fractals at 3 different sizes: 200 × 200, 400 × 400, and 800 × 800. You can assume your script is running inside the jfrac directory; however, once you are done testing and running it, you will need to move it to your lab2 repository, and make sure not to submit the jfrac directory.
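If you haven't written nested for loops in Bash before, the general shape looks like this; the variable names and values here are placeholders, not the ones your script should use:
for color in red green blue; do
  for size in small large; do
    echo "a ${size} ${color} square"
  done
done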
Each file should be named Julia set (constant) size.png. For example, using the default constant and default size, you would produce the file named Julia set (-0.4 + 0.6i) 800x800.png.
You should have 15 generated images in total. Add those images to your repository in an images directory.
Part 2. Simple pipeline (5 points)
For several subsequent parts of this lab, you’re going to need to run the scripts you write on a large code base. We’re going to use the source code for the Linux kernel. You’re going to write a one-line shell script that downloads and decompresses the source code.
Your task
Write a one-line script that uses curl to download the compressed Linux source code and pipes it to tar, which extracts the source code into the current directory. The Linux source code is available at this URL.
https://github.com/torvalds/linux/archive/refs/tags/v6.4.tar.gz
Your command should be structured as follows.
$ curl ARGS | tar ARGS
After you run the command, you should have a new directory named linux-6.4 which contains the source code to version 6.4 of Linux. Do not commit the Linux source code to your assignment repository.
Write the pipeline command you come up with in a file named README.
Normally, a README provides information about a project. GitHub will display the contents of your README when you visit the web page for your repository (after you have pushed the README to GitHub). You'll use this same file several times to record the commands you come up with.
- If you run curl https://github.com/torvalds/linux/archive/refs/tags/v6.4.tar.gz with no other options, no output is printed. That's because GitHub, like many websites, will redirect the browser to different URLs. The way this works technically is to use a redirection HTTP status with a Location header. Search the curl man page to figure out which option to use to get curl to follow the redirection specified in the Location header.
- The tar command, like many (but sadly not all) command-line utilities that operate on files, will accept a single - as the name of the file to mean read from stdin rather than reading from the file (and for output, write to stdout rather than writing to a file).
Part 3. Shell script hygiene (25 points)
Shell scripting is a powerful tool that is used in many areas of software development, including build scripts (that build the software), test scripts (that test the software), and conformance checking scripts (that check things like formatting conventions). An example of a conformance checking script might be one that is run on every git commit to check that the files being changed have no errors or warnings according to some tool.
You're going to write a conformance checking script that checks if all of the shell scripts in a directory pass shellcheck.
To find all of the shell scripts in a directory, you will use the find command. Conceptually, we want to loop over all of the files that have a particular extension and do something with them. For example, if we want to print the paths of every text file in a directory whose path is stored in the dir variable, we'd like to be able to do something like this:
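(Here is that naive loop written out; it's the same line shellcheck complains about below.)
for file in $(find "${dir}" -name '*.txt'); do
  echo "${file}"
done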
Unfortunately, this doesn’t work! If we run shellcheck on this we get a warning
for file in $(find "${dir}" -name '*.txt'); do
^-- SC2044 (warning): For loops over find output are fragile. Use find -exec or a while read loop.
The problem here is that file names can contain spaces (and even newline characters!). The shellcheck wiki page for error SC2044 gives example code to make this work. It would look something like
while IFS= read -r -d '' file; do
  echo "${file}"
done < <(find "${dir}" -name '*.txt' -print0)
Note carefully the space after the = and the space between the two < characters!
What this arcane construction is doing is first running
find "${dir}" -name '*.txt' -print0
which is the same as the previous find command except that rather than printing the matching paths separated by a newline (which, as mentioned, is a valid character in a file name), it separates the output with a 0 byte. (Not the character 0, which is a byte with integer value 48, but the byte with value 0.)
The output of the find command becomes the standard input for the
while IFS= read -r -d '' file; do ... done
loop. The loop is going to read from stdin (from the read command) and split up the input by 0 bytes (which, conveniently, is what the -print0 argument to find produced).
Each time through the body of the loop, the file variable will be set to the path of one of the files that find found.
Your task
Write a shell script called shellcheckall which takes zero or one parameters. The parameter, if given, should be a path to a directory. If no parameters are given, it should act on the current directory. If two or more parameters are given, output usage information to stderr and exit with return value 1. If the supplied parameter is not a directory, output an error message (on stderr) and exit with return value 1.
The script should search the given directory (and any directories inside of it) to find all of the files with the extension .sh. It should run shellcheck on each file and count how many scripts pass out of the total number of shell scripts. If any of the scripts fail shellcheck, then when your script exits, it should exit with return value 1. See the examples below, including the return values.
Make sure you handle files with spaces in the name. Make sure the script works correctly when run on an empty directory and one that contains no shell scripts (see the examples below).
How many shell scripts in the entire linux-6.4 directory pass shellcheck? Write your answer in your README.
Write a for-loop that runs shellcheckall on each directory in linux-6.4. It should print out the name of the directory, a colon, a space, and then the output from shellcheckall. Your loop should probably start like this.
for dir in linux-6.4/*/; do
The / after the * means the glob will only match directories. You'll want to use echo -n to print text without a trailing newline. See the final example below.
Put your for-loop and the output in your README.
The following examples echo the return value $? to show that it returns 0 on success and 1 on an error or if a file does not pass shellcheck.
$ ./shellcheckall too many args
Usage: ./shellcheckall [dir]
$ echo $?
1
$ ./shellcheckall linux-6.4/tools/perf
6 of 74 shell scripts passed shellcheck
$ echo $?
1
$ ./shellcheckall empty-directory
0 of 0 shell scripts passed shellcheck
$ echo $?
0
Here's an example of the for loop output.
$ for dir in linux-6.4/*/; do ...; done
linux-6.4/Documentation/: 1 of 9 shell scripts passed shellcheck
linux-6.4/LICENSES/: 0 of 0 shell scripts passed shellcheck
linux-6.4/arch/: 15 of 39 shell scripts passed shellcheck
...
Part 4. Top file types (20 points)
In this part, you're going to write a complex pipeline to answer a simple question: What are the top 8 file types in the Linux 6.4 source code? We're going to use a file's extension to determine what type of file it is. We're going to ignore any files that don't have extensions.
Your task
Write a script topfiletypes to answer this question. Your script should consist of a single complex pipeline. Put the output of your script in your README.
The structure of the pipeline will look like this:
while IFS= read -r file; do ...; done < <(find linux-6.4 -name '*.*') | ... | ... | ... | ...
Notice how the output from the while loop is passed as standard input to the next command in the pipeline.
Your output should look like this.
$ ./topfiletypes
32463 c
23745 h
3488 yaml
...
- The while loop given here is slightly different from the one you used before (no -d '' or -print0 options). Most filenames don't actually have newlines in them so we can use this technique to read one line at a time from the output of the find command rather than reading up to a 0 byte.
- Bash supports a lot of convenient variable expansions. You want to read the section of that page that describes ${parameter##word}. Here are some examples:
  $ x=foo.bar.baz
  $ echo "${x##*.}"
  baz
  $ y=linux-6.4/mm/mempool.c
  $ echo "${y##*.}"
  c
  You can use this inside your while loop to print the extension of the file.
- The sort command can be used to sort lines of input. For example, given the file example
  foo
  bar
  foo
  foo
  cat
  cat
  foo
  if we run $ sort example, we get
  bar
  cat
  cat
  foo
  foo
  foo
  foo
- The uniq command can be used to compress identical, consecutive lines of input into a single line of output. sort and uniq are frequently used together as sort | uniq to read lines of input, sort them, and then only output the unique lines. Running $ uniq example produces
  foo
  bar
  foo
  cat
  foo
  Running $ sort example | uniq gives
  bar
  cat
  foo
- The uniq command can take an argument to print out the count of each line. Here's $ sort example | uniq -c.
  1 bar
  2 cat
  4 foo
- sort can sort in reverse as well as performing a numeric sort (i.e., sort numbers in the usual way so that 9 comes before 10).
- The head command can be used to print the first several lines of a file. Look up the options to print the first 8 rather than the default 10.
- In the loop, print out the extension of the path in the file variable. Use a combination of sort, uniq, and head in the pipeline to produce the final result. You'll want to use sort multiple times.
- You can write this all as a single line. Don't do that. Use \ at the end of each line to continue on the next line.
  while IFS= read -r file; do
    echo ...
  done < <(find linux-6.4 -name '*.*') \
    | ... \
    | ... \
    | ... \
    | ...
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Make sure you followed all of the instructions about stderr and return values, script warnings and errors, executable scripts, and script documentation.
Your repository should contain the following files and directory.
README
genfrac
shellcheckall
topfiletypes
images/
Your repository should not contain either the jfrac or the Linux code.
Any additional files you have added to your repository should be removed from the main branch. (You're free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)
The README should contain
- The names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).
- Your answers to questions for each part and the commands you used to find them.
Lab 3. Introduction to Rust
Due: 2024-09-30 at 23:59
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. If you choose to work with a partner, you and your partner must complete the entire project together, as a single repository. Dividing the project up into sections and having each partner complete a part of it on their own will be considered a violation of the honor code. Both you and your partner are expected to fully understand all of the code you submit.
Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy without errors or warnings. That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings. Running these commands will only work if you have cd'ed into a Rust project directory.
Cargo's clippy is a command-line utility for linting Rust code. A linter automatically analyzes source code for errors, vulnerabilities, and stylistic issues to improve code quality. Cargo clippy will provide suggestions and warnings for potential errors in your code, such as unused variables, unnecessary operations, and more. For example, it will suggest more efficient ways of writing your code and point out common pitfalls that could lead to runtime errors.
VS Code doesn’t always update its error underlining until after you save your file. So make sure to do that with some regularity while you’re writing code.
Formatting
Your code must be formatted by running $ cargo fmt. This will automatically format your code in a consistent Rust style.
To get started, continue on to Part 1!
Part 1. Guessing game (15 points)
The first application you will write is a number guessing game.
The tutorial you're going to be following for this part of the lab contains a discussion of the integer types like i32, i64, and u32. It talks about how many bits each of them is, 32 or 64. You'll learn a lot more about this in CSCI 210. To relate this to Java, an i32 is Java's int type. An i64 is Java's long type. The u32 and u64 types don't have analogues in Java. These are the unsigned types, meaning that they can only hold non-negative values. This is perhaps best demonstrated with an example. We can print out the maximum and minimum values these types can hold as follows.
#![allow(unused)]
fn main() {
    println!("An i32 holds values from {} to {}", i32::MIN, i32::MAX);
    println!("An i64 holds values from {} to {}", i64::MIN, i64::MAX);
    println!("A u32 holds values from {} to {}", u32::MIN, u32::MAX);
    println!("A u64 holds values from {} to {}", u64::MIN, u64::MAX);
}
Click the Run button to see the results. There are other integer types: i8, i16, u8, u16, isize, and usize. You can get their minimum or maximum values the same way.
Your task
Follow the instructions given in Chapter 2 of the book with the following adjustments:
- In the section "Setting Up a New Project," it tells you to go to a projects directory. Instead, cd into the assignment repository you cloned. You'll create guessing_game in there when you run
  $ cargo new guessing_game
  It's important that you do this from inside the assignment repository. Don't run cargo new guessing_game and then later mv the directory into the assignment repo.
  After you run cargo new guessing_game, open up Visual Studio Code by running
  $ code guessing_game
  This will open Visual Studio Code which will ask you some questions:
  - Do you trust the authors of the code? Yes.
  - It detects that the containing directory is the root of a Git repository and it will ask if you want to open it. Say yes and then select the path to the assignment directory from the list displayed (there will only be one entry). Now, you can use Git directly from VS Code's user interface.
- When it tells you to run cargo run, you can do that from your open terminal window (after $ cd guessing_game), or you can use VS Code's built-in terminal to run it.
- Let's configure VS Code to be able to build, run, and debug our code. Click on the Run and Debug button (shaped like a triangle with a bug on it) on the left side of the window. Click the "create a launch.json file" text. It will ask you for a debugger; select LLDB. This asks if you want to generate launch configurations for its targets. Say yes.
  You can close the launch.json file it created and opened.
  At this point, you can run and debug your application from VS Code.
Once you have finished writing the guessing_game, you should make sure you've added your files to Git and committed them. You can do so either from the command line, or from within VS Code. Here are the docs for how to do it from within VS Code.
Part 2. Iteration (5 points)
Iteration with iterators is a key concept in Rust. Iteration is similar to Java but has some quirks which can be quite surprising.
For the rest of this lab, you’ll explore iterating over elements in a vector and iterating over characters in a string.
Your task
In a shell, cd to your assignment repository and create a new project using cargo and then open it in VS Code.
$ cargo new iteration
$ code iteration
Change the main function to
fn main() {
    let data = vec![2, 3, 5, 7, 11, 13];
    for x in data {
        println!("{x}");
    }
}
Hover your mouse over the code above and some buttons will appear in the top right of the text box. Click the Run button to run this code.
The vec! macro creates a new Vec containing the elements in the brackets. (Click the link in the previous sentence to go to the documentation for Vec.) Vec is the standard vector class in Rust: it is a growable array of elements, similar to a list in Python and an ArrayList in Java. The vec! macro here gives a result similar to this (but more efficient and easier to read).
let mut data = Vec::new(); // Create a new Vec
data.push(2); // Add 2 to the end of the Vec.
data.push(3); // Add 3 to the end of the Vec.
// ...
data.push(13); // Add 13 to the end of the Vec.
Predict what the output of the code will be and then run it. If you predicted incorrectly, try to figure out why the code does what it does before moving on.
Now, duplicate the for loop
fn main() {
    let data = vec![2, 3, 5, 7, 11, 13];
    for x in data {
        println!("{x}");
    }
    for x in data {
        println!("{x}");
    }
}
Again, predict what the output of the code will be and then run it. (You can click the Run button on the above text box to see the error message.)
I suspect the results are quite surprising. You probably got an error that looks like this.
error[E0382]: use of moved value: `data`
--> src/main.rs:8:14
|
2 | let data = vec![2, 3, 5, 7, 11, 13];
| ---- move occurs because `data` has type `Vec<i32>`, which does not implement the `Copy` trait
3 |
4 | for x in data {
| ---- `data` moved due to this implicit call to `.into_iter()`
...
8 | for x in data {
| ^^^^ value used here after move
|
note: `into_iter` takes ownership of the receiver `self`, which moves `data`
--> /Users/steve/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/collect.rs:262:18
|
262 | fn into_iter(self) -> Self::IntoIter;
| ^^^^
help: consider iterating over a slice of the `Vec<i32>`'s content to avoid moving into the `for` loop
|
4 | for x in &data {
| +
For more information about this error, try `rustc --explain E0382`.
error: could not compile `iteration` (bin "iteration") due to previous error
This is a lot of information and it’s quite difficult to understand at first. Reading error messages takes practice. If you look carefully, you’ll notice that the error message is divided into three parts, the error, a note, and a help.
The error tells us that we have tried to use a moved value, namely data on line 8, column 14 of the file src/main.rs. Next, it shows that on line 2 the variable data has type Vec<i32>, which does not implement Copy. Then, on line 4, data was moved due to an implicit call to .into_iter() and finally, on line 8, we tried to use data again.
We'll talk about what this all means, but basically, the first for loop used up or consumed our Vec and thus the Vec no longer exists. This is not how we normally expect for loops to work. Nevertheless, this is how Rust works.
The note portion of the error message tells us why it was used up and shows us code from the standard library to explain why. Let’s skip this part for now.
Finally, the help portion offers a way to solve this problem. If we use for x in &data { } (note the addition of the ampersand &), this will avoid consuming data. Note that it suggests making this change only on the first for loop (which for me happens to be on line 4). By using &data rather than data, we're instructing the loop to iterate over a reference to the Vec, rather than the Vec itself. The upshot is that we can now iterate multiple times.
Go ahead and make that change and rerun cargo run. At this point, your code should print out the list twice.
Part 3. Reversing a vector (15 points)
In the rest of the lab, you’re going to write some short functions dealing with vectors and strings and you’re going to write some unit tests for the functions.
Continue working in the main.rs file of your iteration project.
Your task
Implement the function reversed_vec.
fn reversed_vec(input_data: &[i32]) -> Vec<i32> {
    let mut result: Vec<i32> = Vec::new();
    todo!("Finish this function")
}
Two things to notice:
- The input argument input_data is of type &[i32]. We'll talk more about this type later, but for now, think of this as a reference to a Vec. We saw this above when we wrote for x in &data {} where data was a Vec, so &data essentially gives us a &[i32].
- The result variable is declared mutable (with the mut keyword) so we can modify it by inserting elements using the push() function.
Before you remove the todo!() and implement this function, let's write a unit test similar to the unit tests you wrote in CSCI 151.
At the bottom of main.rs, add the following code.
#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn reversed_vec_empty() {
        let data = vec![];
        assert_eq!(reversed_vec(&data), vec![]);
    }
}
There's a lot to unpack here. The first line, #[cfg(test)], tells the compiler that the item that follows it should only be compiled when we're compiling tests. And that item is a module named test. (The name is arbitrary but is traditionally called test.) We'll talk more about modules later.
The use super::*; line makes all of the functions in main.rs outside the test module available for use inside the test module.
Finally, each unit test we write is just a Rust function that is annotated with #[test]. If you omit this annotation, the function will not be treated as a test.
Run the command $ cargo test. You should see some warnings followed by a test failure (remember, we didn't implement reversed_vec() so it makes sense our test should fail).
running 1 test
test test::reversed_vec_empty ... FAILED
failures:
---- test::reversed_vec_empty stdout ----
thread 'test::reversed_vec_empty' panicked at 'not yet implemented: Finish this function', src/main.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
test::reversed_vec_empty
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Notice that the output tells us our code panicked (i.e., failed) on line 19 of
main.rs
.
Before implementing reversed_vec()
, write two more tests
#[test]
fn reversed_vec_one() {
todo!();
}
#[test]
fn reversed_vec_three() {
todo!();
}
that test the result of reversing a vector with 1 element and a vector with 3 elements.
$ cargo test
should now show three failing tests.
Implement the rest of reversed_vec()
.
- You can create an iterator over input_data using input_data.iter(). Some iterators let you create reverse iterators using the rev() function. In other words, given one iterator it, you can create a new iterator via it.rev() that will iterate in reverse order. So for x in input_data.iter().rev() { } will iterate over input_data in reverse order.
- The iterator returned by input_data.iter() will not make copies of the elements in input_data. Instead, the value returned by the iterator will be a reference to the elements. That is, in for x in input_data.iter() { }, the type of x is &i32 and not i32. Since our result vector holds i32 and not &i32, we need to dereference the reference x to get the underlying integer. We use the * operator to do this. For example, the following shows the incorrect and correct ways to insert the element into result.

  for x in input_data.iter() {
      // result.push(x); // Fails because x has type &i32
      result.push(*x); // Succeeds because *x has type i32
  }
You’re probably getting a warning at this point about reversed_vec
not being
used. Add the line
#![allow(dead_code)]
to the very top of main.rs
and it’ll stop warning about the functions that
we’re only using in tests.
Part 4. Checking for in-order data (15 points)
In this part, you’ll implement a new function and write a few more unit tests to make sure your code works.
Your task
Write a function
fn is_in_order(data: &[i32]) -> bool {
    todo!("Implement me")
}
that returns true
if the elements of data
are in ascending order and false
otherwise. To do this, you’ll want to iterate over data
using for x in data {}
and check that each element is at least as large as the previous element.
To keep track of the previous element, you’ll need a mutable variable: let mut prev: i32;
. You can either use i32::MIN
as its initial value or the
value of the first element in data
. Either approach works.
Before implementing the function, write the following unit tests, which should pass once you implement the function.
- Empty vectors are always in order.
- Vectors containing a single element are always in order.
- Vectors with multiple elements, in order.
- Vectors with multiple elements, not in order.
Inside your test
module, you’ll want to add a new function per unit test. For example, the test for multiple elements not in order might look something like this.
#[test]
fn is_in_order_multiple_out_of_order() {
    let test_vector = vec![10, -3, 8, 2, 25];
    assert!(!is_in_order(&test_vector));
}
Notice that this is asserting the negation of is_in_order()
because of the !
.
Implement is_in_order
and make sure it passes your tests.
My output for $ cargo test
looks like this.
running 7 tests
test test::is_in_order_empty ... ok
test test::is_in_order_multiple_out_of_order ... ok
test test::reversed_vec_empty ... ok
test test::is_in_order_multiple_in_order ... ok
test test::is_in_order_one ... ok
test test::reversed_vec_one ... ok
test test::reversed_vec_three ... ok
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Part 5. Iterating manually (10 points)
Similar to the previous part, you will implement a function and some unit tests.
Your task
Write a function that iterates over the elements in its argument, adds them all up, and returns the sum.
fn manual_sum(data: &[i32]) -> i32 {
    let mut data_iter = data.iter();
    todo!("Implement me")
}
The catch is that for this task, you may not use a for
loop. Instead, you’ll
need to work with the iterator returned by data.iter()
directly. Iterators
in Rust work similarly to iterators in Java. To implement this as a Java
method using iterators rather than a for
loop we might use something like
this.
int manualSum(ArrayList<Integer> data) {
int sum = 0;
Iterator<Integer> iter = data.iterator();
while (iter.hasNext()) {
sum += iter.next();
}
return sum;
}
Iterators in Rust don’t have separate has_next()
and next()
methods.
Instead, there’s just a single next()
method that returns an
Option<T>
. Every Option
is either None
or Some(blah)
for some
value blah
. Here’s a simple example of using an Option
.
let x: Option<i32> = Some(10);
assert!(x.is_some());
assert!(!x.is_none());
assert_eq!(x.unwrap(), 10);

let y: Option<i32> = None;
assert!(y.is_none());
assert!(!y.is_some());
We can use is_some()
or is_none()
to determine if an Option
is a
Some
or a None
. If we have a Some
, then we can unwrap it by
calling unwrap()
which returns the data. If we try calling unwrap()
on
None
, we’ll get a runtime error.
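Here’s a small sketch of this pattern using a vector’s iterator (this isn’t the manual_sum solution, just the Option handling):

let v = vec![1, 2, 3];
let mut iter = v.iter();

loop {
    let item = iter.next(); // item has type Option<&i32>
    if item.is_none() {
        break; // the iterator is exhausted
    }
    println!("{}", item.unwrap()); // safe to unwrap: we just checked it isn't None
}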
Write some unit tests. Make sure you cover all of the following cases.
- Empty data (where the sum should be 0);
- Single-element vectors; and
- Vectors with multiple elements.
Implement manual_sum
by calling next()
on data_iter
repeatedly until the
result is None
. Unwrap all of the other values and sum them up.
You’ll probably want a loop like this.
loop {
// XXX: Get the next element from the iterator
if /* the element is None */ {
return sum;
}
// XXX: Unwrap and add the value to the sum.
}
Part 6. Iterating over strings (10 points)
Strings in Rust are similar to vectors of characters except that, unfortunately, they’re slightly more difficult to use because text is inherently more complicated than vectors.
The most obvious difficulty is you cannot access individual characters in the same way you’d access individual elements of a vector.
let x = vec![1, 2, 3];
assert_eq!(x[0], 1);

let y = "ABC";
assert_eq!(y[0], 'A'); // FAILS!
That last line fails (click the Run button to see the error) because you cannot index into a string like this. The problem is that Rust stores all strings in the UTF-8 encoding. This is a variable-length encoding where different characters may take a different number of bytes to encode. So if you want to get the 10th character from a string, for example, you cannot easily figure out where in the encoded data for the string the 10th character starts without starting at the beginning of the string.
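For example, this sketch shows that a string’s length in bytes (what len() reports) need not match its number of characters:

let s = "héllo";
assert_eq!(s.len(), 6);           // é takes 2 bytes in UTF-8, so 6 bytes total
assert_eq!(s.chars().count(), 5); // but only 5 characters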
This means that to work with characters, we’re going to need to use an
iterator over the characters. Fortunately, the chars()
function returns an
iterator that returns each character.
let example = "Here is my string";
for ch in example.chars() {
    println!("{ch}");
}
Just as we’ve made a distinction between Vec<i32>
and &[i32]
where the
latter is a reference to the former, in Rust we have a String
data
type and a reference to a string &str
. When we have a function that takes a
reference to a string, we use an argument of type &str
. Note that strings we
create by enclosing text in quotation marks have type &str
and not
String
. If we have a String
and we want to pass it to a function that
takes a &str
, we get a reference to the string using &
just as we did with
vectors and references to vectors.
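As a quick sketch of that last point (the function name shout is made up for this example):

fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let literal = "hello";             // has type &str
    let owned = String::from("hello"); // has type String

    assert_eq!(shout(literal), "HELLO");
    assert_eq!(shout(&owned), "HELLO"); // & gives us a &str from the String
}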
Your task
Write a function
fn clown_case(s: &str) -> String {
    todo!("Implement me")
}
that takes a &str
as input and returns a new String
that alternates
capitalization and includes a clown emoji, 🤡, at the beginning and end of the
string as long as s
is not the empty string. If s
is the empty string,
return just a single clown emoji.
Here are some examples. I recommend you turn them into unit tests before you
implement clown_case
.
""
->"🤡"
"I'm just asking questions"
->"🤡i'M jUsT aSkInG qUeStIoNs🤡"
"Μην είσαι κλόουν στα ελληνικά!"
->"🤡μΗν ΕίΣαΙ κΛόΟυΝ σΤα ΕλΛηΝιΚά!🤡"
- If ch is a char, you can use ch.is_alphabetic() to decide whether to lowercase or uppercase the letter or just include it in the result unchanged. I.e., only uppercase/lowercase the characters for which ch.is_alphabetic() returns true.
- If ch is a char, then ch.to_lowercase() and ch.to_uppercase() return iterators of chars rather than a char itself. Why? Well, in some languages converting the case changes the number of characters. E.g., the uppercase of ß is SS. You can add the elements of an iterator to a string using the extend() function.

  let mut result = String::new();
  result.extend('ß'.to_uppercase());
  assert_eq!(result, "SS");

- You can append a char to a String by using push(). You can append a &str to a String by using push_str().
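Putting the last two hints together, a small sketch (with throwaway values, not the clown_case logic):

let mut result = String::new();
result.push('🤡');                 // append a single char
result.push_str("abc");            // append a &str
result.extend('ß'.to_uppercase()); // append every char produced by an iterator
assert_eq!(result, "🤡abcSS");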
At this point you’re done and can submit your code.
However, there is one more optional step you can take if you’d like: Comment
out the existing code in your main()
function. (In VS Code, you can comment
or uncomment multiple lines at once by selecting all of the lines and hitting
Ctrl-/.) Write some code in main()
to read a line of
input from stdin
, pass it to clown_case()
, and then print the results. Now
you can run the program and have it turn any input you enter into clown case.
(I have been reliably informed that this is a lot of fun to use.)
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
fa23-lab3-xxx-yyy
├── README
├── guessing_game
│ ├── Cargo.lock
│ ├── Cargo.toml
│ └── src
│ └── main.rs
└── iteration
├── Cargo.lock
├── Cargo.toml
└── src
└── main.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
Lab 4: Wordle ahead
Due: 2024-10-07 at 23:59
In this lab, you’ll learn how to
- work with individual characters in a string;
- write colored text to the terminal;
- flush output to standard out;
- call functions defined in other modules;
- implement a command-line utility;
- parse command-line arguments;
- read from standard in or from a file; and
- handle errors.
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1. Wordle (35 points)
In this part, you’re going to implement a Wordle game. Wordle is a word guessing game. A word is chosen at random from a small list of 5-letter words at the start of the game. The player has 6 chances to guess the word.
Guesses that are not valid words are rejected and do not count toward the guess total.
After each (valid) guess, the word appears with each letter colored as follows:
- If a letter is in the correct position, it is colored green.
- If a letter is in the word but in the wrong position, it is colored yellow (but see the example below).
- If a letter does not appear in the word, it is colored gray.
The rules given above aren’t quite precise. See this example from the New York Times’s implementation that shows four guesses.
The first guess was LEAST. From the colors, we know that the word contains an L, an E, and a T, but not in those locations. We also know that the word doesn’t contain an A.
The second guess was ELATE. This was a bad guess because it contained an A
and we knew from the first guess that there were no As. We did learn that the
E, L, and T also cannot be in those locations. Notice that the second E is
gray and not yellow. This is because the word we’re guessing contains a
single E and it has already been colored yellow in the word.
The third guess was TITLE. This was also a bad guess because we know that E cannot appear at the end of the word because otherwise the second guess would have a green E at the end and a gray E at the beginning. We can see that the T is in the right place but the E and L are not.
The fourth and final guess was HOTEL. This is the correct word and so all of the letters are colored green.
Your task
From inside the assignment repository, run cargo new wordle
to create a new
application just as you did for the guessing game in Lab 3.
Using git
, move the words.rs
file from the root of the assignment repo
into the wordle/src/
directory. You can do this via
$ git mv words.rs wordle/src
You want to use git mv
rather than mv
because git mv
is telling Git that
the file has been moved to a new location. If you just use mv
, then Git will
see that a file has been deleted and that there’s a new, untracked, file. git mv
is just easier.
Add the line
mod words;
to the top of main.rs
. Doing so gives you access to the two public functions
in words.rs
: words::random_word()
and words::is_word_valid()
. The first
uses the rand
crate to pick a random word and the second returns true
if
the argument is a valid word.
In order to use the rand
crate, you’ll need to add rand
as a
dependency. In Lab 3, we did this by editing the Cargo.toml
file by hand.
There’s actually an easier way of doing it using cargo
.
$ cargo add rand
This will add the most recent version of rand
as a dependency.
Notice that the argument to the is_word_valid()
function is a &str
and not
a String
.
/// Returns `true` if the passed word is a valid 5-letter guess.
pub fn is_word_valid(word: &str) -> bool {
let word = word.to_uppercase();
WORD_LIST.contains(&word.as_str()) || ALLOWED_LIST.contains(&word.as_str())
}
As we saw in Lab 3, the &str
type is a way to let us pass a reference to a string without
having to make a copy of the string to pass to the function.
The is_word_valid()
function doesn’t deal with the newline character that you may have read from stdin
. You can use the trim()
method to return a reference to the portion of the string not containing leading and trailing whitespace characters.
is_word_valid(guess.trim())
At a high level, your task is as follows.
- Select a secret word using
words::random_word()
. - If the user has made fewer than 6 valid guesses, prompt the user to enter a guess, otherwise print a message that the player failed to guess the secret word and what the secret word is.
- If the guessed word is not a valid guess, go back to step 2.
- Color the guessed word.
- Print out all of the (colored) guesses so far.
- If the guessed word is correct, stop, otherwise go back to step 2.
Here’s an example run of the game.
Enter your guess: stern
STERN
Enter your guess: logic
STERN
LOGIC
Enter your guess: reset
STERN
LOGIC
RESET
Enter your guess: terse
STERN
LOGIC
RESET
TERSE
Step 2, prompting for input, is a little tricky. We want to print out the
prompt without a newline. Rust will, by default, buffer
characters written to stdout
until it sees a new line character so we’ll
need to print the prompt (without a newline) and then ask Rust to flush
its buffer which will cause it to be printed. Give this a shot:
// Put this line near the top of `main.rs`, under your `mod words;` line
use std::io::Write;

print!("Prompt text: "); // Notice print!(), not println!().
std::io::stdout().flush().unwrap();
std::io::Write
is a trait.
Traits are similar to Java interfaces in that they
specify the functions that implementing types must provide. Unlike
in Java, to use the functions defined in a trait, you have to first inform the
compiler about the trait using a use
statement.
In this case, the Write
trait
defines a flush()
function which returns a Result
. Note that we call unwrap()
on the result so that if it is an Err(err)
, the program will exit.
Click on the link at the beginning of this note to see what other functions the Write
trait defines.
The remaining tricky parts are in step 4, coloring the guessed word. There are two parts to this, (a) figuring out which colors to apply; and then (b) actually creating colored text.
Let’s start with (b), creating colored text. We’re going to use the
colored crate to create
colored text. Take a look at the documentation. You’ll need to add colored
as a dependency just as you did for rand
.
As the documentation says, you need to add the line
use colored::Colorize;
with the other use
statements in main.rs
. The Colorize
trait gives you access to some
functions you can call on strings to produce colored strings.
You will need to color individual characters. Unfortunately, the char
type does
not implement Colorize
and so it doesn’t have access to the Colorize
functions
like black()
and on_green()
. So let’s write a function that
takes a char
and returns a String
containing the character colored with
black text on a green background.
/// Colors `ch` black on green.
fn green(ch: char) -> String {
ch.to_string()
.black()
.on_green()
.to_string()
}
The first line, ch.to_string()
converts the char
into a String
so that
the colored
functions can operate on it. The .black()
function returns a
new type representing a string of colored text. The .on_green()
function
returns another of the colored strings but with a green background. And
finally, the second .to_string()
converts the colored string into a normal
String
, but one containing special control characters which instruct the
terminal to color the text appropriately.
String
s don’t have any notion of colors. They’re little more than a mutable
collection of char
s (except they’re encoded using UTF-8). So how does colored
text work? Well, your terminal emulator (the program that pretends to be a 1980s era
hardware video terminal) reads all of the
text written to standard out (and standard error) and looks for control characters.
Some of those control characters let you change the color of the text or background
in a limited fashion.
For example, run this in your terminal (but not in the VS Code terminal which seems to mess the colors up when both foreground and background colors are present).
$ echo -e '\033[36;45;1mHello!\033[0m Goodbye!'
The -e
argument tells echo
to interpret escape characters. In this case, \033
is
the escape sequence for the ASCII esc
character which has decimal value 27 (hex 0x1B
and octal 033). The [36;45;1m
says use a cyan foreground color (the 36), a magenta
background color (the 45) and make it bold (the 1). The ;
separates the arguments and the
m
ends them. Take a look at Wikipedia for
the supported colors.
We can do the same thing in Rust without using the colored
crate.
println!("\x1B[36;45;1mHello!\x1B[0m Goodbye!");
Notice that I specified the esc
character in hexadecimal \x1B
rather than octal.
(If you click the Run button, you won’t see colored text because HTML uses a completely
different way to select colors. View the source code for this webpage to
see how I did it for the Wordle example above, if you’re curious.)
I recommend writing similar functions yellow()
and gray()
which produce
black on yellow and black on white. After you do that, you can combine the
strings to produce a multi-colored string. Here’s an example which prints ABC
but in colors.
let mut s = String::new();
s.push_str(&green('A'));
s.push_str(&yellow('B'));
s.push_str(&gray('C'));
println!("{s}");
There is no on_gray()
function. You’ll want to use on_white()
in your
implementation of gray()
instead.
Notice the &
being used? That’s because the String
function push_str()
takes a &str
as an argument, that is a reference to a string, but green()
and the others return a String
. So just like with words::is_word_valid()
,
we need to use &
to get a reference.
Okay, so you can get a secret word, you can prompt for input, and you can color text. So all that remains is to decide which letters for a guess should get which colors.
Here was my approach. I wrote a function
fn color_guess(guess: &str, secret: &str) -> String {
    todo!()
}
that takes a reference to the guess and a reference to the secret and it
returns a String
that is appropriately colored.
Inside color_guess
, I created a mutable Vec<char>
(which will hold letters from
the secret
that don’t match the guess
) and added each letter of secret
to it.
Remember, to access the individual letters of a string, we use the .chars()
function
which returns an iterator. E.g.,
let secret = "LEAST"; // Example secret.
for ch in secret.chars() {
    println!("{ch}");
}
will print each character in secret
on its own line.
We can also use the .next()
function on the iterator directly rather than
using it in a loop.
let s = "ABC";
let mut iterator = s.chars();
iterator.next(); // returns Some('A')
iterator.next(); // returns Some('B')
iterator.next(); // returns Some('C')
iterator.next(); // returns None
You can use the .unwrap()
function to turn a Some(x)
into x
.
My color_guess()
creates a mutable vector and adds all the characters of
secret
to it. Next, it iterates over both guess and secret and, for each letter in
guess that is in the correct place in secret, removes one instance of that
character from the vector of unmatched characters. You can
use the remove()
method to remove the element at a specified index
from a vector.
let guess = "BLAST"; // Example guess.
let secret = "STERN"; // Example secret.

let mut secret_iter = secret.chars();
for ch in guess.chars() {
    if ch == secret_iter.next().unwrap() {
        // Remove `ch` from the vector of unmatched characters
    }
}
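Inside a loop like that one, removing a single instance of ch from the vector of unmatched characters might look like the following sketch (here unmatched is an assumed Vec<char> built from secret; adapt the names to your own code):

// Find the index of the first occurrence of `ch`, if there is one, and remove it.
if let Some(index) = unmatched.iter().position(|&c| c == ch) {
    unmatched.remove(index);
}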
Finally, we want to iterate over both guess
and secret
a second time and
construct our colored guess string. If the character of guess
matches the
character in the corresponding position of secret
, then color the character
green and append it to the result string.
If the character of guess
does not match the corresponding character of
secret
, then we need to color it either yellow or gray. Iterate through the
unmatched character vector and if the character from guess
matches any of
those, color the character yellow in the result string and remove the
character from the vector of unmatched characters. Otherwise, color it gray.
That was a lot of work! Maybe play a few games.
Part 2. Head (10 points)
In the next two parts of the lab, you’re going to implement the head
command
line utility in Rust. This will be divided into two tasks: command line
argument parsing (this part) and the implementation of head
itself (Part
3).
Before you begin, read the man page for head
. While you’re
reading it, pay attention to the command line arguments and where head
gets
its input (from files vs. from stdin
) in which cases.
There are many versions of head
. You’ll be implementing the FreeBSD version
which is the version described in the man page above. The GNU project has
another version which is used in most Linux distributions. In general, the GNU
versions of command-line utilities tend to have more options. If you run
$ man head
on the lab machines, you’ll get a manual for the GNU version of
head
.
We’re going to use the Command Line Argument Parsing library, clap, to handle command-line argument parsing for us. As we’ll see, Rust makes supporting command-line arguments, including a nicely formatted help message, very easy.
Let’s look at an example.
use std::path::PathBuf;

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about = "Example command line parsing", long_about = None)]
struct Config {
    /// Use colored output
    #[arg(short, long)]
    color: bool,

    /// Use at most COUNT values
    #[arg(short = 'C', long)]
    count: Option<i32>,

    /// Specify optional log file
    #[arg(short = 'f', long, conflicts_with = "count")]
    log_file: Option<PathBuf>,

    /// Input files
    #[arg(default_value = "-")]
    files: Vec<PathBuf>,
}

fn main() {
    let config = Config::parse();
    println!("{config:#?}");
}
There’s a lot going on here! At the top of the code, there are two use
statements which inform the compiler about PathBuf
from the standard
library and Parser
from clap
.
Next we have a struct
named Config
.
This structure defines the supported command line arguments of this
example program. The main
function calls the function Config::parse()
which parses all of the command line arguments and, crucially, returns an
instance of Config with the structure’s members filled in. We can access
the members as config.color
, config.count
, config.log_file
, and config.files
.
The println!("{config:#?}")
line will pretty-print the debug representation of config
.
The format specifier {config:?}
(with the colon followed by a question mark) means to
print the debug representation and the #
in {config:#?}
means to pretty-print it
with new lines.
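For example, here’s a small sketch of the difference between the two format specifiers on a struct that derives Debug:

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{p:?}");  // prints on one line: Point { x: 1, y: 2 }
    println!("{p:#?}"); // pretty-prints the same data across several lines
}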
All good command line utilities have a --help
option or similar. Before we look
at this code in detail, here’s the output of running this program with the --help
option which just prints out the help message and then exits the program.
$ cargo run -- --help
Example command line parsing
Usage: ex [OPTIONS] [FILES]...
Arguments:
[FILES]... Input files [default: -]
Options:
-c, --color Use colored output
-C, --count <COUNT> Use at most COUNT values
-f, --log-file <LOG_FILE> Specify optional log file
-h, --help Print help
-V, --version Print version
This code defines three options, each of which has a short and a long form. Two of
the options take arguments and two of the options conflict with one another meaning
at most one of those options can be specified at a time. It also
takes a list of file paths. Each file path is represented by a PathBuf
in Rust.
Because the final line of code prints out the Config
instance that was returned from
Config::parse()
, we can run the program a few times with different options to see
the results. Note that in what follows I’m passing the --quiet
option to cargo run
because I don’t want cargo
to print out any output like “Created binary …” As usual,
the argument before the --
go to cargo run
, the arguments after the --
go to
the program itself.
$ cargo run --quiet --
Config {
color: false,
count: None,
log_file: None,
files: [
"-",
],
}
$ cargo run --quiet -- --color -f log.txt input1.txt input2.txt
Config {
color: true,
count: None,
log_file: Some(
"log.txt",
),
files: [
"input1.txt",
"input2.txt",
],
}
$ cargo run --quiet -- --version
ex 0.1.0
$ cargo run --quiet -- -f log.txt -C 10
error: the argument '--log-file <LOG_FILE>' cannot be used with '--count <COUNT>'
Usage: ex --log-file <LOG_FILE> [FILES]...
For more information, try '--help'.
The definition of Config
starts by deriving the clap::Parser
and Debug
traits.
The Parser
trait is what provides the parse()
function which reads the command-line
arguments and returns an instance of Config
.
The next line
#[command(author, version, about = "Example command line parsing", long_about = None)]
is new. It is attaching metadata to the Config
struct that the clap
library will
use to do all of the argument parsing and producing usage and error messages. The about = "..."
lets us specify a short about message to appear in the --help
message.
Each of the arguments consists of three parts.
/// Use colored output
#[arg(short, long)]
color: bool,
The comment with three slashes is a doc comment. Normally, such a comment is used to write documentation. (Running cargo doc
produces HTML documentation of your code. The doc comments are where the documentation comes from.)
Clap is repurposing that doc comment to construct a --help
message.
The next line, #[arg(short, long)]
, is attaching some more
metadata for Clap. It says this should have a short option -c
and a long option --color
. It took both
option names
from the name of the variable color
itself. Note that we can be explicit about what the option names are.
There are a variety of other things we can do including making options conflict. If we try to pass both -f
and -C
,
we’ll get an error from Clap.
The third line is the variable and its type, of course. Because the type of color
is
bool
, this acts as an on/off flag. For other types, using an Option<...>
indicates
that the argument is optional. If you want to allow 0 or more, you use a
Vec
rather than an Option
as I did for files
.
That’s a lot of information to take in! Fortunately, I’ve got a simple recipe to follow for parsing command line arguments:
Start by defining a new struct
that derives Debug
and Parser
.
#[derive(Parser, Debug)]
#[command(author, version, about = "Short description", long_about = None)]
struct Config {
}
Now, for each option, modify one of the examples below and remember, you can
change the short or long options by using short = 'c'
or long = "blah"
, for example.
- If you want a boolean flag, use a bool:

  /// Boolean flag, defaults to false
  #[arg(short, long)]
  flag: bool,

- If you want an option that takes an argument, use an Option:

  /// Option that takes a String argument
  #[arg(short, long)]
  name: Option<String>,

- If you want a path to a file, use an Option<PathBuf>:

  /// Option that takes a file path
  #[arg(short, long)]
  path: Option<PathBuf>,

- If you want a list of input files, don’t use short or long but do use Vec<PathBuf>; if you want to specify a default value, use default_value:

  /// Input files but if none are specified, use -
  #[arg(default_value = "-")]
  files: Vec<PathBuf>,

- If you want to specify that two options conflict, mark one of them (it doesn’t matter which) as conflicts_with the other:

  /// An option that takes a path
  #[arg(short, long)]
  log: Option<PathBuf>,

  /// A flag that conflicts with option `--log`
  #[arg(short, long, conflicts_with = "log")]
  output: bool,
Finally, add a main
function that prints out the debug representation. (Click the Run
button to run this or the Show button to show the whole example.)
// You don't need the `extern crate clap` line. That's just to make this runnable on the web.
extern crate clap;

use std::path::PathBuf;

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about = "Short description", long_about = None)]
struct Config {
    /// Boolean flag, defaults to false
    #[arg(short, long)]
    flag: bool,

    /// Option that takes a String argument
    #[arg(short, long)]
    name: Option<String>,

    /// Option that takes a file path
    #[arg(short, long)]
    path: Option<PathBuf>,

    /// Input files but if none are specified, use -
    #[arg(default_value = "-")]
    files: Vec<PathBuf>,

    /// An option that takes a path
    #[arg(short, long)]
    log: Option<PathBuf>,

    /// A flag that conflicts with option `--log`
    #[arg(short, long, conflicts_with = "log")]
    output: bool,
}

fn main() {
    let config = Config::parse();
    println!("{config:#?}");
}
The Clap tutorial has a lot more information.
Your task
Before you start, you need to create a new project inside your assignment
repository and add the clap
library.
$ cargo new head
$ cd head
$ cargo add clap --features derive
Your task is to implement parsing the head
command line arguments and
printing out a --help
message. Your head
will support the following
options.
- -c / --bytes;
- -n / --lines;
- -h / --help; and
- -V / --version.
The --help
and --version
options are handled by clap
automatically. Use
clap
’s conflicts_with
mechanism for marking --bytes
and --lines
as
conflicting. See the tip above.
Additionally, head
takes 0 or more files as input. Configure clap
such
that if no files are passed, then it defaults to -
. See the tip above for
how to set default values.
Many command line utilities use -
in place of a file path to denote reading
from stdin
instead of the file or writing to stdout
instead of writing to
the file. Our head
will support that convention.
Once you have this working, you should be able to run head --help
and see
this.
$ cargo run --quiet -- --help
Display the first few lines of a file
Usage: head [OPTIONS] [FILES]...
Arguments:
[FILES]... Input files or - for stdin [default: -]
Options:
-n, --lines <LINES> Print LINES lines of each of the specified files
-c, --bytes <BYTES> Print BYTES bytes of each of the specified files
-h, --help Print help
-V, --version Print version
The version number printed by --version
can be set by editing the version
field in Cargo.toml
. Go ahead and set the version number to 1.2.3 so that you get
this output.
$ cargo run --quiet -- --version
head 1.2.3
Part 3. More head (30 points)
In this part, you’re going to finish your implementation of head
.
You should re-read the man page but this time focus on behavior. How
does head
behave when no files are passed to it? How does head
behave when
one file is passed to it? How does head
behave when multiple files are
passed to it? What exit code (called the exit status in the man page) does it
return on success? What about on error?
One of the key things to notice about head
, is that sometimes it reads from
stdin
and sometimes it reads from a file. But in either case, the operation
it should perform is the same: print out some number of lines or bytes.
What we would like is to be able to write a function like
/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: ???, count: usize) {
todo!();
}
and a similar print_bytes()
function. But what type should we give input
?
If we think about what operations we want to perform on input
, that can help
guide us. For print_lines()
, we want to be able to read one line at a time
from input
. For print_bytes()
, we just want to read some number of bytes.
It turns out that what we want is a std::io::BufReader
. Or more
precisely, we want anything that implements the std::io::BufRead
trait, which BufReader
does.
Click on the documentation for BufRead
and take a look at what
methods are available. In particular, the read_line()
method will read
characters into a String
until it hits a new line character. The
read_until()
method is similar except it reads bytes into a Vec<u8>
until
it encounters the byte you specify. For a print_lines()
function, you’ll
likely want to use one of read_line()
or read_until()
(or experiment with
the lines()
method!).
For a print_bytes()
function, none of these methods seem appropriate.
Fortunately, any type that implements the BufRead
trait also implements the
std::io::Read
trait which has a read()
method.
Here’s an example of opening a file, wrapping it in a BufReader
, and then
reading from it using read_line()
, read_until()
, and read()
.
use std::fs::File;
use std::io::{self, BufReader, BufRead, Read, Write};

fn read_some() -> io::Result<()> {
    let file = File::open("/tmp/example.txt")?;
    let mut reader = BufReader::new(file);

    // Read a line into a string.
    let mut line = String::new();
    let length = reader.read_line(&mut line)?;
    if length == 0 {
        // End of file.
        return Ok(());
    }
    print!("First line: {line}");

    // Read a line as bytes into a `Vec<u8>`.
    let mut line = Vec::new();
    let length = reader.read_until(b'\n', &mut line)?;
    if length == 0 {
        return Ok(());
    }
    print!("Second line: ");
    io::stdout().write_all(&line)?;

    // Read at most 100 bytes into a `Vec<u8>`.
    let mut bytes = Vec::new();
    bytes.resize(100, 0); // Fill it with 100 zero bytes.
    let length = reader.read(&mut bytes)?;
    if length == 0 {
        return Ok(());
    }
    print!("Next {length} bytes: ");
    io::stdout().write_all(&bytes[..length])?;

    Ok(())
}

fn main() -> io::Result<()> {
    const EXAMPLE: &str = "Rust is fun!\nIts mascot is a 🦀.\nIt's also kind of a lot…\n";

    let mut file = File::create("/tmp/example.txt")?;
    file.write_all(EXAMPLE.as_bytes())?;

    read_some()?;

    std::fs::remove_file("/tmp/example.txt")?;
    Ok(())
}
Click the Run button to see the result.
Note that we needed to have a use
statement for
std::io::Write
in order to call the write_all()
method which is defined by the Write
trait
If you use
use std::io;
or if you include self
in the list of items to use from std::io
like
use std::io::{self, BufReader};
you can refer to everything in std::io
just by io
. The previous example did this for io::stdout()
and io::Result<()>
.
We can wrap a File
in a BufReader
and we can also wrap the result of
std::io::stdin()
(which is a struct
called, appropriately
enough std::io::Stdin
) in a BufReader
. But there’s one hitch: the
two types of BufReader
are different.
Just as Vec is parameterized by the type of element it holds—e.g., Vec<i32>
and Vec<String>—a BufReader is parameterized by the underlying type it’s
reading from. The code snippet below is explicit about the types of the two
readers. They’re simply not the same.
use std::io::{self, Stdin, BufReader};
use std::fs::File;

fn main() -> io::Result<()> {
    let file_reader: BufReader<File> = BufReader::new(File::open("example.txt")?);
    let stdin_reader: BufReader<Stdin> = BufReader::new(io::stdin());
    Ok(())
}
Fortunately, Rust gives us a way to deal with this! What we’re going to do is
wrap these readers in a special type of Box. Recall that values in Boxes are
stored on the heap rather than on the stack. If we wrap
these types directly in a Box
, i.e., Box<BufReader<File>>
and
Box<BufReader<Stdin>>
, we’d still have different types. Instead, we want to
use a Box<dyn BufRead>
. By using the dyn
keyword followed by a trait,
we’re saying that the value stored in the Box
is some type that implements
the trait—in this case, BufRead
.
Let’s create a function that takes a &Path
and opens the file for reading.
As mentioned previously, many command line utilities treat an input file path
of -
to mean stdin
and we’re going to do the same.
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

fn main() {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
}
Let’s revisit the print_lines()
function.
use std::io::{self, BufRead};

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
    let mut line = String::new();
    input.read_line(&mut line)?;
    print!("{line}");
    todo!()
}
We finally have a type for input
! It’s a mutable reference to some type of
object that implements BufRead
. And notice print_lines()
now returns an
std::io::Result<()>
so we can use ?
to propagate errors.
We can now combine this print_lines()
with the example code for opening a file.
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
    todo!()
}

fn main() {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };

    if let Err(err) = print_lines(&mut reader, 10) {
        eprintln!("{err}");
    }
}
Your task
Your task is to finish implementing head
. Your
implementation must have the following features.
- It must support all of the options described in Part 2.
- If neither --bytes nor --lines (nor their short arguments) are passed, then default to printing the first 10 lines of each file or stdin.
- If no files are passed as arguments, read from stdin, otherwise read from each of the files (see the hints below).
- If - is given as a file, read from stdin in its place. So if

  $ cargo run -- -n 3 foo.txt - bar.txt

  is run, it should print the first three lines from foo.txt, then the first three lines from stdin, followed by the first three lines of bar.txt.
- If multiple files are passed, each should start with a header line. A blank line should appear after the last line of output for each file before the header line (except for the last file, which doesn’t have a trailing blank line). See the example below.
- If a file cannot be opened, print a message to stderr and continue. The example code above shows writing a message in the proper format to stderr using eprintln!() if open_input() fails. Your code should do the same thing except that rather than exiting immediately by calling std::process::exit(1), your code should simply remember that there was an error and, after all input files have been processed, if there was an error, exit with the value 1. (A sketch of this pattern appears after this list.)
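Here’s a minimal sketch of that error-tracking pattern. The names had_error and config.files are assumptions; adapt them to your own code.

let mut had_error = false;

for path in &config.files {
    match open_input(path) {
        Ok(mut reader) => {
            // Print the requested lines or bytes from `reader` here, e.g.,
            // print_lines(&mut reader, count)?;
        }
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            had_error = true;
        }
    }
}

if had_error {
    std::process::exit(1);
}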
Here is some sample output showing multiple files, including stdin
.
$ echo "hi there!" | cargo run -- -n 3 foo.txt - bar.txt
==> foo.txt <==
First line of foo.txt
Second line of foo.txt
Third line of foo.txt
==> standard input <==
hi there!
==> bar.txt <==
First line of bar.txt
Second line of bar.txt
Third line of bar.txt
Here’s an example when there’s an error with an input file.
$ cargo run -- -n 1 foo.txt no-such-file.txt bar.txt
==> foo.txt <==
First line of foo.txt
no-such-file.txt: No such file or directory (os error 2)
==> bar.txt <==
First line of bar.txt
Play around with the real head
to see other examples.
If you’ve made it this far and you’ve implemented everything, you’re just
about done! Just gotta submit it. Make sure you test that your code works
correctly with 0, 1, and multiple files. Make sure it supports -
as reading
from stdin
. Make sure when you have multiple files, it prints the header and
the blank line. You can always test your behavior against the behavior of the
real head
.
If you’d like to make one small improvement to your implementation, read on. Otherwise, continue to the submission instructions.
The command line utilities were designed when C was the dominant programming language. C’s notion of strings is wildly different from Rust’s notion of strings. In particular, C strings are just sequences of non-zero bytes followed by a zero byte. There are no other restrictions on the contents of a C string. In contrast, Rust strings are required to be valid, UTF-8 encoded strings of Unicode characters. These two different views of strings have two consequences for us:
- The real head utility works fine even if the files it’s operating on don’t contain UTF-8 encoded data. It simply separates the input into lines by detecting the newline character '\n'.
- The BufRead::read_line() function will fail if the input is not valid UTF-8.
If you would like to match the real head
behavior, you may use the BufRead::read_until()
function.
An example of doing this was given in the read_some() function in the example above.
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── head
│ ├── Cargo.lock
│ ├── Cargo.toml
│ └── src
│ └── main.rs
└── wordle
├── Cargo.lock
├── Cargo.toml
└── src
├── main.rs
└── words.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Lab 5: Gutensearch
Due: 2024-10-14 at 23:59
In this lab, you’re going to build a simple command-line search interface to Project Gutenberg. You’re going to read an index from a file and create objects representing each entry.
In this lab, you’ll learn
- how to create a generic
Result<T>
type alias which you can use to return multiple types of errors from a function; - how to use
struct
s andenum
s to structure related data; - how to implement methods for
struct
s andenums
; and - how to implement a trait.
In the first part of this lab (which contains a lot of text to read but very little actual coding), you’ll learn a generic method for handling errors in functions which we’ll use for the rest of the semester.
In the subsequent parts, you’re going to implement an interactive search interface. When you run your completed program, you should see
$ cargo run --quiet
Welcome to Gutensearch! The index contains 71329 entries.
Example queries to search titles, authors, or both:
> title: two cities
> author: wodehouse
> shelley
Enter a search query (Ctrl-D to exit)
>
and then you’ll be able to enter search queries and get back nicely formatted entries. E.g., here’s one of the entries for the query: title: two cities
:
Author: Dickens, Charles
Title: A Tale of Two Cities
URL: https://www.gutenberg.org/ebooks/98
Resources:
- text: https://www.gutenberg.org/ebooks/98.html.images
- text: https://www.gutenberg.org/files/98/98-h/98-h.htm
- text: https://www.gutenberg.org/files/98/98-h.zip
- epub: https://www.gutenberg.org/ebooks/98.epub3.images
- epub: https://www.gutenberg.org/ebooks/98.epub.images
- epub: https://www.gutenberg.org/ebooks/98.epub.noimages
- text: https://www.gutenberg.org/files/98/98-0.txt
- text: https://www.gutenberg.org/files/98/98-0.zip
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1: Getting a result (5 points)
In this part, you’re going to implement a simple method for returning different types of errors from a single function.
Recall that in Rust, we typically indicate that a function may succeed and
return a value or fail and return an error using the standard type
Result<T, E>
. For example, we might have the following code.
fn safe_divide(dividend: i32, divisor: i32) -> Result<i32, String> {
    if divisor == 0 {
        Err(format!("Cannot divide {dividend} by zero"))
    } else {
        Ok(dividend / divisor)
    }
}

fn main() {
    println!("{:?}", safe_divide(25, -7));
    println!("{:?}", safe_divide(25, 0));
}
If divisor
is 0, then it returns an error (click Run to see the results).
Otherwise it returns the quotient.
This works really well! It has one drawback, though: we can only return a
single type of error. In this case, we’re
returning a String
as the error. The functions that perform input/output
(e.g., reading from a file), use a special type std::io::Error
to indicate an error.
So the question becomes, how can we return multiple different types of errors from a single function?
There are multiple ways to return different types of errors from a function.
One approach is to pick an error type for the Result
that can hold
different types that share a common interface (i.e., implement some trait).
In Lab 4, we encountered a similar situation. We needed to be able
to say that we had some type that implemented the BufRead
trait, but we
didn’t want to be specific about exactly which type. We used a Box
.
We’re going to do the same thing here. The type we want to return is a
Result<T, Box<dyn std::error::Error>>
An object of this type is either an Ok(T)
where T
is some concrete type
like i32
or String
(the success case) or it’s an Err(Box<dyn std::error::Error>)
which is just some object on the heap that implements the
std::error::Error
trait (the error case). All of the error types in
the Rust standard library implement this trait.
Returning to our example, we will need to do more than replace Result<i32, String>
with Result<i32, Box<dyn std::error::Error>>
, but not much more!
fn safe_divide(dividend: i32, divisor: i32) -> Result<i32, Box<dyn std::error::Error>> {
    if divisor == 0 {
        Err(format!("Cannot divide {dividend} by zero"))
    } else {
        Ok(dividend / divisor)
    }
}

fn main() {
    println!("{:?}", safe_divide(25, -7));
    println!("{:?}", safe_divide(25, 0));
}
Click the Run button to see the error.
As the error message states, “expected Box<dyn Error>
, found String
”. The
issue is that a String
is not a boxed error. Fortunately, Rust knows how to
convert it into one using the .into()
method:
Err(format!("Cannot divide {dividend} by zero").into())
There are a pair of traits From<T>
and Into<T>
which define .from()
and .into()
respectively. These functions are used to convert from some type T
or into some type T
. We’ll encounter these functions fairly frequently.
In this case, Box<dyn Error>
implements the From<String>
trait and String
implements the Into<Box<dyn Error>>
. (One typically only implements the From
trait because the compiler will provide a default Into
).
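As a small sketch, here’s .into() converting a String into a boxed error directly:

use std::error::Error;

fn main() {
    let message = String::from("something went wrong");

    // String implements Into<Box<dyn Error>>, so .into() boxes the message up.
    let err: Box<dyn Error> = message.into();
    println!("{err}");
}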
The previous example didn’t showcase returning multiple possible errors. So let’s examine one more example that will try to read an integer from a text file.
Reading an integer from a file can fail in multiple ways: (1) The file might not exist or be readable; (2) the file might not contain valid UTF-8 text; or (3) the contents of the file might not constitute a valid integer.
use std::fs;

fn read_int_from_file() -> Result<i32, Box<dyn std::error::Error>> {
    let num = fs::read_to_string("/some/file.txt")? // cases 1 and 2
        .trim()
        .parse()?; // case 3
    Ok(num)
}
If the file cannot be read or if the contents aren’t valid UTF-8,
fs::read_to_string()
will return an error of type std::io::Error
as
mentioned above. The ?
will unwrap the result if there is no error.
Otherwise, it’ll return the error from the read_int_from_file()
function. In
fact, it’s slightly better than that, it’ll call .into()
on it for us! This
is super important because otherwise we’d get an error about how an
std::io::Error
is not the same as a Box<dyn std::error::Error>
. By calling
.into()
, Rust knows to box the error up on the heap, just as it did with the
String
above.
If the contents of the file aren’t a valid integer (after trimming off
whitespace), then the .parse()
method will return a
std::num::ParseIntError
. This time, the ?
will unwrap an Ok(i32)
or
convert the Err(std::num::ParseIntError)
into the appropriately boxed type.
Your task
You’re going to define type alias for Result<T, Box<dyn std::error::Error>>
which will save on typing. We’ll be using this type alias (or a similar one)
in future labs!
A type alias is a way to give an existing type a second (usually shorter) name. An alias and its underlying type are the same in all respects. This is purely a way to make your code easier to read and write.
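As a tiny illustration (unrelated to the lab code), an alias just introduces a second name for an existing type:

// `Meters` and `f64` are exactly the same type; the alias only aids readability.
type Meters = f64;

fn double(distance: Meters) -> f64 {
    distance * 2.0
}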
Inside your assignment repo, run
$ cargo init --name gutensearch
At the top of main.rs
, add the line
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
Okay, but what did this accomplish? You’ve just defined a new generic type
alias Result<T>
which is exactly the same thing as the Result<T, Box<dyn std::error::Error>>
we used above. Here are the two functions from the
examples above using this type alias.
use std::fs;

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn safe_divide(dividend: i32, divisor: i32) -> Result<i32> {
    if divisor == 0 {
        Err(format!("Cannot divide {dividend} by zero").into())
    } else {
        Ok(dividend / divisor)
    }
}

fn read_int_from_file() -> Result<i32> {
    let num = fs::read_to_string("/some/file.txt")? // cases 1 and 2
        .trim()
        .parse()?; // case 3
    Ok(num)
}

fn main() -> Result<()> {
    let divisor = read_int_from_file()?;
    let quotient = safe_divide(25, divisor)?;
    println!("Dividing 25 by {divisor} has quotient {quotient}");
    Ok(())
}
Note that you can have main
return a Result
and it’ll print out the error
if there is an error. Since there’s no real return value, we return ()
wrapped in an Ok()
to indicate success. This
doesn’t show a terribly friendly error message though. Click Run to see what
this looks like.
As you can see from the previous example, the error message we get when we
return an error from main
isn’t great. Let’s not do that. Instead, define a
new function, run()
, that will return a Result<()>
that we can check in
main
. On an error, we can print the error message to stderr
and then exit
with return value 1.
Your code should look something like this now.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn run() -> Result<()> {
    todo!("Code for the next part goes here.")
}

fn main() {
    if let Err(err) = run() {
        eprintln!("{err}");
        std::process::exit(1);
    }
}
You’re done with this part, so continue to Part 2.
Part 2. Reading the index (5 points)
In this part of the lab, you’re going to read in the index file and create an
instance of an Entry
structure for each entry in the index.
There are many common formats for exchanging data. One of the most simple is
tab-separated values. In this format, each line in the file refers to a
multi-field record. Each field is separated from the previous one in the line
by a tab character (we specify a tab character in code using the escape
sequence \t
, similar to how \n
is a newline character).
For example,
first field second field third field
XXX YYY ZZZ
shows a file with two records, each of which has three fields.
The Project Gutenberg index is a tab-separated values file. Each line of the index file has the format
authors<tab>title<tab>URL<tab>resource 1<tab>...<tab>resource n
where <tab>
means there’s a tab character. The first field is the authors,
the second field is the title, the third is the URL for the entry on Project
Gutenberg’s website. The remaining fields represent individual resources
associated with the entry (resources include .epub
files, text files, and
audio files).
Here’s one example,
Roosevelt, Wyn; Schneider, S. The Frontier Boys in the Sierras; Or, The Lost Mine https://www.gutenberg.org/ebooks/32253 text https://www.gutenberg.org/ebooks/32253.html.images text https://www.gutenberg.org/files/32253/32253-h/32253-h.htm text https://www.gutenberg.org/files/32253/32253-h.zip epub https://www.gutenberg.org/ebooks/32253.epub3.images epub https://www.gutenberg.org/ebooks/32253.epub.images epub https://www.gutenberg.org/ebooks/32253.epub.noimages text https://www.gutenberg.org/ebooks/32253.txt.utf-8 text https://www.gutenberg.org/files/32253/32253-8.txt text https://www.gutenberg.org/files/32253/32253-8.zip text https://www.gutenberg.org/files/32253/32253.txt text https://www.gutenberg.org/files/32253/32253.zip
Given a line of text from the index file, it’s easy to split it up into its parts by splitting the string on a tab character.
let line = "Roosevelt, Wyn; Schneider, S. The Frontier Boys in the Sierras; Or, The Lost Mine https://www.gutenberg.org/ebooks/32253 text https://www.gutenberg.org/ebooks/32253.html.images text https://www.gutenberg.org/files/32253/32253-h/32253-h.htm text https://www.gutenberg.org/files/32253/32253-h.zip epub https://www.gutenberg.org/ebooks/32253.epub3.images epub https://www.gutenberg.org/ebooks/32253.epub.images epub https://www.gutenberg.org/ebooks/32253.epub.noimages text https://www.gutenberg.org/ebooks/32253.txt.utf-8 text https://www.gutenberg.org/files/32253/32253-8.txt text https://www.gutenberg.org/files/32253/32253-8.zip text https://www.gutenberg.org/files/32253/32253.txt text https://www.gutenberg.org/files/32253/32253.zip";

// Assuming we're reading through the file a line at a time and `line`
// holds the current line of text, we can split the line into parts.
let parts: Vec<&str> = line.split('\t').collect();

println!("Author: {}", parts[0]);
println!("Title: {}", parts[1]);
println!("URL: {}", parts[2]);
println!("{} resources", parts.len() - 3);
Click Run.
You’re going to represent each entry using an instance of an Entry
structure. So you’re going to have to define the structure.
Each entry has an author field, a title field, a URL field and 0 or more resources associated with it. In fact, it’s possible for the author field to be empty. In this case, Project Gutenberg hasn’t recorded the author or the author is unknown. You’ll have to deal with this. Let’s ignore the resources for the moment. We’ll return to them in a later part.
Since the author field can be empty, let’s model that with an
Option<String>
so that None
refers to the no author case and
Some(authors)
is the case where there are authors. The other fields should
not be empty. This suggests that we define our Entry
structure like this.
#[derive(Debug, Clone)]
struct Entry {
    author: Option<String>,
    title: String,
    // Other fields here.
}
Your task
First, you’ll need to download the index file and decompress it inside your assignment repository
$ gunzip pgindex.txt.gz
Your repo should look like this
repo
├── Cargo.toml
├── pgindex.txt
└── src
└── main.rs
Do not add this file to your repository as it’s pretty large for a text file at 46 MB.
In main.rs
, define a new Entry
structure that has fields for author, title, and URL.
In your run()
function, open the file pgindex.txt
and wrap it in a
BufReader
. (If you forget how BufReader
works, review Lab 4.)
Then read from the reader line by line (the .lines()
method is extremely
useful here) and parse each line into an Entry
. Finally, print out the debug
representation of the Entry
.
use std::fs::File;
use std::io::{BufRead, BufReader};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

#[derive(Debug)]
struct Entry {
    author: Option<String>,
    title: String,
}

fn run() -> Result<()> {
    let reader = BufReader::new(File::open("pgindex.txt")?);

    for line in reader.lines() {
        // Turn the line from a Result into a &str.
        let line: &str = &line?;

        // TODO: Parse the entry here. You will need to set the author field
        // to None if the string length is 0.
        let author = Some("TODO: get this from line".to_string());
        let title = "TODO: get this from line".to_string();
        // Other fields

        let entry = Entry {
            author,
            title,
            // ...
        };

        println!("{entry:?}");
    }

    Ok(())
}
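For the author field, one way to handle the empty-string case looks like this (a sketch, assuming parts is the Vec<&str> you get from splitting the line on a tab, as shown earlier):

// Sketch: an empty first field means there is no recorded author.
let author = if parts[0].is_empty() {
    None
} else {
    Some(parts[0].to_string())
};
let title = parts[1].to_string();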
If you run your code, you’re going to get a lot of output. Far more than you want. I recommend while you’re testing this part, you replace reader.lines()
in the above code with reader.lines().take(5)
which will only run the body of the loop for the first 5 lines of the file.
Just don’t forget to remove the .take(5)
later.
When this all works, you should see this output when you run your code.
Entry { author: None, title: "\"Contemptible\", by \"Casualty\"", url: "https://www.gutenberg.org/ebooks/18103" }
Entry { author: None, title: "A Handbook of the Boer War With General Map of South Africa and 18 Sketch Maps and Plans", url: "https://www.gutenberg.org/ebooks/15699" }
Entry { author: None, title: "A Jolly by Josh", url: "https://www.gutenberg.org/ebooks/17499" }
Entry { author: None, title: "Baseball ABC", url: "https://www.gutenberg.org/ebooks/19169" }
Entry { author: None, title: "Bibelen, Det nye Testamente", url: "https://www.gutenberg.org/ebooks/2143" }
Note that the first few entries in the index don’t have authors so it prints None
.
Part 3. Implementing some methods (55 points)
At this point, you should have some code that can parse the fields of the
index file and extract the author, title, and URL fields and stick them in an
Entry
.
In this part, you’re going to create an Index
structure which will hold all
of the Entry
s and implement some methods on it to perform a simple search.
Recall that to implement methods for a structure, we use an impl
block.
#[derive(Debug, Clone)]
struct Example {
    name: String,
    url: Option<String>,
}

impl Example {
    fn new(name: String) -> Self {
        Self { name, url: None }
    }

    fn has_url(&self) -> bool {
        self.url.is_some()
    }

    fn url(&self) -> Option<String> {
        // Return a copy of the url member.
        self.url.clone()
    }

    fn set_url(&mut self, url: &str) {
        self.url = Some(url.to_string());
    }

    fn into_name(self) -> String {
        self.name
    }
}
Some things to notice here:
- There’s a
new()
function which returns aSelf
. Inside animpl
block,Self
refers to the type being implemented, hereExample
; - The
new()
function does not takeself
,&self
, or&mut self
as an argument because this isn’t a method you call on an instance of anExample
. Instead, you’d call it to create an instance. This is similar to a constructor in Java although it’s just convention and not inforced. To callnew()
, we useExample::new("Blarg".to_string())
. It’s a good practice to have anew()
function (which may or may not take arguments as the situation demands) and returns one ofSelf
,Option<Self>
orResult<Self, SomeErrorType>
. The last two are used when it’s possible for thenew()
function to fail. - The
has_url()
andurl()
methods take&self
as the first argument. This is similar to Python’sself
argument and Java’s implicitthis
argument in methods; - The
set_url()
method takes&mut self
rather than&self
. This is necessary because modification toself
requires that it be mutable. If you remove themut
from the method signature, it’ll fail to compile; and - The
into_name()
method takesself
rather than&self
. This method takes ownership of theExample
object (meaning it cannot be used after calling.into_name()
) and returns thename
field without making a copy (as theurl()
method does).
Your task
Define a new structure called Index
which will hold all of the Entry
s
you’re going to parse from the index file. You’re going to implement the following methods.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

struct Entry;

#[derive(Debug, Clone)]
struct Index {
    // TODO: What field(s) do you want here?
}

impl Index {
    /// Create a new `Index` by reading from `pgindex.txt`.
    fn new() -> Result<Self> {
        todo!()
    }

    /// Returns the number of entries in the index.
    fn len(&self) -> usize {
        todo!()
    }

    /// Returns a vector containing references to entries whose titles
    /// contain `search_string`.
    fn title_search(&self, search_string: &str) -> Vec<&Entry> {
        todo!()
    }

    /// Returns a vector containing references to entries whose authors
    /// contain `search_string`.
    fn author_search(&self, search_string: &str) -> Vec<&Entry> {
        todo!()
    }

    /// Returns a vector containing references to entries whose titles
    /// or authors contain `search_string`.
    fn search(&self, search_string: &str) -> Vec<&Entry> {
        todo!()
    }
}
To implement new()
, take your code from the run()
in the previous part and
move it into new()
. Rather than print out the entries, you’re going to need
to collect them all into some data structure and store them in your Index
structure and return that.
The len()
method should return the number of entries in the index.
The three search methods should behave similarly: Each converts the
search_string
to lowercase. Then, for each entry in the index, convert the
relevant fields (title, author, or both) to lowercase. If the lowercased
search_string
is contained in the field, add a reference to the
corresponding entry to a results
vector and return that.
You’ll probably want something like this.
fn title_search(&self, search_string: &str) -> Vec<&Entry> {
let mut results = Vec::new();
for entry in ??? {
if ??? {
results.push(entry);
}
}
results
}
The other two are similar; however, remember that some entries don’t have authors. You can use
if let Some(ref author) = entry.author {
// This entry did have an author and `author` is currently a reference
// to the `String` stored inside the `Option<String>` that is the
// `entry.author` field.
}
If you omit the ref in if let Some(ref author), you'll get an error about
trying to move the String out of entry, which you can't do because entry is
borrowed. ref tells Rust that you want a reference instead.
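Putting these pieces together, an author_search() might look roughly like this (a sketch; it assumes your Index stores its entries in a field named entries, which is up to you):

fn author_search(&self, search_string: &str) -> Vec<&Entry> {
    // Lowercase the search string once, outside the loop.
    let search_string = search_string.to_lowercase();
    let mut results = Vec::new();

    for entry in &self.entries {
        // Only entries that actually have an author can match.
        if let Some(ref author) = entry.author {
            if author.to_lowercase().contains(search_string.as_str()) {
                results.push(entry);
            }
        }
    }

    results
}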
Once you have implemented one of these functions, it’s worth testing your
code. If you change your run()
function to
fn run() -> Result<()> {
let index = Index::new()?;
println!("There are {} entries in the index", index.len());
let results = index.title_search("the inferno");
println!("{results:#?}");
Ok(())
}
you should see three results, The Inferno by Barbusse and O’Brien, The Divine Comedy of Dante Alighieri: The Inferno by Dante, and The Inferno by Strindberg.
In the remaining parts of the lab, you’re going to handle parsing the
resources, implementing the Display
trait to make printing better, and
finally the simple user interface shown in the lab’s overview.
Part 4. Modeling resources (10 points)
In this part, you’re going to create a new type, an enumeration (enum
)
rather than a structure type (struct
), in order to represent (or model) a
resource associated with the entries in the Project Gutenberg index.
Briefly, an enum
lets us model the situation where a value is one of a
finite set of choices. For example, the Option<T>
type is either None
or
it’s a Some(x)
(for some value x
of type T
). Review Chapter
6 of the Rust book
as needed for details on enum
s.
In this case, there are three different types of resources: epub (an ebook format), text, and audio.
Let’s return to the example index entry we saw before:
Roosevelt, Wyn; Schneider, S. The Frontier Boys in the Sierras; Or, The Lost Mine https://www.gutenberg.org/ebooks/32253 text https://www.gutenberg.org/ebooks/32253.html.images text https://www.gutenberg.org/files/32253/32253-h/32253-h.htm text https://www.gutenberg.org/files/32253/32253-h.zip epub https://www.gutenberg.org/ebooks/32253.epub3.images epub https://www.gutenberg.org/ebooks/32253.epub.images epub https://www.gutenberg.org/ebooks/32253.epub.noimages text https://www.gutenberg.org/ebooks/32253.txt.utf-8 text https://www.gutenberg.org/files/32253/32253-8.txt text https://www.gutenberg.org/files/32253/32253-8.zip text https://www.gutenberg.org/files/32253/32253.txt text https://www.gutenberg.org/files/32253/32253.zip
Each of the tab-separated fields after the author, title, and URL fields
consists of the word epub
, text
, or audio
followed by a space and then a
URL of the resource in question.
Your task
Define a new enum Resource
which has variants corresponding to the epub,
text, and audio resources. Each variant should hold its corresponding URL. For
inspiration, look at the definition of enum IpAddr
in the
book.
You should make sure you derive Debug
as normal so that you can print the
debug representation using the {:?}
format specifier.
let x = Resource::Audio("https://example.com");
println!("{x:?}"); // Prints debug representation of `x`
Next, modify your Entry
structure to hold 0 or more Resource
s and update
Index::new()
to correctly fill in the resources from the resource fields
pgindex.txt
.
You can split each resource field on a space and then match
on the first part.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parts = vec!["audio", "https://example.com"];

    match parts[0] {
        "epub" => todo!(),
        "text" => todo!(),
        "audio" => todo!(),
        _ => return Err(format!("Unknown format: {}", parts[0]).into()),
    }

    Ok(())
}
How cool is that?
At this point, the run()
function from the previous part should work
(assuming you derived Debug
as suggested) and its output should include all
of the resources associated with each of the three entries matching the query.
Part 5. Implementing Display (15 points)
The debug representation of a type—the thing you get from the {:?}
format
specifier—is extremely handy for developers. Rust makes it trivial to
automatically get this representation for almost every type using
#[derive(Debug)]
which is why we add it to all of our types.
The downside is that as a user of a program, it kinda sucks: It shows a bunch of internal details about the types and users don’t care about that; it can be too verbose by including data that’s important for the implementation, but not for the end users; it’s hard to read if you don’t know what you’re looking at; and so on.
In this part, you’re going to remedy this by implementing the
std::fmt::Display
trait. Unlike Debug
, the compiler cannot
derive the Display
trait for you. Fortunately, Rust makes this
pretty easy.
Every trait has a list of required methods that must be implemented by a type
that implements a trait. This is just like implementing an interface in Java.
Some traits have 0 required methods. (These so-called “marker” traits convey
information to the compiler. The Copy
trait is one such trait. A type that
implements Copy
behaves like the primitive types like i32
in that copies
of the data are made rather than moving the data.) Most traits have 1 or a
small handful of methods that must be implemented. Traits can have additional
methods that have default implementations (much like abstract classes in Java).
The Display
trait has a single method to implement: fmt()
. Read
the documentation for Display
and pay particular attention to the
example showing how to implement Display
.
The write!()
macro and its companion writeln!()
work very similarly to the
print!()
and println!()
macros. There are two differences between
write!()/writeln!()
and print!()/println!()
.
- The print macros write to stdout whereas the write macros write to their first argument, in this case the mutable reference to a Formatter object; and
- The print macros will panic (a safe crash) if writing to stdout fails whereas the write macros return a Result.
The upshot of the second difference is that if you want to make multiple calls to writeln!()
in
the fmt()
method, you can use the ?
operator as usual.
writeln!(f, "First line")?;
writeln!(f, "Second line")?;
Ok(())
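For reference, the overall shape of the implementation is the standard Display skeleton. The body below is only a sketch of the first two lines and assumes the Entry fields from Part 2; the URL line and the resource list are left to you.

use std::fmt;

impl fmt::Display for Entry {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the author line only when there is an author.
        if let Some(ref author) = self.author {
            writeln!(f, "Author: {author}")?;
        }
        writeln!(f, "Title: {}", self.title)?;
        // TODO: the URL line and the resource list.
        Ok(())
    }
}

Once this is in place, println!("{entry}") uses your fmt() method.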
Your task
Implement Display
for Entry
. It should behave as follows:
- If the entry has an author (i.e., it's not None), then the author line should be printed (see examples below); otherwise the author line should be omitted;
- Print the title line;
- Print the URL line; and
- If there are resources, print the list of resources.
Change your run()
function to print out the Display
representation for
each matching entry.
fn run() -> Result<()> {
let index = Index::new()?;
println!("There are {} entries in the index", index.len());
let results = index.title_search("the inferno");
for entry in results {
println!("{entry}");
}
Ok(())
}
If everything has been implemented correctly, the three entries should appear like this.
Author: Barbusse, Henri; O'Brien, Edward J. (Edward Joseph)
Title: The Inferno
URL: https://www.gutenberg.org/ebooks/12414
Resources
- text: https://www.gutenberg.org/ebooks/12414.html.images
- epub: https://www.gutenberg.org/ebooks/12414.epub3.images
- text: https://www.gutenberg.org/ebooks/12414.html.noimages
- epub: https://www.gutenberg.org/ebooks/12414.epub.images
- epub: https://www.gutenberg.org/ebooks/12414.epub.noimages
- text: https://www.gutenberg.org/ebooks/12414.txt.utf-8
- text: https://www.gutenberg.org/files/12414/12414-8.txt
- text: https://www.gutenberg.org/files/12414/12414-8.zip
- text: https://www.gutenberg.org/files/12414/12414.txt
- text: https://www.gutenberg.org/files/12414/12414.zip
Author: Dante Alighieri; Sibbald, James Romanes
Title: The Divine Comedy of Dante Alighieri: The Inferno
URL: https://www.gutenberg.org/ebooks/41537
Resources
- text: https://www.gutenberg.org/ebooks/41537.html.images
- text: https://www.gutenberg.org/files/41537/41537-h/41537-h.htm
- epub: https://www.gutenberg.org/ebooks/41537.epub3.images
- epub: https://www.gutenberg.org/ebooks/41537.epub.images
- epub: https://www.gutenberg.org/ebooks/41537.epub.noimages
- text: https://www.gutenberg.org/ebooks/41537.txt.utf-8
- text: https://www.gutenberg.org/files/41537/41537-0.txt
Author: Strindberg, August; Field, Claud
Title: The Inferno
URL: https://www.gutenberg.org/ebooks/44108
Resources
- text: https://www.gutenberg.org/ebooks/44108.html.images
- text: https://www.gutenberg.org/files/44108/44108-h/44108-h.htm
- text: https://www.gutenberg.org/files/44108/44108-h.zip
- epub: https://www.gutenberg.org/ebooks/44108.epub3.images
- epub: https://www.gutenberg.org/ebooks/44108.epub.images
- epub: https://www.gutenberg.org/ebooks/44108.epub.noimages
- text: https://www.gutenberg.org/files/44108/44108-0.txt
- text: https://www.gutenberg.org/files/44108/44108-0.zip
Part 6. Building a TUI (20 points)
In the final part of this lab, you will implement the text-based user interface (TUI) shown in the overview.
Your task
Change your run()
function to implement the text-based user interface. See the examples below.
When gutensearch starts, it should print out a welcome message containing the number of entries in the index and example queries.
Then in a loop it should print the instruction to enter a search query and how to exit. Then print out a prompt string "> "
(don’t forget to flush stdout
so that the prompt appears). Read a line from stdin
. If the line is empty (because the user pressed control-d), exit the program successfully. If the line consists only of white space, print the prompt again and wait for more input.
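Here is a sketch of the prompt-and-read step; it assumes this code lives inside the loop in run() (so ? can propagate I/O errors):

use std::io::{self, Write};

print!("> ");
io::stdout().flush()?;

let mut line = String::new();
if io::stdin().read_line(&mut line)? == 0 {
    // Zero bytes read means end of input (the user pressed ctrl-d).
    return Ok(());
}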
There are three types of queries you need to handle:
- title: query: pass the trimmed query portion to Index::title_search(), unless the trimmed query is empty, in which case it should print the prompt again;
- author: query: same as the title query except use Index::author_search(), of course; and
- query: pass the trimmed query to Index::search().
If a colon appears but the trimmed text before the colon is neither title
nor author
, print an error message and print the prompt again.
Some tips.
- Use str::find() to find the first : character;
- Use str::trim() to trim strings. (A sketch combining these follows below.)
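Here's a sketch of how those two tips can combine to dispatch a query. It assumes a surrounding loop, a variable index holding your Index, and line holding the trimmed input; the empty-query reprompting is left to you.

// Sketch: decide which search to run based on the text before the first ':'.
let results = match line.find(':') {
    Some(pos) => {
        let kind = line[..pos].trim();
        let query = line[pos + 1..].trim();
        match kind {
            "title" => index.title_search(query),
            "author" => index.author_search(query),
            _ => {
                println!("Unknown query type: {kind}");
                continue; // back to the prompt
            }
        }
    }
    None => index.search(line.trim()),
};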
Here’s an example run showing empty queries, errors, a success, and an empty search result for a valid query.
cargo run --quiet
Welcome to Gutensearch! The index contains 71329 entries.
Example queries to search titles, authors, or both:
> title: two cities
> author: wodehouse
> shelley
Enter a search query (ctrl-d to exit)
>
>
> title:
> author:
> subject:
Unknown query type: subject
> title: The Modern Prometheus
Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/20038
Resources
- audio: https://www.gutenberg.org/files/20038/mp3/20038-17.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-03.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-06.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-01.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-12.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-09.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-24.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-02.ogg
- audio: https://www.gutenberg.org/files/20038/20038-m4b.zip
- audio: https://www.gutenberg.org/files/20038/m4b/20038-04.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-08.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-09.ogg
- audio: https://www.gutenberg.org/files/20038/mp3/20038-09.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-13.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-16.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-14.ogg
- audio: https://www.gutenberg.org/files/20038/mp3/20038-02.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-05.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-24.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-14.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-15.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-08.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-09.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-19.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-08.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-10.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-20.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-10.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-06.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-11.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-15.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-17.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-01.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-16.spx
- text: https://www.gutenberg.org/files/20038/20038-index.html
- audio: https://www.gutenberg.org/files/20038/mp3/20038-01.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-19.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-07.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-10.mp3
- text: https://www.gutenberg.org/files/20038/20038-readme.txt
- audio: https://www.gutenberg.org/files/20038/spx/20038-06.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-04.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-13.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-05.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-02.m4a
- audio: https://www.gutenberg.org/files/20038/m4b/20038-03.m4a
- audio: https://www.gutenberg.org/files/20038/m4b/20038-05.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-12.mp3
- audio: https://www.gutenberg.org/files/20038/ogg/20038-18.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-16.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-03.ogg
- audio: https://www.gutenberg.org/files/20038/mp3/20038-24.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-18.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-15.mp3
- audio: https://www.gutenberg.org/files/20038/ogg/20038-07.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-05.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-17.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-15.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-12.spx
- audio: https://www.gutenberg.org/files/20038/spx/20038-01.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-20.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-07.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-08.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-21.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-03.spx
- audio: https://www.gutenberg.org/files/20038/spx/20038-04.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-10.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-02.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-21.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-20.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-11.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-19.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-20.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-06.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-16.mp3
- audio: https://www.gutenberg.org/files/20038/ogg/20038-24.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-17.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-13.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-19.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-13.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-14.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-11.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-18.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-18.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-11.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-14.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-07.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-21.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-21.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-12.m4a
- audio: https://www.gutenberg.org/files/20038/20038-mp3.zip
- audio: https://www.gutenberg.org/files/20038/ogg/20038-04.ogg
Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/41445
Resources
- text: https://www.gutenberg.org/ebooks/41445.html.images
- text: https://www.gutenberg.org/files/41445/41445-h/41445-h.htm
- epub: https://www.gutenberg.org/ebooks/41445.epub3.images
- epub: https://www.gutenberg.org/ebooks/41445.epub.images
- epub: https://www.gutenberg.org/ebooks/41445.epub.noimages
- text: https://www.gutenberg.org/ebooks/41445.txt.utf-8
- text: https://www.gutenberg.org/files/41445/41445-0.txt
Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/42324
Resources
- text: https://www.gutenberg.org/ebooks/42324.html.images
- text: https://www.gutenberg.org/files/42324/42324-h/42324-h.htm
- text: https://www.gutenberg.org/files/42324/42324-h.zip
- epub: https://www.gutenberg.org/ebooks/42324.epub3.images
- epub: https://www.gutenberg.org/ebooks/42324.epub.images
- epub: https://www.gutenberg.org/ebooks/42324.epub.noimages
- text: https://www.gutenberg.org/ebooks/42324.txt.utf-8
- text: https://www.gutenberg.org/files/42324/42324-8.txt
- text: https://www.gutenberg.org/files/42324/42324-8.zip
- text: https://www.gutenberg.org/files/42324/42324.txt
- text: https://www.gutenberg.org/files/42324/42324.zip
Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/84
Resources
- text: https://www.gutenberg.org/ebooks/84.html.images
- text: https://www.gutenberg.org/files/84/84-h/84-h.htm
- epub: https://www.gutenberg.org/ebooks/84.epub3.images
- epub: https://www.gutenberg.org/ebooks/84.epub.images
- epub: https://www.gutenberg.org/ebooks/84.epub.noimages
- text: https://www.gutenberg.org/ebooks/84.txt.utf-8
- text: https://www.gutenberg.org/files/84/84-0.txt
Enter a search query (ctrl-d to exit)
> author: Not A. Real Author
No matching results
Enter a search query (ctrl-d to exit)
>
Your output need not match mine exactly, but it should be substantially similar.
If your code works, congratulations! Submit your work and then consider finding a book on Project Gutenberg and giving it a read!
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
└── main.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Lab 6: Processes I
Due: 2024-10-28 at 23:59
In this lab and the next, you are going to write several command line applications that work with processes. You will use the same repository for both labs.
The goal is to learn
- how to create a library and binaries in the same project;
- how to create a module;
- one way to implement generic error handling;
- how to define new structs and implement methods for them;
- how to call functions in the C standard library to interact with the operating system;
- about how Linux exposes information to users via the /proc file system; and
- how processes, process groups, sessions, and terminals all fit together!
Unlike the previous labs, the next two are going to be specific to Linux. Windows, macOS, and other operating systems use different mechanisms to expose information to user programs. See the instructions for remote coding.
In the first part of this lab, you’re going to create a Rust project that
contains both a library and a binary. In the second part of the lab, you’re
going to copy the type alias Result<T>
you created in the last lab to this
lab because it’s extremely useful. In the third part, you’ll read information
from /proc
. In the remaining parts of this lab, you’re going to implement a
process
library that will contain functions that return information about
running processes as well as most of an implementation for the ps
command
line tool which displays information about processes.
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1: Making a library (5 points)
In this part, you’re going to create a new project using cargo
that will
build a library and binaries. A library is just a collection of code that we
can use in multiple applications. We’ve seen and used several libraries
already, rand
, colored
, and clap
. Now it’s time to make your own!
There are several key differences between a binary and a library project.
- A library project has a src/lib.rs file and a binary project has a src/main.rs file. (The names are configurable by changing the Cargo.toml file, but let's stick with the defaults unless we have a reason to change the names.)
- Libraries do not have a main function. Instead, they expose types, data, and functions that other code can use.
Your task
Clone the assignment repository (which will be empty) on your machine.
From inside the assignment repo, initialize a new library project using cargo
.
$ cargo init --lib --name process
Note that we used cargo init
rather than cargo new
. The difference is that
new
will create a new directory whereas init
will use an existing one (by
default, the current directory, .
). We’re using init
because we’re not
going to have multiple projects inside the repository. It’s more normal to
just have a single project per git repo.
At this point, we cannot run $ cargo run
because there’s no binary (we’re
going to change that momentarily). Indeed, if you look at the files created by
running cargo init --lib
, you’ll see there’s a src/lib.rs
but no
src/main.rs
. The former is used for a library whereas the latter is used for
a binary.
Let’s add a binary.
Create a new directory named bin
inside the src
directory. Inside bin
,
create a file named runnable.rs
. By default, cargo
will look inside the
src/bin
directory and create executables, one per file. In this case, we’re
creating an executable runnable
that you will implement in Part 3.
For now, create a main
function that will call the add
function that cargo init
generated inside the process
library’s lib.rs
// Note: in this runnable example, the `mod process` block stands in for your
// library crate; in your project, `add` lives in src/lib.rs.
mod process {
    pub fn add(left: usize, right: usize) -> usize {
        left + right
    }
}

fn main() {
    let result = process::add(5, 10);
    println!("5 + 10 = {result}");
}
Running $ cargo run
will now run the runnable
program and print 5 + 10 = 15
.
If we wanted (and we will later), we could create a new file inside src/bin
and then we would have multiple binaries that could use the library code.
Neat!
Once you can run $ cargo run
, it’s time to move on to next part.
Part 2: Result, again (5 points)
In the previous lab, you created a type alias Result<T>
that you used for returning multiple types of errors from a function. You’re
going to make extensive use of that type in this lab.
As a reminder, here’s a function that reads an integer from a file and returns it.
use std::fs;
use std::path::Path;

// A generic Result type that can hold any type of Error.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn read_int_from_file(path: &Path) -> Result<i32> {
    let num = fs::read_to_string(path)?
        .trim()
        .parse()?;
    Ok(num)
}
This function can fail in multiple ways including the file not existing or being inaccessible and the contents of the file not being an integer.
There are three key benefits of doing this.
- We can use the ? operator to propagate errors to the calling function. E.g., if fs::read_to_string() was unable to read the file, read_int_from_file() will return an Err(err) where err is the boxed-up error. This makes our code shorter and easier to read (with practice).
- The ? operator knows how to turn any Result<T, E> (where E is some type that implements std::error::Error, as all of the standard library errors do) into a Result<T, Box<dyn std::error::Error>>. So even though neither fs::read_to_string() nor str::parse() returns the correct type of Result, ? will perform the conversion for us.
- The standard library knows how to turn a String into a Box<dyn std::error::Error>. This is incredibly helpful as the next tip shows.
When you are dealing with file input/output, we can use
Result::map_err()
to attach context to our errors.
map_err()
lets us convert from one error type to another. Here’s an example
of attaching the path of a file that failed to open to the error message.
use std::fs::File;
use std::path::{Path, PathBuf};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn bad_error_message(path: &Path) -> Result<()> {
    let mut f = File::open(path)?;
    todo!()
}

fn good_error_message(path: &Path) -> Result<()> {
    let mut f = File::open(path)
        .map_err(|err| format!("{}: {err}", path.display()))?;
    todo!()
}

fn main() {
    let path = PathBuf::from("/path/to/no-such-file");

    println!("Calling the function with the bad error message:");
    if let Err(err) = bad_error_message(&path) {
        // This should be eprintln!()
        println!("{err}");
    }

    println!("\nCalling the function with the good error message:");
    if let Err(err) = good_error_message(&path) {
        // This should be eprintln!()
        println!("{err}");
    }
}
Click the Run button and look at the two error messages. Notice that the good error message shows the path of the file that couldn’t be opened and the bad one does not.
Let me assure you there’s little that’s as frustrating as trying to debug code (in any language) where the error messages don’t tell you which files couldn’t be opened.
You are going to want to transform the errors, particularly file I/O errors, in this way to add context.
Your task
Start by deleting the contents of lib.rs
and create a public type alias
named Result<T>
which is an alias for a standard Result
whose error is
boxed just as you did in the previous lab.
In order to be able to use this type from outside the process
library, you
need to make the type public. To do that, add pub
before it.
pub type Result<T> = ...;
This is similar to the public
keyword in Java.
In runnable.rs
, add the line
use process::Result;
Now, any time you use Result
in runnable.rs
, it will use the Result
type you
defined in the library.
At this point, your runnable
binary is now broken because you’ve removed the add
function.
In Part 3, you will fix that by making runnable
read a file from
/proc
, parse it, and print out some information.
Part 3. A proc primer (15 points)
In this part of the lab, you’re going to make a simple program that reads some information about running processes from a virtual file.
Linux exposes a bunch of information to user processes through a virtual file
system called procfs
. It’s a virtual file system because there are no
actual files. Instead, when you read from the files in the procfs
file
system, Linux returns information about processes. Exposing information to
users as a file is a common Unix approach. Read this very short Wikipedia
article about the philosophy where everything is a
file.
The procfs
file system is usually mounted at /proc
, meaning we can read
the virtual files in the file system by treating them as normal files inside
the /proc
directory.
The proc(5) man page describes the contents of the files in /proc in
great detail. We'll be returning to this man page repeatedly, but it's very long,
so don't go read it all now. This lab write-up will tell you when and where to
look in it for the information you need.
Your task
In runnable.rs
, write a function
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Returns the number of currently runnable kernel scheduling entities
/// (processes and threads) and the total number of kernel scheduling entities.
fn scheduling_entities() -> Result<(usize, usize)> {
    todo!()
}
This function is going to return either Ok((runnable, total))
or an
Err(err)
. That is, on success, it’ll return a pair of numbers, (runnable, total)
(hence the two parentheses in the Ok
case) and on error, well, it’ll
return some error. The number of runnable “kernel scheduling entities” tells
us how many processes (and threads, which we’ll talk about later in the
course) are currently ready to run (or are running). A process that’s blocked
waiting for input, for example, is not runnable.
Linux exposes that information to you (as well as other information) in the
virtual file /proc/loadavg
. For example,
$ cat /proc/loadavg
0.00 0.01 0.04 5/756 26030
Remember that to split a String
(or really a &str
) on white space, we can
use str::split_whitespace()
to get an iterator and we can use collect()
to
read the elements from an iterator into a collection, in this case, a
Vec<&str>
. If we wanted to split a string on a particular character, we can
use the str::split()
method. Click the link for the documentation.
The manual page informs me that the final number is the process identifier, PID, for the most recently created process.
#![allow(unused)] fn main() { let loadavg = "0.00 0.01 0.04 5/756 26030"; let parts: Vec<&str> = loadavg.split_whitespace().collect(); println!("Most recent PID: {}", parts[4]); }
Figure out which of those numbers corresponds to the runnable and which to the
total kernel scheduling entities by reading the man page and searching
it for the documentation for /proc/loadavg
.
Recall that you can parse an integer from a string using the parse()
method.
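Once you've found the right field in the man page (it's the one written as two numbers joined by a /), a sketch of pulling it apart with str::split_once() might look like this. It assumes the code runs inside scheduling_entities() so ? can propagate errors.

// Sketch: split an "R/T"-style field into its two numbers.
let field = "5/756"; // in your code, this comes from /proc/loadavg
let (runnable, total) = field
    .split_once('/')
    .ok_or("Couldn't parse /proc/loadavg")?;
let runnable: usize = runnable.parse()?;
let total: usize = total.parse()?;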
Here’s an example main
function you can use in runnable.rs
to print out the
number of runnable entities.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn scheduling_entities() -> Result<(usize, usize)> {
    Ok((10, 300))
}

fn main() {
    match scheduling_entities() {
        Ok((runnable, total)) => {
            println!("Runnable entities {runnable} of {total}");
        }
        Err(err) => {
            eprintln!("{err}");
            std::process::exit(1);
        }
    }
}
If you run your code on a Linux system, you should see something like
Runnable entities 10 of 100
(your numbers will differ).
If you run your code on a non-Linux system (and you followed the tip about
attaching file names to error messages using .map_err()
), you’ll get an
error like this.
$ cargo run
/proc/loadavg: No such file or directory (os error 2)
Now that you have some experience reading the procfs manual and
getting information out of the files there, it’s time to start working on the
process
library in the next part.
Part 4. Modeling a process (15 points)
In this part of the lab, you’re going to define a new struct
in the library (which is the lib.rs
file and additional files you’ll create)
which you’ll use to represent, or model, a process. This means that your
struct
is going to contain enough information about a process for us to
perform our task: implementing the ps
command-line utility which
prints out information about running processes.
A process is an instance of a running program. For example, there is one
bash
program which lives in the file system at /bin/bash
but we can run
many different instances of bash
at the same time. Each instance is a
process. The operating system assigns each running process a nonnegative
number called the process identifier or PID.
Open a terminal and run $ ps
. You should see output similar to the following.
$ ps
PID TTY TIME CMD
29604 pts/4 00:00:00 bash
31292 pts/4 00:00:00 ps
In the first column, we can see the process identifier. The second column gives the name of the controlling terminal. We’ll come back to that shortly. The third column gives the total execution time of the process. The fourth column gives the command name, or name of the process.
Returning to Rust, in order to replicate this behavior, your struct
is going
to contain a PID, a representation of the controlling terminal, the execution
time, and the command name to start with. We’ll worry about the details of
ps
in a subsequent part.
Because this code will be useful in multiple binaries, you’re going to write
this code in a new proc
module in the library. Let’s get started!
Your task
First, create a new module named proc
as follows:
- Add the line
mod proc;
to the top of yourlib.rs
which will now look like this.
This line informs the compiler that there’s a file namedmod proc; pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
proc.rs
insrc
which contains the code for aproc
module. - Create the
src/proc.rs
file. Add the line
to it. This will let you use theuse super::Result;
Result
alias you defined inlib.rs
inproc.rs
.
Second, create a new struct to model a process. We can see from the ps
output above that we’re going to need a process identifier, a controlling
terminal, the total execution time, and the command name. I suggest starting
simple and building it up once you have the basics working. With that in mind,
let’s start with a Process
structure that looks like this.
/// Models a Linux process.
#[derive(Debug)]
pub struct Process {
    /// Process ID.
    pub pid: i32,
    /// Command name.
    pub command_name: String,
}
You will add more to this structure as you continue but for now, this is enough to get started.
The next step will be to implement a function Process::for_pid()
in proc.rs
that takes a PID as an argument and returns a Result<Process>
.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

pub struct Process {
    pid: i32,
    command_name: String,
}

impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let command_name = todo!("Need to look up the command name");

        Ok(Self {
            pid,
            command_name,
        })
    }
}
Recall that inside an impl
we can refer to the current type as Self
hence
Process::for_pid()
is returning a Result<Process>
. This returns a
Result<Process>
rather than a Process
because there might not be any such
process identifier and Process::for_pid()
must return an error in that case.
Linux exposes information about each running process in the procfs
file
system by exposing a virtual directory /proc/<pid>
where <pid>
is the
numeric process identifier. Run $ ls /proc
to see all of the directories
corresponding to processes, and other virtual files and directories with other
information. You already saw /proc/loadavg
but now we’ll investigate the
numbered directories.
Run $ cat /proc/self/stat
(not /proc/self/status
which is similar
information but designed for human consumption rather than programmatic
manipulation). You probably got some mysterious output like
$ cat /proc/self/stat
32177 (cat) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0
As we’ll see, this file contains most of the information we will need. Look in the
proc(5) man page for the file /proc/pid/stat
. There are many fields
here. You’re going to pick out just the ones you need to fill out an instance
of the Process
structure.
Ideally, we’d like to read this file into a string and then split the string
on white space into the fields. Then we can parse just the fields we care
about. The one hitch here is the command name which is in the second field. It
can have spaces or newlines or even a )
character! This complicates our
task, but not by a great deal. The command name is preceded by a (
and
followed by a )
and the parentheses characters will not appear anywhere else
in this file other than the command name.
Use str::find('(')
and str::rfind(')')
to get
the indices of the first (
and last )
characters and then extract three substrings.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn main() -> Result<()> {
    // Imagine we read this string in from `/proc/32177/stat`.
    let pid = 32177;
    let stat = "32177 (odd ) proc name) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0";

    let comm_start = stat.find('(')
        .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;
    let comm_end = stat.rfind(')')
        .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;

    let pid_as_str = stat[..comm_start].trim();
    let command_name = &stat[comm_start + 1..comm_end];
    let remaining_fields = stat[comm_end + 1..].trim();

    println!("Before command name: {pid_as_str}");
    println!("Command name: {command_name}");
    println!("After command name: {remaining_fields}");

    Ok(())
}
Click the Run button to see the output.
The str::find()
and str::rfind()
methods return an Option<usize>
. There
are a bunch of methods for converting between an Option<T>
and a Result<T, E>
. The Option::ok_or_else(f)
method turns a
Some(x)
into an Ok(x)
and a None
into Err(f())
(where f
is a
zero-argument function). Read the documentation for details and examples.
The upshot is that if there isn't at least one each of ( and ), the find()
or rfind() methods will return None, which the ok_or_else() will turn
into Err("Couldn't parse...") and the ? will return the error as usual.
I recommend creating a mutable Vec::<&str>, pushing pid_as_str and
command_name, and then splitting remaining_fields on white space and using
Vec::extend() to add the elements from the split. More
complete documentation for extend() is here. The result will
be a Vec containing the fields described in the proc(5) man page.
But note that the fields in the man page are numbered starting at 1 whereas
the elements in a Vec
are numbered starting at 0!
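A sketch of building that vector, assuming pid_as_str, command_name, and remaining_fields were extracted as in the example above:

// Start with the two pieces we pulled out by hand, then extend with the
// whitespace-separated fields that follow the command name.
let mut fields: Vec<&str> = vec![pid_as_str, command_name];
fields.extend(remaining_fields.split_whitespace());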
Assuming the Vec
is named fields
, you can now construct a new instance of Process
with something like
let proc = Process {
pid: fields[0].parse()?,
command_name: fields[1].to_string(),
};
Implement the function for_pid()
.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

pub struct Process {
    pid: i32,
    command_name: String,
}

impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let path = format!("/proc/{pid}/stat");
        let mut fields: Vec<&str> = Vec::new();

        todo!("Open path and read its contents and split into fields as described");

        Ok(Self {
            pid: fields[0].parse()?,
            command_name: fields[1].to_string(),
        })
    }
}
We’re nearly done with this part! All that remains is to make the Process
structure visible outside of the library so that you can use it in bin/ps.rs
, the new binary you’re about to create.
There are multiple ways to do this, but the way that you’re going to do it is by re-exporting the type. Add the line
pub use proc::Process;
to your lib.rs
.
It’s time to start implementing the ps
utility so create the file bin/ps.rs
. Just as with bin/runnable.rs
, Cargo will compile it into a binary named ps
.
Add the following code to your new ps.rs
.
use process::{Process, Result};
fn run() -> Result<()> {
let proc1 = Process::for_pid(1)?;
println!("{proc1:?}");
Ok(())
}
fn main() {
if let Err(err) = run() {
eprintln!("{err}");
std::process::exit(1);
}
}
Try to understand what this code is doing before running it.
If you try to run your code using $ cargo run
, you’ll get an error message.
error: `cargo run` could not determine which binary to run. Use the `--bin` option to specify a binary, or the `default-run` manifest key.
available binaries: runnable, ps
This makes sense. You now have two binaries, runnable
and ps
and cargo
doesn’t know which one you want to run unless you tell it. You can use $ cargo run --bin ps
to select the binary you want. This is a little tedious. Since you’re going to be working with ps
for the rest of the lab, I suggest making that the default binary to run.
Edit Cargo.toml
and add the line default-run = "ps"
in the package settings. For reference, the package settings in my solution looks like this
[package]
name = "process"
version = "0.1.0"
edition = "2021"
description = "ps - report process status"
default-run = "ps"
$ cargo run
now runs ps
by default.
If all has gone according to plan, you should see
Process { pid: 1, command_name: "systemd" }
Part 5. More processing (40 points)
In this part, you’ll add more fields to your Process
structure and
implement a few methods. The next part will explain the meaning of the
fields in more detail.
Your task
Add the following fields to your Process
structure
- process state char,
- parent process ID i32,
- session ID i32,
- controlling terminal i32,
- total execution time i64 (not i32),
- command line Vec<String>, and
- a user ID u32
You can pick whatever names you like. I chose to name my structure fields after the field names given in the proc(5) man page, when applicable.
The process state, parent process ID, session ID, and controlling terminal are
parsed directly from fields named state
, ppid
, session
, and tty_nr
.
The total execution time comes from adding the results of parsing the fields
utime
and stime
as i64
. So something like this.
let utime: i64 = fields[14].parse()?;
let stime: i64 = fields[15].parse()?;
let execution_time = utime + stime;
The command line and user ID are slightly more complicated.
The command line (when available) is stored in /proc/<pid>/cmdline
. Search
the proc(5) man page for /proc/pid/cmdline
(only the first paragraph
of its description is important for us here).
Recall that a C string ends with a single 0 byte to indicate the end of the
string. The same convention is happening here. So /proc/<pid>/cmdline
consists of strings terminated by a 0.
Fortunately, Rust is awesome and comes with a standard library function for
splitting “string data that is terminated, rather than separated by a
pattern.” That function is str::split_terminator().
So read the file to a string and then use split_terminator()
with a pattern
of '\0'
, a zero character.
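A sketch of reading and splitting the command line; the error-message wrapping follows the earlier map_err() tip, and it assumes pid is in scope and that the surrounding function returns our Result type:

let path = format!("/proc/{pid}/cmdline");
let cmdline: Vec<String> = std::fs::read_to_string(&path)
    .map_err(|err| format!("{path}: {err}"))?
    // The arguments are terminated by 0 bytes, not separated by them.
    .split_terminator('\0')
    .map(|arg| arg.to_string())
    .collect();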
The user ID (often abbreviated UID) of the user who is running the process is
not stored in any of the virtual files. Instead, Linux exposes that
information as the user who owns the files in /proc/<pid>/
. You have
already opened up /proc/<pid>/stat
so we can simply ask for the UID.
use std::fs::File;
use std::os::unix::fs::MetadataExt;

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn run() -> Result<()> {
    let path = "/proc/1/stat";
    let mut file = File::open(path)
        .map_err(|err| format!("{path}: {err}"))?;
    let metadata = file.metadata()
        .map_err(|err| format!("Couldn't get metadata for {path}: {err}"))?;
    let uid = metadata.uid();

    println!("The UID that owns {path} is {uid}");

    Ok(())
}
Four more methods to implement. One is involved, three are easy.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

struct Process;

impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        todo!("You did this already")
    }

    /// Look up information for the current process.
    pub fn for_self() -> Result<Self> {
        todo!()
    }

    /// Returns a list of all running processes.
    pub fn all_processes() -> Result<Vec<Self>> {
        todo!()
    }

    /// Returns `true` if the process is a session leader.
    pub fn is_session_leader(&self) -> bool {
        todo!()
    }

    /// Returns `true` if the process has a controlling terminal.
    pub fn has_tty(&self) -> bool {
        todo!()
    }
}
The eagle-eyed among you may have noticed a symbolic link /proc/self
.
This always points to /proc/<pid>
where <pid>
is the process’s own PID.
However, there’s an easier way to get a process’s identifier for the current process than reading the target of the symbolic link.
#![allow(unused)] fn main() { let pid = std::process::id(); println!("The current process ID is {pid}"); }
The Process::for_self()
function is easy if you use std::process::id()
and
Self::for_pid()
. The one minor catch is std::process::id()
returns a
u32
and we want an i32
. We can cast a u32
into an i32
.
#![allow(unused)] fn main() { let x: u32 = 1; let y: i32 = x as i32; println!("{x} {y}"); }
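So for_self() can be a thin wrapper around for_pid() (a sketch):

/// Look up information for the current process.
pub fn for_self() -> Result<Self> {
    // std::process::id() returns a u32, but our PIDs are i32.
    Self::for_pid(std::process::id() as i32)
}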
The last two methods you have to implement, is_session_leader()
and has_tty(), are convenience
methods you’ll use later. The meanings will be explained in the next part. For
right now, a process is a session leader if and only if its PID matches its
session id. A process has a controlling terminal if and only if its
controlling terminal field is not 0.
The final function, Process::all_processes()
is more complicated. You need
to examine the contents of the /proc
directory and find all of the
directories whose names are numbers.
Here’s some example code for reading the contents of the `/proc directory and printing the names of all directories contained within.
use std::fs;

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn go() -> Result<()> {
    for entry in fs::read_dir("/proc")? {
        // Unwrap the result and return any errors that result from reading /proc.
        let entry = entry?;

        // If the entry is not a directory or reading the metadata failed, continue.
        match entry.metadata() {
            Ok(metadata) if metadata.is_dir() => (),
            _ => continue,
        }

        let file_name = entry.file_name();
        if let Some(name) = file_name.to_str() {
            println!("{name}")
        }
    }

    Ok(())
}
There are lots of ways to determine if a string is a number, but the easiest
is probably to try to parse it as an i32
using the slightly unpleasant “turbofish” syntax:
#![allow(unused)] fn main() { println!("{}", "1234".parse::<i32>().is_ok()); println!("{}", "12e4".parse::<i32>().is_ok()); }
Once you find a directory in /proc
whose name is a number, you can use
Process::for_pid()
to get information about that process.
After reading the /proc
directory and learning that there’s a process with
PID 12345, for example, you might call Process::for_pid(12345)
and this
could fail! If the process has exited between you learning of its existence
and you reading its /proc
entry, the Linux kernel will have deleted its
/proc
entry and the File::open()
will fail. If this happens,
Process::all_processes()
should continue finding the other processes rather
than returning an error.
So in particular, you do not want to have code like this.
let mut result: Vec<Process> = Vec::new();
for entry in fs::read_dir("/proc")? {
// ...
let proc = Process::for_pid(pid)?; // <-- Don't do this here!
result.push(proc);
}
The ?
will cause an early return on an error and we don’t want that.
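Instead, something along these lines skips a process that disappears between the directory read and the open (a sketch):

// If the process exited between reading /proc and opening its files,
// skip it instead of failing the whole scan.
if let Ok(proc) = Process::for_pid(pid) {
    result.push(proc);
}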
Test that your code is working by changing ps.rs
’s main
to iterate over
all processes and print out the debug representation.
Part 6. Implementing ps
(30 points)
Click the link and read the description of ps
as defined by the POSIX
standard. (The Portable Operating System Interface, or POSIX, is a standard
that various Unix-like operating systems including Linux distributions, BSD
distributions, macOS, and others follow to a greater or lesser extent.)
There are many different versions of ps
. Most support the options and output
format specified in the POSIX standard as well as their own, nonstandard
options. We’re going to restrict ourselves to a subset of POSIX.
Your task
Implement a subset of the ps
utility functionality. In particular, your
implementation needs to support the following command-line arguments.
- -a: Write information for all processes with controlling terminals. Omit processes for which is_session_leader() returns true.
- -A / -e: Write information for all processes.
- -d: Write information for all processes, except session leaders.
- -f: Generate a full listing.
- -l: Generate a long listing.
The -e
option and the -A
option are the same. You can support this in clap
via
#[derive(Debug, Parser)]
#[command(author, version, about = "ps - report process status", long_about = None)]
struct Config {
/// Write information for all processes.
#[arg(short = 'A', short_alias = 'e')]
all: bool,
// ...
}
in your Config
structure. (Review Lab 4 for a refresher on
how to parse command-line arguments with clap
.)
Notice that none of these options have equivalent long options. POSIX command line utilities tend not to.
The -a
, -A
, -d
, and -e
flags select which processes get displayed
whereas the -f
and -l
options control what information about the
selected processes to display.
Given a particular Process
, you can decide if it should be printed by
following this simple algorithm:
- If the -A or -e flag was passed, then print the process.
- If the -a flag was passed and the process has a controlling terminal and the process is not a session leader, then print the process.
- If the -d flag was passed and the process is not a session leader, then print the process.
- Otherwise, if none of -a, -A, -d, and -e were passed, and the process’s user ID is the same as the user ID of the ps process, and the process’s controlling terminal is the same as the controlling terminal of the ps process, then print the process.
Remember that you can use Process::for_self()
to get both the current user
ID and controlling terminal.
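Here is a sketch of that selection logic as a single function. The Config flag names (all, a, d) and the Process fields (uid, tty_nr) are assumptions; adjust them to match your own structs, and pass the Process returned by Process::for_self() as me.
fn should_print(config: &Config, process: &Process, me: &Process) -> bool {
    // -A or -e: print everything.
    if config.all {
        return true;
    }
    // -a: has a controlling terminal and is not a session leader.
    if config.a && process.has_tty() && !process.is_session_leader() {
        return true;
    }
    // -d: not a session leader.
    if config.d && !process.is_session_leader() {
        return true;
    }
    // No selection flag: same user and same controlling terminal as ps itself.
    if !config.all && !config.a && !config.d {
        return process.uid == me.uid && process.tty_nr == me.tty_nr;
    }
    false
}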
I recommend implementing the algorithm to select the processes to print first
and then print out the debug representation of the Process
. Your printed
output will differ, but the processes that you select for printing should
match the behavior of the real ps
when called with some of -a
, -A
, -d
,
and -e
. Make sure you try combinations of arguments and none at all to make
sure you’re getting the same number of processes.
Next, implement printing out the table that ps
prints out. The spacing of
the columns of output doesn’t need to match the spacing of the real system. So
try to pick field widths large enough to contain the values in the table. (The
real ps
does a poor job of this because it expects some fields, like user
IDs, to be short.)
There are 15 fields that ps
will print about each process (subject to the
-l
and -f
options). These are listed in the STDOUT
section of the
ps standard. We will only support a subset of these, namely
S, UID, PID, PPID, TTY, TIME, and CMD.
Look at the table in the man page to see which fields get printed when the
-l
and/or -f
options are present.
You should print the fields in the same order as shown but only if the
appropriate flag was passed as indicated in the middle column of the table in
the STDOUT
section of the ps standard. For example, the state field,
S
, is only displayed if -l
is passed.
For this part, make sure you can print out the S
, UID
(as a number),
PID
, PPID
, and CMD
fields. We’ll tackle the TTY
, TIME
, and
non-numeric UID
in the next lab. In the meantime, print the UID as a number even with -f,
print the process’s tty_nr field for TTY, and print the total execution time
as an integer for TIME.
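As a rough sketch of handling the conditional columns, you might print the header row along these lines. The widths are arbitrary, and print_header is a name I made up; structure your own code however you like.
fn print_header(long: bool, full: bool) {
    if long {
        print!("S ");
    }
    if long || full {
        print!("{:>12} ", "UID");
    }
    println!("{:>8} {:>8} {:>8} {:>10} CMD", "PID", "PPID", "TTY", "TIME");
}

fn main() {
    print_header(true, false); // -l
    print_header(false, true); // -f
}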
Here are some examples.
$ cargo run --quiet --
PID TTY TIME CMD
11050 34816 32 ps
29327 34816 55 bash
$ cargo run --quiet -- -l
S UID PID PPID TTY TIME CMD
R 1425750506 11135 29327 34816 29 ps
S 1425750506 29327 29326 34816 57 bash
$ cargo run --quiet -- -f
UID PID PPID TTY TIME CMD
1425750506 11141 29327 34816 28 target/debug/ps -f
1425750506 29327 29326 34816 58 -bash
$ cargo run --quiet -- -fl
S UID PID PPID TTY TIME CMD
R 1425750506 11155 29327 34816 28 target/debug/ps -fl
S 1425750506 29327 29326 34816 60 -bash
(You can see the execution time for that bash
process creep up from 57 to 58
to 60 as I run commands because Bash has to wake up, process the command line,
arrange for Linux to start running the command, and after the command runs,
print a prompt and go back to sleep waiting for input. As you can see in the
real output below, the actual execution time for bash after that last one is
less than 1 second. In fact, it was about 0.6 seconds. We’ll see how to
determine this in the next lab.)
For comparison, here’s the real output with those options.
$ ps
PID TTY TIME CMD
8637 pts/0 00:00:00 ps
29327 pts/0 00:00:00 bash
$ ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 R 1425750506 8641 29327 0 80 0 - 7223 - pts/0 00:00:00 ps
0 S 1425750506 29327 29326 0 80 0 - 9167 wait pts/0 00:00:00 bash
$ ps -f
UID PID PPID C STIME TTY TIME CMD
steve 8645 29327 0 12:57 pts/0 00:00:00 ps -f
steve 29327 29326 0 Jul31 pts/0 00:00:00 -bash
$ ps -fl
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 R steve 8651 29327 0 80 0 - 9344 - 12:57 pts/0 00:00:00 ps -fl
0 S steve 29327 29326 0 80 0 - 9167 wait Jul31 pts/0 00:00:00 -bash
Note that the CMD
field displays the command_name
unless -f
was passed
in which case it shows all of the command line arguments, if it can. If -f
was passed but /proc/<pid>/cmdline
was an empty file, your
process.command_line
field will be empty. In this case, the standard says to
print the command name in brackets. You might want some code like this.
#![allow(unused)]
fn main() {
    struct Config { full: bool }
    struct Process { command_name: String, command_line: Vec<String> }

    let config = Config { full: true };
    let process = Process { command_name: String::new(), command_line: Vec::new() };

    let cmd: String = if !config.full {
        process.command_name
    } else if process.command_line.is_empty() {
        format!("[{}]", process.command_name)
    } else {
        process.command_line.join(" ")
    };
}
If your code is working as described up to this point (congratulations!), you’re done and can submit it by following the instructions. You don’t need to keep reading.
However, I want to explain some of the terms you encountered regarding processes. Specifically, “session” and “controlling terminal.” What the heck are these? Read on!
In the early days of computing, a user would interact with a computer
by sitting down at a computer
terminal which consisted of
a keyboard and a display. Earlier terminals contained a printer rather than a
display and output was displayed to the user by printing it! Click on that
link to look at some images of computer terminals on Wikipedia. Note in
particular that a hard-copy terminal (one that printed the output) was called
a teletypewriter or TTY. This terminology persists today. The TTY in the
ps
output refers to exactly this. But that’s getting ahead of our story here.
If you think about multiple users using hardware terminals to interact with the same computer, it’s clear that input from the keyboard in one terminal should only go to certain processes on the computer, namely the ones that particular user is running from that terminal. By the same token, any output from those processes should go back to the same terminal.
We no longer use hardware terminals, of course. Instead, we use terminal emulation programs that are software implementations of the hardware terminals.
The notion of sessions and controlling terminals arose out of the need to keep input/output associated with the correct processes. These concepts were standardized in the POSIX standard in the General Terminal Interface portion of the standard. Read that link if you want all of the details.
In fact, there’s a third concept, a process group, which is essential to understand for this as well. I’ll lay out the terminology first and then explain what happens when a user interacts with the system.
- Process – A running instance of a computer program. Every process has a process ID (PID), a parent process ID (PPID), a process group ID (PGID), a session ID (SID), and a controlling terminal (TTY). All of these IDs are just integers and are present in the /proc/<pid>/stat file as you saw.
- Process group – A group of related processes that share a PGID. A process group is usually created by the shell for each pipeline. E.g., if you run $ sort foo | uniq -c | sort -rn, bash will create a new process group containing 3 processes: the two sort processes and the uniq process.
- Session – A set of process groups. Every process in each process group in a session has the same SID and the same TTY (or they all have none).
- Controlling terminal – A controlling terminal is an abstraction. It’s just an integer that the operating system kernel, Linux in this case, uses to represent a hardware or software terminal. Every controlling terminal has a foreground process group.
- Foreground process group – Each controlling terminal designates one of its session’s process groups as its foreground process group. More importantly, only processes in the foreground process group of a controlling terminal can receive input from that terminal.
- Process group leader – A process is the process group leader of its process group if and only if its PID is equal to its PGID.
- Session leader – A process is the session leader of its session if and only if its PID is equal to its SID. We saw this in the Process::is_session_leader() function in Part 4.
These all fit together roughly like this.
The user sits down at the terminal and logs in. The shell, let’s say bash
,
is started. Initially, the bash
process is the only process in its process
group and its process group will be the only process group in its session.
Thus bash will be both its process group’s leader and its session’s leader. We
can get the real ps
to print out the various identifiers using the -o
option.
Here’s the real output from ps
showing bash
as the process group leader
and the session leader.
$ ps -o pid,pgid,sid,tname,command
PID PGID SID TTY COMMAND
29327 29327 29327 pts/0 -bash
(I removed the output for the ps
process itself.) Notice that the PID, PGID,
and SID are all the same.
Let’s run a pipeline and look at the output from ps
. On one terminal, I’m
going to run $ sort | uniq -c | sort -nr
. The first sort
will sleep
waiting for input from the terminal. On a second terminal, I’m going to run
ps
and use one of the nonstandard options -s
to select only the processes
that match the SID of the bash process shown above. This time, I’m also printing out
the parent process ID (PPID).
$ ps -s 29327 -o pid,ppid,pgid,sid,tname,command
PID PPID PGID SID TTY COMMAND
17314 29327 17314 29327 pts/0 sort
17315 29327 17314 29327 pts/0 uniq -c
17316 29327 17314 29327 pts/0 sort -nr
29327 29326 29327 29327 pts/0 -bash
The key things to notice are:
- bash is the parent process of the other three (i.e., the PPID of the sort and uniq processes is the PID of bash);
- All three processes that were created for the pipeline have the same process group ID (PGID), 17314, which is different from the PGID of bash;
- The first sort process is the process group leader of its group (i.e., its PID is equal to its PGID, 17314); and
- All four processes have the same TTY (which we know they must because they all have the same SID).
I said above that only processes in the foreground process group of a
controlling terminal are allowed to read input from the terminal. We can ask
ps
to display the process group ID of the foreground process group of the
controlling terminal of each process, if any. (That’s a mouthful. I mean
something like this pseudocode:
process.controlling_terminal().foreground_process_group().id()
.)
$ ps -s 29327 -o pid,ppid,pgid,sid,tpgid,tname,command
PID PPID PGID SID TPGID TTY COMMAND
17314 29327 17314 29327 17314 pts/0 sort
17315 29327 17314 29327 17314 pts/0 uniq -c
17316 29327 17314 29327 17314 pts/0 sort -nr
29327 29326 29327 29327 17314 pts/0 -bash
The TPGID column holds the process group ID of the active foreground process
group for the controlling TTY. All four processes have the same TTY, symbolic
name pts/0
, and thus all have the same TPGID. Since foreground process group
of pts/0
has ID 17314, we can see that it’s only the sort | uniq | sort
pipeline which is able to receive input from the terminal. In particular,
bash
cannot do so at this time.
I had to log in to use a second terminal to run these commands because, as
we’ve just discussed, the process group for the pipeline is the only one that
can receive input from its controlling terminal right now. If I print out all
of the processes belonging to my user (using the nonstandard -u
option)
rather than just the processes associated with the session we’ve been looking
at so far, I get this (I cut off the right side of ps
’s output as it’s long
and unimportant).
$ ps -u steve -o pid,ppid,pgid,sid,tpgid,tname,command
PID PPID PGID SID TPGID TTY COMMAND
17314 29327 17314 29327 17314 pts/0 sort
17315 29327 17314 29327 17314 pts/0 uniq -c
17316 29327 17314 29327 17314 pts/0 sort -nr
18314 29326 18314 18314 18408 pts/5 -bash
18408 18314 18408 18314 18408 pts/5 ps -u steve -o pid,ppid,pgid,
23147 1 22908 22908 -1 ? sh /usr/users/noquota/faculty
23157 23147 22908 22908 -1 ? /net/storage/zfs/faculty/stev
23218 23157 22908 22908 -1 ? /net/storage/zfs/faculty/stev
24880 1 24880 24880 -1 ? /lib/systemd/systemd --user
24882 24880 24880 24880 -1 ? (sd-pam)
29326 29306 29306 29306 -1 ? sshd: steve@pts/0,pts/5
29327 29326 29327 29327 17314 pts/0 -bash
29401 29326 29401 29401 -1 ? -bash
29477 29401 29401 29401 -1 ? bash
29529 23157 22908 22908 -1 ? /net/storage/zfs/faculty/stev
29540 23157 22908 22908 -1 ? /net/storage/zfs/faculty/stev
29596 29529 22908 22908 -1 ? /net/storage/zfs/faculty/stev
29604 23218 29604 29604 29604 pts/4 /bin/bash --init-file /net/st
29673 29529 22908 22908 -1 ? /usr/users/noquota/faculty/st
29723 29673 22908 22908 -1 ? /net/storage/zfs/faculty/stev
There’s a lot of output here, but some things to notice:
- I’m actually connected to 3 different TTYs: pts/0, pts/4, and pts/5. All three of these come from sshing into a Linux server. The processes on pts/4 are actually from Visual Studio Code on my Mac sshing to the server.
- Some of these processes don’t have a controlling terminal; ps shows ? instead. These processes will never read or write to a terminal. Most of these were started by Visual Studio Code and are things like the rust-analyzer VS Code extension running in the background.
Once the pipeline ends (all of the processes in it have completed), bash
will arrange for its process group to once again be the foreground process
group for its controlling terminal so that it can print out the prompt and
wait for the next line of input.
In the next lab, you will implement the kill
command-line utility which can
terminate individual processes by PID or entire process groups by PGID. For
example, I can kill the entire sort | uniq | sort
pipeline’s process group.
To distinguish between a PID and a PGID, we use a negative value to indicate
the PGID and a positive value to indicate the PID. This command kills every
process in process group 17314, namely the pipeline.
$ kill -TERM -17314
For the final time, let’s print out details about the processes in that first session.
$ ps -s 29327 -o pid,ppid,pgid,sid,tgid,tname,command
PID PPID PGID SID TGID TTY COMMAND
29327 29326 29327 29327 29327 pts/0 -bash
And there we go. We’re back to a single process, bash
, in a single process
group in the session. This process group is the controlling terminal’s
foreground process group.
I didn’t show an example, but bash
can create background process groups,
change which group is the foreground process group, and perform a wide
variety of so-called “job control.” There’s a bunch of information in the
Bash
manual.
I almost never avail myself of this functionality, but in specific
circumstances, it can be useful.
Hopefully, you now have a better understanding of the somewhat unusual
behavior of ps
with respect to which processes it prints information about
in response to -a/-A/-d/-e
. To recap:
- If none of those options are passed, ps prints out just the processes running in the current terminal. This makes historical sense: a user would typically only be logged in to a single terminal, so it makes sense to show only those processes by default.
- The -a and -d options show all processes except for session leaders (and -a also omits processes without a controlling terminal). Why not show session leaders? My best guess is that a session leader is basically always going to be the shell, and you’re not likely to want to see a bunch of bash instances; if you really do, you can use -A or -e.
If you read this far, I’m impressed! I think understanding the low-level details of how systems work is really cool. This is one of those super important, but not well understood areas of computers. Now, having read this, you’re better informed about how processes and terminals work on Unix-like systems, including macOS and Linux, than almost every single computer programmer.
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    ├── bin
    │   ├── ps.rs
    │   └── runnable.rs
    ├── lib.rs
    └── proc.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Lab 7: Processes II
Due: 2024-11-04 at 23:59
In this lab, you’re going to continue working on your process
library and
ps
binary from last time.
In this lab, you will learn
- how to make system calls; and
- how to call functions in the C standard library to interact with the operating system.
Like Lab 6, this one is going to be specific to Linux.
Preliminaries
Reuse your assignment repository from the previous lab.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1. Calling C functions (5 points)
We saw last time (and will see a little more this time) how Linux uses the
procfs virtual file system (which is usually mounted at /proc
) to expose
information to user processes.
This is not the only mechanism that operating systems use. Every modern OS works by enforcing a strict separation between user processes (the ones we run) and the operating system kernel. This is essential to prevent, among other things, buggy programs from crashing the whole computer. Unfortunately, it comes at a cost. If a user process wants information from the OS, it cannot simply call a function in the kernel. Instead, it makes a system call.
System calls are a mechanism for requesting the OS do something on your behalf. Some examples of system calls in different application domains:
- Open a file
- Read data from a file
- Write data to a file
- Get the current time of day
- Run a new program (more about this in a later lab)
- Terminate a program (we’re going to do this!)
- Generate random numbers in a secure fashion
- Open a network connection to a server (more about this in a later lab)
- Send or receive data over the network to/from the server
- Exit the current process
Every program you’ve written has used at least one system call.
Most of the time, programmers do not issue system calls themselves. The reason for this is that the system call mechanism is not portable, meaning it’s different on every OS. Nevertheless, we can write code like
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::open("example.txt")?;
    Ok(())
}
and it can be compiled for any (modern) OS and it’ll open the file.
The way this works is that every OS exposes a set of functions in a library
which will make the relevant system calls. For example, Apple calls their
library libSystem
. Even if the underlying system calls themselves change, as
part of an update for example, the OS vendor (e.g., Microsoft or Apple)
updates this library to make the updated system calls on the program’s behalf.
Now we hit a bit of a snag. There are a million programming languages. Each
language can have wildly different ideas about what a function is. One way to
deal with this would be to have a library per programming language. E.g., we
could have a libSystem-Python
, libSystem-Java
, libSystem-Rust
, etc. This
quickly becomes unworkable.
Instead, the C programming language is the de facto programming language for
these libraries. And indeed, the most common name for the library containing
functions that make system calls is libc
. On an x86-64 Linux system, you’re
likely to find the GNU version of libc
living at
/lib/x86_64-linux-gnu/libc.so.6
.
This is really unusual, but you can actually run this libc
as if it were a
program and it’ll print out some information about the library. Almost no
library does this.
$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1.6) stable release version 2.27.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 7.5.0.
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.
Okay, so if C is the de facto language, how can we make system calls in Rust if the Rust standard library doesn’t already have the functionality we need? Well, we call C functions! Nearly every programming language has some mechanism, often called a Foreign Function Interface (FFI), that lets you call C functions. Rust is no exception.
In Rust, the functions in libc
are exposed in the libc
crate. Here’s the
documentation
for x86-64 Linux.
As part of your implementation of ps
, you printed out the user id (UID) of
the user that is running the process. You’re going to write some code to turn
the UID into a String
containing the user’s username.
Your Task
You’re going to implement the most bare-bones version of the standard whoami
command-line utility that prints out the user name of the user who runs the
process. E.g., here’s the standard whoami
on my laptop.
$ whoami
steve
Pretty straightforward. Unfortunately, there’s no Rust standard library function to get this information. We’re going to have to write it ourselves!
First, add the libc
crate as a dependency by running
$ cargo add libc
in your assignment repository.
Next, create a file src/bin/whoami.rs
. By putting the file in bin
, $ cargo build
will build two different applications and $ cargo run
now needs
to know which binary to run. Write a simple main
function in whoami.rs
that prints out a message and exits. You can (build and) run this via
$ cargo run --bin whoami
Hello world!
If you have two or more binaries and you run $ cargo run
without specifying
which to run, you’ll get an error.
Getting the user name for the user who is running the current process requires two steps:
- Ask the OS for the UID of the user
- Ask the OS for the username associated with the UID.
The C function we need to call for step 1 is
geteuid().
If you click that link, you’ll find no real documentation. That’s a shame, but
not surprising as the libc
crate is generated programmatically and it doesn’t
include documentation. Fortunately, there are manual pages for C functions.
Here’s the man page for
geteuid(2)
. Give
it a read. (Note that it is in section 2 of the manual. Section 1 is where user
programs like cat
are documented. Section 2 is where system calls are
documented. Section 3 is where C standard library functions are documented.
Run $ man man
for a list of the different sections.)
There are two functions documented, getuid()
which returns the real UID (we
don’t want this), and geteuid()
which returns the effective UID (this is the
one we want). These are usually the same. The most common way they can differ
is if you run a process as the root user (which has UID 0) via the sudo
command. In that case, geteuid()
will return 0
and getuid()
will return
the real user. We want the effective one instead:
$ whoami
steve
$ sudo whoami
root
Every function in libc
is marked unsafe
. This indicates that the function
might do something that violates Rust’s safety guarantees and it’s up to the
programmer to ensure that it doesn’t. We can only call an unsafe
function
within an unsafe
function or within an unsafe {}
block.
Because of this, it is considered best practice to encapsulate anything unsafe behind a safe interface. It’s up to you to perform error handling.
The geteuid()
function is unusual in that it cannot fail. There is no return
value that indicates an error condition (I know this because I read the man
page, “These functions are always successful…”). As a result, we can
trivially make a safe wrapper around geteuid
by calling libc::geteuid()
inside an unsafe {}
block and returning the result.
extern crate libc;

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn geteuid() -> u32 {
    unsafe { libc::geteuid() }
}

fn main() {
    let uid = geteuid();
    println!("The current effective UID is {uid}");
}
There are a handful of libc
functions like getpid()
, getppid()
, and
getpgrp()
which just return a numeric identifier and can be handled by just
wrapping the call in an unsafe {}
block. Most system calls are sadly not
this easy to deal with. We’ll need to carefully handle memory and any errors
that result.
Speaking of other system calls, you now know how to get the effective UID of the user running the process. Neat!
Turning that into a user name is more complex and can fail.
Create a new file src/user.rs
.
Part 2. Looking up users (20 points)
In this part, you’re going to implement a function username()
which takes a
user ID (UID) and returns a string containing the username.
The libc function you called in Part 1 was very simple: It took no arguments and it could never fail. The libc function you’ll call in this part is much more complicated.
POSIX-compatible operating systems provide functions to get information
about a user. Take a look at the man page for
getpwnam_r(3)
(this is in section 3 of the manual which is the “Library Functions Manual”).
The four functions described all behave similarly: They look up a user in the
user database. getpwnam()
and getpwnam_r() look up a user given a user
look up a user given a user
name whereas getpwuid()
and getpwuid_r()
look up a user given a UID.
The information returned by these functions is a passwd
structure. In C, this structure looks like this.
struct passwd {
char *pw_name; /* username */
char *pw_passwd; /* user password */
uid_t pw_uid; /* user ID */
gid_t pw_gid; /* group ID */
char *pw_gecos; /* user information */
char *pw_dir; /* home directory */
char *pw_shell; /* shell program */
};
Here’s the same definition in
Rust,
from the libc
crate.
#![allow(unused)]
fn main() {
    type c_char = i8;
    type uid_t = u32;
    type gid_t = u32;

    #[repr(C)]
    pub struct passwd {
        pub pw_name: *mut c_char,
        pub pw_passwd: *mut c_char,
        pub pw_uid: uid_t,
        pub pw_gid: gid_t,
        pub pw_gecos: *mut c_char,
        pub pw_dir: *mut c_char,
        pub pw_shell: *mut c_char,
    }
}
The interesting type here is the string type, *mut c_char
. This is
a raw mutable pointer to a C character (a character in C is just a byte). In
fact, it’s a raw mutable pointer to a sequence of nonzero characters followed
by a zero character (meaning a byte with value 0, '\0'
rather than the
digit '0'
which has value 48).
A raw pointer in Rust is essentially the same thing as a reference except without the safety guarantees: In particular, a pointer need not point at a valid, initialized object of its type whereas a reference always points to a valid, initialized object.
Just as there are two types of references, &T
, and &mut T
, there are two
types of pointers, *const T
and *mut T
. A const
pointer is used when the
thing it is pointing to will not change (similar to &
) whereas a mut
pointer is used when the thing it is pointing to may be changed (similar to
&mut
).
It’s always safe to create a *const T
from a &T
or a *mut T
from a &mut T
.
Here’s an example.
#![allow(unused)]
fn main() {
    let x: i32 = 10;
    let x_ptr = &x as *const _;

    let mut y: i32 = 20;
    let y_ptr = &mut y as *mut _;

    println!("Address of x: {x_ptr:?}");
    println!("Address of y: {y_ptr:?}");
}
If you click the run button, you’ll see that addresses are just numbers. If you think of your computer’s memory as one giant array of bytes ranging from index 0 up to some n, then a pointer to some value in memory is just the index of the first byte of the value in that big array of bytes. (This isn’t completely true for several reasons. Nevertheless, if you think of a pointer as just the index of the first byte of your value, your mental model of how pointers work will be 95% correct.)
The operation that isn’t safe is dereferencing the pointer. Here’s an example:
#![allow(unused)]
fn main() {
    let mut x: i32 = 10;
    let x_ref = &mut x;

    let y = *x_ref; // Dereferencing the reference, always safe. Creates y with value 10.
    *x_ref = 20;    // Dereferencing the reference, always safe. Assigns 20 to x.

    let x_ptr = x_ref as *mut _;

    // Dereferencing the pointer is safe in this case because it points to
    // a valid, initialized `i32`. But dereferencing in general is unsafe because
    // the pointer need not be valid.
    // Creates z with value 20.
    let z = unsafe { *x_ptr };

    // Again, dereferencing is safe in this case, but not in general.
    // Assigns 40 to x.
    unsafe {
        *x_ptr = 40;
    }

    println!("x = {x}\ny = {y}\nz = {z}");
}
Click the Run button.
The getpwuid_r()
function is quite complicated. The Rust binding to the C function is the following.
pub unsafe extern "C" fn getpwuid_r(
uid: uid_t,
pwd: *mut passwd,
buf: *mut c_char,
buflen: size_t,
result: *mut *mut passwd
) -> c_int
The parameters match the parameters for the C function and are as follows.
- uid – The user ID of the user to look up;
- pwd – A (mutable) pointer to a passwd struct that will be filled out by getpwuid_r();
- buf – A buffer to hold the string data (see below);
- buflen – The length of the buf buffer; and
- result – On success, *result will be set to pwd; on failure, it will be set to std::ptr::null_mut(), a null pointer.
The passwd
structure contains pointers to string data. The getpwuid_r()
function needs to put that string data somewhere. It uses the provided buf
as a place to store the strings.
Your task
Your task is to create a new user
module and implement a fn username(uid: u32) -> String
function that calls getpwuid_r()
to retrieve the user name
of the user corresponding to uid
.
Start by creating a new file src/user.rs
and add the user
module to lib.rs
with
pub mod user;
Then add
use std::ffi::CStr;
use std::ptr;
use libc;
pub fn username(uid: u32) -> String {
unimplemented!()
}
to user.rs
.
It is time to implement username()
. Implementing it is a little tricky. You’re going to need to
1. Create a passwd local variable with all of the fields set to either 0 or null:

   let mut pwd = libc::passwd {
       pw_name: ptr::null_mut(),
       // fill out the rest of the fields
   };

2. Create a Vec<libc::c_char> for the buf argument:

   let mut buf: Vec<libc::c_char> = vec![0; 1024];

3. Create a pointer to a libc::passwd named result that will be set by getpwuid_r() to the address of pwd on success:

   let mut result: *mut libc::passwd = ptr::null_mut();

4. Call getpwuid_r(), passing the appropriate arguments:

   let ret = unsafe {
       libc::getpwuid_r(
           uid,
           &mut pwd as *mut _,
           buf.as_mut_ptr(),
           buf.len(),
           &mut result as *mut _,
       )
   };

5. If ret is libc::ERANGE, then the provided buffer wasn’t large enough to hold the string data. Resize buf (something like buf.resize(buf.len() * 2, 0)) and call getpwuid_r() again. Steps 4 and 5 should be in a loop, and you should break out of the loop if ret is any other value (see the sketch after this list for one way to structure the loop).

6. If ret is any nonzero value (other than libc::ERANGE, which you handled in step 5) or if result.is_null(), then either the getpwuid_r() call failed or there was no user with that UID. In either case, convert the UID to a String (uid.to_string()) and return it.

7. At this point, everything has succeeded and we just need to convert from a C string into a Rust String and return it. For this, we can use a &CStr reference which behaves like a &str for a C string:

   // UNSAFE: This is safe because ret is 0 and result is not null and
   // therefore pwd was filled out. Additionally, pwd.pw_name is a valid,
   // NUL-terminated C-string.
   let username: &CStr = unsafe { CStr::from_ptr(pwd.pw_name) };

   // Convert CStr into a String and return it.
   username.to_string_lossy().into_owned()
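Putting the steps together, here is one possible shape for username(). This is a sketch, not the only correct structure (and it assumes the use lines added earlier); in particular, you can organize the retry loop however you like.
pub fn username(uid: u32) -> String {
    // Step 1: a passwd struct for getpwuid_r() to fill in.
    let mut pwd = libc::passwd {
        pw_name: ptr::null_mut(),
        pw_passwd: ptr::null_mut(),
        pw_uid: 0,
        pw_gid: 0,
        pw_gecos: ptr::null_mut(),
        pw_dir: ptr::null_mut(),
        pw_shell: ptr::null_mut(),
    };
    // Step 2: a buffer for the string data.
    let mut buf: Vec<libc::c_char> = vec![0; 1024];
    // Step 3: on success, getpwuid_r() sets this to the address of pwd.
    let mut result: *mut libc::passwd = ptr::null_mut();

    // Steps 4 and 5: call getpwuid_r(), growing the buffer on ERANGE.
    let ret = loop {
        let ret = unsafe {
            libc::getpwuid_r(
                uid,
                &mut pwd as *mut _,
                buf.as_mut_ptr(),
                buf.len(),
                &mut result as *mut _,
            )
        };
        if ret == libc::ERANGE {
            buf.resize(buf.len() * 2, 0);
        } else {
            break ret;
        }
    };

    // Step 6: on any other failure, or if no user was found, fall back to the numeric UID.
    if ret != 0 || result.is_null() {
        return uid.to_string();
    }

    // Step 7: convert the C string to a Rust String.
    // UNSAFE: ret is 0 and result is non-null, so pwd.pw_name is a valid,
    // NUL-terminated C string.
    let username = unsafe { CStr::from_ptr(pwd.pw_name) };
    username.to_string_lossy().into_owned()
}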
Finally, modify the main()
function in whoami.rs
to call
user::username(uid)
with the UID that results from libc::geteuid()
. You’ll
want to add use process::user;
to whoami.rs
to be able to call functions
in the user
module.
If all has worked correctly, running cargo run --bin whoami
should print out your username.
$ cargo run --quiet --bin whoami
steve
Part 3. Adding usernames to ps (10 points)
Now that user::username()
has been implemented, modify your ps
to use user::username()
.
Your task
Modify your implementation of ps
to print out a left-aligned column with the
username when the -f
option is passed and a right-aligned column with the
UID when -l
but not -f
is passed.
$ cargo run --bin ps --quiet -- -f
UID PID PPID TTY TIME CMD
steve 1519068 1518177 34816 0 /bin/bash --init-file /zfs/faculty/steve/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh
steve 1521087 1519068 34816 0 target/debug/ps -f
$ cargo run --bin ps --quiet -- -l
S UID PID PPID TTY TIME CMD
S 1425750506 1519068 1518177 34816 0 bash
R 1425750506 1521098 1519068 34816 0 ps
$ cargo run --bin ps --quiet -- -fl
S UID PID PPID TTY TIME CMD
S steve 1521440 1518177 34819 0 /bin/bash --init-file /zfs/faculty/steve/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh
R steve 1521657 1521440 34819 0 target/debug/ps -fl
Part 4. Execution times (15 points)
You’re nearly finished with ps
. Two tasks remain. The first is making the
execution time of a process print out in the [DD-]hh:mm:ss
format, meaning
that if the process has been running for less than a day, the hours, minutes,
and seconds are printed out (two digits each) and if the process has been
running for longer than a day, then the number of days are printed out
followed by a dash and then hours, minutes, and seconds. For example,
00:05:23
for a process that has been running for 5 minutes and 23 seconds
and 03-15:59:21
for a process that has been running for 3 days, 15 hours, 59
minutes, and 21 seconds.
The process for computing the execution time has two parts: First, you need to get the execution time in seconds and second, you need to convert that into days, hours, minutes, and seconds. The second part is standard. The first part isn’t difficult, but isn’t obvious.
Linux keeps track of time in terms of ticks. There are some number of ticks
per second which can vary by system. You will have to ask the OS how many
ticks there are per second. Fortunately, there’s a libc
function that will
do that for us! Here’s a function you can use which returns the number of
ticks per second.
/// Returns the number of clock ticks per second.
fn ticks_per_second() -> Result<i64> {
let ret = unsafe { libc::sysconf(libc::_SC_CLK_TCK) };
if ret == -1 {
return Err(io::Error::last_os_error().into());
}
Ok(ret)
}
Your task
In your ps
’s run()
, call ticks_per_second()
. To compute the execution
time of a process in seconds, use the execution_time
field of the Process
struct and divide it by the result of ticks_per_second()
.
Convert this time to days, hours, minutes, and seconds, and when printing the execution time, use the appropriate format.
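For the conversion itself, a small helper along these lines works (format_time is a made-up name; the arithmetic just splits the total number of seconds into days, hours, minutes, and seconds):
fn format_time(total_seconds: i64) -> String {
    let days = total_seconds / 86_400;
    let hours = (total_seconds % 86_400) / 3_600;
    let minutes = (total_seconds % 3_600) / 60;
    let seconds = total_seconds % 60;
    if days > 0 {
        format!("{days:02}-{hours:02}:{minutes:02}:{seconds:02}")
    } else {
        format!("{hours:02}:{minutes:02}:{seconds:02}")
    }
}

fn main() {
    // The two examples from above.
    println!("{}", format_time(5 * 60 + 23));                            // 00:05:23
    println!("{}", format_time(3 * 86_400 + 15 * 3_600 + 59 * 60 + 21)); // 03-15:59:21
}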
Part 5. Controlling terminals (40 points)
The final thing to add to ps
is printing a human-readable description of the
controlling terminal (the TTY column).
In this part, you’re going to create a new DeviceMap
struct which you will
use to convert the “controlling terminal” integer to a string like pts/3
.
The controlling terminal integer has two parts, a minor number that’s in the
range [0, 255] and a major number. The integer is computed as major * 256 + minor
. The major number identifies the type of “character device.”
Linux exposes the mapping between major number and device name in the file
/proc/devices
. Go ahead and take a look at that file to see how it is
structured.
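For example, the TTY value 34816 that appeared in the Lab 6 sample output decodes to major 136 and minor 0 (136 * 256 + 0 = 34816), and 34819 decodes to major 136 and minor 3. Major 136 is the pts device in /proc/devices, which is why the real ps printed pts/0 and pts/3 for those processes. A quick check of the arithmetic:
fn main() {
    for tty_nr in [34816, 34819] {
        let major = tty_nr >> 8;   // same as tty_nr / 256
        let minor = tty_nr & 0xFF; // same as tty_nr % 256
        println!("{tty_nr} -> major {major}, minor {minor}");
    }
}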
Your task
Create a new file src/device_map.rs
(and add a module declaration in
lib.rs
) and define a new DeviceMap
struct. This struct should contain a
std::collections::HashMap<i32, String>
field. (You’ll probably want a use std::collections::HashMap; line near the top of the file.) Don’t forget to make DeviceMap
public.
You’re going to implement a public constructor named new()
and a public
lookup()
method.
#![allow(unused)]
fn main() {
    type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

    struct DeviceMap;

    impl DeviceMap {
        pub fn new() -> Result<Self> {
            todo!()
        }

        pub fn lookup(&self, tty_nr: i32) -> String {
            let major = tty_nr >> 8;
            let minor = tty_nr & 0xFF;
            todo!()
        }
    }
}
(You will need to use super::Result;
to have access to your Result
type).
The new()
function should open /proc/devices
and read lines corresponding
to the Character devices
section of the file and insert the names of the
devices into the hash map using the integer value as the key. See the
HashMap
documentation for the methods you can use.
To simplify your implementation, you may want to use this code
let devices = std::fs::read_to_string("/proc/devices")
.map_err(|err| format!("/proc/devices: {err}"))?;
for line in devices
.lines()
.skip_while(|&line| line != "Character devices:")
.skip(1)
.take_while(|&line| !line.is_empty())
{
todo!("Split the line on whitespace and insert into the map")
}
This reads /proc/devices
into a String
named devices
. Then it iterates
over the lines by (1) skipping lines until Character devices:
is found; (2)
skipping that line; and (3) taking lines until the first empty line, which ends the section.
There are duplicate numbers in /proc/devices
. That’s okay, just insert them
into the hash map one at a time. The earlier values will be replaced by the
later values.
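For the body of that loop, here is a sketch of one way to split a line and insert it. It assumes the HashMap lives in a local variable named map; real lines in this section look like "  4 tty", i.e., a number followed by a device name.
let mut parts = line.split_whitespace();
if let (Some(number), Some(name)) = (parts.next(), parts.next()) {
    // Skip anything whose first field doesn't parse as a number.
    if let Ok(major) = number.parse::<i32>() {
        map.insert(major, name.to_string());
    }
}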
What remains is to finish the implementation of lookup()
. The provided code
above extracts the major and minor number. Look up the major number in the
hash map. If it’s not found, return the string "?"
. If it is found, then
return format!("{device}/{minor}")
(where device
is the returned value
from the hash map).
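In other words, lookup() might end up looking something like this (a sketch assuming the HashMap field is named devices):
pub fn lookup(&self, tty_nr: i32) -> String {
    let major = tty_nr >> 8;
    let minor = tty_nr & 0xFF;
    match self.devices.get(&major) {
        Some(device) => format!("{device}/{minor}"),
        None => "?".to_string(),
    }
}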
The final step is to update ps
to create a DeviceMap
in main()
and when
printing the TTY
column, call the lookup()
method to get the name of the
TTY
.
Here’s some sample output as compared to the real ps
.
$ cargo run --bin ps --quiet
PID TTY TIME CMD
1521440 pts/3 00:00:00 bash
1523501 pts/3 00:00:00 ps
$ ps
PID TTY TIME CMD
1521440 pts/3 00:00:00 bash
1523768 pts/3 00:00:00 ps
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    ├── bin
    │   ├── ps.rs
    │   ├── runnable.rs
    │   └── whoami.rs
    ├── device_map.rs
    ├── lib.rs
    ├── proc.rs
    └── user.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Lab 8: Osh I
In this lab and the next, you’re going to implement a simple shell named Osh. The shell will be able to
- Run simple commands like
wc -l some/file
; and - Run pipelines containing an arbitrary number of commands.
You will learn
- one way to parse input;
- how to model a pipeline; and
- how to run programs and wait for them to complete.
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1. Parsing input (40 points)
At a high level, your shell will (1) read a line of input from stdin; (2) parse the input into tokens; (3) construct a pipeline from the tokens; and (4) run the whole pipeline.
In this part, you will parse input into a sequence of tokens.
Initially, a token will be either a pipe character |
or a word. For example,
the input echo 'Why hello there!'| wc -l -c
consists of the following
tokens.
[Word("echo"), Word("Why hello there!"), Pipe, Word("wc"), Word("-l"), Word("-c")]
Notice how the |
turns into a pipe token rather than a word token. In
contrast, if the pipe is quoted, it is a word as in this example echo '|'
.
[Word("echo"), Word("|")]
This is a perfect use case for an enum.
Your task
Create a new binary project in the assignment repo using
$ cargo init --bin --vcs git --name osh .
Define the Result
type in main.rs
the same way you did for the past few labs.
Create a new file src/input.rs
and add the input
module to main.rs
.
Inside input.rs
, add a use
line for Result
.
use crate::Result;
Create a new, public enum called InputToken
which you will use to represent
either a word or a pipe.
Create a new function
#![allow(unused)]
fn main() {
    type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

    enum InputToken { Word(String), Pipe }

    pub fn split_input(input: &str) -> Result<Vec<InputToken>> {
        todo!()
    }
}
which on success will return a Vec
of InputToken
s.
To parse the input into tokens, you’ll want to iterate over the input string
one character at a time which you can do using the .chars()
method which
returns an iterator over characters.
What makes this tricky is that how you handle any given character depends on what other characters have been read. For example, a space character can separate two words or it could be part of a quoted word. Similarly, a pipe character might be part of a word (if it is quoted) or it might be a separator in the pipeline (see the two examples at the top of this page).
Your split_input()
function should support unquoted words, single quoted words, and
double quoted words or a mix. Here’s an example input line
echo unquoted_word 'single "quoted" word' "double 'quoted' word" 'mixing different 'quoting" styles"
and its parsing
[Word("echo"), Word("unquoted_word"), Word("single \"quoted\" word"), Word("double 'quoted' word"), Word("mixing different quoting styles")]
Note that the unquoted quotation marks are removed from each word. (The single
quoted word containing a double quoted inner word is shown with a backslash
before the inner quotes. This is an artifact of the Debug
representation,
not actual characters.)
To parse the input, I recommend creating a state machine which examines its input one character at a time. A state machine works by maintaining a current state and reading the input in a loop. For each character read, use the state to determine what to do with the character and what state to move to next.
My implementation used the following four states and a mutable State
variable which is initially State::Blank
and represents the current state of
parsing.
#![allow(unused)]
fn main() {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum State {
        Blank,
        Unquoted,
        SingleQuoted,
        DoubleQuoted,
    }

    let mut state = State::Blank;
}
The Blank
state indicates that the state machine is not currently parsing a
word. Initially, the current state is Blank
, but after reading an unquoted
white space character (a blank, if you will), the state will change back to
Blank
.
The other three states indicate that the state machine is in the middle of parsing an unquoted word, a single quoted word, or a double quoted word.
This is a graphical representation of the state machine.
Each circle represents one of the four states. The arrows indicate how the
state machine’s current state changes between them. For example,
starting in the Blank
state parsing the input foo 'bar'
, the state changes
to Unquoted
upon reading the f
. It stays in Unquoted
until the space
when it transitions back to Blank
. Upon reading the '
, it transitions to
SingleQuoted
where it stays until it reaches the last '
which causes it to
transition to Unquoted
.
Note that in state SingleQuoted
, a single quote character causes a transition to state Unquoted
. The DoubleQuoted
state behaves similarly with a double quote. It doesn’t transition to the Blank
state because a closing quote doesn’t end the word. For example, the input foo' 'bar
is the same as 'foo bar'
: both of them produce
a Word("foo bar")
token.
The bulk of the state machine is implemented in a loop
containing a match
.
for ch in input.chars() {
match (state, ch) {
_ => todo!()
}
}
Rust will ensure that we add cases for every possible state and input character combination.
For example, here’s how my solution handles characters in the Blank
state.
let mut current_word = String::new();
let mut result = Vec::new();
for ch in input.chars() {
match (state, ch) {
(State::Blank, '|') => result.push(InputToken::Pipe),
(State::Blank, _) if ch.is_ascii_whitespace() => (),
(State::Blank, '\'') => state = State::SingleQuoted,
(State::Blank, '"') => state = State::DoubleQuoted,
(State::Blank, _) => {
current_word.push(ch);
state = State::Unquoted;
}
// Handle the other states
}
}
In the Blank
state, if a |
is read, it pushes an InputToken::Pipe
token
into a result
vector.
If the character is ASCII whitespace, then it’s ignored. Notice that we can add conditions to the match patterns.
match (state, ch) {
(State::Blank, _) if ch.is_ascii_whitespace() => (),
}
This will match if state
is State::Blank
and ch.is_ascii_whitespace()
returns true
. (The _
wildcard will match any character.)
If the character is a single or double quote, the state changes to the appropriate state.
In every other case, the character is pushed onto the current_word
and the state changes to Unquoted
.
The Unquoted
state is similar to Blank
except that a |
or ASCII
whitespace ends the current_word
(so it should be pushed onto result
). If
it’s a |
, then an InputToken::Pipe
should also be pushed. The state should be
set to Blank
.
The SingleQuoted
and DoubleQuoted
are similar to each other. They should
keep collecting characters in the word until the matching single or double
quote character is read in which case the state should be changed to
Unquoted
.
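Following that description, the remaining arms might look roughly like this. This is a sketch that continues the match from the Blank example above (those arms are omitted here); std::mem::take() moves the current word out and leaves an empty String behind.
(State::Unquoted, '|') => {
    result.push(InputToken::Word(std::mem::take(&mut current_word)));
    result.push(InputToken::Pipe);
    state = State::Blank;
}
(State::Unquoted, _) if ch.is_ascii_whitespace() => {
    result.push(InputToken::Word(std::mem::take(&mut current_word)));
    state = State::Blank;
}
(State::Unquoted, '\'') => state = State::SingleQuoted,
(State::Unquoted, '"') => state = State::DoubleQuoted,
(State::Unquoted, _) => current_word.push(ch),
(State::SingleQuoted, '\'') => state = State::Unquoted,
(State::SingleQuoted, _) => current_word.push(ch),
(State::DoubleQuoted, '"') => state = State::Unquoted,
(State::DoubleQuoted, _) => current_word.push(ch),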
After all characters have been read, if the state is Unquoted
, the current
word should be pushed onto the result vector. If the state is one of the
quoted states, return an error. Otherwise, return the vector of tokens.
For a complete example, see below.
Implement a main()
function in main.rs
that reads a line of input from
stdin
, removes the newline, calls split_input()
on it, and prints the
debug representation.
Write some unit tests in input.rs
to test your code. Here are two sample
unit tests to get you started.
#[cfg(test)]
mod test {
use super::*;
#[test]
fn simple_pipeline() {
let input = "echo hello | wc -c";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
InputToken::Word("echo".to_string()),
InputToken::Word("hello".to_string()),
InputToken::Pipe,
InputToken::Word("wc".to_string()),
InputToken::Word("-c".to_string()),
]
);
}
#[test]
fn mixed_quotes() {
// This is a "raw" string. It has delimiters r#" and "#.
// Note that we can use " inside of it.
let input = r#"'mixing different 'quoting" styles""#;
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
InputToken::Word("mixing different quoting styles".to_string()),
]
);
}
}
Remember that you can add additional unit tests by adding functions to the
test
module. Each function that you want to run as a test needs to be
annotated with #[test]
. Note the use of raw
strings
in the second test.
To run all unit tests, run $ cargo test
.
Example parsing
The table below shows how current_word
and result
change as a result of
parsing the input echo 'one two' three" four"| tr a-z A-Z
.
The first (body) row of the table indicates that the state machine starts in
the Blank
state and the first character is e
. This causes the e
to be
pushed into current_word
and result
remains empty. The state changed to
the Unquoted
state which you can see in the second row.
The second row indicates that the state machine is in the Unquoted
state
and the next letter is c
. This causes the c
to be pushed into
current_word
and result
remains empty. The state remains Unquoted
.
The final row of the table shows that after reading all of the input
characters, the state machine is in the Unquoted
state and the
current_word
from the row above has been pushed into the result
vector.
State | ch | current_word | result |
---|---|---|---|
Blank | 'e' | "e" | [] |
Unquoted | 'c' | "ec" | [] |
Unquoted | 'h' | "ech" | [] |
Unquoted | 'o' | "echo" | [] |
Unquoted | ' ' | "" | [Word("echo")] |
Blank | '\'' | "" | [Word("echo")] |
SingleQuoted | 'o' | "o" | [Word("echo")] |
SingleQuoted | 'n' | "on" | [Word("echo")] |
SingleQuoted | 'e' | "one" | [Word("echo")] |
SingleQuoted | ' ' | "one " | [Word("echo")] |
SingleQuoted | 't' | "one t" | [Word("echo")] |
SingleQuoted | 'w' | "one tw" | [Word("echo")] |
SingleQuoted | 'o' | "one two" | [Word("echo")] |
SingleQuoted | '\'' | "one two" | [Word("echo")] |
Unquoted | ' ' | "" | [Word("echo"), Word("one two")] |
Blank | 't' | "t" | [Word("echo"), Word("one two")] |
Unquoted | 'h' | "th" | [Word("echo"), Word("one two")] |
Unquoted | 'r' | "thr" | [Word("echo"), Word("one two")] |
Unquoted | 'e' | "thre" | [Word("echo"), Word("one two")] |
Unquoted | 'e' | "three" | [Word("echo"), Word("one two")] |
Unquoted | '"' | "three" | [Word("echo"), Word("one two")] |
DoubleQuoted | ' ' | "three " | [Word("echo"), Word("one two")] |
DoubleQuoted | 'f' | "three f" | [Word("echo"), Word("one two")] |
DoubleQuoted | 'o' | "three fo" | [Word("echo"), Word("one two")] |
DoubleQuoted | 'u' | "three fou" | [Word("echo"), Word("one two")] |
DoubleQuoted | 'r' | "three four" | [Word("echo"), Word("one two")] |
DoubleQuoted | '"' | "three four" | [Word("echo"), Word("one two")] |
Unquoted | ' ' | "" | [Word("echo"), Word("one two"), Word("three four")] |
Blank | '|' | "" | [Word("echo"), Word("one two"), Word("three four"), Pipe] |
Blank | ' ' | "" | [Word("echo"), Word("one two"), Word("three four"), Pipe] |
Blank | 't' | "t" | [Word("echo"), Word("one two"), Word("three four"), Pipe] |
Unquoted | 'r' | "tr" | [Word("echo"), Word("one two"), Word("three four"), Pipe] |
Unquoted | ' ' | "" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")] |
Blank | 'a' | "a" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")] |
Unquoted | '-' | "a-" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")] |
Unquoted | 'z' | "a-z" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")] |
Unquoted | ' ' | "" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")] |
Blank | 'A' | "A" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")] |
Unquoted | '-' | "A-" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")] |
Unquoted | 'Z' | "A-Z" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")] |
Unquoted | | "" | [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z"), Word("A-Z")] |
Once you finish the rest of the lab, this input should print out ONE TWO THREE FOUR.
Part 2. Building a pipeline (25 points)
Now that your split_input()
function takes a string and returns a vector of
InputTokens
, the second step is to build a pipeline. To do this, you’re
going to create a SimpleCommand
struct and a Pipeline
struct.
The Pipeline
struct will contain a vector of SimpleCommands
to run.
At a high level, the Pipeline
constructor,
#![allow(unused)]
fn main() {
    type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

    enum InputToken { Word(String), Pipe }

    struct Pipeline;

    impl Pipeline {
        pub fn new(tokens: &[InputToken]) -> Result<Self> {
            todo!()
        }
    }
}
takes a slice of InputToken
s and constructs a Pipeline
by splitting the
tokens on InputToken::Pipe
. Then, for each sequence of tokens between the
InputToken::Pipe
, new()
will construct a SimpleCommand
.
For example, the command cat Cargo.toml | wc -l
is split into the following tokens.
[Word("cat"), Word("Cargo.toml"), Pipe, Word("wc"), Word("-l")]
When these tokens are passed to Pipeline::new(tokens)
, it constructs the following Pipeline
.
Pipeline {
commands: [
SimpleCommand {
path: "cat",
args: [
"Cargo.toml",
],
},
SimpleCommand {
path: "wc",
args: [
"-l",
],
},
],
}
Your task
Create a new pipeline
module by creating a pipeline.rs
file and adding the
appropriate mod
line to main.rs
.
Create a SimpleCommand
struct that holds a path
(or command name) of the
program to execute and a vector of String
arguments. (In the next lab, we’ll
expand this struct.)
Next, create a public Pipeline
struct containing a vector of SimpleCommand
s.
Implement the Pipeline::new()
constructor shown above.
We can use the .split()
method on a slice to split it on InputToken::Pipe
. The result is an iterator over sub-slices.
for command_tokens in tokens.split(|token| token == &InputToken::Pipe) {
let mut words: Vec<String> = Vec::new();
for token in command_tokens {
todo!("Push each word into words");
}
if words.is_empty() {
return Err("Command missing".into());
}
todo!("Create a new SimpleCommand from the words");
}
words.remove(0)
will remove and return the first word from words
. You can use this to create the SimpleCommand
.
Update your main()
function to print the debug representation of the
Pipeline
that results from the input tokens you parsed in the previous part.
Create some unit tests to test the construction of a Pipeline
. Here’s one to
get you started using the example above.
#[cfg(test)]
mod test {
use super::*;
#[test]
fn simple_pipeline() {
let tokens = [
InputToken::Word("cat".to_string()),
InputToken::Word("Cargo.toml".to_string()),
InputToken::Pipe,
InputToken::Word("wc".to_string()),
InputToken::Word("-l".to_string()),
];
let pipeline = Pipeline::new(&tokens).expect("Failed to create a Pipeline");
assert_eq!(
pipeline,
Pipeline {
commands: vec![
SimpleCommand {
path: "cat".to_string(),
args: vec!["Cargo.toml".to_string()]
},
SimpleCommand {
path: "wc".to_string(),
args: vec!["-l".to_string()]
},
]
}
);
}
}
Part 3. Running a pipeline (25 points)
In the previous parts, you parsed input into tokens and then the tokens into a
representation of a pipeline. In this part, you’re going to implement a
run()
method which will create and run the processes in the pipeline.
To create a new process, you’re going to use Rust’s standard
std::process::Command
. It’s worth reading the documentation now.
#![allow(unused)]
fn main() {
    use std::process::{Command, Stdio};

    let arguments = vec!["-l", "/usr"];
    let child = Command::new("ls")
        .args(arguments)
        .stdin(Stdio::null())
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .spawn();

    match child {
        Ok(mut child) => {
            // Wait for the child to exit.
            child.wait().unwrap();
        }
        Err(err) => {
            eprintln!("Error: {err}");
        }
    }
}
(Click Run to see the output.)
There are several things to notice in this example. First, the style of code
with the Command
struct is called the “builder pattern” because Command
is
only used for configuring the command to run. Each of the methods you call on
a Command
returns a &mut Self, meaning you can chain together methods as
shown in the example. The final method, .spawn(), actually launches the new
process and returns a Child (wrapped in a Result, of course).
Second, you configure the arguments by calling .args()
and passing it a
vector of arguments. (Cleverly, it can handle arrays, vectors, and slices of
&str
or String
.)
Third, you configure the behavior of stdin, stdout, and stderr by passing the
corresponding methods an instance of Stdio
. You’ll see next lab how to
redirect those to/from files. For now, you’ll want to use Stdio::inherit()
to mean use the stdin/stdout/stderr of the current process, Stdio::piped()
to mean create a pipe (see the next example), and Stdio::null()
to mean
connect it to /dev/null
.
Fourth, the .spawn() method returns a Child. To wait for the child to
exit, use the .wait() method. The .wait() method returns a Result, but
the error condition shouldn’t happen since .spawn() just created the
process, so the example simply unwraps the result.
The previous example shows how to spawn one process. To spawn two processes
and have the second process’s stdin be the first process’s stdout, you’ll need
to (1) spawn the first process; (2) extract the ChildStdout
from the Child
returned by .spawn()
; (3) convert that to a Stdio
; (4) spawn the second
process using the newly converted Stdio
as stdin; and (5) wait for both
children.
#![allow(unused)]
fn main() {
    use std::process::{Command, Stdio};

    let mut children = Vec::new();

    // 1. Spawn the first child.
    let arguments = vec!["-l", "/usr"];
    let first_child = Command::new("ls")
        .args(arguments)
        .stdin(Stdio::inherit())
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit())
        .spawn();
    let first_child_stdout: Stdio = match first_child {
        Ok(mut child) => {
            let stdout = child.stdout
                .take() // 2. Extract the ChildStdout.
                .map_or(Stdio::null(), Stdio::from); // 3. Convert it to a Stdio.
            children.push(child);
            stdout
        }
        Err(err) => {
            eprintln!("Error: {err}");
            Stdio::null()
        }
    };

    // 4. Spawn the second child.
    let second_child = Command::new("wc")
        .stdin(first_child_stdout)
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .spawn();
    match second_child {
        Ok(mut child) => {
            children.push(child)
        }
        Err(err) => {
            eprintln!("Error: {err}");
        }
    }

    // 5. Wait for all of the children that were created successfully.
    for mut child in children {
        child.wait().unwrap();
    }
}
One tricky portion of this is
let stdout = child.stdout
.take() // 2. Extract the ChildStdout.
.map_or(Stdio::null(), Stdio::from); // 3. Convert it to a Stdio.
child.stdout is an Option<ChildStdout>. The .take() method on an Option
takes the value out of self, leaves None in its place, and returns whatever
was there as an Option. Here’s some example code showing this behavior. (Click Run.)
#![allow(unused)]
fn main() {
    let mut x: Option<i32> = Some(10);
    let taken_from_x = x.take();

    let mut y: Option<i32> = None;
    let taken_from_y = y.take();

    println!("{x:?} {taken_from_x:?}");
    println!("{y:?} {taken_from_y:?}");
}
The .map_or(default_value, function)
method for an Option
will return default_value
if the Option
is None
, otherwise, it unwraps the Option
and calls function
on the unwrapped value. In this case, if child.stdout
is None, stdout
will be set to Stdio::null()
and if child.stdout
is Some(child_stdout)
, then stdout
will be set to Stdio::from(child_stdout)
.
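If .map_or() is new to you, here is a small, self-contained example of the same pattern using plain integers (the values here are arbitrary).
fn main() {
    let some_value: Option<i32> = Some(4);
    let no_value: Option<i32> = None;

    // map_or(default, f) applies f to the value inside a Some,
    // and falls back to default for a None.
    println!("{}", some_value.map_or(0, |v| v * 10)); // prints 40
    println!("{}", no_value.map_or(0, |v| v * 10));   // prints 0
}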
Your task
Implement the run()
method.
#![allow(unused)]
fn main() {
    type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

    struct Pipeline;

    impl Pipeline {
        pub fn run(self) -> Result<()> {
            todo!()
        }
    }
}
Your code should spawn processes in a loop, connecting the stdout of each process to stdin of the next process. I suggest starting with something like
let mut last_stdout = Stdio::inherit();
and using it as the argument to .stdin()
for each process. After spawning a
process, extract the ChildStdout
and convert it to a Stdio
as shown in the
example and assign it to last_stdout
.
Each child that is successfully spawned should be added to a vector of
Child
. Any errors spawning a child should be printed to stderr, but the rest
of the pipeline should be spawned. After all children have attempted to be
spawned, you should wait for all of them to complete. See the example for
details.
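Putting those pieces together, here is one possible shape for run(), written as a standalone sketch rather than a drop-in solution. It assumes the path and args field names from Part 2 and hard-codes a two-command pipeline in main() only so the sketch compiles and runs on its own; your naming and error handling may differ.
use std::process::{Child, Command, Stdio};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

struct SimpleCommand {
    path: String,
    args: Vec<String>,
}

struct Pipeline {
    commands: Vec<SimpleCommand>,
}

impl Pipeline {
    pub fn run(self) -> Result<()> {
        let mut children: Vec<Child> = Vec::new();
        let mut last_stdout = Stdio::inherit();
        let num_commands = self.commands.len();

        for (i, command) in self.commands.into_iter().enumerate() {
            // The last command writes to the terminal; earlier commands write
            // into a pipe that the next command reads from.
            let stdout = if i + 1 == num_commands {
                Stdio::inherit()
            } else {
                Stdio::piped()
            };
            let child = Command::new(&command.path)
                .args(&command.args)
                .stdin(last_stdout)
                .stdout(stdout)
                .stderr(Stdio::inherit())
                .spawn();
            last_stdout = match child {
                Ok(mut child) => {
                    // Extract this child's stdout for the next command's stdin.
                    let stdout = child.stdout.take().map_or(Stdio::null(), Stdio::from);
                    children.push(child);
                    stdout
                }
                Err(err) => {
                    eprintln!("Error: {err}");
                    Stdio::null()
                }
            };
        }

        // Wait for every child that was spawned successfully.
        for mut child in children {
            child.wait().unwrap();
        }
        Ok(())
    }
}

fn main() -> Result<()> {
    // Equivalent to: ls -l /usr | wc -l
    let pipeline = Pipeline {
        commands: vec![
            SimpleCommand {
                path: "ls".to_string(),
                args: vec!["-l".to_string(), "/usr".to_string()],
            },
            SimpleCommand {
                path: "wc".to_string(),
                args: vec!["-l".to_string()],
            },
        ],
    };
    pipeline.run()
}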
Modify your main()
function to run the Pipeline
you’ve created.
Part 4. Making a shell (15 points)
Congratulations, you’ve completed the difficult part! All that remains is to turn your code into a proper shell.
Here’s a short sample session with my implementation.
Welcome to the Oberlin Shell!
$ whoami
steve
$ date
Wed Oct 18 13:51:00 EDT 2023
$ echo "I can run" 'commands with' multiple' 'arguments | tr a-z A-Z
I CAN RUN COMMANDS WITH MULTIPLE ARGUMENTS
$ ^D
To exit the shell, I pressed Ctrl-D (which it printed as ^D
). This causes stdin to be closed.
Your task
Modify your main()
function to print out a welcome message and then read
lines of input in a loop, parse their input, build a pipeline, and run the
pipeline. Any error should be printed out but the shell should not exit.
If the user enters a line containing only whitespace, the line should be ignored and the prompt printed again.
I found it helpful to separate out the logic of reading a line, parsing it,
and running it into a separate function which returned a Result<()>
and then
in main()
, I have a loop that calls the function and if it returns an error,
print the error out.
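Here is a minimal sketch of that structure; the helper name read_and_run_one_line() is just a placeholder for whatever you call yours, and its body is left as a todo!().
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn read_and_run_one_line() -> Result<()> {
    todo!("print the prompt, read a line, split it, build a Pipeline, and run it")
}

fn main() {
    println!("Welcome to the Oberlin Shell!");
    loop {
        // Keep going even if one line fails; just report the error.
        if let Err(err) = read_and_run_one_line() {
            eprintln!("{err}");
        }
    }
}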
You can determine if stdin is closed when you read from it because the read will
return Ok(0). Here’s some code that reads from stdin and exits the
program if stdin is closed.
let input_length = io::stdin().read_line(&mut line)?;
if input_length == 0 {
println!("");
std::process::exit(0);
}
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
├── input.rs
├── main.rs
└── pipeline.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Lab 9: Osh II
In this lab, you’re going to finish writing your shell. After this lab, your shell will
- support file redirection (e.g., $ echo foo >output.txt); and
- handle the interrupt (Ctrl-C) and quit (Ctrl-\) signals.
You will learn
- how to launch processes with stdin/stdout/stderr connected to files;
- how to open files with more complex open options; and
- how to install signal handlers and respond to signals.
Preliminaries
Continue working with your existing repository.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1. Parsing redirections (60 points)
In this part, you’re going to start implementing redirections. Implementing redirections has two parts: (1) parsing the input containing redirections; and (2) performing the redirections when the pipeline is run. In Part 2, you will implement performing the redirections. So let’s get started with parsing!
Recall that a simple command consists of words (the command and the options and arguments) and redirections interspersed. For example, the following commands (even the ridiculous-looking final one) are all equivalent.
$ wc -l <input.txt >output.txt
$ wc <input.txt -l >output.txt
$ wc <input.txt >output.txt -l
$ <input.txt >output.txt wc -l
All of them run wc -l with stdin connected to input.txt and stdout connected to output.txt.
You’re going to implement support for redirecting input and output to/from files in the following cases.
- n>path where n is a digit. This creates a file at path (truncating it if it exists) and connects file descriptor n to the created file.
- n>>path where n is a digit. This opens a file at path (creating it if necessary) in append mode and connects file descriptor n to the opened file.
- n<path where n is a digit. This opens the file at path (it’s an error if the file doesn’t exist) and connects file descriptor n to the opened file.
- >path. The same as 1>path.
- >>path. The same as 1>>path.
- <path. The same as 0<path.
Your task
There are three basic types of redirections: redirecting input, redirecting output, and redirecting output by appending.
First, add some new variants to your InputToken
enum to handle the three
redirection cases. Note that in each case, a redirection needs two pieces of
data: the file descriptor number (a u32
), and a path (a String
). For
example, one of my variants is
enum InputToken {
// ...
RedirectInput(u32, String),
// ...
}
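For reference, the unit tests later in this part construct tokens with three redirection variants shaped like the ones in this small, self-contained sketch (the names are suggestions; use whatever names you like).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum InputToken {
    Word(String),
    Pipe,
    RedirectInput(u32, String),  // n<path and <path
    RedirectOutput(u32, String), // n>path and >path
    RedirectAppend(u32, String), // n>>path and >>path
}

fn main() {
    // For example, 2>errors.txt would become:
    let token = InputToken::RedirectOutput(2, "errors.txt".to_string());
    println!("{token:?}");
}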
The difficult part of parsing the input is modifying the state machine to parse the new redirections.
Note that it isn’t sufficient to leave the current parsing mechanism in
split_tokens() alone and simply examine each of the Word tokens to see if
it has the right form. Consider echo '>/dev/null'
and echo >/dev/null
. If
you correctly implemented the input parsing in the previous lab, then the
output of parsing these should be identical, e.g., [Word("echo"), Word(">/dev/null")]
. The problem is these two commands do not behave the same
way. The first prints out >/dev/null
whereas the second connects echo
’s
stdout to /dev/null
and echo
is run with no arguments.
Recall that the state machine described in the hint in the previous lab has four states. To support redirections, you will need some more states.
There are many ways to implement this. Here’s one way.
Since redirections consist of an optional digit, a redirection operator, <
,
>
, or >>
, and a path, it would be nice to reuse the code you’ve already
written to parse Word
s to parse the paths. You can do this by maintaining
additional state about what type of redirection is currently being parsed.
#![allow(unused)]
fn main() {
    enum RedirectionState {
        None,
        Input(u32),
        Output(u32),
        Append(u32),
    }
}
The None variant is used when a normal word is being parsed. The u32 keeps track
of which file descriptor will be redirected.
In addition, you will need to be able to distinguish input 5>path
from 5x
.
Since the state machine looks at the input one character at a time, you’ll
need to modify State
to have an additional variant, Digit
which represents
the case where a digit was read in the Blank
state.
#![allow(unused)]
fn main() {
    enum State {
        Blank,
        Unquoted,
        SingleQuoted,
        DoubleQuoted,
        Digit(u32),
    }
}
Here’s an updated state machine.
What isn’t represented is how the RedirectionState
changes.
The intended behavior is as follows. In the Blank state, if the next input character is
a digit, transition to state Digit(num) where num is that digit’s value.
In the Digit(num)
state, if the next input character is <
or >
, set the
redirection state to RedirectionState::Input(num)
or
RedirectionState::Output(num)
and change to the Unquoted
state.
In the Digit(num) state, if the next input character is a quotation mark,
write num into the current word and transition to the
appropriate quoted state. This handles input like 3"foo".
Otherwise, in the Digit(num)
state, if the next input character is anything
else, write num
into the current word and transition to the Unquoted
state. This handles input like 4bar
.
The transitions from the Blank
and Unquoted
states need to be updated to
handle the redirection state.
In the Blank
state, if the next input character is <
or >
, the
redirection state should be set to RedirectionState::Input(0)
or
RedirectionState::Output(1)
and transition to the Unquoted
state.
In the Unquoted state, if the next input character is a |, then you need to
insert either a Word token or one of the Redirect* tokens into the
results vector, and then insert a Pipe token. Here’s some
sample code for this.
(State::Unquoted, '|') => {
let token = match redirection_state {
RedirectionState::None => InputToken::Word(current_word),
_ if current_word.is_empty() => return Err("Empty redirection target".into()),
RedirectionState::Input(num) => InputToken::RedirectInput(num, current_word),
RedirectionState::Output(num) => InputToken::RedirectOutput(num, current_word),
RedirectionState::Append(num) => InputToken::RedirectAppend(num, current_word),
};
result.push(token);
result.push(InputToken::Pipe);
current_word = String::new();
state = State::Blank;
redirection_state = RedirectionState::None;
}
Note that if the current word is empty and a redirection is being parsed,
this is an error. For example, the input 2>| is an error.
In the Unquoted state, if the next input character is whitespace, then this
behaves similarly to the | case except that you don’t push a Pipe token.
In the Unquoted state, if the next input character is a < or a >, then,
based on the redirection state, you need to (1) insert a new Word token,
(2) insert a new Redirect* token, or (3) change the redirection state from
RedirectionState::Output(num) to RedirectionState::Append(num).
The third option occurs when the redirection state is
RedirectionState::Output(num)
and the current word is empty. For example,
when parsing input 3>>foo
, the first >
changes the redirection state to
Output(3)
and the state to Unquoted
. The second >
changes the
redirection state to Append(3)
.
For example, the input >foo>>bar<baz
should be parsed as three separate
redirection tokens.
Finally, after all of the input has been consumed, based on the state and the
redirection state, you may need to insert either a Word
or a Redirect*
token. You will also need to update your code to handle the Digit
state.
Write unit tests. In fact, you should probably write your unit tests first. Here are a few to help you get started.
#[test]
fn redirections() {
use InputToken::*;
let input = ": >foo >>bar <baz";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
Word(":".to_string()),
RedirectOutput(1, "foo".to_string()),
RedirectAppend(1, "bar".to_string()),
RedirectInput(0, "baz".to_string()),
]
);
}
#[test]
fn redirections_nospace() {
use InputToken::*;
let input = ":>foo>>bar<baz";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
Word(":".to_string()),
RedirectOutput(1, "foo".to_string()),
RedirectAppend(1, "bar".to_string()),
RedirectInput(0, "baz".to_string()),
]
);
}
#[test]
fn digit_redirections() {
use InputToken::*;
let input = ": 1word 2>foo 3>>bar 4<baz 56>abc";
let tokens = split_input(input).unwrap();
assert_eq!(
tokens,
vec![
Word(":".to_string()),
Word("1word".to_string()),
RedirectOutput(2, "foo".to_string()),
RedirectAppend(3, "bar".to_string()),
RedirectInput(4, "baz".to_string()),
Word("56".to_string()),
RedirectOutput(1, "abc".to_string()),
]
);
}
#[test]
fn empty_redirection_targets() {
assert!(split_input("<").is_err());
assert!(split_input(">").is_err());
assert!(split_input(">>>foo").is_err());
assert!(split_input("><foo").is_err());
assert!(split_input("<>foo").is_err());
assert!(split_input(">><foo").is_err());
}
The good news is this was the hard part! The rest is simpler. (Just for fun, check out the state machine definition for HTML. It has 80 states. And that’s just the first of two state machines used to parse HTML. The second has about 25 states but is even more complex.)
Part 2. Processing redirections (25 points)
Now that you have a split_tokens()
function that can parse redirections,
it’s time to actually implement them! Fortunately, this is much easier than
parsing the input.
Although your implementation from the previous part supported file descriptors 0-9, for this part, you only need to support file descriptors 0, 1, and 2.
In the last lab, implementing the pipeline consisted of two parts, (1)
splitting the slice of InputToken
s on Pipe
tokens and constructing a
vector of SimpleCommand
s in Pipeline::new()
; and (2) spawning the
processes in Pipeline::run()
.
Your task
You will need to update the constructor to handle the new InputToken
variants and update the .run()
method to perform redirections.
First, update SimpleCommand
to hold a vector of redirections. It’s up to you
how you want to model a redirection. I used a new enum with two variants, one
for input and one for output. Both variants have fields for the file
descriptor number and the path. The output variant also has a bool
field
which is true
when appending.
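For example, a sketch of one such type might look like this; the name Redirection and the field names are illustrative, not required.
#[allow(dead_code)]
#[derive(Debug)]
enum Redirection {
    Input { fd: u32, path: String },
    Output { fd: u32, path: String, append: bool },
}

fn main() {
    // 2>>errors.log might be modeled as:
    let r = Redirection::Output {
        fd: 2,
        path: "errors.log".to_string(),
        append: true,
    };
    println!("{r:?}");
}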
What you should not do is have a vector of InputTokens for the
redirections. It wouldn’t make sense to have a redirection of
InputToken::Word or InputToken::Pipe, after all. Make some new type.
In the Pipeline::new()
constructor, insert the appropriate redirections into
each SimpleCommand
’s redirections.
In the .run()
method, to process the redirections, for each command, you
need to construct a Stdio
object for stdin, stdout, and stderr. You did part
of this before when hooking up stdout of one command to stdin of the next
command.
Initially, the Stdio
for stdin should be the last stdout, the Stdio
for
stdout should be either Stdio::piped()
or Stdio::inherit()
and the Stdio
for stderr should be Stdio::inherit()
. This is what you had from the last
lab but with an explicit Stdio
for each of stdin/stdout/stderr.
Then, for each redirection in the SimpleCommand
, you want to open/create a
file. The File
object can be converted to a Stdio
using .into()
. This can be assigned to variables for stdin/stdout/stderr using code like
match num {
0 => stdin = file.into(),
1 => stdout = file.into(),
2 => stderr = file.into(),
_ => return Err(format!("Redirecting {num} not supported").into()),
}
Finally, when creating the Command
builder, you want to call .stdin()
,
.stdout()
, and .stderr()
and pass the three Stdio
s.
Remember that we can open a file for input using
let file = File::open(path).map_err(|err| format!("{path}: {err}"))?;
For output redirections, we need to create a file if it doesn’t exist and truncate it if it does. For append redirections, we need to create the file if it doesn’t exist and open it in append mode.
In append mode, all writes to the file are appended atomically. This means if multiple processes are trying to append to the file at the same time, you won’t get some bytes from one process and some bytes from the other process intermixed. A complete line of output will be written at a time.
To open files with different options, we need to use the
OpenOptions
builder. Read the documentation for OpenOptions
. Pay particular attention
to the methods .write()
, .truncate()
, and .append()
. You can handle
output and append redirections in a similar manner by passing different values
to .truncate()
and .append()
.
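Here is a hedged sketch of what opening the target of an output or append redirection might look like with OpenOptions; the function name and error handling are just one possibility, not something the lab requires.
use std::fs::{File, OpenOptions};

/// Open `path` for an output (>) or append (>>) redirection.
/// `append` is true for >> redirections.
fn open_for_output(path: &str, append: bool) -> Result<File, String> {
    OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(!append) // > truncates an existing file
        .append(append)    // >> appends instead
        .open(path)
        .map_err(|err| format!("{path}: {err}"))
}

fn main() {
    // For example, >>log.txt opens log.txt in append mode, creating it if needed.
    match open_for_output("log.txt", true) {
        Ok(_file) => println!("opened log.txt for appending"),
        Err(err) => eprintln!("{err}"),
    }
}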
Part 3. Handling signals (30 points)
When the user types a Ctrl-C, the kernel sends a
signal to every process in the foreground process group of the controlling
terminal (review this info box for details
on process groups and controlling terminals). By default, this signal—its
name is SIGINT
and it is the interrupt signal—causes the process that
receives it to terminate.
There are other signals that can be sent in response to keys being pressed.
SIGINT
is by far the most common signal sent by a key press, but
Ctrl-\ sends SIGQUIT
, the quit signal. (Go ahead, try
running a program like $ sleep 10
that is long running and kill it by typing
Ctrl-C or Ctrl-\.) Note this is
different from Ctrl-D which closes stdin but doesn’t
terminate processes.
A process can choose to handle a signal by registering a signal handler function which will be called in the event of a signal. Or, a process can choose to ignore a signal if it doesn’t want the default action—often termination—to occur.
Shells need to deal with SIGINT
and SIGQUIT
to avoid being terminated.
When Bash receives a SIGINT
, it stops reading its current line of input,
prints a newline and a prompt and waits for the next line of input. When Bash
receives a SIGQUIT
, it ignores it.
A process can choose to handle or ignore a signal using the
sigaction(2) system call (by calling the libc wrapper function as
usual). The first argument is the signal number. The second argument is a
pointer to a sigaction
structure (yes, the struct and the function have the
same name) that controls the action that should be taken when that signal is
received. The third argument is a pointer to a sigaction
struct that will be
filled out by sigaction()
with the previous action.
These are the Rust definitions for the libc::sigaction() function and the libc::sigaction struct.
To ignore a particular signal, you create a libc::sigaction
struct with its
sa_sigaction
field set to libc::SIG_IGN
and pass it to
libc::sigaction()
.
When this example program is run, SIGINT
is ignored and then the process
goes to sleep for 10 seconds. During this time, pressing
Ctrl-C has no effect.
use std::io;
fn main() -> io::Result<()> {
unsafe {
let action = libc::sigaction {
sa_sigaction: libc::SIG_IGN,
..std::mem::zeroed()
};
if libc::sigaction(libc::SIGINT, &action, std::ptr::null_mut()) < 0 {
return Err(io::Error::last_os_error());
}
}
std::thread::sleep(std::time::Duration::from_secs(10));
Ok(())
}
One thing to notice is that action was constructed using .. to fill in the rest of the fields other than sa_sigaction. See the Book for details about ..
. This is because different OSes use different structures with different members. We only need to use the sa_sigaction
field which appears in all of the different sigaction
structs. Essentially, doing it this way makes the code slightly more portable. For example, it works on both macOS and Linux.
The std::mem::zeroed()
is creating a libc::sigaction
struct where all of the fields have been set to 0. In general, doing this is unsafe, hence the reason it is in the unsafe
block. It’s safe in this particular case because all of the fields may be safely set to 0.
To install a signal handler, you set the sa_sigaction
field to the address
of a function.
When a signal is received for which a signal handler has been registered, the kernel will run the handler. But the question is when does it run it? The answer is it’s a little unpredictable. It will be run the next time the kernel returns control to the process. This can happen as the result of returning from a system call or simply because the kernel has scheduled the process as the next process to run.
Since signal handler functions can be run at any time, the actions a signal
handler can take are extremely limited. There are a very small number of
functions that are safe to call from a signal handler. In particular, a
signal handler is not allowed to allocate memory or perform I/O using
things like println!(). Instead, about all a signal handler should do is set
a flag which is checked outside the signal handler to see if the signal
occurred.
Setting the flag and checking if the flag has been set must happen
atomically, meaning without being interrupted. You will be unsurprised to
learn that Rust has great support for atomic variables. For the flag, you want
to use an AtomicBool. This holds either true or false, but we assign a value
using the .store() method and read the value using the .load() method.
The .swap() method atomically loads the old value and stores a new one.
This example shows how to register a signal handler for SIGINT
.
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};
static INTERRUPTED: AtomicBool = AtomicBool::new(false);
extern "C" fn handler(_sig: libc::c_int) {
INTERRUPTED.store(true, Ordering::Relaxed);
}
fn main() -> io::Result<()> {
unsafe {
let action = libc::sigaction {
sa_sigaction: handler as libc::sighandler_t,
..std::mem::zeroed()
};
if libc::sigaction(libc::SIGINT, &action, std::ptr::null_mut()) < 0 {
return Err(io::Error::last_os_error());
}
}
std::thread::sleep(std::time::Duration::from_secs(10));
if INTERRUPTED.load(Ordering::Relaxed) {
println!("Received a SIGINT");
}
Ok(())
}
Some key points to notice: The handler
function is declared as extern "C"
because the handler function is expected to be a C function. INTERRUPTED
is our first example of a global variable in Rust. The handler
function is
incredibly simple: All it does is set a flag.
If you run the code in that example and press Ctrl-C
during the sleep, you’ll notice something disconcerting: The process sleeps
for the full 10 seconds and then afterward it prints out Received a SIGINT
.
The whole point of the interrupt signal is to interrupt a process so what is
going on here?
Some system calls are allowed to be interrupted by signals. When this happens,
the C wrapper around the underlying system call sets the thread-local variable
errno
to EINTR
and returns -1
. If you examine the source code for the
sleep()
function, you’ll see it calls the underlying
libc::nanosleep()
function in a loop if it is awakened by a signal. (Note
the if
on line 242 checking if the return value of libc::nanosleep()
is
-1
and then asserting that the value of os::errno()
is libc::EINTR
.)
If we replace the call to sleep()
with a call to libc::nanosleep()
, we can
see the difference.
In this example, the call to nanosleep()
is explicit and since it is not called
in a loop, interrupting via SIGINT
works as expected.
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};
static INTERRUPTED: AtomicBool = AtomicBool::new(false);
extern "C" fn handler(_sig: libc::c_int) {
INTERRUPTED.store(true, Ordering::Relaxed);
}
fn main() -> io::Result<()> {
unsafe {
let action = libc::sigaction {
sa_sigaction: handler as libc::sighandler_t,
..std::mem::zeroed()
};
if libc::sigaction(libc::SIGINT, &action, std::ptr::null_mut()) < 0 {
return Err(io::Error::last_os_error());
}
}
let ts = libc::timespec {
tv_sec: 10,
tv_nsec: 0,
};
if unsafe { libc::nanosleep(&ts, std::ptr::null_mut()) } < 0 {
let err = io::Error::last_os_error();
if err.kind() != io::ErrorKind::Interrupted {
return Err(err);
}
}
if INTERRUPTED.load(Ordering::Relaxed) {
println!("Received a SIGINT");
}
Ok(())
}
Note how an io::Error is constructed from errno by calling io::Error::last_os_error() and then returned if its kind isn’t io::ErrorKind::Interrupted.
Many Rust standard library functions, such as the read_line() method, behave
the same way: if a signal interrupts a system call, the
standard library function makes the call again. Usually, that’s the
behavior you want. For a shell, it isn’t. And when it isn’t, you have to
implement the functionality you want yourself, which sometimes means calling the
underlying libc functions.
Your task
Osh should work slightly differently than Bash. In particular, if Osh receives
either a SIGINT
or SIGQUIT
signal, it should ignore the current line of
input that is being typed, print a newline, and then prompt for the next line
of input.
Create a new module named signals
. Inside it write a function
install_signal_handlers() -> io::Result<()>
that sets the signal handler for
both signals to call a function named handler
. Make sure any errors from
libc::sigaction()
are returned. Write the handler
function
extern "C" fn handler(_sig: libc::c_int) {
todo!()
}
that stores true
in an AtomicBool
following the examples above.
Write a was_interrupted() -> bool
function that atomically loads the value
of the AtomicBool
and stores false
. Use the .swap()
method for this
purpose. Return the loaded value. In this way, was_interrupted()
returns
true
if a SIGINT
was received since the last time was_interrupted()
was
called.
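Putting the pieces from the examples above together, the contents of signals.rs might look roughly like this sketch; it simply combines the earlier sigaction example with an AtomicBool and a loop over the two signals.
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

static INTERRUPTED: AtomicBool = AtomicBool::new(false);

extern "C" fn handler(_sig: libc::c_int) {
    // Signal handlers should do as little as possible: just set the flag.
    INTERRUPTED.store(true, Ordering::Relaxed);
}

pub fn install_signal_handlers() -> io::Result<()> {
    unsafe {
        let action = libc::sigaction {
            sa_sigaction: handler as libc::sighandler_t,
            ..std::mem::zeroed()
        };
        for signal in [libc::SIGINT, libc::SIGQUIT] {
            if libc::sigaction(signal, &action, std::ptr::null_mut()) < 0 {
                return Err(io::Error::last_os_error());
            }
        }
    }
    Ok(())
}

/// Returns true if a SIGINT or SIGQUIT arrived since the last call.
pub fn was_interrupted() -> bool {
    INTERRUPTED.swap(false, Ordering::Relaxed)
}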
Add a call to install_signal_handlers()
to the beginning of main()
.
As mentioned above, the .read_line() method explicitly retries reads that are
interrupted by a signal. Since Osh needs a way to read a line of input from
stdin but also return immediately when a SIGINT or SIGQUIT is received,
you cannot use the standard library methods.
Fortunately, it’s easy to look at the Rust source code and modify it to suit
your needs. So I did. Here’s a function you can use which reads from stdin and
returns a io::Result<String>
.
#![allow(unused)]
fn main() {
    /// This is similar to std::io::stdin().read_line() except that
    /// being interrupted by a signal returns an Err(err).
    /// This implementation comes from modifying
    /// https://doc.rust-lang.org/src/std/io/mod.rs.html#1940
    fn read_line() -> std::io::Result<String> {
        use std::io::BufRead;

        let mut stdin = std::io::stdin().lock();
        let mut buf = Vec::new();
        loop {
            let (done, used) = {
                let available = stdin.fill_buf()?;
                match available.iter().position(|&b| b == b'\n') {
                    Some(i) => {
                        buf.extend_from_slice(&available[..=i]);
                        (true, i + 1)
                    }
                    None => {
                        buf.extend_from_slice(available);
                        (false, available.len())
                    }
                }
            };
            stdin.consume(used);
            if done || used == 0 {
                let s = String::from_utf8(buf)
                    .map_err(|err| std::io::Error::new(std::io::ErrorKind::InvalidData, err))?;
                return Ok(s);
            }
        }
    }
}
The key point is that if stdin.fill_buf()
returns an error because it was
interrupted by a signal, then read_line()
will return that error. You can
check if the returned error was due to being interrupted by a signal by using
err.kind() == std::io::ErrorKind::Interrupted
.
In your function that prints a prompt and reads from stdin, replace the read with
a loop that calls signals::was_interrupted() and, if a signal has been received,
prints a newline. Then print the prompt and call the read_line()
function above.
If read_line()
returns an error and the error’s kind is Interrupted
, then
continue the loop. If it returns some other error, return it. Otherwise break
out of the loop and process the line as before.
Here’s some sample output.
$ cargo run --quiet
Welcome to the Oberlin Shell!
$ this line is interrupted^C
$ this one is quit^\
$ echo this one is not
this one is not
$ sleep 10
^C
$
If your shell can now handle redirections and signals, you’re done! Congratulations, this was a lot of work.
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
├── input.rs
├── main.rs
├── pipeline.rs
└── signals.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it).
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Lab 10: Threads
In this lab, you’re going to take a single threaded program and turn it into a multithreaded program by splitting up the computation into multiple threads.
You will learn
- how to spawn threads; and
- how to return values from threads.
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.
Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.
Be sure to ask any questions on Ed.
Compiler warnings and errors
Make sure your code compiles and passes clippy
without errors or warnings.
That is, running
$ cargo clean
$ cargo build
$ cargo clippy
should build your program without errors or warnings.
Formatting
Your code must be formatted by running cargo fmt
.
Part 1. Hello threads! (10 points)
In this part, you’re going to write a very short program that will spawn several threads.
There are multiple ways to spawn threads in Rust. We’re going to use the thread::scope()
function. Click the link to read its documentation, specifically the example. Pay particular attention to how the scope()
function takes a closure as an argument. That closure has an argument s
which is used to spawn new threads by calling s.spawn()
. The spawn()
method takes a (zero-argument) closure as an argument.
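To make that concrete, here is a small, self-contained example in the same spirit as the one in the documentation; the loop bound of 3 and the message are arbitrary.
use std::thread;

fn main() {
    let message = "hello from a scoped thread";
    thread::scope(|s| {
        for i in 0..3 {
            // Each call to s.spawn() starts a new thread running the closure.
            s.spawn(move || {
                println!("thread {i}: {message}");
            });
        }
        // Every thread spawned in the scope is joined automatically here.
    });
}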
Your task
Modify the starter code in the hello
directory to spawn n
threads in a loop where n
is the value returned by thread::available_parallelism()
.
The value returned by thread::available_parallelism()?
is a
NonZeroUsize
and not a normal usize
. To turn it into one, you can write your loop like
for thread_num in 0..n.into() { }
Each of your threads should print hello from its thread number. Here’s an example run of the program. Your output will likely look different since you may have a different number of threads and they will run in some unpredictable order.
$ cargo run --quiet
This program should probably only use 10 threads
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 5
Hello from thread 6
Hello from thread 3
Hello from thread 4
Hello from thread 8
Hello from thread 9
Hello from thread 7
Part 2. Parallelizing jfrac (20 points)
In this part, you’re going to modify a version of the jfrac
code from lab 2 that produces animated gifs of the Julia set fractal. Specifically, you’re going to turn it into a multi-threaded program where each thread will generate some of the frames of the animated gif.
The provided code in the jfrac
directory can be run with several new options. Here’s the output from --help
.
Usage: jfrac [OPTIONS] <OUTPUT>
Arguments:
<OUTPUT> Output file path
Options:
-s, --size <SIZE> Sets the width and height of the image [default: 800]
-c, --constant <CONSTANT> Sets the constant `c` in the equation `f(z) = z^2 + c` [default: "-0.8 + 0.156i"]
-i, --initial-zoom <INITIAL_ZOOM> Sets the initial zoom [default: 1.0]
-f, --final-zoom <FINAL_ZOOM> Sets the final zoom [default: 20.0]
-n, --frames <FRAMES> Sets the number of frames in the gif [default: 100]
-t, --threads <THREADS> Sets the number of threads to use [default: 1]
-h, --help Print help
If you run the code with the default options, it will take a fairly long time to run, nearly 5 minutes for me. We can time it using the time
shell command.
$ time cargo run -- output.gif
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/jfrac output.gif`
cargo run -- output.gif 293.09s user 0.33s system 99% cpu 4:55.70 total
We can speed it up substantially by compiling our code using compiler optimizations. To do that, pass the --release
argument to cargo build
and cargo run
. After $ cargo build --release
, running the code still takes about 25 seconds for me.
$ time cargo run --quiet --release -- output.gif
cargo run --quiet --release -- output.gif 25.51s user 0.08s system 99% cpu 25.827 total
You should run this yourself and look at the produced output.gif
to get a sense of what the program is doing.
We can do better by making use of threads.
While testing your code, you may wish to change the number of frames of animation using the --frames
command line argument.
Your task
Your task is to finish the program by handling the case where the number of
threads the user selects via the --threads
option is greater than 1. (There
is currently a todo!()
marking where your code needs to go.)
Let’s look at the main loop in the program.
for frame_num in 0..config.frames {
let zoom = config.initial_zoom
* zoom_factor.powf(frame_num as f32 / (config.frames - 1) as f32);
let frame = make_frame(&config, zoom);
encoder.encode_frame(frame)?;
}
The two key pieces here are creating a frame of animation by calling make_frame(&config, zoom)
which creates the fractal at the specified level of zoom and encoding that frame using the Gif encoder.
You will need to write similar code for the case where the number of threads is greater than 1.
When parallelizing code in general, you’ll need to identify which portions of the task can be handled concurrently and which must be handled serially. In this case, this is easy: Each frame of animation can be created independently of the other frames but encoding the gif must happen serially since we need the frames to appear in order.
This suggests the following steps.
- In a loop, create config.threads number of threads, each of which is responsible for creating a Vec<Frame> containing (approximately) config.frames / config.threads frames of animation.
- In a second loop, wait for the threads to finish by using join() (see below) and encode the frames returned by each thread in order.
Both steps will need to occur inside the body of the closure that we pass to thread::scope()
.
thread::scope(|s| -> Result<()> {
// Step 1.
// Step 2.
Ok(())
})?;
Notice that this code contains an explicit return type on the closure. The
Rust compiler can often deduce return types, but not always. Since step 2
involves calling encoder.encode_frame()
which can return a Result
, we can
propagate errors by using ?
if we mark the closure as returning a Result
.
For step 1, you’ll need to figure out what range of frames to create for each thread. One approach is to have a loop like this.
for thread_num in 0..config.threads {
let start_frame = thread_num * config.frames / config.threads;
let end_frame = (thread_num + 1) * config.frames / config.threads;
s.spawn(move || {
todo!("Create frames in the range start_frame..end_frame");
});
}
This ensures all frames will be generated and each thread will have approximately the same number of frames to generate. In fact, the number of frames per thread will differ by at most 1 using this approach. For example, generating 100 frames using 8 threads with this approach gives the following frame ranges per thread.
Thread 0: 0..12
Thread 1: 12..25
Thread 2: 25..37
Thread 3: 37..50
Thread 4: 50..62
Thread 5: 62..75
Thread 6: 75..87
Thread 7: 87..100
Note the use of the move
keyword when spawning threads. This causes values
from outside the closure that are referenced inside the closure to be moved
(or copied) into the closure. E.g., start_frame
and end_frame
will be
copied into the closure. Without the move
, the compiler will give an error
about lifetimes. (See the note at the end of this page when you get an error about using the moved value of config
.)
For step 2, you’ll need to wait for each thread to finish and return the
frames. To support waiting for threads to finish and for getting their return
values, the s.spawn()
method returns a thread handle. You should insert
that handle into a vector of handles for use in step 2.
thread::scope(|s| -> Result<()> {
let mut handles = Vec::with_capacity(config.threads);
// Step 1.
for thread_num in 0..config.threads {
// ...
let handle = s.spawn(move || { /* ... */ });
handles.push(handle);
}
// Step 2.
for handle in handles {
// ...
}
Ok(())
})?;
In the loop for step 2, you can use let frames = handle.join().unwrap()
to
wait for the thread to finish and return the vector of frames. You can then
encode the frames.
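For example, the step 2 loop might end up looking something like this fragment, assuming each thread's closure returns the Vec<Frame> it generated and that handles and encoder are as in the skeleton above.
// Step 2: join the threads in order and encode their frames.
for handle in handles {
    let frames = handle.join().unwrap();
    for frame in frames {
        encoder.encode_frame(frame)?;
    }
}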
There’s one final little detail you’ll need to handle: The move
keyword you needed to use when creating the threads tried to move the config
value into the thread. You actually only want a shared reference to config
since each thread needs to use it.
You can force Rust to use a reference by shadowing the config
variable with a reference to config
. You’ll want something like this.
thread::scope(|s| -> Result<()> {
let mut handles = Vec::with_capacity(config.threads);
let config = &config;
// Step 1.
// Step 2.
Ok(())
})?;
With the let config = &config
, the move || { }
closure will make a copy of
the reference for use in each thread rather than trying to move the underlying
Config
structure itself into the thread.
Part 3. Timing (15 points)
In this part, you’ll run your code several times with different numbers of threads to determine how long it takes to generate the image.
Your task
Time how long it takes to generate the image with several different numbers of
threads using the --threads
option. Remember to use cargo build --release
and time cargo run --release
to use the optimized release build rather than
the unoptimized debug build. Put the results of your timing experiment in your
README
.
Answer the following questions in the README
.
- Using --threads 2, you may have noticed that the program runs much faster than when run with only a single thread, but it takes more than half the time of a single thread. Why is that?
- At what point does increasing the number of threads stop speeding up the program?
- How does the number you found for question 2 compare with the available parallelism reported by the hello program you wrote in Part 1?
At this point, you’re all done and should submit your work. Feel free to play around with the other options to jfrac
like --constant, --initial-zoom, --final-zoom, and --frames
to produce other interesting images.
Submitting (5 points)
To submit your lab, you must commit and push to GitHub before the deadline.
Your repository should contain the following files and directory.
repo
├── README
├── hello
│ ├── Cargo.lock
│ ├── Cargo.toml
│ └── src
│ └── main.rs
└── jfrac
├── Cargo.lock
├── Cargo.toml
└── src
├── colormap.rs
└── main.rs
Any additional files you have added to your repository should be removed from
the main
branch. (You’re free to make other branches, if you desire, but
make sure main
contains the version of the code you want graded.)
The README
should contain the names of both partners (or just your name if
you worked alone…but please don’t work alone if you can manage it) in
addition to the answers to the questions in Part 3.
After you have everything working, make sure you add your code, commit it, and push it to GitHub.
Final Group Project
Overview
The project consists of a proposal, a status update, implementation, final report, and presentation.
The specific project is up to each group. However, it must involve a significant amount of effort—more than one or two people could do. All partners are expected to contribute to the implementation, the write-ups, and the presentation. Division of labor within each part is expected and good!
Any group member who does not participate in every part of the project will fail the course. No exceptions without prior approval from your Professor.
The GitHub Classroom link for the final project is here. Do not create a team or accept the assignment until after you have formed a group.
Requirements
The programming portion of the project must be (mostly) written in Rust and must use one or more crates from crates.io. Otherwise, the specific project you choose is up to you.
Collaboration and coordination between group members must happen on GitHub. This means regular commits from all group members are expected. I strongly recommend you learn about pull requests and use them along with code review of each commit before it gets pushed to master. Using GitHub issues to track bugs that need to be fixed or features that need to be implemented is a great idea and is encouraged.
Project suggestions
Some project suggestions include
- Program a microcontroller (like an Arduino) to use some sensors and lights/actuators to do something (you can test with a simulator)
- Implement a game using a game engine.
- Use the Rust bindings for OpenCV to do something with computer vision
- Implement some machine learning algorithms (like k-means or k-nearest neighbors) and run them on some interesting datasets and do some visualization or use existing implementations of ML algorithms
- Build a website in Rust via WebAssembly.
- Make some interactive art (audio and video)
Previous projects have included games, scenes in virtual reality, and programs communicating over a network.
Project proposal
A written proposal (750-word maximum) is the first step.
You should provide the following information
- What are you doing?
- How are you doing it?
- What do you need to learn to do it?
- A proposed schedule with milestones (the status update will discuss the milestones so you might want to track these on GitHub.)
You should provide this information in narrative form (i.e., don’t just submit a list of questions and answers). You should use this as an opportunity to practice technical writing.
The proposal should be added to the README.md file in your GitHub repository.
Status update
A maximum 750-word status update is due several weeks into the project. The report should summarize the project, including any changes in direction or scope from the proposal and why those changes were made.
The update should also provide a revised schedule with milestones. If you’re on-track, you should say so. If you’re not, explain how your revised schedule differs from the original schedule.
It’s completely fine to drop features if you find that you were overly ambitious in the proposal. Just explain how doing so will let you complete the project successfully.
The status update will be added to your README.md in your GitHub repository.
Final report
A maximum 1500–2000-word final report is due before presentations begin. This should include a standalone description of your project. You should describe what you accomplished and what you weren’t able to accomplish. Explain what the most challenging aspect of the project was. What was the most fun part?
Include anything else you think I should know.
The final report should be uploaded as a PDF to your GitHub repository.
Presentation
During the scheduled final exam time, you will give a seven minute presentation on the project. This includes five minutes of talking (among the four of you) and two minutes of answering audience questions. Every group member must speak.
If you’re able to do so, I strongly encourage you to give a demonstration of your project.
You should tell the class who you are, what you did, and how you did it. You can tell us what didn’t work or was almost working if you would like. If you do a demonstration, you should show off some cool features.
Public speaking is very hard. This is a low-stakes way to get some practice at it in a supportive environment.