Preface

This website contains labs for CSCI 241. They are very much a work-in-progress. There are definitely errors, typos, misspellings, confusing portions, and probably things that are just plane wrong (like that use of “plane”). If you encounter anything confusing, please don’t hesitate to ask.

All corrections, big or small, are most welcome!

At the top of this page, you’ll find three icons. The left shows or hides the table of contents. The middle one lets you pick a theme for this book (go ahead and pick one you like). The right one lets you search the labs.

Once we get to the labs where you’ll be programming in Rust, there are a bunch of code samples. Many of these (but not all of them) are executable.

Example

Here’s an example of runnable code.

fn main() {
    println!("Welcome to CSCI 241!");
}

Mouse over the block of code above and click the Run button to see the output.

The code you write in these labs is designed to run on the lab machines. There is also a virtual machine you may connect to remotely if needed. See the remote coding instructions for details.

Throughout the labs, you’ll find a number of colorful blocks like the example above. Here are a few you’ll encounter in the labs.

Example

Example boxes contain important examples, usually of code that’s similar to the code you will be writing.

Tip

Tip or hint boxes contain important information that you’re almost certain to want to use in your implementations.

Bug

Bug boxes contain an example showing a common error.

Note

There are a variety of other boxes, including note, info, and caution. At this point, they have no fixed meaning. As the labs are improved, this may change.

Remote coding

All of the labs can be completed using the lab computers in King 137.

Additionally, there is a virtual machine mcnulty (mcnulty.cs.oberlin.edu) that you can connect to remotely. The instructions for connecting depend on whether you are connecting from on campus or off campus.

In a terminal, you can ssh to mcnulty.cs.oberlin.edu. This gives you a shell on the virtual machine. You will have the same home directory as the lab machines. Any changes you make in the VM will be reflected in the lab machines, and vice versa.

You can also use Visual Studio Code to connect to mcnulty. This also uses SSH and lets you program remotely using the standard VS Code interface. I find it pretty convenient. You can also access the shell on the remote machine using VS Code’s Terminal window as usual.

On campus

From on campus, you can ssh to mcnulty by entering the following command into a terminal. (The $ indicates the prompt; don’t type it yourself.)

$ ssh username@mcnulty.cs.oberlin.edu

where username is the username you use to log into the lab computers.

Off campus

From off campus, the mcnulty VM is not reachable. You need to tell SSH to “proxy jump” through occs.cs.oberlin.edu. Fortunately, this isn’t much more difficult. You just need to pass an option to ssh:

$ ssh -J username@occs.cs.oberlin.edu username@mcnulty.cs.oberlin.edu
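
If you’d rather not type the -J option every time, you could instead put an equivalent entry in your ~/.ssh/config file (replace username with your own username); after that, $ ssh mcnulty does the same thing from off campus.

Host mcnulty
    HostName mcnulty.cs.oberlin.edu
    User username
    ProxyJump username@occs.cs.oberlin.edu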

Visual Studio Code

To connect to mcnulty using VS Code, open a new window (File > New Window) and, in the welcome screen of the new window, click the Connect to… link. Select the option Connect to Host… Remote-SSH.

The first time you do this, you’ll have to enter the SSH connection details by clicking + Add new SSH Host….

Then you need to enter the full ssh command (either the on-campus command or the off-campus command from above).

VS Code will then ask which ssh_config file to update and gives several options. On macOS and Linux, you want the one in your home directory that’s in .ssh/config.

Once you’ve done this, click on the Connect to… link and then Connect to Host… Remote-SSH again. Select the host you just configured.

At this point it will ask you to enter the password you use to log in to the lab machines. If you are off campus, you’ll have to enter the password twice: once for occs and once for mcnulty.

The documentation for using SSH with VS Code is available here.

Tip

The Visual Studio Code extensions you have installed locally, such as rust-analyzer, may not be installed on the remote machine. If this is the case, then after connecting to the remote machine, you can install the extensions as normal (click on the 4 boxes icon, select the extension you want, and then click Install in SSH: mcnulty).

Lab 0. Getting Started

Welcome to the 0th lab in CS 241! In this short lab, you’re going to

  • Log in to a lab machine;
  • Install Visual Studio Code extensions;
  • Install Visual Studio Code extensions remotely; and
  • Debug a short Bash program.

This lab is to be completed individually.

Setup

To get started, first log into the lab machine.

Important

If the lab machine is currently running Windows, you’ll need to restart it. During the boot process, a text screen with different operating systems will pop up for 5 seconds. When this pops up, use the keyboard to select the Ubuntu option (which will probably be the second in the list, just under Windows).

Once you have logged in, open the Visual Studio Code application. To do so, click the Show Applications button in the bottom left corner of the desktop, and then open Visual Studio Code.

Move to Part 1 to install the extensions.

Part 1. Installing extensions

In this part, you’re going to install four extensions,

  1. Remote - SSH — this lets you connect to remote servers using Secure SHell (SSH);
  2. ShellCheck — this checks your shell scripts for common errors;
  3. rust-analyzer — this greatly eases the task of programming in Rust; and
  4. CodeLLDB — this is the debugger we’ll use to debug Rust programs.

Your task

Click on the 4-boxes icon on the left side of the Visual Studio Code window. This opens the Extensions Marketplace. In the textbox at the top left, enter the name of the extension you want to install. Click on the matching extension below the text box and then click the Install button.

After installing an extension, it may ask you to reload Visual Studio Code. Go ahead and do so.

Install all four extensions listed above.

Part 2. Installing remote extensions

In this part, you’re going to connect to a remote server, mcnulty, using Visual Studio Code and install the ShellCheck, rust-analyzer, and CodeLLDB extensions.

Your task

Start by reading the Remote coding page.

Follow the instructions on the Remote coding page for Visual Studio Code to connect to mcnulty.

At this point, follow the steps in Part 1 to install the three extensions. This time, rather than an install button, you should see an Install in SSH button.

In the future, you’ll be able to connect to mcnulty from an installation of Visual Studio Code on your own machines, even if you’re not in the lab. All of the files on your lab machine should also be available on mcnulty.

Part 3. Debugging a shell script

In this final part, you’re going to use the ShellCheck extension you just installed to debug a short Bash program.

Your task

Create a new file in Visual Studio Code by selecting New File… from the File menu. Name the file hello.sh and save it in your home directory.

Copy the short program below into hello.sh and save the file.

#!/bin/bash

if [[ $# -gt 0 ]]; then
    user="$1"
else
    user="$(whoami)"
fi

echo Hello ${user}, welcome to CS 241

Info

The line that starts with #! is called a shebang and it tells the operating system what program to use to run the file, namely /bin/bash.

Next is a somewhat unusual looking if statement that sets the variable user. If the user runs the script and passes foo as an argument like this

$ bash hello.sh foo

then user will be set to foo. If the user runs the script with no arguments like this

$ bash hello.sh

then user will be set to the name of the current user by calling the whoami program.

The final line of the program prints text. Notice that ${user} is used to get the value of the user variable. (You can also use $user without the braces, but there are situations where the braces are required so I tend to use braces all the time.)
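
Here’s a tiny illustration (with a made-up variable) of a situation where the braces matter: without them, Bash reads the longer name as the variable.

greeting="Hello"
echo "$greetings there"    # expands the unset variable greetings; prints " there"
echo "${greeting}s there"  # prints "Hellos there"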

Open a terminal in VS Code by pressing Ctrl-` (control-backtick). Run the program a few times with different arguments and pay close attention to how the program behaves.

$ bash hello.sh
$ bash hello.sh Stu Dent
$ bash hello.sh "Stu Dent"
$ bash hello.sh "Stu         Dent"

Tip

Notice that $ bash hello.sh Stu Dent didn’t include the Dent portion whereas when the quotation marks were used, the full name appeared. Bash uses spaces to split the command into different parts. This is called word splitting. Stu and Dent became separate arguments passed to hello.sh. By using quotation marks, the whole string stays together as a single argument.

This will come up often when working with the terminal.

You may have noticed strange behavior with the final two examples. Namely, "Stu Dent" and "Stu         Dent" behaved identically:

$ bash hello.sh "Stu         Dent"
Hello Stu Dent, welcome to CS 241

Where did the extra spaces go?

Look in VS Code and you’ll notice that ${user} has a squiggly line under it. Mouse over the line and you’ll see that the ShellCheck extension has reported an error here. You can see a list of all problems by opening the Problems pane by selecting Problems from the View menu.

ShellCheck says, “Double quote to prevent globbing and word splitting. shellcheck(SC2086).” Actually, the ShellCheck extension merely runs the shellcheck program. If we run shellcheck ourselves using the terminal as shown below, it will give us some more detail:

$ shellcheck hello.sh

In hello.sh line 9:
echo Hello ${user}, welcome to CS 241
           ^-----^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
echo Hello "${user}", welcome to CS 241

For more information:
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...

In addition to showing the error, it also suggests a fix and provides a link for more information.

The underlying problem here is that ${user} expanded to

Stu         Dent

but this underwent word splitting when evaluating the echo line. echo takes any number of arguments and prints them out in a line with a single space between them. So when executing the line

echo Hello ${user}, welcome to CS 241

it first expanded ${user} giving

echo Hello Stu         Dent, welcome to CS 241

This is split by spaces and thus executes exactly as if it had been written like this.

echo Hello Stu Dent, welcome to CS 241

Modify hello.sh by following the suggestion given by shellcheck and save your file. Run the program again. This time, any spaces should be preserved in the argument.

Tip

In general, when programming in Bash, you want to double-quote variables otherwise the variables undergo word splitting and you almost never want that.

See the ShellCheck Wiki for error SC2086 for more explanation as well as a suggestion for a better way to use quotation marks in the echo line in hello.sh.

Part 4. Getting Ready For Class

In this course, we will use a number of different online resources. In this part of the lab, we will make sure you are set up and ready to go.

Gradescope

Go to gradescope.com and log in using your oberlin.edu email account. (You may need to create an account.) When you log in, you should see CSCI 241 listed under Your Courses. If you click on the course, you should see some Reading Question assignments. You will need to complete these before class on the date listed on them.

If you do not see CSCI 241 listed as one of your courses, let your instructor know.

Ed

You should have received an email inviting you to the CSCI 241 Ed forum. If you did not, you can sign up with your oberlin.edu email address using this link. Sign up, and post a response to the Introductory Thread.

Syllabus

Please take a minute to read through the syllabus for this course (linked from Blackboard).

Submitting

In the other labs, you will be using GitHub to submit your assignments.

For Lab 0, call over the professor or lab helper, introduce yourself, and show your completed program.

Once you’ve done this, the professor or lab helper will mark that you have completed the lab.

At this point, you’re all done. Don’t forget to log out of the lab machines!

Lab 1. Introduction to Bash

Due: 2024-09-16 at 23:59

In this lab, you will explore the Bash shell and learn to use some of the standard command line utilities to

  • navigate the file system;
  • list the contents of a directory;
  • create, move/rename, and delete files and directories;
  • download files from the internet;
  • read manual pages to learn how to use these utilities; and
  • upload files to GitHub.

This lab is to be completed individually.

Setup

Log in to one of the lab machines and open a terminal by clicking the Show Applications button in the lower left corner of the desktop and then opening Terminal.

To get started, continue on to Part 1!

Part 1. Navigating the file system

The file system is a hierarchical arrangement of files, grouped into directories. The hierarchy forms a tree of the sort you learned about in CSCI 151. Each file and directory has a name and we can refer to any file in the file system by giving a path through the tree, starting at the root directory and using / to separate the directories along the path.

Here are some example paths.

  • / The root directory itself
  • /bin/bash The program bash (this is the shell we’re using right now!) which lives inside the directory bin which lives inside the root directory
  • /usr/bin/wget The program wget (this program will let us download files from the internet) which lives in the directory /usr/bin

Naming files by starting at the root / all the time is time-consuming. Each program has a notion of its current directory. We can also name files by giving a path from the current directory.

Here are some examples of relative paths.

  • paper.pdf The file paper.pdf inside the current directory
  • Music/swift.mp3 The file swift.mp3 inside the Music directory inside the current directory

The terminal you opened launched the Bash shell, which should now be running. You should see a prompt in the terminal that looks like user@mcnulty:~$ (yours will have a different username and hostname—the name of the computer). From here on out, the prompt will be written simply as $. You should not type the $ character when entering input.

Bash has several commands for navigating the file system and listing the contents. The most commonly used commands are

  • cd change current directory; $ cd /usr/bin changes the current directory to /usr/bin; with no arguments, cd changes the current directory to your home directory
  • ls lists the contents of a directory; $ ls /tmp lists the contents of the directory for temporary files, /tmp; with no arguments, ls lists the contents of the current directory
  • pwd prints the current directory (the name stands for “print working directory” where “working directory” just means the current directory).

Info

By convention, files and directories that start with a period are hidden from directory listings. If you want to see all files, including hidden ones, use $ ls -a.

Every directory has two special entries: . and .. (a single period and a double period). The single period, ., is the entry for the directory itself. The double period, .., is the entry for the parent directory.

Perhaps the most common use of .. is changing to the parent directory by running $ cd .. (note the two periods). For example, after changing into a directory with $ cd foo/bar/, I can return to the foo directory by running $ cd .. again.
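
For example, a short session might look like this (the paths here are just an illustration; yours will differ).

$ pwd
/home/username
$ cd foo/bar
$ pwd
/home/username/foo/bar
$ cd ..
$ pwd
/home/username/foo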

Your task

Create a file named task1.txt using a text editor. To do so, click the Show Applications button in the bottom left corner of the desktop, and then open Text Editor.

Using the shell commands cd and ls, find two different files or directories in each of the following directories

  • /
  • /usr
  • /usr/bin
  • /etc

Write the names of the 8 different files (and which directory they’re in) in task1.txt. Save this file in your home directory for now. We’re going to move it later.

Info

The shell expands a tilde, ~, into the path for your home directory. Thus, you can, for example, change back to your home directory using $ cd ~. You probably want to do so now.

You may construct paths relative to your home directory by starting the path with ~/. For example, the shell will expand ~/foo/bar into the path to the foo/bar file or directory inside your home directory.

Note that the shell treats an unquoted ~ differently than a quoted tilde, '~' or "~". Run the command $ echo ~ '~' "~" to see the difference.
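
You should see output along these lines, with your own home directory in place of /home/username; only the unquoted tilde gets expanded.

$ echo ~ '~' "~"
/home/username ~ ~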

Use $ ls ~ to print the contents of your home directory. You should see task1.txt in that directory listing.

Use $ cat ~/task1.txt to print the contents of the file to the terminal.

Return to your home directory by running $ cd (with no arguments) or equivalently $ cd ~.

Part 2. Manipulating files and directories

In Part 1, you created a file in your home directory using a text editor and you were able to see that file in the file system using the ls command. Now, we’re going to do the opposite: We’re going to manipulate files on the command line and examine the results in the GUI—the graphical user interface.

We’re going to be using some new commands.

  • mkdir Creates a new directory
  • cp Copies files or (with a suitable option) directories
  • rm Removes files or (with a suitable option) directories
  • mv Moves (or renames) files and directories
  • rmdir Removes an empty directory (this is less commonly used than rm for this operation)
  • man This shows a manual page which provides a bunch of information about many of the shell commands and programs
  • wget Downloads files from the Internet

Your task

Create the directory cs241/lab1 in your home directory using the mkdir command. (If your current directory is not your home directory, you can run $ cd or equivalently $ cd ~ to change to your home directory.)

If you try the “obvious” command, you’ll likely encounter a confusing error.

$ mkdir cs241/lab1
mkdir: cannot create directory ‘cs241/lab1’: No such file or directory

The problem here isn’t that cs241/lab1 doesn’t exist (which is good, because we’re trying to create this directory); the problem is that the cs241 directory doesn’t exist. It’d be great if the error message said which component of the path didn’t exist, but alas, it does not.

Caution

In general, pay close attention to error messages but be aware that when the error message is about a file path, it could be about any component of that path.

To create the cs241/lab1 directory, you can use two invocations of mkdir, first to create cs241 and second to create lab1 within it.
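
For example, assuming your current directory is your home directory:

$ mkdir cs241
$ mkdir cs241/lab1

(If you read the mkdir man page, you’ll also find a -p option that creates both directories in one go.)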

Move the file task1.txt you created in Part 1 inside the cs241/lab1 directory using mv. The basic syntax for the mv command is

$ mv src dest

Keep in mind that when naming a file with a path, you either need to give it an absolute path starting with / or you need to give it a path from the current directory. Using ~ for your home directory can be a big time saver when the files you want to use are in your home directory (or in a directory in your home directory).
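
For instance, if task1.txt is sitting in your home directory, a command along these lines would move it (adjust the paths if your file lives somewhere else).

$ mv ~/task1.txt ~/cs241/lab1/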

Open your home directory using the Files application from the left sidebar in the desktop. The task1.txt file should not be there. Open the cs241/lab1 directory. This should contain task1.txt and nothing else.

Create a directory named books in the lab1 directory using mkdir (you should see the new directory appear in Files).

Run $ man wget to pull up the manual page for the wget program which you can use to download files from the Internet. Press q to exit man when you’re done. Run $ wget --help to see similar information.

cd into the books directory you created and use wget to download a copy of Bram Stoker’s Dracula from https://www.gutenberg.org/cache/epub/345/pg345.txt.

Rename the file from pg345.txt to Stoker, Bram - Dracula.txt. Note that this name has spaces in it. You’ll need to specify it as 'Stoker, Bram - Dracula.txt' or Stoker,\ Bram\ -\ Dracula.txt.

Use wget again to download James Joyce’s Dubliners from https://www.gutenberg.org/files/2814/2814-0.txt. Rename it Joyce, James - Dubliners.txt.

Info

Different versions of ls will print file names with spaces in them differently. Some versions simply print the file names with the spaces. Others print the file names with quotes around them. When you run ls inside the books directory, you may see something like this.

$ ls
'Joyce, James - Dubliners.txt'  'Stoker, Bram - Dracula.txt'

At this point, the files and directories in your cs241 directory should look like this.

cs241/
└── lab1
    ├── books
    │   ├── Joyce, James - Dubliners.txt
    │   └── Stoker, Bram - Dracula.txt
    └── task1.txt

If you’d like to generate a similar picture of your files, try the tree command. tree directoryname will print a drawing like this of the directory contents to the terminal.

Part 3. Grepping

The grep command is used for searching text files for a “pattern” and printing out each line that matches the pattern. For example, $ grep vampire 'Stoker, Bram - Dracula.txt' prints out each line containing the word vampire (in lower case). Try it out!

Read grep’s man page to figure out how to perform a case-insensitive search and run the command to print out all lines matching vampire, case insensitively. Hint: typing /case (and then hitting enter) while viewing a man page will search for case in the manual. While searching, you can press n/N to go to the next/previous instance.

Your task

Using a text editor, create the file task3.txt in the lab1 directory.

Use grep to print out a count of the lines matching vampire case insensitively. (Search the man page again.) Write the command you used and its output in task3.txt.

Open the man page for grep one final time and figure out how to get grep to print the line numbers (and the lines themselves) that match Transylvania and then do that. Write the command you used and its output in task3.txt.

Find and use the command to print out a count of every word in both books. Use the apropos command to search manual pages for keywords. The -a (or --and) option to apropos lets you search for commands that involve all of the keywords, so $ apropos -a apple sauce will match commands whose descriptions contain both the words apple and sauce. Use apropos -a with appropriate keywords to find a command that produces a word count.

Run that command on all .txt files in the books directory using

$ cmd books/*.txt

where cmd is the command you found with apropos.

Info

The * is called a glob and Bash replaces it with all files that match. In this case, books/*.txt will be replaced with paths to files in the books directory that end with .txt. Globs are powerful and make working with multiple files with similar names easy.

Put the command you ran and its output in task3.txt.

Finally, read the man page for the command to count words in files and find the option to only print line counts. Use the command on both files. Put the command and the output in task3.txt.

Submitting (5 points)

In this course, we’re going to be using GitHub. GitHub uses Git, which is a version control system used by programmers. We’re going to learn a lot more about Git later, but for now, let’s use its most basic functionality.

Go to GitHub and create an account, if you don’t already have one. Many of the courses in the computer science major will require using GitHub so you’ll be well-served by getting comfortable with it now.

Go to the GitHub Classroom assignment page and accept the assignment. Wait a few seconds and then reload the page and you should see “You’re ready to go!” Click the link for the assignment repository.

The repository it created should be at a URL like https://github.com/systems-programming/lab1-accountname. The last component of the URL will be some combination of the lab name and your GitHub user name. You will need this in the cloning step below.

Next, you need to clone the assignment repository which will download all of the files (if any) in the remote repository on GitHub to your computer. To do this, you’ll need to use gh to authenticate to GitHub and then perform the clone. gh is the command-line tool for working with GitHub. Most of the time, you will use plain Git commands, but for authentication and cloning repositories, you’ll use gh.

In a terminal, run the command

$ gh auth login

and then using the arrow keys, select GitHub.com and press enter. Select HTTPS for the preferred protocol and press enter. When it asks to authenticate Git with your GitHub credentials, type Y and press enter. When it asks how you’d like to authenticate, select Login with a web browser and press enter. Copy the one-time code it displays and then press enter to open the browser. If it cannot open the browser, you can click here.

In either case, paste the code you copied and log in (which should only be required if you have logged out of GitHub). Click the Authorize github button on the web page. At this point, you should see something like

✓ Authentication complete.
- gh config set -h github.com git_protocol https
✓ Configured git protocol
✓ Logged in as accountname

in the terminal.

Now you’re ready to clone your repository. (In future labs, you shouldn’t need to authenticate again unless you log out.) Run

$ gh repo clone systems-programming/lab1-accountname 

where lab1-accountname should match the URL for the repository described above.

At this point, you should have a directory named lab1-accountname (where, again, the actual name is specific to your repository). This is your newly-created local repository.

Copy all of the files and directories from lab1 into the newly-created local repository’s directory. Read the man page for cp to figure out how to recursively copy directories.

cd to your local repository directory. The contents of this directory (recursively) should look like this.

repo
├── books
│   ├── Joyce, James - Dubliners.txt
│   └── Stoker, Bram - Dracula.txt
├── task1.txt
└── task3.txt

We want to add the files to the repository, commit them, and then push them to GitHub. (We’ll talk about why there are separate steps later. You can think of these three operations as “inform Git about the current state of the files,” “have Git make a snapshot of the current state of the repository,” and “tell GitHub about the changes we made in the local repository.”)

These are the commands you need to run.

$ git add books task1.txt task3.txt
$ git commit -m 'Submitting lab 1'
$ git push

Caution

If you make changes to a file, you need to run $ git add and pass it the path of the file you changed again. And then you’ll need to run $ git commit and finally $ git push.

The -m 'Submitting lab 1' causes Git to create a commit message with the contents Submitting lab 1. Normally, we’ll want to create more descriptive messages, but this is fine for now.

At this point, you’re done! You can check that everything worked correctly by going back to your repository on GitHub and checking that all of the files and directories show up. You should see two directories and two files:

  • .github
  • books
  • task1.txt
  • task3.txt

Info

The .github directory was added automatically by GitHub Classroom. So why didn’t it show up in your local repo? Because its name starts with a period! Recall that files whose names start with a period are hidden. You can see the hidden files using ls -a. If you do this, you’ll see a .git directory too. .git is where Git stores the repository’s history and local configuration.

Finally, you can remove your lab1 directory that we started with now that you’ve copied everything over to your local repository for lab 1. Since lab1 isn’t empty, $ rmdir lab1 won’t work to remove it. And since it’s a directory, $ rm lab1 won’t work either! Go ahead and try both commands out.

Read the man page for rm to figure out how to remove directories recursively.

Following the instructions above regarding which files should appear in your repository and which must be removed is worth 5 points. Every subsequent lab also has 5 free points for following the submission directions.

Lab 2. Shell scripting

Due: 2024-09-23 at 23:59

Shell scripts show up all over the place from programs on your computer to building software to running test suites as part of continuous integration. Learning to write (often short) Bash scripts can be a massive productivity boost as you can automate many tedious tasks. For example, if you find yourself needing to run a sequence of shell commands multiple times, a shell script is often the perfect tool. No more forgetting an argument or accidentally running the commands out of order. Write it once, run it many times.

In this lab, you will learn

  • how to write Bash shell scripts;
  • how to run programs multiple times with different arguments; and
  • how to use tools to analyze your scripts for common errors.

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. If you choose to work with a partner, you and your partner must complete the entire project together. Dividing the project up into pieces and having each partner complete a part of it on their own will be considered a violation of the honor code. Both you and your partner are expected to fully understand all of the code you submit.

Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.)

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working. But before you do, read the entire assignment and be sure to check out the expected coding style.

Be sure to ask any questions on Ed.

Coding style

For all of the shell scripts you write, I recommend you follow the Google Shell Style Guide. Look at the section on formatting. These rules are, in many ways, arbitrary. However, it’s good to stick to one style. The guide says to use two spaces to indent. Use whatever you’d like, but be consistent.

Hints for VS Code users

Once you open a shell script, the bottom of the window should show something like

Ln 1, Col 1  Spaces: 4  UTF-8  LF  Shell Script

You can change the number of spaces by clicking on Spaces: 4, then Indent using spaces, and then selecting 2 (to match the Shell Style Guide).

Hints for Vim users

If you use NeoVim or Vim as your editor, you can include the line (called a modeline)

# vim: set sw=2 sts=2 ts=8 et:

at the bottom of each of your scripts to force Vim to indent by 2 spaces and to ensure that tabs will insert spaces (to match the Shell Style Guide). This requires “modeline” support to be enabled which is disabled by default. You can enable it by adding the line

set modeline

to your ~/.vimrc file, creating it if necessary.

You can also set options in your ~/.vimrc file, creating one if necessary. For example, on mcnulty, I have the following simple ~/.vimrc.

set background=dark
filetype plugin indent on
autocmd FileType sh setlocal shiftwidth=2 softtabstop=2 tabstop=8 expandtab
set modeline

The first line tells Vim to use colors suitable for a terminal with a dark background. The second line tells Vim to use file-type aware indenting. The third line tells Vim to set those options for shell script files. See the Vim wiki for more details. And the fourth enables modelines.

After you write the #!/bin/bash line at the top of your file (or a modeline at the bottom), you’ll probably want to reopen the file so that Vim knows it’s a Bash file and turns on the appropriate syntax highlighting and indentation.

If you use emacs, you’re kind of on your own. Feel free to ask on Ed, search StackOverflow, and read the Emacs Wiki.

Same with Nano. This might be useful.

Run time errors and return values

For each of the parts below that ask you to print out a usage message or an error message, this message should be printed to stderr. You can use code like this to print a usage message.

echo "Usage: $0 arguments" >&2

Any errors should cause the script to exit with a nonzero value (1 is a pretty good choice). Scripts that run successfully should exit with value 0. You can use exit 1 to exit with the value 1.
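
For instance, a script that takes at most one argument might begin with a check like this minimal sketch (adapt the message and the test to your own script).

if [[ $# -gt 1 ]]; then
  echo "Usage: $0 [dir]" >&2  # print the usage message to stderr
  exit 1                      # exit with a nonzero value to signal an error
fi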

Script warnings and errors

Make sure your scripts pass shellcheck without errors or warnings. If you disable a particular warning, you must leave a comment giving a very good reason why you did so.

Executable scripts

All of your scripts should start with the line

#!/bin/bash

and must be executable. Running $ chmod +x on each of the files before adding them to Git is sufficient.
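
For example, from inside your repository you might run something like this before adding the scripts (using the script names from this lab).

$ chmod +x genfrac shellcheckall topfiletypes
$ git add genfrac shellcheckall topfiletypes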

Script documentation

Each of your scripts should contain a comment at the top of the script (below the #!/bin/bash line) that contains usage information plus a description of what the program does.

Example.

#!/bin/bash

# Usage: shellcheckall [dir]
#
# The dir parameter should be the path to a directory. Shellcheckall will...

To get started, continue on to Part 1!

Part 1. Multiple executions (15 points)

One of the most common tasks when working with computers is running a program multiple times with different inputs. (As just a single example, in the final project for CSCI 210, you first write a program to simulate how a part of the processor operates. You will then run this program with different options on different inputs. This is an error-prone operation. Instead, you could write a simple shell script to run the program all of the different ways you need.)

In this part, you will be writing a shell script to run a program that is provided to you.

Clone the jfrac repository outside of your assignment repository. Do not commit the jfrac code to your assignment repository! The jfrac repository holds the source code for a simple program, written in the Rust programming language, that generates images of the Julia set fractal.

Inside the jfrac directory, run the following command.

$ cargo run -- default.png

The cargo tool is used to build (and in this case run) software written in Rust. This will produce an image file that is 800 pixels wide and 800 pixels high named default.png. It looks like this.

Julia set fractal.

The mathematics behind the Julia set are explained on the Wikipedia page, but are not important for us here. What is important is that by changing the complex constant c in the quadratic map f(z) = z² + c, we can drastically change the output fractal. The constant is specified by passing the --constant option to the program. For example, the default constant is -0.4 + 0.6i, so running

$ cargo run -- --constant="-0.4 + 0.6i" explicit.png

will produce the file explicit.png which is identical to default.png shown above. The strange -- by itself is required here. Everything before the -- is an option or argument for cargo. Everything after the -- is an option or argument for the jfrac binary itself.

There’s no man page for jfrac, but it comes with a short help. Run

$ cargo run -- --help

to see the options.

Your task

First, play around with the --constant option to jfrac to produce images with different constants. Look at the examples given in the quadratic polynomials section of Wikipedia for inspiration. You’ll likely want to select values of c whose real and imaginary parts are small (roughly between -1 and 1).

Find 5 values of c that you like.

Your task is to write a Bash script named genfrac which will run the jfrac command several times (by using cargo run as shown above) using two nested for loops.
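
To give a sense of the overall shape (this is only a structural sketch, not a solution: the second constant is a placeholder and the body just echoes instead of running cargo), the two nested loops might be arranged like this.

for size in 200 400 800; do
  for c in "-0.4 + 0.6i" "0.285 + 0.01i"; do
    # Replace this echo with the appropriate cargo run invocation.
    echo "would generate a ${size}x${size} image with constant ${c}"
  done
done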

Your script must generate each of the 5 fractals at 3 different sizes, 200 × 200, 400 × 400, and 800 × 800. You can assume your script is running inside the jfrac directory; however, once you are done testing and running it, you will need to move it to your lab2 repository. Make sure not to submit the jfrac directory.

Each file should be named Julia set (constant) size.png. For example, using the default constant and default size, you would produce the file named Julia set (-0.4 + 0.6i) 800x800.png.

You should have 15 generated images in total. Add those images to your repository in an images directory.

Part 2. Simple pipeline (5 points)

For several subsequent parts of this lab, you’re going to need to run the scripts you write on a large code base. We’re going to use the source code for the Linux kernel. You’re going to write a one-line shell script that downloads and decompresses the source code.

Your task

Write a one-line script that uses curl to download the compressed Linux source code and pipes it to tar, which extracts the source code into the current directory. The Linux source code is available at this URL.

https://github.com/torvalds/linux/archive/refs/tags/v6.4.tar.gz

Your command should be structured as follows.

$ curl ARGS | tar ARGS

After you run the command, you should have a new directory named linux-6.4 which contains the source code to version 6.4 of Linux. Do not commit the Linux source code to your assignment repository.

Write the pipeline command you come up with in a file named README. Normally, a README provides information about a project. GitHub will display the contents of your README when you visit the web page for your repository (after you have pushed the README to GitHub). You’ll use this same file several times to record the commands you come up with.

Hints

  1. If you run curl https://github.com/torvalds/linux/archive/refs/tags/v6.4.tar.gz with no other options, no output is printed. That’s because GitHub, like many websites, will redirect the browser to a different URL. The way this works technically is to use a redirection HTTP status with a Location header. Search the curl man page to figure out which option to use to get curl to follow the redirection specified in the Location header.
  2. The tar command, like many (but sadly not all) command-line utilities that operate on files, will accept a single - as the name of the file to mean read from stdin rather than reading from a file (and for output, write to stdout rather than writing to a file).

Part 3. Shell script hygiene (25 points)

Shell scripting is a powerful tool that is used in many areas of software development, including build scripts (that build the software), test scripts (that test the software), and conformance checking scripts (that check things like formatting conventions). An example of a conformance checking script might be one that is run on every git commit to check that the files being changed have no errors or warnings according to some tool.

You’re going to write a conformance checking script that checks if all of the shell scripts in a directory pass shellcheck.

To find all of the shell scripts in a directory, you will use the find command. Conceptually, we want to loop over all of the files that have a particular extension and do something with them. For example, if we want to print the paths of every text file in a directory whose path is stored in the dir variable, we’d like to be able to do something like this:

Bug

for file in $(find "${dir}" -name '*.txt'); do
  echo "${file}"
done

Unfortunately, this doesn’t work! If we run shellcheck on this we get a warning

for file in $(find "${dir}" -name '*.txt'); do
            ^-- SC2044 (warning): For loops over find output are fragile. Use find -exec or a while read loop.

The problem here is that file names can contain spaces (and even newline characters!). The shellcheck wiki page for error SC2044 gives example code to make this work. It would look something like

while IFS= read -r -d '' file; do 
  echo "${file}"
done < <(find "${dir}" -name '*.txt' -print0)

Note carefully the space after the = and the space between the two < characters!

This arcane construction first runs

find "${dir}" -name '*.txt' -print0

which is the same as the previous find command except that rather than printing the matching paths separated by a newline (which, as mentioned, is a valid character in a file name), it separates the output with a 0 byte. (Not the character 0 which is a byte with integer value 48, but the byte with value 0.)

The output of the find command becomes the standard input for the

while IFS= read -r -d '' file; do ... done

loop. The loop is going to read from stdin (from the read command) and split up the input by 0 bytes (which, conveniently, is what the -print0 argument to find produced).

Each time through the body of the loop, the file variable will be set to the path of one of the files that find found.
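
As a small illustration (counting .txt files rather than shell scripts, and assuming the dir variable has already been set), you might keep a running tally inside the loop like this.

count=0
while IFS= read -r -d '' file; do
  # Do whatever per-file work you need with "${file}" here.
  count=$((count + 1))
done < <(find "${dir}" -name '*.txt' -print0)
echo "Found ${count} text files"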

Your task

Write a shell script called shellcheckall which takes zero or one parameters. The parameter, if given, should be a path to a directory. If no parameters are given, it should act on the current directory. If two or more parameters are given, output usage information to stderr, and exit with return value 1. If the supplied parameter is not a directory, output an error message (on stderr) and exit with return value 1.

The script should search the given directory (and any directories inside of it) to find all of the files with the extension .sh. It should run shellcheck on each file and count how many scripts pass out of the total number of shell scripts. If any of the scripts fail shellcheck, then when your script exits, it should exit with return value 1. See the examples below, including the return values.

Make sure you handle files with spaces in the name. Make sure the script works correctly when run on an empty directory and one that contains no shell scripts (see the examples below).

How many shell scripts in the entire linux-6.4 source tree pass shellcheck? Write your answer in your README.

Write a for-loop that runs shellcheckall on each directory in linux-6.4. It should print out the name of the directory, a colon, a space, and then the output from shellcheckall. Your loop should probably start like this.

for dir in linux-6.4/*/; do

The / after the * means the glob will only match directories. You’ll want to use echo -n to print text without a trailing newline. See the final example below.

Put your for-loop and the output in your README.

Examples

The following examples echo the return value $? to show that it returns 0 on success and 1 on an error or if a file does not pass shellcheck.

$ ./shellcheckall too many args
Usage: ./shellcheckall [dir]
$ echo $?
1

$ ./shellcheckall linux-6.4/tools/perf
6 of 74 shell scripts passed shellcheck
$ echo $?
1

$ ./shellcheckall empty-directory
0 of 0 shell scripts passed shellcheck
$ echo $?
0

Here’s an example of the for loop output.

$ for dir in linux-6.4/*/; do ...; done
linux-6.4/Documentation/: 1 of 9 shell scripts passed shellcheck
linux-6.4/LICENSES/: 0 of 0 shell scripts passed shellcheck
linux-6.4/arch/: 15 of 39 shell scripts passed shellcheck
...

Part 4. Top file types (20 points)

In this part, you’re going to write a complex pipeline to answer a simple question: What are the top 8 file types in the Linux 6.4 source code? We’re going to use a file’s extension to determine what type of file it is. We’re going to ignore any files that don’t have extensions.

Your task

Write a script topfiletypes to answer this question. Your script should consist of a single complex pipeline. Put the output of your script in your README.

The structure of the pipeline will look like this

while IFS= read -r file; do ...; done < <(find linux-6.4 -name '*.*') | ... | ... | ... | ...

Notice how the output from the while loop is passed as standard input to the next command in the pipeline.

Your output should look like this.

$ ./topfiletypes
  32463 c
  23745 h
   3488 yaml
...

Hints

  1. The while loop given here is slightly different from the one you used before (no -d '' or -print0 options). Most filenames don’t actually have newlines in them so we can use this technique to read one line at a time from the output of the find command rather than reading up to a 0 byte.
  2. Bash supports a lot of convenient variable expansions. You want to read the section of that page that describes ${parameter##word}. Here are some examples:
    $ x=foo.bar.baz
    $ echo "${x##*.}"
    baz
    
    $ y=linux-6.4/mm/mempool.c
    $ echo "${y##*.}"
    c
    
    You can use this inside your while loop to print the extension of the file.
  3. The sort command can be used to sort lines of input. For example, given the file example
    foo
    bar
    foo
    foo
    cat
    cat
    foo
    
    if we run $ sort example, we get
    bar
    cat
    cat
    foo
    foo
    foo
    foo
    
  4. The uniq command can be used to compress identical, consecutive lines of input into a single line of output. sort and uniq are frequently used together as sort | uniq to read lines of input, sort them, and then only output the unique lines. Running $ uniq example produces
    foo
    bar
    foo
    cat
    foo
    
    Running $ sort example | uniq gives
    bar
    cat
    foo
    
  5. The uniq command can take an argument to print out the count of each line. Here’s $ sort example | uniq -c.
          1 bar
          2 cat
          4 foo
    
  6. sort can sort in reverse as well as performing a numeric sort (i.e., sort numbers in the usual way so that 9 comes before 10).
  7. The head command can be used to print the first several lines of a file. Look up the options to print the first 8 rather than the default 10.
  8. In the loop, print out the extension of the path in the file variable. Use a combination of sort, uniq, and head in the pipeline to produce the final result. You’ll want to use sort multiple times.
  9. You can write this all as a single line. Don’t do that. Use \ at the end of each line to continue on the next line.
    while IFS= read -r file; do
      echo ...
    done < <(find linux-6.4 -name '*.*') \
      | ... \
      | ... \
      | ... \
      | ...
    

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Caution

Make sure you followed all of the instructions about stderr and return values, script warnings and errors, executable scripts, and script documentation.

Your repository should contain the following files and directory.

  • README
  • genfrac
  • shellcheckall
  • topfiletypes
  • images/

Your repository should not contain either the jfrac or the Linux code.

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain

  1. The names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).
  2. Your answers to questions for each part and the commands you used to find them.

Lab 3. Introduction to Rust

Due: 2024-09-30 at 23:59

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. If you choose to work with a partner, you and your partner must complete the entire project together, as a single repository. Dividing the project up into sections and having each partner complete a part of it on their own will be considered a violation of the honor code. Both you and your partner are expected to fully understand all of the code you submit.

Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team; there’s a maximum of two people, and if you join the wrong one, you’ll prevent the correct person from joining.)

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings. Running these commands will only work if you have cd’d into a Rust project directory.

Info

Cargo’s clippy is a command-line utility for linting Rust code. A linter automatically analyzes source code for errors, vulnerabilities, and stylistic issues to improve code quality. Cargo clippy will provide suggestions and warnings for potential errors in your code, such as unused variables, unnecessary operations, and more. For example, it will suggest more efficient ways of writing your code and point out common pitfalls that could lead to runtime errors.

Tip

VS Code doesn’t always update its error underlining until after you save your file. So make sure to do that with some regularity while you’re writing code.

Formatting

Your code must be formatted by running $ cargo fmt. This will automatically format your code in a consistent Rust style.

To get started, continue on to Part 1!

Part 1. Guessing game (15 points)

The first application you will write is a number guessing game.

Note

The tutorial you’re going to be following for this part of the lab contains a discussion of the integer types like i32, i64, and u32. It talks about how many bits each of them is, 32 or 64, respectively. You’ll learn a lot more about this in CSCI 210. To relate this to Java, an i32 is Java’s int type. An i64 is Java’s long type. The u32 and u64 types don’t have analogues in Java. These are the unsigned types meaning that they can only hold non-negative values. This is perhaps best demonstrated with an example. We can print out the maximum and minimum values these types can hold as follows.

#![allow(unused)]
fn main() {
println!("An i32 holds values from {} to {}", i32::MIN, i32::MAX);
println!("An i64 holds values from {} to {}", i64::MIN, i64::MAX);
println!("An u32 holds values from {} to {}", u32::MIN, u32::MAX);
println!("An u64 holds values from {} to {}", u64::MIN, u64::MAX);
}

Click the Run button to see the results. There are other integer types: i8, i16, u8, u16, isize, and usize. You can get their minimum or maximum values the same way.

Your task

Follow the instructions given in Chapter 2 of the book with the following adjustments:

  • In the section “Setting Up a New Project,” it tells you to go to a projects directory. Instead, cd into the assignment repository you cloned. You’ll create guessing_game in there when you run cargo new guessing_game

    It’s important that you do this from inside the assignment repository. Don’t run cargo new guessing_game and then later mv the directory into the assignment repo.

    After you run cargo new guessing_game, open up Visual Studio Code by running

    $ code guessing_game
    

    This will open Visual Studio Code which will ask you some questions:

    1. Do you trust the authors of the code? Yes.
    2. It detects that the containing directory is the root of a Git repository and it will ask if you want to open it. Say yes and then select the path to the assignment directory from the list displayed (there will only be one entry). Now, you can use Git directly from VS Code’s user interface.
  • When it tells you to run cargo run, you can do that from your open terminal window (after $ cd guessing_game), or you can use VS Code’s built-in terminal to run it.

    Let’s configure VS Code to be able to build, run, and debug our code. Click on the Run and Debug button (shaped like a triangle with a bug on it) on the left side of the window. Click the “create a launch.json file” text. It will ask you for a debugger, select LLDB. This asks if you want to generate launch configurations for its targets. Say yes.

    You can close the launch.json file it created and opened.

    At this point, you can run and debug your application from VS Code.

Once you have finished writing the guessing_game, you should make sure you’ve added your files to Git and committed them. You can do so either from the command line, or from within VS Code. Here are the docs for how to do it from within VS Code.

Part 2. Iteration (5 points)

Iteration with iterators is a key concept in Rust. Iteration is similar to Java but has some quirks which can be quite surprising.

For the rest of this lab, you’ll explore iterating over elements in a vector and iterating over characters in a string.

Your task

In a shell, cd to your assignment repository and create a new project using cargo and then open it in VS Code.

$ cargo new iteration
$ code iteration

Change the main function to

fn main() {
    let data = vec![2, 3, 5, 7, 11, 13];

    for x in data {
        println!("{x}");
    }
}

Tip

Hover your mouse over the code above and some buttons will appear in the top right of the text box. Click the Run button to run this code.

Info

The vec! macro creates a new Vec containing the elements in the brackets. (Click the link in the previous sentence to go to the documentation for Vec.) Vec is the standard vector class in Rust: it is a growable array of elements, similar to a list in Python and an ArrayList in Java. The vec! macro here gives a result similar to this (but more efficient and easier to read).

    let mut data = Vec::new(); // Create a new Vec
    data.push(2);  // Add 2 to the end of the Vec.
    data.push(3);  // Add 3 to the end of the Vec.
    // ...
    data.push(13); // Add 13 to the end of the Vec.

Predict what the output of the code will be and then run it. If you predicted incorrectly, try to figure out why the code does what it does before moving on.

Now, duplicate the for loop

fn main() {
    let data = vec![2, 3, 5, 7, 11, 13];

    for x in data {
        println!("{x}");
    }

    for x in data {
        println!("{x}");
    }
}

Again, predict what the output of the code will be and then run it. (You can click the Run button on the above text box to see the error message.)

I suspect the results are quite surprising. You probably got an error that looks like this.

error[E0382]: use of moved value: `data`
   --> src/main.rs:8:14
    |
2   |     let data = vec![2, 3, 5, 7, 11, 13];
    |         ---- move occurs because `data` has type `Vec<i32>`, which does not implement the `Copy` trait
3   |
4   |     for x in data {
    |              ---- `data` moved due to this implicit call to `.into_iter()`
...
8   |     for x in data {
    |              ^^^^ value used here after move
    |
note: `into_iter` takes ownership of the receiver `self`, which moves `data`
   --> /Users/steve/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/collect.rs:262:18
    |
262 |     fn into_iter(self) -> Self::IntoIter;
    |                  ^^^^
help: consider iterating over a slice of the `Vec<i32>`'s content to avoid moving into the `for` loop
    |
4   |     for x in &data {
    |              +

For more information about this error, try `rustc --explain E0382`.
error: could not compile `iteration` (bin "iteration") due to previous error

This is a lot of information and it’s quite difficult to understand at first. Reading error messages takes practice. If you look carefully, you’ll notice that the error message is divided into three parts: the error, a note, and a help.

The error tells us that we have tried to use a moved value, namely data, on line 8, column 14 of the file src/main.rs. Next, it shows that on line 2 the variable data has type Vec<i32>, which does not implement Copy. Then, on line 4, data was moved due to an implicit call to .into_iter(), and finally, on line 8, we tried to use data again.

We’ll talk about what this all means but basically, the first for loop used up or consumed our Vec and thus the Vec no longer exists. This is not how we normally expect for loops to work. Nevertheless, this is how Rust works.

The note portion of the error message tells us why it was used up and shows us code from the standard library to explain why. Let’s skip this part for now.

Finally, the help portion offers a way to solve this problem. If we use for x in &data { } (note the addition of the ampersand &), this will avoid consuming data. Note that it suggests making this change only on the first for loop (which for me happens to be on line 4). By using &data rather than data, we’re instructing the loop to iterate over a reference to the Vec, rather than the Vec itself. The upshot is that we can now iterate multiple times.

Go ahead and make that change and rerun cargo run. At this point, your code should print out the list twice.
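
For reference, the corrected program might look something like this (only the first loop strictly needs the borrow; using &data in both loops also works).

fn main() {
    let data = vec![2, 3, 5, 7, 11, 13];

    // Borrow the Vec here so this loop doesn't consume it.
    for x in &data {
        println!("{x}");
    }

    // data is still available, so this loop may take ownership as before.
    for x in data {
        println!("{x}");
    }
}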

Part 3. Reversing a vector (15 points)

In the rest of the lab, you’re going to write some short functions dealing with vectors and strings and you’re going to write some unit tests for the functions.

Continue working in the main.rs file of your iteration project.

Your task

Implement the function reversed_vec.

fn reversed_vec(input_data: &[i32]) -> Vec<i32> {
    let mut result: Vec<i32> = Vec::new();
    todo!("Finish this function")
}

Two things to notice

  1. The input argument input_data is of type &[i32]. We’ll talk more about this type later, but for now, think of it as a reference to a Vec. We saw this above when we wrote for x in &data {}: data was a Vec, so &data essentially gives us a &[i32].
  2. The result variable is declared mutable (with the mut keyword) so we can modify it by inserting elements using the push() function.

Before you remove the todo!() and implement this function, let’s write a unit test similar to the unit tests you wrote in CSCI 151.

At the bottom of main.rs, add the following code.

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn reversed_vec_empty() {
        let data = vec![];
        assert_eq!(reversed_vec(&data), vec![]);
    }
}

There’s a lot to unpack here. The first line, #[cfg(test)], tells the compiler that the item that follows it should only be compiled when we’re compiling tests. And that item is a module named test. (The name is arbitrary but is traditionally test.) We’ll talk more about modules later.

The use super::*; line makes all of the functions in main.rs outside the test module available for use inside the test module.

Finally, each unit test we write is just a Rust function that is annotated with #[test]. If you omit this annotation, the function will not be treated as a test.

Run the command $ cargo test. You should see some warnings followed by a test failure (remember, we didn’t implement reversed_vec() so it makes sense our test should fail).

running 1 test
test test::reversed_vec_empty ... FAILED

failures:

---- test::reversed_vec_empty stdout ----
thread 'test::reversed_vec_empty' panicked at 'not yet implemented: Finish this function', src/main.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test::reversed_vec_empty

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Notice that the output tells us our code panicked (i.e., failed) on line 19 of main.rs.

Before implementing reversed_vec(), write two more tests

    #[test]
    fn reversed_vec_one() {
        todo!();
    }

    #[test]
    fn reversed_vec_three() {
        todo!();
    }

that test the result of reversing a vector with 1 element and a vector with 3 elements.

$ cargo test should now show three failing tests.

Implement the rest of reversed_vec().

Hint

  1. You can create an iterator over input_data using input_data.iter(). Some iterators let you create reverse iterators using the rev() function. In other words, given one iterator it, you can create a new iterator via it.rev() that will iterate in reverse order. So for x in input_data.iter().rev() { } will iterate over input_data in reverse order.
  2. The iterator returned by input_data.iter() will not make copies of the elements in input_data. Instead, the value returned by the iterator will be a reference to the elements. That is, in
    for x in input_data.iter() { }
    the type of x is &i32 and not i32. Since our result vector holds i32 and not &i32, we need to dereference the reference x to get the underlying integer. We use the * operator to do this. For example,
    for x in input_data.iter() {
        // result.push(x); // Fails because x has type &i32
        result.push(*x); // Succeeds because *x has type i32
    }
    shows the incorrect and correct ways to insert the element into result.
Once you implement your function, run `$ cargo test` and make sure you have 3 passing tests and no failing tests.

Tip

You’re probably getting a warning at this point about reversed_vec not being used. Add the line

#![allow(dead_code)]

to the very top of main.rs and it’ll stop warning about the functions that we’re only using in tests.

Part 4. Checking for in-order data (15 points)

In this part, you’ll implement a new function and write a few more unit tests to make sure your code works.

Your task

Write a function

#![allow(unused)]
fn main() {
fn is_in_order(data: &[i32]) -> bool {
    todo!("Implement me")
}
}

that returns true if the elements of data are in ascending order and false otherwise. To do this, you’ll want to iterate over data using for x in data {} and check that each element is at least as large as the previous element. To keep track of the previous element, you’ll need a mutable variable: let mut prev: i32;. You can either use i32::MIN as its initial value or the value of the first element in data. Either approach works.

Before implementing the function, write unit tests for the following cases; they should pass once you implement the function.

  1. Empty vectors are always in order.
  2. Vectors containing a single element are always in order.
  3. Vectors with multiple elements, in order.
  4. Vectors with multiple elements, not in order.

Example

Inside your test module, you’ll want to add a new function per unit test. For example, the test for multiple elements not in order might look something like this.

#![allow(unused)]
fn main() {
fn is_in_order(data: &[i32]) -> bool { todo!() }
#[test]
fn is_in_order_multiple_out_of_order() {
    let test_vector = vec![10, -3, 8, 2, 25];
    assert!(!is_in_order(&test_vector));
}
}

Notice that this is asserting the negation of is_in_order() because of the !.

Implement is_in_order and make sure it passes your tests.

My output for $ cargo test looks like this.

running 7 tests
test test::is_in_order_empty ... ok
test test::is_in_order_multiple_out_of_order ... ok
test test::reversed_vec_empty ... ok
test test::is_in_order_multiple_in_order ... ok
test test::is_in_order_one ... ok
test test::reversed_vec_one ... ok
test test::reversed_vec_three ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Part 5. Iterating manually (10 points)

Similar to the previous part, you will implement a function and some unit tests.

Your task

Write a function that iterates over the elements in its argument and adds them all and returns the sum.

#![allow(unused)]
fn main() {
fn manual_sum(data: &[i32]) -> i32 {
    let mut data_iter = data.iter();
    todo!("Implement me")
}
}

The catch is that for this task, you may not use a for loop. Instead, you’ll need to work with the iterator returned by data.iter() directly. Iterators in Rust work similarly to iterators in Java. To implement this as a Java method using iterators rather than a for loop, we might write something like this.

int manualSum(ArrayList<Integer> data) {
    int sum = 0;
    Iterator<Integer> iter = data.iterator();
    while (iter.hasNext()) {
        sum += iter.next();
    }
    return sum;
}

Iterators in Rust don’t have separate has_next() and next() methods. Instead, there’s just a single next() method that returns an Option<T>. Every Option is either None or Some(blah) for some value blah. Here’s a simple example of using an Option.

#![allow(unused)]
fn main() {
let x: Option<i32> = Some(10);
assert!(x.is_some());
assert!(!x.is_none());
assert_eq!(x.unwrap(), 10);

let y: Option<i32> = None;
assert!(y.is_none());
assert!(!y.is_some());
}

We can use is_some() or is_none() to determine if an Option is a Some or a None. If we have a Some, then we can unwrap it by calling unwrap() which returns the data. If we try calling unwrap() on None, we’ll get a runtime error.
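
To see how this connects to iterators, here’s a small sketch (not part of the lab code) showing that calling next() on a Vec’s iterator hands back Some values until the elements run out and then None.

#![allow(unused)]
fn main() {
let data = vec![4, 8];
let mut data_iter = data.iter();

assert_eq!(data_iter.next(), Some(&4)); // Note: a reference to the element.
assert_eq!(data_iter.next(), Some(&8));
assert_eq!(data_iter.next(), None);     // The iterator is exhausted.
}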

Write some unit tests. Make sure you cover all of the following cases.

  1. Empty data (where the sum should be 0);
  2. Single-element vectors; and
  3. Vectors with multiple elements.

Implement manual_sum by calling next() on data_iter repeatedly until the result is None. Unwrap all of the other values and sum them up.

Hint

You’ll probably want a loop like this.

loop {
    // XXX: Get the next element from the iterator
    if /* the element is None */ {
      return sum;
    }
    // XXX: Unwrap and add the value to the sum.
}

Part 6. Iterating over strings (10 points)

Strings in Rust are similar to vectors of characters except that, unfortunately, they’re slightly more difficult to use because text is inherently more complicated than a sequence of integers.

The most obvious difficulty is you cannot access individual characters in the same way you’d access individual elements of a vector.

#![allow(unused)]
fn main() {
let x = vec![1, 2, 3];
assert_eq!(x[0], 1);

let y = "ABC";
assert_eq!(y[0], 'A'); // FAILS!
}

That last line fails (click the Run button to see the error) because you cannot index into a string like this. The problem is that Rust stores all strings in the UTF-8 encoding. This is a variable-length encoding where different characters may take a different number of bytes to encode. So if you want to get the 10th character from a string, for example, you cannot easily figure out where in the encoded data the 10th character starts without scanning from the beginning of the string.
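
For example, here’s a small illustration (not part of the lab code): a crab emoji takes 4 bytes in UTF-8, so the byte length of a string and its character count can differ.

#![allow(unused)]
fn main() {
let s = "a🦀b";
assert_eq!(s.len(), 6);           // len() counts bytes: 1 + 4 + 1.
assert_eq!(s.chars().count(), 3); // chars() yields 3 characters.
}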

This means that to work with characters, we’re going to need to use an iterator over the characters. Fortunately, the chars() function returns exactly that.

#![allow(unused)]
fn main() {
let example = "Here is my string";
for ch in example.chars() {
    println!("{ch}");
}
}

Just as we’ve made a distinction between Vec<i32> and &[i32], where the latter is a reference to the former, in Rust we have a String data type and a reference to a string, &str. When we have a function that takes a reference to a string, we use an argument of type &str. Note that strings we create by enclosing text in quotation marks have type &str and not String. If we have a String and we want to pass it to a function that takes a &str, we get a reference to the string using &, just as we did with vectors and references to vectors.
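
Here’s a small sketch illustrating this; the function print_loudly is made up for the example.

#![allow(unused)]
fn main() {
fn print_loudly(s: &str) {
    println!("{s}!!!");
}

let literal = "string literals have type &str";
let owned: String = String::from("a String");

print_loudly(literal); // Already a &str.
print_loudly(&owned);  // Use & to pass a reference to the String.
}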

Your task

Write a function

#![allow(unused)]
fn main() {
fn clown_case(s: &str) -> String {
  todo!("Implement me")
}
}

that takes a &str as input and returns a new String that alternates capitalization and includes a clown emoji, 🤡, at the beginning and end of the string as long as s is not the empty string. If s is the empty string, return just a single clown emoji.

Here are some examples. I recommend you turn them into unit tests before you implement clown_case.

  • "" -> "🤡"
  • "I'm just asking questions" -> "🤡i'M jUsT aSkInG qUeStIoNs🤡"
  • "Μην είσαι κλόουν στα ελληνικά!" -> "🤡μΗν ΕίΣαΙ κΛόΟυΝ σΤα ΕλΛηΝιΚά!🤡"

Tip

  1. If ch is a char, you can use ch.is_alphabetic() to decide if you want to lowercase or uppercase the letter or just include it in the result unchanged. I.e., only uppercase/lowercase the characters for which ch.is_alphabetic() returns true.
  2. If ch is a char, then ch.to_lowercase() and ch.to_uppercase() return iterators over chars rather than a single char. Why? Well, in some languages, converting the case changes the number of characters. E.g., a capital ß becomes SS. You can add the elements of an iterator to a string using the extend() function.
    #![allow(unused)]
    fn main() {
    let mut result = String::new();
    result.extend('ß'.to_uppercase());
    assert_eq!(result, "SS");
    }
  3. You can append a char to a String by using push(). You can append a &str to a String by using push_str().

At this point you’re done and can submit your code.

However, there is one more optional step you can take if you’d like: Comment out the existing code in your main() function. (In VS Code, you can comment or uncomment multiple lines at once by selecting all of the lines and hitting Ctrl-/.) Write some code in main() to read a line of input from stdin, pass it to clown_case(), and then print the result. Now you can run the program and have it turn any input you enter into clown case. (I have been reliably informed that this is a lot of fun to use.)

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

fa23-lab3-xxx-yyy
├── README
├── guessing_game
│   ├── Cargo.lock
│   ├── Cargo.toml
│   └── src
│       └── main.rs
└── iteration
    ├── Cargo.lock
    ├── Cargo.toml
    └── src
        └── main.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

Lab 4: Wordle ahead

Due: 2024-10-07 at 23:59

In this lab, you’ll learn how to

  • work with individual characters in a string;
  • write colored text to the terminal;
  • flush output to standard out;
  • call functions defined in other modules;
  • implement a command-line utility;
  • parse command-line arguments;
  • read from standard in or from a file; and
  • handle errors.

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1. Wordle (35 points)

In this part, you’re going to implement a Wordle game. Wordle is a word guessing game. A word is chosen at random from a small list of 5-letter words at the start of the game. The player has 6 chances to guess the word.

Guesses that are not valid words are rejected and do not count toward the guess total.

After each (valid) guess, the word appears with each letter colored as follows:

  1. If a letter is in the correct position, it is colored green.
  2. If a letter is in the word but in the wrong position, it is colored yellow (but see the example below).
  3. If a letter does not appear in the word, it is colored gray.

The rules given above aren’t quite precise. See this example from the New York Times’s implementation that shows four guesses.

Wordle example

The first guess was LEAST. From the colors, we know that the word contains an L, an E, and a T, but not in those locations. We also know that the word doesn’t contain an A.

The second guess was ELATE. This was a bad guess because it contained an A, and we knew from the first guess that there were no As. We did learn that the E, L, and T also cannot be in those locations. Notice that the second E is gray and not yellow. This is because the secret word contains a single E, and that E has already been accounted for by the yellow E earlier in the guess.

The third guess was TITLE. This was also a bad guess: we already knew that E cannot appear at the end of the word, because otherwise the second guess would have shown a green E at the end and a gray E at the beginning. We can see that the T is in the right place but the E and L are not.

The fourth and final guess was HOTEL. This is the correct word and so all of the letters are colored green.

Your task

From inside the assignment repository, run cargo new wordle to create a new application just as you did for the guessing game in Lab 3.

Using git, move the words.rs file from the root of the assignment repo into the wordle/src/ directory. You can do this via

$ git mv words.rs wordle/src

You want to use git mv rather than mv because git mv tells Git that the file has been moved to a new location. If you just use mv, then Git will see that a file has been deleted and that there’s a new, untracked file. git mv is just easier.

Add the line

mod words;

to the top of main.rs. Doing so gives you access to the two public functions in words.rs: words::random_word() and words::is_word_valid(). The first uses the rand crate to pick a random word and the second returns true if the argument is a valid word.

Tip

In order to use the rand crate, you’ll need to add rand as a dependency. In Lab 3, we did this by editing the Cargo.toml file by hand. There’s actually an easier way of doing it using cargo.

$ cargo add rand

This will add the most recent version of rand as a dependency.

Note

Notice that the argument to the is_word_valid() function is a &str and not a String.

/// Returns `true` if the passed word is a valid 5-letter guess.
pub fn is_word_valid(word: &str) -> bool {
    let word = word.to_uppercase();
    WORD_LIST.contains(&word.as_str()) || ALLOWED_LIST.contains(&word.as_str())
}

As we saw in Lab 3, the &str type is a way to let us pass a reference to a string without having to make a copy of the string to pass to the function.

The is_word_valid() function doesn’t deal with the newline character that you may have read from stdin. You can use the trim() method to return a reference to the portion of the string not containing leading and trailing whitespace characters.

is_word_valid(guess.trim())
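
For example, trimming removes the trailing newline that reading a line from stdin leaves in place (a tiny illustration, not lab code).

#![allow(unused)]
fn main() {
let guess = String::from("crane\n"); // What a line read from stdin might look like.
assert_eq!(guess.trim(), "crane");   // trim() returns a &str without the whitespace.
}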

At a high level, your task is as follows.

  1. Select a secret word using words::random_word().
  2. If the user has made fewer than 6 valid guesses, prompt the user to enter a guess; otherwise, print a message that the player failed to guess the secret word along with what the secret word was.
  3. If the guessed word is not a valid guess, go back to step 2.
  4. Color the guessed word.
  5. Print out all of the (colored) guesses so far.
  6. If the guessed word is correct, stop, otherwise go back to step 2.

Example

Here’s an example run of the game.

Enter your guess: stern
STERN
Enter your guess: logic
STERN LOGIC
Enter your guess: reset
STERN LOGIC RESET
Enter your guess: terse
STERN LOGIC RESET TERSE

Step 2, prompting for input, is a little tricky. We want to print out the prompt without a newline. Rust will, by default, buffer characters written to stdout until it sees a newline character, so we’ll need to print the prompt (without a newline) and then ask Rust to flush its buffer, which will cause the prompt to be printed. Give this a shot:

#![allow(unused)]
fn main() {
// Put this line near the top of `main.rs`, under your `mod words;` line
use std::io::Write;

print!("Prompt text: "); // Notice print!(), not println!().
std::io::stdout().flush().unwrap();
}

Note

std::io::Write is a trait. Traits are similar to Java interfaces in that they specify functions that implementing types must provide. Unlike in Java, to use the functions defined in a trait, you have to first inform the compiler about the trait using a use statement.

In this case, the Write trait defines a flush() function which returns a Result. Note that we call unwrap() on the result so that if it is an Err(err), the program will exit.

Click on the link at the beginning of this note to see what other functions the Write trait defines.

The remaining tricky parts are in step 4, coloring the guessed word. There are two parts to this, (a) figuring out which colors to apply; and then (b) actually creating colored text.

Let’s start with (b). We’re going to use the colored crate to create colored text. Take a look at the documentation. You’ll need to add colored as a dependency just as you did for rand.

As the documentation says, you need to add the line

use colored::Colorize;

with the other use statements in main.rs. The Colorize trait gives you access to some functions you can call on strings to produce colored strings.

You will need to color individual characters. Unfortunately, the char type does not implement Colorize and so it doesn’t have access to the Colorize functions like black() and on_green(). So let’s write a function that takes a char and returns a String containing the character colored with black text on a green background.

/// Colors `ch` black on green.
fn green(ch: char) -> String {
    ch.to_string()
      .black()
      .on_green()
      .to_string()
}

The first line, ch.to_string() converts the char into a String so that the colored functions can operate on it. The .black() function returns a new type representing a string of colored text. The .on_green() function returns another of the colored strings but with a green background. And finally, the second .to_string() converts the colored string into a normal String, but one containing special control characters which instruct the terminal to color the text appropriately.

Note

Strings don’t have any notion of colors. They’re little more than a mutable collection of chars (except they’re encoded using UTF-8). So how does colored text work? Well, your terminal emulator (the program that pretends to be a 1980s era hardware video terminal) reads all of the text written to standard out (and standard error) and looks for control characters. Some of those control characters let you change the color of the text or background in a limited fashion.

For example, run this in your terminal (but not in the VS Code terminal which seems to mess the colors up when both foreground and background colors are present).

$ echo -e '\033[36;45;1mHello!\033[0m Goodbye!'

The -e argument tells echo to interpret escape characters. In this case, \033 is the escape sequence for the ASCII esc character which has decimal value 27 (hex 0x1B and octal 033). The [36;45;1m says use a cyan foreground color (the 36), a magenta background color (the 45), and make it bold (the 1). The semicolons separate the arguments and the m ends them. Take a look at Wikipedia for the supported colors.

We can do the same thing in Rust without using the colored crate.

#![allow(unused)]
fn main() {
println!("\x1B[36;45;1mHello!\x1B[0m Goodbye!");
}

Notice that I specified the esc character in hexadecimal \x1B rather than octal. (If you click the Run button, you won’t see colored text because HTML uses a completely different way to select colors. View the source code for this webpage to see how I did it for the Wordle example above, if you’re curious.)

I recommend writing similar functions yellow() and gray() which produce black on yellow and black on white. After you do that, you can combine the strings to produce a multi-colored string. Here’s an example which prints ABC but in colors.

let mut s = String::new();
s.push_str(&green('A'));
s.push_str(&yellow('B'));
s.push_str(&gray('C'));
println!("{s}");

There is no on_gray() function. You’ll want to use on_white() in your implementation of gray() instead.
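
If it helps, here’s one possible shape for those two helpers, following the same pattern as green() (a sketch, not the only way to write them).

/// Colors `ch` black on yellow.
fn yellow(ch: char) -> String {
    ch.to_string()
      .black()
      .on_yellow()
      .to_string()
}

/// Colors `ch` black on white (standing in for gray).
fn gray(ch: char) -> String {
    ch.to_string()
      .black()
      .on_white()
      .to_string()
}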

Notice the & being used in the example above? That’s because the String function push_str() takes a &str as an argument (that is, a reference to a string), but green() and the others return a String. So just like with words::is_word_valid(), we need to use & to get a reference.

Okay, so you can get a secret word, you can prompt for input, and you can color text. So all that remains is to decide which letters for a guess should get which colors.

Here was my approach. I wrote a function

#![allow(unused)]
fn main() {
fn color_guess(guess: &str, secret: &str) -> String {
    todo!()
}
}

that takes a reference to the guess and a reference to the secret and it returns a String that is appropriately colored.

Inside color_guess, I created a mutable Vec<char> (which will hold letters from the secret that don’t match the guess) and added each letter of secret to it.

Tip

Remember, to access the individual letters of a string, we use the .chars() function which returns an iterator. E.g.,

#![allow(unused)]
fn main() {
let secret = "LEAST"; // Example secret.
for ch in secret.chars() {
    println!("{ch}");
}
}

will print each character in secret on its own line.

We can also use the .next() function on the iterator directly rather than using it in a loop.

#![allow(unused)]
fn main() {
let s = "ABC";
let mut iterator = s.chars();

iterator.next(); // returns Some('A')
iterator.next(); // returns Some('B')
iterator.next(); // returns Some('C')
iterator.next(); // returns None
}

You can use the .unwrap() function to turn a Some(x) into x.

My color_guess() creates a mutable vector and adds all the characters of secret to it. Next, it iterates over both guess and secret and, for each letter in guess that is in the correct place in secret, removes one instance of that character from the vector of unmatched characters. You can use the remove() method to remove the element at a specified index from a vector.

#![allow(unused)]
fn main() {
let guess = "BLAST"; // Example guess.
let secret = "STERN"; // Example secret.
let mut secret_iter = secret.chars();
for ch in guess.chars() {
    if ch == secret_iter.next().unwrap() {
        // Remove `ch` from the vector of unmatched characters
    }
}
}
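
The comment above says to remove ch from the vector of unmatched characters. Here’s a small sketch of one way to do that, using position() to find an index and remove() to delete it (just an illustration; your approach may differ).

#![allow(unused)]
fn main() {
let mut unmatched = vec!['S', 'T', 'E', 'R', 'N'];

// Find the index of one 'E' (if any) and remove that single instance.
if let Some(index) = unmatched.iter().position(|&c| c == 'E') {
    unmatched.remove(index);
}
assert_eq!(unmatched, vec!['S', 'T', 'R', 'N']);
}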

Finally, we want to iterate over both guess and secret a second time and construct our colored guess string. If the character of guess matches the character in the corresponding position of secret, then color the character green and append it to the result string.

If the character of guess does not match the corresponding character of secret, then we need to color it either yellow or gray. Iterate through the unmatched character vector and if the character from guess matches any of those, color the character yellow in the result string and remove the character from the vector of unmatched characters. Otherwise, color it gray.

That was a lot of work! Maybe play a few games.

Part 2. Head (10 points)

In the next two parts of the lab, you’re going to implement the head command line utility in Rust. This will be divided into two tasks: command line argument parsing (this part) and the implementation of head itself (Part 3).

Before you begin, read the man page for head. While you’re reading it, pay attention to the command line arguments and to the cases in which head reads its input from files versus from stdin.

There are many versions of head. You’ll be implementing the FreeBSD version which is the version described in the man page above. The GNU project has another version which is used in most Linux distributions. In general, the GNU versions of command-line utilities tend to have more options. If you run $ man head on the lab machines, you’ll get a manual for the GNU version of head.

We’re going to use the Command Line Argument Parser library, clap, to handle command-line argument parsing for us. As we’ll see, Rust makes supporting command-line arguments, including a nicely formatted help message, very easy.

Example

Let’s look at an example.

extern crate clap;
use std::path::PathBuf;

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about = "Example command line parsing", long_about = None)]
struct Config {
    /// Use colored output
    #[arg(short, long)]
    color: bool,

    /// Use at most COUNT values
    #[arg(short = 'C', long)]
    count: Option<i32>,

    /// Specify optional log file
    #[arg(short = 'f', long, conflicts_with = "count")]
    log_file: Option<PathBuf>,

    /// Input files
    #[arg(default_value = "-")]
    files: Vec<PathBuf>,
}

fn main() {
    let config = Config::parse();
    println!("{config:#?}");
}

There’s a lot going on here! At the top of the code, there are two use statements which inform the compiler about PathBuf from the standard library and Parser from clap.

Next we have a struct named Config. This structure defines the supported command line arguments of this example program. The main function calls Config::parse(), which parses all of the command line arguments and, crucially, returns an instance of Config with the structure’s members filled in. We can access the members as config.color, config.count, config.log_file, and config.files.

The println!("{config:#?}") line will pretty-print the debug representation of config. The format specifier {config:?} (with the colon followed by a question mark) means to print the debug representation and the # in {config:#?} means to pretty-print it with new lines.
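
Here’s a tiny illustration of the difference using a made-up Point struct (not part of the lab).

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{p:?}");  // One line: Point { x: 1, y: 2 }
    println!("{p:#?}"); // Pretty-printed across several lines.
}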

All good command line utilities have a --help option or similar. Before we look at this code in detail, here’s the output of running this program with the --help option which just prints out the help message and then exits the program.

$ cargo run -- --help
Example command line parsing

Usage: ex [OPTIONS] [FILES]...

Arguments:
  [FILES]...  Input files [default: -]

Options:
  -c, --color                Use colored output
  -C, --count <COUNT>        Use at most COUNT values
  -f, --log-file <LOG_FILE>  Specify optional log file
  -h, --help                 Print help
  -V, --version              Print version

This code defines three options, each of which has a short and a long form. Two of the options take arguments and two of the options conflict with one another meaning at most one of those options can be specified at a time. It also takes a list of file paths. Each file path is represented by a PathBuf in Rust.

Because the final line of code prints out the Config instance that was returned from Config::parse(), we can run the program a few times with different options to see the results. Note that in what follows, I’m passing the --quiet option to cargo run because I don’t want cargo to print its own output, like the “Finished” and “Running” lines. As usual, the arguments before the -- go to cargo run; the arguments after the -- go to the program itself.

$ cargo run --quiet --
Config {
    color: false,
    count: None,
    log_file: None,
    files: [
        "-",
    ],
}

$ cargo run --quiet -- --color -f log.txt input1.txt input2.txt
Config {
    color: true,
    count: None,
    log_file: Some(
        "log.txt",
    ),
    files: [
        "input1.txt",
        "input2.txt",
    ],
}

$ cargo run --quiet -- --version  
ex 0.1.0

$  cargo run --quiet -- -f log.txt -C 10
error: the argument '--log-file <LOG_FILE>' cannot be used with '--count <COUNT>'

Usage: ex --log-file <LOG_FILE> [FILES]...

For more information, try '--help'.

The definition of Config starts by deriving the clap::Parser and Debug traits. The Parser trait is what provides the parse() function which reads the command-line arguments and returns an instance of Config.

The next line

#[command(author, version, about = "Example command line parsing", long_about = None)]

is new. It is attaching metadata to the Config struct that the clap library will use to do all of the argument parsing and producing usage and error messages. The about = "..." lets us specify a short about message to appear in the --help message.

Each of the arguments consists of three parts.

    /// Use colored output
    #[arg(short, long)]
    color: bool,

The comment with three slashes is a doc comment. Normally, such a comment is used to write documentation. (Running cargo doc produces HTML documentation of your code. The doc comments are where the documentation comes from.) Clap is repurposing that doc comment to construct a --help message.

The next line, #[arg(short, long)], is attaching some more metadata for Clap. It says this should have a short option -c and a long option --color. It took both option names from the name of the variable color itself. Note that we can be explicit about what the option names are. There are a variety of other things we can do, including making options conflict. If we try to pass both -f and -C, we’ll get an error from Clap.

The third line is the variable and its type, of course. Because the type of color is bool, this acts as an on/off flag. For other types, using an Option<...> indicates that the argument is optional. If you want to allow 0 or more, you use a Vec rather than an Option as I did for files.

That’s a lot of information to take in! Fortunately, I’ve got a simple recipe to follow for parsing command line arguments:

Tip

Start by defining a new struct that derives Debug and Parser.

#[derive(Parser, Debug)]
#[command(author, version, about = "Short description", long_about = None)]
struct Config {
}

Now, for each option, modify one of the examples below and remember, you can change the short or long options by using short = 'c' or long = "blah", for example.

  • If you want a boolean flag, use a bool
      /// Boolean flag, defaults to false
      #[arg(short, long)]
      flag: bool,
  • If you want an option that takes an argument, use an Option
      /// Option that takes a String argument
      #[arg(short, long)]
      name: Option<String>,
  • If you want a path to a file, use an Option<PathBuf>
      /// Option that takes a file path
      #[arg(short, long)]
      path: Option<PathBuf>,
  • If you want a list of input files, don’t use short or long but do use Vec<PathBuf>; if you want to specify a default value, use default_value
      /// Input files but if none are specified, use -
      #[arg(default_value = "-")]
      files: Vec<PathBuf>,
  • If you want to specify that two options conflict, mark one of them (it doesn’t matter which) as conflicts_with the other
      /// An option that takes a path
      #[arg(short, long)]
      log: Option<PathBuf>,
    
      /// A flag that conflicts with option `--log`
      #[arg(short, long, conflicts_with = "log")]
      output: bool,

Finally, add a main function that prints out the debug representation. (Click the Run button to run this or the Show button to show the whole example.)

// You don't need the `extern crate clap` line. That's just to make this runnable on the web.
extern crate clap;
use std::path::PathBuf;

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about = "Short description", long_about = None)]
struct Config {

    /// Boolean flag, defaults to false
    #[arg(short, long)]
    flag: bool,

    /// Option that takes a String argument
    #[arg(short, long)]
    name: Option<String>,

    /// Option that takes a file path
    #[arg(short, long)]
    path: Option<PathBuf>,

    /// Input files but if none are specified, use -
    #[arg(default_value = "-")]
    files: Vec<PathBuf>,

    /// An option that takes a path
    #[arg(short, long)]
    log: Option<PathBuf>,

    /// A flag that conflicts with option `--log`
    #[arg(short, long, conflicts_with = "log")]
    output: bool,
}

fn main() {
    let config = Config::parse();
    println!("{config:#?}");
}

The Clap tutorial has a lot more information.

Your task

Before you start, you need to create a new project inside your assignment repository and add the clap library.

$ cargo new head
$ cd head
$ cargo add clap --features derive

Your task is to implement parsing the head command line arguments and printing out a --help message. Your head will support the following options.

  • -c/--bytes;
  • -n/--lines;
  • -h/--help; and
  • -V/--version.

The --help and --version options are handled by clap automatically. Use clap’s conflicts_with mechanism for marking --bytes and --lines as conflicting. See the tip above.

Additionally, head takes 0 or more files as input. Configure clap such that if no files are passed, then it defaults to -. See the tip above for how to set default values.

Note

Many command line utilities use - in place of a file path to denote reading from stdin instead of the file or writing to stdout instead of writing to the file. Our head will support that convention.

Once you have this working, you should be able to run head --help and see this.

$ cargo run --quiet -- --help
Display the first few lines of a file

Usage: head [OPTIONS] [FILES]...

Arguments:
  [FILES]...  Input files or - for stdin [default: -]

Options:
  -n, --lines <LINES>  Print LINES lines of each of the specified files
  -c, --bytes <BYTES>  Print BYTES bytes of each of the specified files
  -h, --help           Print help
  -V, --version        Print version

The version number printed by --version can be set by editing the version field in Cargo.toml. Go ahead and set the version number to 1.2.3 so that you get this output.

$ cargo run --quiet -- --version
head 1.2.3

Part 3. More head (30 points)

In this part, you’re going to finish your implementation of head.

You should re-read the man page but this time focus on behavior. How does head behave when no files are passed to it? How does head behave when one file is passed to it? How does head behave when multiple files are passed to it? What exit code (called the exit status in the man page) does it return on success? What about on error?

One of the key things to notice about head is that sometimes it reads from stdin and sometimes it reads from a file. But in either case, the operation it should perform is the same: print out some number of lines or bytes.

What we would like is to be able to write a function like

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: ???, count: usize) {
    todo!();
}

and a similar print_bytes() function. But what type should we give input?

If we think about what operations we want to perform on input, that can help guide us. For print_lines(), we want to be able to read one line at a time from input. For print_bytes(), we just want to read some number of bytes. It turns out that what we want is a std::io::BufReader. Or more precisely, we want anything that implements the std::io::BufRead trait, which BufReader does.

Click on the documentation for BufRead and take a look at what methods are available. In particular, the read_line() method will read characters into a String until it hits a newline character. The read_until() method is similar except it reads bytes into a Vec<u8> until it encounters the byte you specify. For a print_lines() function, you’ll likely want to use one of read_line() or read_until() (or experiment with the lines() method!).

For a print_bytes() function, none of these methods seem appropriate. Fortunately, any type that implements the BufRead trait also implements the std::io::Read trait which has a read() method.

Example

Here’s an example of opening a file, wrapping it in a BufReader, and then reading from it using read_line(), read_until(), and read().

use std::fs::File;
use std::io::{self, BufReader, BufRead, Read, Write};

fn read_some() -> io::Result<()> {
    let file = File::open("/tmp/example.txt")?;
    let mut reader = BufReader::new(file);

    // Read a line into a string.
    let mut line = String::new();
    let length = reader.read_line(&mut line)?;

    if length == 0 {
        // End of file.
        return Ok(());
    }
    print!("First line: {line}");

    // Read a line as bytes into a `Vec<u8>`.
    let mut line = Vec::new();
    let length = reader.read_until(b'\n', &mut line)?;

    if length == 0 {
        return Ok(());
    }
    print!("Second line: ");
    io::stdout().write_all(&line)?;

    // Read at most 100 bytes into a `Vec<u8>`.
    let mut bytes = Vec::new();
    bytes.resize(100, 0); // Fill it with 100 zero bytes.
    let length = reader.read(&mut bytes)?;

    if length == 0 {
        return Ok(());
    }
    print!("Next {length} bytes: ");
    io::stdout().write_all(&bytes[..length])?;

    Ok(())
}

fn main() -> io::Result<()> {
    const EXAMPLE: &str = "Rust is fun!\nIts mascot is a 🦀.\nIt's also kind of a lot…\n";
    let mut file = File::create("/tmp/example.txt")?;
    file.write_all(EXAMPLE.as_bytes())?;
    read_some()?;
    std::fs::remove_file("/tmp/example.txt")?;
    Ok(())
}

Click the Run button to see the result.

Note that we needed to have a use statement for std::io::Write in order to call the write_all() method, which is defined by the Write trait.

Tip

If you use

#![allow(unused)]
fn main() {
use std::io;
}

or include self in the list of items you use from std::io, like

#![allow(unused)]
fn main() {
use std::io::{self, BufReader};
}

then you can refer to everything in std::io using just the io prefix. The previous example did this for io::stdout() and io::Result<()>.

We can wrap a File in a BufReader and we can also wrap the result of std::io::stdin() (which is a struct called, appropriately enough std::io::Stdin) in a BufReader. But there’s one hitch: the two types of BufReader are different.

Just as Vec is parameterized by the type of element it holds—e.g., Vec<i32> and Vec<String>—a BufReader is parameterized by the underlying type it’s reading from. The code snippet below is explicit about the types of the two readers. They’re simply not the same.

use std::io::{self, Stdin, BufReader};
use std::fs::File;
fn main() -> io::Result<()> {
    let file_reader: BufReader<File> = BufReader::new(File::open("example.txt")?);
    let stdin_reader: BufReader<Stdin> = BufReader::new(io::stdin());
  Ok(())
}

Fortunately, Rust gives us a way to deal with this! We’re going to wrap these readers in a particular kind of Box. Recall that values in Boxes are stored on the heap rather than on the stack. If we wrapped these types directly in a Box, i.e., Box<BufReader<File>> and Box<BufReader<Stdin>>, we’d still have different types. Instead, we want to use a Box<dyn BufRead>. By using the dyn keyword followed by a trait, we’re saying that the value stored in the Box is some type that implements the trait—in this case, BufRead.

Example

Let’s create a function that takes a &Path and opens the file for reading. As mentioned previously, many command line utilities treat an input file path of - to mean stdin and we’re going to do the same.

use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

fn main () {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
}

Let’s revisit the print_lines() function.

#![allow(unused)]
fn main() {
use std::io::{self, BufRead};
/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
   let mut line = String::new();
   input.read_line(&mut line)?;
   print!("{line}");
   todo!()
}
}

We finally have a type for input! It’s a mutable reference to some type of object that implements BufRead. And notice print_lines() now returns an std::io::Result<()> so we can use ? to propagate errors.

We can now combine this print_lines() with the example code for opening a file.

use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}

/// Print `count` lines of `input` to `stdout`.
fn print_lines(input: &mut dyn BufRead, count: usize) -> io::Result<()> {
   todo!()
}

fn main () {
    let path = PathBuf::from("example.txt");
    let mut reader: Box<dyn BufRead> = match open_input(&path) {
        Ok(reader) => reader,
        Err(err) => {
            eprintln!("{}: {err}", path.display());
            std::process::exit(1);
        }
    };
    if let Err(err) = print_lines(&mut reader, 10) {
        eprintln!("{err}");
    }
}

Your task

Your task is to finish implementing head. Your implementation must have the following features.

  • It must support all of the options described in Part 2.
  • If neither --bytes nor --lines (nor their short forms) is passed, then default to printing the first 10 lines of each file or stdin.
  • If no files are passed as arguments, read from stdin, otherwise read from each of the files (see the hints below).
  • If - is given as a file, read from stdin in its place. So if
    $ cargo run -- -n 3 foo.txt - bar.txt
    
    is run, it should print the first three lines from foo.txt, then the first three lines from stdin, followed by the first three lines of bar.txt.
  • If multiple files are passed, each should start with a header line. A blank line should separate each file’s output from the next file’s header line (the last file has no trailing blank line). See the example below.
  • If a file cannot be opened, print a message to stderr and continue. The example code above shows how to write a message in the proper format to stderr using eprintln!() if open_input() fails. Your code should do the same thing except that rather than exiting immediately by calling std::process::exit(1), your code should simply remember that there was an error and, after all input files have been processed, exit with the value 1 if there was an error.

Example

Here is some sample output showing multiple files, including stdin.

$ echo "hi there!" | cargo run -- -n 3 foo.txt - bar.txt
==> foo.txt <==
First line of foo.txt
Second line of foo.txt
Third line of foo.txt

==> standard input <==
hi there!

==> bar.txt <==
First line of bar.txt
Second line of bar.txt
Third line of bar.txt

Here’s an example when there’s an error with an input file.

$ cargo run -- -n 1 foo.txt no-such-file.txt bar.txt
==> foo.txt <==
First line of foo.txt
no-such-file.txt: No such file or directory (os error 2)

==> bar.txt <==
First line of bar.txt

Play around with the real head to see other examples.

If you’ve made it this far and you’ve implemented everything, you’re just about done! Just gotta submit it. Make sure you test that your code works correctly with 0, 1, and multiple files. Make sure it supports - as reading from stdin. Make sure when you have multiple files, it prints the header and the blank line. You can always test your behavior against the behavior of the real head.

If you’d like to make one small improvement to your implementation, read on. Otherwise, continue to the submission instructions.

Info

The command line utilities were designed when C was the dominant programming language. C’s notion of strings is wildly different from Rust’s notion of strings. In particular, C strings are just sequences of non-zero bytes followed by a zero byte. There are no other restrictions on the contents of a C string. In contrast, Rust strings are required to be valid, UTF-8 encoded strings of Unicode characters. These two different views of strings have two consequences for us.

  1. The real head utility works fine even if the files it’s operating on don’t contain UTF-8 encoded data. It simply separates the input into lines by detecting the newline character '\n'.
  2. The BufRead::read_line() function will fail if the input is not valid UTF-8.

If you would like to match the real head behavior, you may use the BufRead::read_until() function. An example of doing this was given in the read_some() function in an example above.
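
Here’s a minimal sketch of that idea (not the required implementation): read one newline-terminated chunk as raw bytes and write it back out, never requiring valid UTF-8. The Cursor is just a stand-in for a reader.

use std::io::{self, BufRead, Write};

/// Echo one newline-terminated chunk of raw bytes from `input` to stdout.
fn echo_raw_line(input: &mut dyn BufRead) -> io::Result<()> {
    let mut bytes = Vec::new();
    let length = input.read_until(b'\n', &mut bytes)?;
    if length > 0 {
        io::stdout().write_all(&bytes)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // A Cursor lets us treat an in-memory byte buffer as a reader.
    let mut input = io::Cursor::new(b"these bytes need not be UTF-8\n".to_vec());
    echo_raw_line(&mut input)
}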

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── head
│   ├── Cargo.lock
│   ├── Cargo.toml
│   └── src
│       └── main.rs
└── wordle
    ├── Cargo.lock
    ├── Cargo.toml
    └── src
        ├── main.rs
        └── words.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Lab 5: Gutensearch

Due: 2024-10-14 at 23:59

In this lab, you’re going to build a simple command-line search interface to Project Gutenberg. You’re going to read an index from a file and create objects representing each entry.

In this lab, you’ll learn

  • how to create a generic Result<T> type alias which you can use to return multiple types of errors from a function;
  • how to use structs and enums to structure related data;
  • how to implement methods for structs and enums; and
  • how to implement a trait.

In the first part of this lab (which contains a lot of text to read but very little actual coding), you’ll learn a generic method for handling errors in functions which we’ll use for the rest of the semester.

In the subsequent parts, you’re going to implement an interactive search interface. When you run your completed program, you should see

$ cargo run --quiet
Welcome to Gutensearch! The index contains 71329 entries.
Example queries to search titles, authors, or both:
  > title: two cities
  > author: wodehouse
  > shelley

Enter a search query (Ctrl-D to exit)
> 

and then you’ll be able to enter search queries and get back nicely formatted entries. E.g., here’s one of the entries for the query: title: two cities:

Author: Dickens, Charles
Title: A Tale of Two Cities
URL: https://www.gutenberg.org/ebooks/98
Resources:
- text: https://www.gutenberg.org/ebooks/98.html.images
- text: https://www.gutenberg.org/files/98/98-h/98-h.htm
- text: https://www.gutenberg.org/files/98/98-h.zip
- epub: https://www.gutenberg.org/ebooks/98.epub3.images
- epub: https://www.gutenberg.org/ebooks/98.epub.images
- epub: https://www.gutenberg.org/ebooks/98.epub.noimages
- text: https://www.gutenberg.org/files/98/98-0.txt
- text: https://www.gutenberg.org/files/98/98-0.zip

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1: Getting a result (5 points)

In this part, you’re going to implement a simple method for returning different types of errors from a single function.

Recall that in Rust, we typically indicate that a function may succeed and return a value or fail and return an error using the standard type Result<T, E>. For example, we might have the following code.

Example

fn safe_divide(dividend: i32, divisor: i32) -> Result<i32, String> {
    if divisor == 0 {
        Err(format!("Cannot divide {dividend} by zero"))
    } else {
        Ok(dividend / divisor)
    }
}

fn main() {
    println!("{:?}", safe_divide(25, -7));
    println!("{:?}", safe_divide(25, 0));
}

If divisor is 0, then it returns an error (click Run to see the results). Otherwise it returns the quotient.

This works really well! Unfortunately, it has one drawback: we can only return a single type of error. In this case, we’re returning a String as the error. The functions that perform input/output (e.g., reading from a file) use a special type, std::io::Error, to indicate an error.

So the question becomes, how can we return multiple different types of errors from a single function?

There are multiple ways to return different types of errors from a function. One approach is to pick an error type for the Result that can hold different types that share a common interface (i.e., implement some trait). In Lab 4, we encountered a similar situation. We needed to be able to say that we had some type that implemented the BufRead trait, but we didn’t want to be specific about exactly which type. We used a Box.

We’re going to do the same thing here. The type we want to return is a

Result<T, Box<dyn std::error::Error>>

An object of this type is either an Ok(T) where T is some concrete type like i32 or String (the success case) or it’s an Err(Box<dyn std::error::Error>) which is just some object on the heap that implements the std::error::Error trait (the error case). All of the error types in the Rust standard library implement this trait.

Returning to our example, we will need to do more than replace Result<i32, String> with Result<i32, Box<dyn std::error::Error>>, but not much more!

Bug

fn safe_divide(dividend: i32, divisor: i32) -> Result<i32, Box<dyn std::error::Error>> {
    if divisor == 0 {
        Err(format!("Cannot divide {dividend} by zero"))
    } else {
        Ok(dividend / divisor)
    }
}

fn main() {
    println!("{:?}", safe_divide(25, -7));
    println!("{:?}", safe_divide(25, 0));
}

Click the Run button to see the error.

As the error message states, “expected Box<dyn Error>, found String”. The issue is that a String is not a boxed error. Fortunately, Rust knows how to convert it into one using the .into() method:

Err(format!("Cannot divide {dividend} by zero").into())

Tip

There are a pair of traits From<T> and Into<T> which define .from() and .into() respectively. These functions are used to convert from some type T or into some type T. We’ll encounter these functions fairly frequently.

In this case, Box<dyn Error> implements the From<String> trait and String implements Into<Box<dyn Error>>. (One typically only implements the From trait because the standard library provides the matching Into implementation automatically.)
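
For instance, the standard library implements From<i32> for i64, so both of these conversions work.

#![allow(unused)]
fn main() {
let small: i32 = 5;

let via_from: i64 = i64::from(small); // Uses From<i32> for i64.
let via_into: i64 = small.into();     // Uses the Into provided automatically.

assert_eq!(via_from, via_into);
}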

The previous example didn’t showcase returning multiple possible errors. So let’s examine one more example that will try to read an integer from a text file.

Example

Reading an integer from a file can fail in multiple ways: (1) The file might not exist or be readable; (2) the file might not contain valid UTF-8 text; or (3) the contents of the file might not constitute a valid integer.

#![allow(unused)]
fn main() {
use std::fs;

fn read_int_from_file() -> Result<i32, Box<dyn std::error::Error>> {
    let num = fs::read_to_string("/some/file.txt")? // cases 1 and 2
        .trim()
        .parse()?; // case 3
    Ok(num)
}
}

If the file cannot be read or if the contents aren’t valid UTF-8, fs::read_to_string() will return an error of type std::io::Error as mentioned above. The ? will unwrap the result if there is no error. Otherwise, it’ll return the error from the read_int_from_file() function. In fact, it’s slightly better than that: it’ll call .into() on the error for us! This is super important because otherwise we’d get an error about how an std::io::Error is not the same as a Box<dyn std::error::Error>. By calling .into(), Rust knows to box the error up on the heap, just as it did with the String above.

If the contents of the file aren’t a valid integer (after trimming off whitespace), then the .parse() method will return a std::num::ParseIntError. This time, the ? will unwrap an Ok(i32) or convert the Err(std::num::ParseIntError) into the appropriately boxed type.
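
Roughly speaking, each ? behaves like the following match (a sketch of the idea, not the exact code the compiler generates).

#![allow(unused)]
fn main() {
use std::fs;

fn read_int_from_file() -> Result<i32, Box<dyn std::error::Error>> {
    // What `fs::read_to_string("/some/file.txt")?` does for us, approximately:
    let contents = match fs::read_to_string("/some/file.txt") {
        Ok(text) => text,
        Err(err) => return Err(err.into()), // Convert and return the error.
    };
    let num = contents.trim().parse()?; // Same idea for the ParseIntError.
    Ok(num)
}
}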

Your task

You’re going to define type alias for Result<T, Box<dyn std::error::Error>> which will save on typing. We’ll be using this type alias (or a similar one) in future labs!

A type alias is a way to give an existing type a second (usually shorter) name. An alias and its underlying type are the same in all respects. This is purely a way to make your code easier to read and write.
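
For example, here’s a made-up alias just to illustrate the mechanism.

#![allow(unused)]
fn main() {
type Meters = f64; // `Meters` and `f64` are the exact same type.

fn double_distance(d: Meters) -> Meters {
    d * 2.0
}

let d: f64 = double_distance(21.0); // An f64 works anywhere Meters does.
}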

Inside your assignment repo, run

$ cargo init --name gutensearch

At the top of main.rs, add the line

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
}

Okay, but what did this accomplish? You’ve just defined a new generic type alias Result<T> which is exactly the same thing as the Result<T, Box<dyn std::error::Error>> we used above. Here are the two functions from the examples above using this type alias.

Example

use std::fs;

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn safe_divide(dividend: i32, divisor: i32) -> Result<i32> {
    if divisor == 0 {
        Err(format!("Cannot divide {dividend} by zero").into())
    } else {
        Ok(dividend / divisor)
    }
}

fn read_int_from_file() -> Result<i32> {
    let num = fs::read_to_string("/some/file.txt")? // cases 1 and 2
        .trim()
        .parse()?; // case 3
    Ok(num)
}

fn main() -> Result<()> {
    let divisor = read_int_from_file()?;
    let quotient = safe_divide(25, divisor)?;

    println!("Dividing 25 by {divisor} has quotient {quotient}");
    Ok(())
}

Note that you can have main return a Result; if main returns an Err, Rust will print the error and exit with a failure status. Since there's no real return value, we return () wrapped in an Ok() to indicate success. The error message this produces isn't terribly friendly, though. Click Run to see what this looks like.

As you can see from the previous example, the error message we get when we return an error from main isn’t great. Let’s not do that. Instead, define a new function, run(), that will return a Result<()> that we can check in main. On an error, we can print the error message to stderr and then exit with return value 1.

Your code should look something like this now.

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn run() -> Result<()> {
    todo!("Code for the next part goes here.")
}

fn main() {
    if let Err(err) = run() {
        eprintln!("{err}");
        std::process::exit(1);
    }
}

You’re done with this part, so continue to Part 2.

Part 2. Reading the index (5 points)

In this part of the lab, you’re going to read in the index file and create an instance of an Entry structure for each entry in the index.

There are many common formats for exchanging data. One of the simplest is tab-separated values. In this format, each line in the file is a multi-field record. Each field is separated from the previous one in the line by a tab character (we specify a tab character in code using the escape sequence \t, similar to how \n is a newline character).

For example,

first field	second field	third field
XXX	YYY	ZZZ

shows a file with two records, each of which has three fields.

The Project Gutenberg index is a tab-separated values file. Each line of the index file has the format

authors<tab>title<tab>URL<tab>resource 1<tab>...<tab>resource n

where <tab> means there’s a tab character. The first field is the authors, the second field is the title, the third is the URL for the entry on Project Gutenberg’s website. The remaining fields represent individual resources associated with the entry (resources include .epub files, text files, and audio files).

Here’s one example,

Roosevelt, Wyn; Schneider, S.	The Frontier Boys in the Sierras; Or, The Lost Mine	https://www.gutenberg.org/ebooks/32253	text https://www.gutenberg.org/ebooks/32253.html.images	text https://www.gutenberg.org/files/32253/32253-h/32253-h.htm	text https://www.gutenberg.org/files/32253/32253-h.zip	epub https://www.gutenberg.org/ebooks/32253.epub3.images	epub https://www.gutenberg.org/ebooks/32253.epub.images	epub https://www.gutenberg.org/ebooks/32253.epub.noimages	text https://www.gutenberg.org/ebooks/32253.txt.utf-8	text https://www.gutenberg.org/files/32253/32253-8.txt	text https://www.gutenberg.org/files/32253/32253-8.zip	text https://www.gutenberg.org/files/32253/32253.txt	text https://www.gutenberg.org/files/32253/32253.zip

Example

Given a line of text from the index file, it’s easy to split it up into its parts by splitting the string on a tab character.

#![allow(unused)]
fn main() {
let line = "Roosevelt, Wyn; Schneider, S.	The Frontier Boys in the Sierras; Or, The Lost Mine	https://www.gutenberg.org/ebooks/32253	text https://www.gutenberg.org/ebooks/32253.html.images	text https://www.gutenberg.org/files/32253/32253-h/32253-h.htm	text https://www.gutenberg.org/files/32253/32253-h.zip	epub https://www.gutenberg.org/ebooks/32253.epub3.images	epub https://www.gutenberg.org/ebooks/32253.epub.images	epub https://www.gutenberg.org/ebooks/32253.epub.noimages	text https://www.gutenberg.org/ebooks/32253.txt.utf-8	text https://www.gutenberg.org/files/32253/32253-8.txt	text https://www.gutenberg.org/files/32253/32253-8.zip	text https://www.gutenberg.org/files/32253/32253.txt	text https://www.gutenberg.org/files/32253/32253.zip";
// Assuming we're reading through the file a line at a time and `line`
// holds the current line of text, we can split the line into parts.
let parts: Vec<&str> = line.split('\t').collect();
println!("Author: {}", parts[0]);
println!("Title: {}", parts[1]);
println!("URL: {}", parts[2]);
println!("{} resources", parts.len() - 3);
}

Click Run.

You’re going to represent each entry using an instance of an Entry structure. So you’re going to have to define the structure.

Each entry has an author field, a title field, a URL field and 0 or more resources associated with it. In fact, it’s possible for the author field to be empty. In this case, Project Gutenberg hasn’t recorded the author or the author is unknown. You’ll have to deal with this. Let’s ignore the resources for the moment. We’ll return to them in a later part.

Since the author field can be empty, let’s model that with an Option<String> so that None refers to the no author case and Some(authors) is the case where there are authors. The other fields should not be empty. This suggests that we define our Entry structure like this.

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
struct Entry {
    author: Option<String>,
    title: String,
    // Other fields here.
}
}

Your task

First, you’ll need to download the index file and decompress it inside your assignment repository

$ gunzip pgindex.txt.gz

Your repo should look like this

repo
├── Cargo.toml
├── pgindex.txt
└── src
    └── main.rs

Do not add this file to your repository as it’s pretty large for a text file at 46 MB.

In main.rs, define a new Entry structure that has fields for author, title, and URL.

In your run() function, open the file pgindex.txt and wrap it in a BufReader. (If you forget how BufReader works, review Lab 4.)

Then read from the reader line by line (the .lines() method is extremely useful here) and parse each line into an Entry. Finally, print out the debug representation of the Entry.

#![allow(unused)]
fn main() {
use std::fs::File;
use std::io::{BufRead, BufReader};
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
#[derive(Debug)]
struct Entry { author: Option<String>, title: String, }
fn run() -> Result<()> {
    let reader = BufReader::new(File::open("pgindex.txt")?);

    for line in reader.lines() {
        // Turn the line from a Result into a &str.
        let line: &str = &line?;
        // TODO: Parse the entry here. Set the author field to None if the
        // author string is empty.
        let author = Some("TODO: get this from line".to_string());
        let title = "TODO: get this from line".to_string();
        // Other fields
        let entry = Entry {
            author,
            title,
            // ...
        };

        println!("{entry:?}");
    }
    Ok(())
}
}

Tip

If you run your code, you're going to get a lot of output, far more than you want. I recommend that while you're testing this part, you replace reader.lines() in the code above with reader.lines().take(5), which will run the body of the loop only for the first 5 lines of the file.

Just don’t forget to remove the .take(5) later.
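If you haven't seen .take() before, here's a tiny standalone illustration: it adapts any iterator so that it yields at most the requested number of items.

fn main() {
    // (1..) is an infinite iterator; .take(5) stops it after five items.
    for n in (1..).take(5) {
        println!("{n}");
    }
}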

When this all works, you should see this output when you run your code.

Entry { author: None, title: "\"Contemptible\", by \"Casualty\"", url: "https://www.gutenberg.org/ebooks/18103" }
Entry { author: None, title: "A Handbook of the Boer War With General Map of South Africa and 18 Sketch Maps and Plans", url: "https://www.gutenberg.org/ebooks/15699" }
Entry { author: None, title: "A Jolly by Josh", url: "https://www.gutenberg.org/ebooks/17499" }
Entry { author: None, title: "Baseball ABC", url: "https://www.gutenberg.org/ebooks/19169" }
Entry { author: None, title: "Bibelen, Det nye Testamente", url: "https://www.gutenberg.org/ebooks/2143" }

Note that the first few entries in the index don’t have authors so it prints None.

Part 3. Implementing some methods (55 points)

At this point, you should have some code that can parse the fields of the index file and extract the author, title, and URL fields and stick them in an Entry.

In this part, you’re going to create an Index structure which will hold all of the Entrys and implement some methods on it to perform a simple search.

Example

Recall that to implement methods for a structure, we use an impl block.

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
struct Example {
    name: String,
    url: Option<String>,
}

impl Example {
    fn new(name: String) -> Self {
        Self {
            name,
            url: None
        }
    }

    fn has_url(&self) -> bool {
        self.url.is_some()
    }

    fn url(&self) -> Option<String> {
        // Return a copy of the url member.
        self.url.clone()
    }

    fn set_url(&mut self, url: &str) {
        self.url = Some(url.to_string());
    }

    fn into_name(self) -> String {
        self.name
    }
}
}

Some things to notice here:

  1. There’s a new() function which returns a Self. Inside an impl block, Self refers to the type being implemented, here Example;
  2. The new() function does not take self, &self, or &mut self as an argument because this isn’t a method you call on an instance of an Example. Instead, you’d call it to create an instance. This is similar to a constructor in Java, although it’s just a convention and not enforced. To call new(), we use Example::new("Blarg".to_string()). It’s good practice to have a new() function (which may or may not take arguments, as the situation demands) that returns one of Self, Option<Self>, or Result<Self, SomeErrorType>. The last two are used when it’s possible for new() to fail.
  3. The has_url() and url() methods take &self as the first argument. This is similar to Python’s self argument and Java’s implicit this argument in methods;
  4. The set_url() method takes &mut self rather than &self. This is necessary because modification to self requires that it be mutable. If you remove the mut from the method signature, it’ll fail to compile; and
  5. The into_name() method takes self rather than &self. This method takes ownership of the Example object (meaning it cannot be used after calling .into_name()) and returns the name field without making a copy (as the url() method does).

Your task

Define a new structure called Index which will hold all of the Entrys you’re going to parse from the index file. You’re going to implement the following methods.

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
struct Entry;
#[derive(Debug, Clone)]
struct Index {
    // TODO: What field(s) do you want here?
}

impl Index {
    /// Create a new `Index` by reading from `pgindex.txt`.
    fn new() -> Result<Self> {
        todo!()
    }

    /// Returns the number of entries in the index.
    fn len(&self) -> usize {
        todo!()
    }

    /// Returns a vector containing references to entries whose titles
    /// contain `search_string`.
    fn title_search(&self, search_string: &str) -> Vec<&Entry> {
        todo!()
    }

    /// Returns a vector containing references to entries whose authors
    /// contain `search_string`.
    fn author_search(&self, search_string: &str) -> Vec<&Entry> {
        todo!()
    }

    /// Returns a vector containing references to entries whose titles
    /// or authors contain `search_string`.
    fn search(&self, search_string: &str) -> Vec<&Entry> {
        todo!()
    }
}
}

To implement new(), take your code from the run() in the previous part and move it into new(). Rather than print out the entries, you’re going to need to collect them all into some data structure and store them in your Index structure and return that.

The len() method should return the number of entries in the index.

The three search methods should behave similarly: each converts the search_string to lowercase. Then, for each entry in the index, convert the relevant fields (title, author, or both) to lowercase. If the lowercased search_string is contained in a lowercased field, add a reference to the corresponding entry to a results vector. After checking every entry, return the results vector.
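If the case-insensitive matching is unfamiliar, here's a small standalone illustration of the idea on a single string (this is not one of the search methods):

fn main() {
    let title = "The Divine Comedy of Dante Alighieri: The Inferno";
    let search_string = "the inferno";

    // Lowercase both sides, then check containment.
    let needle = search_string.to_lowercase();
    let matches = title.to_lowercase().contains(needle.as_str());
    println!("{matches}"); // prints "true"
}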

You’ll probably want something like this.

    fn title_search(&self, search_string: &str) -> Vec<&Entry> {
        let mut results = Vec::new();
        for entry in ??? {
            if ??? {
                results.push(entry);
            }
        }

        results
    }

The other two are similar; however, remember that some entries don’t have authors. You can use

if let Some(ref author) = entry.author {
    // This entry did have an author and `author` is currently a reference
    // to the `String` stored inside the `Option<String>` that is the
    // `entry.author` field.
}

If you omit the ref in if let Some(ref author), you’ll get an error about trying to move the String out of entry.author, which you can’t do because entry is only a reference. ref tells Rust that you want a reference to the String instead.
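Here's a small, self-contained illustration (with a stripped-down Entry of my own) of why ref matters when the entry is behind a reference:

struct Entry {
    author: Option<String>,
}

fn main() {
    let entries = vec![Entry { author: Some("Shelley, Mary Wollstonecraft".to_string()) }];

    for entry in &entries {
        // Without `ref`, this would try to move the String out of `entry.author`,
        // which isn't allowed through the shared reference `entry`.
        if let Some(ref author) = entry.author {
            println!("{author}");
        }
    }
}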

Once you have implemented one of these functions, it’s worth testing your code. If you change your run() function to

fn run() -> Result<()> {
    let index = Index::new()?;

    println!("There are {} entries in the index", index.len());
    let results = index.title_search("the inferno");
    println!("{results:#?}");
    Ok(())
}

you should see three results, The Inferno by Barbusse and O’Brien, The Divine Comedy of Dante Alighieri: The Inferno by Dante, and The Inferno by Strindberg.

In the remaining parts of the lab, you’re going to handle parsing the resources, implementing the Display trait to make printing better, and finally the simple user interface shown in the lab’s overview.

Part 4. Modeling resources (10 points)

In this part, you’re going to create a new type, an enumeration (enum) rather than a structure type (struct), in order to represent (or model) a resource associated with the entries in the Project Gutenberg index.

Briefly, an enum lets us model the situation where a value is one of a finite set of choices. For example, the Option<T> type is either None or it’s a Some(x) (for some value x of type T). Review Chapter 6 of the Rust book as needed for details on enums.
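For instance, here's a tiny enum of my own (unrelated to the lab's Resource type) where each variant carries its own data:

#[derive(Debug)]
enum Shape {
    Circle(f64),           // holds a radius
    Rectangle(f64, f64),   // holds a width and a height
}

fn main() {
    let shapes = vec![Shape::Circle(1.0), Shape::Rectangle(2.0, 3.0)];
    for shape in &shapes {
        println!("{shape:?}");
    }
}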

In this case, there are three different types of resources: epub (an ebook format), text, and audio.

Let’s return to the example index entry we saw before:

Roosevelt, Wyn; Schneider, S.	The Frontier Boys in the Sierras; Or, The Lost Mine	https://www.gutenberg.org/ebooks/32253	text https://www.gutenberg.org/ebooks/32253.html.images	text https://www.gutenberg.org/files/32253/32253-h/32253-h.htm	text https://www.gutenberg.org/files/32253/32253-h.zip	epub https://www.gutenberg.org/ebooks/32253.epub3.images	epub https://www.gutenberg.org/ebooks/32253.epub.images	epub https://www.gutenberg.org/ebooks/32253.epub.noimages	text https://www.gutenberg.org/ebooks/32253.txt.utf-8	text https://www.gutenberg.org/files/32253/32253-8.txt	text https://www.gutenberg.org/files/32253/32253-8.zip	text https://www.gutenberg.org/files/32253/32253.txt	text https://www.gutenberg.org/files/32253/32253.zip

Each of the tab-separated fields after the author, title, and URL fields consists of the word epub, text, or audio followed by a space and then a URL of the resource in question.

Your task

Define a new enum Resource which has variants corresponding to the epub, text, and audio resources. Each variant should hold its corresponding URL. For inspiration, look at the definition of enum IpAddr in the book.

You should make sure you derive Debug as normal so that you can print the debug representation using the {:?} format specifier.

let x = Resource::Audio("https://example.com".to_string());
println!("{x:?}"); // Prints debug representation of `x`

Next, modify your Entry structure to hold 0 or more Resources and update Index::new() to correctly fill in the resources from the resource fields in pgindex.txt.

Tip

You can split each resource field on a space and then match on the first part.

fn main() -> Result<(), Box<dyn std::error::Error>> {
let parts = vec!["audio", "https://example.com"];
match parts[0] {
    "epub" => todo!(),
    "text" => todo!(),
    "audio" => todo!(),
    _ => return Err(format!("Unknown format: {}", parts[0]).into()),
}
Ok(())
}

How cool is that?

At this point, the run() function from the previous part should work (assuming you derived Debug as suggested) and its output should include all of the resources associated with each of the three entries matching the query.

Part 5. Implementing Display (15 points)

The debug representation of a type—the thing you get from the {:?} format specifier—is extremely handy for developers. Rust makes it trivial to automatically get this representation for almost every type using #[derive(Debug)] which is why we add it to all of our types.

The downside is that as a user of a program, it kinda sucks: It shows a bunch of internal details about the types and users don’t care about that; it can be too verbose by including data that’s important for the implementation, but not for the end users; it’s hard to read if you don’t know what you’re looking at; and so on.

In this part, you’re going to remedy this by implementing the std::fmt::Display trait. Unlike Debug, the compiler cannot derive the Display trait for you. Fortunately, Rust makes this pretty easy.

Every trait has a list of required methods that must be implemented by any type that implements the trait. This is just like implementing an interface in Java. Some traits have 0 required methods. (These so-called “marker” traits convey information to the compiler. The Copy trait is one such trait: a type that implements Copy behaves like the primitive types such as i32 in that copies of the data are made rather than moving the data.) Most traits have 1 or a small handful of methods that must be implemented. Traits can also have additional methods with default implementations (much like abstract classes in Java).
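As a quick aside, here's a tiny illustration (my own example) of what the Copy marker trait changes in practice:

#[derive(Debug, Clone, Copy)]
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
}

fn main() {
    let a = Pixel { r: 255, g: 0, b: 0 };
    let b = a; // Because Pixel is Copy, `a` is copied rather than moved...
    println!("{a:?} {b:?}"); // ...so both are still usable here.
}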

The Display trait has a single method to implement: fmt(). Read the documentation for Display and pay particular attention to the example showing how to implement Display.

The write!() macro and its companion writeln!() work very similarly to the print!() and println!() macros. There are two differences between write!()/writeln!() and print!()/println!().

  1. The print macros write to stdout whereas the write macros write to their first argument, in this case it’s the mutable reference to a Formatter object; and
  2. The print macros will panic (a safe crash) if writing to stdout fails whereas the write macros return a Result.

The upshot of 2 is that if you want to make multiple calls to writeln!() in the fmt() method, you can use the ? operator as usual.

writeln!(f, "First line")?;
writeln!(f, "Second line")?;
Ok(())
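Putting those pieces together, here's a minimal Display implementation for a made-up Point type (not the lab's Entry), following the same pattern as the example in the Display documentation:

use std::fmt;

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "x: {}", self.x)?;
        write!(f, "y: {}", self.y)
    }
}

fn main() {
    let p = Point { x: 3, y: 4 };
    println!("{p}");
}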

Your task

Implement Display for Entry. It should behave as follows:

  1. If the entry has an author (i.e., it’s not None), then the author line should be printed (see examples below) otherwise the author line should be omitted;
  2. Print the title line;
  3. Print the URL line; and
  4. If there are resources, print the list of resources.

Change your run() function to print out the Display representation for each matching entry.

fn run() -> Result<()> {
    let index = Index::new()?;

    println!("There are {} entries in the index", index.len());
    let results = index.title_search("the inferno");
    for entry in results {
        println!("{entry}");
    }
    Ok(())
}

If everything has been implemented correctly, the three entries should appear like this.

Author: Barbusse, Henri; O'Brien, Edward J. (Edward Joseph)
Title: The Inferno
URL: https://www.gutenberg.org/ebooks/12414
Resources
- text: https://www.gutenberg.org/ebooks/12414.html.images
- epub: https://www.gutenberg.org/ebooks/12414.epub3.images
- text: https://www.gutenberg.org/ebooks/12414.html.noimages
- epub: https://www.gutenberg.org/ebooks/12414.epub.images
- epub: https://www.gutenberg.org/ebooks/12414.epub.noimages
- text: https://www.gutenberg.org/ebooks/12414.txt.utf-8
- text: https://www.gutenberg.org/files/12414/12414-8.txt
- text: https://www.gutenberg.org/files/12414/12414-8.zip
- text: https://www.gutenberg.org/files/12414/12414.txt
- text: https://www.gutenberg.org/files/12414/12414.zip

Author: Dante Alighieri; Sibbald, James Romanes
Title: The Divine Comedy of Dante Alighieri: The Inferno
URL: https://www.gutenberg.org/ebooks/41537
Resources
- text: https://www.gutenberg.org/ebooks/41537.html.images
- text: https://www.gutenberg.org/files/41537/41537-h/41537-h.htm
- epub: https://www.gutenberg.org/ebooks/41537.epub3.images
- epub: https://www.gutenberg.org/ebooks/41537.epub.images
- epub: https://www.gutenberg.org/ebooks/41537.epub.noimages
- text: https://www.gutenberg.org/ebooks/41537.txt.utf-8
- text: https://www.gutenberg.org/files/41537/41537-0.txt

Author: Strindberg, August; Field, Claud
Title: The Inferno
URL: https://www.gutenberg.org/ebooks/44108
Resources
- text: https://www.gutenberg.org/ebooks/44108.html.images
- text: https://www.gutenberg.org/files/44108/44108-h/44108-h.htm
- text: https://www.gutenberg.org/files/44108/44108-h.zip
- epub: https://www.gutenberg.org/ebooks/44108.epub3.images
- epub: https://www.gutenberg.org/ebooks/44108.epub.images
- epub: https://www.gutenberg.org/ebooks/44108.epub.noimages
- text: https://www.gutenberg.org/files/44108/44108-0.txt
- text: https://www.gutenberg.org/files/44108/44108-0.zip

Part 6. Building a TUI (20 points)

In the final part of this lab, you will implement the text-based user interface (TUI) shown in the overview.

Your task

Change your run() function to implement the text-based user interface. See the examples below.

When gutensearch starts, it should print out a welcome message containing the number of entries in the index and example queries.

Then in a loop it should print the instruction to enter a search query and how to exit. Then print out a prompt string "> " (don’t forget to flush stdout so that the prompt appears). Read a line from stdin. If the line is empty (because the user pressed control-d), exit the program successfully. If the line consists only of white space, print the prompt again and wait for more input.

There are three types of queries you need to handle:

  1. title: query: pass the trimmed query portion to Index::title_search(), unless the trimmed query is empty in which case it should print the prompt again;
  2. author: query: same as the title query except use Index::author_search(), of course; and
  3. query: pass the trimmed query to Index::search().

If a colon appears but the trimmed text before the colon is neither title nor author, print an error message and print the prompt again.
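Here's a rough sketch (with names of my own choosing) of printing the prompt, flushing stdout, and reading a single line from stdin; it's only a starting point, not the full loop:

use std::io::{self, Write};

// Prints the prompt and reads one line. Returns Ok(None) at end of input
// (the user pressed control-d).
fn read_query() -> io::Result<Option<String>> {
    print!("> ");
    io::stdout().flush()?;

    let mut line = String::new();
    let bytes_read = io::stdin().read_line(&mut line)?;
    if bytes_read == 0 {
        return Ok(None);
    }
    Ok(Some(line.trim().to_string()))
}

fn main() -> io::Result<()> {
    if let Some(query) = read_query()? {
        println!("You entered: {query}");
    }
    Ok(())
}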

Tip

Some tips.

Example

Here’s an example run showing empty queries, errors, a success, and an empty search result for a valid query.

$ cargo run --quiet
Welcome to Gutensearch! The index contains 71329 entries.
Example queries to search titles, authors, or both:
  > title: two cities
  > author: wodehouse
  > shelley

Enter a search query (ctrl-d to exit)
> 
> 
> title:
> author:
> subject:
Unknown query type: subject

> title: The Modern Prometheus
Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/20038
Resources
- audio: https://www.gutenberg.org/files/20038/mp3/20038-17.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-03.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-06.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-01.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-12.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-09.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-24.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-02.ogg
- audio: https://www.gutenberg.org/files/20038/20038-m4b.zip
- audio: https://www.gutenberg.org/files/20038/m4b/20038-04.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-08.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-09.ogg
- audio: https://www.gutenberg.org/files/20038/mp3/20038-09.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-13.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-16.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-14.ogg
- audio: https://www.gutenberg.org/files/20038/mp3/20038-02.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-05.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-24.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-14.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-15.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-08.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-09.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-19.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-08.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-10.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-20.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-10.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-06.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-11.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-15.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-17.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-01.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-16.spx
- text: https://www.gutenberg.org/files/20038/20038-index.html
- audio: https://www.gutenberg.org/files/20038/mp3/20038-01.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-19.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-07.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-10.mp3
- text: https://www.gutenberg.org/files/20038/20038-readme.txt
- audio: https://www.gutenberg.org/files/20038/spx/20038-06.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-04.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-13.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-05.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-02.m4a
- audio: https://www.gutenberg.org/files/20038/m4b/20038-03.m4a
- audio: https://www.gutenberg.org/files/20038/m4b/20038-05.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-12.mp3
- audio: https://www.gutenberg.org/files/20038/ogg/20038-18.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-16.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-03.ogg
- audio: https://www.gutenberg.org/files/20038/mp3/20038-24.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-18.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-15.mp3
- audio: https://www.gutenberg.org/files/20038/ogg/20038-07.ogg
- audio: https://www.gutenberg.org/files/20038/ogg/20038-05.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-17.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-15.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-12.spx
- audio: https://www.gutenberg.org/files/20038/spx/20038-01.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-20.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-07.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-08.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-21.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-03.spx
- audio: https://www.gutenberg.org/files/20038/spx/20038-04.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-10.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-02.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-21.ogg
- audio: https://www.gutenberg.org/files/20038/spx/20038-20.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-11.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-19.spx
- audio: https://www.gutenberg.org/files/20038/ogg/20038-20.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-06.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-16.mp3
- audio: https://www.gutenberg.org/files/20038/ogg/20038-24.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-17.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-13.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-19.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-13.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-14.m4a
- audio: https://www.gutenberg.org/files/20038/ogg/20038-11.ogg
- audio: https://www.gutenberg.org/files/20038/m4b/20038-18.m4a
- audio: https://www.gutenberg.org/files/20038/mp3/20038-18.mp3
- audio: https://www.gutenberg.org/files/20038/mp3/20038-11.mp3
- audio: https://www.gutenberg.org/files/20038/spx/20038-14.spx
- audio: https://www.gutenberg.org/files/20038/m4b/20038-07.m4a
- audio: https://www.gutenberg.org/files/20038/spx/20038-21.spx
- audio: https://www.gutenberg.org/files/20038/mp3/20038-21.mp3
- audio: https://www.gutenberg.org/files/20038/m4b/20038-12.m4a
- audio: https://www.gutenberg.org/files/20038/20038-mp3.zip
- audio: https://www.gutenberg.org/files/20038/ogg/20038-04.ogg

Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/41445
Resources
- text: https://www.gutenberg.org/ebooks/41445.html.images
- text: https://www.gutenberg.org/files/41445/41445-h/41445-h.htm
- epub: https://www.gutenberg.org/ebooks/41445.epub3.images
- epub: https://www.gutenberg.org/ebooks/41445.epub.images
- epub: https://www.gutenberg.org/ebooks/41445.epub.noimages
- text: https://www.gutenberg.org/ebooks/41445.txt.utf-8
- text: https://www.gutenberg.org/files/41445/41445-0.txt

Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/42324
Resources
- text: https://www.gutenberg.org/ebooks/42324.html.images
- text: https://www.gutenberg.org/files/42324/42324-h/42324-h.htm
- text: https://www.gutenberg.org/files/42324/42324-h.zip
- epub: https://www.gutenberg.org/ebooks/42324.epub3.images
- epub: https://www.gutenberg.org/ebooks/42324.epub.images
- epub: https://www.gutenberg.org/ebooks/42324.epub.noimages
- text: https://www.gutenberg.org/ebooks/42324.txt.utf-8
- text: https://www.gutenberg.org/files/42324/42324-8.txt
- text: https://www.gutenberg.org/files/42324/42324-8.zip
- text: https://www.gutenberg.org/files/42324/42324.txt
- text: https://www.gutenberg.org/files/42324/42324.zip

Author: Shelley, Mary Wollstonecraft
Title: Frankenstein; Or, The Modern Prometheus
URL: https://www.gutenberg.org/ebooks/84
Resources
- text: https://www.gutenberg.org/ebooks/84.html.images
- text: https://www.gutenberg.org/files/84/84-h/84-h.htm
- epub: https://www.gutenberg.org/ebooks/84.epub3.images
- epub: https://www.gutenberg.org/ebooks/84.epub.images
- epub: https://www.gutenberg.org/ebooks/84.epub.noimages
- text: https://www.gutenberg.org/ebooks/84.txt.utf-8
- text: https://www.gutenberg.org/files/84/84-0.txt


Enter a search query (ctrl-d to exit)
> author: Not A. Real Author
No matching results


Enter a search query (ctrl-d to exit)
>

Your output need not match mine exactly, but it should be substantially similar.

If your code works, congratulations! Submit your work and then consider finding a book on Project Gutenberg and giving it a read!

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    └── main.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Lab 6: Processes I

Due: 2024-10-28 at 23:59

In this lab and the next, you are going to write several command line applications that work with processes. You will use the same repository for both labs.

The goal is to learn

  • how to create a library and binaries in the same project;
  • how to create a module;
  • one way to implement generic error handling;
  • how to define new structs and implement methods for them;
  • how to call functions in the C standard library to interact with the operating system;
  • about how Linux exposes information to users via the /proc file system; and
  • how processes, process groups, sessions, and terminals all fit together!

Unlike the previous labs, the next two are going to be specific to Linux. Windows, macOS, and other operating systems use different mechanisms to expose information to user programs. See the instructions for remote coding.

In the first part of this lab, you’re going to create a Rust project that contains both a library and a binary. In the second part of the lab, you’re going to copy the type alias Result<T> you created in the last lab to this lab because it’s extremely useful. In the third part, you’ll read information from /proc. In the remaining parts of this lab, you’re going to implement a process library that will contain functions that return information about running processes as well as most of an implementation for the ps command line tool which displays information about processes.

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team; the team name cannot be one you’ve used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team; there’s a maximum of two people, and if you join the wrong one, you’ll prevent the correct person from joining.)

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1: Making a library (5 points)

In this part, you’re going to create a new project using cargo that will build a library and binaries. A library is just a collection of code that we can use in multiple applications. We’ve seen and used several libraries already: rand, colored, and clap. Now it’s time to make your own!

There are several key differences between a binary and a library project.

  • A library project has a src/lib.rs file and a binary project has a src/main.rs file. (The names are configurable by changing the Cargo.toml file, but let’s stick with the defaults unless we have a reason to change the names.)
  • Libraries do not have a main function. Instead, they expose types, data, and functions that other code can use.

Your task

Clone the assignment repository (which will be empty) on your machine.

From inside the assignment repo, initialize a new library project using cargo.

$ cargo init --lib --name process

Note that we used cargo init rather than cargo new. The difference is that new will create a new directory whereas init will use an existing one (by default, the current directory, .). We’re using init because we’re not going to have multiple projects inside the repository. It’s more normal to have just a single project per git repo.

At this point, we cannot run $ cargo run because there’s no binary (we’re going to change that momentarily). Indeed, if you look at the files created by running cargo init --lib, you’ll see there’s a src/lib.rs but no src/main.rs. The former is used for a library whereas the latter is used for a binary.

Let’s add a binary.

Create a new directory named bin inside the src directory. Inside bin, create a file named runnable.rs. By default, cargo will look inside the src/bin directory and create executables, one per file. In this case, we’re creating an executable runnable that you will implement in Part 3.

For now, create a main function that calls the add function that cargo init generated inside the process library’s lib.rs.

mod process { pub fn add(left: usize, right: usize) -> usize { left + right } }
fn main() {
    let result = process::add(5, 10);
    println!("5 + 10 = {result}");
}

Running $ cargo run will now run the runnable program and print 5 + 10 = 15.

If we wanted (and we will later), we could create a new file inside src/bin and then we would have multiple binaries that could use the library code. Neat!

Once you can run $ cargo run, it’s time to move on to the next part.

Part 2: Result, again (5 points)

In the previous lab, you created a type alias Result<T> that you used for returning multiple types of errors from a function. You’re going to make extensive use of that type in this lab.

As a reminder, here’s a function that reads an integer from a file and returns it.

#![allow(unused)]
fn main() {
use std::fs;
use std::path::Path;

// A generic Result type that can hold any type of Error.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn read_int_from_file(path: &Path) -> Result<i32> {
    let num = fs::read_to_string(path)?
        .trim()
        .parse()?;
    Ok(num)
}
}

This function can fail in multiple ways including the file not existing or being inaccessible and the contents of the file not being an integer.

There are three key benefits of doing this.

  1. We can use the ? operator to propagate errors to the calling function. E.g., if fs::read_to_string() was unable to read the file, read_int_from_file() will return an Err(err) where err is the boxed-up error. This makes our code shorter and easier to read (with practice).

  2. The ? operator knows how to turn any Result<T, E> (where E is some type that implements std::error::Error, as all of the standard library errors do) into a Result<T, Box<dyn std::error::Error>>.

    So even though neither fs::read_to_string() nor str::parse() returns the correct type of Result, ? will perform the conversion for us.

  3. The standard library knows how to turn a String into a Box<dyn std::error::Error>. This is incredibly helpful as the next tip shows.

Tip

When you are dealing with file input/output, you can use Result::map_err() to attach context to your errors. map_err() lets you convert from one error type to another. Here’s an example of attaching the path of a file that failed to open to the error message.

use std::fs::File;
use std::path::{Path, PathBuf};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn bad_error_message(path: &Path) -> Result<()> {
    let mut f = File::open(path)?;
    todo!()
}

fn good_error_message(path: &Path) -> Result<()> {
    let mut f = File::open(path)
        .map_err(|err| format!("{}: {err}", path.display()))?;
    todo!()
}

fn main() {
    let path = PathBuf::from("/path/to/no-such-file");

    println!("Calling the function with the bad error message:");
    if let Err(err) = bad_error_message(&path) {
        // This should be eprintln!()
        println!("{err}");
    }

    println!("\nCalling the function with the good error message:");
    if let Err(err) = good_error_message(&path) {
        // This should be eprintln!()
        println!("{err}");
    }
}

Click the Run button and look at the two error messages. Notice that the good error message shows the path of the file that couldn’t be opened and the bad one does not.

Let me assure you there’s little that’s as frustrating as trying to debug code (in any language) where the error messages don’t tell you which files couldn’t be opened.

You are going to want to transform the errors, particularly file I/O errors, in this way to add context.

Your task

Start by deleting the contents of lib.rs. Then create a public type alias named Result<T> that aliases a standard Result whose error is boxed, just as you did in the previous lab.

In order to be able to use this type from outside the process library, you need to make the type public. To do that, add pub before it.

pub type Result<T> = ...;

This is similar to the public keyword in Java.

In runnable.rs, add the line

use process::Result;

Now, any time you use Result in runnable.rs, it will use the Result type you defined in the library.

At this point, your runnable binary is now broken because you’ve removed the add function.

In Part 3, you will fix that by making runnable read a file from /proc, parse it, and print out some information.

Part 3. A proc primer (15 points)

In this part of the lab, you’re going to make a simple program that reads some information about running processes from a virtual file.

Linux exposes a bunch of information to user processes through a virtual file system called procfs. It’s a virtual file system because there are no actual files. Instead, when you read from the files in the procfs file system, Linux returns information about processes. Exposing information to users as a file is a common Unix approach. Read this very short Wikipedia article about the philosophy where everything is a file.

The procfs file system is usually mounted at /proc, meaning we can read the virtual files in the file system by treating them as normal files inside the /proc directory.

The proc(5) man page describes the contents of the files in /proc in great detail. We’ll be returning to this man page repeatedly, but it’s very long, so don’t go read it all now. This lab write-up will tell you when and where to look for the information you need.

Your task

In runnable.rs, write a function

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
/// Returns the number of currently runnable kernel scheduling entities
/// (processes and threads) and  the total number of kernel scheduling entities.
fn scheduling_entities() -> Result<(usize, usize)> {
    todo!()
}
}

This function is going to return either Ok((runnable, total)) or an Err(err). That is, on success, it’ll return a pair of numbers, (runnable, total) (hence the two parentheses in the Ok case) and on error, well, it’ll return some error. The number of runnable “kernel scheduling entities” tells us how many processes (and threads, which we’ll talk about later in the course) are currently ready to run (or are running). A process that’s blocked waiting for input, for example, is not runnable.

Linux exposes that information to you (as well as other information) in the virtual file /proc/loadavg. For example,

$ cat /proc/loadavg
0.00 0.01 0.04 5/756 26030

Remember that to split a String (or really a &str) on white space, we can use str::split_whitespace() to get an iterator, and we can use collect() to read the elements from an iterator into a collection, in this case a Vec<&str>. If we want to split a string on a particular character, we can use the str::split() method. Click the link for the documentation.

The manual page informs me that the final number is the process identifier, PID, for the most recently created process.

#![allow(unused)]
fn main() {
let loadavg = "0.00 0.01 0.04 5/756 26030";
let parts: Vec<&str> = loadavg.split_whitespace().collect();
println!("Most recent PID: {}", parts[4]);
}

Figure out which of those numbers corresponds to the runnable and which to the total kernel scheduling entities by reading the man page and searching it for the documentation for /proc/loadavg.

Recall that you can parse an integer from a string using the parse() method.

Here’s an example main function you can use in runnable.rs to print out the number of runnable entities.

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn scheduling_entities() -> Result<(usize, usize)> { Ok((10, 300)) }
fn main() {
    match scheduling_entities() {
        Ok((runnable, total)) => {
            println!("Runnable entities {runnable} of {total}");
        }
        Err(err) => {
            eprintln!("{err}");
            std::process::exit(1);
        }
    }
}

If you run your code on a Linux system, you should see something like

Runnable entities 10 of 100

(your numbers will differ).

If you run your code on a non-Linux system (and you followed the tip about attaching file names to error messages using .map_err()), you’ll get an error like this.

$ cargo run
/proc/loadavg: No such file or directory (os error 2)

Now that you have some experience reading the procfs manual and getting information out of the files there, it’s time to start working on the process library in the next part.

Part 4. Modeling a process (15 points)

In this part of the lab, you’re going to define a new struct in the library (which is the lib.rs file plus additional files you’ll create) that you’ll use to represent, or model, a process. This means that your struct is going to contain enough information about a process for us to perform our task: implementing the ps command-line utility, which prints out information about running processes.

A process is an instance of a running program. For example, there is one bash program which lives in the file system at /bin/bash but we can run many different instances of bash at the same time. Each instance is a process. The operating system assigns each running process a nonnegative number called the process identifier or PID.

Open a terminal and run $ ps. You should see output similar to the following.

$ ps
  PID TTY          TIME CMD
29604 pts/4    00:00:00 bash
31292 pts/4    00:00:00 ps

In the first column, we can see the process identifier. The second column gives the name of the controlling terminal. We’ll come back to that shortly. The third column gives the total execution time of the process. The fourth column gives the command name, or name of the process.

Returning to Rust, in order to replicate this behavior, your struct is going to contain a PID, a representation of the controlling terminal, the execution time, and the command name to start with. We’ll worry about the details of ps in a subsequent part.

Because this code will be useful in multiple binaries, you’re going to write this code in a new proc module in the library. Let’s get started!

Your task

First, create a new module named proc as follows:

  • Add the line mod proc; to the top of your lib.rs which will now look like this.
    mod proc;
    
    pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
    This line informs the compiler that there’s a file named proc.rs in src which contains the code for a proc module.
  • Create the src/proc.rs file. Add the line
    use super::Result;
    to it. This will let you use the Result alias you defined in lib.rs in proc.rs.

Second, create a new struct to model a process. We can see from the ps output above that we’re going to need a process identifier, a controlling terminal, the total execution time, and the command name. I suggest starting simple and building it up once you have the basics working. With that in mind, let’s start with a Process structure that looks like this.

#![allow(unused)]
fn main() {
/// Models a Linux process.
#[derive(Debug)]
pub struct Process {
    /// Process ID.
    pub pid: i32,

    /// Command name.
    pub command_name: String,
}
}

You will add more to this structure as you continue but for now, this is enough to get started.

The next step will be to implement a function Process::for_pid() in proc.rs that takes a PID as an argument and returns a Result<Process>.

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
pub struct Process { pid: i32, command_name: String };
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let command_name = todo!("Need to look up the command name");
        Ok(Self {
            pid,
            command_name,
        })
    }
}
}

Recall that inside an impl we can refer to the current type as Self hence Process::for_pid() is returning a Result<Process>. This returns a Result<Process> rather than a Process because there might not be any such process identifier and Process::for_pid() must return an error in that case.

Linux exposes information about each running process in the procfs file system by exposing a virtual directory /proc/<pid> where <pid> is the numeric process identifier. Run $ ls /proc to see all of the directories corresponding to processes, and other virtual files and directories with other information. You already saw /proc/loadavg but now we’ll investigate the numbered directories.

Run $ cat /proc/self/stat (not /proc/self/status, which is similar information but designed for human consumption rather than programmatic manipulation). You probably got some mysterious output like

$ cat /proc/self/stat
32177 (cat) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0

As we’ll see, this file contains most of the information we will need. Look in the proc(5) man page for the file /proc/pid/stat. There are many fields here. You’re going to pick out just the ones you need to fill out an instance of the Process structure.

Ideally, we’d like to read this file into a string and then split the string on white space into the fields. Then we can parse just the fields we care about. The one hitch here is the command name which is in the second field. It can have spaces or newlines or even a ) character! This complicates our task, but not by a great deal. The command name is preceded by a ( and followed by a ) and the parentheses characters will not appear anywhere else in this file other than the command name.

Use str::find('(') and str::rfind(')') to get the indices of the first ( and last ) characters and then extract three substrings.

Example

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn main() -> Result<()> {
// Imagine we read this string in from `/proc/32177/stat`.
let pid = 32177;
let stat = "32177 (odd ) proc name) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0";
let comm_start = stat.find('(')
    .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;
let comm_end = stat.rfind(')')
    .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;

let pid_as_str = stat[..comm_start].trim();
let command_name = &stat[comm_start + 1..comm_end];
let remaining_fields = stat[comm_end + 1..].trim();
println!("Before command name: {pid_as_str}");
println!("Command name:        {command_name}");
println!("After command name:  {remaining_fields}");
Ok(()) 
}

Click the Run button to see the output.

The str::find() and str::rfind() methods return an Option<usize>. There are a bunch of methods for converting between an Option<T> and a Result<T, E>. The Option::ok_or_else(f) method turns a Some(x) into an Ok(x) and a None into Err(f()) (where f is a zero-argument function). Read the documentation for details and examples.

The upshot is that if there isn’t at least one each of ( and ), the find() or rfind() method will return None, which ok_or_else() will turn into Err("Couldn't parse...") and the ? will return the error as usual.
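Here's a tiny standalone illustration of ok_or_else() (my own example, not lab code):

fn main() {
    let found: Option<usize> = "abc(def".find('(');
    let missing: Option<usize> = "abcdef".find('(');

    let ok: Result<usize, String> = found.ok_or_else(|| "no ( found".to_string());
    let err: Result<usize, String> = missing.ok_or_else(|| "no ( found".to_string());
    println!("{ok:?} {err:?}"); // Ok(3) Err("no ( found")
}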

I recommend creating a mutable Vec::<&str>, pushing pid_as_str and command_name, then splitting the remaining_fields on white space and using Vec::extend() to add the elements from the split. More complete documentation for extend() is here. The result will be a Vec containing the fields described in the proc(5) man page. But note that the fields in the man page are numbered starting at 1 whereas the elements in a Vec are numbered starting at 0!
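Here's a sketch of that approach, reusing the variable names from the earlier example (the field values here are abbreviated):

fn main() {
    // These would come from splitting the stat file as shown earlier.
    let pid_as_str = "32177";
    let command_name = "odd ) proc name";
    let remaining_fields = "R 29604 32177 29604 34820";

    let mut fields: Vec<&str> = Vec::new();
    fields.push(pid_as_str);
    fields.push(command_name);
    fields.extend(remaining_fields.split_whitespace());

    println!("{fields:?}");
}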

Assuming the Vec is named fields, you can now construct a new instance of Process with something like

let proc = Process {
    pid: fields[0].parse()?,
    command_name: fields[1].to_string(),
};

Implement the function for_pid().

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
pub struct Process { pid: i32, command_name: String };
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let path = format!("/proc/{pid}/stat");
        let mut fields: Vec<&str> = Vec::new();
        todo!("Open path and read its contents and split into fields as described");
        Ok(Self {
            pid: fields[0].parse()?,
            command_name: fields[1].to_string(),
        })
    }
}
}

We’re nearly done with this part! All that remains is to make the Process structure visible outside of the library so that you can use it in bin/ps.rs, the new binary you’re about to create.

There are multiple ways to do this, but the way that you’re going to do it is by re-exporting the type. Add the line

pub use proc::Process;

to your lib.rs.

It’s time to start implementing the ps utility so create the file bin/ps.rs. Just as with bin/runnable.rs, Cargo will compile it into a binary named ps.

Add the following code to your new ps.rs.

use process::{Process, Result};

fn run() -> Result<()> {
    let proc1 = Process::for_pid(1)?;
    println!("{proc1:?}");
    Ok(())
}

fn main() {
    if let Err(err) = run() {
        eprintln!("{err}");
        std::process::exit(1);
    }
}

Try to understand what this code is doing before running it.

Tip

If you try to run your code using $ cargo run, you’ll get an error message.

error: `cargo run` could not determine which binary to run. Use the `--bin` option to specify a binary, or the `default-run` manifest key.
available binaries: runnable, ps

This makes sense. You now have two binaries, runnable and ps and cargo doesn’t know which one you want to run unless you tell it. You can use $ cargo run --bin ps to select the binary you want. This is a little tedious. Since you’re going to be working with ps for the rest of the lab, I suggest making that the default binary to run.

Edit Cargo.toml and add the line default-run = "ps" in the package settings. For reference, the package settings in my solution looks like this

[package]
name = "process"
version = "0.1.0"
edition = "2021"
description = "ps - report process status"
default-run = "ps"

$ cargo run now runs ps by default.

If all has gone according to plan, you should see

Process { pid: 1, command_name: "systemd" }

Part 5. More processing (40 points)

In this part, you’ll add more fields to your Process structure and implementing a few methods. The next part will explain the meaning of the fields in more detail.

Your task

Add the following fields to your Process structure

  • process state (char),
  • parent process ID (i32),
  • session ID (i32),
  • controlling terminal (i32),
  • total execution time (i64, not i32),
  • command line (Vec<String>), and
  • user ID (u32).

You can pick whatever names you like. I chose to name my structure fields after the field names given in the proc(5) man page, when applicable.

The process state, parent process ID, session ID, and controlling terminal are parsed directly from fields named state, ppid, session, and tty_nr.

The total execution time comes from adding the results of parsing the fields utime and stime as i64. So something like this.

let utime: i64 = fields[13].parse()?; // utime is field 14 in proc(5), hence index 13
let stime: i64 = fields[14].parse()?; // stime is field 15 in proc(5), hence index 14
let execution_time = utime + stime;

The command line and user ID are slightly more complicated.

The command line (when available) is stored in /proc/<pid>/cmdline. Search the proc(5) man page for /proc/pid/cmdline (only the first paragraph of its description is important for us here).

Recall that a C string ends with a single 0 byte to indicate the end of the string. The same convention is happening here. So /proc/<pid>/cmdline consists of strings terminated by a 0.

Fortunately, Rust is awesome and comes with a standard library function for splitting “string data that is terminated, rather than separated by a pattern.” That function is str::split_terminator(). So read the file to a string and then use split_terminator() with a pattern of '\0', the NUL character.
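Here's a quick standalone illustration of split_terminator() on NUL-terminated data shaped like /proc/<pid>/cmdline (the string itself is made up):

fn main() {
    // Each argument ends with a 0 byte, including the last one.
    let cmdline = "cat\0/proc/self/stat\0";

    let args: Vec<String> = cmdline
        .split_terminator('\0')
        .map(str::to_string)
        .collect();

    println!("{args:?}"); // ["cat", "/proc/self/stat"]
}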

The user ID (often abbreviated UID) of the user who is running the process is not stored in any of the virtual files. Instead, Linux exposes that information as the user who owns the files in /proc/<pid>/. You have already opened up /proc/<pid>/stat so we can simply ask for the UID.

Example

#![allow(unused)]
fn main() {
use std::fs::File;
use std::os::unix::fs::MetadataExt;

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn run() -> Result<()> {
let path = "/proc/1/stat";
let mut file = File::open(path)
    .map_err(|err| format!("{path}: {err}"))?;
let metadata = file.metadata()
    .map_err(|err| format!("Couldn't get metadata for {path}: {err}"))?;
let uid = metadata.uid();

println!("The UID that owns {path} is {uid}");
Ok(())
}
let _ = run();
}

Four more methods to implement. One is involved, three are easy.

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
struct Process;
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        todo!("You did this already")
    }

    /// Look up information for the current process.
    pub fn for_self() -> Result<Self> {
        todo!()
    }

    /// Returns a list of all running processes.
    pub fn all_processes() -> Result<Vec<Self>> {
        todo!()
    }

    /// Returns `true` if the process is a session leader.
    pub fn is_session_leader(&self) -> bool {
        todo!()
    }

    /// Returns `true` if the process has a controlling terminal.
    pub fn has_tty(&self) -> bool {
        todo!()
    }
}
}

Tip

The eagle-eyed among you may have noticed a symbolic link /proc/self. This always points to /proc/<pid> where <pid> is the process’s own PID.

However, there’s an easier way to get the current process’s identifier than reading the target of the symbolic link.

#![allow(unused)]
fn main() {
let pid = std::process::id();
println!("The current process ID is {pid}");
}

The Process::for_self() function is easy if you use std::process::id() and Self::for_pid(). The one minor catch is that std::process::id() returns a u32 and we want an i32. We can cast a u32 to an i32.

#![allow(unused)]
fn main() {
let x: u32 = 1;
let y: i32 = x as i32;
println!("{x} {y}");
}

The last two methods you have to implement, is_session_leader() and has_tty(), are convenience methods you’ll use later. Their meanings will be explained in the next part. For right now, a process is a session leader if and only if its PID matches its session ID, and a process has a controlling terminal if and only if its controlling terminal field is not 0.
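For example, assuming you named your fields pid, session, and tty_nr (rename to match whatever names you chose), these two methods can be as simple as the following sketch.

/// Returns `true` if the process is a session leader.
pub fn is_session_leader(&self) -> bool {
    self.pid == self.session
}

/// Returns `true` if the process has a controlling terminal.
pub fn has_tty(&self) -> bool {
    self.tty_nr != 0
}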

The final function, Process::all_processes() is more complicated. You need to examine the contents of the /proc directory and find all of the directories whose names are numbers.

Example

Here’s some example code for reading the contents of the /proc directory and printing the names of all directories contained within.

#![allow(unused)]
fn main() {
use std::fs;

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn go() -> Result<()> {
for entry in fs::read_dir("/proc")? {
    // Unwrap the result and return any errors that result from reading /proc.
    let entry = entry?;

    // If the entry is not a directory or reading the metadata failed, continue.
    match entry.metadata() {
        Ok(metadata) if metadata.is_dir() => (),
        _ => continue,
    }

    let file_name = entry.file_name();
    if let Some(name) = file_name.to_str() {
        println!("{name}")
    }
}
Ok(())
}
let _ = go();
}

There are lots of ways to determine if a string is a number, but the easiest is probably to try to parse it as an i32 using the slightly unpleasant “turbofish” syntax:

#![allow(unused)]
fn main() {
println!("{}", "1234".parse::<i32>().is_ok());
println!("{}", "12e4".parse::<i32>().is_ok());
}

Once you find a directory in /proc whose name is a number, you can use Process::for_pid() to get information about that process.

Bug

After reading the /proc directory and learning that there’s a process with PID 12345, for example, you might call Process::for_pid(12345) and this could fail! If the process has exited between you learning of its existence and you reading its /proc entry, the Linux kernel will have deleted its /proc entry and the File::open() will fail. If this happens, Process::all_processes() should continue finding the other processes rather than returning an error.

So in particular, you do not want to have code like this.

let mut result: Vec<Process> = Vec::new();

for entry in fs::read_dir("/proc")? {
    // ...
    let proc = Process::for_pid(pid)?; // <-- Don't do this here!
    result.push(proc);
}

The ? will cause an early return on an error and we don’t want that.
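Instead, one way to skip a process that disappears out from under you is to keep only the successful lookups. In this sketch, pid is whatever number you parsed from the directory name.

// Only push processes we could actually read; a failed lookup most likely
// means the process exited, so skip it rather than returning an error.
if let Ok(proc) = Process::for_pid(pid) {
    result.push(proc);
}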

Test that your code is working by changing ps.rs’s main to iterate over all processes and print out the debug representation.

Part 6. Implementing ps (30 points)

Click the link and read the description of ps as defined by the POSIX standard. (The Portable Operating System Interface, or POSIX, is a standard that various Unix-like operating systems including Linux distributions, BSD distributions, macOS, and others follow to a greater or lesser extent.)

There are many different versions of ps. Most support the options and output format specified in the POSIX standard as well as their own, nonstandard options. We’re going to restrict ourselves to a subset of POSIX.

Your task

Implement a subset of the ps utility functionality. In particular, your implementation needs to support the following command-line arguments.

  • -a Write information for all processes with controlling terminals. Omit processes for which is_session_leader() returns true.
  • -A/-e Write information for all processes.
  • -d Write information for all processes, except session leaders.
  • -f Generate a full listing.
  • -l Generate a long listing.

The -e option and the -A option are the same. You can support this in clap via

#[derive(Debug, Parser)]
#[command(author, version, about = "ps - report process status", long_about = None)]
struct Config {
    /// Write information for all processes.
    #[arg(short = 'A', short_alias = 'e')]
    all: bool,

    // ...
}

in your Config structure. (Review Lab 4 for a refresher on how to parse command-line arguments with clap.)

Notice that none of these options have equivalent long options. POSIX command line utilities tend not to.

The -a, -A, -d, and -e flags select which processes get displayed whereas the -f and -l options control what information about the selected processes to display.

Given a particular Process, you can decide if it should be printed by following this simple algorithm:

  1. If the -A or -e flag was passed, then print the process.
  2. If the -a flag was passed and the process has a controlling terminal and the process is not a session leader, then print the process.
  3. If the -d flag was passed and the process is not a session leader, then print the process.
  4. Otherwise if none of -a, -A, -d, and -e were passed and the process’s user ID is the same as the user ID of the ps process and the process’s controlling terminal is the same as the controlling terminal of the ps process, then print the process.

Remember that you can use Process::for_self() to get both the current user ID and controlling terminal.
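Here is a sketch of that selection logic as a standalone function. The Config field names a_flag and d_flag and the Process field names uid and tty_nr are assumptions I made for the sketch (only all appears in the clap snippet above); rename everything to match your own structs.

// Sketch only: `me` is the Process returned by Process::for_self().
fn should_print(config: &Config, process: &Process, me: &Process) -> bool {
    // 1. -A/-e: print everything.
    if config.all {
        return true;
    }
    // 2. -a: has a controlling terminal and is not a session leader.
    if config.a_flag && process.has_tty() && !process.is_session_leader() {
        return true;
    }
    // 3. -d: not a session leader.
    if config.d_flag && !process.is_session_leader() {
        return true;
    }
    // 4. No selection flag at all: same user and same terminal as this ps.
    !config.a_flag
        && !config.d_flag
        && process.uid == me.uid
        && process.tty_nr == me.tty_nr
}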

I recommend implementing the algorithm to select the processes to print first and then print out the debug representation of the Process. Your printed output will differ, but the processes that you select for printing should match the behavior of the real ps when called with some of -a, -A, -d, and -e. Make sure you try combinations of arguments and none at all to make sure you’re getting the same number of processes.

Next, implement printing the table that ps prints out. The spacing of your output columns doesn’t need to match the spacing of the real ps. Just try to pick field widths large enough to contain the values in the table. (The real ps does a poor job of this because it expects some fields, like user IDs, to be short.)

There are 15 fields that ps will print about each process (subject to the -l and -f options). These are listed in the STDOUT section of the ps standard. We will only support a subset of these, namely

  • S,
  • UID,
  • PID,
  • PPID,
  • TTY,
  • TIME, and
  • CMD.

Look at the table in the man page to see which fields get printed when the -l and/or -f options are present.

You should print the fields in the same order as shown but only if the appropriate flag was passed as indicated in the middle column of the table in the STDOUT section of the ps standard. For example, the state field, S, is only displayed if -l is passed.

For this part, make sure you can print out the S, UID (as a number), PID, PPID, and CMD fields. We’ll tackle the TTY, TIME, and non-numeric UID in the next lab. In the meantime, print the UID as a number (even with -f), print the process’s tty_nr field for TTY, and print the total execution time as an integer for TIME.
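If you haven’t used Rust’s width and alignment format specifiers before, here’s a sketch of printing a header row and one data row with fixed-width columns. The widths are arbitrary choices for illustration, not something the assignment requires, and the values are taken from the sample output below.

fn main() {
    // `>` right-aligns and `<` left-aligns within the given width.
    println!("{:>10} {:<8} {:>12} {}", "PID", "TTY", "TIME", "CMD");

    let (pid, tty_nr, time, cmd) = (11050, 34816, 32, "ps");
    println!("{pid:>10} {tty_nr:<8} {time:>12} {cmd}");
}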

Here are some examples.

$ cargo run --quiet --
       PID TTY             TIME CMD
     11050 34816             32 ps
     29327 34816             55 bash

$ cargo run --quiet -- -l
S        UID        PID       PPID TTY             TIME CMD
R 1425750506      11135      29327 34816             29 ps
S 1425750506      29327      29326 34816             57 bash

$ cargo run --quiet -- -f
UID               PID       PPID TTY             TIME CMD
1425750506      11141      29327 34816             28 target/debug/ps -f
1425750506      29327      29326 34816             58 -bash

$ cargo run --quiet -- -fl
S UID               PID       PPID TTY             TIME CMD
R 1425750506      11155      29327 34816             28 target/debug/ps -fl
S 1425750506      29327      29326 34816             60 -bash

(You can see the execution time for that bash process creep up from 57 to 58 to 60 as I run commands because Bash has to wake up, process the command line, arrange for Linux to start running the command, and after the command runs, print a prompt and go back to sleep waiting for input. As you can see in the real output below, the actual execution time for bash after that last one is less than 1 second. In fact, it was about 0.6 seconds. We’ll see how to determine this in the next lab.)

For comparison, here’s the real output with those options.

$ ps
  PID TTY          TIME CMD
 8637 pts/0    00:00:00 ps
29327 pts/0    00:00:00 bash

$ ps -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 R 1425750506 8641 29327  0 80 0 - 7223 -      pts/0    00:00:00 ps
0 S 1425750506 29327 29326  0 80 0 - 9167 wait  pts/0    00:00:00 bash

$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
steve     8645 29327  0 12:57 pts/0    00:00:00 ps -f
steve    29327 29326  0 Jul31 pts/0    00:00:00 -bash

$ ps -fl
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 R steve     8651 29327  0  80   0 -  9344 -      12:57 pts/0    00:00:00 ps -fl
0 S steve    29327 29326  0  80   0 -  9167 wait   Jul31 pts/0    00:00:00 -bash

Note that the CMD field displays the command_name unless -f was passed, in which case it shows all of the command line arguments, if it can. If -f was passed but /proc/<pid>/cmdline was an empty file, your process.command_line field will be empty. In this case, the standard says to print the command name in brackets. You might want some code like this.

#![allow(unused)]
fn main() {
struct Config { full: bool }
struct Process { command_name: String, command_line: Vec<String> }
let config = Config { full: true };
let process = Process { command_name: String::new(), command_line: Vec::new() };
let cmd: String = if !config.full {
    process.command_name
} else if process.command_line.is_empty() {
    format!("[{}]", process.command_name)
} else {
    process.command_line.join(" ")
};
}

Info

If your code is working as described up to this point (congratulations!), you’re done and can submit it by following the instructions. You don’t need to keep reading.

However, I want to explain some of the terms you encountered regarding processes. Specifically, “session” and “controlling terminal.” What the heck are these? Read on!

In the early days of computing, a user would interact with a computer by sitting down at a computer terminal, which consisted of a keyboard and a display. Earlier terminals contained a printer rather than a display, and output was shown to the user by printing it! Click on that link to look at some images of computer terminals on Wikipedia. Note in particular that a hard-copy terminal (one that printed the output) was called a teletypewriter or TTY. This terminology persists today: the TTY in the ps output refers to exactly this. But that’s getting ahead of our story here.

If you think about multiple users using hardware terminals to interact with the same computer, it’s clear that input from the keyboard in one terminal should only go to certain processes on the computer, namely the ones that particular user is running from that terminal. By the same token, any output from those processes should go back to the same terminal.

We no longer use hardware terminals, of course. Instead, we use terminal emulation programs that are software implementations of the hardware terminals.

The notion of sessions and controlling terminals arose out of the need to keep input/output associated with the correct processes. These concepts were standardized in the POSIX standard in the General Terminal Interface portion of the standard. Read that link if you want all of the details.

In fact, there’s a third concept, a process group, which is essential to understand as well. I’ll lay out the terminology first and then explain what happens when a user interacts with the system.

  • Process – A running instance of a computer program. Every process has a process ID (PID), a parent process ID (PPID), a process group ID (PGID), a session ID (SID), and a controlling terminal (TTY). All of these IDs are just integers and are present in the /proc/<pid>/stat file as you saw.
  • Process group – A group of related processes that share a PGID. A process group is usually created by the shell for each pipeline. E.g., if you run $ sort foo | uniq -c | sort -rn, bash will create a new process group containing 3 processes, the two sort processes and the uniq process.
  • Session – A set of process groups. Every process in each process group in a session has the same SID and the same TTY (or they all have none).
  • Controlling terminal – A controlling terminal is an abstraction. It’s just an integer that the operating system kernel, Linux in this case, uses to represent a hardware or software terminal. Every controlling terminal has a foreground process group.
  • Foreground process group – Definitionally, a foreground process group is just the process group that is the foreground process group for the controlling terminal of the process group’s member processes. But more importantly, only processes in the foreground process group of a controlling terminal can receive input from that terminal.
  • Process group leader – A process is the process group leader of its process group if and only if its PID is equal to its PGID.
  • Session leader – A process is the session leader of its session if and only if its PID is equal to its SID. We saw this in the Process::is_session_leader() function in Part 5.

These all fit together roughly like this.

The user sits down at the terminal and logs in. The shell, let’s say bash, is started. Initially, the bash process is the only process in its process group and its process group will be the only process group in its session. Thus bash will be both its process group’s leader and its session’s leader. We can get the real ps to print out the various identifiers using the -o option.

Here’s the real output from ps showing bash as the process group leader and the session leader.

$ ps -o pid,pgid,sid,tname,command
  PID  PGID   SID TTY      COMMAND
29327 29327 29327 pts/0    -bash

(I removed the output for the ps process itself.) Notice that the PID, PGID, and SID are all the same.

Let’s run a pipeline and look at the output from ps. On one terminal, I’m going to run $ sort | uniq -c | sort -nr. The first sort will sleep waiting for input from the terminal. On a second terminal, I’m going to run ps with the nonstandard -s option to select only the processes that match the SID of the bash process shown above; this time I’m also printing out the parent process ID (PPID).

$ ps -s 29327 -o pid,ppid,pgid,sid,tname,command
  PID  PPID  PGID   SID TTY      COMMAND
17314 29327 17314 29327 pts/0    sort
17315 29327 17314 29327 pts/0    uniq -c
17316 29327 17314 29327 pts/0    sort -nr
29327 29326 29327 29327 pts/0    -bash

The key things to notice are

  1. bash is the parent process of the other three (i.e., the PPID of the sort and uniq processes is the PID of bash);
  2. All three processes that were created for the pipeline have the same process group ID (PGID), 17314, which is different from the PGID of bash; and
  3. The first sort process is the process group leader of its group (i.e., its PID is equal to its PGID, 17314).
  4. All four processes have the same TTY (which we know they must because they all have the same SID).

I said above that only processes in the foreground process group of a controlling terminal are allowed to read input from the terminal. We can ask ps to display the process group ID of the foreground process group of the controlling terminal of each process, if any. (That’s a mouthful. I mean something like this pseudocode: process.controlling_terminal().foreground_process_group().id().)

$ ps -s 29327 -o pid,ppid,pgid,sid,tpgid,tname,command
  PID  PPID  PGID   SID TPGID TTY      COMMAND
17314 29327 17314 29327 17314 pts/0    sort
17315 29327 17314 29327 17314 pts/0    uniq -c
17316 29327 17314 29327 17314 pts/0    sort -nr
29327 29326 29327 29327 17314 pts/0    -bash

The TPGID column holds the process group ID of the active foreground process group for the controlling TTY. All four processes have the same TTY, symbolic name pts/0, and thus all have the same TPGID. Since the foreground process group of pts/0 has ID 17314, we can see that only the sort | uniq | sort pipeline is able to receive input from the terminal. In particular, bash cannot do so at this time.

I had to log in to use a second terminal to run these commands because, as we’ve just discussed, the process group for the pipeline is the only one that can receive input from its controlling terminal right now. If I print out all of the processes belonging to my user (using the nonstandard -u option) rather than just the processes associated with the session we’ve been looking at so far, I get this (I cut off the right side of ps’s output as it’s long and unimportant).

$ ps -u steve -o pid,ppid,pgid,sid,tpgid,tname,command
  PID  PPID  PGID   SID TPGID TTY      COMMAND
17314 29327 17314 29327 17314 pts/0    sort
17315 29327 17314 29327 17314 pts/0    uniq -c
17316 29327 17314 29327 17314 pts/0    sort -nr
18314 29326 18314 18314 18408 pts/5    -bash
18408 18314 18408 18314 18408 pts/5    ps -u steve -o pid,ppid,pgid,
23147     1 22908 22908    -1 ?        sh /usr/users/noquota/faculty
23157 23147 22908 22908    -1 ?        /net/storage/zfs/faculty/stev
23218 23157 22908 22908    -1 ?        /net/storage/zfs/faculty/stev
24880     1 24880 24880    -1 ?        /lib/systemd/systemd --user
24882 24880 24880 24880    -1 ?        (sd-pam)
29326 29306 29306 29306    -1 ?        sshd: steve@pts/0,pts/5
29327 29326 29327 29327 17314 pts/0    -bash
29401 29326 29401 29401    -1 ?        -bash
29477 29401 29401 29401    -1 ?        bash
29529 23157 22908 22908    -1 ?        /net/storage/zfs/faculty/stev
29540 23157 22908 22908    -1 ?        /net/storage/zfs/faculty/stev
29596 29529 22908 22908    -1 ?        /net/storage/zfs/faculty/stev
29604 23218 29604 29604 29604 pts/4    /bin/bash --init-file /net/st
29673 29529 22908 22908    -1 ?        /usr/users/noquota/faculty/st
29723 29673 22908 22908    -1 ?        /net/storage/zfs/faculty/stev

There’s a lot of output here, but some things to notice:

  1. I’m actually connected to 3 different TTYs, pts/0, pts/4, and pts/5. All three of these come from sshing into a Linux server. The processes on pts/4 are actually from Visual Studio Code on my Mac sshing to the server.
  2. Some of these processes don’t have a controlling terminal; ps shows ? instead. These processes will never read or write to a terminal. Most of these were started by Visual Studio Code and are things like the rust-analyzer VS Code extension running in the background.

Once the pipeline ends (all of the processes in it have completed), bash will arrange for its process group to once again be the foreground process group for its controlling terminal so that it can print out the prompt and wait for the next line of input.

In the next lab, you will implement the kill command-line utility which can terminate individual processes by PID or entire process groups by PGID. For example, I can kill the entire sort | uniq | sort pipeline’s process group. To distinguish between a PID and a PGID, we use a negative value to indicate the PGID and a positive value to indicate the PID. This command kills every process in process group 17314, namely the pipeline.

$ kill -TERM -17314

For the final time, let’s print out details about the processes in that first session.

$ ps -s 29327 -o pid,ppid,pgid,sid,tgid,tname,command
  PID  PPID  PGID   SID  TGID TTY      COMMAND
29327 29326 29327 29327 29327 pts/0    -bash

And there we go. We’re back to a single process, bash, in a single process group in the session. This process group is the controlling terminal’s foreground process group.

I didn’t show an example, but bash can create background process groups, change which group is the foreground process group, and perform a wide variety of so-called “job control” operations. There’s a bunch of information in the Bash manual. I almost never avail myself of this functionality, but in specific circumstances, it can be useful.

Hopefully, you now have a better understanding of the somewhat unusual behavior of ps with respect to which processes it prints information about in response to -a/-A/-d/-e. To recap:

  • If none of those options are passed, ps prints out just the processes running in the current terminal. This makes historical sense: a user would typically only be logged in to a single terminal so it makes sense to show only those processes by default.
  • The -a and -d options show all processes except for session leaders (and -a also omits processes without a controlling terminal). Why not show session leaders? My best guess is that the session leader is basically always going to be the shell, and you’re not likely to want to see a bunch of bash instances; if you really do, you can use -A or -e.

If you read this far, I’m impressed! I think understanding the low-level details of how systems work is really cool. This is one of those super important, but not well understood areas of computers. Now, having read this, you’re better informed about how processes and terminals work on Unix-like systems, including macOS and Linux, than almost every single computer programmer.

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    ├── bin
    │   ├── ps.rs
    │   └── runnable.rs
    ├── lib.rs
    └── proc.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Lab 7: Processes II

Due: 2024-11-04 at 23:59

In this lab, you’re going to continue working on your process library and ps binary from last time.

In this lab, you will learn

  • how to make system calls; and
  • how to call functions in the C standard library to interact with the operating system.

Like Lab 6, this one is going to be specific to Linux.

Preliminaries

Reuse your assignment repository from the previous lab.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1. Calling C functions (5 points)

We saw last time (and will see a little more this time) how Linux uses the procfs virtual file system (which is usually mounted at /proc) to expose information to user processes.

This is not the only mechanism that operating systems use. Every modern OS works by enforcing a strict separation between user processes (the ones we run) and the operating system kernel. This is essential to prevent, among other things, buggy programs from crashing the whole computer. Unfortunately, it comes at a cost. If a user process wants information from the OS, it cannot simply call a function in the kernel. Instead, it makes a system call.

System calls are a mechanism for requesting the OS do something on your behalf. Some examples of system calls in different application domains:

  • Open a file
  • Read data from a file
  • Write data to a file
  • Get the current time of day
  • Run a new program (more about this in a later lab)
  • Terminate a program (we’re going to do this!)
  • Generate random numbers in a secure fashion
  • Open a network connection to a server (more about this in a later lab)
  • Send or receive data over the network to/from the server
  • Exit the current process

Every program you’ve written has used at least one system call.

Most of the time, programmers do not issue system calls themselves. The reason for this is that the system call mechanism is not portable, meaning it’s different on every OS. Nevertheless, we can write code like

use std::fs::File;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut file = File::open("example.txt")?;
Ok(())
}

and it can be compiled for any (modern) OS and it’ll open the file.

The way this works is that every OS exposes a set of functions in a library which will make the relevant system calls. For example, Apple calls their library libSystem. Even if the underlying system calls themselves change, as part of an update for example, the OS vendor (e.g., Microsoft or Apple) updates this library to make the updated system calls on the program’s behalf.

Now we hit a bit of a snag. There are a million programming languages. Each language can have wildly different ideas about what a function is. One way to deal with this would be to have a library per programming language. E.g., we could have a libSystem-Python, libSystem-Java, libSystem-Rust, etc. This quickly becomes unworkable.

Instead, the C programming language is the de facto programming language for these libraries. And indeed, the most common name for the library containing functions that make system calls is libc. On an x86-64 Linux system, you’re likely to find the GNU version of libc living at /lib/x86_64-linux-gnu/libc.so.6.

This is really unusual, but you can actually run this libc as if it were a program and it’ll print out some information about the library. Almost no library does this.

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1.6) stable release version 2.27.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 7.5.0.
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

Okay, so if C is the de facto language, how can we make system calls in Rust if the Rust standard library doesn’t already have the functionality we need? Well, we call C functions! Nearly every programming language has some mechanism, often called a Foreign Function Interface (FFI), that lets you call C functions. Rust is no exception.

In Rust, the functions in libc are exposed in the libc crate. Here’s the documentation for x86-64 Linux.

As part of your implementation of ps, you printed out the user ID (UID) of the user that is running the process. You’re going to write some code to turn the UID into a String containing the user’s username.

Your Task

You’re going to implement the most bare-bones version of the standard whoami command-line utility that prints out the user name of the user who runs the process. E.g., here’s the standard whoami on my laptop.

$ whoami
steve

Pretty straightforward. Unfortunately, there’s no Rust standard library function to get this information. We’re going to have to write it ourselves!

First, add the libc crate as a dependency by running

$ cargo add libc

in your assignment repository.

Next, create a file src/bin/whoami.rs. By putting the file in bin, $ cargo build will build it as another binary alongside ps and runnable, and $ cargo run needs to know which binary to run. Write a simple main function in whoami.rs that prints out a message and exits. You can (build and) run this via

$ cargo run --bin whoami
Hello world!

If you have two or more binaries and you run $ cargo run without specifying which to run (and without a default-run entry in Cargo.toml), you’ll get an error.

Getting the user name for the user who is running the current process requires two steps:

  1. Ask the OS for the UID of the user
  2. Ask the OS for the username associated with the UID.

The C function we need to call for step 1 is geteuid(). If you click that link, you’ll find no real documentation. That’s a shame, but not surprising as the libc crate is generated programmatically and doesn’t include documentation. Fortunately, there are manual pages for C functions. Here’s the man page for geteuid(2). Give it a read. (Note that it is in section 2 of the manual. Section 1 is where user programs like cat are documented. Section 2 is where system calls are documented. Section 3 is where C standard library functions are documented. Run $ man man for a list of the different sections.)

There are two functions documented, getuid() which returns the real UID (we don’t want this), and geteuid() which returns the effective UID (this is the one we want). These are usually the same. The most common way they can differ is if you run a process as the root user (which has UID 0) via the sudo command. In that case, geteuid() will return 0 and getuid() will return the real user. We want the effective one instead:

$ whoami
steve

$ sudo whoami
root

Important

Every function in libc is marked unsafe. This indicates that the function might do something that violates Rust’s safety guarantees and it’s up to the programmer to ensure that it doesn’t. We can only call an unsafe function within an unsafe function or within an unsafe {} block.

Because of this, it is considered best practice to encapsulate anything unsafe behind a safe interface. It’s up to you to perform error handling.

The geteuid() function is unusual in that it cannot fail. There is no return value that indicates an error condition (I know this because I read the man page, “These functions are always successful…”). As a result, we can trivially make a safe wrapper around geteuid by calling libc::geteuid() inside an unsafe {} block and returning the result.

extern crate libc;
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn geteuid() -> u32 {
    unsafe { libc::geteuid() }
}

fn main() {
    let uid = geteuid();
    println!("The current effective UID is {uid}");
}

There are a handful of libc functions like getpid(), getppid(), and getpgrp() which just return a numeric identifier and can be handled by simply wrapping the call in an unsafe {} block. Most system calls are sadly not this easy to deal with. We’ll need to carefully handle memory and any errors that result.

Speaking of other system calls, you now know how to get the effective UID of the user running the process. Neat!

Turning that into a user name is more complex and can fail.

Create a new file src/user.rs.

Part 2. Looking up users (20 points)

In this part, you’re going to implement a function username() which takes a user ID (UID) and returns a string containing the username.

The libc function you called in Part 1 was very simple: It took no arguments and it could never fail. The libc function you’ll call in this part is much more complicated.

POSIX-compatible operating systems provide functions to get information about a user. Take a look at the man page for getpwnam_r(3) (this is in section 3 of the manual, which is the “Library Functions Manual”). The four functions described all behave similarly: They look up a user in the user database. getpwnam() and getpwnam_r() look up a user given a user name whereas getpwuid() and getpwuid_r() look up a user given a UID.

The information returned by these functions is a passwd structure. In C, this structure looks like this.

struct passwd {
   char   *pw_name;       /* username */
   char   *pw_passwd;     /* user password */
   uid_t   pw_uid;        /* user ID */
   gid_t   pw_gid;        /* group ID */
   char   *pw_gecos;      /* user information */
   char   *pw_dir;        /* home directory */
   char   *pw_shell;      /* shell program */
};

Here’s the same definition in Rust, from the libc crate.

#![allow(unused)]
fn main() {
type c_char = i8;
type uid_t = u32;
type gid_t = u32;
#[repr(C)]
pub struct passwd {
    pub pw_name: *mut c_char,
    pub pw_passwd: *mut c_char,
    pub pw_uid: uid_t,
    pub pw_gid: gid_t,
    pub pw_gecos: *mut c_char,
    pub pw_dir: *mut c_char,
    pub pw_shell: *mut c_char,
}
}

The interesting type here is the string type *mut c_char. This is a raw mutable pointer to a C character (a character in C is just a byte). In fact, it’s a raw mutable pointer to a sequence of nonzero characters followed by a zero character (meaning a byte with value 0, '\0', rather than the digit '0' which has value 48).

Important

A raw pointer in Rust is essentially the same thing as a reference except without the safety guarantees: In particular, a pointer need not point at a valid, initialized object of its type whereas a reference always points to a valid, initialized object.

Just as there are two types of references, &T and &mut T, there are two types of pointers, *const T and *mut T. A const pointer is used when the thing it is pointing to will not change (similar to &) whereas a mut pointer is used when the thing it is pointing to may be changed (similar to &mut).

It’s always safe to create a *const T from a &T or a *mut T from a &mut T.

Here’s an example.

#![allow(unused)]
fn main() {
let x: i32 = 10;
let x_ptr = &x as *const _;

let mut y: i32 = 20;
let y_ptr = &mut y as *mut _;

println!("Address of x: {x_ptr:?}");
println!("Address of y: {y_ptr:?}");
}

If you click the run button, you’ll see that addresses are just numbers. If you think of your computer’s memory as one giant array of bytes ranging from index 0 up to some n, then a pointer to some value in memory is just the index of the first byte of the value in that big array of bytes. (This isn’t completely true for several reasons. Nevertheless, if you think of a pointer as just the index of the first byte of your value, your mental model of how pointers work will be 95% correct.)

The operation that isn’t safe is dereferencing the pointer. Here’s an example:

#![allow(unused)]
fn main() {
let mut x: i32 = 10;
let x_ref = &mut x;

let y = *x_ref; // Dereferencing the reference, always safe. Creates y with value 10.
*x_ref = 20; // Dereferencing the reference, always safe. Assigns 20 to x.

let x_ptr = x_ref as *mut _;

// Dereferencing the pointer is safe in this case because it points to
// a valid, initialized `i32`. But dereferencing in general is unsafe because
// the pointer need not be valid.
// Creates z with value 20.
let z = unsafe { *x_ptr };
// Again, dereferencing is safe in this case, but not in general.
// Assigns 40 to x.
unsafe {
    *x_ptr = 40;
}
println!("x = {x}\ny = {y}\nz = {z}");
}

Click the Run button.

The getpwuid_r() function is quite complicated. The Rust binding to the C function is the following.

pub unsafe extern "C" fn getpwuid_r(
    uid: uid_t,
    pwd: *mut passwd,
    buf: *mut c_char,
    buflen: size_t,
    result: *mut *mut passwd
) -> c_int

The parameters match the parameters for the C function and are as follows.

  • uid — The user ID of the user to look up;
  • pwd — A (mutable) pointer to a passwd struct that will be filled out by getpwuid_r();
  • buf - A buffer to hold the contents of string data (see below);
  • buflen — The length of the buf buffer; and
  • result — On success, *result will be set to pwd; on failure, it will be set to std::ptr::null_mut(), a null pointer.

The passwd structure contains pointers to string data. The getpwuid_r() function needs to put that string data somewhere. It uses the provided buf as a place to store the strings.

Your task

Your task is to create a new user module and implement a fn username(uid: u32) -> String function that calls getpwuid_r() to retrieve the user name of the user corresponding to uid.

Start by creating a new file src/user.rs and add the user module to lib.rs with

pub mod user;

Then add

use std::ffi::CStr;
use std::ptr;
use libc;

pub fn username(uid: u32) -> String {
    unimplemented!()
}

to user.rs.

It is time to implement username(). Implementing it is a little tricky. You’re going to need to

  1. Create a passwd local variable with all of the fields set to either 0 or null
    let mut pwd = libc::passwd {
        pw_name: ptr::null_mut(),
        // fill out the rest of the fields
    };
  2. Create a Vec<libc::c_char> for the buf argument
    let mut buf: Vec<libc::c_char> = vec![0; 1024];
  3. Create a pointer to a libc::passwd named result that will be set by getpwuid_r() to the address of pwd on success.
    let mut result: *mut libc::passwd = ptr::null_mut();
  4. Call getpwuid_r(), passing the appropriate arguments.
    let ret = unsafe {
        libc::getpwuid_r(
            uid,
            &mut pwd as *mut _,
            buf.as_mut_ptr(),
            buf.len(),
            &mut result as *mut _,
        )
    };
  5. If ret is libc::ERANGE, then the provided buffer wasn’t large enough to hold the string data. Resize buf (something like buf.resize(buf.len() * 2, 0)) and call getpwuid_r() again. Steps 4 and 5 should be in a loop, and you should break out of the loop if ret is any other value.
  6. If ret is any nonzero value (other than libc::ERANGE which you handled in step 5) or if result.is_null(), then either the getpwuid_r() call failed or there was no user with that UID. In either case, convert the UID to a String (uid.to_string()) and return it.
  7. At this point, everything has succeeded and we just need to convert from a C string into a Rust String and return it. For this, we can use a &CStr reference which behaves like a &str for a C string.
     // UNSAFE: This is safe because ret is 0 and result is not null and
     // therefore pwd was filled out. Additionally, pwd.pw_name is a valid,
     // NUL-terminated C-string.
     let username: &CStr = unsafe {
         CStr::from_ptr(pwd.pw_name)
     };
    
     // Convert CStr into a String and return it.
     username.to_string_lossy().into_owned()

Finally, modify the main() function in whoami.rs to call user::username(uid) with the UID that results from libc::geteuid(). You’ll want to add use process::user; to whoami.rs to be able to call functions in the user module.

If all has worked correctly, running cargo run --bin whoami should print out your username.

$ cargo run --quiet --bin whoami
steve

Part 3. Adding usernames to ps (10 points)

Now that user::username() has been implemented, modify your ps to use user::username().

Your task

Modify your implementation of ps to print out a left-aligned column with the username when the -f option is passed and a right-aligned column with the UID when -l but not -f is passed.

$ cargo run --bin ps --quiet -- -f
UID               PID       PPID TTY             TIME CMD
steve         1519068    1518177 34816              0 /bin/bash --init-file /zfs/faculty/steve/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh
steve         1521087    1519068 34816              0 target/debug/ps -f

$ cargo run --bin ps --quiet -- -l
S        UID        PID       PPID TTY             TIME CMD
S 1425750506    1519068    1518177 34816              0 bash
R 1425750506    1521098    1519068 34816              0 ps

$ cargo run --bin ps --quiet -- -fl
S UID               PID       PPID TTY             TIME CMD
S steve         1521440    1518177 34819              0 /bin/bash --init-file /zfs/faculty/steve/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh
R steve         1521657    1521440 34819              0 target/debug/ps -fl

Part 4. Execution times (15 points)

You’re nearly finished with ps. Two tasks remain. The first is making the execution time of a process print out in the [DD-]hh:mm:ss format. Meaning that if the process has been running for less than a day, the hours, minutes, and seconds are printed out (two digits each) and if the process has been running for longer than a day, then the number of days are printed out followed by a dash and then hours, minutes, and seconds. For example, 00:05:23 for a process that has been running for 5 minutes and 23 seconds and 03-15:59:21 for a process that has been running for 3 days, 15 hours, 59 minutes, and 21 seconds.

The process for computing the execution time has two parts: First, you need to get the execution time in seconds and second, you need to convert that into days, hours, minutes, and seconds. The second part is standard. The first part isn’t difficult, but isn’t obvious.

Linux keeps track of time in terms of ticks. There are some number of ticks per second which can vary by system. You will have to ask the OS how many ticks there are per second. Fortunately, there’s a libc function that will do that for us! Here’s a function you can use which returns the number of ticks per second.

use std::io;

/// Returns the number of clock ticks per second.
fn ticks_per_second() -> Result<i64> {
    let ret = unsafe { libc::sysconf(libc::_SC_CLK_TCK) };

    if ret == -1 {
        return Err(io::Error::last_os_error().into());
    }
    Ok(ret)
}

Your task

In your ps’s run(), call ticks_per_second(). To compute the execution time of a process in seconds, use the execution_time field of the Process struct and divide it by the result of ticks_per_second().

Convert this time to days, hours, minutes, and seconds, and when printing the execution time, use the appropriate format.
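Here’s a sketch of the second, mechanical part (the conversion and formatting). The function name and signature are my own choices, not part of the assignment.

/// Format a duration given in whole seconds as [DD-]hh:mm:ss.
fn format_time(total_seconds: i64) -> String {
    let days = total_seconds / 86_400;
    let hours = (total_seconds % 86_400) / 3_600;
    let minutes = (total_seconds % 3_600) / 60;
    let seconds = total_seconds % 60;

    if days > 0 {
        // E.g., 03-15:59:21
        format!("{days:02}-{hours:02}:{minutes:02}:{seconds:02}")
    } else {
        // E.g., 00:05:23
        format!("{hours:02}:{minutes:02}:{seconds:02}")
    }
}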

Part 5. Controlling terminals (40 points)

The final thing to add to ps is printing a human-readable description of the controlling terminal (the TTY column).

In this part, you’re going to create a new DeviceMap struct which you will use to convert the “controlling terminal” integer to a string like pts/3.

The controlling terminal integer has two parts, a minor number that’s in the range [0, 255] and a major number. The integer is computed as major * 256 + minor. The major number identifies the type of “character device.”

Linux exposes the mapping between major number and device name in the file /proc/devices. Go ahead and take a look at that file to see how it is structured.

Your task

Create a new file src/device_map.rs (and add a module declaration in lib.rs) and define a new DeviceMap struct. This struct should contain a HashMap<i32, String> field. (You’ll probably want to add use std::collections::HashMap; to the file.) Don’t forget to make DeviceMap public.

You’re going to implement a public constructor named new() and a public lookup() method.

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
struct DeviceMap;
impl DeviceMap {
    pub fn new() -> Result<Self> {
        todo!()
    }

    pub fn lookup(&self, tty_nr: i32) -> String {
        let major = tty_nr >> 8;
        let minor = tty_nr & 0xFF;
        todo!()
    }
}
}

(You will need to use super::Result; to have access to your Result type).

The new() function should open /proc/devices and read lines corresponding to the Character devices section of the file and insert the names of the devices into the hash map using the integer value as the key. See the HashMap documentation for the methods you can use.

Tip

To simplify your implementation, you may want to use this code

let devices = std::fs::read_to_string("/proc/devices")
    .map_err(|err| format!("/proc/devices: {err}"))?;

for line in devices
    .lines()
    .skip_while(|&line| line != "Character devices:")
    .skip(1)
    .take_while(|&line| !line.is_empty())
{
    todo!("Split the line on whitespace and insert into the map")
}

This reads /proc/devices into a String named devices. Then it iterates over the lines by (1) skipping lines until Character devices: is found; (2) skipping that line; and (3) taking lines until the first empty line, which ends the Character devices section.

There are duplicate numbers in /proc/devices. That’s okay, just insert them into the hash map one at a time. The earlier values will be replaced by the later values.
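To fill in the todo!() above: each line in that section looks something like "  4 tty", a number followed by the device name. Here is one way to handle a single line; the error handling is just one reasonable choice, and map is assumed to be the HashMap<i32, String> field of your DeviceMap.

// Split the line into the number and the device name.
let mut parts = line.split_whitespace();
if let (Some(number), Some(name)) = (parts.next(), parts.next()) {
    let number: i32 = number
        .parse()
        .map_err(|err| format!("/proc/devices: {err}"))?;
    map.insert(number, name.to_string());
}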

What remains is to finish the implementation of lookup(). The provided code above extracts the major and minor number. Look up the major number in the hash map. If it’s not found, return the string "?". If it is found, then return format!("{device}/{minor}") (where device is the returned value from the hash map).

The final step is to update ps to create a DeviceMap in main() and when printing the TTY column, call the lookup() method to get the name of the TTY.

Here’s some sample output as compared to the real ps.

$ cargo run --bin ps --quiet
       PID TTY             TIME CMD
   1521440 pts/3       00:00:00 bash
   1523501 pts/3       00:00:00 ps

$ ps
    PID TTY          TIME CMD
1521440 pts/3    00:00:00 bash
1523768 pts/3    00:00:00 ps

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    ├── bin
    │   ├── ps.rs
    │   ├── runnable.rs
    │   └── whoami.rs
    ├── device_map.rs
    ├── lib.rs
    ├── proc.rs
    └── user.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Lab 8: Osh I

In this lab and the next, you’re going to implement a simple shell named Osh. The shell will be able to

  1. Run simple commands like wc -l some/file; and
  2. Run pipelines containing an arbitrary number of commands.

You will learn

  • one way to parse input;
  • how to model a pipeline; and
  • how to run programs and wait for them to complete.

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The team name cannot be the same name used for a previous lab. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team, there’s a maximum of two people and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1. Parsing input (40 points)

At a high level, your shell will (1) read a line of input from stdin; (2) parse the input into tokens; (3) construct a pipeline from the tokens; and (4) run the whole pipeline.

In this part, you will parse input into a sequence of tokens.

Initially, a token will be either a pipe character | or a word. For example, the input echo 'Why    hello there!'| wc -l -c consists of the following tokens.

[Word("echo"), Word("Why    hello there!"), Pipe, Word("wc"), Word("-l"), Word("-c")]

Notice how the | turns into a pipe token rather than a word token. In contrast, if the pipe is quoted, it is a word as in this example echo '|'.

[Word("echo"), Word("|")]

This is a perfect use case for an enum.

Your task

Create a new binary project in the assignment repo using

$ cargo init --bin --vcs git --name osh .

Define the Result type in main.rs the same way you did for the past few labs.

Create a new file src/input.rs and add the input module to main.rs.

Inside input.rs, add a use line for Result.

use crate::Result;

Create a new, public enum called InputToken which you will use to represent either a word or a pipe.
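One possible shape for the enum is sketched below; deriving Debug and PartialEq now will make the debug printing and the unit tests later in this part work.

/// A single token of shell input: either a word or an unquoted `|`.
#[derive(Debug, PartialEq, Eq)]
pub enum InputToken {
    Word(String),
    Pipe,
}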

Create a new function

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
enum InputToken { Word(String), Pipe };
pub fn split_input(input: &str) -> Result<Vec<InputToken>> {
    todo!()
}
}

which on success will return a Vec of InputTokens.

To parse the input into tokens, you’ll want to iterate over the input string one character at a time, which you can do using the .chars() method; it returns an iterator over characters.

What makes this tricky is that how you handle any given character depends on what other characters have been read. For example, a space character can separate two words or it could be part of a quoted word. Similarly, a pipe character might be part of a word (if it is quoted) or it might be a separator in the pipeline (see the two examples at the top of this page).

Your split_input() function should support unquoted words, single quoted words, and double quoted words or a mix. Here’s an example input line

echo unquoted_word 'single "quoted" word' "double 'quoted' word"  'mixing different 'quoting" styles"

and its parsing

[Word("echo"), Word("unquoted_word"), Word("single \"quoted\" word"), Word("double 'quoted' word"), Word("mixing different quoting styles")]

Note that the unquoted quotation marks are removed from each word. (The single quoted word containing a double quoted inner word is shown with a backslash before the inner quotes. This is an artifact of the Debug representation, not actual characters.)

Hint

To parse the input, I recommend creating a state machine which examines its input one character at a time. A state machine works by maintaining a current state and reading the input in a loop. For each character read, use the state to determine what to do with the character and what state to move to next.

My implementation used the following four states and a mutable State variable which is initially State::Blank and represents the current state of parsing.

#![allow(unused)]
fn main() {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Blank,
    Unquoted,
    SingleQuoted,
    DoubleQuoted,
}
let mut state = State::Blank;
}

The Blank state indicates that the state machine is not currently parsing a word. Initially, the current state is Blank, but after reading an unquoted white space character (a blank, if you will), the state will change back to Blank.

The other three states indicate that the state machine is in the middle of parsing an unquoted word, a single quoted word, or a double quoted word.

This is a graphical representation of the state machine.

split_input() state machine

Each circle represents one of the four states. The arrows indicate how the state machine’s current state changes between them. For example, starting in the Blank state parsing the input foo 'bar', the state changes to Unquoted upon reading the f. It stays in Unquoted until the space when it transitions back to Blank. Upon reading the ', it transitions to SingleQuoted where it stays until it reaches the last ' which causes it to transition to Unquoted.

Note that in state SingleQuoted, a single quote character causes a transition to state Unquoted. The DoubleQuoted state behaves similarly with a double quote. It doesn’t transition to the Blank state because a closing quote doesn’t end the word. For example, the input foo' 'bar is the same as 'foo bar': both of them produce a Word("foo bar") token.

The bulk of the state machine is implemented in a loop containing a match.

for ch in input.chars() {
    match (state, ch) {
        _ => todo!()
    }
}

Rust will ensure that we add cases for every possible state and input character combination.

For example, here’s how my solution handles characters in the Blank state.

let mut current_word = String::new();
let mut result = Vec::new();
for ch in input.chars() {
    match (state, ch) {
        (State::Blank, '|') => result.push(InputToken::Pipe),

        (State::Blank, _) if ch.is_ascii_whitespace() => (),
        (State::Blank, '\'') => state = State::SingleQuoted,
        (State::Blank, '"') => state = State::DoubleQuoted,
        (State::Blank, _) => {
            current_word.push(ch);
            state = State::Unquoted;
        }
        // Handle the other states
    }
}

In the Blank state, if a | is read, it pushes an InputToken::Pipe token into a result vector.

If the character is ASCII whitespace, then it’s ignored. Notice that we can add conditions to the match patterns.

match (state, ch) {
    (State::Blank, _) if ch.is_ascii_whitespace() => (),
}

This will match if state is State::Blank and ch.is_ascii_whitespace() returns true. (The _ wildcard will match any character.)

If the character is a single or double quote, the state changes to the appropriate state.

In every other case, the character is pushed onto the current_word and changes the state to Unquoted.

The Unquoted state is similar to Blank except that a | or ASCII whitespace ends the current_word (so it should be pushed onto result). If it’s a |, then an InputToken::Pipe should also be pushed. The state should be set to Blank.

The SingleQuoted and DoubleQuoted states are similar to each other. They should keep collecting characters in the word until the matching single or double quote character is read, at which point the state should be changed to Unquoted.

After all characters have been read, if the state is Unquoted, the current word should be pushed onto the result vector. If the state is one of the quoted states, return an error. Otherwise, return the vector of tokens.
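To make that concrete, here is a sketch of what the Unquoted arms might look like. This is just one possible implementation; std::mem::take moves the finished word out of current_word, leaving an empty String behind.

// A pipe ends the current word and also emits a Pipe token.
(State::Unquoted, '|') => {
    result.push(InputToken::Word(std::mem::take(&mut current_word)));
    result.push(InputToken::Pipe);
    state = State::Blank;
}
// Unquoted whitespace ends the current word.
(State::Unquoted, _) if ch.is_ascii_whitespace() => {
    result.push(InputToken::Word(std::mem::take(&mut current_word)));
    state = State::Blank;
}
// Quotes switch states without ending the word (recall foo' 'bar).
(State::Unquoted, '\'') => state = State::SingleQuoted,
(State::Unquoted, '"') => state = State::DoubleQuoted,
// Any other character is part of the word.
(State::Unquoted, _) => current_word.push(ch),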

For a complete example, see below.

Implement a main() function in main.rs that reads a line of input from stdin, removes the newline, calls split_input() on it, and prints the debug representation.

Write some unit tests in input.rs to test your code. Here are two sample unit tests to get you started.

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn simple_pipeline() {
        let input = "echo hello | wc -c";
        let tokens = split_input(input).unwrap();
        assert_eq!(
            tokens,
            vec![
                InputToken::Word("echo".to_string()),
                InputToken::Word("hello".to_string()),
                InputToken::Pipe,
                InputToken::Word("wc".to_string()),
                InputToken::Word("-c".to_string()),
            ]
        );
    }

    #[test]
    fn mixed_quotes() {
        // This is a "raw" string. It has delimiters r#" and "#.
        // Note that we can use " inside of it.
        let input = r#"'mixing different 'quoting" styles""#;
        let tokens = split_input(input).unwrap();
        assert_eq!(
            tokens,
            vec![
                InputToken::Word("mixing different quoting styles".to_string()),
            ]
        );
    }
}

Remember that you can add additional unit tests by adding functions to the test module. Each function that you want to run as a test needs to be annotated with #[test]. Note the use of raw strings in the second test.

To run all unit tests, run $ cargo test.

Example parsing

The table below shows how current_word and result change as a result of parsing the input echo 'one two' three" four"| tr a-z A-Z.

The first (body) row of the table indicates that the state machine starts in the Blank state and the first character is e. This causes the e to be pushed into current_word and result remains empty. The state changes to the Unquoted state, which you can see in the second row.

The second row indicates that the state machine is in the Unquoted state and the next letter is c. This causes the c to be pushed into current_word and result remains empty. The state remains Unquoted.

The final row of the table shows that after reading all of the input characters, the state machine is in the Unquoted state and the current_word from the row above has been pushed into the result vector.

State         ch     current_word   result
Blank         'e'    "e"            []
Unquoted      'c'    "ec"           []
Unquoted      'h'    "ech"          []
Unquoted      'o'    "echo"         []
Unquoted      ' '    ""             [Word("echo")]
Blank         '\''   ""             [Word("echo")]
SingleQuoted  'o'    "o"            [Word("echo")]
SingleQuoted  'n'    "on"           [Word("echo")]
SingleQuoted  'e'    "one"          [Word("echo")]
SingleQuoted  ' '    "one "         [Word("echo")]
SingleQuoted  't'    "one t"        [Word("echo")]
SingleQuoted  'w'    "one tw"       [Word("echo")]
SingleQuoted  'o'    "one two"      [Word("echo")]
SingleQuoted  '\''   "one two"      [Word("echo")]
Unquoted      ' '    ""             [Word("echo"), Word("one two")]
Blank         't'    "t"            [Word("echo"), Word("one two")]
Unquoted      'h'    "th"           [Word("echo"), Word("one two")]
Unquoted      'r'    "thr"          [Word("echo"), Word("one two")]
Unquoted      'e'    "thre"         [Word("echo"), Word("one two")]
Unquoted      'e'    "three"        [Word("echo"), Word("one two")]
Unquoted      '"'    "three"        [Word("echo"), Word("one two")]
DoubleQuoted  ' '    "three "       [Word("echo"), Word("one two")]
DoubleQuoted  'f'    "three f"      [Word("echo"), Word("one two")]
DoubleQuoted  'o'    "three fo"     [Word("echo"), Word("one two")]
DoubleQuoted  'u'    "three fou"    [Word("echo"), Word("one two")]
DoubleQuoted  'r'    "three four"   [Word("echo"), Word("one two")]
DoubleQuoted  '"'    "three four"   [Word("echo"), Word("one two")]
Unquoted      ' '    ""             [Word("echo"), Word("one two"), Word("three four")]
Blank         '|'    ""             [Word("echo"), Word("one two"), Word("three four"), Pipe]
Blank         ' '    ""             [Word("echo"), Word("one two"), Word("three four"), Pipe]
Blank         't'    "t"            [Word("echo"), Word("one two"), Word("three four"), Pipe]
Unquoted      'r'    "tr"           [Word("echo"), Word("one two"), Word("three four"), Pipe]
Unquoted      ' '    ""             [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")]
Blank         'a'    "a"            [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")]
Unquoted      '-'    "a-"           [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")]
Unquoted      'z'    "a-z"          [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr")]
Unquoted      ' '    ""             [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")]
Blank         'A'    "A"            [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")]
Unquoted      '-'    "A-"           [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")]
Unquoted      'Z'    "A-Z"          [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z")]
Unquoted      (end)  ""             [Word("echo"), Word("one two"), Word("three four"), Pipe, Word("tr"), Word("a-z"), Word("A-Z")]

Once you finish the rest of the lab, this input should print out ONE TWO THREE FOUR.

Part 2. Building a pipeline (25 points)

Now that your split_input() function takes a string and returns a vector of InputTokens, the second step is to build a pipeline. To do this, you’re going to create a SimpleCommand struct and a Pipeline struct.

The Pipeline struct will contain a vector of SimpleCommands to run.

At a high level, the Pipeline constructor,

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
enum InputToken { Word(String), Pipe }
struct Pipeline;
impl Pipeline {
    pub fn new(tokens: &[InputToken]) -> Result<Self> { todo!() }
}
}

takes a slice of InputTokens and constructs a Pipeline by splitting the tokens on InputToken::Pipe. Then, for each sequence of tokens between the Pipe tokens, new() will construct a SimpleCommand.

Example

For example, the command cat Cargo.toml | wc -l is split into the following tokens.

[Word("cat"), Word("Cargo.toml"), Pipe, Word("wc"), Word("-l")]

When these tokens are passed to Pipeline::new(tokens), it constructs the following Pipeline.

Pipeline {
    commands: [
        SimpleCommand {
            path: "cat",
            args: [
                "Cargo.toml",
            ],
        },
        SimpleCommand {
            path: "wc",
            args: [
                "-l",
            ],
        },
    ],
}

Your task

Create a new pipeline module by creating a pipeline.rs file and adding the appropriate mod line to main.rs.

Create a SimpleCommand struct that holds a path (or command name) of the program to execute and a vector of String arguments. (In the next lab, we’ll expand this struct.)

Next, create a public Pipeline struct containing a vector of SimpleCommands.

Implement the Pipeline::new() constructor shown above.

Tip

We can use the .split() method on a slice to split it on InputToken::Pipe. The result is an iterator over sub-slices.

for command_tokens in tokens.split(|token| token == &InputToken::Pipe) {
    let mut words: Vec<String> = Vec::new();

    for token in command_tokens {
        todo!("Push each word into words");
    }
    if words.is_empty() {
        return Err("Command missing".into());
    }
    todo!("Create a new SimpleCommand from the words");
}

words.remove(0) will remove and return the first word from words. You can use this to create the SimpleCommand.
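
For example, the end of the loop body might look something like this (a sketch; it assumes the path and args field names used in the test below and a local commands vector you are building up for the Pipeline):

// The first word is the program to run; the remaining words are its arguments.
// `commands` is assumed to be the Vec<SimpleCommand> being collected for the Pipeline.
let path = words.remove(0);
commands.push(SimpleCommand { path, args: words });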

Update your main() function to print the debug representation of the Pipeline that results from the input tokens you parsed in the previous part.

Create some unit tests to test the construction of a Pipeline. (For assert_eq! to compare them, both Pipeline and SimpleCommand will need to derive Debug and PartialEq.) Here’s one to get you started using the example above.

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn simple_pipeline() {
        let tokens = [
            InputToken::Word("cat".to_string()),
            InputToken::Word("Cargo.toml".to_string()),
            InputToken::Pipe,
            InputToken::Word("wc".to_string()),
            InputToken::Word("-l".to_string()),
        ];
        let pipeline = Pipeline::new(&tokens).expect("Failed to create a Pipeline");
        assert_eq!(
            pipeline,
            Pipeline {
                commands: vec![
                    SimpleCommand {
                        path: "cat".to_string(),
                        args: vec!["Cargo.toml".to_string()]
                    },
                    SimpleCommand {
                        path: "wc".to_string(),
                        args: vec!["-l".to_string()]
                    },
                ]
            }
        );
    }
}

Part 3. Running a pipeline (25 points)

In the previous parts, you parsed input into tokens and then the tokens into a representation of a pipeline. In this part, you’re going to implement a run() method which will create and run the processes in the pipeline.

To create a new process, you’re going to use Rust’s standard std::process::Command. It’s worth reading the documentation now.

Example

#![allow(unused)]
fn main() {
use std::process::{Command, Stdio};

let arguments = vec!["-l", "/usr"];
let child = Command::new("ls")
    .args(arguments)
    .stdin(Stdio::null())
    .stdout(Stdio::inherit())
    .stderr(Stdio::inherit())
    .spawn();

match child {
    Ok(mut child) => {
        // Wait for the child to exit.
        child.wait().unwrap();
    }
    Err(err) => {
        eprintln!("Error: {err}");
    }
}
}

(Click Run to see the output.)

There are several things to notice in this example. First, the style of code with the Command struct is called the “builder pattern” because Command is only used for configuring the command to run. Each of the methods you call on a Command returns &mut Self, meaning you can chain methods together as shown in the example. The final method, .spawn(), launches the configured command and returns a Child (wrapped in a Result, of course).

Second, you configure the arguments by calling .args() and passing it a vector of arguments. (Cleverly, it can handle arrays, vectors, and slices of &str or String.)

Third, you configure the behavior of stdin, stdout, and stderr by passing the corresponding methods an instance of Stdio. You’ll see next lab how to redirect those to/from files. For now, you’ll want to use Stdio::inherit() to mean use the stdin/stdout/stderr of the current process, Stdio::piped() to mean create a pipe (see the next example), and Stdio::null() to mean connect it to /dev/null.

Fourth, the .spawn() method returns a Child. To wait for the child to exit, use the .wait() method. The .wait() method returns a Result, but the error condition shouldn’t happen since .spawn() just spawned a new process, so the example simply unwraps the result.

The previous example shows how to spawn one process. To spawn two processes and have the second process’s stdin be the first process’s stdout, you’ll need to (1) spawn the first process; (2) extract the ChildStdout from the Child returned by .spawn(); (3) convert that to a Stdio; (4) spawn the second process using the newly converted Stdio as stdin; and (5) wait for both children.

Example

#![allow(unused)]
fn main() {
use std::process::{Command, Stdio};

let mut children = Vec::new();

// 1. Spawn the first child.
let arguments = vec!["-l", "/usr"];
let first_child = Command::new("ls")
    .args(arguments)
    .stdin(Stdio::inherit())
    .stdout(Stdio::piped())
    .stderr(Stdio::inherit())
    .spawn();

let first_child_stdout: Stdio = match first_child {
    Ok(mut child) => {
        let stdout = child.stdout
            .take() // 2. Extract the ChildStdout.
            .map_or(Stdio::null(), Stdio::from); // 3. Convert it to a Stdio.
        children.push(child);
        stdout
    }
    Err(err) => {
        eprintln!("Error: {err}");
        Stdio::null()
    }
};

// 4. Spawn the second child.
let second_child = Command::new("wc")
    .stdin(first_child_stdout)
    .stdout(Stdio::inherit())
    .stderr(Stdio::inherit())
    .spawn();

match second_child {
    Ok(mut child) => {
        children.push(child)
    }
    Err(err) => {
        eprintln!("Error: {err}");
    }
}

// 5. Wait for all of the children that were created successfully.
for mut child in children {
    child.wait().unwrap();
}
}

One tricky portion of this is

        let stdout = child.stdout
            .take() // 2. Extract the ChildStdout.
            .map_or(Stdio::null(), Stdio::from); // 3. Convert it to a Stdio.

child.stdout is an Option<ChildStdout>. The .take() method on an Option moves the value out of self, leaving None in its place, and returns the original value. Here’s some example code showing this behavior. (Click Run.)

#![allow(unused)]
fn main() {
let mut x: Option<i32> = Some(10);
let taken_from_x = x.take();

let mut y: Option<i32> = None;
let taken_from_y = y.take();
println!("{x:?}  {taken_from_x:?}");
println!("{y:?}  {taken_from_y:?}");
}

The .map_or(default_value, function) method for an Option will return default_value if the Option is None, otherwise, it unwraps the Option and calls function on the unwrapped value. In this case, if child.stdout is None, stdout will be set to Stdio::null() and if child.stdout is Some(child_stdout), then stdout will be set to Stdio::from(child_stdout).

Your task

Implement the run() method.

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
struct Pipeline;
impl Pipeline {
    pub fn run(self) -> Result<()> { todo!() }
}
}

Your code should spawn processes in a loop, connecting the stdout of each process to stdin of the next process. I suggest starting with something like

let mut last_stdout = Stdio::inherit();

and using it as the argument to .stdin() for each process. After spawning a process, extract the ChildStdout and convert it to a Stdio as shown in the example and assign it to last_stdout.

Each child that is successfully spawned should be added to a vector of Child. Any errors spawning a child should be printed to stderr, but the rest of the pipeline should still be spawned. After you have attempted to spawn all of the children, wait for all of them to complete. See the example above for details.
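
Putting those pieces together, run() might be shaped roughly like this (a sketch only; it assumes the path and args fields from Part 2 and the boxed-error Result alias, and you still need to fill in details like error messages):

use std::process::{Child, Command, Stdio};

impl Pipeline {
    pub fn run(self) -> Result<()> {
        let mut children: Vec<Child> = Vec::new();
        let mut last_stdout = Stdio::inherit();
        let num_commands = self.commands.len();

        for (i, command) in self.commands.into_iter().enumerate() {
            // Every command except the last writes into a pipe; the last one
            // writes to the shell's own stdout.
            let stdout = if i + 1 == num_commands {
                Stdio::inherit()
            } else {
                Stdio::piped()
            };

            let child = Command::new(&command.path)
                .args(&command.args)
                .stdin(last_stdout)
                .stdout(stdout)
                .stderr(Stdio::inherit())
                .spawn();

            last_stdout = match child {
                Ok(mut child) => {
                    // Extract this child's stdout so it can feed the next command.
                    let stdout = child.stdout.take().map_or(Stdio::null(), Stdio::from);
                    children.push(child);
                    stdout
                }
                Err(err) => {
                    eprintln!("Error: {err}");
                    Stdio::null()
                }
            };
        }

        // Wait for every child that was successfully spawned.
        for mut child in children {
            child.wait()?;
        }
        Ok(())
    }
}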

Modify your main() function to run the Pipeline you’ve created.

Part 4. Making a shell (15 points)

Congratulations, you’ve completed the difficult part! All that remains is to turn your code into a proper shell.

Here’s a short sample session with my implementation.

Welcome to the Oberlin Shell!
$ whoami
steve
$ date
Wed Oct 18 13:51:00 EDT 2023
$ echo "I can run" 'commands with' multiple' 'arguments | tr a-z A-Z    
I CAN RUN COMMANDS WITH MULTIPLE ARGUMENTS
$ ^D

To exit the shell, I pressed Ctrl-D (which it printed as ^D). This causes stdin to be closed.

Your task

Modify your main() function to print out a welcome message and then, in a loop, read lines of input, parse each line, build a pipeline, and run the pipeline. Any error should be printed out, but the shell should not exit.

If the user enters a line containing only whitespace, the line should be ignored and the prompt printed again.

I found it helpful to separate the logic of reading a line, parsing it, and running it into its own function that returns a Result<()>. Then main() contains a loop that calls that function and prints any error it returns.
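
For example, the overall structure might look like this (a sketch; run_one_line is a hypothetical name, and Result is the boxed-error alias used earlier):

// Hypothetical helper: prints the prompt, reads a line (exiting on EOF),
// tokenizes it with split_input(), builds a Pipeline, and runs it.
fn run_one_line() -> Result<()> {
    todo!()
}

fn main() {
    println!("Welcome to the Oberlin Shell!");
    loop {
        if let Err(err) = run_one_line() {
            eprintln!("{err}");
        }
    }
}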

Tip

You can determine whether stdin is closed when you read from it because read_line() will return Ok(0). Here’s some code that reads from stdin and exits the program if stdin is closed.

let input_length = io::stdin().read_line(&mut line)?;

if input_length == 0 {
    println!("");
    std::process::exit(0);
}

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    ├── input.rs
    ├── main.rs
    └── pipeline.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Lab 9: Osh II

In this lab, you’re going to finish writing your shell. After this lab, your shell will

  1. support file redirection (e.g., $ echo foo >output.txt); and
  2. handle the interrupt (Ctrl-C) and quit (Ctrl-\) signals.

You will learn

  • how to launch processes with stdin/stdout/stderr connected to files;
  • how to open files with more complex open options; and
  • how to install signal handlers and respond to signals.

Preliminaries

Continue working with your existing repository.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1. Parsing redirections (60 points)

In this part, you’re going to start implementing redirections. Implementing redirections has two parts: (1) parsing the input containing redirections; and (2) performing the redirections when the pipeline is run. In Part 2, you will implement performing the redirections. So let’s get started with parsing!

Recall that a simple command consists of words (the command and the options and arguments) and redirections interspersed. For example, the following commands (even the ridiculous-looking final one) are all equivalent.

  • $ wc -l <input.txt >output.txt
  • $ wc <input.txt -l >output.txt
  • $ wc <input.txt >output.txt -l
  • $ <input.txt >output.txt wc -l

All of them run wc -l with stdin connected to input.txt and stdout connected to output.txt.

You’re going to implement support for redirecting to and from files in the following cases.

  • n>path where n is a digit. This creates a file at path (truncating it if it exists) and connects file descriptor n to the created file.
  • n>>path where n is a digit. This opens a file at path (creating if necessary) in append mode and connects file descriptor n to the opened file.
  • n<path where n is a digit. This opens the file at path (it’s an error if the file doesn’t exist) and connects file descriptor n to the opened file.
  • >path. The same as 1>path.
  • >>path. The same as 1>>path.
  • <path. The same as 0<path.

Your task

There are three basic types of redirections: redirecting input, redirecting output, and redirecting output by appending.

First, add some new variants to your InputToken enum to handle the three redirection cases. Note that in each case, a redirection needs two pieces of data: the file descriptor number (a u32), and a path (a String). For example, one of my variants is

enum InputToken {
    // ...
    RedirectInput(u32, String),
    // ...
}
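
For reference, one possible set of variants (the names match the tests later in this part; yours can differ) is:

// One possible set of variants; each redirect variant carries (file descriptor, path).
enum InputToken {
    Word(String),
    Pipe,
    RedirectInput(u32, String),
    RedirectOutput(u32, String),
    RedirectAppend(u32, String),
}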

The difficult part of parsing the input is modifying the state machine to parse the new redirections.

Note

Note that it isn’t sufficient to leave the current parsing mechanism in split_input() alone and simply examine each of the Word tokens to see if it has the right form. Consider echo '>/dev/null' and echo >/dev/null. If you correctly implemented the input parsing in the previous lab, then the output of parsing these should be identical, e.g., [Word("echo"), Word(">/dev/null")]. The problem is that these two commands do not behave the same way. The first prints out >/dev/null whereas the second connects echo’s stdout to /dev/null and runs echo with no arguments.

Recall that the state machine described in the hint in the previous lab has four states. To support redirections, you will need some more states.

Hint

There are many ways to implement this. Here’s one way.

Since a redirection consists of an optional digit, a redirection operator (<, >, or >>), and a path, it would be nice to reuse the code you’ve already written for parsing Words to parse the paths. You can do this by maintaining additional state about what type of redirection is currently being parsed.

#![allow(unused)]
fn main() {
enum RedirectionState {
    None,
    Input(u32),
    Output(u32),
    Append(u32),
}
}

The None variant is used when a normal word is being parsed. The u32 keeps track of which file descriptor will be redirected.

In addition, you will need to be able to distinguish input 5>path from 5x. Since the state machine looks at the input one character at a time, you’ll need to modify State to have an additional variant, Digit, which represents the case where a digit was read in the Blank state.

#![allow(unused)]
fn main() {
enum State {
    Blank,
    Unquoted,
    SingleQuoted,
    DoubleQuoted,
    Digit(u32),
}
}

Here’s an updated state machine.

split_input() state machine

What isn’t represented is how the RedirectionState changes.

The intended behavior is as follows. In the Blank state, if the next input character is a digit, transition to the Digit(num) state where num is the digit’s value.

In the Digit(num) state, if the next input character is < or >, set the redirection state to RedirectionState::Input(num) or RedirectionState::Output(num) and change to the Unquoted state.

In the Digit(num) state, if the next input character is a quotation mark, write num into the current word and transition to the appropriate quoted state. This handles input like 3"foo".

Otherwise, in the Digit(num) state, if the next input character is anything else, write num into the current word and transition to the Unquoted state. This handles input like 4bar.
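
In code, the Digit(num) arms might look roughly like this (a sketch; exactly how whitespace after a lone digit is handled is an assumption, since the text above doesn’t spell it out):

    (State::Digit(num), '<') => {
        redirection_state = RedirectionState::Input(num);
        state = State::Unquoted;
    }
    (State::Digit(num), '>') => {
        redirection_state = RedirectionState::Output(num);
        state = State::Unquoted;
    }
    (State::Digit(num), '\'') => {
        current_word.push_str(&num.to_string());
        state = State::SingleQuoted;
    }
    (State::Digit(num), '"') => {
        current_word.push_str(&num.to_string());
        state = State::DoubleQuoted;
    }
    // Assumption: a lone digit followed by whitespace is just a one-character word.
    (State::Digit(num), _) if ch.is_ascii_whitespace() => {
        result.push(InputToken::Word(num.to_string()));
        state = State::Blank;
    }
    (State::Digit(num), _) => {
        current_word.push_str(&num.to_string());
        current_word.push(ch);
        state = State::Unquoted;
    }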

The transitions from the Blank and Unquoted states need to be updated to handle the redirection state.

In the Blank state, if the next input character is < or >, the redirection state should be set to RedirectionState::Input(0) or RedirectionState::Output(1), and the machine should transition to the Unquoted state.

In the Unquoted state, if the next input character is a |, then you first need to insert either a Word token or one of the Redirect* tokens into the result vector, and then you need to insert a Pipe token. Here’s some sample code for this.

    (State::Unquoted, '|') => {
        let token = match redirection_state {
            RedirectionState::None => InputToken::Word(current_word),
            _ if current_word.is_empty() => return Err("Empty redirection target".into()),
            RedirectionState::Input(num) => InputToken::RedirectInput(num, current_word),
            RedirectionState::Output(num) => InputToken::RedirectOutput(num, current_word),
            RedirectionState::Append(num) => InputToken::RedirectAppend(num, current_word),
        };
        result.push(token);
        result.push(InputToken::Pipe);
        current_word = String::new();
        state = State::Blank;
        redirection_state = RedirectionState::None;
    }

Note that if the current word is empty, and a redirection is being parsed, this is an error. I.e., the input 2>| is an error.

In the Unquoted state, if the next input character is whitespace, then this behaves similarly to the | case except that you don’t push a Pipe token.

In the Unquoted state, if the next input character is a < or a >, then based on the redirection state, you need to either (1) insert a new Word token, (2) insert a new Redirect* token, or (3) change the redirection state from RedirectionState::Output(num) to RedirectionState::Append(num).

The third option occurs when the redirection state is RedirectionState::Output(num) and the current word is empty. For example, when parsing input 3>>foo, the first > changes the redirection state to Output(3) and the state to Unquoted. The second > changes the redirection state to Append(3).

For example, the input >foo>>bar<baz should be parsed as three separate redirection tokens.

Finally, after all of the input has been consumed, based on the state and the redirection state, you may need to insert either a Word or a Redirect* token. You will also need to update your code to handle the Digit state.

Write unit tests. In fact, you should probably write your unit tests first. Here are a few to help you get started.

Example

#[test]
fn redirections() {
    use InputToken::*;

    let input = ": >foo >>bar <baz";
    let tokens = split_input(input).unwrap();
    assert_eq!(
        tokens,
        vec![
            Word(":".to_string()),
            RedirectOutput(1, "foo".to_string()),
            RedirectAppend(1, "bar".to_string()),
            RedirectInput(0, "baz".to_string()),
        ]
    );
}

#[test]
fn redirections_nospace() {
    use InputToken::*;

    let input = ":>foo>>bar<baz";
    let tokens = split_input(input).unwrap();
    assert_eq!(
        tokens,
        vec![
            Word(":".to_string()),
            RedirectOutput(1, "foo".to_string()),
            RedirectAppend(1, "bar".to_string()),
            RedirectInput(0, "baz".to_string()),
        ]
    );
}

#[test]
fn digit_redirections() {
    use InputToken::*;

    let input = ": 1word 2>foo 3>>bar 4<baz 56>abc";
    let tokens = split_input(input).unwrap();
    assert_eq!(
        tokens,
        vec![
            Word(":".to_string()),
            Word("1word".to_string()),
            RedirectOutput(2, "foo".to_string()),
            RedirectAppend(3, "bar".to_string()),
            RedirectInput(4, "baz".to_string()),
            Word("56".to_string()),
            RedirectOutput(1, "abc".to_string()),
        ]
    );
}

#[test]
fn empty_redirection_targets() {
    assert!(split_input("<").is_err());
    assert!(split_input(">").is_err());
    assert!(split_input(">>>foo").is_err());
    assert!(split_input("><foo").is_err());
    assert!(split_input("<>foo").is_err());
    assert!(split_input(">><foo").is_err());
}

The good news is this was the hard part! The rest is simpler. (Just for fun, check out the state machine definition for HTML. It has 80 states. And that’s just the first of two state machines used to parse HTML. The second has about 25 states but is even more complex.)

Part 2. Processing redirections (25 points)

Now that you have a split_input() function that can parse redirections, it’s time to actually implement them! Fortunately, this is much easier than parsing the input.

Although your implementation from the previous part supported file descriptors 0-9, for this part, you only need to support file descriptors 0, 1, and 2.

In the last lab, implementing the pipeline consisted of two parts, (1) splitting the slice of InputTokens on Pipe tokens and constructing a vector of SimpleCommands in Pipeline::new(); and (2) spawning the processes in Pipeline::run().

Your task

You will need to update the constructor to handle the new InputToken variants and update the .run() method to perform redirections.

First, update SimpleCommand to hold a vector of redirections. It’s up to you how you want to model a redirection. I used a new enum with two variants, one for input and one for output. Both variants have fields for the file descriptor number and the path. The output variant also has a bool field which is true when appending.

What you should not do is have a vector of InputTokens for the redirections. It wouldn’t make sense to have a redirection of InputToken::Word or InputToken::Pipe, after all. Make some new type.

In the Pipeline::new() constructor, insert the appropriate redirections into each SimpleCommand’s redirections.

In the .run() method, to process the redirections, for each command, you need to construct a Stdio object for stdin, stdout, and stderr. You did part of this before when hooking up stdout of one command to stdin of the next command.

Initially, the Stdio for stdin should be the last stdout, the Stdio for stdout should be either Stdio::piped() or Stdio::inherit(), and the Stdio for stderr should be Stdio::inherit(). This is what you had from the last lab, but with an explicit Stdio for each of stdin/stdout/stderr.

Then, for each redirection in the SimpleCommand, you want to open/create a file. The File object can be converted to a Stdio using .into(). This can be assigned to variables for stdin/stdout/stderr using code like

match num {
    0 => stdin = file.into(),
    1 => stdout = file.into(),
    2 => stderr = file.into(),
    _ => return Err(format!("Redirecting {num} not supported").into()),
}

Finally, when creating the Command builder, you want to call .stdin(), .stdout(), and .stderr() and pass the three Stdios.

Hint

Remember that we can open a file for input using

let file = File::open(path).map_err(|err| format!("{path}: {err}"))?;

For output redirections, we need to create a file if it doesn’t exist and truncate it if it does. For append redirections, we need to create the file if it doesn’t exist and open it in append mode.

In append mode, all writes to the file are appended atomically. This means if multiple processes are trying to append to the file at the same time, you won’t get some bytes from one process and some bytes from the other process intermixed. A complete line of output will be written at a time.

To open files with different options, we need to use the OpenOptions builder. Read the documentation for OpenOptions. Pay particular attention to the methods .write(), .truncate(), and .append(). You can handle output and append redirections in a similar manner by passing different values to .truncate() and .append().
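
For example, something along these lines covers both cases (a sketch; append is a hypothetical bool that is true for a >> redirection):

use std::fs::OpenOptions;

// `append` is a hypothetical flag: true for >>, false for >.
let file = OpenOptions::new()
    .write(true)
    .create(true)          // create the file if it doesn't exist
    .truncate(!append)     // > truncates an existing file
    .append(append)        // >> appends instead
    .open(path)
    .map_err(|err| format!("{path}: {err}"))?;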

Part 3. Handling signals (30 points)

When the user types a Ctrl-C, the kernel sends a signal to every process in the foreground process group of the controlling terminal (review this info box for details on process groups and controlling terminals). By default, this signal—its name is SIGINT and it is the interrupt signal—causes the process that receives it to terminate.

There are other signals that can be sent in response to keys being pressed. SIGINT is by far the most common signal sent by a key press, but Ctrl-\ sends SIGQUIT, the quit signal. (Go ahead, try running a long-running program like $ sleep 10 and kill it by typing Ctrl-C or Ctrl-\.) Note this is different from Ctrl-D which closes stdin but doesn’t terminate processes.

A process can choose to handle a signal by registering a signal handler function which will be called in the event of a signal. Or, a process can choose to ignore a signal if it doesn’t want the default action—often termination—to occur.

Shells need to deal with SIGINT and SIGQUIT to avoid being terminated. When Bash receives a SIGINT, it stops reading its current line of input, prints a newline and a prompt and waits for the next line of input. When Bash receives a SIGQUIT, it ignores it.

A process can choose to handle or ignore a signal using the sigaction(2) system call (by calling the libc wrapper function as usual). The first argument is the signal number. The second argument is a pointer to a sigaction structure (yes, the struct and the function have the same name) that controls the action that should be taken when that signal is received. The third argument is a pointer to a sigaction struct that will be filled out by sigaction() with the previous action.

These are the Rust definitions for the libc::sigaction() function and the libc::sigaction struct.

To ignore a particular signal, you create a libc::sigaction struct with its sa_sigaction field set to libc::SIG_IGN and pass it to libc::sigaction().

Example

When this example program is run, SIGINT is ignored and then the process goes to sleep for 10 seconds. During this time, pressing Ctrl-C has no effect.

use std::io;

fn main() -> io::Result<()> {
    unsafe {
        let action = libc::sigaction {
            sa_sigaction: libc::SIG_IGN,
            ..std::mem::zeroed()
        };

        if libc::sigaction(libc::SIGINT, &action, std::ptr::null_mut()) < 0 {
            return Err(io::Error::last_os_error());
        }
    }

    std::thread::sleep(std::time::Duration::from_secs(10));
    Ok(())
}

One thing to notice is that action was constructed using .. to fill in the fields other than sa_sigaction. (See the Book for details about the .. syntax.) We do this because different OSes use different sigaction structures with different members; we only need the sa_sigaction field, which appears in all of them. Essentially, doing it this way makes the code slightly more portable. For example, it works on both macOS and Linux.

The std::mem::zeroed() is creating a libc::sigaction struct where all of the fields have been set to 0. In general, doing this is unsafe, hence the reason it is in the unsafe block. It’s safe in this particular case because all of the fields may be safely set to 0.

To install a signal handler, you set the sa_sigaction field to the address of a function.

When a signal is received for which a signal handler has been registered, the kernel will run the handler. But when does it run it? The answer is that it’s a little unpredictable. It will be run the next time the kernel returns control to the process. This can happen as the result of returning from a system call or simply because the kernel has scheduled the process as the next process to run.

Since signal handler functions can be run at any time, the actions a signal handler can take are extremely limited. There are a very small number of functions that are safe to call from a signal handler. In particular, a signal handler is not allowed to allocate memory or perform I/O using things like println!(). Instead, about all a signal handler should do is set a flag which is checked outside the signal handler to see if the signal occurred.

Setting the flag and checking whether the flag has been set must happen atomically, meaning without being interrupted. You will be unsurprised to learn that Rust has great support for atomic variables. For the flag, you want to use an AtomicBool. It holds true or false, but we assign values to it using the .store() method and read values using the .load() method. The .swap() method is used to atomically load the old value and assign a new value.

Example

This example shows how to register a signal handler for SIGINT.

use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

static INTERRUPTED: AtomicBool = AtomicBool::new(false);

extern "C" fn handler(_sig: libc::c_int) {
    INTERRUPTED.store(true, Ordering::Relaxed);
}

fn main() -> io::Result<()> {
    unsafe {
        let action = libc::sigaction {
            sa_sigaction: handler as libc::sighandler_t,
            ..std::mem::zeroed()
        };

        if libc::sigaction(libc::SIGINT, &action, std::ptr::null_mut()) < 0 {
            return Err(io::Error::last_os_error());
        }

    }
    std::thread::sleep(std::time::Duration::from_secs(10));

    if INTERRUPTED.load(Ordering::Relaxed) {
        println!("Received a SIGINT");
    }
    Ok(())
}

Some key points to notice: The handler function is declared as extern "C" because it will be called as if it were a C function. INTERRUPTED is our first example of a global variable in Rust. The handler function is incredibly simple: all it does is set a flag.

If you run the code in that example and press Ctrl-C during the sleep, you’ll notice something disconcerting: The process sleeps for the full 10 seconds and then afterward it prints out Received a SIGINT. The whole point of the interrupt signal is to interrupt a process so what is going on here?

Some system calls are allowed to be interrupted by signals. When this happens, the C wrapper around the underlying system call sets the thread-local variable errno to EINTR and returns -1. If you examine the source code for the sleep() function, you’ll see it calls the underlying libc::nanosleep() function in a loop if it is awakened by a signal. (Note the if on line 242 checking if the return value of libc::nanosleep() is -1 and then asserting that the value of os::errno() is libc::EINTR.)

If we replace the call to sleep() with a call to libc::nanosleep(), we can see the difference.

Example

In this example, the call to nanosleep() is explicit and since it is not called in a loop, interrupting via SIGINT works as expected.

use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

static INTERRUPTED: AtomicBool = AtomicBool::new(false);

extern "C" fn handler(_sig: libc::c_int) {
    INTERRUPTED.store(true, Ordering::Relaxed);
}

fn main() -> io::Result<()> {
    unsafe {
        let action = libc::sigaction {
            sa_sigaction: handler as libc::sighandler_t,
            ..std::mem::zeroed()
        };

        if libc::sigaction(libc::SIGINT, &action, std::ptr::null_mut()) < 0 {
            return Err(io::Error::last_os_error());
        }
    }

    let ts = libc::timespec {
        tv_sec: 10,
        tv_nsec: 0,
    };
    if unsafe { libc::nanosleep(&ts, std::ptr::null_mut()) } < 0 {
        let err = io::Error::last_os_error();
        if err.kind() != io::ErrorKind::Interrupted {
            return Err(err);
        }
    }

    if INTERRUPTED.load(Ordering::Relaxed) {
        println!("Received a SIGINT");
    }
    Ok(())
}

Note how an io::Error is constructed from errno by calling io::Error::last_os_error() and is then returned if it isn’t io::ErrorKind::Interrupted.

Many Rust standard library functions, such as the read_line() method, do the same thing: if a signal interrupts a system call, the standard library function makes the call again. Usually, that’s the behavior you want. For a shell, it isn’t. And when it isn’t, you have to implement the functionality you want yourself, which sometimes means calling the underlying libc functions.

Your task

Osh should work slightly differently than Bash. In particular, if Osh receives either a SIGINT or SIGQUIT signal, it should ignore the current line of input that is being typed, print a newline, and then prompt for the next line of input.

Create a new module named signals. Inside it, write a function install_signal_handlers() -> io::Result<()> that sets the signal handler for both signals to call a function named handler. Make sure any errors from libc::sigaction() are returned. Write the handler function

extern "C" fn handler(_sig: libc::c_int) {
    todo!()
}

that stores true in an AtomicBool following the examples above.

Write a was_interrupted() -> bool function that atomically loads the value of the AtomicBool and stores false. Use the .swap() method for this purpose. Return the loaded value. In this way, was_interrupted() returns true if a signal was received since the last time was_interrupted() was called.
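
In code, that’s only a couple of lines (assuming the static INTERRUPTED AtomicBool and the Ordering import from the examples above):

pub fn was_interrupted() -> bool {
    // Atomically read the flag and reset it to false in a single step.
    INTERRUPTED.swap(false, Ordering::Relaxed)
}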

Add a call to install_signal_handlers() to the beginning of main().

Info

As mentioned above, the .read_line() method explicitly retries when it is interrupted by a signal. Since Osh needs a way to read a line of input from stdin but also return immediately when a SIGINT or SIGQUIT is received, you cannot use the standard library methods.

Fortunately, it’s easy to look at the Rust source code and modify it to suit your needs. So I did. Here’s a function you can use which reads from stdin and returns an io::Result<String>.

#![allow(unused)]
fn main() {
/// This is similar to std::io::stdin().read_line() except that
/// being interrupted by a signal returns an Err(err).
/// This implementation comes from modifying
/// https://doc.rust-lang.org/src/std/io/mod.rs.html#1940
fn read_line() -> std::io::Result<String> {
    use std::io::BufRead;
    let mut stdin = std::io::stdin().lock();
    let mut buf = Vec::new();

    loop {
        let (done, used) = {
            let available = stdin.fill_buf()?;
            match available.iter().position(|&b| b == b'\n') {
                Some(i) => {
                    buf.extend_from_slice(&available[..=i]);
                    (true, i + 1)
                }
                None => {
                    buf.extend_from_slice(available);
                    (false, available.len())
                }
            }
        };
        stdin.consume(used);
        if done || used == 0 {
            let s = String::from_utf8(buf)
                .map_err(|err| std::io::Error::new(std::io::ErrorKind::InvalidData, err))?;
            return Ok(s);
        }
    }
}
}

The key point is that if stdin.fill_buf() returns an error because it was interrupted by a signal, then read_line() will return that error. You can check if the returned error was due to being interrupted by a signal by using err.kind() == std::io::ErrorKind::Interrupted.

In your function that prints a prompt and reads from stdin, replace the read with a loop that first calls signals::was_interrupted() and, if a signal has been received, prints a newline. Then print the prompt and call the read_line() function above. If read_line() returns an error and the error’s kind is Interrupted, continue the loop. If it returns some other error, return it. Otherwise, break out of the loop and process the line as before.
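
Here’s a sketch of that loop (print_prompt() is a hypothetical helper that writes the prompt and flushes stdout; read_line() is the function from the Info box above; the surrounding function is assumed to return the boxed-error Result):

let line = loop {
    if signals::was_interrupted() {
        println!();
    }
    // print_prompt() is a hypothetical helper that prints "$ " and flushes stdout.
    print_prompt()?;
    match read_line() {
        Ok(line) => break line,
        Err(err) if err.kind() == std::io::ErrorKind::Interrupted => continue,
        Err(err) => return Err(err.into()),
    }
};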

Here’s some sample output.

$ cargo run --quiet
Welcome to the Oberlin Shell!
$ this line is interrupted^C
$ this one is quit^\
$ echo this one is not
this one is not
$ sleep 10
^C
$ 

If your shell can now handle redirections and signals, you’re done! Congratulations, this was a lot of work.

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── Cargo.lock
├── Cargo.toml
├── README
└── src
    ├── input.rs
    ├── main.rs
    ├── pipeline.rs
    └── signals.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Lab 10: Threads

In this lab, you’re going to take a single threaded program and turn it into a multithreaded program by splitting up the computation into multiple threads.

You will learn

  • how to spawn threads; and
  • how to return values from threads.

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team; there’s a maximum of two people, and if you join the wrong one, you’ll prevent the correct person from joining.) You cannot choose a team name you’ve used previously.

Once you have accepted the assignment and created/joined a team, you can clone the repository and begin working.

Be sure to ask any questions on Ed.

Compiler warnings and errors

Make sure your code compiles and passes clippy without errors or warnings. That is, running

$ cargo clean
$ cargo build
$ cargo clippy

should build your program without errors or warnings.

Formatting

Your code must be formatted by running cargo fmt.

Part 1. Hello threads! (10 points)

In this part, you’re going to write a very short program that will spawn several threads.

There are multiple ways to spawn threads in Rust. We’re going to use the thread::scope() function. Click the link to read its documentation, specifically the example. Pay particular attention to how the scope() function takes a closure as an argument. That closure has an argument s which is used to spawn new threads by calling s.spawn(). The spawn() method takes a (zero-argument) closure as an argument.
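
As a quick taste of the API, here’s a minimal example (unrelated to the lab code itself):

use std::thread;

fn main() {
    // scope() waits for every thread spawned inside it before returning.
    thread::scope(|s| {
        s.spawn(|| println!("hello from one scoped thread"));
        s.spawn(|| println!("hello from another scoped thread"));
    });
}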

Your task

Modify the starter code in the hello directory to spawn n threads in a loop where n is the value returned by thread::available_parallelism().

Note

The value returned by thread::available_parallelism()? is a NonZeroUsize and not a normal usize. To turn it into one, you can write your loop like

for thread_num in 0..usize::from(n) { }

Each of your threads should print hello from its thread number. Here’s an example run of the program. Your output will likely look different since you may have a different number of threads and they will run in some unpredictable order.

$ cargo run --quiet
This program should probably only use 10 threads
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 5
Hello from thread 6
Hello from thread 3
Hello from thread 4
Hello from thread 8
Hello from thread 9
Hello from thread 7

Part 2. Parallelizing jfrac (20 points)

In this part, you’re going to modify a version of the jfrac code from lab 2 that produces animated gifs of the Julia set fractal. Specifically, you’re going to turn it into a multi-threaded program where each thread will generate some of the frames of the animated gif.

The provided code in the jfrac directory can be run with several new options. Here’s the output from --help.

Usage: jfrac [OPTIONS] <OUTPUT>

Arguments:
  <OUTPUT>  Output file path

Options:
  -s, --size <SIZE>                  Sets the width and height of the image [default: 800]
  -c, --constant <CONSTANT>          Sets the constant `c` in the equation `f(z) = z^2 + c` [default: "-0.8 + 0.156i"]
  -i, --initial-zoom <INITIAL_ZOOM>  Sets the initial zoom [default: 1.0]
  -f, --final-zoom <FINAL_ZOOM>      Sets the final zoom [default: 20.0]
  -n, --frames <FRAMES>              Sets the number of frames in the gif [default: 100]
  -t, --threads <THREADS>            Sets the number of threads to use [default: 1]
  -h, --help                         Print help

If you run the code with the default options, it will take a fairly long time to run, nearly 5 minutes for me. We can time it using the time shell command.

$ time cargo run -- output.gif
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/jfrac output.gif`
cargo run -- output.gif  293.09s user 0.33s system 99% cpu 4:55.70 total

We can speed it up substantially by compiling our code using compiler optimizations. To do that, pass the --release argument to cargo build and cargo run. After $ cargo build --release, running the code still takes about 25 seconds for me.

$ time cargo run --quiet --release -- output.gif
cargo run --quiet --release -- output.gif  25.51s user 0.08s system 99% cpu 25.827 total

You should run this yourself and look at the produced output.gif to get a sense of what the program is doing.

We can do better by making use of threads.

Tip

While testing your code, you may wish to change the number of frames of animation using the --frames command line argument.

Your task

Your task is to finish the program by handling the case where the number of threads the user selects via the --threads option is greater than 1. (There is currently a todo!() marking where your code needs to go.)

Let’s look at the main loop in the program.

for frame_num in 0..config.frames {
    let zoom = config.initial_zoom
        * zoom_factor.powf(frame_num as f32 / (config.frames - 1) as f32);
    let frame = make_frame(&config, zoom);
    encoder.encode_frame(frame)?;
}

The two key pieces here are creating a frame of animation by calling make_frame(&config, zoom) which creates the fractal at the specified level of zoom and encoding that frame using the Gif encoder.

You will need to write similar code for the case where the number of threads is greater than 1.

When parallelizing code in general, you’ll need to identify which portions of the task can be handled concurrently and which must be handled serially. In this case, this is easy: Each frame of animation can be created independently of the other frames but encoding the gif must happen serially since we need the frames to appear in order.

This suggests the following steps.

  1. In a loop, create config.threads number of threads, each of which is responsible for creating a Vec<Frame> containing (approximately) config.frames / config.threads frames of animation.
  2. In a second loop, wait for the threads to finish by using join() (see below) and encode the frames returned by each thread in order.

Both steps will need to occur inside the body of the closure that we pass to thread::scope().

thread::scope(|s| -> Result<()> {
    // Step 1.
 
    // Step 2.

    Ok(())
})?;

Notice that this code contains an explicit return type on the closure. The Rust compiler can often deduce return types, but not always. Since step 2 involves calling encoder.encode_frame() which can return a Result, we can propagate errors by using ? if we mark the closure as returning a Result.

For step 1, you’ll need to figure out what range of frames to create for each thread. One approach is to have a loop like this.

for thread_num in 0..config.threads {
    let start_frame = thread_num * config.frames / config.threads;
    let end_frame = (thread_num + 1) * config.frames / config.threads;
    s.spawn(move || { 
        todo!("Create frames in the range start_frame..end_frame");
    });
}

This ensures all frames will be generated and each thread will have approximately the same number of frames to generate. In fact, the number of frames per thread will differ by at most 1 using this approach. For example, generating 100 frames using 8 threads with this approach gives the following frame ranges per thread.

Thread 0: 0..12
Thread 1: 12..25
Thread 2: 25..37
Thread 3: 37..50
Thread 4: 50..62
Thread 5: 62..75
Thread 6: 75..87
Thread 7: 87..100

Note the use of the move keyword when spawning threads. This causes values from outside the closure that are referenced inside the closure to be moved (or copied) into the closure. E.g., start_frame and end_frame will be copied into the closure. Without the move, the compiler will give an error about lifetimes. (See the note at the end of this page when you get an error about using the moved value of config.)

For step 2, you’ll need to wait for each thread to finish and return the frames. To support waiting for threads to finish and for getting their return values, the s.spawn() method returns a thread handle. You should insert that handle into a vector of handles for use in step 2.

thread::scope(|s| -> Result<()> {
    let mut handles = Vec::with_capacity(config.threads);

    // Step 1.
    for thread_num in 0..config.threads {
        // ...
        let handle = s.spawn(move || { /* ... */ });
        handles.push(handle);
    }

    // Step 2.
    for handle in handles {
        // ...
    }

    Ok(())
})?;

In the loop for step 2, you can use let frames = handle.join().unwrap() to wait for the thread to finish and return the vector of frames. You can then encode the frames.
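
That is, step 2 might look something like this (a sketch, assuming each thread’s closure returns the Vec<Frame> it generated):

// Step 2: encode each thread's frames in thread order.
for handle in handles {
    let frames = handle.join().unwrap();
    for frame in frames {
        encoder.encode_frame(frame)?;
    }
}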

Note

There’s one final little detail you’ll need to handle: the move keyword you needed to use when creating the threads tries to move the config value into each thread. You actually only want a shared reference to config since every thread needs to use it.

You can force Rust to use a reference by shadowing the config variable with a reference to config. You’ll want something like this.

thread::scope(|s| -> Result<()> {
    let mut handles = Vec::with_capacity(config.threads);
    let config = &config;

    // Step 1.

    // Step 2.

    Ok(())
})?;

With the let config = &config, the move || { } closure will make a copy of the reference for use in each thread rather than trying to move the underlying Config structure itself into the thread.

Part 3. Timing (15 points)

In this part, you’ll run your code several times with different numbers of threads to determine how long it takes to generate the image.

Your task

Time how long it takes to generate the image with several different numbers of threads using the --threads option. Remember to use cargo build --release and time cargo run --release to use the optimized release build rather than the unoptimized debug build. Put the results of your timing experiment in your README.

Answer the following questions in the README.

  1. Using --threads 2, you may have noticed that the program runs much faster than when run with only a single thread but it takes more than half the time of a single thread. Why is that?
  2. At what point does increasing the number of threads stop speeding up the program?
  3. How does the number you found for question 2 compare with the available parallelism reported by the hello program you wrote in Part 1?

At this point, you’re all done and should submit your work. Feel free to play around with the other options to jfrac like --constant, --initial-zoom, --final-zoom, --frames to produce other interesting images.

Submitting (5 points)

To submit your lab, you must commit and push to GitHub before the deadline.

Your repository should contain the following files and directory.

repo
├── README
├── hello
│   ├── Cargo.lock
│   ├── Cargo.toml
│   └── src
│       └── main.rs
└── jfrac
    ├── Cargo.lock
    ├── Cargo.toml
    └── src
        ├── colormap.rs
        └── main.rs

Any additional files you have added to your repository should be removed from the main branch. (You’re free to make other branches, if you desire, but make sure main contains the version of the code you want graded.)

The README should contain the names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it) in addition to the answers to the questions in Part 3.

After you have everything working, make sure you add your code, commit it, and push it to GitHub.

Final Group Project

Overview

The project consists of a proposal, a status update, implementation, final report, and presentation.

The specific project is up to each group. However, it must involve a significant amount of effort—more than one or two people could do. All partners are expected to contribute to the implementation, the write-ups, and the presentation. Division of labor within each part is expected and good!

Any group member who does not participate in every part of the project will fail the course. No exceptions without prior approval from your Professor.

The GitHub Classroom link for the final project is here. Do not create a team or accept the assignment until after you have formed a group.

Requirements

The programming portion of the project must be (mostly) written in Rust and must use one or more crates from crates.io. Otherwise, the specific project you choose is up to you.

Collaboration and coordination between group members must happen on GitHub. This means regular commits from all group members are expected. I strongly recommend you learn about pull requests and use them along with code review of each commit before it gets pushed to master. Using GitHub issues to track bugs that need to be fixed or features that need to be implemented is a great idea and is encouraged.

Project suggestions

Some project suggestions include

  • Program a microcontroller (like an Arduino) to use some sensors and lights/actuators to do something (you can test with a simulator)
  • Implement a game using a game engine.
  • Use the Rust bindings for OpenCV to do something with computer vision
  • Implement some machine learning algorithms (like k-means or k-nearest neighbors) and run them on some interesting datasets and do some visualization or use existing implementations of ML algorithms
  • Build a website in Rust via WebAssembly.
  • Make some interactive art (audio and video)

Previous projects have included games, scenes in virtual reality, and programs communicating over a network.

Project proposal

A written proposal (750-word maximum) is the first step.

You should provide the following information

  • What are you doing?
  • How are you doing it?
  • What do you need to learn to do it?
  • A proposed schedule with milestones (the status update will discuss the milestones, so you might want to track these on GitHub).

You should provide this information in narrative form (i.e., don’t just submit a list of questions and answers). You should use this as an opportunity to practice technical writing.

The proposal should be added to your README.md file in your GitHub repository.

Status update

A maximum 750-word status update is due several weeks into the project. The report should summarize the project, including any changes in direction or scope from the proposal and why those changes were made.

The update should also provide a revised schedule with milestones. If you’re on track, you should say so. If you’re not, explain how your revised schedule differs from the original schedule.

It’s completely fine to drop features if you find that you were overly ambitious in the proposal. Just explain how doing so will let you complete the project successfully.

The status update will be added to your README.md in your GitHub repository.

Final report

A 1500–2000-word final report is due before presentations begin. This should include a standalone description of your project. You should describe what you accomplished and what you weren’t able to accomplish. Explain what the most challenging aspect of the project was. What was the most fun part?

Include anything else you think I should know.

The final report should be uploaded as a PDF to your GitHub repository.

Presentation

During the scheduled final exam time, you will give a seven-minute presentation on the project. This includes five minutes of talking (among the four of you) and two minutes of answering audience questions. Every group member must speak.

If you’re able to do so, I strongly encourage you to give a demonstration of your project.

You should tell the class who you are, what you did, and how you did it. You can tell us what didn’t work or was almost working if you would like. If you do a demonstration, you should show off some cool features.

Public speaking is very hard. This is a low-stakes way to get some practice at it in a supportive environment.