Part 4. Modeling a process (15 points)
In this part of the lab, you’re going to define a new struct in the library (which is the lib.rs file and additional files you’ll create) which you’ll use to represent, or model, a process. This means that your struct is going to contain enough information about a process for us to perform our task: implementing the ps command-line utility, which prints out information about running processes.
A process is an instance of a running program. For example, there is one bash program which lives in the file system at /bin/bash, but we can run many different instances of bash at the same time. Each instance is a process. The operating system assigns each running process a nonnegative number called the process identifier or PID.
Open a terminal and run $ ps. You should see output similar to the following.
$ ps
PID TTY TIME CMD
29604 pts/4 00:00:00 bash
31292 pts/4 00:00:00 ps
In the first column, we can see the process identifier. The second column gives the name of the controlling terminal. We’ll come back to that shortly. The third column gives the total execution time of the process. The fourth column gives the command name, or name of the process.
Returning to Rust, in order to replicate this behavior, your struct is going to contain a PID, a representation of the controlling terminal, the execution time, and the command name to start with. We’ll worry about the details of ps in a subsequent part.
Because this code will be useful in multiple binaries, you’re going to write this code in a new proc module in the library. Let’s get started!
Your task
First, create a new module named proc as follows:
- Add the line mod proc; to the top of your lib.rs, which will now look like this.
  mod proc;
  pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
  This line informs the compiler that there’s a file named proc.rs in src which contains the code for a proc module.
- Create the src/proc.rs file. Add the line use super::Result; to it. This will let you use the Result alias you defined in lib.rs in proc.rs.
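To see the relationship between the two files in one place, here is a single-file sketch that uses an inline mod block as a stand-in for src/proc.rs. The describe function is hypothetical and exists only to show that use super::Result; brings the alias into scope inside the module.

```rust
// Stand-in for lib.rs: the alias lives at the crate root.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

// Stand-in for src/proc.rs. With `mod proc;` the body would live in that
// file instead of between these braces.
mod proc {
    // `super` names the parent module, which here is the crate root.
    use super::Result;

    /// Hypothetical helper whose return type uses the alias.
    pub fn describe(pid: i32) -> Result<String> {
        if pid < 0 {
            // `.into()` converts the String into a Box<dyn Error>.
            return Err(format!("invalid PID {pid}").into());
        }
        Ok(format!("process {pid}"))
    }
}
```

Without the use super::Result; line, the bare name Result inside the module would refer to the standard two-parameter std::result::Result instead of the alias.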
Second, create a new struct to model a process. We can see from the ps output above that we’re going to need a process identifier, a controlling terminal, the total execution time, and the command name. I suggest starting simple and building it up once you have the basics working. With that in mind, let’s start with a Process structure that looks like this.
/// Models a Linux process.
#[derive(Debug)]
pub struct Process {
    /// Process ID.
    pub pid: i32,
    /// Command name.
    pub command_name: String,
}
You will add more to this structure as you continue but for now, this is enough to get started.
The next step will be to implement a function Process::for_pid() in proc.rs that takes a PID as an argument and returns a Result<Process>.
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let command_name = todo!("Need to look up the command name");
        Ok(Self {
            pid,
            command_name,
        })
    }
}
Recall that inside an impl we can refer to the current type as Self, hence Process::for_pid() is returning a Result<Process>. This returns a Result<Process> rather than a Process because there might not be any process with the given identifier, and Process::for_pid() must return an error in that case.
Linux exposes information about each running process in the procfs file system by exposing a virtual directory /proc/<pid> where <pid> is the numeric process identifier. Run $ ls /proc to see all of the directories corresponding to processes, and other virtual files and directories with other information. You already saw /proc/loadavg but now we’ll investigate the numbered directories.
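The same listing can be produced from Rust. The sketch below is one way to collect the numeric directory names under /proc. The function names is_pid_name and list_pids are hypothetical, and the code assumes a Linux system with procfs mounted at /proc.

```rust
use std::fs;

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Returns true if a /proc entry name looks like a PID (all ASCII digits).
pub fn is_pid_name(name: &str) -> bool {
    !name.is_empty() && name.bytes().all(|b| b.is_ascii_digit())
}

/// Collects the PIDs that currently have a directory under /proc.
pub fn list_pids() -> Result<Vec<i32>> {
    let mut pids = Vec::new();
    for entry in fs::read_dir("/proc")? {
        let entry = entry?;
        // Names that are not valid UTF-8 are skipped; numeric names always are.
        if let Some(name) = entry.file_name().to_str() {
            if is_pid_name(name) {
                pids.push(name.parse()?);
            }
        }
    }
    Ok(pids)
}
```

Entries such as loadavg and self fail the digit check and are ignored, which leaves only the per-process directories.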
Run $ cat /proc/self/stat (not /proc/self/status, which contains similar information but is designed for human consumption rather than programmatic manipulation). You probably got some mysterious output like
$ cat /proc/self/stat
32177 (cat) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0
As we’ll see, this file contains most of the information we will need. Look at the bottom of the proc(5) man page for the link to the documentation for the file /proc/pid/stat. There are many fields here. You’re going to pick out just the ones you need to fill out an instance of the Process structure.
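Getting the file's contents into a string takes one standard-library call. The following is a minimal sketch, assuming std::fs::read_to_string is used; the helper name read_stat is hypothetical and not something the lab requires. It also shows how a missing process turns into an Err.

```rust
use std::fs;

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Hypothetical helper: reads the raw contents of /proc/<pid>/stat.
pub fn read_stat(pid: i32) -> Result<String> {
    let path = format!("/proc/{pid}/stat");
    // If no such process exists, the file is missing and read_to_string()
    // returns an io::Error; `?` converts it into a Box<dyn Error>.
    let contents = fs::read_to_string(path)?;
    Ok(contents)
}
```

Calling read_stat with a PID that does not correspond to a running process returns an error instead of panicking, which is the behavior for_pid() needs.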
Ideally, we’d like to read this file into a string and then split the string on white space into the fields. Then we can parse just the fields we care about. The one hitch here is the command name, which is in the second field. It can have spaces or newlines or even a ) character! This complicates our task, but not by a great deal. The command name is preceded by a ( and followed by a ), and parentheses will not appear anywhere in this file outside the command name.
Use str::find('(') and str::rfind(')') to get the indices of the first ( and last ) characters and then extract three substrings.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn main() -> Result<()> {
    // Imagine we read this string in from `/proc/32177/stat`.
    let pid = 32177;
    let stat = "32177 (odd ) proc name) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0";
    // Index of the first '(' which starts the command name.
    let comm_start = stat
        .find('(')
        .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;
    // Index of the last ')' which ends the command name.
    let comm_end = stat
        .rfind(')')
        .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;
    let pid_as_str = stat[..comm_start].trim();
    let command_name = &stat[comm_start + 1..comm_end];
    let remaining_fields = stat[comm_end + 1..].trim();
    println!("Before command name: {pid_as_str}");
    println!("Command name: {command_name}");
    println!("After command name: {remaining_fields}");
    Ok(())
}
Run this program to see the output.
The str::find() and str::rfind() methods return an Option<usize>. There are a bunch of methods for converting between an Option<T> and a Result<T, E>. The Option::ok_or_else(f) method turns a Some(x) into an Ok(x) and a None into Err(f()) (where f is a zero-argument function). Read the documentation for details and examples.
The upshot is that if there isn’t at least one each of ( and ), the find() or rfind() method will return None, which ok_or_else() will turn into Err("Couldn't parse...") and the ? will return the error as usual.
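Here is that conversion on its own, separate from the rest of the stat parsing. This is only a sketch: the function name open_paren_index and the error message are made up for the example.

```rust
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Returns the byte index of the first '(' or an error if there is none.
pub fn open_paren_index(s: &str) -> Result<usize> {
    // find() yields an Option<usize>. ok_or_else() maps Some(i) to Ok(i)
    // and None to Err(closure()). The closure runs only in the None case,
    // so the message is not built unless it is needed.
    let index = s
        .find('(')
        .ok_or_else(|| format!("no '(' in {s:?}"))?;
    Ok(index)
}
```

The closure produces a String, and the ? operator converts that String into the Box<dyn Error> the alias expects.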
I recommend creating a mutable Vec<&str>, pushing pid_as_str and command_name, and then splitting remaining_fields on white space and using Vec::extend() to add the elements from the split. More complete documentation for extend() is in the documentation for the Extend trait. The result will be a Vec containing the fields described in the man page.
But note that the fields in the man page are numbered starting at 1 whereas the elements in a Vec are numbered starting at 0!
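Putting the pieces together, the field-splitting step might look something like the sketch below. The function name split_stat_fields and the error messages are assumptions; the lab does not require this exact shape, and you may prefer to write the logic directly inside for_pid().

```rust
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Splits the contents of a stat file into fields, keeping the command name
/// (which may contain spaces or parentheses) as a single field.
pub fn split_stat_fields(stat: &str) -> Result<Vec<&str>> {
    let comm_start = stat
        .find('(')
        .ok_or_else(|| "missing '(' in stat contents".to_string())?;
    let comm_end = stat
        .rfind(')')
        .ok_or_else(|| "missing ')' in stat contents".to_string())?;
    // Guard against malformed input such as ") (" so slicing cannot panic.
    if comm_end < comm_start {
        return Err("')' appears before '(' in stat contents".into());
    }

    let mut fields: Vec<&str> = Vec::new();
    fields.push(stat[..comm_start].trim());
    fields.push(&stat[comm_start + 1..comm_end]);
    // split_whitespace() yields &str slices of the original string, so
    // extend() can append them without copying.
    fields.extend(stat[comm_end + 1..].split_whitespace());
    Ok(fields)
}
```

With this layout, field 3 in the man page (the process state) is fields[2], and field 4 (the parent PID) is fields[3].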
Assuming the Vec is named fields, you can now construct a new instance of Process with something like
let proc = Process {
    pid: fields[0].parse()?,
    command_name: fields[1].to_string(),
};
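The parse() call in that snippet works because the compiler infers the target type from the type of the pid field, and the ? converts a parse failure into the boxed error type. A small sketch of the same idea in isolation follows; the function name parse_pid is hypothetical.

```rust
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Parses a PID field from a stat file.
pub fn parse_pid(field: &str) -> Result<i32> {
    // The annotation selects i32 as the target type, just as the type of
    // the struct field does in a struct literal. On failure, `?` converts
    // the ParseIntError into a Box<dyn Error>.
    let pid: i32 = field.parse()?;
    Ok(pid)
}
```

A non-numeric field such as a command name produces an Err rather than a panic, so a malformed stat file is reported to the caller like any other error.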
Implement the function for_pid().
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let path = format!("/proc/{pid}/stat");
        let mut fields: Vec<&str> = Vec::new();
        todo!("Open path and read its contents and split into fields as described");
        Ok(Self {
            pid: fields[0].parse()?,
            command_name: fields[1].to_string(),
        })
    }
}
We’re nearly done with this part! All that remains is to make the Process structure visible outside of the library so that you can use it in bin/ps.rs, the new binary you’re about to create.
There are multiple ways to do this, but the way that you’re going to do it is by re-exporting the type. Add the line
pub use proc::Process;
to your lib.rs.
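To illustrate what the re-export accomplishes, here is a single-file sketch in which a module named library stands in for the crate. The module name library is an assumption made for the example. Because the sketch nests everything one level down, it writes the path as self::proc::Process; in your lib.rs, which is the crate root, the line is the one given above.

```rust
// Stand-in for the library crate.
pub mod library {
    // Private module: code outside `library` cannot name library::proc.
    mod proc {
        #[derive(Debug)]
        pub struct Process {
            pub pid: i32,
            pub command_name: String,
        }
    }

    // Re-export: the type becomes reachable as library::Process even
    // though the module that defines it stays private.
    pub use self::proc::Process;
}
```

Code outside the module can then construct or name library::Process directly, which is what lets the ps binary import Process from the crate root.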
It’s time to start implementing the ps utility, so create the file bin/ps.rs. Just as with bin/runnable.rs, Cargo will compile it into a binary named ps.
Add the following code to your new ps.rs.
use process::{Process, Result};

fn run() -> Result<()> {
    let proc1 = Process::for_pid(1)?;
    println!("{proc1:?}");
    Ok(())
}

fn main() {
    if let Err(err) = run() {
        eprintln!("{err}");
        std::process::exit(1);
    }
}
Try to understand what this code is doing before running it.
If you try to run your code using $ cargo run, you’ll get an error message.
error: `cargo run` could not determine which binary to run. Use the `--bin` option to specify a binary, or the `default-run` manifest key.
available binaries: runnable, ps
This makes sense. You now have two binaries, runnable and ps, and cargo doesn’t know which one you want to run unless you tell it. You can use $ cargo run --bin ps to select the binary you want. This is a little tedious. Since you’re going to be working with ps for the rest of the lab, I suggest making that the default binary to run.
Edit Cargo.toml and add the line default-run = "ps" in the package settings. For reference, the package settings in my solution look like this
[package]
name = "process"
version = "0.1.0"
edition = "2021"
description = "ps - report process status"
default-run = "ps"
$ cargo run now runs ps by default.
If all has gone according to plan, you should see
Process { pid: 1, command_name: "systemd" }