Part 4. Modeling a process (15 points)

In this part of the lab, you’re going to define a new struct in the library (which is the lib.rs file and additional files you’ll create) which you’ll use to represent, or model, a process. This means that your struct, is going to contain enough information about a process for us to perform our task: implementing the ps command-line utility which prints out information about running processes.

A process is an instance of a running program. For example, there is one bash program which lives in the file system at /bin/bash but we can run many different instances of bash at the same time. Each instance is a process. The operating system assigns each running process a nonnegative number called the process identifier or PID.

Open a terminal and run $ ps. You should see output similar to the following.

$ ps
  PID TTY          TIME CMD
29604 pts/4    00:00:00 bash
31292 pts/4    00:00:00 ps

In the first column, we can see the process identifier. The second column gives the name of the controlling terminal. We’ll come back to that shortly. The third column gives the total execution time of the process. The fourth column gives the command name, or name of the process.

Returning to Rust, in order to replicate this behavior, your struct is going to contain a PID, a representation of the controlling terminal, the execution time, and the command name to start with. We’ll worry about the details of ps in a subsequent part.

Because this code will be useful in multiple binaries, you’re going to write this code in a new proc module in the library. Let’s get started!

Your task

First, create a new module named proc as follows:

  • Add the line mod proc; to the top of your lib.rs which will now look like this.
    mod proc;
    
    pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
    This line informs the compiler that there’s a file named proc.rs in src which contains the code for a proc module.
  • Create the src/proc.rs file. Add the line
    use super::Result;
    to it. This will let you use the Result alias you defined in lib.rs in proc.rs.

Second, create a new struct to model a process. We can see from the ps output above that we’re going to need a process identifier, a controlling terminal, the total execution time, and the command name. I suggest starting simple and building it once you have the basics works. With that in mind, let’s start with a Process structure that looks like this.

#![allow(unused)]
fn main() {
/// Models a Linux process.
#[derive(Debug)]
pub struct Process {
    /// Process ID.
    pub pid: i32,

    /// Command name.
    pub command_name: String,
}
}

You will add more to this structure as you continue but for now, this is enough to get started.

The next step will be to implement a function Process::for_pid() in proc.rs that takes a PID as an argument and returns a Result<Process>.

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
pub struct Process { pid: i32, command_name: String };
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let command_name = todo!("Need to look up the command name");
        Ok(Self {
            pid,
            command_name,
        })
    }
}
}

Recall that inside an impl we can refer to the current type as Self hence Process::for_pid() is returning a Result<Process>. This returns a Result<Process> rather than a Process because there might not be any such process identifier and Process::for_pid() must return an error in that case.

Linux exposes information about each running process in the procfs file system by exposing a virtual directory /proc/<pid> where <pid> is the numeric process identifier. Run $ ls /proc to see all of the directories corresponding to processes, and other virtual files and directories with other information. You already saw /proc/loadavg but now we’ll investigate the numbered directories.

Run $ cat /proc/self/stat (not /proc/self/status which is similar information but designed for human consumption rather than programatic manipulation). You probably got some mysterious output like

$ cat /proc/self/stat
32177 (cat) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0

As we’ll see, this file contains most of the information we will need. Look at the bottom of the proc(5) man page for the link to the documentation for the file /proc/pid/stat. There are many fields here. You’re going to pick out just the ones you need to fill out an instance of the Process structure.

Ideally, we’d like to read this file into a string and then split the string on white space into the fields. Then we can parse just the fields we care about. The one hitch here is the command name which is in the second field. It can have spaces or newlines or even a ) character! This complicates our task, but not by a great deal. The command name is preceded by a ( and followed by a ) and the parentheses characters will not appear anywhere else in this file other than the command name.

Use str::find('(') and str::rfind(')') to get the indices of the first ( and last ) characters and then extract three substrings.

Example

pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn main() -> Result<()> {
// Imagine we read this string in from `/proc/32177/stat`.
let pid = 32177;
let stat = "32177 (odd ) proc name) R 29604 32177 29604 34820 32177 4194304 79 0 0 0 0 0 0 0 20 0 1 0 251402402 7798784 200 18446744073709551615 94224239239168 94224239270352 140734102210720 0 0 0 0 0 0 0 0 0 17 9 0 0 0 0 0 94224241367664 94224241369280 94224242470912 140734102216746 140734102216766 140734102216766 140734102220783 0";
let comm_start = stat.find('(')
    .ok_or_else(|| format!("Couldn't parse stat file for PID {pid}"))?;
let comm_end = stat.rfind(')')
    .ok_or_else(|| format!("Couldn't parse state file for PID {pid}"))?;

let pid_as_str = stat[..comm_start].trim();
let command_name = &stat[comm_start + 1..comm_end];
let remaining_fields = stat[comm_end + 1..].trim();
println!("Before command name: {pid_as_str}");
println!("Command name:        {command_name}");
println!("After command name:  {remaining_fields}");
Ok(()) 
}

Click the Run button to see the output.

The str::find() and str::rfind() methods return an Option<usize>. There are a bunch of methods for converting between an Option<T> and a Result<T, E>. The Option::ok_or_else(f) method turns a Some(x) into an Ok(x) and a None into Err(f()) (where f is a zero-argument function). Read the documentation for details and examples.

The upshot that if there aren’t at least one each of ( and ), the find() or rfind() methods will return None which the ok_or_else() will turn into Err("Couldn't parse...") and the ? will return the error as usual.

I recommend creating a mutable Vec<&str>, pushing pid_as_str and command_name, and then split the remaining_fields on white space and use Vec::extend() to add the elements from the split. More complete documentation for extend() is here. The result will be a Vec containing the fields described in the man page. But note that the fields in the man page are numbered starting at 1 whereas the elements in a Vec are numbered starting at 0!

Assuming the Vec is named fields, you can now construct a new instance of Process with something like

let proc = Process {
    pid: fields[0].parse()?,
    command_name: fields[1].to_string(),
};

Implement the function for_pid().

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
pub struct Process { pid: i32, command_name: String };
impl Process {
    /// Look up information about a running process with the given PID.
    pub fn for_pid(pid: i32) -> Result<Self> {
        let path = format!("/proc/{pid}/stat");
        let mut fields: Vec<&str> = Vec::new();
        todo!("Open path and read its contents and split into fields as described");
        Ok(Self {
            pid: fields[0].parse()?,
            command_name: fields[1].to_string(),
        })
    }
}
}

We’re nearly done with this part! All that remains is to make the Process structure visible outside of the library so that you can use it in bin/ps.rs, the new binary you’re about to create.

There are multiple ways to do this, but the way that you’re going to do it is by re-exporting the type. Add the line

pub use proc::Process;

to your lib.rs.

It’s time to start implementing the ps utility so create the file bin/ps.rs. Just as with bin/runnable.rs, Cargo will compile it into a binary named ps.

Add the following code to your new ps.rs.

use process::{Process, Result};

fn run() -> Result<()> {
    let proc1 = Process::for_pid(1)?;
    println!("{proc1:?}");
    Ok(())
}

fn main() {
    if let Err(err) = run() {
        eprintln!("{err}");
        std::process::exit(1);
    }
}

Try to understand what this code is doing before running it.

Tip

If you try to run your code using $ cargo run, you’ll get an error message.

error: `cargo run` could not determine which binary to run. Use the `--bin` option to specify a binary, or the `default-run` manifest key.
available binaries: runnable, ps

This makes sense. You now have two binaries, runnable and ps and cargo doesn’t know which one you want to run unless you tell it. You can use $ cargo run --bin ps to select the binary you want. This is a little tedious. Since you’re going to be working with ps for the rest of the lab, I suggest making that the default binary to run.

Edit Cargo.toml and add the line default-run = "ps" in the package settings. For reference, the package settings in my solution looks like this

[package]
name = "process"
version = "0.1.0"
edition = "2021"
description = "ps - report process status"
default-run = "ps"

$ cargo run now runs ps by default.

If all has gone according to plan, you should see

Process { pid: 1, command_name: "systemd" }