Part 1. Calling C functions (5 points)

We saw last time (and will see a little more this time) how Linux uses the procfs virtual file system (which is usually mounted at /proc) to expose information to user processes.

This is not the only mechanism that operating systems use. Every modern OS works by enforcing a strict separation between user processes (the ones we run) and the operating system kernel. This is essential to prevent, among other things, buggy programs from crashing the whole computer. Unfortunately, it comes at a cost. If a user process wants information from the OS, it cannot simply call a function in the kernel. Instead, it makes a system call.

System calls are a mechanism for requesting that the OS do something on your behalf. Some examples of system calls in different application domains:

  • Open a file
  • Read data from a file
  • Write data to a file
  • Get the current time of day
  • Run a new program (more about this in a later lab)
  • Terminate a program (we’re going to do this!)
  • Generate random numbers in a secure fashion
  • Open a network connection to a server (more about this in a later lab)
  • Send or receive data over the network to/from the server
  • Exit the current process

Every program you’ve written has used at least one system call.
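To make that concrete, here is a small sketch that uses only the Rust standard library. Each commented step ends up asking the OS to do the real work, usually through one or more system calls. The scratch file name and the helper write_then_read are made up for this illustration.

```rust
use std::fs::{self, File};
use std::io::{Read, Write};
use std::path::Path;
use std::time::SystemTime;

/// Writes `contents` to `path`, reads the file back, then deletes it.
/// `write_then_read` is a hypothetical helper, not part of any library.
fn write_then_read(path: &Path, contents: &str) -> std::io::Result<String> {
    let mut file = File::create(path)?; // open (and create) a file
    file.write_all(contents.as_bytes())?; // write data to the file
    drop(file); // close the file

    let mut text = String::new();
    File::open(path)?.read_to_string(&mut text)?; // open the file, read data from it
    fs::remove_file(path)?; // delete the file
    Ok(text)
}

fn main() -> std::io::Result<()> {
    // Hypothetical scratch file in the system's temporary directory.
    let name = format!("syscall_demo_{}.txt", std::process::id());
    let path = std::env::temp_dir().join(name);

    let text = write_then_read(&path, "hello, kernel\n")?;
    print!("{text}");

    // Get the current time of day.
    let now = SystemTime::now();
    println!("{:?}", now.duration_since(SystemTime::UNIX_EPOCH));
    Ok(())
}
```

None of this code mentions a system call by name. The standard library and the OS's C library take care of that for us.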

Most of the time, programmers do not issue system calls themselves. The reason for this is that the system call mechanism is not portable, meaning it’s different on every OS. Nevertheless, we can write code like

use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open an existing file for reading; `?` returns early if that fails.
    let _file = File::open("example.txt")?;
    Ok(())
}

and it can be compiled for any (modern) OS and it’ll open the file.

The way this works is that every OS exposes a set of functions in a library which will make the relevant system calls. For example, Apple calls their library libSystem. Even if the underlying system calls themselves change, as part of an update for example, the OS vendor (e.g., Microsoft or Apple) updates this library to make the updated system calls on the program’s behalf.

Now we hit a bit of a snag. There are a million programming languages. Each language can have wildly different ideas about what a function is. One way to deal with this would be to have a library per programming language. E.g., we could have a libSystem-Python, libSystem-Java, libSystem-Rust, etc. This quickly becomes unworkable.

Instead, the C programming language is the de facto programming language for these libraries. And indeed, the most common name for the library containing functions that make system calls is libc. On an x86-64 Linux system, you’re likely to find the GNU version of libc living at /lib/x86_64-linux-gnu/libc.so.6.

This is really unusual, but you can actually run this libc as if it were a program and it’ll print out some information about the library. Almost no library does this.

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1.6) stable release version 2.27.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 7.5.0.
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

Okay, so if C is the de facto language, how can we make system calls in Rust if the Rust standard library doesn’t already have the functionality we need? Well, we call C functions! Nearly every programming language has some mechanism, often called a Foreign Function Interface (FFI), that lets you call C functions. Rust is no exception.
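Here is a minimal sketch of what FFI looks like in Rust without any extra crates. It declares the C function getpid() by hand and calls it. Two assumptions: pid_t is a 32-bit signed integer (true on Linux and macOS), and your compiler is Rust 1.82 or newer, which is when the unsafe extern syntax was stabilized.

```rust
use std::os::raw::c_int;

// Hand-written declaration of the C function `pid_t getpid(void)`.
// Assumption: `pid_t` is a C `int` on this platform (Linux, macOS).
unsafe extern "C" {
    fn getpid() -> c_int;
}

fn main() {
    // SAFETY: getpid() takes no arguments and always succeeds.
    let from_c = unsafe { getpid() };

    // The standard library exposes the same information safely.
    let from_std = std::process::id();

    println!("C getpid() says {from_c}; std::process::id() says {from_std}");
}
```

Writing these declarations by hand is error-prone, because the compiler cannot check them against the real C signatures. A crate that ships the declarations for us avoids that.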

In Rust, the functions in libc are exposed in the libc crate. Here’s the documentation for x86-64 Linux.

As part of your implementation of ps, you printed out the user id (UID) of the user that is running the process. You’re going to write some code to turn the UID into a String containing the user’s username.

Your Task

You’re going to implement the most bare-bones version of the standard whoami command-line utility that prints out the user name of the user who runs the process. E.g., here’s the standard whoami on my laptop.

$ whoami
steve

Pretty straightforward. Unfortunately, there’s no Rust standard library function to get this information. We’re going to have to write it ourselves!

First, add the libc crate as a dependency by running

$ cargo add libc

in your assignment repository.

Next, create a file src/bin/whoami.rs. Because the file is in src/bin, $ cargo build will now build two different applications and $ cargo run needs to be told which binary to run. Write a simple main function in whoami.rs that prints out a message and exits. You can (build and) run this via

$ cargo run --bin whoami
Hello world!

If you have two or more binaries and you run $ cargo run without specifying which to run, you’ll get an error.

Getting the user name for the user who is running the current process requires two steps:

  1. Ask the OS for the UID of the user
  2. Ask the OS for the username associated with the UID.

The C function we need to call for step 1 is geteuid(). If you click that link, you’ll find no real documentation. That’s a shame, but not surprising as the libc crate is generated programmatically and it doesn’t include documentation. Fortunately, there are manual pages for C functions. Here’s the man page for geteuid(2). Give it a read. (Note that it is in section 2 of the manual. Section 1 is where user programs like cat are documented. Section 2 is where system calls are documented. Section 3 is where C standard library functions are documented. Run $ man man for a list of the different sections.)

There are two functions documented: getuid(), which returns the real UID (we don’t want this), and geteuid(), which returns the effective UID (this is the one we want). These are usually the same. They differ when a process runs a set-user-ID program such as passwd or sudo itself: the effective UID becomes that of the program’s owner (root, which has UID 0) while the real UID remains that of the user who launched it. (By default, sudo then sets both UIDs to root for the command it runs.) The standard whoami reports the effective user, so that’s the one we want:

$ whoami
steve

$ sudo whoami
root
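If you’d like to see both values for yourself, here is a small sketch that prints them side by side. To keep it free of dependencies, it declares the two C functions by hand rather than going through the libc crate. It assumes uid_t is an unsigned 32-bit integer (true on Linux and macOS) and Rust 1.82 or newer for the unsafe extern syntax. The helper name real_and_effective_uid is made up for this illustration.

```rust
// Hand-written declarations of `uid_t getuid(void)` and `uid_t geteuid(void)`.
// Assumption: `uid_t` is a `u32` on this platform (Linux, macOS).
unsafe extern "C" {
    fn getuid() -> u32;
    fn geteuid() -> u32;
}

/// Returns the pair (real UID, effective UID) of the current process.
fn real_and_effective_uid() -> (u32, u32) {
    // SAFETY: both functions take no arguments and always succeed.
    unsafe { (getuid(), geteuid()) }
}

fn main() {
    let (real, effective) = real_and_effective_uid();
    println!("real UID: {real}, effective UID: {effective}");

    if real != effective {
        println!("This process is running with another user's privileges.");
    }
}
```

For an ordinary program started from your shell, expect the two numbers to match.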

Important

Every function in libc is marked unsafe. This indicates that the function might do something that violates Rust’s safety guarantees and it’s up to the programmer to ensure that it doesn’t. We can only call an unsafe function within an unsafe function or within an unsafe {} block.

Because of this, it is considered best practice to encapsulate anything unsafe behind a safe interface. It’s up to you to perform error handling.

The geteuid() function is unusual in that it cannot fail. There is no return value that indicates an error condition (I know this because I read the man page, “These functions are always successful…”). As a result, we can trivially make a safe wrapper around geteuid by calling libc::geteuid() inside an unsafe {} block and returning the result.

extern crate libc;
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn geteuid() -> u32 {
    unsafe { libc::geteuid() }
}

fn main() {
    let uid = geteuid();
    println!("The current effective UID is {uid}");
}

There are a handful of libc functions like getpid(), getppid(), and getpgrp() which just return a numeric identifier and can be handled by just wrapping the call in an unsafe {} block. Most system calls are sadly not this easy to deal with. We’ll need to carefully handle memory and any errors that result.
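To give a feel for the harder case, here is a sketch of a safe wrapper around gethostname(), a libc function that writes into a buffer we supply and can fail. It is only an illustration of the pattern, not part of the assignment. The 256-byte buffer size and the wrapper name hostname are assumptions made for this example, and the function is declared by hand (Rust 1.82 or newer) so the snippet needs no crates.

```rust
use std::io;
use std::os::raw::{c_char, c_int};

// Hand-written declaration of `int gethostname(char *name, size_t len)`.
unsafe extern "C" {
    fn gethostname(name: *mut c_char, len: usize) -> c_int;
}

/// Safe wrapper: returns the host name, or the OS error if the call fails.
fn hostname() -> io::Result<String> {
    // Assumption: 256 bytes is large enough for the host name.
    let mut buf = vec![0u8; 256];

    // SAFETY: `buf` is valid for writes of `buf.len()` bytes for the whole call.
    let rc = unsafe { gethostname(buf.as_mut_ptr().cast::<c_char>(), buf.len()) };
    if rc != 0 {
        // On failure the function returns -1 and sets errno.
        return Err(io::Error::last_os_error());
    }

    // Keep only the bytes before the first NUL. If the name was truncated
    // there may be no NUL at all, in which case we keep the whole buffer.
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    buf.truncate(end);

    // The bytes are not guaranteed to be UTF-8, so report that as an error too.
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    println!("{}", hostname()?);
    Ok(())
}
```

Notice how much of the wrapper is about the buffer and the error cases rather than the call itself. Callers of hostname() never need to write unsafe.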

With that, you now know how to get the effective UID of the user running the process. Neat!

Turning that into a user name is more complex and can fail.

Create a new file src/user.rs.