Preface
This website contains labs for CSCI 210.
At the top of this page, you’ll find three icons. The left shows or hides the table of contents. The middle one lets you pick a theme for this book (go ahead and pick one you like). The right one lets you search the labs.
Throughout the labs, you’ll find a number of colorful blocks like the example above. Here are a few you’ll encounter in the labs.
Example boxes contain important examples, usually of code that’s similar to the code you will be writing.
Tip or hint boxes contain important information that you’re almost certain to want to use in your implementations.
There are a variety of other boxes, including note, info, and caution. At this point, they have no fixed meaning. This may change.
Lab 1. Introduction to MIPS
Due: Sunday, February 23 at 23:59
Your task is to write a program which takes in a number and stores it in a four-element array at a user-specified index, then prints the contents of the array. You will write this program using the MARS MIPS simulator, which you can download here.
Preliminaries
First, download MARS (there’s a “Download MARS” button at the top of the page).
If you don’t already have a GitHub account, make one. As a student, you can get additional GitHub benefits such as unlimited private repositories here, but you don’t need to for this course. All of your repositories will be private to you and course instructors.
You’ll need Git installed in order to download and submit assignments. On macOS, Git is included with Xcode which can be installed from the App Store. On Linux, Git can be installed using your distribution’s package manager. You can install any of the many other Git tools instead, including GitHub Desktop on macOS and Windows.
Click on the assignment link.
Once you have accepted the assignment, you can clone the repository on your computer by following the instruction and begin working.
Be sure to ask any questions on Ed.
Program specification
Read two input values from the keyboard:
- a number to enter in the array; and
- the index of the array where you should store the first number.
Print out each index of the array, and its contents. It should look like this.
Please enter a number 6
Which element of the array would you like this to be? 1
0: 0
1: 6
2: 0
3: 0
Or like this.
Please enter a number 3
Which element of the array would you like this to be? 0
0: 3
1: 0
2: 0
3: 0
You can assume the user will not input any indexes outside of the bounds of the array. You are required to store the user’s value in memory at the specified position, and to load every value in the array from memory into a register to print it out.
Note that you will need to translate between indices, which are word addresses, and values in memory, which are byte addresses.
Make sure to document your programs thoroughly. (This is especially important in assembly language programs, since the code itself is less easily read than high-level language code.) This should include:
- A block of comment lines at the beginning of the source file, giving the name and author of the program and a black-box description of what it does.
- A few comment lines between major sections of the program, describing the contents of each section.
- A comment at the end of most source lines, describing what the instruction on that line does.
Helpful resources
Hints
- You will need to use various system calls to print strings, integers, etc. These are all provided on the list of system calls linked above.
- The
la
instruction loads the address of a label. You can use this to load the addresses of any of the variables in the.data
section of your code. - The
mul
instruction will allow you to multiply a register by a constant. - I have included some strings you may find helpful in the
.data
section. - You do not need to write any loops.
- Your array must be 4-byte aligned. You can use
to get a 4-byte aligned array..align 2 arr: .space 16
Submission
Submit the lab by committing your code and pushing it to your GitHub repository. From the command line, you would enter the commands
$ git commit -m "Finished!" lab01.asm
$ git push
from inside your repository’s directory.
The first command instructs Git to take the changes that you made to
lab01.asm
and record those changes on your computer. Usually, when writing
software, you would make multiple commits. This allows you to “easily” go back
to previous versions. (“Easily” is in quotes because nothing with Git is ever
actually easy.)
The second command instructs Git to take all of the changes that you have committed and push them to your repository on GitHub. You can push each time you commit, or you can push after several commits have been made and all commits that haven’t yet been pushed to GitHub will be pushed at that time.
If you don’t push after committing the final version of your code, the graders will not be able to see your changes (because they’ve only been recorded on your local computer).
Rather than use the command line, you can visit the web page for your repository on GitHub and drag and drop the files on the page. I don’t really recommend doing this; it’s better to learn how to use Git properly. But it does work and you’re welcome to do it.
Lab 2. Bitwise Operations
Due: Sunday, March 2 at 23:59
Your task is to write a Rust program that performs bit operations on integers.
Preliminaries
Click on the assignment link.
Once you have accepted the assignment, you can clone the repository on your computer by following the instruction and begin working.
This lab is in Rust. You’ll either need to have the Rust toolchain installed on your computer (I recommend using rustup
) or use the lab machines which have Rust installed.
Be sure to ask any questions on Ed.
Program specification
In the assignment repository you will find a Rust project named bitops
. This
project contains source code for a library src/lib.rs
which you will be
editing and a binary src/main.rs
which you will not be editing.
Your task is to implement the functions in src/lib.rs
. You are only allowed
to use the Rust bit operators, &
(and), |
(or), ^
(xor), >>
(right
shift), and <<
(left shift). You will not receive credit for the lab if
you use addition, subtraction, division, multiplication, modulo/remainder, or
control flow constructs like if
and loops.
You must perform bit operations directly on the integer passed to each
function, and you are not allowed to convert the integer to another format.
In particular, you may not convert the integer to a String
. For any method
that asks you to change specific bits, you should assume bits are numbered 31
to 0, with 31 being the most significant bit or highest bit, and 0 being the
least significant bit or lowest bit. All of these methods can (and should) be
implemented with a single line of code.
For example, say I asked you to write a function named set_bit_one
that sets bit
one of an integer to one while leaving the rest of the bits unchanged. My code
for that method would look like this:
#![allow(unused)] fn main() { fn set_bit_one(x: u32) -> u32 { x | 2 } }
Make sure you understand why this is ORing x
with 2.
It may be helpful to know that you can specify the value of an integer in
hexadecimal by prepending it with 0x
, e.g., let x = 0xBADBEEF;
. You can
also specify the value of an integer in octal by prepending it with 0o
or in
binary by prepending it with 0b
.
Fun fact: Many programming languages—such as C and Java—use a leading 0
to indicate that a number is in octal. For example, in C or Java 0755
is how
the integer 493 would be written in octal. In Rust, we would write 0o755
.
The bitops
binary whose source code is in src/main.rs
takes input in the
form of command line arguments. Each command line argument is parsed as an
integer and passed to each of the functions you have to implement and the
results are printed to stdout
. The inputs can be specified in decimal, octal,
hexadecimal, binary, or as an IPv4 “dotted quad” address like
132.162.201.24
. See the examples below.
You will need to implement the following functions.
-
fn is_odd(x: i32) -> i32
: This method returns 1 ifx
is odd, and 0 if it is even. It returns 0 on 6, and 1 on 5. -
fn is_negative(x: i32) -> i32
: This method returns 0 if x is positive, and 1 if x is negative. It returns 0 on 4, and 1 on -3. -
fn div_by_4(x: u32) -> u32
: This methods performs unsigned integer division by 4. It returns 1 on 6, 3 on 13, and 0 on 3. -
fn nearest_odd(x: i32) -> i32
: This method rounds up the number to the nearest odd number. It returns 7 on 6, 5 on 5, and -3 on -4. -
fn flip_parity(x: i32) -> i32
: This method adds 1 to even numbers, and subtracts one from odd numbers. It returns 7 on 6, 4 on 5, and -4 on -3. -
fn routing_prefix(addr: u32) -> u32
: IPv4 network addresses are stored as 32 bit integers. Each address is divided into two parts: the routing address and the host address. The routing address specifies the network the computer is on (for example, Oberlin’s network). The host address identifies a specific computer on that network. A subnet mask is applied to an IPv4 address to zero out bits corresponding to the host address, leaving just the routing prefix.Your task is to write a function that returns the routing addresses for a network with 256 host addresses: these routing addresses will consist of the 24 highest order bits of the address, with the last 8 bits of the address set to 0. If you code this function correctly, the IP form of your result should look like
x.x.x.0
. -
fn make_readable(perms: u32) -> u32
: The chmod, or change mode, command in Unix is used to change access permissions. It accepts user permission flags as a 3 digit octal number, where the first digit corresponds to the User permissions, the second digit corresponds to the Group permissions, and the third digit corresponds to the Other permissions. Each octal digit is three bits. For chmod, we set the most significant bit of the digit to 1 to give read permission, the middle bit to 1 to give write permission, and the least significant bit to 1 to give execute permission. For this function, take in a number that represents a set of permissions, and set all the read permission bits to 1. All other permission bits should remain unchanged. Good values for testing are 0, 64 (0o100
in octal, i.e. execute for user and no permissions for anyone else), and 146 (0o222
in octal, or only write permissions set for everyone).
Testing
For each function that you implement, write a unit test containing at least 3
assertions. Your tests should appear in the test
module in src/lib.rs
. A
test function for is_odd
is provided for you to use as a template for other
tests. You don’t need to add additional tests for is_odd
, but you may if you
wish.
To run the tests, you can run
$ cargo test --lib
in the terminal.
Examples
Here are some sample outputs when running bitops
with different arguments.
$ cargo run --quiet -- 0o640
is_odd(416) = 0
is_negative(416) = 0
div_by_4(416) = 104
nearest_odd(416) = 417
flip_parity(416) = 417
routing_address(416 /* 0.0.1.160 */) = 256 /* 0.0.1.0 */
make_readable(0o640) = 0o644
$ cargo run --quiet -- -87
is_odd(-87) = 1
is_negative(-87) = 1
div_by_4(4294967209) = 1073741802
nearest_odd(-87) = -87
flip_parity(-87) = -88
routing_address(4294967209 /* 255.255.255.169 */) = 4294967040 /* 255.255.255.0 */
make_readable(0o37777777651) = 0o37777777655
$ cargo run --quiet -- 132.162.201.24
is_odd(-2069706472) = 0
is_negative(-2069706472) = 1
div_by_4(2225260824) = 556315206
nearest_odd(-2069706472) = -2069706471
flip_parity(-2069706472) = -2069706471
routing_address(2225260824 /* 132.162.201.24 */) = 2225260800 /* 132.162.201.0 */
make_readable(0o20450544430) = 0o20450544474
These examples make good test cases but don’t necessarily cover every case you may wish to test.
Submission
Submit the lab by committing your code and pushing it to your GitHub repository.
Lab 3. MIPS Fibs
Due: Sunday, March 9 at 23:59
Your task is to write a program which computes an initial portion of a generalized Fibonacci sequence and prints it out. You will write this program in MIPS using the MARS simulator.
Preliminaries
Click on the assignment link.
Once you have accepted the assignment, you can clone the repository on your computer by following the instruction and begin working.
Be sure to ask any questions on Ed.
Program specification
No starter code is provided this time. Create a file in MARS called lab3.asm
and create your program there. You may wish to look at your code for Lab
1 for reference.
Input
Read three input values from the keyboard:
- the first number in the sequence,
- the second number in the sequence, and
- the number of elements of the sequence.
Each element of the sequence (beyond the first two) is equal to the sum of the previous two. For example, if the user inputs 3, 1, and 10, then your program should generate the sequence 3, 1, 4, 5, 9, 14, 23, 37, 60, 97.
Output
For each element of the sequence that you generate, display
- the number in decimal notation (using system call 1),
- the number in hexadecimal, and
- the number of 1-bits in the binary representation of the number.
For the example above, you would display the following.
3 0x00000003 2
1 0x00000001 1
4 0x00000004 1
5 0x00000005 2
9 0x00000009 2
14 0x0000000E 3
23 0x00000017 4
37 0x00000025 3
60 0x0000003C 4
97 0x00000061 3
The hex version can be displayed using system call 34.
Implementation
To get full credit for this assignment, you will need to create a function to count the number of ones in a number. You should use proper calling conventions as discussed in class. You will also need to use a loop to generate the generalized Fibonacci numbers.
You will not need to use either arrays or recursion for this problem. You are required to write a separate function which counts the number of ones in a number. (Hint: Think about using bit operations to isolate each bit in the number, and add them together.)
Make sure to document your programs thoroughly. (This is especially important in assembly language programs, since the code itself is less easily read than high-level language code.) This should include:
- A block of comment lines at the beginning of the source file, giving the name and author of the program and a black-box description of what it does.
- A few comment lines between major sections of the program, describing the contents of each section.
- A comment at the end of most source lines, describing what the instruction on that line does.
- Write a program that does this in a high-level language and then translate it to MIPS. It will be a lot easier to solve this problem if you figure out the logic first and write the assembly for it second.
- Syscall 35 will print an integer in binary.
- To count the ones in a number, think about using bit operations to isolate each bit of the number and then add the bits together. A loop and bit shifting instructions will be helpful here.
- Printing the string
\t
(the tab character) between each number will give you the nice even spacing shown in the example. Likewise, printing the string\n
will give you a newline.
Helpful resources
Submission
Submit the lab by committing your code and pushing it to your GitHub repository.
Lab 4. MIPS Array
Due: Sunday, March 16 at 23:59
Your task is to write a program to read a list of (x,y)
pairs representing
points in the xy-plane and print them in ascending order, according to their
distance from the origin. You will write this program in MIPS using the MARS
simulator
Preliminaries
Click on the assignment link.
Once you have accepted the assignment, you can clone the repository on your computer by following the instruction and begin working.
Be sure to ask any questions on Ed.
Program specification
No starter code is provided this time. Create a file in MARS called lab4.asm
and create your program there. You may wish to look at your code for Lab
3 for reference.
Input
The first line of input is an integer (with ) representing the number of points in the list. This is followed by additional lines of input containing the coordinates of the points to be sorted. For example, the list would be input as:
4
3
4
4
2
0
-5
1
3
You can read the values using system call 5 (read integer). (System call 5 requires that only one integer appear on each line.)
The input should be stored in an array. You can view the storage conceptually as a series of pairs, as shown below.
3 4
4 2
0 -5
1 3
Output
The output of the program will consist of lines, each containing the x- and y-coordinates of one point in the sorted list. For example, the output from the list of 4 points shown above would be the following.
1 3
4 2
0 -5
3 4
Implementation
You will need to
- allocate space for the data;
- decide how to map the data to the space you allocate; and
- determine how to address the x- and y-coordinates of each point.
Once you have read in all the data, you need to sort it in increasing order according to the value of ; that is, the square of the distance from the point to the origin. If two points in your list are the same distance from the origin, they should be sorted in lexicographic order; that is, first in increasing order according to the x-coordinate, and if the x-coordinates are the same, then according to the y-coordinate. If the same point appears more than once in the input, it should appear with the same multiplicity in the output. For example, the correct ordering of the points , , , , would be , , , , .
You may use any sort method that you choose. I would recommend using something simple like insertion sort. Keep in mind that when you swap two points in your list, you must swap both their x- and y-coordinates.
You may assume that overflow will not occur as a result of computing .
For full points, you must structure your solution to use functions. See the next section for a suggested list of functions. Functions must be proper functions with arguments passed in the argument registers, results returned in the return registers, caller-saved registers that are used must be saved before use and restored before returning.
Plan for developing the program
First, write and debug a program that will read and the list of points, store each point in the array, and print it in the desired format.
After that, go back and add the sort operation. Make sure to test!
You will need to write the following functions.
- (5 pts) A
main
function that reads the number of points from the user, allocates enough space to hold the points (each point is two integers) and then calls three functions,read_points
,sort_points
, andprint_points
. It ends by calling the exit system call. - (5 pts) A
read_points
function that takes a pointer to the array of points and the number of points. In a loop, it reads the x- and y-coordinates for each point and stores them in the array. - (9 pts) A
sort_points
function that takes a pointer to the array of points and the number of points. I used one of the sorting algorithms you learned about in 150 rather than something faster like Quicksort or Mergesort. Thesort_points
function needs to determine if one point is “smaller” than another. To that end,sort_points
calls apoint_less_than
function whenever two points need to be compared. Many sorting algorithms are built on the idea of swapping two elements so you will write aswap
function that swaps two elements. - (8 pts)
point_less_than
takes two pointers as arguments. The first pointer points to the first point and the second to the second point. Since each point is just two consecutive integers, ifp
were a pointer to a point, then (in C),p[0]
is the x-coordinate andp[1]
is the y-coordinate. In MIPS, since$a0
will be the argument register holding the first pointer, we can access the x- and y-coordinates as follows.lw $t0, 0($a0) # load the x-coordinate of the first point into $t0 lw $t1, 4($a0) # load the y-coordinate of the first point into $t1
point_less_than
returns 1 if its first argument is less than its second; otherwise, it returns 0. - (8 pts) Like
point_less_than
,swap
takes two pointers to points as arguments. It loads the four integers (two for each point), and then stores them in the appropriate location for the other point. - (5 pts) A
print_points
function which prints the points in a loop.
Each of these 6 functions must be a proper function meaning arguments are
passed in $a0
through $a3
and the return value (if any) is in $v0
.
Proper stack manipulation must be performed (reserving space on the sack for
registers, including $ra
in the prologue and cleaning it up in the
epilogue).
You may initially find writing proper functions to be a hassle, but doing so will make your code vastly easier to read and reason about. And it’s required to get full points.
Helpful resources
Submission
Submit the lab by committing your code and pushing it to your GitHub repository.
Lab 5. 7-segment display
Due: Sunday, March 30 at 23:59
In this assignment, you will design a combinational logic circuit which will control a 7-segment LED display, which looks like this.
Preliminaries
We will be using the CircuitVerse website to complete this assignment. You should have received an email inviting you to the “CSCI 210 Fall 2024” group on CircuitVerse. If you did not, please use this link to join the group. To start the assignment, click on your name on the upper righthand side of the website. Go to “My Groups” and then select the group for this class and you should see “Lab 5” under assignments. Click the “Start Working” button to start the assignment. After you click “Start Working” you can click “Launch Simulator” to start creating the assignment.
You can find more details on how to use the CircuitVerse website here.
4-16 Decoder
You’re going to use a 4-16 decoder in order to implement the 7-segment display controller. As a subcircuit, implement a 4-16 decoder. It should have 4 input lines representing a number from 0 to 15 and it should have 16 output lines. At any given moment, exactly one output line must be 1 and the other 15 must be 0.
7-Segment Display Controller
The input to circuit is a set of 4 signals a3, a2, a1, a0 representing the digits of a 4-bit binary number. The output is a set of the seven signals a, b, c, d, e, f, and g used to drive the seven segments of the LED display. When implemented correctly, your circuit will display value of the input as a hexadecimal digit on the 7 segment display. Below is the display with each segment labeled.
Your display controller should pass its 4 inputs to the 4-16 decoder you constructed and the output of the decoder should be used to drive the seven LED signals.
Each of the LED signals can be represented as a simple boolean function of the
16 decoder outputs. For example, if the input is 0101
, the output of the
decoder should be a single 1 which turns on segments a, b, d, f, and g, while
c and e are off, forming the digit “5” on the display.
You should implement all 16 hexadecimal digits, 0
, 1
, 2
, 3
, 4
, 5
,
6
, 7
, 8
, 9
, A
, b
, C
, d
, E
, and F
. (Note that b
and d
are lower case to distinguish them from 8
and 0
.)
Implementation plan
You should use the following plan for implementing this lab.
- Draw out all the numbers on graph paper first.
- For each of the segments a–g, write down which numbers correspond to that segment.
- Build a 4-bit decoder as a subcircuit. You MUST have your decoder as a subcircuit in order to receive full credit for this lab. Documentation for how to create a subcircuit is here.
- For each segment, create one or more OR gates (as needed), add the appropriate outputs from the decoder to the OR gate.
- Keep wires straight or right-angle turns. Pick the order the decoder outputs will go into the OR gates so as to minimize wire crossing.
- If inadvertent connections are made, undo, don’t try to fix it.
Performing step 2 before step 4 is critical. You don’t want to be trying to wire this up while figuring out which decoder outputs you need to OR.
Submission
Make sure to select the “save online” button to save your project. This will make your project available to us for grading.
Lab 6. Adder/Subtracter
Due: Sunday, April 6 at 23:59
In this lab, you will build an eight-bit adder/subtracter.
Preliminaries
We will be using the CircuitVerse website to complete this assignment. To start the assignment, click on your name on the upper righthand side of the website. Go to “My Groups” and then select the group for this class and you should see “Lab 6” under assignments. Click the “Start Working” button to start the assignment. After you click “Start Working” you can click “Launch Simulator” to start creating the assignment.
You can find more details on how to use the CircuitVerse website here.
Adder/Subtracter
Your adder/subtractor must take in two 8
bit numbers, a
and b
, and one select value, s
, and have an 8 bit output, c
. If s
is zero, your adder should output a + b
. If s
is one, your adder should output
a - b
. For some examples, look at the table below.
a | b | s | c |
---|---|---|---|
00001100 | 00001000 | 0 | 00010100 |
00001100 | 00001000 | 1 | 00000100 |
01101000 | 01010000 | 0 | 10111000 |
01101000 | 01010000 | 1 | 00011000 |
Note that in the third example, we get the wrong answer because of overflow—just like many modern programming languages, we are going to ignore this.
Implementation plan
You are required to build this adder/subtracter using only with basic logic gates (i.e., AND, OR, NOT, NOR, NAND and XOR). You are not allowed to use the CircuitVerse adder. You should take the following steps.
- Build a one-bit half-adder as a subcircuit;
- Build a one-bit full-adder out of half-adders, also as a subcircuit; and
- Build an eight-bit adder/subtracter out of eight one-bit full-adders.
You are required to have a separate half-adder and full-adder subcircuit to get full credit for the lab.
Submission
Make sure to select the “save online” button to save your project. This will make your project available to us for grading.
Lab 7. Four-bit Counter
Due: Sunday, April 13 at 23:59
In this lab, you will build an four-bit counter out of J-K flip-flops. To build the J-K flip-flops, you’re going to build a clocked S-R latch.
Preliminaries
We will be using the CircuitVerse website to complete this assignment. To start the assignment, click on your name on the upper righthand side of the website. Go to “My Groups” and then select the group for this class and you should see “Lab 7” under assignments. Click the “Start Working” button to start the assignment. After you click “Start Working” you can click “Launch Simulator” to start creating the assignment.
You can find more details on how to use the CircuitVerse website here.
Clocked S-R latch
You should first build a clocked S-R Latch as a subcircuit. Lectures 18 and 19, as well as Appendix B.8 of the textbook should be helpful to you here. The S-R latch must be a subcircuit to get full credit for the lab.
Your subcircuit should take in inputs S, R and C (for clock), and have outputs Q and Q. When C is 1, the outputs Q and Q are controlled by S and R. When C is 0, the outputs should not change regardless of the values of S and R. (Note that for this subcircuit, you should make the clock an input value, and not actually hook up the clock to the circuit.)
The S-R latch must be a subcircuit to get full credit for the lab.
J-K flip-flop
Next you will build a J-K flip-flop out of two clocked S-R latches. Your subcircuit inputs should be J, K and C (for clock again), and your output should be Q. The J-K flip-flop must be a subcircuit in order to get full credit for the lab. Below is a circuit diagram for a J-K flip-flop. (Please note that the circles on the Q output of this diagram are just to indicate that the output is the inverse of the Q output; they don’t mean that Q is being inverted to be the same as Q!)
Counter
Your main circuit is a four-bit counter which you will build out of four J-K flip-flops.
This circuit should not require any input from the user/viewer of the circuit; you should use the clock as input, and it should automatically make your counter increase by one for every clock cycle. Your output should be a binary number displaying the current count. You can either display this as 4 individual 1-bit outputs, or 1 four-bit output. Your counter should count through 0000, 0001, 0010, 0011, …, 1110, 1111 and then start over at 0000.
Let’s start by considering a 1-bit counter. Every clock cycle, the output should alternate between 0 and 1. See the timing diagram below. Notice how output alternates between 0 and 1 and it stays at each of those for an entire clock period (the interval between the blue lines).
Look at Problem Set 8. There is a particular configuration of inputs to a J-K flip-flop that causes the output to “toggle” between 0 and 1 every time its clock input, C, changes from 1 to 0. Using a single J-K flip-flop, you can construct a 1-bit counter.
Next, let’s build a 2-bit counter. This should count as 00, 01, 10, 11, 00. Notice that the least-significant bit, , toggles on every clock cycle but only toggles when changes from a 1 to a 0, not when the clock changes from a 1 to a 0. In the image below, the blue vertical lines correspond to the counter values 00, 01, 10, 10, 11, 00, from left to right.
To get a 2-bit counter, we just need two J-K flip-flops with appropriate J and K values and with the clock input of the second J-K flip-flop being the output from the first flip-flop.
Similarly, we can build a 3-bit counter by adding another flip-flop. Look at the timing diagram below. When does toggle?
We can continue this process to get a 4-bit counter.
Implementation plan
You are required to build this counter using only with basic logic gates (i.e., AND, OR, NOT, NOR, NAND and XOR). You are not allowed to use the CircuitVerse latches and flip-flops. You should take the following steps.
- Build an S-R latch as a subcircuit;
- Build a J-K flip-flop as a subcircuit; and
- Build a four-bit counter out of four J-K flip-flops.
You are required to have a separate S-R latch and J-K flip-flop subcircuits to get full credit for the lab. You must build your own J-K flip-flops and S-R latches. Using the CircuitVerse latches or flip-flops will result in a failing grade for this lab.
Submission
Make sure to select the “save online” button to save your project. This will make your project available to us for grading.
Lab 8. Floating Point
Due: Sunday, April 27 at 23:59
XXX: This is the old version of the lab which will be updated this semester
In this lab, you will implement part of the IEEE 754 Standard for Floating-Point Arithmetic. In particular, you will implement single precision (32-bit) floating point addition and multiplication.
Preliminaries
You may discuss your implementation with a partner; however, each of you needs to write and submit your own code. So no copying and pasting, but you should feel free to work together on a solution and to debug.
Click on the assignment link.
Once you have accepted the assignment, you can clone the repository on your computer by following the instruction and begin working.
Be sure to ask any questions on Ed.
Program Specification
In the assignment repo, you’ll find a single Java file, Xfloat.java
which
defines an Xfloat
class. This class represents a single-precision floating
point number with the following members.
private byte sign;
private byte exponent;
private int significand;
Of course, these represent the three fields of a floating point number.
The file also contains a main
method that takes two floats as command-line
arguments and prints them out along with the product and sum of the numbers.
For example, running the program with floats 2.5
and 7.25
gives this
output.
$ java Xfloat 2.5 7.25
x: 40200000 (0 10000000 01000000000000000000000) sign: 0 exp: 80 sig: 200000 2.5
y: 40E80000 (0 10000001 11010000000000000000000) sign: 0 exp: 81 sig: 680000 7.25
x*y: 00000000 (0 00000000 00000000000000000000000) sign: 0 exp: 00 sig: 000000 0.0
x+y: 00000000 (0 00000000 00000000000000000000000) sign: 0 exp: 00 sig: 000000 0.0
The first 8 hex-digit number corresponds to the bits of the actual IEEE
floating point number. The binary representation follows, separated into sign,
exponent and significand fields. Next are the corresponding Xfloat
fields in
hex. Note that the significand field requires only 6 hex digits since only 23
of the 32 bits are actually used.
You only need to implement two methods.
public static Xfloat xadd(Xfloat x, Xfloat y)
: Add Xfloat
s x
and y
together and return the result.
public static Xfloat xmult(Xfloat x, Xfloat y)
: Multiply Xfloat
s x
and y
together and return the result.
You must implement the methods by performing the addition/multiplication
by operating on the three fields of each Xfloat
.
You should not modify any other methods.
You may ignore subnormal numbers, infinities, and NaN, but you
must properly handle 0.0
.
You must use bit masking and shifting to manipulate bits. In particular, you are not allowed to convert your integers to strings.
Hints
-
When implementing
xadd
, you’ll need to shift the significand of the number with the smaller magnitude so that the two numbers have the same exponent. However, if the number of bits you need to shift by is greater than 31, you should just set the significand to rather than trying to shift by more than 31. This is becauseval >> 32
in Java is the same thing asval
andval >> 33
is the same asval >> 1
. In general ifshamt >= 32
, thenval >> shamt
is the same asval >> (shamt % 32)
which can lead to surprising results.For example, when adding
0.000000237
and234624.0
, the exponent for the first is and the exponent for the second is 17. To make the first have the same exponent as the second, we’d need to shift the first’s significand right by . In this case, the final result should be234624.0
. -
When implementing
xmult
, you’ll need to multiply the significands. You should perform the following steps.- Convert the significands to 64-bit
long
s; - Use
Xfloat.HIDDEN_BIT
to add the hidden bit to the significands; - Multiply the two 64-bit significands;
- Shift the result appropriately (making necessary adjustments to the exponent); and
- Cast the significand back to 32 bits and remove the implicit 1.
- Convert the significands to 64-bit
-
The various
XXX_MASK
constants defined inXfloat
may be helpful. -
In Java, the unsigned right shift operator is
>>>
. -
Because Java only uses signed numbers, you may want to use types that are larger than the ones you need. For example, you will probably want to work with the exponents by assigning them to
int
s before working with them. Something likeint e = x.exponent & 0xFF;
will give you
x
’s exponent as an integer in the range [0, 255]. If you omit the& 0xFF
, then anyx.exponent
that’s larger than 127 will give you a negative number. -
Check out the page of worked examples.
Tests
Included in the repository is a Python test script, test.py
. This script
will compile and test your code against 10 fixed inputs or, if you pass the
-r
option to it, against an additional 100 randomly chosen inputs.
$ ./test.py
Compiling
x=0.0 y=0.0: PASS
x=-1.0 y=0.0: PASS
x=0.0 y=-1.0: PASS
x=0.9 y=0.1: PASS
x=1.5 y=1.5: PASS
x=6.66666 y=3.3333333: PASS
x=2.25 y=0.5: PASS
x=-4.0 y=2.0: PASS
x=-20486.0 y=-2.5e-06: PASS
x=4.70197740328915e-38 y=2.1267647932558654e+37: PASS
10/10 tests passed
Or
$ ./test.py -r
Compiling
x=0.0 y=0.0: PASS
x=-1.0 y=0.0: PASS
x=0.0 y=-1.0: PASS
x=0.9 y=0.1: PASS
x=1.5 y=1.5: PASS
x=6.66666 y=3.3333333: PASS
x=2.25 y=0.5: PASS
x=-4.0 y=2.0: PASS
x=-20486.0 y=-2.5e-06: PASS
x=4.70197740328915e-38 y=2.1267647932558654e+37: PASS
10/10 tests passed
Performing 100 random tests
x=-1.5824437428766114e+19 y=7.654288931433146e-16: PASS
x=-519727136.0 y=-4.693174584159976e+27: PASS
x=5.338058977860687e-34 y=-110286184.0: PASS
...
x=-1.7480781718172477e-25 y=3.453274344019857e+34: PASS
100/100 random tests passed
Submission
Submit the lab by committing your code and pushing it to your GitHub repository.
Lab 8. Floating Point examples
XXX: This is the old version of the lab which will be updated this semester
Lab 8 asks you to implement addition and multiplication for single-precision, 32-bit floating point numbers. A few worked examples are given below.
Addition
My implementation for xadd
looks something like this.
- Check if either
x
ory
is zero and return the other one if so; - Denormalize
x
ory
to give them the same exponent (the exponent of the larger one); - Add the significands, taking sign into account;
- If the result is 0, return 0;
- Normalize the result; and
- Encode the result (bias the exponent; and mask off the hidden bit) and
finally construct the new
Xfloat
with the result’s sign, exponent, and significand.
Let’s apply this algorithm to adding and to see how this works.
Let’s start by encoding these as a 32-bit single precision floating point number, performing the 5 steps, and looking at the result.
. Therefore, the three fields
of the float are sign = 0, exponent = 1, and significand =
. These are encoded as 0 10000000 01000000000000000000000
(where I’ve added 127 to the exponent and dropped the
hidden bit from the significand).
.
Therefore, the three fields of the float are sign = 1, exponent = 2, and
significand = . Similarly, these are encoded as
1 10000001 11010000000000000000000
.
Step 1. Neither x
nor y
are zero, so we continue to step 2.
Step 2. Extract and unbias the exponents for x
and y
; extract and add in
the hidden bit to the significands for x
and y
. This gives the following.
x_exp = 1
y_exp = 2
x_sig = 00000000 1 01000000000000000000000
y_sig = 00000000 1 11010000000000000000000
In x_sig
and y_sig
, the first 8 bits are 0s, the next bit is the hidden 1
bit and the remaining 23 are the significand without the hidden bit.
Since y_exp > x_exp
, we denormalize x
by shifting x_sig
right by
y_exp - x_exp
and setting x_exp
to y_exp
. This gives us the following
variables.
x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000
Step 3. We need to add the significands together paying attention to the sign.
Because x
is positive and y
is negative, we need to compute z_sig = x_sig - y_sig
where z
is going to be the result. In this case, z_sig
is
negative and we can’t have that, so we’ll set z_sign = 1
and z_sig = -z_sig
.
In any case, z_exp = x_exp = y_exp
, although that can change in
step 5 when we normalize the result.
z_sign = 1
z_exp = 2
z_sig = 00000000100110000000000000000000
Step 4. The result is not zero, so move to the next step.
Step 5. Now, we need to normalize the result, but in this particular case,
there’s actually nothing to be done! I can tell there’s nothing to be done
because the significand is 00000000 1 00110000000000000000000
. In other
words, it starts with eight 0s and then a 1 (the hidden bit) followed by 23
bits. If it started with fewer than eight 0s, we’d need to shift the
significand to the right and increment the exponent until it started with
eight 0s. If it started with more than eight 0s, we’d need to shift the
significand to the left and decrement the exponent until it started with eight
0s.
This gives our final value of
z_sign = 1
z_exp = 2
z_sig = 00000000 1 00110000000000000000000
We can check we did this correctly with some arithmetic: . Success!
Step 6. The last thing we need to do is encode the result. So we need to add 127 to the exponent and mask off the hidden bit in the significand giving
1 10000001 00110000000000000000000
.
The previous example shows off all of the steps except for step 5, normalize the result. So let’s do another quick example, but this time using 2.5 and 7.25.
Steps 1 and 2 are identical. After them, we have
x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000
In step 3, we simply add the significands together since neither x nor y is negative. This gives.
z_sign = 0
z_exp = 2
z_sig = 00000001001110000000000000000000
Step 4 is the same: z_sig
is not 0, so we continue to step 5.
Step 5. Here, z_sig
starts with seven 0s so we need to shift z_sig
right by 1 and increment the exponent by one giving
z_sign = 0
z_exp = 3
z_sig = 00000000 1 00111000000000000000000
Checking our arithmetic, we have . Success!
Step 6. We encode by adding 127 to the exponent and masking off the hidden bit
as before giving 0 10000010 00111000000000000000000
.
Multiplication
My implementation of xmult
looks something like this.
- If either
x
ory
is 0, return 0; - Compute the sign of the result;
- Add the exponents;
- Multiply the significands and shift;
- Normalize the result; and
- Encode the exponent and significand.
As a simple example, let’s multiply and . These have encodings
x: 1 01111111 00000000000000000000000
y: 0 10000100 00001010000000000000000
Step 1. Neither x
nor y
are 0, so continue to step 2.
Step 2. Since x
is negative and y
is positive, we know the result is negative.
z_sign = 1
Step 3. Add the exponents. Since the exponent of x
is 0 and the exponent of
y
is 5, the exponent of the result is 5.
z_exp = 5
Step 4. Multiply the significands as 64-bit long
s, don’t forget the hidden bits!
x_sig = 00000000 00000000 00000000 00000000 00000000 10000000 00000000 00000000
y_sig = 00000000 00000000 00000000 00000000 00000000 10000101 00000000 00000000
z_sig = 00000000 00000000 01000010 10000000 00000000 00000000 00000000 00000000
The significands of x
and y
aren’t really and
, they’re actually
and (note the binary point). That is, the real
significands are the 24-bit values (23 bits from the float plus the hidden
bit) multiplied by . And when we multiply them together, our result
is really
To bring the result back to our normal representation as a number times
, we need to shift z_sig
right by 23 bits.
z_sig = 00000001 00001010 0000000 00000000
Steps 5 and 6, normalizing and encoding the result work the same as for addition.
In this case, there’s nothing to be done, z_sig
is already normalized (as a
32-bit integer, it has 8 leading zeros followed by the 1 corresponding to the
hidden bit). Thus, our result is
and its encoding is
1 10000100 00001010000000000000000
.
Final: Cache simulation
Due: Friday, May 16 at 11:00
XXX: This is the old version of the lab which will be updated this semester.
For this final project, you will write a configurable cache simulator using the infrastructure provided. Your cache simulator will read an address trace (a chronological list of memory addresses referenced), simulate the cache, generate cache hit and miss data, and calculate the execution time for the executing program. The address traces have been generated by a simulator executing real programs. Your cache simulator will be graded for accuracy, but is not the end product of this project, rather it is a tool you will use to complete the project. In this project, you will experiment with various cache configurations and make conclusions about the optimal cache organization for this set of programs.
Preliminaries
You can work on this project with a partner if you choose. If you decide to work with a partner, you and your partner should check out a single repository. The first partner will create a team name, and the second partner should choose that team name. Please be careful choosing a team, as this cannot be undone. Please name your team something that makes it clear who you are.
If you choose to work with a partner, you and your partner must complete the entire project together. Dividing the project up into pieces and having each partner complete a part of it on their own will be considered a violation of the honor code. Both you and your partner are expected to fully understand all of the code you submit.
Click on the assignment link.
Once you have accepted the assignment, you can clone the repository on your computer by following the instruction and begin working.
Be sure to ask any questions on Ed.
Address trace
An address trace is simply a list of addresses produced by a program running
on a processor. These are the addresses resulting from load and store
instructions in the code as it is executed. Some address traces would include
both instruction fetch addresses and data (load and store) addresses, but you
will be simulating only a data cache, so the provided traces in the traces
directory only have data addresses.
These traces were generated by a simulator of a RISC processor running three
programs, art
, mcf
, and swim
from the SPEC benchmarks. The files are
art.trace.gz
, mcf.trace.gz
, and swim.trace.gz
. The number of
loads/stores in the traces vary by benchmark. They are all compressed with
gzip. You do not need to decompress the traces because Cache.java
knows how
to read both compressed and uncompressed files. (In addition, there is a
test.trace
file containing the contents of the example trace shown below.)
Use the following command to run your simulator on the given trace file.
$ java Cache [cache args] tracefile
Because your workload is three programs, you will run three simulations for
each architecture you simulate, and then combine the results in some
meaningful way. The simulator arguments should be taken in as command line
arguments. For example, to simulate a 32 kB, 4-way set-associative cache with
32-byte blocks and a 30-cycle miss penalty on the traces/mcf.trace.gz
trace
file, you’d run the following.
$ java Cache -s 32 -a 4 -b 32 -mp 30 traces/mcf.trace.gz
Your code should support any reasonable values for cache size, associativity, etc.
Format of the address trace
All lines of the address trace are of the format
# LS ADDRESS IC
where LS
is a 0 for a load and 1 for a store, ADDRESS
is an 8-character
hexadecimal number, and IC
is the number of instructions executed between the
previous memory access and this one (including the load or store instruction
itself). There is a single space between each field. The instruction count
information will be used to calculate execution time (or at least cycle
count). A sample address trace starts out like this:
# 0 7fffed80 1
# 0 10010000 10
# 0 10010060 3
# 1 10010030 4
# 0 10010004 6
# 0 10010064 3
# 1 10010034 4
You should assume no accesses address multiple cache blocks (e.g., assume all accesses are for 32 bits or less and appropriately aligned).
Simulator (40 points)
Your simulator will model an n-way set associative, write-back, write-allocate cache. The cache replacement policy is always least-recently used for associative caches.
When run on a trace file, it should produce
- total execution time in cycles;
- number of instructions;
- number of memory access instructions (i.e., number of loads and stores);
- overall cache miss rate;
- cache miss rate for load instructions;
- cycles per instruction (CPI);
- average memory access time in cycles (cycles per memory access, assuming 0 cycles for a cache hit and a miss penalty for a cache miss; see below);
- the number of dirty cache blocks evicted;
- the number of load misses;
- the number of store misses;
- the number of load hits; and
- the number of store hits.
See the examples below.
For execution time, assume the following.
- Instructions other than loads and stores take one cycle;
- A load or store takes one cycle plus a miss penalty if applicable;
- A load or store that misses in the cache has a configurable miss penalty with a default of 30 cycles;
- A load or store that misses in the cache and causes the cache to evict a dirty cache block has an additional 2 cycle penalty.
(We’re assuming that on a cache miss that causes a dirty block to be evicted, the write back to memory happens mostly in the background and the additional 2-cycle penalty is to write the dirty block to a write buffer.)
To recap: With the default base miss penalty of 30 cycles, a load or store instruction takes 1 cycle for a cache hit, 30 cycles for a cache miss that does not evict a dirty block, and 32 cycles for a cache miss that evicts a dirty block.
In the trace shown above, the first 31 instructions take 151 cycles, assuming four cache misses and 3 cache hits for the 5 loads and 2 stores, and a 30-cycle base miss penalty.
Each trace contains the memory accesses of just over 5 million instructions. Your simulations should process all of them. Your output must follow the format provided in the skeleton code exactly. See the examples below.
Questions (40 points)
For the second part of this project, you will use your cache simulator to test
out different cache configurations and answer questions about your findings.
You must try every configuration below on the art
, swim
, and mcf
cache
traces provided to you, and discuss all of them in your answers to the
questions. Note that your question answers are worth just as much as the code
for this project—you are expected to give detailed answers that clearly
backup your claims with your simulator results.
The default cache configuration will be 16-byte block size, direct-mapped, 16 kB cache size, write-back, and write-allocate. You will re-evaluate some of these parameters one at a time, in the given order. In each case, choose a best value for each parameter, then use that for all subsequent analyses.
Look at 16 kB, 32 kB, and 128 kB cache sizes. Larger caches take longer to access, so assume that a processor with a 32 kB cache requires a 5% longer cycle time, and the 128 kB 15% longer. Choose the best size/cycle time combination and proceed to the next step.
Look at cache associativity of direct-mapped, 2-way set-associative, and 8-way set-associative. Assume that 2-way associative adds 5% to the cycle time, and 8-way adds 10% to the cycle time. Choose the best associativity and cycle time, and proceed to the next step.
Look at cache block sizes of 16, 32, and 64 bytes. Assume that it takes two extra cycles to load 32 bytes into the cache, and 6 extra cycles to load 64 bytes. (i.e., raise the miss penalty accordingly). Choose the best size and miss penalty and proceed to answering the following questions.
- (10 points) What is the optimal cache size, associativity, and block size, given the parameters above?
- (10 points) Is the cache miss rate a good indicator of performance? In which cases did the option with the lowest miss rate not have the lowest execution time? Why?
- (10 points) Were results uniform across the three programs? In which cases did different programs give different conclusions? Speculate as to why that may have been true.
- (10 points) What was the speedup of your final design over the default for each trace? Use the definition of speedup covered in class.
Put the answers to these questions in the README
file.
Hints
Think about how to intelligently debug and test your program. Running immediately on the entire input gives you little insight on whether it is working (unless it is way off). To do this create separate memory tests (you can see the text format above) to ensure cache size, cache associativity, block size, and miss penalty are functioning correctly. You do not need to turn them in, but they will help tremendously.
Speed matters. These simulations should take less than a couple minutes (actually, much less) on an unloaded machine. If it is taking much more than that, do yourself a favor and think about what you are doing inefficiently.
Simulations are not the same as hardware. If your tag only takes 16 bits, feel free to use an integer for that value. Other time-saving optimizations along these lines might be useful.
Submission
Submit the project by committing your code and answers to the questions and pushing it to your GitHub repository.
Examples
Here are three example simulation runs and the command line to produce them. Your output should be identical.
$ java Cache -s 32 -a 4 -b 32 -mp 30 traces/test.trace
Cache parameters:
Cache Size (kB) 32
Cache Associativity 4
Cache Block Size (bytes) 32
Miss penalty (cycles) 30
Simulation results:
execution time 151 cycles
instructions 31
memory accesses 7
overall miss rate 0.57
load miss rate 0.60
CPI 4.87
average memory access time 17.14 cycles
dirty evictions 0
load_misses 3
store_misses 1
load_hits 2
store_hits 1
$ java Cache -s 32 -a 4 -b 32 -mp 30 traces/art.trace.gz
Cache parameters:
Cache Size (kB) 32
Cache Associativity 4
Cache Block Size (bytes) 32
Miss penalty (cycles) 30
Simulation results:
execution time 20127356 cycles
instructions 5136716
memory accesses 1957764
overall miss rate 0.25
load miss rate 0.27
CPI 3.92
average memory access time 7.66 cycles
dirty evictions 60015
load_misses 475672
store_misses 20015
load_hits 1256211
store_hits 205866
$ java Cache -a 8 -s 64 -b 32 -mp 42 traces/mcf.trace.gz
Cache parameters:
Cache Size (kB) 64
Cache Associativity 8
Cache Block Size (bytes) 32
Miss penalty (cycles) 42
Simulation results:
execution time 143963250 cycles
instructions 19999998
memory accesses 6943857
overall miss rate 0.42
load miss rate 0.36
CPI 7.20
average memory access time 17.85 cycles
dirty evictions 995694
load_misses 2036666
store_misses 867426
load_hits 3552806
store_hits 486959