Part 4. Top file types (20 points)
In this part, you’re going to write a complex pipeline to answer a simple question: What are the top 8 file types in the Linux 6.4 source code? We’re going to use a file’s extension to determine what type of file it is. We’re going to ignore any files that don’t have extensions.
Your task
Write a script topfiletypes to answer this question. Your script should consist of a single complex pipeline. Put the output of your script in your README.
The structure of the pipeline will look like this:
while IFS= read -r file; do ...; done < <(find linux-6.4 -name '*.*') | ... | ... | ... | ...
Notice how the output from the while loop is passed as standard input to the next command in the pipeline.
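The sketch below shows how the loop's output flows into the rest of the pipeline. It runs the same structure on a small throwaway directory, created with mktemp as a stand-in for linux-6.4 (the file names are made up), and pipes the loop's output into wc -l to count the matching files. It also adds -type f, which the assignment's command does not use. find tests the starting point against -name too, and mktemp's directory name typically contains a dot, so without -type f the directory itself would be printed. The same applies to find linux-6.4 -name '*.*', which also prints the linux-6.4 directory.

```shell
#!/usr/bin/env bash
# Sketch of the loop-into-pipeline structure on a throwaway directory
# (a stand-in for linux-6.4; the file names below are made up).
set -eu

demo=$(mktemp -d)                  # e.g. /tmp/tmp.XXXXXXXXXX
trap 'rm -rf "$demo"' EXIT         # clean up when the script exits
mkdir -p "$demo/src"
touch "$demo/src/a.c" "$demo/src/b.h" "$demo/notes.txt" "$demo/README"

# find prints one path per line; read -r stores each line in $file.
# Everything the loop writes to stdout becomes stdin for wc -l.
count=$(
  while IFS= read -r file; do
    echo "$file"
  done < <(find "$demo" -type f -name '*.*') \
    | wc -l
)
echo "$count"   # README has no extension, so it does not match '*.*'
```

This should print 3 (a.c, b.h and notes.txt). Some wc implementations pad the number with spaces.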
Your output should look like this:
$ ./topfiletypes
32463 c
23745 h
3488 yaml
...
1. The while loop given here is slightly different from the one you used before (no -d '' or -print0 options). Most filenames don't actually contain newlines, so we can use this technique to read the output of the find command one line at a time rather than reading up to a 0 byte.
2. As discussed in lecture, Bash supports many convenient variable expansions. You can use one of them inside your while loop to print the extension of the file.
3. Once you print out the extension, you will want to sort the output alphabetically. You can run apropos -a <search term> to find commands that sort lines.
4. Then, on the output from the command in step 3, use a command that omits repeated lines. You can run apropos -a <search term> again to find this command.
5. You should also find an option for that command (look at its man page) that prints the number of occurrences of each line.
6. You can use the command you found in step 3 a second time to sort your output from step 5. This time, however, sort numerically (instead of alphabetically) and in reverse, so the biggest number appears first. Look at the man page to find the appropriate options.
7. Next, find a command that outputs the first part of its input, and apply it to your output from step 6 to print the 8 most common file extensions. Find an option that lets you specify how many lines the command outputs.
8. Putting it together: inside the while loop, print each file's extension, then combine all of the commands described above in a single pipeline to produce the final result.
9. You could write all of this on a single line. Don't do that. Instead, end each line with \ so the pipeline continues on the next line:
   while IFS= read -r file; do echo ...; done < <(find linux-6.4 -name '*.*') \
     | ... \
     | ... \
     | ... \
     | ...
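You can try the steps above on a few made-up paths before running them on linux-6.4. The sketch below is one possible way to do it. It uses the ${file##*.} expansion, which removes the longest prefix ending in a dot, along with the standard coreutils tools sort, uniq -c, sort -nr and head -n. These are one plausible result of the apropos searches, not necessarily the only one. The sample paths are hypothetical and stand in for find's output, and head -n 2 stands in for the 8 lines the assignment asks for. Sorting before uniq matters because uniq only collapses repeated lines that are adjacent.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-7 on a hard-coded, hypothetical list of paths
# (stand-ins for what find would print), so it runs without linux-6.4.
set -eu

top=$(
  while IFS= read -r file; do
    # ${file##*.} deletes the longest prefix matching '*.',
    # i.e. everything up to and including the last dot.
    echo "${file##*.}"
  done < <(printf '%s\n' \
      kernel/fork.c kernel/exit.c include/linux/sched.h \
      mm/slab.c mm/slab.h Documentation/x.yaml) \
    | sort \
    | uniq -c \
    | sort -nr \
    | head -n 2
)
echo "$top"
```

This prints 3 c followed by 2 h. The padding in front of each count depends on the uniq implementation.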