Part 4. Top file types (20 points)

In this part, you’re going to write a complex pipeline to answer a simple question: What are the 8 most common file types in the Linux 6.4 source code? We’ll use a file’s extension to determine its type, and we’ll ignore any files that don’t have an extension.

Your task

Write a script named topfiletypes that answers this question. Your script should consist of a single complex pipeline. Put the output of your script in your README.

The pipeline should have this structure:

while IFS= read -r file; do ...; done < <(find linux-6.4 -name '*.*') | ... | ... | ... | ...

Notice how the output from the while loop is passed as standard input to the next command in the pipeline.
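The script below is a minimal sketch of that structure, not part of the solution. It uses printf in place of find so it runs without the Linux source tree, and the three paths are made up. Everything the loop echoes becomes the standard input of the command after the |, here tr, which uppercases it.

```shell
#!/usr/bin/env bash
# Sketch of the loop-into-pipeline structure.
# printf stands in for find; the paths are hypothetical.
while IFS= read -r file; do
  # Each iteration writes one line to the loop's standard output.
  echo "saw: $file"
done < <(printf '%s\n' 'dir/a.c' 'dir/b.h' 'dir/c.c') \
  | tr '[:lower:]' '[:upper:]'   # reads the loop's output as its stdin
```

It prints SAW: DIR/A.C, SAW: DIR/B.H, and SAW: DIR/C.C, one per line, showing that tr sees the combined output of every loop iteration.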

Your output should look like this:

$ ./topfiletypes
  32463 c
  23745 h
   3488 yaml
...

Hints

  1. The while loop given here is slightly different from the one you used before: it has no -d '' option to read and no -print0 option to find. Most filenames don’t contain newlines, so we can use this technique to read the output of find one line at a time rather than reading up to each 0 (NUL) byte.
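  2. Bash supports many convenient parameter expansions. Read the part of the Bash manual’s parameter expansion section that describes ${parameter##word}, which removes the longest prefix of the parameter’s value that matches the pattern word. Here are some examples: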
    $ x=foo.bar.baz
    $ echo "${x##*.}"
    baz
    
    $ y=linux-6.4/mm/mempool.c
    $ echo "${y##*.}"
    c
    
    You can use this inside your while loop to print the extension of the file.
  3. The sort command sorts lines of input. For example, given a file named example that contains
    foo
    bar
    foo
    foo
    cat
    cat
    foo
    
    if we run $ sort example, we get
    bar
    cat
    cat
    foo
    foo
    foo
    foo
    
  4. The uniq command collapses runs of identical, consecutive lines of input into a single line of output. Because it only compares adjacent lines, sort and uniq are frequently used together as sort | uniq: sort places identical lines next to each other, and uniq then outputs each distinct line once. Running $ uniq example produces
    foo
    bar
    foo
    cat
    foo
    
    Running $ sort example | uniq gives
    bar
    cat
    foo
    
  5. The uniq command can take an option that prefixes each output line with the number of times it occurred. Here’s the output of $ sort example | uniq -c:
          1 bar
          2 cat
          4 foo
    
  6. sort can also sort in reverse order and perform a numeric sort (i.e., compare numbers by value, so that 9 comes before 10; a plain text sort puts 10 before 9 because the character 1 sorts before 9).
  7. The head command prints the first several lines of a file or of its standard input. Look up the option that prints the first 8 lines rather than the default 10.
  8. In the loop, print out the extension of the path in the file variable. Use a combination of sort, uniq, and head in the pipeline to produce the final result. You’ll want to use sort multiple times.
  9. You could write all of this on a single line. Don’t. Instead, end each line with \ to continue the command on the next line:
    while IFS= read -r file; do
      echo ...
    done < <(find linux-6.4 -name '*.*') \
      | ... \
      | ... \
      | ... \
      | ...
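The script below is a small sketch of two building blocks from hints 2 and 6, run on made-up strings rather than on the Linux tree; it deliberately does not assemble the full pipeline. The -n (numeric) and -r (reverse) flags are the standard POSIX sort options for the orderings hint 6 describes.

```shell
#!/usr/bin/env bash
# Two building blocks from the hints, on toy data (not Linux output).

# 1. ${parameter##word} deletes the longest prefix matching word;
#    ${parameter#word} deletes the shortest.
x=foo.bar.baz
echo "${x##*.}"   # longest prefix "foo.bar." removed
echo "${x#*.}"    # shortest prefix "foo." removed

# A file name with no dot inside a directory whose name has one:
# the expansion strips up to the dot in "linux-6.4", not an extension.
# find's -name test checks only the last path component, so
# -name '*.*' never passes a path like this to the loop.
z=linux-6.4/Makefile
echo "${z##*.}"

# 2. Plain sort compares text; -n compares numerically; -r reverses.
printf '%s\n' 9 10 100 | sort
printf '%s\n' 9 10 100 | sort -n
printf '%s\n' 9 10 100 | sort -rn
```

The expansions print baz, bar.baz, and 4/Makefile. The three sorts print 10 100 9, then 9 10 100, then 100 10 9, one number per line.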