Question

(COMMAND LINE) I have a file, abc.csv, and I am interested in a command-line solution (not something like C code) for the following:

In which zip code was the highest number of narcotic/drug offenses recorded?

ZIP_CODE and ABC_CODE are the relevant headings, but the file has many other headings and many records. In ABC_CODE, the codes 35A and 35B denote narcotic and drug offenses. So I want to find the zip code with the highest number of narcotic/drug offenses, which are marked by only two of the many possible ABC_CODE values.

What is a command-line solution for this in bash on Ubuntu?
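Because the file has many headings, a useful first step may be to confirm which column numbers ZIP_CODE and ABC_CODE occupy. The sketch below assumes the headings are on line 1, separated by plain commas and not wrapped in quotes; the function name find_columns is made up for illustration.

```shell
# Hypothetical helper: list the column numbers of the ZIP_CODE and ABC_CODE
# headings. Assumes a comma-separated header on line 1 with unquoted names.
find_columns() {
  # Take the header row, drop any trailing CR (Windows line endings),
  # put one heading per line, then let grep -n report the line number of
  # each exact match, which equals the column number.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | grep -n -x -e 'ZIP_CODE' -e 'ABC_CODE'
}
```

Running find_columns abc.csv prints lines of the form N:HEADING, where N is the column number to use in an awk command run with -F, (comma as the field separator).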

Explanation / Answer

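I am using the t.csv file below as test input; you can use your own. (Despite the .csv name, its columns are separated by spaces, which is what awk splits on by default.)

zipcode narcotics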
1 C
1 B
2 A
2 C
2 D

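My command goes like this:

cat t.csv | awk 'NR > 1 {a[$1]++} END {for (n in a) print n, a[n]}' | sort -k2,2nr -k1,1 | head -1 | awk '{print $1}'

1) cat prints our test file (t.csv), and we pipe that output to

2) awk, which skips the header line (NR > 1), counts how many times each value in the first column (the zip code) appears, and prints each zip code with its count. We pipe that to

3) sort, which orders the lines by the second column (the count) so the zip code with the highest count comes first. -k2,2 selects that column as the key, n compares it numerically (so 10 ranks above 9) and r reverses the order to descending; -k1,1 orders tied counts by zip code.

4) head -1, which keeps only the first row, i.e. the zip code with the highest count, and pipes it to

5) awk '{print $1}', which cuts out the first field, the desired zip code value.

Note that this test command counts every record per zip code. For the real abc.csv, the awk stage also has to split fields on commas (-F,) and count only rows whose ABC_CODE is 35A or 35B.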
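For the real abc.csv, the same count, sort, head pipeline can be adapted roughly as follows. This is a sketch under stated assumptions: a plain comma-separated file with the headings on line 1 and no quoted fields containing commas (a CSV-aware tool would be needed otherwise). It looks up ZIP_CODE and ABC_CODE by heading name, so their column positions need not be known in advance. The function name top_narcotics_zip is hypothetical.

```shell
# Hypothetical helper: print the zip code with the most 35A/35B records.
# Assumes: comma-separated file, headings on line 1 (including ZIP_CODE and
# ABC_CODE), and no quoted fields that contain commas.
top_narcotics_zip() {
  awk -F',' '
    NR == 1 {
      # Find the ZIP_CODE and ABC_CODE columns by name in the header row.
      for (i = 1; i <= NF; i++) {
        sub(/\r$/, "", $i)            # tolerate Windows (CRLF) line endings
        if ($i == "ZIP_CODE") z = i
        if ($i == "ABC_CODE") c = i
      }
      if (!z || !c) {
        print "ZIP_CODE or ABC_CODE heading not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    { sub(/\r$/, "") }                # strip CR from data rows as well
    $c == "35A" || $c == "35B" { count[$z]++ }   # narcotic/drug offenses only
    END {
      # Emit "count zip" pairs so sort can rank them numerically.
      for (k in count) print count[k], k
    }
  ' "$1" | sort -k1,1nr | head -n 1 | awk '{ print $2 }'
}
```

Usage: top_narcotics_zip abc.csv. If two zip codes tie for the highest count, only one of them is printed.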