
Learning to use awk, grep and sed


Question

There is a protein database file (4HKD.pdb) in Dr. Harrison’s main directory. These files consist of many individual records (lines), each of which starts with a keyword that identifies it. The files are somewhat complicated. Your task is to use Unix tools to simplify looking at these files. It is a good idea to write script files out as demonstrated in class, rather than trying to compose them on the command line; you can turn the scripts in as part of the answer. You should copy the file to your own area and write programs to solve the following problems.
1. Records other than "ATOM", "CONNECT", "HETATM", "TER" and "END" are considered header records, which describe the metadata about the molecule. Use grep to generate the header. Please give the grep command(s) and the header you found.
2. The records that have "HETATM" and "MSE" should be "ATOM  " (the two spaces after ATOM are important) and "MET" respectively. (This reflects an experimental technique used to solve the structure, but results in a syntactical inconsistency that can cause problems.) Please use sed and/or awk to fix this. Please give the commands you used and show the corrected lines.
3. Use awk to find the maximum and minimum x,y,z values for the ATOMs.
ATOM 93 OG SER A 12 20.901 10.643 45.146 1.00 34.66 O
ATOM 94 N MET A 13 22.086 11.751 41.731 1.00 22.99 N
The 7th through 9th fields are the x,y,z positions.
4. Find the mean values for x,y,z for the HETATM records (same fields as ATOM records).
5. The standard name for a water molecule is HOH. Unfortunately it needs to be called WAT to be used by some (slightly braindead) computational chemistry program. Make the changes automatically with sed. What command did you use?
6. Produce a list of atoms sorted by their b-factor (11th position in an ATOM record). How did you do it?

Explanation / Answer

Ans 1- grep needs -E for the | alternation and -v to invert the match. This command prints the header records alone-

grep -Ev '^(ATOM|CONNECT|HETATM|TER|END)' 4HKD.pdb
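
A quick way to check the header filter is to run it on a tiny hand-made file. The sketch below is only an illustration: the sample records are made up, and because PDB files normally spell the connectivity keyword CONECT (the assignment writes CONNECT), the pattern here accepts both spellings.

```shell
# Hypothetical miniature file; these records are not taken from 4HKD.pdb.
tmp=$(mktemp -d)
cat > "$tmp/sample.pdb" <<'EOF'
HEADER    EXAMPLE ENTRY
TITLE     MADE-UP RECORDS FOR TESTING
ATOM 93 OG SER A 12 20.901 10.643 45.146 1.00 34.66 O
HETATM 95 O HOH A 201 1.000 2.000 3.000 1.00 30.00 O
CONECT 93 94
TER
END
EOF

# -E enables alternation, -v inverts the match, ^ anchors at the line start.
# CONN?ECT matches both CONECT and CONNECT.
header=$(grep -Ev '^(ATOM|CONN?ECT|HETATM|TER|END)' "$tmp/sample.pdb")
printf '%s\n' "$header"   # prints only the HEADER and TITLE lines
rm -r "$tmp"
```

Anchoring with ^ matters: without it, any header line that merely mentions ATOM or END somewhere in its text would be dropped as well.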

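Ans 2- Only the records that contain both HETATM and MSE should change, so both substitutions go into one sed expression that matches such lines. Note the two spaces after ATOM, which keep the columns aligned-

sed 's/^HETATM\(.*\)MSE/ATOM  \1MET/' 4HKD.pdb

To show only the corrected lines, suppress the default output with -n and print the lines where the substitution happened-

sed -n 's/^HETATM\(.*\)MSE/ATOM  \1MET/p' 4HKD.pdb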
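
To see why the substitution should be tied to lines that contain both HETATM and MSE, here is a small sketch on made-up records (the MSE and water lines are hypothetical). A plain global replacement of HETATM would also rewrite the water record, while a single capture-group expression leaves it alone.

```shell
tmp=$(mktemp -d)
cat > "$tmp/sample.pdb" <<'EOF'
ATOM 93 OG SER A 12 20.901 10.643 45.146 1.00 34.66 O
HETATM 94 N MSE A 13 22.086 11.751 41.731 1.00 22.99 N
HETATM 95 O HOH A 201 1.000 2.000 3.000 1.00 30.00 O
EOF

# \(.*\) captures everything between the keyword and the residue name,
# so a line changes only when HETATM and MSE are both present.
fixed=$(sed 's/^HETATM\(.*\)MSE/ATOM  \1MET/' "$tmp/sample.pdb")
printf '%s\n' "$fixed"
rm -r "$tmp"
```

The second line comes out as an ATOM/MET record, and the HOH line keeps its HETATM keyword.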

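Ans 3- The sample records in the question are whitespace separated, which is awk's default field splitting, so no -F option is needed. The x, y, z values are fields 7 to 9.

Awk code to get the min and max value of each coordinate for the ATOM records is-

awk '$1 == "ATOM" { n++; for (i = 7; i <= 9; i++) { v = $i + 0; if (n == 1 || v < min[i]) min[i] = v; if (n == 1 || v > max[i]) max[i] = v } } END { print "x", min[7], max[7]; print "y", min[8], max[8]; print "z", min[9], max[9] }' 4HKD.pdb

Each output line shows the axis, then its minimum, then its maximum.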
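
Field numbers work for the sample lines in the question, but PDB is a fixed-column format and neighbouring values can run together (for example, a coordinate of -100.500 fills its whole 8-character column). A more defensive sketch, assuming the standard column layout (x in columns 31-38, y in 39-46, z in 47-54), cuts the values out with substr. The three records below are made up to show the effect.

```shell
LC_ALL=C; export LC_ALL   # keep "." as the decimal point in printf
tmp=$(mktemp -d)

# Build made-up records in PDB column layout (not taken from 4HKD.pdb).
fmt='%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f\n'
{
  printf "$fmt" ATOM   93 OG SER A  12 20.901   10.643 45.146 1.00 34.66
  printf "$fmt" ATOM   94 N  MET A  13 22.086 -100.500 41.731 1.00 22.99
  printf "$fmt" HETATM 95 O  HOH A 201 99.000   99.000 99.000 1.00 30.00
} > "$tmp/sample.pdb"

extents=$(awk '
  substr($0, 1, 6) == "ATOM  " {
    x = substr($0, 31, 8) + 0
    y = substr($0, 39, 8) + 0
    z = substr($0, 47, 8) + 0
    # The first ATOM record initialises every minimum and maximum.
    if (n++ == 0) { minx = maxx = x; miny = maxy = y; minz = maxz = z }
    if (x < minx) minx = x
    if (x > maxx) maxx = x
    if (y < miny) miny = y
    if (y > maxy) maxy = y
    if (z < minz) minz = z
    if (z > maxz) maxz = z
  }
  END {
    if (n) printf "x %.3f %.3f\ny %.3f %.3f\nz %.3f %.3f\n", minx, maxx, miny, maxy, minz, maxz
  }' "$tmp/sample.pdb")
printf '%s\n' "$extents"
rm -r "$tmp"
```

In the second record the x and y values touch ("22.086-100.500"), so whitespace splitting would see them as one field, while the column-based version still reads them correctly.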

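Ans 4- Select the HETATM records, sum fields 7 to 9, and divide by the number of HETATM records (not by NR, which counts every line in the file)-

awk '$1 == "HETATM" { sx += $7; sy += $8; sz += $9; n++ } END { if (n) print sx/n, sy/n, sz/n }' 4HKD.pdb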
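
The averaging can also be wrapped in a small shell function so that the record type is a parameter. This is a sketch with a hypothetical function name (mean_xyz) and made-up sample records; it assumes whitespace-separated fields as in the question's sample lines.

```shell
# mean_xyz RECORD FILE
# Print the mean x, y, z of the records whose first field equals RECORD.
mean_xyz() {
  awk -v rec="$1" '
    $1 == rec { sx += $7; sy += $8; sz += $9; n++ }
    END {
      # Divide by the count of matching records, never by NR.
      if (n) {
        printf "%.3f %.3f %.3f\n", sx / n, sy / n, sz / n
      } else {
        print "no " rec " records found"
      }
    }' "$2"
}

tmp=$(mktemp -d)
cat > "$tmp/sample.pdb" <<'EOF'
HEADER    MADE-UP RECORDS FOR TESTING
ATOM 93 OG SER A 12 20.901 10.643 45.146 1.00 34.66 O
HETATM 95 O HOH A 201 1.000 2.000 3.000 1.00 30.00 O
HETATM 96 O HOH A 202 3.000 4.000 5.000 1.00 31.00 O
EOF

means=$(mean_xyz HETATM "$tmp/sample.pdb")
printf '%s\n' "$means"   # 2.000 3.000 4.000
rm -r "$tmp"
```

The sample file has four lines but only two HETATM records, so dividing the sums by NR would halve every mean.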

Ans 5- Replace HOH with WAT. Both names are three characters long, so the columns stay aligned-

sed 's/HOH/WAT/g' 4HKD.pdb
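
A global substitution rewrites HOH wherever it appears, for example in a FORMUL header record. If only the residue-name column should change, one possible sketch, assuming the standard layout with the residue name in columns 18-20, anchors the match after the first 17 characters. The two records below are made up for illustration.

```shell
LC_ALL=C; export LC_ALL   # keep "." as the decimal point in printf
tmp=$(mktemp -d)
{
  # Illustrative header line: HOH sits in columns 13-15 here.
  printf '%s\n' 'FORMUL   3  HOH   *2(H2 O)'
  # Illustrative water record: HOH sits in columns 18-20.
  printf '%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f\n' \
    HETATM 95 O HOH A 201 1.000 2.000 3.000 1.00 30.00
} > "$tmp/sample.pdb"

# \(.\{17\}\) skips columns 1-17, so HOH is replaced only in columns 18-20.
renamed=$(sed 's/^\(.\{17\}\)HOH/\1WAT/' "$tmp/sample.pdb")
printf '%s\n' "$renamed"
rm -r "$tmp"
```

Whether the header should keep HOH depends on what the downstream program reads; if it expects WAT everywhere, the global form is the one to use.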

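Ans 6- Select the ATOM records with grep (anchored with ^ so that HETATM lines are not matched), then sort numerically on the 11th whitespace-separated field-

grep '^ATOM' 4HKD.pdb | sort -k11,11n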
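
If the whitespace-separated fields cannot be trusted, a decorate-sort-undecorate pipeline is one alternative. The sketch assumes the standard layout with the B-factor in columns 61-66 and uses made-up records: awk prefixes each ATOM line with its B-factor, sort orders on that prefix, and cut removes it again.

```shell
LC_ALL=C; export LC_ALL   # keep "." as the decimal point in printf
tmp=$(mktemp -d)
tab=$(printf '\t')

# Made-up records in PDB column layout (not taken from 4HKD.pdb).
fmt='%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f\n'
{
  printf "$fmt" ATOM   93 OG SER A  12 20.901 10.643 45.146 1.00 34.66
  printf "$fmt" ATOM   94 N  MET A  13 22.086 11.751 41.731 1.00 22.99
  printf "$fmt" ATOM   95 CA MET A  13 23.000 12.000 40.000 1.00  5.10
  printf "$fmt" HETATM 96 O  HOH A 201  1.000  2.000  3.000 1.00  1.00
} > "$tmp/sample.pdb"

# Decorate with the B-factor, sort numerically on it, then strip the prefix.
sorted=$(awk 'substr($0, 1, 6) == "ATOM  " { print substr($0, 61, 6) "\t" $0 }' "$tmp/sample.pdb" |
  sort -t "$tab" -k1,1n |
  cut -f2-)
printf '%s\n' "$sorted"
rm -r "$tmp"
```

The output lists atom 95 first (B-factor 5.10), then 94, then 93, and the HETATM record is left out. Use -k1,1nr instead to list the highest B-factors first.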
