A DNA string (also called a DNA strand) is a sequence of the letters a, c, g, and t
Question
A DNA string (also called a DNA strand) is a sequence of the letters a, c, g, and t in any order. For example, aacgtttgtaaccagaactgt is a DNA string of length 21. Each sequence of three consecutive letters is a codon. For example, in the preceding string, the codons are aac, gtt, tgt, aac, cag, aac, and tgt. If we ignored the first letter and started listing them with the second a, the codons would be acg, ttt, gta, acc, aga, and act, with the final gt left over. For simplicity, we will assume that we always start reading the codons at the first letter of the string. A DNA string can be hundreds of thousands of codons long, even millions of codons long, which means that it is infeasible to count them by hand. It would be useful to have a simple script that could count the number of occurrences of a specific codon in such a string. For instance, for the example string above, such a script would tell us that aac occurs three times and tgt occurs twice.
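The two reading frames described above can be reproduced at the command line. The following is a minimal sketch, assuming the example string is held in a shell variable; the names dna and list_codons are hypothetical and chosen only for illustration.

```shell
# Hypothetical variable holding the example string from the text.
dna=aacgtttgtaaccagaactgt

# list_codons is an illustrative helper, not part of the assignment.
# It prints one codon per line, reading from the first letter of its
# argument, and drops a trailing fragment shorter than three letters.
list_codons() {
    printf '%s\n' "$1" | fold -w 3 | grep -E '^.{3}$'
}

list_codons "$dna"        # frame starting at the first letter
list_codons "${dna#?}"    # frame starting at the second letter
```

The expansion ${dna#?} strips the first character, which shifts the reading frame by one letter.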
Your job is to write a script named countcodons that expects two arguments on the command line. The first argument is a three-letter codon string such as aaa or cgt. The second is the path name of a file containing a valid DNA string with no newline characters or white space characters of any kind within it. It is just a sequence of the letters a, c, g, and t and nothing else. If it is given two valid arguments, the script will output a single number, which is the number of occurrences of the given codon in the given file. As with the first exercise, it should output absolutely nothing but that number. If it finds no occurrences, it should output 0. For example, if the above string is in a file named dnafile, then it should work like this:
$ countcodons ttt dnafile
1
$ countcodons aac dnafile
3
$ countcodons ccc dnafile
0
Warning: if given valid arguments, the script is not to output anything but a number. It will lose points if it does. The script should check that it has two arguments and exit with a usage message if it does not. It is not required to check that the file is in the proper form, or that the string is actually a codon. However, it should print an error message and exit if the second argument cannot be opened or is not a file containing only the four letters a, c, g, and t. You will not be able to solve this problem using the grep command alone. There are a number of commands that might be useful, such as sort, cut, fold, and uniq. One of these makes it very easy. Find the one.
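As an illustration of how the commands named in the hint fit together, the sketch below tallies every codon in a file at once. It is not the required script, only a way to see what each command contributes; tally_codons and the temporary file are hypothetical stand-ins.

```shell
# tally_codons: hypothetical helper that prints a count next to each
# distinct codon in the file named by $1, reading from the first letter.
tally_codons() {
    fold -w 3 "$1" |   # one codon per line
        sort |         # bring identical codons together
        uniq -c        # collapse each run into "count codon"
}

sample=$(mktemp)                                # stand-in for dnafile
printf '%s' aacgtttgtaaccagaactgt > "$sample"
tally_codons "$sample"    # reports aac three times, tgt twice, cag and gtt once
rm -f "$sample"
```

The exact column spacing of uniq -c differs between systems, so a script that must print a bare number should not rely on it.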
Explanation / Answer
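#!/bin/bash
# countcodons: count the occurrences of a codon in a DNA file.

# Exactly two arguments are required.
if [ $# -ne 2 ]; then
    echo "Usage: $0 <codon> <dnafile>" >&2
    exit 1
fi

# The second argument must be a readable regular file.
if [ ! -f "$2" ] || [ ! -r "$2" ]; then
    echo "$0: cannot open $2 for reading" >&2
    exit 2
fi

# Reject any character other than a, c, g, or t.
if grep -q '[^acgt]' "$2"; then
    echo "$0: $2 does not contain a valid DNA string" >&2
    exit 3
fi

# fold puts one codon on each line, reading from the first letter;
# grep -c -x counts the lines that match the codon exactly and prints 0
# when there are none. Counted this way, a codon that straddles two
# reading-frame positions (such as ttt in gtt tgt) is not counted.
fold -w 3 "$2" | grep -c -x -F -- "$1"
exit 0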
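A short check on the example string shows why the choice of counting method matters. Read strictly three letters at a time from the first letter, ttt never appears as a codon, whereas the sample session in the question shows 1 for ttt, which is the count of ttt as a substring anywhere in the string. The sketch below compares the two; count_in_frame and count_anywhere are hypothetical names, and grep -o is assumed to be available (it is in GNU and BSD grep but is not required by POSIX).

```shell
# Both helpers are illustrative; neither name comes from the assignment.

# Count codons equal to $1, reading file $2 three letters at a time.
# grep exits non-zero when the count is 0, which is not an error here.
count_in_frame() {
    fold -w 3 "$2" | grep -c -x -F -- "$1" || true
}

# Count non-overlapping occurrences of $1 anywhere in file $2,
# ignoring codon boundaries.
count_anywhere() {
    grep -o -F -- "$1" "$2" | grep -c '' || true
}

sample=$(mktemp)                                # stand-in for dnafile
printf '%s' aacgtttgtaaccagaactgt > "$sample"

count_in_frame aac "$sample"    # prints 3
count_in_frame ttt "$sample"    # prints 0: ttt straddles gtt and tgt
count_anywhere ttt "$sample"    # prints 1
rm -f "$sample"
```

Which behavior is wanted depends on whether the sample session or the written definition of a codon takes precedence; the two agree for aac and ccc and differ only for ttt.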