Question
3. [Bonus Problem] DNA Subsequence
A DNA sequence is a sequence of some combination of the characters A (adenine), C (cytosine), G (guanine), and T (thymine), which correspond to the four nucleobases that make up DNA.
Given a long DNA sequence, it is often necessary to compute the number of instances of a certain subsequence.
For this exercise, you will develop a program that reads a DNA sequence from a file and, given a subsequence s, searches the DNA sequence and counts the number of times s appears.
As an example, consider the following sequence:
GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC
If we were to search for the subsequence GTA, we would find that it appears twice.
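The count in the example can be checked with a short helper. The following is a minimal sketch, assuming that "appears" means a contiguous match and that overlapping matches are counted separately; the name countOccurrences is hypothetical and not part of the assignment.

```c
#include <string.h>

/* Hypothetical helper: counts contiguous, possibly overlapping,
   occurrences of pattern in text. */
int countOccurrences(const char *text, const char *pattern)
{
    int count = 0;
    const char *p;

    if (*pattern == '\0')
        return 0;               /* assumption: an empty pattern never matches */

    /* strstr returns a pointer to the next match, or NULL when none is left;
       restarting one character after each match allows overlaps */
    for (p = strstr(text, pattern); p != NULL; p = strstr(p + 1, pattern))
        count++;

    return count;
}
```

Called with the example sequence and "GTA", this helper returns 2, matching the count stated above.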
You will write a program (place your source in a file named dnaSearch.c) that takes, as command line inputs, an input file name and a valid DNA (sub)sequence. That is, it should be callable from the command line as follows:
./dnaSearch dna01.txt GTA
Sample output: GTA appears 2 times
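Because the second argument must be a valid DNA (sub)sequence, the program may want to check it before searching. The following is a minimal sketch of such a check; the name isValidDna is hypothetical, and rejecting lowercase letters and the empty string is an assumption rather than something the problem statement specifies.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s is non-empty and consists only of
   the characters A, C, G and T, otherwise 0. */
int isValidDna(const char *s)
{
    if (*s == '\0')
        return 0;               /* assumption: the empty string is not valid */

    /* strspn gives the length of the leading run made up only of "ACGT";
       the string is valid when that run reaches the terminating NUL */
    return s[strspn(s, "ACGT")] == '\0';
}
```

A program could call isValidDna(argv[2]) once it has confirmed that both arguments are present, and exit with an error message if the check returns 0.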
Explanation / Answer
#include<stdio.h>
#include<string.h>
//Function to count how many times target[] occurs in DNA[]
//as a contiguous run of characters (overlapping occurrences each count).
//The sample run "GTA appears 2 times" corresponds to contiguous matches,
//so a direct scan over every starting position is used.
int countSubSequence(char DNA[],char target[],int txtLen,int targetLen)
{
    int i,count=0;
    //an empty target, or one longer than the text, cannot match
    if(targetLen==0||targetLen>txtLen)
        return 0;
    //try every starting position where the target still fits
    for(i=0;i+targetLen<=txtLen;i++)
    {
        //memcmp returns 0 when the targetLen characters at DNA+i equal target
        if(memcmp(DNA+i,target,targetLen)==0)
            count++;
    }
    return count;
}
//main function with command line arguments
int main(int argc,char *argv[])
{
    //static so that the large buffer is not placed on the stack;
    //the size is an assumed upper limit, longer input is cut off
    static char DNA[1000000];
    int txtLen=0,targetLen,count,ch;
    char *target;
    FILE *f;
    //expect exactly two arguments: the input file name and the subsequence
    if(argc!=3)
    {
        fprintf(stderr,"usage: %s <file> <subsequence>\n",argv[0]);
        return 1;
    }
    //argv[2] is the requested subsequence; it is only read, so no copy is needed
    target=argv[2];
    targetLen=strlen(target);//length of the target
    f=fopen(argv[1],"r");//opening the text file named by argv[1]
    //if the file cannot be opened, report it and stop
    if(f==NULL)
    {
        fprintf(stderr,"cannot open file %s\n",argv[1]);
        return 1;
    }
    //store the nucleobase characters from the text file in DNA[],
    //skipping newlines and anything else that is not A, C, G or T
    while(txtLen<(int)sizeof(DNA)&&(ch=fgetc(f))!=EOF)
    {
        if(ch=='A'||ch=='C'||ch=='G'||ch=='T')
            DNA[txtLen++]=(char)ch;
    }
    fclose(f); //closing the file that we have opened
    //count the occurrences of target in DNA
    count=countSubSequence(DNA,target,txtLen,targetLen);
    //printing the count in the format of the sample output
    printf("%s appears %d times\n",target,count);
    return 0;
}
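The program above stores the sequence in a fixed-size array, which limits how long the input can be. For long DNA files, one alternative is a heap buffer that grows as characters are read. The following is a sketch under that assumption; the name readDna and the starting capacity of 1024 bytes are hypothetical choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads every A, C, G or T from f into a heap buffer
   that doubles in size when full. Returns a NUL-terminated string that the
   caller must free, or NULL if memory runs out; *len receives the number
   of bases stored. */
char *readDna(FILE *f, size_t *len)
{
    size_t cap = 1024, n = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;

    while ((ch = fgetc(f)) != EOF) {
        if (ch != 'A' && ch != 'C' && ch != 'G' && ch != 'T')
            continue;               /* skip newlines and other characters */

        if (n + 1 == cap) {         /* keep one byte free for the terminator */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[n++] = (char)ch;
    }

    buf[n] = '\0';
    *len = n;
    return buf;
}
```

The returned buffer and length can be passed to the counting function in place of the fixed array, and the buffer should be released with free once the count has been printed.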