Please, I need help with this assignment: Follow the steps that we have outlined
Question
Follow the steps that we have outlined in class for algorithm development to generate a program that reads in DNA sequences from a file and determines the content of A, T, C, and G in each sequence. Specifically, I am interested in the GC content (the percentage of the sequence that is G or C). The first line of the file will be an integer that tells you how many sequences there are in the file. Each following line will contain a single sequence. You will need to store the percentages of A, T, C, and G in a 2D array, because you need to know the average GC content of the genome to determine whether a bacterial gene is, or is not, pathogenic. If a bacterial gene has a higher GC content than the genome as a whole, then it is likely that the gene is pathogenic.
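The per-sequence percentages described above could be computed along the following lines. This is only a minimal sketch, not the method outlined in class: the helper name `computeContent` and the column order A, T, C, G are assumptions, and the sketch assumes uppercase input and skips any other character (such as a trailing newline).

```c
#include <stddef.h>

/* Hypothetical helper (not one of the assignment's required functions).
   Fills out[0..3] with the percentages of A, T, C, G in a '\0'-terminated
   sequence. Characters other than A, T, C, G are skipped and not counted. */
void computeContent(const char *seq, float out[4]) {
    int counts[4] = {0, 0, 0, 0};
    int total = 0;

    for (size_t i = 0; seq[i] != '\0'; i++) {
        switch (seq[i]) {
            case 'A': counts[0]++; total++; break;
            case 'T': counts[1]++; total++; break;
            case 'C': counts[2]++; total++; break;
            case 'G': counts[3]++; total++; break;
            default: break; /* newline or unexpected character */
        }
    }

    /* Avoid dividing by zero for an empty line */
    for (int k = 0; k < 4; k++) {
        out[k] = (total > 0) ? 100.0f * counts[k] / total : 0.0f;
    }
}
```

With this layout, the GC content of a sequence is `out[2] + out[3]`. One row of the 2D array can be filled by passing `content[i]` as the `out` argument.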
The Wikipedia page on GC content gives additional explanation: https://en.wikipedia.org/wiki/GC-content
Specifications:
Inputs:
- File called sequences.txt (contains a plasmid of Yersinia pestis)
http://www.filedropper.com/sequences
Outputs:
- File called content.txt containing A, T, C, G, and GC content of each sequence along with a pathogenicity prediction:
http://www.filedropper.com/content_2
EX:
%A %T %C %G %GC pathogenic?
10 20 40 30 70 Y
20 50 10 20 30 N
Functions:
1. void printToFile(int seq, float content[seq][4], float avgGC)
a. prints the results out to a file
b. You should open and close your file in this function
2. float averageGC(int seq, float content[seq][4])
a. calculates the average GC content for the whole genome
3. char isPathogenic(float avgGC, float seqGC)
a. returns Y if pathogenic, N if not
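The three functions listed above might be sketched as follows. The signatures come from the specification; everything else is an assumption made for illustration: the column order A, T, C, G, treating the genome average as the unweighted mean of the per-sequence GC values, and rounding to whole percentages to mirror the example output.

```c
#include <stdio.h>

/* Assumed column layout: content[i][0..3] = %A, %T, %C, %G of sequence i. */

/* Average GC content, taken here as the unweighted mean over all sequences
   (an assumption; the assignment does not say how to weight sequences). */
float averageGC(int seq, float content[seq][4]) {
    float sum = 0.0f;
    for (int i = 0; i < seq; i++) {
        sum += content[i][2] + content[i][3];
    }
    return (seq > 0) ? sum / seq : 0.0f;
}

/* Returns 'Y' if the sequence's GC content exceeds the genome average. */
char isPathogenic(float avgGC, float seqGC) {
    return (seqGC > avgGC) ? 'Y' : 'N';
}

/* Opens content.txt, writes one row per sequence, and closes the file. */
void printToFile(int seq, float content[seq][4], float avgGC) {
    FILE *out = fopen("content.txt", "w");
    if (out == NULL) {
        perror("content.txt");
        return;
    }
    fprintf(out, "%%A %%T %%C %%G %%GC pathogenic?\n");
    for (int i = 0; i < seq; i++) {
        float gc = content[i][2] + content[i][3];
        fprintf(out, "%.0f %.0f %.0f %.0f %.0f %c\n",
                content[i][0], content[i][1], content[i][2], content[i][3],
                gc, isPathogenic(avgGC, gc));
    }
    fclose(out);
}
```

Applied to the two example rows (10 20 40 30 and 20 50 10 20), the average GC content is 50, so the first sequence is marked Y and the second N, matching the sample output.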
Explanation / Answer
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
void input_sequence(int length,char input[]);
void search(char C[],char DNA[],int length);
int main(void) {
    //Given array; the trailing '\0' lets search() find its length with strlen()
    char DNA[] = {'A', 'G', 'C', 'G', 'G', 'G', 'A', 'C', 'C', 'G', 'T', 'C',
                  'C', 'C', 'G', 'A', 'C', 'A', 'T', 'T', 'G', 'A', 'T', 'G',
                  'A', 'A', 'G', 'G', 'G', 'T', 'C', 'A', 'T', 'A', 'G', 'A',
                  'C', 'C', 'C', 'A', 'A', 'T', 'A', 'C', 'G', 'C', 'C', 'A',
                  'C', 'C', 'A', 'C', 'C', 'C', 'C', 'A', 'A', 'G', 'T', 'T',
                  'T', 'T', 'C', 'C', 'T', 'G', 'T', 'G', 'T', 'C', 'T', 'T',
                  'C', 'C', 'A', 'T', 'T', 'G', 'A', 'G', 'T', 'A', 'G', 'A',
                  'T', 'T', 'G', 'A', 'C', 'A', 'C', 'T', 'C', 'C', 'C', 'A',
                  'G', 'A', 'T', 'G', '\0'};
    int length, i, k;
    /*Program should repeatedly ask the user for two things: the length of a search sequence,
    and the search sequence itself*/
    /*The program should terminate when the length of the input sequence is zero or less*/
    do {
        printf("Enter length of DNA sequence to match: ");
        //Treat unreadable input as a request to stop
        if (scanf("%d", &length) != 1)
            length = 0;
        //input sequence length has to be >0
        if (length > 0) {
            //Search sequence array (declared only once length is known to be positive)
            char input[length];
            input_sequence(length, input);
            /*The elements of the search sequence may take on one of five characters: A,G,T,C and *. The
            meaning of the '*' character is that it matches all four nucleotides: A,G,T and C.*/
            k = 0;
            for (i = 0; i < length; i++) {
                if (input[i] != 'A' && input[i] != 'G' && input[i] != 'T' && input[i] != 'C' && input[i] != '*') {
                    printf("Erroneous character input '%c' exiting\n", input[i]);
                    k = 1;
                    break;
                }
            }
            //Only search when every character was valid
            if (k == 0) {
                search(input, DNA, length);
            }
        }
    } while (length > 0);
    printf("Goodbye\n");
    return (EXIT_SUCCESS);
}
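//Function to search for input sequence in the given array.
//DNA must be '\0'-terminated; a '*' in C matches any nucleotide.
void search(char C[], char DNA[], int length) {
    int dnaLength = (int)strlen(DNA);
    int start, j, foundIndex = 0;
    bool found = false;
    //Try every starting position where the whole search sequence still fits
    for (start = 0; start + length <= dnaLength && !found; start++) {
        //Compare the search sequence against DNA[start .. start+length-1]
        for (j = 0; j < length; j++) {
            if (C[j] != '*' && C[j] != DNA[start + j])
                break;
        }
        if (j == length) {
            found = true;
            foundIndex = start;
        }
    }
    if (found)
        printf("Match of search sequence found at element %d\n", foundIndex);
}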
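//Function to read the search sequence, one character at a time
void input_sequence(int length, char input[]) {
    int i;
    printf("Enter %d characters (one of AGTC*) as a search sequence: ", length);
    for (i = 0; i < length; i++) {
        //The leading space in " %c" skips whitespace between characters
        scanf(" %c", &input[i]);
    }
}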