Can anyone solve this in C++? Example: Part 3- Find best matching genome for a g
Question
Can anyone solve this in C++?
Example:
Part 3: Find the best-matching genome for a given sequence

We have a random DNA sequence, and we want to find the closest species to it. Is the DNA sequence more similar to human, mouse, or unknown?

When could this kind of comparison be useful? Suppose that the emergency room of some hospital sees a sudden and drastic increase in patients presenting with a particular set of symptoms. Doctors determine the cause to be bacterial, but without knowing the specific species involved they are unable to treat patients effectively. One way of identifying the cause is to obtain a DNA sample and compare it against known bacterial genomes. With a set of similarity scores, doctors can then make more informed decisions regarding treatment, prevention, and tracking of the disease.

The goal of this part of the assignment is to write functions that can be used to determine the identity of different species of bacteria, animals, etc. By simply using the similarity score routine you implemented, you can compare an unknown sequence to different genomes and figure out the identity of the unknown sample.

float findBestMatch(string genome, string seq)

The findBestMatch function should take two string arguments and return a floating-point value: the highest similarity score found for the given sequence at any position within the genome. In other words, this function should traverse the entire genome and find the highest similarity score by using similarityScore() for the comparisons between seq and each sequential substring of genome. (Hint: this function is very similar in structure to the countMatches function.)

int findBestGenome(string genome1, string genome2, string genome3, string seq)

The findBestGenome function should take four string arguments (unknown sequence, mouse_genome, human_genome and unknown_genome). Return an integer indicating which genome string, out of the 3 given, had the highest similarity score with the given sequence. For each genome, the function will find the highest similarity score of the sequence (at any position) within that genome (call the findBestMatch function described above). The return value from this function will indicate which genome had the best match: 1, 2, or 3. In the case that two or more of the sequences have the same best similarity score, return 0.

Explanation / Answer
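// Since the question does not include the similarityScore() implementation, I am using the
// LCS (longest common subsequence) length as the match score between two strings.
#include <iostream>
#include <cstring>
#include <algorithm>
#include <vector>
using namespace std;

// Length of the LCS of X[0..m-1] and Y[0..n-1], computed bottom-up with two rolling rows.
// A plain recursive LCS takes exponential time and will not finish on ~340-character inputs.
double lcs(const char *X, const char *Y, int m, int n) {
    vector<int> prev(n + 1, 0), curr(n + 1, 0);
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            if (X[i - 1] == Y[j - 1])
                curr[j] = prev[j - 1] + 1;               // last characters match
            else
                curr[j] = max(prev[j], curr[j - 1]);     // drop one character from X or Y
        }
        swap(prev, curr);                                // prev now holds row i
    }
    return prev[n];
}

// LCS length normalized by the length n of the sequence Y, giving a score in [0, 1].
double findBestMatch(const char *X, const char *Y, int m, int n) {
    if (n == 0) {
        return 0.0;                                      // avoid dividing by zero
    }
    return lcs(X, Y, m, n) / (double)n;
}

// X, Y, Z are the three genomes and A is the sequence to identify.
// Returns 1, 2, or 3 for the genome with the strictly highest score, or 0 on a tie for best.
int findBestGenome(const char *X, const char *Y, const char *Z, const char *A) {
    int n = (int)strlen(A);
    double AX = findBestMatch(X, A, (int)strlen(X), n);
    double AY = findBestMatch(Y, A, (int)strlen(Y), n);
    double AZ = findBestMatch(Z, A, (int)strlen(Z), n);
    if (AX > AY && AX > AZ) {
        return 1;
    }
    if (AY > AX && AY > AZ) {
        return 2;
    }
    if (AZ > AX && AZ > AY) {
        return 3;
    }
    return 0;                                            // two or more genomes share the best score
}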
int main() {
char humanDNA[] = "CGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATG";
char mouseDNA[] = "CGCAATTTTTACTTAATTCTTTTTCTTTTAATTCATATATTTTTAATATGTTTACTATTAATGGTTATCATTCACCATTTAACTATTTGTTATTTTGACGTCATTTTTTTCTATTTCCTCTTTTTTCAATTCATGTTTATTTTCTGTATTTTTGTTAAGTTTTCACAAGTCTAATATAATTGTCCTTTGAGAGGTTATTTGGTCTATATTTTTTTTTCTTCATCTGTATTTTTATGATTTCATTTAATTGATTTTCATTGACAGGGTTCTGCTGTGTTCTGGATTGTATTTTTCTTGTGGAGAGGAACTATTTCTTGAGTGGGATGTACCTTTGTTCTTG";
char unknownDNA[] = "CGCATTTTTGCCGGTTTTCCTTTGCTGTTTATTCATTTATTTTAAACGATATTTATATCATCGGGTTTCATTCACTATTTTTCTTTTCGATAAATTTTTGTCAGCATTTTCTTTTACCTCTTCTTTCTGTTTATGTTAATTTTCTGTTTCTTAACCCAGTCTTCTCGATTCTTATCTACCGGACCTATTATAGGTCACAGGGTCTTGATGCTTTGGTTTTCATCTGCAAGAGTCTGACTTCCTGCTAATGCTGTTCTGTGTCAGGGTGCATCTGAGCACTGATGTGGAGTTTTCTTGTGGATATGAGCCATTCATAGTGTGGGATGTGCCATAGTTCATG";
char sampleDN[] = "CGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATG";
cout << endl << "Best-matching genome (1, 2, or 3; 0 = tie) = " << findBestGenome(humanDNA, mouseDNA, unknownDNA, sampleDN) << endl;
return 0;
}
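For comparison, here is a minimal sketch that follows the assignment's description more literally: findBestMatch slides seq across genome and keeps the highest similarityScore() over every same-length substring. The question does not show similarityScore(), so the version below is an assumption: it takes the score to be the fraction of positions at which two equal-length strings agree. If your similarityScore() is defined differently, swap it in and keep the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASSUMPTION: similarityScore() is not shown in the question. Here it is taken
// to be the fraction of positions at which two equal-length strings agree,
// i.e. (length - Hamming distance) / length.
float similarityScore(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    if (a.empty()) return 0.0f;
    std::size_t matches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i]) ++matches;
    }
    return static_cast<float>(matches) / static_cast<float>(a.size());
}

// Slide seq across genome and keep the best score over all alignments.
// Returns 0 when seq is empty or longer than genome (a choice made for this sketch).
float findBestMatch(const std::string& genome, const std::string& seq) {
    if (seq.empty() || seq.size() > genome.size()) return 0.0f;
    float best = 0.0f;
    for (std::size_t pos = 0; pos + seq.size() <= genome.size(); ++pos) {
        float score = similarityScore(genome.substr(pos, seq.size()), seq);
        if (score > best) best = score;
    }
    return best;
}

// Returns 1, 2, or 3 for the genome with the strictly highest best-match score,
// or 0 when two or more genomes share the highest score.
int findBestGenome(const std::string& genome1, const std::string& genome2,
                   const std::string& genome3, const std::string& seq) {
    float s1 = findBestMatch(genome1, seq);
    float s2 = findBestMatch(genome2, seq);
    float s3 = findBestMatch(genome3, seq);
    if (s1 > s2 && s1 > s3) return 1;
    if (s2 > s1 && s2 > s3) return 2;
    if (s3 > s1 && s3 > s2) return 3;
    return 0;
}
```

Checking for a strict winner first and falling through to 0 covers every tie case without tracking a separate maximum. Each call to findBestMatch costs roughly genome length times sequence length character comparisons, which is fine for genomes of a few hundred bases.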