Question
You are given the five 14-nucleotide reads below. Map them to the sequence of the gene responsible for the ABO blood type (https://www.ncbi.nlm.nih.gov/nuccore/LC068776.1), keeping in mind that each read might include a single-nucleotide error. Report their respective starting positions along the gene (answers should be integers between 1 and 177).
1) ccggcctcgggaag
2) ttgcggacgctagc
3) tcgggctccccccg
4) ggggggaaggcgga
5) tctgtccccccccg
Explanation / Answer
val s1 = "ccggcctcgggaag"
val s2 = "ttgcggacgctagc"
val s3 = "tcgggctccccccg"
val s4 = "ggggggaaggcgga"
val s5 = "tctgtccccccccg"
val g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"

// Compares two strings position by position and counts how many characters differ
// (a mismatch count, despite the name):
def similarity(source: String, dest: String): Int =
  source.zip(dest).foldLeft(0) { case (sum, (x, y)) => if (x == y) sum else sum + 1 }

// Returns the closest match as (position, mismatches), where position is 0-based:
def pos(read: String, gene: String): (Int, Int) =
  gene.sliding(read.size, 1).zipWithIndex.map { case (snip, pos) => pos -> similarity(snip, read) }.minBy(_._2)

// Find the position:
scala> pos(s1, g)
res25: (Int, Int) = (35,1)

// Verify:
scala> g.substring(35, 35 + s1.size)
res32: String = tcggcctcgggaag

scala> s1
res33: String = ccggcctcgggaag

The two strings differ only in their first character, so read 1 maps with one error at 0-based index 35, i.e. starting position 36 in the question's 1-based numbering.
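The session above maps only read 1 and reports a 0-based index. The following is a minimal sketch that applies the same sliding-window mismatch count to all five reads, converts each hit to the 1-based position the question asks for, and flags a read whose best window has more than one error or ties with another window (the original `pos` uses `minBy`, which silently returns the first minimum). The names `Hit`, `hamming`, `mapRead` and `markers` are hypothetical helpers, not part of the original answer. The sketch also assumes that the gene string transcribed in the answer matches the NCBI record, which is not checked here, and it considers substitutions only (no insertions or deletions).

```scala
// Hypothetical helpers (Hit, hamming, mapRead, markers): a sketch, not the original answer's code.

/** Best placement of a read: 1-based start, mismatch count, and number of other windows that tie. */
final case class Hit(position: Int, mismatches: Int, ties: Int)

// Hamming distance: number of positions where two equal-length strings differ.
def hamming(a: String, b: String): Int =
  a.zip(b).count { case (x, y) => x != y }

// Scores every window of the gene against the read and keeps the lowest score.
// zipWithIndex gives 0-based offsets, so 1 is added for the question's numbering.
def mapRead(read: String, gene: String): Hit = {
  val scores = gene
    .sliding(read.length)
    .zipWithIndex
    .map { case (window, i) => (i, hamming(window, read)) }
    .toVector
  val best = scores.map(_._2).min
  val bestStarts = scores.collect { case (i, d) if d == best => i }
  Hit(bestStarts.head + 1, best, bestStarts.size - 1)
}

// Marker line with '^' under every column where the read differs from the gene window.
def markers(window: String, read: String): String =
  window.zip(read).map { case (x, y) => if (x == y) ' ' else '^' }.mkString

// Gene sequence as transcribed in the answer above (assumed to match LC068776.1).
val gene = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"

val reads = List(
  "ccggcctcgggaag",
  "ttgcggacgctagc",
  "tcgggctccccccg",
  "ggggggaaggcgga",
  "tctgtccccccccg"
)

val hits = reads.map(read => mapRead(read, gene))

for ((read, hit) <- reads.zip(hits)) {
  // Convert back to a 0-based offset to cut the matching window out of the gene.
  val window = gene.substring(hit.position - 1, hit.position - 1 + read.length)
  val warning =
    if (hit.mismatches > 1 || hit.ties > 0) "  <- check: more than one error or an ambiguous hit"
    else ""
  println(s"$read -> position ${hit.position}, ${hit.mismatches} mismatch(es)$warning")
  println(s"  gene: $window")
  println(s"  read: $read")
  println(s"        ${markers(window, read)}")
}
```

Printing the gene window, the read and a marker line for each hit repeats the answer's manual "verify" step for every read, so a reader can see at a glance that each placement involves at most the single error the question allows.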