USE RUBY PROGRAMMING The most frequent k-mer problem seeks the set of substrings
Question
USE RUBY PROGRAMMING
The most frequent k-mer problem seeks the set of substrings of length k (where integer k is an input) that occur most frequently. We add to the DNA class a most_frequent_kmers method that gets called with integer k and returns an array whose first element is the set of k-mers that occur most frequently in this DNA and whose second element is the number of times each one appears.
>> dna1 = DNA.new('ATTGATTCCG')
=> ATTGATTCCG
>> dna1.most_frequent_kmers(1)
=> [#<Set: {"T"}>, 4]
>> dna1.most_frequent_kmers(2)
=> [#<Set: {"AT", "TT"}>, 2]
>> dna1.most_frequent_kmers(3)
=> [#<Set: {"ATT"}>, 2]
>> dna1.most_frequent_kmers(4)
=> [#<Set: {"ATTG", "TTGA", "TGAT", "GATT", "ATTC", "TTCC", "TCCG"}>, 1]
Explanation / Answer
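The method below takes its input through an options hash instead of living on a DNA class. It counts every substring of each length from min_chunk_size to max_chunk_size and returns the substrings with the highest count. To get the most frequent k-mers, pass k as both min_chunk_size and max_chunk_size.

def most_frequent_kmers(opt = {})
  str = opt[:str]
  min_chunk_size = opt[:min_chunk_size] || 1
  max_chunk_size = opt[:max_chunk_size] || str.length - 1
  min_occurences = opt[:min_occurences] || 1 # accepted but not used below
  results = {}
  top_scoring = {}

  # Count every substring of each length from min_chunk_size to max_chunk_size.
  (min_chunk_size..max_chunk_size).each do |cs|
    chunk_size = cs
    results[cs] = {}
    (0..str.length - chunk_size).each do |n|
      bottom = n
      top = bottom + chunk_size - 1
      sub_string = str[bottom..top]
      results[cs][sub_string] ||= 0
      results[cs][sub_string] += 1
    end
  end

  # Group the substrings by how often they occur.
  results.each do |cs, cs_results_hash|
    cs_results_hash.each do |sub_string, occurrences|
      top_scoring[occurrences] ||= []
      top_scoring[occurrences] << sub_string # reconstructed: this part of the source was garbled
    end
  end

  # Reconstructed: the original return expression is incomplete in the source,
  # so the hash keys below are an assumption.
  top_score = top_scoring.keys.max
  { top_score: top_score, results: top_scoring[top_score].select { |x| x.length >= min_chunk_size } }
end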
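For comparison with the interface the question asks for, here is a minimal sketch of a DNA class whose most_frequent_kmers method takes k and returns a two-element array: the set of most frequent k-mers and their count. The class body, the @sequence instance variable, and the choice to return an empty set with a count of 0 when k is longer than the sequence are assumptions made for illustration; the question does not show the rest of the DNA class. It assumes k is at least 1.

```ruby
require 'set'

# Hypothetical minimal DNA class: only what this example needs.
class DNA
  def initialize(sequence)
    @sequence = sequence
  end

  # Print the raw sequence, as in the question's irb session.
  def to_s
    @sequence
  end
  alias inspect to_s

  # Returns [set of the most frequent k-mers, number of occurrences of each].
  def most_frequent_kmers(k)
    counts = Hash.new(0)
    # each_cons(k) yields every window of k consecutive characters.
    @sequence.each_char.each_cons(k) { |chars| counts[chars.join] += 1 }
    # Assumption: no k-mers exist when k exceeds the sequence length.
    return [Set.new, 0] if counts.empty?

    max = counts.values.max
    [counts.select { |_kmer, n| n == max }.keys.to_set, max]
  end
end
```

With dna1 = DNA.new('ATTGATTCCG'), calling dna1.most_frequent_kmers(2) gives a set containing "AT" and "TT" together with the count 2, as in the question. A single pass with a Hash.new(0) counter keeps the method linear in the number of k-mers, and Set preserves insertion order, so the k-mers come out in the order they first appear in the sequence.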