Question

Translating Reading Frames

Given a sequence of DNA, it is necessary to examine all six
reading frames of the DNA to find the coding regions the cell uses
to make proteins.

Very often we won't know where, in the DNA we are studying, the
cell begins translating DNA into protein. Since we don't know where
the translation begins, we have to consider the six possible reading
frames when looking for coding regions.

Your task is to write a program to translate a DNA sequence,
given in a GenBank file format called sequence.gb, into all six
reading frames as output.

Read the file from the following URL:

http://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb

Warning! The file may be changed at any time. Your program
must read from the above URL.

Output should be to the screen in the form of "Reading Frame n"
where n is a number between 1 and 6 inclusive followed by the
translation of the given reading frame.

What follows is an example of output:

from urllib.request import urlopen

def dna2rna(seq):
    ''' The dna2rna function converts a sequence of DNA, given as a
        parameter and returns an RNA sequence.
    '''

    return ''

codon2aa = {'aaa': 'K', 'aac': 'N', 'aag': 'K', 'aau': 'N',
            'aca': 'T', 'acc': 'T', 'acg': 'T', 'acu': 'T',
            'aga': 'R', 'agc': 'S', 'agg': 'R', 'agu': 'S',
            'aua': 'I', 'auc': 'I', 'aug': 'M', 'auu': 'I',
            'caa': 'Q', 'cac': 'H', 'cag': 'Q', 'cau': 'H',
            'cca': 'P', 'ccc': 'P', 'ccg': 'P', 'ccu': 'P',
            'cga': 'R', 'cgc': 'R', 'cgg': 'R', 'cgu': 'R',
            'cua': 'L', 'cuc': 'L', 'cug': 'L', 'cuu': 'L',
            'gaa': 'E', 'gac': 'D', 'gag': 'E', 'gau': 'D',
            'gca': 'A', 'gcc': 'A', 'gcg': 'A', 'gcu': 'A',
            'gga': 'G', 'ggc': 'G', 'ggg': 'G', 'ggu': 'G',
            'gua': 'V', 'guc': 'V', 'gug': 'V', 'guu': 'V',
            'uaa': '_', 'uac': 'Y', 'uag': '_', 'uau': 'Y',
            'uca': 'S', 'ucc': 'S', 'ucg': 'S', 'ucu': 'S',
            'uga': '_', 'ugc': 'C', 'ugg': 'W', 'ugu': 'C',
            'uua': 'L', 'uuc': 'F', 'uug': 'L', 'uuu': 'F'}

if __name__ == '__main__':
    with urlopen('http://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb') as conn:
        data = conn.readlines()

    lines = [line.strip() for line in [datum.decode() for datum in data]]

    flag = False
    dna = ''
    for line in lines:
        ## if the flag is 'True', append the line to 'dna'.
        ## if the word "ORIGIN" is in the line, set 'flag' to 'True'
        pass

    ## gets rid of any non-dna character.
    dna = dna.translate(str.maketrans('acgt', 'acgt', '0123456789 /'))

    ## calls the dna2rna function
    rna = dna2rna(dna)

    ## process the first 3 reading frames
    for i in range(3):
        ## create a variable 'seq' and assign it the rna to process
        seq = ''
        amino = ''
        while len(seq) >= 3:
            ## use the codon2aa table to append an amino acid to 'amino'
            ## update 'seq' to the next codon
            pass
        print('--- Reading Frame %i ---' % (i+1), amino, sep=' ')

    ## compute the reverse complement of 'rna' and assign the result
    ## back into the 'rna' variable

    ## process the next 3 reading frames. hint: just like the first 3
    for i in range(3):
        ## same as the first 3
        print('--- Reading Frame %i ---' % (i+4), amino, sep=' ')

Explanation / Answer

Answer:

It would help to know where you're stuck, since I don't know how much detail you need. For example, do you know how to read input from a file? Do you know how to strip whitespace from a string?

Since each amino acid is encoded by a codon (three bases), there are six "reading frames" you could use to look for start codons: three going forward and three going backward. For instance, consider the following DNA fragment:

AATTGCTGUATCCTG

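There is an AUG start codon hidden inside it, which is only found if you start in the right reading frame and move forward one codon at a time. Here, the reading frames are:

AAT TGC TGU ATC CTG
ATT GCT GUA TCC TG
TTG CTG UAT CCT G
GTC CTA UGT CGT TAA
TCC TAU GTC GTT AA
CCT AUG TCG TTA A

Only in the last one is AUG a codon read in the right direction (left to right). Note that the three backward frames here are just the fragment reversed, which keeps the illustration short; the skeleton in the question asks for the reverse complement when processing frames 4 to 6.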
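If it helps to see that mechanically, here is a minimal sketch that splits the fragment into codons for each of the six frames listed above and flags the frame containing AUG. The helper name `codons` is made up for this illustration, and the backward frames use a plain reversal only because that is what the example above does.

```python
def codons(seq, offset):
    """Split seq into complete 3-base codons, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

fragment = "AATTGCTGUATCCTG"   # the fragment from the example above
backward = fragment[::-1]      # plain reversal, as in the example (not a reverse complement)

# Frames 1-3 read the fragment forward, frames 4-6 read the reversed string.
frames = [codons(fragment, k) for k in range(3)] + \
         [codons(backward, k) for k in range(3)]

for number, frame in enumerate(frames, start=1):
    marker = "  <- contains AUG" if "AUG" in frame else ""
    print("Frame", number, " ".join(frame) + marker)
```

Incomplete trailing bases (one or two left over at the end of a frame) are dropped here, since they cannot form a codon.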

So you need to read in the file, store all the lines after "ORIGIN", strip out the whitespace and numbers, combine everything into one string, then produce three shifted forms of the string going forward and three going backward.
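Those steps could be sketched roughly as follows. This is one possible approach, not the expected solution: the `SAMPLE` record is invented so the example runs without a network connection, the helper names other than `dna2rna` are hypothetical, the backward frames use the reverse complement as the skeleton's comments suggest, and the sequence is assumed to contain only a, c, g and t.

```python
# Made-up GenBank-style record (an assumption; the real input is sequence.gb).
SAMPLE = """\
LOCUS       EXAMPLE        24 bp    DNA     linear
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""

# Standard genetic code in u, c, a, g order; '_' marks a stop codon,
# matching the codon2aa table in the question.
BASES = "ucag"
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON2AA = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}

def extract_dna(text):
    """Join the lines after ORIGIN and drop digits, spaces and '/'."""
    collecting = False
    pieces = []
    for line in text.splitlines():
        if collecting:
            pieces.append(line)
        elif line.startswith("ORIGIN"):
            collecting = True
    return "".join(pieces).translate(str.maketrans("", "", "0123456789 /")).lower()

def dna2rna(dna):
    """Transcribe by replacing thymine (t) with uracil (u)."""
    return dna.replace("t", "u")

def reverse_complement(rna):
    """Swap each base for its pairing partner, then reverse the strand."""
    return rna.translate(str.maketrans("acgu", "ugca"))[::-1]

def translate(rna):
    """Translate complete codons from the start of rna; leftover bases are ignored."""
    return "".join(CODON2AA[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

def six_frames(rna):
    """Translations for offsets 0-2 forward, then offsets 0-2 on the reverse complement."""
    strands = (rna, reverse_complement(rna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]

rna = dna2rna(extract_dna(SAMPLE))
for number, protein in enumerate(six_frames(rna), start=1):
    print("--- Reading Frame %i ---" % number, protein)
```

For the invented sample, reading frame 1 translates to MAIVMGR_. To run this against the real file, the text passed to `extract_dna` would come from the URL in the question instead of `SAMPLE`.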

Again, I don't know what you know programming-wise, but that should be enough to get you going. Feel free to ask for clarification.
