Question
Translating Reading Frames
Given a sequence of DNA, it is necessary to examine all six
reading frames of the DNA to find the coding regions the cell uses
to make proteins.
Very often we won't know where, in the DNA we are studying, the
cell begins translating DNA into protein. Since we don't know where
the translation begins, we have to consider the six possible reading
frames when looking for coding regions.
Your task is to write a program to translate a DNA sequence,
given in a GenBank-format file called sequence.gb, into all six
reading frames as output.
Read the file from the following URL:
http://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb
Warning! The file may be changed at any time. Your program
must read from the above URL.
Output should be to the screen in the form of "Reading Frame n"
where n is a number between 1 and 6 inclusive followed by the
translation of the given reading frame.
The following skeleton is the starting point for the program:
from urllib.request import urlopen

def dna2rna(seq):
    ''' The dna2rna function converts a sequence of DNA, given as a
    parameter and returns an RNA sequence.
    '''
    return ''

codon2aa = {'aaa': 'K', 'aac': 'N', 'aag': 'K', 'aau': 'N',
            'aca': 'T', 'acc': 'T', 'acg': 'T', 'acu': 'T',
            'aga': 'R', 'agc': 'S', 'agg': 'R', 'agu': 'S',
            'aua': 'I', 'auc': 'I', 'aug': 'M', 'auu': 'I',
            'caa': 'Q', 'cac': 'H', 'cag': 'Q', 'cau': 'H',
            'cca': 'P', 'ccc': 'P', 'ccg': 'P', 'ccu': 'P',
            'cga': 'R', 'cgc': 'R', 'cgg': 'R', 'cgu': 'R',
            'cua': 'L', 'cuc': 'L', 'cug': 'L', 'cuu': 'L',
            'gaa': 'E', 'gac': 'D', 'gag': 'E', 'gau': 'D',
            'gca': 'A', 'gcc': 'A', 'gcg': 'A', 'gcu': 'A',
            'gga': 'G', 'ggc': 'G', 'ggg': 'G', 'ggu': 'G',
            'gua': 'V', 'guc': 'V', 'gug': 'V', 'guu': 'V',
            'uaa': '_', 'uac': 'Y', 'uag': '_', 'uau': 'Y',
            'uca': 'S', 'ucc': 'S', 'ucg': 'S', 'ucu': 'S',
            'uga': '_', 'ugc': 'C', 'ugg': 'W', 'ugu': 'C',
            'uua': 'L', 'uuc': 'F', 'uug': 'L', 'uuu': 'F'}

if __name__ == '__main__':
    with urlopen('http://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb') as conn:
        data = conn.readlines()
    lines = [line.strip() for line in [datum.decode() for datum in data]]
    flag = False
    dna = ''
    for line in lines:
        ## if the flag is 'True', append the line to 'dna'.
        ## if the word "ORIGIN" is in the line, set 'flag' to 'True'
        pass
    ## gets rid of any non-dna character.
    dna = dna.translate(str.maketrans('acgt', 'acgt', '0123456789 /'))
    ## calls the dna2rna function
    rna = dna2rna(dna)
    ## process the first 3 reading frames
    for i in range(3):
        ## create a variable 'seq' and assign it the rna to process
        seq = ''
        amino = ''
        while len(seq) >= 3:
            ## use the codon2aa table to append an amino acid to 'amino'
            ## update 'seq' to the next codon
            pass
        print('--- Reading Frame %i ---' % (i+1), amino, sep=' ')
    ## compute the reverse complement of 'rna' and assign the result
    ## back into the 'rna' variable
    ## process the next 3 reading frames. hint: just like the first 3
    for i in range(3):
        ## same as the first 3
        print('--- Reading Frame %i ---' % (i+4), amino, sep=' ')
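
The steps the skeleton leaves open (transcription, codon lookup, and the reverse complement) might be filled in along the following lines. This is only a sketch under some assumptions: the helper names `reverse_complement`, `translate`, and `six_frames` are not part of the skeleton, the codon table is generated compactly rather than typed out (it should hold the same 64 entries as `codon2aa`, with `_` for stop codons), and the toy sequence is made up rather than taken from sequence.gb.

```python
from itertools import product

# Assumption: rebuild the 64-entry table compactly. Codons are enumerated in
# u, c, a, g order (last base varying fastest) and paired with the standard
# genetic code, using '_' for stop codons as in the skeleton's table.
BASES = 'ucag'
AMINO = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon2aa = {''.join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def dna2rna(seq):
    '''Return the RNA version of a DNA string (lowercased, t replaced by u).'''
    return seq.lower().replace('t', 'u')


def reverse_complement(rna):
    '''Hypothetical helper: complement each base, then reverse the string.'''
    return rna.translate(str.maketrans('acgu', 'ugca'))[::-1]


def translate(rna, offset):
    '''Hypothetical helper: translate complete codons starting at offset.'''
    amino = ''
    for i in range(offset, len(rna) - 2, 3):
        amino += codon2aa[rna[i:i + 3]]
    return amino


def six_frames(dna):
    '''Return six translations: three forward, three on the reverse complement.'''
    rna = dna2rna(dna)
    frames = [translate(rna, i) for i in range(3)]
    rna = reverse_complement(rna)
    frames += [translate(rna, i) for i in range(3)]
    return frames


if __name__ == '__main__':
    # Toy sequence for illustration only.
    for n, amino in enumerate(six_frames('atggcctaa'), start=1):
        print('--- Reading Frame %i ---' % n, amino)
```

Stepping through the string by index, rather than repeatedly slicing `seq` as the skeleton's `while` loop suggests, is a design choice of this sketch; either approach visits the same codons.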
Explanation / Answer
Answer:
It would help if you could say where you're stuck, as I don't know what level of detail you need; for example, do you know how to read input from a file? Do you know how to strip whitespace from a string?
Since each amino acid is specified by a codon (three bases), there are six "reading frames" you could use to look for start codons: three going forward and three going backward. For instance, consider the following DNA fragment:
AATTGCTGUATCCTG
There is an AUG start codon hidden inside it, which is only found if you start in the right reading frame and move forward one codon at a time. Here, the reading frames are:
AATTGCTGUATCCTG
ATTGCTGUATCCTG
TTGCTGUATCCTG
GTCCTAUGTCGTTAA
TCCTAUGTCGTTAA
CCTAUGTCGTTAA
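Only in the last one is AUG a codon in the right orientation (left to right). Note that the backward frames here are just the string reversed, for illustration; the skeleton in the question works on the reverse complement instead.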
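
The six frames listed above can be reproduced with string slicing. Here is a minimal sketch, assuming the hypothetical helper names `codons` and `frames`; it keeps the plain string reversal used in the listing, so it illustrates the frame shifts only and is not the reverse-complement step the assignment asks for.

```python
def codons(seq):
    '''Split seq into complete three-letter codons, dropping leftover bases.'''
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def frames(seq):
    '''Return three forward shifts of seq, then three shifts of its reverse.'''
    backward = seq[::-1]  # plain reversal, as in the listing above
    return [seq[i:] for i in range(3)] + [backward[i:] for i in range(3)]


if __name__ == '__main__':
    fragment = 'AATTGCTGUATCCTG'
    for n, frame in enumerate(frames(fragment), start=1):
        # Flag the frame in which AUG lines up as a whole codon.
        note = '<- AUG in frame' if 'AUG' in codons(frame) else ''
        print(n, ' '.join(codons(frame)), note)
```

Printing each frame grouped into codons makes it easier to see that AUG only lines up on a codon boundary in the sixth frame.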
So you need to read in the file, store all the lines after "ORIGIN", strip out the whitespace/digits and combine them into one string, then produce three shifted versions of the string going forward and three going backward.
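
The read-and-clean step described above might look like the following. This is a rough sketch, assuming a hypothetical helper `extract_origin` and a made-up sample in GenBank layout in place of the real download; in the actual program the lines would come from `urlopen`, already decoded and stripped as in the skeleton.

```python
def extract_origin(lines):
    '''Join the lines after the one containing "ORIGIN", then delete
    digits, spaces and slashes so only the sequence letters remain.

    Assumes each line has already had its trailing newline removed.
    '''
    flag = False
    dna = ''
    for line in lines:
        if flag:
            dna += line
        if 'ORIGIN' in line:
            flag = True
    # Remove position numbers, spacing and the closing '//' marker.
    return dna.translate(str.maketrans('', '', '0123456789 /'))


if __name__ == '__main__':
    # Made-up sample, not the contents of sequence.gb.
    sample = [
        'LOCUS       TOY',
        'ORIGIN',
        '        1 atggcctaag',
        '       11 cttgaa',
        '//',
    ]
    print(extract_origin(sample))
```

Checking the flag before testing for "ORIGIN" means the ORIGIN line itself is never appended, which matches the order of the comments in the skeleton's loop.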
Again, I don't know what you know programming-wise, but that should be enough to get you started. Feel free to ask for clarification.