You have to read the questions from A-G but I have completed A-C already. I have
ID: 3797294 • Letter: Y
Question
You have to read the questions from A-G but I have completed A-C already. I have A-C complete. I just need D-G completed. Thank you.
Problem 4 (30 points): Suffix tree, suffix array and BWT
A). Show the suffix tree for string actgctcggct.
B). Compute the suffix array for the same string.
C). Compute the BWT transformation for the same string.
D). Assume that an implementation of suffix tree for DNA sequences uses 20 bytes per base. Estimate the memory requirement for a suffix tree representation of the human genome, which has 3 billion bases.
E). Estimate the memory requirement for a suffix array representation of the human genome.
F). Estimate the memory requirement for a BWT representation of the human genome.
G). Considering that we are indexing a large database of strings with 3 billions “characters”, where each “character” is an integer in the range of 1 to 216. Does this change the memory requirement of suffix array and BWT? What impact it might have on suffix trees?
Explanation / Answer
Answer:
D) Given:- Bytes per base for DNA sequence = 20, No. of bases = 3 billion = 3e+9
To calculate: Memory requirements for suffix tree of DNA sequence
Memory requirements = Bytes per base * No. of bases = 20 * 3e+9 = 60e+9 = 6e+10 bytes = 60 GB
E) A suffix array can be used by using approx. one word per base. Assuming one word is 4 bytes long. Then a suffix array for human genome will require = Bytes per base * No. of bases = 4 * 3e+9 = 12e+9 = 1.2e+10 bytes = 12 GB
F) A BWT transformation's memory requirements will be of the order of size of human genome. Hence, we can say that that it will require at least 3e+9 bytes or 3 GB of memory.
G) Indexing will not change the space requirements. It will just fasten the query response time.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.