
The genome of a newly discovered bacterial species (Bacillus sanfranciscus) was


Question

The genome of a newly discovered bacterial species (Bacillus sanfranciscus) was sequenced and found to have a circular genome of 4 × 10^6 base pairs (bp). Open reading frame (ORF) analysis indicated the presence of 3,190 ORFs that encode proteins with a median length of 270 amino acids (aa) and an average length of 360 aa.

A. What is the information content of this genome (i.e., how much information can be encoded in this length of DNA)? Since the genetic code is digital in nature, let’s convert the information content of base pairs into bytes and compare this value with the information content of a digital device many of us routinely use. The iPhone operating system, iOS 8, requires approximately 5 GB of information to perform its functions. To compare the information content of B. sanfranciscus and the iPhone OS, the following assumptions about the digital content of DNA-encoded information might be helpful. The double helix can potentially encode information in both strands, but this is not usually the case; most stretches of DNA encode information in only one strand, although for any given gene it can be either of the two strands. It is therefore reasonable to assume that each base pair of DNA encodes 2 bits of information (since there are 4 possible nucleotides). Keeping in mind that 1 byte = 8 bits and 1 GB = 10^9 bytes, the calculation is pretty straightforward from there. Express your answer as iPhone iOS units.

B. Now calculate the percentage of the bacterial genome that encodes the cell’s complete proteome. Assume that all of the predicted ORFs actually encode proteins, each gene is encoded by only one of the two strands of DNA, and there are no overlapping genes (i.e., no region of DNA encodes more than a single gene).

Explanation / Answer

A. From the information given above:

Total genome size = 4 × 10^6 base pairs = 4,000,000 bp

Information encoded by each base pair = 2 bits

Therefore:

* 4,000,000 × 2 = 8,000,000 bits of information can be encoded in the total genome

* Now convert 8,000,000 bits into bytes (since 1 byte = 8 bits):

8,000,000 / 8 = 1,000,000 bytes (1 MB) of information in the total genome

* Now convert 1,000,000 bytes into GB (since 1 GB = 10^9 bytes):

1,000,000 / 10^9 = 0.001 GB of information in the total genome of Bacillus sanfranciscus

* Finally, express this in iPhone iOS units (1 iOS unit = 5 GB):

0.001 / 5 = 0.0002 iOS units, i.e., the genome holds about 1/5,000 of the information required by iOS 8.
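
The unit conversion for Part A is easy to get wrong by a power of ten, so it can help to check it with a few lines of Python. The following is only a minimal sketch: the constant and function names are made up for illustration, and it relies on the question's own assumptions (2 bits per base pair, 1 GB = 10^9 bytes, iOS 8 is roughly 5 GB).

```python
# Illustrative check of the Part A conversion.
# All names here are hypothetical; the numbers come from the question.
GENOME_BP = 4 * 10**6     # genome size in base pairs
BITS_PER_BP = 2           # 4 possible nucleotides -> 2 bits per base pair
BITS_PER_BYTE = 8
BYTES_PER_GB = 10**9      # decimal gigabyte, as stated in the question
IOS_SIZE_GB = 5           # approximate size of iOS 8


def genome_information(genome_bp=GENOME_BP):
    """Return the genome's information content in several units."""
    bits = genome_bp * BITS_PER_BP
    n_bytes = bits / BITS_PER_BYTE
    gigabytes = n_bytes / BYTES_PER_GB
    ios_units = gigabytes / IOS_SIZE_GB
    return {
        "bits": bits,
        "bytes": n_bytes,
        "gigabytes": gigabytes,
        "ios_units": ios_units,
    }


info = genome_information()
for unit, value in info.items():
    print(f"{unit}: {value:g}")
```

Under these assumptions the script reports 8,000,000 bits, 1,000,000 bytes, 0.001 GB, and 0.0002 iOS units.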

B. An open reading frame (ORF) is the stretch of a gene that is translated into protein, from the start codon to the stop codon. The mean ORF length is the total cumulative length of all ORFs divided by the total number of ORFs, so multiplying the mean (360 aa, not the median of 270 aa) by the number of ORFs gives the total length of all ORFs. Because each amino acid is encoded by a codon of 3 bp, first convert the mean protein length to base pairs. Then divide the total ORF length by the total length of the genome and multiply by 100.

Mean ORF length = 360 aa × 3 bp per aa = 1,080 bp

Total length of all ORFs = 1,080 × 3,190 = 3,445,200 bp

The % of the bacterial genome that encodes the cell’s complete proteome = (total length of all ORFs / total bacterial genome length) × 100

= (3,445,200 / 4,000,000) × 100 = 0.861 × 100

= 86% (approximately)

Counting one stop codon per ORF adds 3 × 3,190 = 9,570 bp, which raises the result only slightly, to about 86.4%.
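
The Part B calculation can be sketched the same way. This is a hedged illustration, not a definitive method: the names are hypothetical, the mean protein length (360 aa) is used rather than the median, each amino acid is taken to be encoded by one 3-bp codon, and the `include_stop_codons` flag is an added assumption that shows how little the stop codons change the answer.

```python
# Illustrative check of the Part B coding fraction.
# All names here are hypothetical; the numbers come from the question.
GENOME_BP = 4 * 10**6     # genome size in base pairs
ORF_COUNT = 3190          # number of predicted ORFs
MEAN_PROTEIN_AA = 360     # mean (not median) protein length in amino acids
BP_PER_CODON = 3          # one codon = 3 bp = one amino acid


def coding_percentage(include_stop_codons=False):
    """Percentage of the genome covered by non-overlapping ORFs."""
    orf_bp = MEAN_PROTEIN_AA * BP_PER_CODON
    if include_stop_codons:
        orf_bp += BP_PER_CODON  # assumption: one stop codon per ORF
    total_orf_bp = orf_bp * ORF_COUNT
    return 100 * total_orf_bp / GENOME_BP


without_stops = coding_percentage()
with_stops = coding_percentage(include_stop_codons=True)
print(f"coding fraction without stop codons: {without_stops:.1f}%")
print(f"coding fraction with stop codons:    {with_stops:.1f}%")
```

With the question's numbers this prints 86.1% without stop codons and 86.4% with them.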
