
3. You are given the protein sequence for Yfg1 and asked to identify distinct do


Question

3. You are given the protein sequence for Yfg1 and asked to identify distinct domains within the protein. As a first step you use BLAST to search Yfg1 against the non-redundant database (nr) and the top hit is as follows:

Name               Score  Query  E-value  Ident  Accession
insulin precursor  320    100%   1e-95    100%   NP_001191615.1

Observing that your best hit was to a RefSeq entry, you re-run your search against the RefSeq database (all parameters exactly the same, only changing the database utilized) and obtain the following top hit:

Name               Score  Query  E-value  Ident  Accession
insulin precursor  320    100%   9e-111   100%   NP_001191615.1

(a) 3 pt How did you know the hit was to a RefSeq gene?
(b) 7 pt Given what you know about how E-values are calculated, why has the E-value changed between the two searches?

Explanation / Answer

a) RefSeq accession numbers have a different format from INSDC (GenBank/ENA/DDBJ) accession numbers. A RefSeq accession begins with two letters followed by an underscore; here the prefix NP_ indicates a protein record. INSDC accession numbers never include an underscore.
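The underscore rule can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `looks_like_refseq` and the pattern are assumptions made for illustration, the pattern covers only the common "two letters, underscore, digits, optional version" form, and `AAA59172.1` is used purely as an example of an INSDC-style accession.

```python
import re

# Simplified, assumed pattern: two uppercase letters, an underscore,
# digits, and an optional ".version" suffix (e.g. NP_001191615.1).
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")


def looks_like_refseq(accession: str) -> bool:
    """Return True if the accession has the common RefSeq prefix format."""
    return REFSEQ_PATTERN.match(accession) is not None


print(looks_like_refseq("NP_001191615.1"))  # accession from the question
print(looks_like_refseq("AAA59172.1"))      # INSDC-style: no underscore
```

A real pipeline would more likely rely on the database field reported by the search tool than on pattern matching alone.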

b) The score is identical in both searches (320), so the change comes from the size of the database searched. The Expect value (E) is the number of hits with a score at least this high that would be expected by chance when searching a database of a given size, so it depends on the size of the search space (query length times total database length), not only on the alignment itself. RefSeq is smaller than nr, so the same alignment receives a lower E-value there. E also decreases exponentially as the score (S) of the match increases. The lower the E-value, or the closer it is to zero, the more significant the match. An E-value near zero means that a hit this good is very unlikely to arise by chance, as expected here because the two sequences compared are identical. However, virtually identical short alignments can have relatively high E-values, because shorter sequences have a higher probability of occurring in the database purely by chance. The E-value is calculated by the formula:

$E = K m n e^{-\lambda S}$

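Where E is the Expect value, K is a scaling constant for the search space size, lambda (λ) is a scaling constant for the scoring system, m is the length of the query sequence, n is the total length of the database, and S is the raw alignment score. The product mn is the size of the search space.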
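To see why only the database changes the result, the formula can be evaluated for the same score against two database sizes. This is a rough sketch under stated assumptions: the values of K, lambda, the raw score, the query length, and both database lengths are illustrative placeholders, not the actual parameters or sizes behind the searches in the question.

```python
import math


def expect_value(score: float, m: int, n: int, k: float, lam: float) -> float:
    """Evaluate E = K * m * n * exp(-lambda * S) for a raw score S."""
    return k * m * n * math.exp(-lam * score)


# All numbers below are assumed for illustration only.
K = 0.041          # assumed scaling constant for the search space
LAM = 0.267        # assumed scaling constant for the scoring system
RAW_SCORE = 800    # hypothetical raw alignment score
QUERY_LEN = 110    # hypothetical query length (residues)

n_large = 50_000_000_000  # hypothetical total length of a larger database
n_small = 20_000_000_000  # hypothetical total length of a smaller database

e_large = expect_value(RAW_SCORE, QUERY_LEN, n_large, K, LAM)
e_small = expect_value(RAW_SCORE, QUERY_LEN, n_small, K, LAM)

# With the score and query fixed, E scales linearly with database length.
print(e_large, e_small, e_large / e_small)
```

Because K, lambda, m, and S are the same in both calls, the ratio of the two E-values equals the ratio of the two database lengths.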
