Question
I have a set of nucleotide sequences that I have aligned using Clustal Omega. In particular, I performed a full alignment and obtained a full distance matrix.
The distance matrix scores range between 0 and 1. I would like to use these scores to back-compute the number of differing positions in the alignment. Is this possible? If possible, I would like to avoid using code (my own or otherwise) to re-compute the number of positions differing between each pair of sequences, and instead compute it directly from the distance score.
Here is a toy example of what I am receiving from Clustal Omega:
Sequence  1     2     3     4
1         0     0.1   0.06  0.1
2               0     0.4   0.23
3                     0     0.05
4                           0
The numbers are the "distances" as calculated by Clustal Omega. According to the README file, they are computed by the k-tuple measure. I tried reading the original paper (published in 1983 in PNAS), but I could not figure out how k-tuple distances are computed, nor how the distance metric (as reported above) is derived from them.
I would like to convert those numbers into the number of positions that differ between each pair of sequences when the two are aligned. This includes substitutions, insertions, and deletions. I am currently doing this for 520 sets of virus sequences. Is this possible?
Explanation / Answer
It uses a Gonnet matrix to compare each pair of sequences. Since you could have gap openings (insertions) and gap extensions as well as substitutions, it becomes a three-parameter problem: for example, 1*open + 6*extension_penalty + substitution_penalty = X. Many different combinations of substitutions, extensions, and insertions can produce the same X, so the score alone does not tell you how many positions differ. So I think this will be really, really hard.
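To illustrate why a single score cannot be inverted into counts, here is a minimal sketch that enumerates every combination of gap opens, gap extensions, and substitutions adding up to the same total. The penalty values and the function name `combos_for_score` are hypothetical, chosen for illustration only; they are not Clustal Omega's actual parameters.

```python
from itertools import product

# Hypothetical penalties, for illustration only (not Clustal Omega's values).
GAP_OPEN = 6
GAP_EXTEND = 1
SUBSTITUTION = 2

def combos_for_score(total, max_events=20):
    """Return all (opens, extensions, substitutions) counts whose penalties sum to `total`."""
    hits = []
    for opens, exts, subs in product(range(max_events + 1), repeat=3):
        if opens * GAP_OPEN + exts * GAP_EXTEND + subs * SUBSTITUTION == total:
            hits.append((opens, exts, subs))
    return hits

solutions = combos_for_score(12)
print(len(solutions), "combinations give a total of 12, e.g.:", solutions[:5])
```

Even with these small made-up numbers, one total corresponds to several different event counts, so the number of differing positions is not determined by the score.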
That being said, Clustal outputs the multiple sequence alignment. Why don't you just compare sequence 1 and sequence 2 in that alignment and see what the insertions and substitutions are?
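Reading the differences off the alignment can be sketched roughly as follows. This is a minimal example that assumes the alignment is available in aligned FASTA format with gaps written as `-`; the function names `read_aligned_fasta` and `count_differences` and the toy sequences are hypothetical. Each gapped column is counted as one differing position, and columns where both sequences have a gap are ignored.

```python
from itertools import combinations

def read_aligned_fasta(lines):
    """Parse aligned FASTA lines into a {name: sequence} dict."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs

def count_differences(a, b):
    """Count columns where two aligned rows differ, as (substitutions, indels)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    subs = indels = 0
    for x, y in zip(a, b):
        if x == y:
            continue  # same base, or a gap in both rows
        if x == "-" or y == "-":
            indels += 1
        else:
            subs += 1
    return subs, indels

# Toy alignment (made up for illustration).
toy = """>seq1
ACGT-ACGTA
>seq2
ACGTTACGAA
>seq3
AC-TTACGTA
""".splitlines()

alignment = read_aligned_fasta(toy)
results = {}
for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
    results[(n1, n2)] = count_differences(s1, s2)
    print(n1, n2, "substitutions=%d indels=%d" % results[(n1, n2)])
```

For many alignment files, the same two functions could be called in a loop over the files; the total number of differing positions for a pair is simply the sum of the two counts.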