


Question

Standard models in population genetics look at the evolution of a few loci that affect fitness. The variance in fitness is determined by the genetic variance, the environmental variance, and the covariance between environment and genetics. In this question I am interested only in the genetic variance, and in what percentage of the total genetic variance in fitness 'n' loci explain.

The question is:

In general, in natural populations, what percentage of the total genetic variance is explained by the 'n' most important loci? Here, by "most important loci" I mean the loci whose variance explains the largest share of the total genetic variance.

In other words, the subquestions are of the kind:

Explanation / Answer

From a statistical point of view, this question is rather vague. One would need a mathematical definition of the term "genetic variance".

At one extreme, if "genetic variance" merely means the categorical variation of nucleotides (i.e. A, C, T, G) in the pooled genomes of interest, then the total "genetic variance" is spread uniformly across loci, and each locus's share depends only on its size.

At another extreme (among many dimensions of extremes), if "genetic variance" is manifested only through the organism's immediate "fitness", and fitness takes only two values, life or death (at birth), then all the "essential genes" are "the most important" loci. If you are interested in the n most important loci where n is greater than the number of essential genes, you would next look at the binary genetic interactions recorded in a database such as BioGRID, where two non-essential genes "interact" and change the organism's fitness (again in terms of life and death).

Of course, neither of these two extremes is very interesting for population genetics or evolution, but a statistical question is best phrased in statistical terms. I would try to find a mathematical definition of "fitness variance", too.

For a semi-empirical/informatics study, I think you could start with the simplest organism whose genome is well studied.

Choose an organism (e.g. yeast).
Assume uniform heritability.
Choose a specific measurable phenotype/environment (e.g. the ability to grow on a specific sugar x).
Scan each gene in the yeast genome and record its quantitative impact on growth (these impacts are documented in various databases).
Ignore genetic interactions (or scan each gene pair/triplet/.../n-cluster to see its impact on growth on x).
Try to model your empirical distribution. It is only valid for that specific phenotype/environment.
Define your "TOTAL genetic variance in fitness" meaningfully and rigorously. "Additivity" would be a very drastic assumption.
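
The steps above can be sketched in a few lines of Python. This is only an illustration under the "drastic" additivity assumption mentioned in the last step: each locus is treated as biallelic and diploid, in Hardy-Weinberg proportions, with no dominance, no interactions and no linkage disequilibrium, so that its variance contribution is 2p(1-p)a^2. The function names and the (p, a) values are hypothetical and are not taken from any database.

```python
def additive_locus_variance(p, a):
    """Variance contributed by one biallelic locus.

    ASSUMPTION: diploid, Hardy-Weinberg proportions, purely additive effect
    (genotypic values -a, 0, +a), so the contribution is 2 * p * (1 - p) * a**2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    return 2.0 * p * (1.0 - p) * a * a


def top_n_variance_share(locus_variances, n):
    """Fraction of the summed per-locus variance explained by the n largest loci.

    ASSUMPTION: the total genetic variance is the plain sum of the per-locus
    variances (additivity, no interactions between loci).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    ordered = sorted(locus_variances, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total variance is zero; the share is undefined")
    return sum(ordered[:n]) / total


# Made-up (allele frequency, effect size) pairs, for illustration only.
hypothetical_loci = [(0.5, 1.0), (0.1, 2.0), (0.5, 0.2), (0.3, 0.5)]
variances = [additive_locus_variance(p, a) for p, a in hypothetical_loci]

for n in range(1, len(variances) + 1):
    share = top_n_variance_share(variances, n)
    print(f"top {n} loci explain {100 * share:.1f}% of the summed variance")
```

With measured per-gene effects in place of the made-up pairs, the same loop gives the percentage explained by the n most important loci for that one phenotype and environment, and only to the extent that additivity holds.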

My guess as a non-geneticist is that, as GriffinEvo suggested, for each phenotype as a function of the environment, the distribution of per-locus contributions would follow a power law. Such distributions would not have the properties that would allow you to use the central limit theorem to "add them up". But for a specific phenotype, your empirical cumulative distribution function (cdf) would answer your question.
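
To make the contrast concrete, here is a small sketch of how the cumulative share of the top n loci could be read off once per-locus contributions are available. The power-law shape is only the guess stated above, so the contributions here are synthetic: rank k is given a weight proportional to k^(-s), with an arbitrarily chosen exponent s, and is compared against loci of equal weight. The helper names are hypothetical.

```python
from itertools import accumulate


def cumulative_share(contributions):
    """Cumulative fraction of the total held by the top 1, 2, ..., m contributions."""
    ordered = sorted(contributions, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total contribution is zero; shares are undefined")
    return [running / total for running in accumulate(ordered)]


def zipf_like(m, s):
    """Synthetic heavy-tailed contributions: rank k gets weight k ** -s.

    ASSUMPTION: a stand-in for the guessed power law, not empirical data.
    """
    return [rank ** -s for rank in range(1, m + 1)]


m = 1000  # hypothetical number of loci
heavy = cumulative_share(zipf_like(m, 1.5))  # exponent 1.5 chosen arbitrarily
flat = cumulative_share([1.0] * m)           # every locus contributes equally

for n in (1, 10, 100):
    print(f"n={n}: heavy-tailed {heavy[n - 1]:.3f}, equal weights {flat[n - 1]:.3f}")
```

Replacing `zipf_like(m, 1.5)` with measured per-locus contributions for one phenotype turns `cumulative_share` into the empirical curve the answer refers to; with equal weights the top n loci simply explain n/m of the total.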
