The following model is proposed for the distribution of family size in a large population
Question
The following model is proposed for the distribution of family size in a large population:

P(k children in family) =
P(0 children in family) =

Here alpha is an unknown parameter and 0 < alpha < 1/2. Fifty families were chosen at random from the population. The observed numbers of children are as follows:

No. of children    Frequency observed
0                  17
1                  22
2                  7
3                  3
4                  1
>4                 0

1. Find the maximum likelihood estimate of alpha.
2. Calculate the estimated expected frequencies.
3. A large study done 20 years ago indicated that alpha = 0.45. Is this a plausible value for the current data? (Use the relative likelihood function.)

Explanation / Answer
Output of analysis with HOMOG

For significance testing use the following standard critical values:
    HOMOGENEITY:    Z(theta) > 3
    HETEROGENEITY:  Z(alpha,theta) > 3.3

Under homogeneity   - Maximum Lod Score = 0.909022   Theta = 0.380
Under heterogeneity - Maximum lod score = 3.849152   Theta = 0.100
                      Proportion of Linked Families, Alpha = 0.350
Under Heterogeneity there is significant evidence of linkage.
3.3-lod-unit support interval for alpha and theta is as follows:
    Alpha: ( 0.01, 1.00)
    Theta: ( 0.00, 0.44)

This part of the file summarizes the essential linkage information obtained from analysis using the parametric model defined in your datain.dat for the trait locus. The analysis is done using MLINK (Lathrop et al, 1984), letting the recombination fraction vary from 0 to 0.5 in steps of 0.02. Here the maximum lod score occurs at recombination fraction 0.38, where it equals 0.91. Then, a heterogeneity analysis is performed using the HOMOG program (Ott, 1991), allowing alpha (the proportion of linked families in the dataset) to vary between 0 and 1 in steps of 0.01. In this case, the maximum lod score rises to 3.85 once we allow for the presence of some unlinked families in the dataset.

The top of this section explains that the conventional criteria have been used to interpret the significance of these lod scores. Under homogeneity, a lod score of 3 (1000:1 likelihood ratio) is taken as significant evidence of linkage (Morton, 1955; Bailey, 1961; Chotai, 1984). If this is not significant, it has been suggested (Ott, 1991) that a lod score of 3.3 (likelihood ratio of about 2000:1) be used as the critical value for the joint test, to compensate for the added free parameter, alpha. Note that this is by convention, and in reality the distribution of this latter statistic is very complicated (cf. Davies, 1977; Faraway et al, 1993).
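The correspondence between lod scores and likelihood ratios quoted above (3 for 1000:1, 3.3 for roughly 2000:1) follows from the lod score being a base-10 logarithm of the likelihood ratio. A minimal sketch of the conversion is given here; the function name is illustrative and not part of any of the programs mentioned.

```python
def lod_to_likelihood_ratio(lod: float) -> float:
    """Convert a base-10 lod score into the likelihood ratio (odds) it represents."""
    return 10.0 ** lod


# Thresholds and observed maxima quoted in the HOMOG output for this locus.
examples = [
    ("homogeneity threshold", 3.0),
    ("heterogeneity threshold", 3.3),
    ("observed maximum, homogeneity", 0.909022),
    ("observed maximum, heterogeneity", 3.849152),
]

for label, lod in examples:
    ratio = lod_to_likelihood_ratio(lod)
    print(f"{label}: lod {lod} is about {ratio:,.0f}:1")
```

Seen this way, the observed lod score of 0.91 under homogeneity corresponds to odds of less than 10:1, far below the 1000:1 criterion, while the heterogeneity maximum exceeds the 3.3 threshold.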
Because the critical limit for our test has been surpassed, I have included the support interval for our estimated values of alpha and theta. Here I used a 3.3-unit support interval to be consistent with the critical value of the test statistic (cf. Terwilliger and Ott, 1994).

Looking at the same section for the second marker locus (locus 3), we see the following:

Output of analysis with HOMOG

For significance testing use the following standard critical values:
    HOMOGENEITY:    Z(theta) > 3
    HETEROGENEITY:  Z(alpha,theta) > 3.3

Under homogeneity - Maximum Lod Score = 5.582551   Theta = 0.280
Under homogeneity there is significant evidence of linkage.
Test for heterogeneity GIVEN Linkage
Chi-square for heterogeneity = 33.510799   Theta = 0.080   Alpha = 0.390
Significant evidence of heterogeneity at p < 0.0001 level!

In this example, there was significant evidence of linkage under the assumption of homogeneity, the lod score being 5.58 at recombination fraction 0.28. There was therefore no need to look at the lod score test for linkage and homogeneity jointly, since linkage has already been demonstrated. Instead, a test of heterogeneity given linkage is performed with HOMOG (Smith, 1963; Ott, 1991). The statistic presented is 2ln(L(theta,alpha)/L(theta,alpha=1)), which is asymptotically distributed as a chi-square statistic with 1 degree of freedom. The test is further one-sided, since the null hypothesis, alpha = 1, is compared with the one-sided alternative alpha < 1. In this case, there was significant evidence of heterogeneity at the 0.0001 level (estimated theta = 0.08, and alpha = 0.39).

Transmission disequilibrium test (TDT)

********************************************************
*****                                              *****
*****  You are using TDTLIKE - Alpha Test Version  *****
*****  for computing TDT-like likelihood ratio     *****
*****  statistics based on an algorithm of         *****
*****  J. Terwilliger (AJHG 56:777-787 (1995))     *****
*****                                              *****
********************************************************

Locus 1
Alleles which appear at least 5 times shown.

                                          Multiple test corrected
ORIG #   CASE   CONTROL   TDT             One-Sided P-Value
1        209    129       18.9349117279   0.0000397356
2         46     83       10.6124029160   1.0000000000
3         76    129       13.7024393082   1.0000000000
4         91     93        0.0217391308   0.9880542409
5         87     75        0.8888888955   0.6593401134

Multiallelic Statistic - Based on Terwilliger (AJHG - March 1995)
Maximum Likelihood Estimate of TDT Lambda = 0.62000
-2ln(L) difference = 15.8929973831
P-Value = 0.0000338507

This is the summary of results of analysis with the TDT (Spielman et al, 1993). This test looks at all affected offspring of parents heterozygous for a given allele. It compares the frequency with which these parents transmit said allele to their affected children with the frequency with which they transmit the other allele to their affected children. In other words, if you have a sample of affected kids, each with a parent of genotype 1/?, and there are X affecteds who received the 1 allele from these heterozygous parents and Y affecteds who got the other allele (which may be different in each case), the TDT test statistic is of the form (X-Y)^2/(X+Y), which is asymptotically distributed as a chi-square statistic with one degree of freedom. Again, this test is one-sided, because we are only interested in cases where a specific allele is transmitted more frequently than expected to the affected offspring (Ewens and Spielman, 1993).

Some thought might be given to what is actually being tested with the TDT. In the absence of linkage, it is random which allele segregates to any given affected individual from a heterozygous 1/? parent, so under the hypothesis of no linkage X and Y have equal expectations, and the test is valid, since all meiotic events are independent in this case - even within a pedigree.
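As a check on the form of the statistic, the sketch below applies (X-Y)^2/(X+Y) to the counts in the TDTLIKE table. It assumes that the CASE column holds X (transmissions of the allele) and the CONTROL column holds Y (transmissions of the other allele); that reading is an assumption, although the values it produces agree with the TDT column to the printed precision. The function and variable names are illustrative only.

```python
# (allele, X = CASE count, Y = CONTROL count) copied from the TDTLIKE output.
# Reading CASE as "transmitted" and CONTROL as "not transmitted" is an assumption.
counts = [
    (1, 209, 129),
    (2, 46, 83),
    (3, 76, 129),
    (4, 91, 93),
    (5, 87, 75),
]


def tdt_statistic(x: int, y: int) -> float:
    """TDT statistic (X - Y)^2 / (X + Y) for one allele."""
    return (x - y) ** 2 / (x + y)


tdt_values = {allele: tdt_statistic(x, y) for allele, x, y in counts}

for allele, value in tdt_values.items():
    print(f"allele {allele}: TDT = {value:.6f}")
```

Note that the statistic itself does not carry the direction of the effect: alleles 2 and 3 have large values because they are transmitted less often than expected, which is why their one-sided p-values in the table are 1.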
If you have a sample of unrelated affected kids and their heterozygous parents, then in the absence of allelic association it is random which allele is "in phase" with the disease allele in these heterozygous parents. Under this hypothesis as well, all observations are independent (even if there were linkage), and it is a valid test of the hypothesis of no allelic association.

However, in extended pedigrees, if there is linkage, the different affected kids are not independent relative to the hypothesis of no allelic association, and the test is not a valid one for this null hypothesis. To see this, consider a set of sibpairs with 0 recombination between marker and disease and a fully penetrant recessive disease. All affected sibs would then have received the identical marker alleles from the heterozygous parents, so effectively you are counting the same parental alleles twice. The effect would be the same as if you had a standard case-control association test and decided to double the number of counts of each allele in the case and control samples. Clearly one would never conclude that this was a valid test procedure, and this is the same effect as you have here when there is linkage. To say that a positive TDT in multiplex families is evidence of association is therefore misleading: the only null hypothesis you can validly reject with this test is that of no linkage, especially in a small set of large pedigrees (Ott, 1989; Ewens and Spielman, 1995; Terwilliger, 1996b).

In the statistical analysis outlined above, I have not used the chi-square approximation to compute the p-values. Since sample sizes are sometimes a bit too small for the chi-square approximation to hold, I have instead computed exact p-values from a binomial distribution, looking at the probability of having the allele under study transmitted X or more times to the affected offspring of heterozygous parents out of (X+Y) opportunities.
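The exact binomial tail described above can be sketched as follows. Under the null hypothesis each of the X+Y transmissions goes either way with probability 1/2, so the p-value is the probability of X or more successes in X+Y fair trials. This is only the raw, uncorrected tail probability; the multiple-test correction applied by the program is not reproduced, and the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(x: int, n: int) -> float:
    """P(at least x transmissions out of n) when each has probability 1/2."""
    # Sum the binomial coefficients exactly as integers, then divide once.
    favourable = sum(comb(n, k) for k in range(x, n + 1))
    return favourable / 2 ** n


# Allele 1 in the TDT table: X = 209 transmissions out of X + Y = 338.
raw_p_allele_1 = binomial_upper_tail(209, 209 + 129)
# Allele 4: X = 91 out of 184, i.e. slightly fewer transmissions than expected.
raw_p_allele_4 = binomial_upper_tail(91, 91 + 93)

print(f"allele 1 raw one-sided p-value: {raw_p_allele_1:.3g}")
print(f"allele 4 raw one-sided p-value: {raw_p_allele_4:.3g}")
```

Working with exact integer sums avoids any reliance on a normal or chi-square approximation, which is the point of the exact test when counts are small.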
Then, I have corrected for the fact that multiple tests have been performed (if there are 5 alleles at a locus, there are approximately 4 "independent" tests). The p-value presented is therefore the probability of such an extreme result occurring for one of the n alleles at a given locus, rather than assuming you had only tested the one single allele. Only alleles which have had at least 5 opportunities to be transmitted are included, to reduce, in an unbiased way, the number of tests one must then correct for. This threshold can be made higher or lower by altering a program constant, min, in the file tdtlikena.p and then recompiling the program.

At the bottom, I have also performed a likelihood ratio based TDT test (Terwilliger, 1995; 1996b) which considers all the alleles jointly and allows that one of them would be transmitted preferentially to affected offspring, and not the others. The test is parametrized such that the probability of transmitting said allele is equal to lambda, which is then estimated by maximum likelihood, where lambda is constrained to be > 0.5, lambda = 0.5 being the null hypothesis in this nonparametric linkage test. 2ln(L(lambda)/L(lambda=0.5)) is assumed to be asymptotically one-sided chi-square with one df.

In this case, there is significant evidence of linkage, with allele 1 being preferentially transmitted to affected offspring of 1/? parents, with a p-value of 0.00004 in the standard multiple test corrected TDT. Using my likelihood ratio test, the p-value is 0.00003, though this test does not provide any determination of which allele is actually preferentially transmitted. Since this TDT is performed using all affected individuals in multiplex pedigrees, no conclusions about allelic association can be made, so this information is really somewhat irrelevant in this test.
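A one-sided chi-square p-value with one df, as assumed for the likelihood ratio statistics above, is half of the usual two-sided tail probability. The sketch below uses the identity P(chi-square with 1 df > s) = erfc(sqrt(s/2)); whether the program computes its p-value in exactly this way is an assumption, but the result for the printed -2ln(L) difference comes out close to the printed P-Value. The function name is illustrative.

```python
from math import erfc, sqrt


def one_sided_chi2_1df_pvalue(statistic: float) -> float:
    """One-sided p-value for a likelihood ratio statistic with one df.

    Under the null the statistic is 0 half of the time (the estimate sits on
    the boundary), so the tail of a 1-df chi-square is halved.
    """
    if statistic <= 0:
        return 1.0
    return 0.5 * erfc(sqrt(statistic / 2.0))


# -2ln(L) difference from the multiallelic TDT output.
p_tdt_lrt = one_sided_chi2_1df_pvalue(15.8929973831)
# Chi-square for heterogeneity given linkage from the HOMOG output.
p_heterogeneity = one_sided_chi2_1df_pvalue(33.510799)

print(f"likelihood ratio TDT: p = {p_tdt_lrt:.3g}")
print(f"heterogeneity given linkage: p = {p_heterogeneity:.3g}")
```

The same halving applies to the heterogeneity test reported by HOMOG, whose null hypothesis alpha = 1 also lies on the boundary of the parameter space.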
Haplotype relative risk (HRR and HHRR)

*************************************************************
*                                                           *
*         Program HRRLAMB - Version 2.1 (1/31/96)           *
*                                                           *
*                  AJHG 56:777-787 (1995)                   *
*                                                           *
*************************************************************

Disease allele frequency = 0.01000000

=========================================
CASE    |  43. |   1. |  13. |  12. |  11. |
CONTROL |  27. |  10. |  14. |  18. |  11. |
=========================================

Estimated parameters for likelihood ratio test:
Allele frequencies:
Allele   H0:          H1
1        0.43750000   0.34604554
2        0.06875000   0.08098217
3        0.16875000   0.19567004
4        0.18750000   0.21743885
5        0.13750000   0.15986340
Lambda   0.00000000   0.291021

LRT Chi-Square = 4.40863   p-value = 0.017888760783279   Lambda = 0.291021
NO SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY LRT TEST

2 x n table Chi-square = 12.25782   P-value = 0.015533597133660
NO SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY 2 x N TABLE CHI-SQUARE TEST

The next set of output is for the haplotype relative risk test for allelic association (Rubinstein et al, 1981; Falk and Rubinstein, 1987; Terwilliger and Ott, 1992). This test is closely related to the TDT, with two salient differences. First, only one affected individual is used per pedigree, so the observations are independent relative to the null hypothesis of no association. Second, all alleles transmitted from parents (heterozygous or homozygous) to the affected child are included in the case sample, and all remaining (i.e. not transmitted) alleles in the parents of the affected children are included in the control sample (cf. Falk and Rubinstein, 1987; Ott, 1989; Terwilliger and Ott, 1992).

In this case, the affected child selected in any given pedigree is chosen as the first affected individual who is himself typed at the marker locus and has both parents typed at the marker as well.
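The 2 x n table statistic in the HRRLAMB output can be checked directly from the CASE and CONTROL counts. The sketch below computes an ordinary Pearson chi-square on the 2 x 5 table and, since 5 alleles give 4 degrees of freedom, uses the closed-form upper tail exp(-x/2)(1 + x/2) that holds for 4 df. That the program uses a standard Pearson statistic is an assumption, although the values obtained agree with the printed chi-square and p-value. Function names are illustrative.

```python
from math import exp

# Allele counts copied from the HRRLAMB output (alleles 1 to 5).
case = [43, 1, 13, 12, 11]
control = [27, 10, 14, 18, 11]


def chi_square_2xn(row_a: list, row_b: list) -> float:
    """Pearson chi-square statistic for a 2 x n contingency table."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(row_a, row_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def chi2_upper_tail_4df(x: float) -> float:
    """P(chi-square with 4 df > x), using the closed form valid for 4 df."""
    return exp(-x / 2.0) * (1.0 + x / 2.0)


statistic = chi_square_2xn(case, control)
p_value = chi2_upper_tail_4df(statistic)
print(f"2 x n chi-square = {statistic:.5f}, p-value = {p_value:.6f}")
```

The H0 allele frequencies in the output are likewise the pooled column proportions, for example (43 + 27)/160 = 0.4375 for allele 1.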
If no affected child in a pedigree meets these criteria, then the program searches for the first child who is himself typed and has one parent typed at the marker locus. In this case there would be two alleles in the case sample (the two alleles of the affected child) and one allele in the control sample (the allele not transmitted by the genotyped parent). If no affected child meets this criterion in a pedigree, then the first affected child who is typed himself is taken as the case sample, and no matching control is allowed for. The effect of this sampling procedure is that the case sample will, in general, be larger than the control sample, but it avoids throwing away information about linkage disequilibrium from independent unrelated affecteds just because their parents are not available for testing and thus no controls are available.