


* Institute of Cell, Animal and Population Biology, University of Edinburgh, EH9 3JT, Scotland, U.K. and Roslin Institute, Midlothian EH25 9PS, Scotland, U.K.
2 Correspondence: West Mains Rd. (E-mail: albert.tenesa{at}ed.ac.uk).
| Abstract |
|---|
Key Words: Dairy Cattle, Linkage Disequilibrium, Mapping, Markers
| Introduction |
|---|
Although the extent and patterns of linkage disequilibrium (LD) have been extensively studied in human populations (Daly et al., 2001; Jeffreys et al., 2001), farm animal populations have rarely been studied.
Farnir et al. (2000) and McRae et al. (2002) studied the extent of LD in the Dutch black-and-white dairy cattle population and in two sheep populations, respectively. Both these studies used family information to infer the most likely phase of the dams. However, family information is not always available and, if available, collecting the additional family members required may be an inefficient use of resources.
In this study, we estimate the extent of LD in the U.K. dairy cattle population. This will determine the feasibility of LD mapping methods in this population and the marker density required for LD mapping to be effective. We illustrate the use of statistical methods that infer population haplotype frequencies without family information, as an alternative to family-based haplotyping methods. These methods are relatively efficient compared to those that require family information (Hill, 1974; McKeigue, 2001). We applied them to a small data set and assessed the extent of LD in two regions of the genome of 50 randomly selected dairy cattle bulls that were being progeny tested. These bulls were assumed to provide a representative sample of the future extent of LD in the U.K. dairy cattle population.
| Materials and Methods |
|---|
Bayesian estimates of six- and seven-loci haplotype frequencies for chromosomes 2 and 6, respectively, were obtained using PHASE (Stephens et al., 2001). No attempt was made to estimate LD among nonsyntenic loci (loci in different linkage groups) using the Bayesian approach. Haplotypes were reconstructed 10 independent times to ensure that the results obtained were robust even if the algorithm was not converging, as suggested by Stephens et al. (2001). We ran the algorithm for $10^7$ iterations after a burn-in period of $10^4$ and kept estimates from every 100th iteration. The program PHASE assumes, by default, a stepwise mutation model; however, this assumption was relaxed by using a parent-independent mutation model in which each microsatellite allele has the same chance to mutate to any of the other alleles. Although a stepwise mutation model is more appropriate for microsatellite markers if the length of each microsatellite allele is known, we did not know the actual length of the microsatellite alleles in these data; therefore, this model could not be assumed.
Departures from Hardy-Weinberg equilibrium (HWE) proportions were tested using an exact test as described by Guo and Thompson (1992). This algorithm is implemented in Arlequin (Genetics and Biometry Lab, University of Geneva). The Hardy-Weinberg Equilibrium is an assumption of the EM algorithm, and departures from HWE might lead to biased estimates of haplotype frequencies (Excoffier and Slatkin, 1995). In addition, departures from HWE can be an indication of population stratification, selection of the locus or linked locus, different fertility of parents or different allele frequencies in male and female parents, finite population size, and so on.
Level of Linkage Disequilibrium
Hedrick's normalized measure of disequilibrium (Hedrick, 1987) was obtained from the estimates of the two-loci haplotype frequencies. Hedrick's measure is the extension to multiallelic loci of the normalized measure of disequilibrium defined by Lewontin (1964). It is defined as follows:
![]() | [1] |
![]() | [2] |
![]() | [3] |
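For reference, Hedrick's multiallelic D' is usually written as shown below. The notation is an assumption chosen to match the indices used elsewhere in the text (k and l alleles at the two loci, allele pair m, n): $x_{mn}$ is the frequency of the haplotype carrying allele m at the first locus and allele n at the second, and $p_m$ and $q_n$ are the corresponding allele frequencies. The numbering and layout of equations [1] to [3] may differ from this sketch.

```latex
\begin{align}
D' &= \sum_{m=1}^{k} \sum_{n=1}^{l} p_m q_n \left| D'_{mn} \right|, \\
D'_{mn} &= \frac{D_{mn}}{D_{\max}}, \qquad D_{mn} = x_{mn} - p_m q_n, \\
D_{\max} &=
\begin{cases}
\min\left[ p_m q_n,\ (1 - p_m)(1 - q_n) \right] & \text{if } D_{mn} < 0, \\
\min\left[ p_m (1 - q_n),\ (1 - p_m) q_n \right] & \text{otherwise.}
\end{cases}
\end{align}
```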
To test the statistical significance of the allelic association, we compared the statistic $S = 2\ln(L_{LD}/L_{LE})$ to a $\chi^2$ distribution with $(k - 1) \times (l - 1)$ degrees of freedom, where k and l are the numbers of alleles at the two loci (Slatkin and Excoffier, 1996). Assuming random mating, $L_{LD}$ is the likelihood computed using the haplotype frequencies found by the EM algorithm, and $L_{LE}$ is the likelihood under the assumption of linkage equilibrium. We assumed that the available sample size was large enough for asymptotic assumptions to hold.
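As a rough illustration of the statistic S, the sketch below computes it for the simpler case in which haplotypes are observed directly (phase known), so that both likelihoods are multinomial and no EM step is needed. That simplification, and the names `lrt_statistic` and `counts`, are assumptions made for illustration; the analysis described here worked from unphased genotypes.

```python
import math

def lrt_statistic(counts):
    """Return (S, df) for a k x l table of phase-known haplotype counts.

    S = 2 * ln(L_LD / L_LE), where L_LD uses the observed haplotype
    frequencies and L_LE uses products of the marginal allele frequencies.
    """
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    s = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:  # empty cells contribute nothing to the log-likelihood
                expected = row_tot[i] * col_tot[j] / n
                s += 2.0 * c * math.log(c / expected)
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    return s, df

# Hypothetical counts: rows are alleles at locus 1, columns alleles at locus 2.
s_stat, df = lrt_statistic([[30, 5, 5], [5, 30, 5], [5, 5, 30]])
print(f"S = {s_stat:.2f} on {df} df")
```

With phase-known counts this is the familiar G-test of independence; S would then be referred to a chi-square distribution with the stated degrees of freedom.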
We performed a large number of tests (n = 78); therefore, we applied a Bonferroni correction to obtain an appropriate significance level for association between each pair of marker loci. The individual test significance level after correction to give a total significance level ($\alpha$) of 0.05 was $P = 1 - (1 - \alpha)^{1/n} = 0.0007$, where n was the total number of tests performed. Because some tests are likely to be correlated, our stringent threshold is expected to be conservative with respect to the type-I error rate.
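The threshold arithmetic can be checked with a few lines of Python. The function name `per_test_threshold` is hypothetical. Note that the expression used in the text is the form often attributed to Šidák; the simpler and more familiar Bonferroni value is alpha / n, and the two are very close for small alpha.

```python
def per_test_threshold(alpha, n_tests):
    """Per-test level giving an overall type-I error rate of alpha
    across n_tests independent tests: 1 - (1 - alpha)**(1/n)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

threshold = per_test_threshold(0.05, 78)
simple_bonferroni = 0.05 / 78  # the alpha / n approximation, for comparison
print(f"{threshold:.6f} {simple_bonferroni:.6f}")
```

Rounded to four decimal places, `threshold` agrees with the 0.0007 quoted above.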
| Results |
|---|
Nine of the 13 markers studied showed a deficiency of heterozygotes; however, only four of these nine showed significant (P < 0.001) departures from HWE proportions. Relatedness between individuals in our sample and the small effective population size of the worldwide dairy cattle population could be the cause of the observed deficiency of heterozygotes.
Linkage Disequilibrium Between Syntenic Marker Loci Using the EM Algorithm
Figure 1 shows a plot of the extent of disequilibrium (D') vs. genetic map distance measured in cM (genetic map distance is hereafter referred to as genetic distance). The average D' was 44%. The most remarkable observation was that D' did not seem to vary as a function of the genetic distance. We fitted a nonlinear equation of type $y = a + b e^{-cx}$ using nonlinear regression as implemented by Genstat's FITCURVE directive (Genstat 5 Committee, 1993), where y is D' and x is genetic distance in cM. Note that y tends to a when x tends to infinity and y tends to a + b when x tends to zero. Only a was significantly (P < 0.0001) different from zero. The estimated parameter values are 0.42 ± 0.06 for a, 0.11 ± 0.18 for b, and 0.76 ± 0.59 for $e^{-c}$. The fit of y = a and $y = a + b e^{-cx}$ was compared using a likelihood ratio test. The fit of the two curves was not significantly different.
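To make the limits of the fitted curve concrete, the short sketch below evaluates it at a few distances using the point estimates quoted above. The function and constant names are hypothetical, and the standard errors are ignored, so this only illustrates the shape of the curve and is not a re-analysis.

```python
import math

# Point estimates reported in the text; R_HAT is the estimate of exp(-c).
A_HAT, B_HAT, R_HAT = 0.42, 0.11, 0.76

def fitted_d_prime(distance_cm, a=A_HAT, b=B_HAT, r=R_HAT):
    """Evaluate y = a + b * exp(-c * x), written as a + b * r**x with r = exp(-c)."""
    return a + b * r ** distance_cm

c_hat = -math.log(R_HAT)  # decay rate per cM implied by r
for x in (0, 5, 20, 50):
    print(x, round(fitted_d_prime(x), 3))
```

At zero distance the curve equals a + b, and it approaches a as distance grows, which matches the limits described in the text.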
(P[$\chi^2_{456\ \mathrm{df}} \geq 646$] << $10^{-7}$), indicating that the mean level of disequilibrium was different from zero and that we lacked power when testing individual pairs.
Figure 3 shows a plot of log10(P) for each pair of marker loci as a function of D'. The significance of LD tended to increase with D', although it was very variable. This variance seemed to depend on the value of D': pairs of loci with larger values of D' showed more variable levels of significance.
Results using a stepwise mutation model (results not shown) were not significantly different from those from the parent-independent mutation model. This suggests that the algorithm is relatively insensitive to the underlying assumptions about the mutation model.
Linkage Disequilibrium Between Nonsyntenic Marker Loci Using the Expectation-Maximization Algorithm
Figure 6 shows the distribution of D' values observed between pairs of nonsyntenic loci. We estimated the mean level of LD between nonsyntenic loci, measured as D', to be 39%. None of the loci pairs showed significant association between alleles. Indeed, the most significant association was for the pair BM2113-BM1236 (P = 0.03; D' = 0.53). The sum of the 42 statistics obtained between nonsyntenic loci was 548, and the sum of the 42 associated df was 539. The overall level of association between pairs of nonsyntenic loci was not significant (P[$\chi^2_{539\ \mathrm{df}} \geq 548$] = 0.39). In addition to this overall test, we performed a Fisher's combined probability test (Fisher, 1970) for syntenic and nonsyntenic groups that gave similar results (results not shown). Overall, average levels of LD were fairly similar between syntenic and nonsyntenic loci; however, association could be statistically detected between syntenic loci, but not between nonsyntenic loci, even when the D' values were similar.
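The pooled test refers the sum of the individual statistics to a chi-square distribution whose df is the sum of the individual df. The sketch below approximates the upper-tail probability with the Wilson-Hilferty cube-root normal approximation, which is a substitute chosen here to keep the example dependency-free, not the exact chi-square tail. The function name is hypothetical, and the syntenic totals are taken as 646 on 456 df, as read from the text.

```python
import math

def chi2_upper_tail(x, df):
    """Approximate P[chi-square(df) >= x] with the Wilson-Hilferty
    cube-root normal approximation (reasonable for large df)."""
    v = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_nonsyntenic = chi2_upper_tail(548, 539)  # 42 nonsyntenic pairs pooled
p_syntenic = chi2_upper_tail(646, 456)     # syntenic pairs pooled
print(f"nonsyntenic: {p_nonsyntenic:.2f}, syntenic: {p_syntenic:.1e}")
```

Under this approximation the nonsyntenic value is close to the reported 0.39 and the syntenic value falls below $10^{-7}$; an exact tail function from a statistics library would be preferable for reporting.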
| Discussion |
|---|
Some aspects of our results differ from those reported by Farnir et al. (2000). First, they found extensive significant LD between both syntenic and nonsyntenic loci. Second, they found average D' values in the same range as ours only for genetic distances <5 cM. Third, they found that only those D' values for the more distant syntenic markers were similar to those between nonsyntenic markers. These differences might arise for two reasons. First, our sample is more related than theirs, and therefore showed larger identical-by-descent regions. They used two different samples for estimating the extent of LD. One sample was composed of bull-dams and the other of cows selected from the general population. Although their first data set might have a level of relatedness as high as that in our data, it is unlikely that cows in their second data set were as related as our bulls. Relatedness between individuals can cause an increase in the level of LD, even between unlinked loci, because larger portions of the genome are identical between related individuals. Second, the sample sizes of the two studies are very different, and a comparison might be difficult and even inappropriate. The expectation of D under equilibrium is zero; however, its sampling variance depends on the sample size from which it is estimated: the larger the sample size, the smaller the sampling variance. If the sampling variance is large, then it is more likely that, just by chance, the estimated value for D' differs from zero. Weir and Hill (1980) derived the variance of R, the correlation of gene frequencies, for biallelic loci. Their arguments about the two sampling processes involved in estimating LD can be extended to a different measure of disequilibrium, say D'. For closely linked loci, the variance of R is approximately $1/(1 + 4N_e c) + 1/n$, where $N_e$ is the effective population size, c is the recombination fraction between the two loci, and n is the sample size. The variance of R is due to two different sampling processes, one that reflects the finite size of the population ($1/[1 + 4N_e c]$) and another that reflects that a limited sample of the population ($1/n$) has been drawn (from which disequilibrium and allele frequencies have been estimated). It is worth noting that n is either a sample of n identified chromosomes or n unphased individuals from which disequilibrium and allele frequencies have been estimated. Additionally, for D', the difference from its expected value under equilibrium is aggravated by the fact that D' uses the absolute value of $D'_{mn}$. Even small deviations from equilibrium between pairs of alleles accumulate, leading to an upwards bias in the estimate of D'.
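The upward bias can be illustrated with a small simulation. The sketch below draws phase-known haplotypes for two independent loci, so the true D' is zero, and averages Hedrick's D' over replicate samples. The choice of five equally frequent alleles per locus, the replicate count, and the function names are assumptions made for illustration only; they are not taken from the data analyzed here.

```python
import random
from collections import Counter

def hedrick_d_prime(haplotypes):
    """Hedrick's multiallelic D' from a list of (allele1, allele2) haplotypes."""
    n = len(haplotypes)
    hap = Counter(haplotypes)
    p = Counter(a for a, _ in haplotypes)
    q = Counter(b for _, b in haplotypes)
    total = 0.0
    for a, count_a in p.items():
        for b, count_b in q.items():
            pa, qb = count_a / n, count_b / n
            d = hap.get((a, b), 0) / n - pa * qb
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            if d_max > 0:  # skip alleles fixed in the sample
                total += pa * qb * abs(d) / d_max
    return total

def mean_d_prime_at_equilibrium(n_haplotypes, n_alleles=5, n_reps=100, seed=1):
    """Average D' over samples drawn with the two loci independent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        sample = [(rng.randrange(n_alleles), rng.randrange(n_alleles))
                  for _ in range(n_haplotypes)]
        total += hedrick_d_prime(sample)
    return total / n_reps

small = mean_d_prime_at_equilibrium(100)   # e.g., 50 diploid animals
large = mean_d_prime_at_equilibrium(2000)
print(f"mean D' with 100 haplotypes:  {small:.2f}")
print(f"mean D' with 2000 haplotypes: {large:.2f}")
```

Because every allele pair contributes the absolute value of its deviation, the average stays above zero even at equilibrium, and it is larger for the smaller sample.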
We believe that lack of statistical power, especially after correcting for multiple testing, and an upwards bias (due to the small sample size) in the estimate of D' are the reasons why the larger D' values observed did not correspond to more significant allelic associations. We assumed that all the tests performed were independent; however, tests between loci on the same chromosome are correlated, especially if the distance between loci is not large, as in our data. The significance thresholds we applied after correction are, therefore, very conservative because the number of independent tests actually performed was smaller than assumed.
It is unlikely that the departures from HWE expectations we observed led to an important degree of bias in the estimates of haplotype frequencies. The only problem when estimating haplotype frequencies from genotypes comes from individuals that are heterozygous at the loci considered. In this situation, haplotype frequencies cannot be directly counted because it is not possible to distinguish between the two different diplotypes (i.e., an individual with the two-loci genotype AaBb could have diplotype Ab/aB or AB/ab). In this case, the EM algorithm iteratively estimates the frequencies of the different haplotypes until the likelihood of the data is maximized and, therefore, maximum likelihood haplotype frequencies are obtained. When there is an excess of homozygotes, the number of doubly heterozygous individuals to be resolved is smaller. Consequently, there is little or no bias in the haplotype frequency estimates caused by deviations from HWE due to an excess in homozygosity (Osier et al., 1999; Fallin and Schork, 2000).
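The iterative step described above can be sketched for the simplest case of two biallelic loci, where only AaBb individuals are ambiguous. This is a minimal illustration under that assumption, with a hypothetical function name and genotype coding; the markers analyzed here were multiallelic microsatellites, for which the same idea applies to every doubly heterozygous genotype.

```python
def em_two_locus(genotypes, max_iter=1000, tol=1e-12):
    """Estimate frequencies of haplotypes (AB, Ab, aB, ab) from unphased
    genotypes given as (copies of A, copies of B), each 0, 1 or 2."""
    fixed = [0.0, 0.0, 0.0, 0.0]   # haplotypes that can be counted directly
    n_double_het = 0               # AaBb individuals: AB/ab or Ab/aB
    for g_a, g_b in genotypes:
        if g_a == 1 and g_b == 1:
            n_double_het += 1
            continue
        # At most one locus is heterozygous, so phase is unambiguous.
        a_alleles = (int(g_a >= 1), int(g_a == 2))
        b_alleles = (int(g_b >= 1), int(g_b == 2))
        for a, b in zip(a_alleles, b_alleles):
            fixed[2 * (1 - a) + (1 - b)] += 1

    n_chrom = 2 * len(genotypes)
    freqs = [0.25] * 4
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from fixed plus expected counts.
        new = [
            (fixed[0] + n_double_het * w) / n_chrom,
            (fixed[1] + n_double_het * (1 - w)) / n_chrom,
            (fixed[2] + n_double_het * (1 - w)) / n_chrom,
            (fixed[3] + n_double_het * w) / n_chrom,
        ]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs

# Hypothetical sample: 4 AABB, 4 aabb and 2 AaBb individuals.
estimates = em_two_locus([(2, 2)] * 4 + [(0, 0)] * 4 + [(1, 1)] * 2)
print([round(f, 4) for f in estimates])
```

Only the double heterozygotes enter the E-step, which is why an excess of homozygotes leaves fewer ambiguous individuals to resolve.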
Six- and seven-loci maximum likelihood haplotype frequencies for chromosomes 2 and 6, respectively, could not be obtained. This was because the algorithm failed to reach a global maximum. After each step of the EM algorithm, the likelihood of the data increases (Dempster et al., 1977); however, if the likelihood surface has several local maxima or is very flat, then there is no guarantee that a global maximum is reached. Generally, there is no obvious way of knowing whether the estimated maximum is a local or a global maximum. To increase the chance that a global maximum is reached, the algorithm is usually started several times from different starting points, and the solution with the maximum likelihood is assumed to be the global maximum. In our case, although the likelihood of the data was the same for different runs, we obtained different haplotype frequencies in each of the runs. This suggests that the likelihood surface was very flat because of the insufficient amount of data or dependencies between the data, and that the iterative process stopped before reaching the global maximum.
Differences observed between the maximum likelihood and Bayesian approaches were small, and the general conclusions obtained from both estimation procedures were essentially the same. Differences between the two approaches were slightly larger for chromosome 2, which has more missing values, than for chromosome 6. This might suggest that the amount of data for some loci on chromosome 2 is too small, which is reflected in the slightly larger discrepancies between the two approaches. An advantage of the Bayesian approach is that it provides estimates of the uncertainty associated with each phase, at the cost of a much larger computing time. An advantage of the maximum likelihood over the Bayesian approach is that implementation of the testing procedure is straightforward in the maximum likelihood framework. Therefore, the decision about the most appropriate method would depend on the intended use of the haplotype frequencies. For example, if one just wanted to test for the presence of LD, then the maximum likelihood approach seems adequate and straightforward, but if one wanted to compare haplotype frequencies in a case-control design, then an estimate of the uncertainty of each phase would be necessary.
The fact that the disequilibrium parameter (D') did not depend on distance (cM) but P did depend on distance (Figures 1 and 2), and that similar values of D' were observed between syntenic and nonsyntenic loci (but significance level was different), suggests that the utility of D' to assess the amount of disequilibrium is limited. This is important if assessment of disequilibrium is done as a preliminary study to determine, for example, the marker density required for a mapping study. In this case, the correlation between P and distance will give a clearer "picture" of the marker density required.
The region of chromosome 6 where we detected the most significant LD has been reported to harbor QTL influencing milk, fat, and protein yield in the U.K. dairy population (Wiener et al., 2000) and other populations, such as the Israeli Holstein population (Ron et al., 2001). This suggests that selection for milk production traits could have generated LD in this region, which was detectable even with the large amount of background LD observed.
| Implications |
|---|
| Footnotes |
|---|
Received for publication May 23, 2002. Accepted for publication October 28, 2002.
| Literature Cited |
|---|