|
|
||||||||
ANIMAL GENETICS |



* Istituto Zooprofilattico Sperimentale della Sicilia "A. Mirri," Via G. Marinuzzi 3, 90129 Palermo, Italy;
and
Biometris, Wageningen-UR, P.O. Box 16, 6700 AA, Wageningen, The Netherlands;
and
Department S.En.Fi.Mi.Zo.-Animal Production Section, University of Palermo, Viale delle Scienze-Parco dOrleans, 90128 Palermo, Italy;
and
Animal Breeding and Genetics Group, Wageningen Institute of Animal Sciences, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, The Netherlands;
and
# Department of Animal Science, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY 14853
| Abstract |
|---|
|
|
|---|
Key Words: Bayesian theory chicken genetic marker Monte Carlo method multiple environment quantitative trait locus
| INTRODUCTION |
|---|
|
|
|---|
The number of parameters to estimate increases rapidly as the number of traits increases, which can decrease power (Jiang and Zeng, 1995
; Williams et al., 1999
). Principal components, canonical transformations, and discriminant analysis have been suggested to reduce the order of the problem (Weller et al., 1996
; Korol et al., 1998
, 2001
; Cali
ski et al., 2000
; Gilbert and Le Roy, 2003
). Disadvantages of these methods are: 1) They can only be applied if all traits are recorded on all individuals; 2) They do not distinguish among polygenic, QTL, and environmental correlations; 3) The transformed traits often do not have a clear meaning.
Falconer (1952)
stated that the analysis of one trait in multiple environments with cross-environment genetic correlations is formally equivalent to the analysis of multiple genetically correlated traits in a single environment. In this paper, we build on this similarity between traits and environments. Our single trait Bayesian method for complex outbred populations with heterogeneity of variance between sexes (Van Kaam et al., 2002
) was extended to analyze multiple environments jointly. Assumptions regarding the covariance structure were made to limit the number of parameters to estimate. The method employs a reduced animal model (RAM) with a single pleiotropic QTL affecting all environments and sexes and can account for different magnitudes of polygenic and QTL allelic effects through scale parameters. Simulated and real data are used to demonstrate our method.
| MATERIALS AND METHODS |
|---|
|
|
|---|
![]() |
with the prior assumptions
![]() |
where we specify n as nm + nf, with subscript m = male and f = female; hence y is an n-vector of phenotypes with subscript s indicating sex, X is an n x p incidence matrix relating systematic environmental effect levels and covariates to phenotypes, b is a p-vector of systematic environmental effect levels and covariates, cs are polygenic scale parameters, Z is an n x q incidence matrix relating individuals to phenotypes,
is a q-vector of random additive polygenic effects on the standard normal scale, ds are QTL scale parameters, W is an n x 2q incidence matrix relating each individuals 2 QTL alleles to phenotypes,
is a 2q-vector [following the Fernando and Grossman (1989)
model where the true number of QTL alleles is unknown and their inheritance is followed with flanking markers] of random additive QTL allelic effects on the standard normal scale, A is the additive genetic relationship matrix, Gk is the gametic relationship matrix for the QTL and depends on the marker information at QTL position k, and R is the matrix of residual (co)variances.
In the above representation, uncorrelated error terms e, with homogeneous variances within each sex but heterogeneous variances between sexes, were assumed; hence, R = I
2es. Inverted gamma distributions with predefined hyperparameters
(shape) and ß (scale) were used to represent prior knowledge on these error variances, as
2es ~ IG(
, ßs). Each sex had a scaling parameter for the polygenic and QTL allelic effects. The variances of the random genetic effects were fixed to 1, and hence standard normally distributed
and
were used (Van Kaam et al., 2002
).
Scale parameters can be considered as standard deviations; hence, variances can be obtained by squaring scale parameters. As prior distributions for the scale parameters, normal distributions left-truncated at zero were employed. With truncated normal priors it is possible to have a high or a low prior density at zero, depending on the mean and variance of the prior distribution. This is very useful because, for polygenic effects, one would prefer a nihil density at zero. For QTL effects, however, one might assume a priori a large probability of not having a QTL at the position analyzed; hence, the density at zero should be substantial. With an inverted gamma-style prior distribution, however, the density at zero would always be nihil. The priors can be represented as
![]() |
Because these prior distributions were left-truncated, the prior expectations were greater than the means µcs and µds and the prior variances were smaller than the variances
2cs and
2ds of untruncated normal distributions. New candidate values were sampled directly from a left-truncated normal density.
Multiple Environments with One Heteroskedastic Trait Each
The scaled model was extended to account for multiple environments with 1 trait per environment. Only the case of 2 environments is considered here. Each animal had 2 polygenic effects, 1 for each environment, where the traits of interest had a polygenic correlation,
, between environments. However, for the QTL, each animal had only 2 additive allelic effects (paternal and maternal), assuming perfect correlation between sexes and traits/environments. In other words, the correlation between the effects of the QTL alleles was 1 or 0 for all sex-trait/environment combinations, assuming that the direction of the QTL allelic effects in all sex-trait/environment combinations was the same, so only the magnitude of the effect differed. This led to the following extended scaled model in a 2-trait/environment situation with heterogeneity of variance between sexes:
![]() |
Here, the observations were divided into 4 sets defined by sex, indicated with subscript s, and trait/environment, indicated with subscript t. Each sex-trait/environment combination had its own scaling parameter for the polygenic and QTL allelic effects. Therefore, the putative QTL could have an effect on some of the sex-trait/environment combinations and not on others for which the scaling parameter is nearly zero.
A diffuse uniform prior was chosen for the polygenic correlation, in which the boundaries were
min = 1 and
max = +1. In situations with more than 2 traits, the boundaries should be determined by the requirement that the correlation matrix remains nonnegative definite. New candidate values for
were sampled using Metropolis-Hastings with a uniform candidate-generating density. Homogeneous error variances,
2est, within, but heterogeneous variances between, each sex-trait/ environment combination were assumed, so Rst = I
2est, and an inverted gamma prior distribution on the error variances was specified as
2est ~ IG(
, ßst). Hence, we now have the following prior assumptions:
![]() |
Marker information was needed to calculate the inverse of Gk, similar to Bink and van Arendonk (1999)
. This was described in terms of the allelic constitution of the chromosomal homologues of the founders, and identity-by-descent values for all nonfounders (Thompson, 1994
; Jansen et al., 1998
). Additional clarification of the model is provided in Appendix 1.
Bayesian inferences were based on the joint posterior distribution of missing data and parameters, given observed marker and phenotypic data. Posterior mean parameter estimates were obtained using Markov Chain Monte Carlo (MCMC) algorithms, which enabled sampling from the conditional posterior distribution of parameters. A RAM, similar to that of Van Kaam et al. (2002)
, was used to obtain solutions more efficiently because polygenic effects of nonparents and QTL allelic effects of ungenotyped nonparents do not have to be sampled as done by Bink et al. (1998)
because they are not needed. Haplotypes, systematic environmental effect levels and covariates, and additive polygenic and QTL allelic effects were sampled using Gibbs sampling. Use of RAM meant that the residual of the nonparents consisted of the error and the Mendelian part of the additive polygenic and QTL variance. Therefore, Metropolis-Hastings algorithms were used to sample the scale parameters, polygenic correlation, and error variances. The full conditional distributions of the dispersion parameters are in Appendix 2.
Population Structure
Simulated and experimental data were analyzed. The simulation mimicked the data structure of the experimental population. Both populations consisted of founders (G0), parents (G1), progeny (G2), and grand progeny (G3). In this design, G1 and G2 individuals were typed for genetic markers, and phenotypic observations were collected on G3 individuals, which were randomly divided between 2 experiments. In a growth efficiency experiment, a single bird was randomly assigned to each individual cage (IC), whereas in a carcass experiment, birds were housed in group pens (GP). Numbers of individuals and population structure are given in Table 1
. A more detailed description of the population was given by Van Kaam et al. (1998
, 1999a
, b
). In the analyses, G0 individuals are omitted because the method required known marker genotypes on all founders.
|
Twenty simulations were analyzed. In each simulation, the phenotypic variance for each sex-trait/environment combination was set at different levels for males and females, and the QTL location was always fixed in the middle of the same marker bracket. The polygenic and QTL allelic correlations between both traits/environments were set to 0.70 and unity, respectively. Genetic correlations between sexes were unity. Heritabilities of 0.20, 0.30, and 0.40 were used. The proportion of the genetic variance assigned to the pleiotropic QTL varied from 0.15 to 0.40 for the first environment and from 0.15 to 0.30 for the second environment.
Experimental Data
The trait was BW at 48 d (BW48) of meat chickens raised in 2 experiments with different housing environments, that is, IC or GP. Different G3 hatches were used in each experiment. After removal of 22 outliers, 2,049 observations remained in the IC environment, and after removing 8 outliers, 1,953 observations remained in the GP environment. Removal of outliers was described by Van Kaam et al. (1999a
, b)
.
In both environments, an interaction between hatch of the dam and hatch of the progeny was included as a systematic environmental effect, resulting in 40 levels in the IC environment and 47 levels in the GP environment. Furthermore, in the IC environment, the location of the animals cage within the building was included as a systematic environmental effect with 36 levels.
Marker Data
Genotypes for microsatellite markers were determined using DNA derived from blood samples from all 20 G1 and 451 G2 animals. Marker alleles were recorded in base pair units. Only 1 chromosomal region showed suggestive significance in the previous QTL analyses of the IC environment (Van Kaam et al., 1999b
); however, no evidence was found in this region in the GP environment (Van Kaam et al., 1999a
). This region was located on chromosome 1 and contained 11 markers over 96.2 cM, beginning with marker ADL0150 and ending with marker ADL0314 (Groenen et al., 1998
). These markers with their positions were: ADL0150 (0.0 cM), MCW0018 (1.0 cM), LEI0174 (2.1 cM), UMA1.107 (40.9 cM), MCW0058 (45.1 cM), LEI0071 (47.1 cM), MCW0101 (54.6 cM), LEI0101 (68.4 cM), ADL0251 (78.3 cM), UMA1.220 (92.0 cM), and ADL0314 (96.2 cM). A minimum marker spacing of about 2 cM was aimed for but with closer spacing at the ends of the map. Marker alleles were determined in all 10 families for 7 of the markers in this region. For the other 4 markers, genotypes were only collected in 4 families. All 20 parents were informative in this region.
MCMC and Prior Distributions
In analysis of the experimental data, the QTL position found in our previous regression analysis (Van Kaam et al., 1998
, 1999b
) was investigated in more detail using several runs, while keeping the QTL position fixed in the middle of the marker bracket MCW0058-LEI0071.
Five Markov chains, consisting of a burn-in time of 1,000,000 cycles and a stationary phase of 10,000,000 cycles, were obtained. A single run required 339 min on a 3.4 Ghz Pentium 4 using just half the CPU power due to a single threaded application on a CPU with hyperthreading. For comparison, a model without a QTL was studied as well. This required 92 min for the same number of cycles. Parameters that were suspected to converge more slowly were sampled more often than other parameters, as suggested by Uimari et al. (1996)
. The polygenic correlation was sampled 5 times each cycle; scale parameters and error variances were sampled once each cycle; systematic environmental effects, polygenic effects, and QTL allelic effects were sampled every 5 cycles; and population allele frequencies and haplotypes were sampled every 50 cycles because haplotypes were time-consuming to sample.
The
shape-hyperparameter of the inverted gamma prior distribution of the error variances was set to 2.000001 for all sex-trait/environment combinations following Van Tassell et al. (1995)
, so prior mean and variance are finite, and the estimates should be similar to REML. Other hyperparameters of the priors for dispersion parameters were set based on the following assumptions: 1) expected residual variances were 40% of the variance in the observations unadjusted for systematic environmental effects, resulting in the scale-hyperparameter ß; 2) expected heritabilities in both environments were 0.30; 3) putative QTL accounted for an expected 15% of the additive genetic variance with a mode at zero and, hence, µdst = 0 without truncation; 4) variance of the polygenic scale parameter,
2cst, was 0.09 x the expected polygenic variance; and 5) there was no heterogeneity of variance assumed between sexes and so the same priors were used for males and females.
Based on assumptions 1 and 5, the
hyperparameter was set to 41,200 g2 for the error variances in the IC environment and to 44,000 g2 in the GP environment. Using assumptions 1 and 2, the expected additive genetic variance can be calculated. With assumption 3, the additive genetic variance can be divided between the polygenic and QTL variances. Then
2dst followed from the expected QTL variance, and µcst was obtained by simulation. The priors for BW48 in the IC environment were set to TN(113g,1271g2) for the polygenic scale parameters and TN(0g,1766g2) for the QTL scale parameters. The corresponding priors for BW48 in the GP environment were set to TN(117g,1358g2) and TN(0g,1886g2), respectively. In the model without QTL, the polygenic scale priors used were TN(127g,1606g2) for the IC environment and TN(131g,1702g2) for the GP environment. The latter priors were chosen to have the same expected additive genetic variance and the same coefficient of variation for the polygenic scale parameters in the models with and without QTL. All priors for scale parameters were left-truncated at zero.
In the analysis of the simulated data, the QTL position was fixed at the simulated location. For each replicate, a single chain consisting of a burn-in time of 1,000,000 cycles and a stationary phase of 10,000,000 cycles was obtained. The settings chosen for the analysis of simulated data were equal to those for the real data because the simulation mimicked the real data. The only exception was the prior of the QTL allelic effects, for which the mode was set to 0, and the variance was chosen such that the expectation of the QTL scale parameter reflected the simulated value.
Post-MCMC Analyses
Monte Carlo standard errors of the posterior means were calculated using the posterior variance across all runs as in Gelman and Rubin (1992)
and the effective sample size as in Sorensen et al. (1995)
, cumulated over all runs. This means that within- and between-sequence variation and serial correlation were accounted for.
| RESULTS |
|---|
|
|
|---|
|
Models with a QTL.
Five independent analyses with a model containing polygenic and QTL effects were performed for the MCW0058-LEI0071 marker bracket on chromosome 1. The posterior means of the phenotypic variance for each given sex and environment were similar for the different runs (Table 3
). The difference of these means with estimates obtained under the model without QTL ranged from +0 to +1%. The estimates of the polygenic correlation were slightly reduced by including the QTL in the model. Figure 1
shows the marginal posterior density of the polygenic correlation obtained in 5 independent analyses of BW48. Similarities among densities indicated that convergence was good. The polygenic correlation had the greatest serial correlation of all parameters. The lag 1 serial correlation of the polygenic correlation was 0.990. The trace of the polygenic correlation (not shown), however, reflects proper mixing of the Markov chain. Lag 1 serial correlations for the polygenic scale parameters were approximately 0.97 and approximately 0.95 for the QTL scale parameters. Only slightly greater heritability estimates were found after including the QTL in the model.
|
|
|
|
|
|
| DISCUSSION |
|---|
|
|
|---|
In the present analyses, the polygenic correlation for BW measured in both sexes was assumed to be 1. This agreed with the genetic correlations of BW48 between sexes of 0.97 and 0.92 obtained in earlier analyses of the data (Van Kaam et al., 1999a
,b
). Only additive QTL effects were modeled. Furthermore, these QTL allelic effects were assumed to have a genetic correlation of 1 for all sex-trait/environment combinations. However, using separate scale parameters for each sex-trait/environment combination the presence of a QTL is independent between all sex-trait/environment combinations. The scaled model can be extended to accommodate the possibility of opposite genetic effects by using normally distributed scale parameters instead of scale parameters with left-truncated normal distributions. Allowing scale parameters to be negative means that the direction of the effects is handled entirely by the scale parameters and the polygenic correlation(s) would have to be limited to be nonnegative.
Body Weight
In this study, BW measured in 2 different environments was treated as 2 different traits. Potential effects of hatch of dam and offspring or cage location on QTL manifestation were ignored. The difference in environment caused a genotype x environment interaction, which resulted in a polygenic correlation below unity, with a posterior mean of 0.73 and standard deviation of 0.15. Phenotypic variances in both environments were similar. Summarizing the evidence obtained in these analyses, we concluded that a QTL affecting BW48 is most likely located in the region of marker bracket MCW0058LEI0071.
Multiple vs. Single Trait/Environment QTL Analysis
Multiple trait/environment QTL analysis should increase the power of detection and hence increase the significance of a QTL if the QTL is not a false positive result (Amos et al., 1990
; Amos and Laing, 1993
; Schork, 1993
; Jiang and Zeng, 1995
; Korol et al., 1995
; Ronin et al., 1995
; Almasy et al., 1997
; De Andrade et al., 1997
; Wijsman and Amos, 1997
; Mangin et al., 1998
; Henshall and Goddard, 1999
; Knott and Haley, 2000
). We demonstrated our methodology using a single trait measured in 2 environments, although we could have chosen different traits as well. The results of analyses of simulated data showed an increase in the detection rate of 75%, indicating that multiple trait/environment analysis with our Bayesian method is more powerful than a single trait, single environment analysis. A major reason behind the increased power to identify the smaller QTL effect was that the larger QTL effects help to move the sampler toward the more likely haplotype reconstruction and inheritance patterns. Our findings demonstrated that greater power for multiple trait/environment analysis as found by many authors also holds within the Bayesian framework. The main limitation of the current multiple trait/environment method is the need for traits to be measured on different animals because error terms are assumed to be independent. This could be resolved more easily in an animal model than in RAM.
A question in multiple trait/environment analysis is how many traits/environments should be analyzed simultaneously. There is a trade-off between the increase in number of parameters to be estimated and the diminishing increase of power resulting from extra traits/environments. To remain feasible, multiple trait/environment methods accounting for many traits/environments need more simplifying assumptions than multiple trait/environment methods for just 2 or 3 traits/environments. It is important to realize that results depend on the assumptions made by any method employed and if these assumptions are violated results will be biased, possibly resulting in false positives or real QTL remaining undetected. Additional assumptions needed for multiple trait analysis and increase of power therefore are not without risk.
Pleiotropy vs. Linkage
A multiple trait/environment approach can dissect the genetic covariance among traits/environments. Pleiotropy as well as linkage could occur with multiple traits/environments. However, pleiotropy seems more likely in multiple environment mapping compared with multiple trait mapping because the same biological pathways are likely involved in each environment. Mangin et al. (1998)
used the term pleiotropy mapping, which would suit our method. A multiple trait/environment approach might need the ability to test whether the genetic correlation is due to pleiotropy of a single QTL or linkage between 2 separate QTL (Jiang and Zeng, 1995
; Mangin et al., 1998
). The current method provides some possibilities to evaluate if pleiotropy or linkage occurs. In the chromosomal region of interest, one could make a scan. If the maximum QTL effect occurs at the same point for both traits, this would suggest pleiotropy. Otherwise linkage seems more likely. Alternatively, it is possible to evaluate subsamples of different families to verify if putative QTL effects always occur for both traits, suggesting pleiotropy, or only for a single trait, suggesting linkage. Our approach could be extended by removing the restriction of a genetic correlation of 1 at the QTL. Almasy et al. (1997)
successfully compared the likelihood of a bivariate model with a QTL for each trait in which the correlation between the QTL effects was constrained to 0 or 1. Williams et al. (1999)
suggested comparison of the likelihood of a bivariate model with a QTL for each trait in which the correlation between the QTL is estimated vs. constrained to 1. Such comparisons, however, require data sets of sufficient size to be able to estimate and test such a correlation. If other QTL, affecting the traits under analysis are known, in particular on the same linkage group, it would be useful to include these to reduce background noise as is done in composite interval mapping (Zeng, 1993
) or multiple QTL mapping (Jansen, 1993
). Alternatively, multiple QTL can be mapped simultaneously. A multiple QTL method capable of handling multiple traits/environments simultaneously is a daunting task that still remains and is outside the scope of this paper.
| IMPLICATIONS |
|---|
|
|
|---|
| APPENDIX 1 |
|---|
|
|
|---|
![]() |
For understanding of the parsimonious scaled model specification, we will compare it to the conventional animal model. Again, parameters on the standard normal scale are denoted with a dot above. Furthermore, we define ns and nt as number of sex classes and number of traits/environments.
Using scale parameters and with a fully modeled covariance matrix, the notation of the lower triangular part of symmetric block A for 2 traits measured on both sexes is as follows:
![]() |
The conventional polygenic effect of an animal is equivalent to the polygenic effect under the scaled model multiplied with the scale parameter; hence ust = cst
t Remember here that
t ~N(0,1); hence ust ~N(0,c2st). In a conventional notation without scale parameters, terms like cm1R1m1m1cm1 would be R1m1m1, and
m1 would be um1 because the variance of um1 is c2m1, so that the c terms are absorbed in the u-vector instead of in the covariance part.
Under our parsimonious scaled model, the notation of the lower triangular part of the symmetric block A becomes
![]() |
Note that in the parsimonious model, the size of block A is just 2 x 2 instead of 4 x 4 because the assumption is that polygenic effects in males and females have a unity correlation and only the magnitude of the effect differs. The parsimonious model, therefore, does not need to have separate variables for polygenic effects in males and females. Under the full model, the polygenic effects ust would need ns x nt elements per animal, whereas the parsimonious model needs nt elements for
t instead. Furthermore, instead of (ns x nt)(ns x nt + 1)/2 polygenic (co)variances under the full model, the parsimonious model needs only ns x nt polygenic scale parameters plus nt(nt 1)/2 polygenic correlations.
The polygenic correlation matrix under the parsimonious scaled model is defined as
![]() |
This matrix of the parsimonious scaled model could be related to the full-scaled model as
![]() |
This matrix is singular because of the unity correlation between sexes that the parsimonious scaled model assumes and can therefore be reduced.
The lower triangular part of the symmetric blocks B and C, using scale parameters related to the effects in a conventional animal model with a fully modeled covariance matrix, can be written in the same manner as for block A by replacing every c with d and every
with
. Here again, ns x nt elements for each allele would be needed for QTL allelic effects under a full model specification. Under our parsimonious scaled model, just a single QTL allelic effect is needed per allele. The QTL allelic effects between models are related as vst = dst
.
The notation of the symmetric blocks B and C under the parsimonious scaled model becomes
![]() |
Therefore, instead of (ns x nt)(ns x nt + 1)/2 covariances for QTL allelic effects, just ns x nt scale parameters are needed.
In addition to the parsimonious scaled model specification, the use of a RAM results in further reduction of the variables that need to be sampled, because non-parents do not require sampling.
| APPENDIX 2 |
|---|
|
|
|---|
,
,
, cst, dst,
2est), where i represents the RAM category and j is the animal. Therefore, a Metropolis-Hastings update is used for these parameters.
Let the error term be eistj = yistj x'istjb cstz'istj
dstw'istj
, and the residual variance be
ist =
2est +
i(c2st + 2d2st), where
i reflects the total amount of additive genetic variance present in
ist. There is 1 RAM category for parents (
1 = 0) and 3 for nonparents with both parents known (
2 = 0.5), 1 parent known (
3 = 0.75), and both parents unknown (
4 = 1.0). Then the RAM likelihood for a sex-trait/environment combination is
![]() |
and the full conditional distributions of the scale parameters and error variances are
![]() |
The full conditional distribution of the polygenic correlation is
![]() |
| Footnotes |
|---|
2 Corresponding author: jtkaam{at}unipa.it
Received for publication November 5, 2005. Accepted for publication March 14, 2006.
| LITERATURE CITED |
|---|
|
|
|---|
ski, T., Z. Kaczmarek, P. Krajewski, C. Frova, and M. Sari-Gorla. 2000. A multivariate approach to the problem of QTL localization. Heredity 84:303310.[Medline]
| |||||||||||||||||||||||||||||||||||||||||||||||||