ANIMAL GENETICS
Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames 50011-3150
| Abstract |
|---|
Key Words: commercial performance, crossbreeding, genotype by environment interaction, marker-assisted selection
| INTRODUCTION |
|---|
The objective of this paper was therefore to evaluate the potential benefit of using marker information obtained at the commercial crossbred (CC) level for selection within purebred (PB) lines. A secondary objective was to develop a deterministic model to predict responses to marker-assisted selection (MAS) in these and other scenarios by using standard selection index theory and associated software.
| MATERIALS AND METHODS |
|---|
Population Structure and Selection Strategies
A deterministic model was developed for selection within a terminal sire line that contributes to a CC population (Figure 1). Populations of finite size with discrete generations, reflecting a breeding program in pigs, were considered. All selection was within the PB nucleus population. In each generation in the PB nucleus, $n_s$ PB males were selected, and each was randomly mated to $n_{pd}$ PB females. Each mating produced $n_{po}$ PB offspring (half male, half female). Selection was based on EBV derived from PB or CC data or both, with or without marker information, as explained in the next section. For CCPS, selected PB males were simultaneously mated to females from the maternal line to produce $n_{cb}$ CC offspring per PB male, whose performance was recorded in the CC population.
The selection objective was additive genetic improvement of performance for a single trait in the CC population. This trait can, however, also represent a comprehensive multiple-trait economic breeding objective. The trait was assumed to be measurable on all selection candidates before the age of selection. In addition, all PB selection (PBS) candidates were assumed to be genotyped for a set of markers that segregate in the PB population. The effects of these markers on both PB and CC performance were assumed to have been estimated in prior studies (see next section). These could be markers chosen because of their effects on phenotype, as identified in prior QTL mapping studies. They could also be markers randomly or uniformly distributed across the genome, as would be the case for genomic selection (Meuwissen et al., 2001). Genomic selection estimates effects associated with a large number of markers across the genome. It capitalizes on linkage disequilibrium (LD) between markers and closely linked quantitative trait loci (QTL), without prior screening of markers based on the significance of their associations with phenotype. The resulting predictions of the random effects of marker haplotypes (Meuwissen et al., 2001) or of alleles at each marker (Solberg et al., 2006) are then used to predict breeding values for individuals based on their genotypes for all markers. The theory developed herein also applies when a selected group of markers is used. Genomic selection will nevertheless be considered as a special case, because the random association of markers with QTL across the genome results in several simplifying assumptions.
Genetic Model
By considering only additive genetic effects, performance in the PB nucleus population can be modeled as a trait correlated with performance in the CC population. To enable modeling of MAS, the total additive genetic value for each trait ($G_p$ for PB performance and $G_c$ for CC performance) is partitioned into two parts. The first is genetic effects that are correlated with markers through LD ($Q_p$ and $Q_c$). The second is residual genetic effects ($R_p$ and $R_c$) that are independent of markers. Note that, in addition to QTL that are not in LD with markers, R also includes effects resulting from incomplete LD of QTL with markers. This partitioning results in the following models for phenotypes $P_p$ and $P_c$ (ignoring fixed effects):
$$P_p = G_p + E_p = Q_p + R_p + E_p, \qquad P_c = G_c + E_c = Q_c + R_c + E_c,$$
where $E_p$ and $E_c$ are random environmental effects. A path diagram of this model is presented in Figure 2. When the markers used for marker-assisted genetic evaluation are randomly located across the genome, as would be the case for genomic selection, Q and R represent a random partitioning of QTL effects. Q contains effects that are associated with markers through LD, and R contains effects that are independent of marker genotypes.
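Let $h^2_p$ and $h^2_c$ denote total heritabilities for PB and CC performance, and let $q^2_p$ and $q^2_c$ denote the proportions of genetic variance contributed by $Q_p$ and $Q_c$. The proportions $q^2_p$ and $q^2_c$ depend on the genetic variance contributed by QTL that are in LD with markers and on the extent of LD between markers and QTL. For an individual QTL linked to a single marker, $q^2$ will be equal to the product of two quantities. The first is the LD between the marker and the QTL, as measured by $r^2$ (Hill and Robertson, 1968). The second is the proportion of genetic variance contributed by the QTL. For genomic selection, with markers spread across the genome, $E(q^2_p) = E(q^2_c) = \bar{r}^2_{\max}$, the expected maximum $r^2$ between a QTL and the markers. Parameter $\bar{r}^2_{\max}$ can be approximated from marker-marker LD studies. For example, it can be based on the distribution of the maximum $r^2$ for markers across the genome, as in Spelman and Coppieters (2006).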
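To make the single-QTL case concrete, the Python sketch below computes the Hill and Robertson (1968) LD measure $r^2 = D^2/[p_A(1-p_A)p_B(1-p_B)]$ from a two-locus haplotype frequency. It then multiplies that $r^2$ by the share of genetic variance due to the QTL to obtain $q^2$. The haplotype frequency, allele frequencies, and 10% QTL share are hypothetical illustration values, and the function names are our own.

```python
def ld_r2(f_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between a biallelic marker (alleles A/a) and a biallelic QTL (alleles B/b).

    f_ab: frequency of the A-B haplotype; p_a, p_b: frequencies of alleles A and B.
    """
    d = f_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def single_qtl_q2(r2: float, qtl_share: float) -> float:
    """q^2 for one QTL tagged by one marker: marker-QTL r^2 times the share of
    genetic variance due to the QTL (qtl_share is a hypothetical input)."""
    return r2 * qtl_share


r2 = ld_r2(f_ab=0.40, p_a=0.5, p_b=0.5)  # D = 0.40 - 0.25 = 0.15
q2 = single_qtl_q2(r2, qtl_share=0.10)
print(round(r2, 4), round(q2, 4))        # 0.36 0.036
```

Even a marker in fairly strong LD with a QTL of moderate effect captures only a small part of the genetic variance. This is why genomic selection relies on many markers.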
Let $\rho_{G_{pc}}$, $\rho_{R_{pc}}$, and $\rho_{Q_{pc}}$ be the genetic correlations between PB and CC performance for the genetic components G, R, and Q. The partitioning of genetic effects into Q and R results in $Q_p$ being uncorrelated with $R_p$, and $Q_c$ with $R_c$. Certain conditions must hold: the same markers are used for marker-assisted genetic evaluation in the PB and CC populations, and markers have not been preselected based on associations with phenotype, as would be the case for genomic selection. Under these conditions, the genetic correlations between $Q_p$ and $Q_c$ and between $R_p$ and $R_c$ are expected to equal the genetic correlation between PB and CC performance [$E(\rho_{R_{pc}}) = E(\rho_{Q_{pc}}) = \rho_{G_{pc}}$]. This holds because genetic effects associated with markers will then be composed of a random proportion of the genetic effects that contribute to each trait. The correlation between environmental components contributing to PB and CC phenotypes is undefined because a given animal is evaluated in only 1 environment.
Marker data allow estimates of effects associated with marker alleles or haplotypes to be obtained. When direct or LD-markers are used (Dekkers, 2004), marker effects can be estimated across families and under multiple environments. In the present context, it is assumed that estimates are available of the effects on both PB and CC performance of markers or haplotypes segregating in the PB population. Such estimates can be obtained from analysis of phenotype and marker genotype data from the PB and CC populations. Estimates for the CC population, however, will require identifying which alleles in the crossbreds originated from the sire line and which from the dam line (see the Discussion section). Here, we assume that estimates are obtained by fitting markers or haplotypes as random rather than fixed effects (i.e., they represent BLUP EBV), similar to what is done for polygenic EBV. Such a model was described by Meuwissen et al. (2001) for genomic selection, with estimates derived from phenotypic data and high-density SNP genotypes in a single generation.
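Fitting marker effects as random effects leads to mixed-model equations of ridge-regression form. When all markers share a common effect variance $\sigma^2_a$, these are $(X'X + \lambda I)\hat{a} = X'y$ with $\lambda = \sigma^2_e/\sigma^2_a$. The Python sketch below illustrates this on a tiny made-up data set. It assumes centered genotype codes and phenotypes already corrected for fixed effects. The function names are hypothetical, and this is not the estimation procedure of any particular study.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def marker_blup(x: Matrix, y: List[float], lam: float) -> List[float]:
    """BLUP of random marker effects for y = X a + e with a ~ N(0, I * var_a):
    solve (X'X + lam * I) a_hat = X'y, where lam = var_e / var_a."""
    k = len(x[0])
    xtx = [[sum(row[j] * row[c] for row in x) for c in range(k)] for j in range(k)]
    for j in range(k):
        xtx[j][j] += lam  # shrinks the estimates toward zero
    xty = [sum(row[j] * yi for row, yi in zip(x, y)) for j in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data: 4 animals, 2 markers (centered genotype codes), and
# centered phenotypes generated without noise from true effects (1.0, -0.5).
X = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.0, 0.0]]
y = [0.5, -1.0, 0.5, 0.0]
print(marker_blup(X, y, lam=0.0))  # least squares: recovers the true effects
print(marker_blup(X, y, lam=1.0))  # BLUP with lam > 0: estimates shrunk toward zero
```

Summing the estimated effects of an animal's alleles or haplotypes then gives its marker-based EBV.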
When based on multiple regions of the genome, or on all regions of the genome, as with genomic selection, estimates of the breeding value of a PB animal, based on the effect of its marker genotypes on the performance in population i (i = p or c), can be computed as the sum of EBV across alleles or haplotypes for each genomic region j as:
$$\hat{M}_i = \sum_j \left(\hat{\alpha}_{ij}^{pat} + \hat{\alpha}_{ij}^{mat}\right),$$
where $\hat{\alpha}_{ij}^{pat}$ and $\hat{\alpha}_{ij}^{mat}$ are the BLUP estimates of the effects in population i of the paternal and maternal marker or QTL allele for interval j, or of the paternal and maternal haplotypes. When representing the cumulative effect over multiple intervals, marker-based EBV can be modeled, as an approximation, as a polygenic trait with heritability equal to 1. This allows marker-based EBV to be incorporated into standard selection index theory and associated software (e.g., Rutten et al., 2002) for prediction of response to selection. Thus, although the marker-based EBV is an estimate, it can be viewed and modeled as a genetic trait that is inherited in a Mendelian manner. It can also be observed on individuals without error (i.e., with no environmental effect). This is possible because the marker-based EBV of a progeny can be written as the average of the marker-based EBV of its parents plus the sum of Mendelian deviations for the alleles transmitted to the progeny:
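$$\hat{M}_i^{o} = \tfrac{1}{2}\left(\hat{M}_i^{s} + \hat{M}_i^{d}\right) + \sum_j \delta_{ij},$$
where superscripts o, s, and d denote the progeny, its sire, and its dam. $\delta_{ij}$ is the Mendelian deviation for region j: summed over both parents, the estimated effect of the transmitted allele minus the average of that parent's two allele effects.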
When based on multiple QTL regions and markers, the Mendelian sampling term (the sum of Mendelian deviations) will approximately follow a normal distribution, which is what is assumed for polygenic traits. Further, because an individual's marker-based EBV is fixed conditional on its marker genotypes, it can be modeled as a trait with heritability equal to 1. Note that this assumes two things: that the parental origin of alleles or haplotypes can, if needed, be determined without error, and that estimates of marker or haplotype effects remain consistent across several generations. The latter will be true if markers are tightly linked to the QTL and if dominance and epistatic effects, which can change additive effects as allele frequencies change, are not important.
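The decomposition described above (progeny marker-based EBV equals parent average plus summed Mendelian deviations) can be checked numerically. The Python sketch below assumes hypothetical allele-effect estimates for a sire and a dam at three genomic regions. It transmits one allele per region at random from each parent and confirms that the identity holds. The region values and function names are illustrative only.

```python
import random
from typing import List, Tuple

# Hypothetical BLUP estimates of allele (or haplotype) effects:
# one (paternal, maternal) pair per genomic region j.
Genome = List[Tuple[float, float]]


def marker_ebv(genome: Genome) -> float:
    """Marker-based EBV: sum of estimated paternal and maternal effects over regions."""
    return sum(pat + mat for pat, mat in genome)


def transmit(parent: Genome, rng: random.Random) -> Tuple[List[float], float]:
    """Pass one allele per region to the progeny (no recombination within a region).

    Returns the transmitted effects and the summed Mendelian deviation, i.e., the
    transmitted effect minus the parent's average allele effect, over regions.
    """
    alleles: List[float] = []
    deviation = 0.0
    for a1, a2 in parent:
        chosen = a1 if rng.random() < 0.5 else a2
        alleles.append(chosen)
        deviation += chosen - (a1 + a2) / 2
    return alleles, deviation


rng = random.Random(1)
sire = [(0.30, -0.10), (0.05, 0.20), (-0.25, 0.15)]
dam = [(0.10, 0.10), (-0.05, 0.40), (0.00, -0.30)]
from_sire, dev_s = transmit(sire, rng)
from_dam, dev_d = transmit(dam, rng)
progeny = list(zip(from_sire, from_dam))

lhs = marker_ebv(progeny)
rhs = 0.5 * (marker_ebv(sire) + marker_ebv(dam)) + dev_s + dev_d
print(abs(lhs - rhs) < 1e-12)  # True for any set of transmitted alleles
```

With many regions, the summed deviation is a sum of many small independent terms. This is the basis for treating it as approximately normal.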
By using properties of BLUP EBV (Henderson, 1984), the relationship between marker-based EBV and genetic effects Q can be modeled as:
$$Q_i = \hat{M}_i + e_i,$$
where $e_i$ represents the prediction error for the marker-based EBV. Denote the correlation between $Q_i$ and $\hat{M}_i$ by $r_{\hat{M}_i}$. Path coefficients associating $Q_i$, $\hat{M}_i$, and $e_i$ can then be derived; they are presented in Figure 2. The correlation of $\hat{M}_i$ with $G_i$ is then equal to $r_{MG_i} = q_i r_{\hat{M}_i}$. This correlation represents the accuracy of the marker-based EBV as a predictor of the total genetic value $G_i$. It is equivalent to the accuracies of marker-based EBV obtained by Meuwissen et al. (2001) for genomic selection.
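A quick Monte Carlo check of $r_{MG_i} = q_i r_{\hat{M}_i}$ is sketched below in Python. It simulates standardized genetic values $G = Q + R$ with $\mathrm{Var}(Q) = q^2$. As a stand-in for the marker-based EBV, it takes the regression of Q on a noisy observation of Q. This gives $\mathrm{corr}(Q, \hat{M}) = r$ and makes $\hat{M}$ uncorrelated with its prediction error, mirroring the BLUP property. The values $q^2 = 0.5$ and $r = 0.8$ are arbitrary illustrations, and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_r_mg(q2: float, r: float, n: int = 50_000, seed: int = 7) -> float:
    """Monte Carlo corr(M_hat, G) with Var(G) = 1, Var(Q) = q2, corr(Q, M_hat) = r."""
    rng = random.Random(seed)
    noise_var = q2 * (1 - r * r) / (r * r)  # makes corr(Q, y) = r for y = Q + noise
    g, m_hat = [], []
    for _ in range(n):
        q = rng.gauss(0.0, math.sqrt(q2))
        res = rng.gauss(0.0, math.sqrt(1.0 - q2))
        y = q + rng.gauss(0.0, math.sqrt(noise_var))
        m_hat.append(r * r * y)  # E(Q | y): regression coefficient equals r^2
        g.append(q + res)
    return pearson(m_hat, g)


q2, r = 0.5, 0.8
print(round(simulate_r_mg(q2, r), 3), round(math.sqrt(q2) * r, 3))
```

The simulated correlation should agree with $q\,r$ up to Monte Carlo error.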
Using BLUP to estimate marker-based EBV results in a zero correlation between $\hat{M}_i$ and $e_i$, as reflected in Figure 2; that is, BLUP EBV and their prediction errors are uncorrelated (Henderson, 1984). However, when marker-based EBV are obtained from single-trait procedures, as assumed here, the prediction error of an individual's marker-based EBV for PB performance will be correlated with the prediction error of its marker-based EBV for CC performance. Prediction errors for PB (CC) performance will also be correlated with the EBV for CC (PB) performance. Assume that the phenotypic data contributing to $\hat{M}_p$ and $\hat{M}_c$ are independent, which is valid here because PB and CC phenotypes are measured on different animals. These correlations can then be derived to be equal to:
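$$\begin{aligned}
r_{e_p e_c} &= \rho_{Q_{pc}}\sqrt{\left(1 - r^2_{\hat{M}_p}\right)\left(1 - r^2_{\hat{M}_c}\right)},\\
r_{\hat{M}_p e_c} &= \rho_{Q_{pc}}\, r_{\hat{M}_p}\sqrt{1 - r^2_{\hat{M}_c}},\\
r_{e_p \hat{M}_c} &= \rho_{Q_{pc}}\, r_{\hat{M}_c}\sqrt{1 - r^2_{\hat{M}_p}}.
\end{aligned}$$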
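Note that, by path diagram theory (Lynch and Walsh, 1998), these correlations result in the correct correlation between $Q_p$ and $Q_c$:
$$\operatorname{corr}(Q_p, Q_c) = r_{\hat{M}_p} r_{\hat{M}_c}\, r_{\hat{M}_p\hat{M}_c} + r_{\hat{M}_p}\sqrt{1 - r^2_{\hat{M}_c}}\; r_{\hat{M}_p e_c} + \sqrt{1 - r^2_{\hat{M}_p}}\; r_{\hat{M}_c}\, r_{e_p \hat{M}_c} + \sqrt{\left(1 - r^2_{\hat{M}_p}\right)\left(1 - r^2_{\hat{M}_c}\right)}\; r_{e_p e_c},$$
which simplifies to $\rho_{Q_{pc}}$ after substituting the previous equations for correlations among EBV and prediction errors, together with $r_{\hat{M}_p\hat{M}_c} = r_{\hat{M}_p} r_{\hat{M}_c}\rho_{Q_{pc}}$.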
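The path identity can be checked numerically. We assume single-trait BLUP with independent PB and CC data, write $r_i$ for $r_{\hat{M}_i}$ and $\rho$ for $\rho_{Q_{pc}}$, and use the following correlations:

- $\mathrm{corr}(\hat{M}_p, \hat{M}_c) = r_p r_c \rho$
- $\mathrm{corr}(\hat{M}_p, e_c) = r_p\sqrt{1 - r_c^2}\,\rho$
- $\mathrm{corr}(e_p, \hat{M}_c) = \sqrt{1 - r_p^2}\,r_c\,\rho$
- $\mathrm{corr}(e_p, e_c) = \sqrt{(1 - r_p^2)(1 - r_c^2)}\,\rho$

The Python sketch below sums the four paths from $Q_p$ to $Q_c$ for random accuracies and confirms that the total returns $\rho$. The function name is hypothetical.

```python
import math
import random


def path_corr_q(r_p: float, r_c: float, rho_q: float) -> float:
    """corr(Q_p, Q_c) by path tracing, with Q_i = M_hat_i + e_i.

    Standardized path coefficients: M_hat_i -> Q_i is r_i and e_i -> Q_i is
    sqrt(1 - r_i^2), where r_i is the accuracy of M_hat_i for Q_i.
    """
    s_p, s_c = math.sqrt(1 - r_p**2), math.sqrt(1 - r_c**2)
    corr_mm = r_p * r_c * rho_q  # corr(M_hat_p, M_hat_c)
    corr_me = r_p * s_c * rho_q  # corr(M_hat_p, e_c)
    corr_em = s_p * r_c * rho_q  # corr(e_p, M_hat_c)
    corr_ee = s_p * s_c * rho_q  # corr(e_p, e_c)
    return (r_p * r_c * corr_mm + r_p * s_c * corr_me
            + s_p * r_c * corr_em + s_p * s_c * corr_ee)


rng = random.Random(0)
for _ in range(1000):
    r_p, r_c, rho = rng.random(), rng.random(), rng.uniform(-1.0, 1.0)
    assert math.isclose(path_corr_q(r_p, r_c, rho), rho, abs_tol=1e-12)
print("paths sum to rho_Qpc in all trials")
```

The identity holds because the four squared-path weights factor as $(r_p^2 + 1 - r_p^2)(r_c^2 + 1 - r_c^2) = 1$.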
The path coefficient diagram in Figure 2 can be used to determine the phenotypic and genetic correlations among the four "traits" (PB and CC phenotype, and PB and CC marker-based EBV) that are needed to derive selection indices. These parameters are summarized in Table 1.
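Several of the correlations needed for the selection indices follow directly from the path model. The Python sketch below derives a few of them. It assumes $q_p = q_c$, $\rho_{Q_{pc}} = \rho_{G_{pc}} = \rho$, $r_{MG}$ equal for PB and CC, and that the candidate's own record was not used to estimate marker effects. It is an illustrative derivation, not a transcription of Table 1, and the function and key names are hypothetical.

```python
import math
from typing import Dict


def index_correlations(h2_p: float, h2_c: float, r_mg: float, rho: float) -> Dict[str, float]:
    """Selected correlations among phenotypes (P), genetic values (G), and marker-based
    EBV (M) implied by the path model, under q_p = q_c and rho_Q = rho_G = rho."""
    h_p, h_c = math.sqrt(h2_p), math.sqrt(h2_c)
    return {
        "corr(P_p, M_p)": h_p * r_mg,        # own PB record vs PB marker-based EBV
        "corr(P_c, M_c)": h_c * r_mg,        # CC record vs CC marker-based EBV
        "corr(P_p, M_c)": rho * h_p * r_mg,  # cross-trait terms carry a factor rho
        "corr(P_c, M_p)": rho * h_c * r_mg,
        "corr(G_c, M_c)": r_mg,              # CC marker-based EBV as predictor of G_c
        "corr(G_c, M_p)": rho * r_mg,        # PB marker-based EBV as predictor of G_c
        "corr(G_c, P_p)": rho * h_p,         # own PB phenotype as predictor of G_c
    }


for name, value in index_correlations(0.4, 0.4, r_mg=0.6, rho=0.7).items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, each entry depends on the marker accuracies only through $r_{MG}$. Note also that a PB marker-based EBV predicts CC breeding value with correlation $\rho\, r_{MG}$, below the $r_{MG}$ of a CC marker-based EBV.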
For each selection candidate in the PB nucleus (male and female), the following sources of information were assumed to be available for derivation of selection criteria:
… ($\hat{M}_p$); and 9. own "phenotype" for marker-based EBV for CC performance ($\hat{M}_c$). Only own phenotype was included for the "trait" marker-based EBV because it has unit heritability. Including the marker-based EBV of relatives would increase the accuracy with which relatives' information is used to estimate polygenic effects; however, this was not considered here.
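To evaluate the benefit of marker-based EBV from PB or CC data, responses and rates of inbreeding were derived for selection on EBV for CC performance. These EBV were derived from the following alternative sources:

1. PB phenotypic data (PBS);
2. PB phenotypic data plus CC phenotypes of paternal half sibs (CCPS);
3. $\hat{M}_p$ alone (PB-MS);
4. PB phenotypic data and $\hat{M}_p$ (PB-MAS);
5. $\hat{M}_c$ alone (CC-MS);
6. PB phenotypic data and $\hat{M}_c$ (CC-MAS); and
7. PB and CC phenotypic data and $\hat{M}_c$ (CCPS-MAS).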
Response in CC performance to selection on each of these criteria was predicted with the program SelAction (Rutten et al., 2002). The prediction used the pseudo-BLUP selection index theory described in Bijma and van Arendonk (1998) and Bijma et al. (2001). In this program, asymptotic responses to selection per generation are derived by the approach of Wray and Hill (1989) and Villanueva et al. (1993), accounting for the Bulmer effect but not for inbreeding. Rates of inbreeding were predicted as implemented in SelAction, using the long-term genetic contribution theory developed by Woolliams and Bijma (2000).
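To show how an index combines information sources, the Python sketch below computes classical selection-index weights $b = P^{-1}g$ and the index accuracy. The candidate has two sources: its own PB phenotype and its marker-based EBV for CC performance (the CC-MAS information, without family data). This is ordinary single-generation index theory for one candidate, not the pseudo-BLUP computation with Bulmer adjustment performed in SelAction. Its accuracies should not be compared directly with the responses reported in this paper. The sketch assumes phenotypic variances scaled to 1, equal heritabilities, $q_p = q_c$, and $\rho_Q = \rho_G$. The function name is hypothetical.

```python
import math
from typing import Tuple


def index_accuracy(h2: float, rho: float, r_mg: float) -> Tuple[float, float, float]:
    """Weights (b1, b2) and accuracy of I = b1 * P_p + b2 * M_c for H = G_c.

    Assumes Var(P) = 1, h2_p = h2_c = h2, q_p = q_c, rho_Q = rho_G = rho, and r_mg > 0.
    """
    var_m = r_mg**2 * h2                  # Var(M_c) = r_MG^2 * Var(G_c)
    p = [[1.0, rho * var_m],              # Var(P_p), Cov(P_p, M_c)
         [rho * var_m, var_m]]            # Cov(M_c, P_p), Var(M_c)
    g = [rho * h2, var_m]                 # Cov(P_p, G_c), Cov(M_c, G_c)
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    b1 = (p[1][1] * g[0] - p[0][1] * g[1]) / det  # b = P^-1 g for a 2x2 system
    b2 = (p[0][0] * g[1] - p[1][0] * g[0]) / det
    var_i = b1 * g[0] + b2 * g[1]         # Var(I) = Cov(I, H) = b'g
    return b1, b2, math.sqrt(var_i / h2)  # accuracy r_IH = sd(I) / sd(H)


for r_mg in (0.2, 0.5, 0.9):
    b1, b2, acc = index_accuracy(h2=0.4, rho=0.7, r_mg=r_mg)
    print(f"r_MG={r_mg}: b1={b1:.3f}, b2={b2:.3f}, accuracy={acc:.3f}")
```

As $r_{MG}$ rises, weight shifts from the PB phenotype to the CC marker-based EBV. Multiplying the accuracy by a selection intensity and $\sigma_{G_c}$ gives a rough single-generation response.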
Choice of Parameters
The parameters followed the example of the pig breeding program used by Bijma and van Arendonk (1998):

- 20 PB males were selected per generation ($n_s = 20$).
- Each male was mated to 3 PB dams ($n_{pd} = 3$).
- Each dam produced 8 PB offspring ($n_{po} = 8$).
- Each male produced 60 CC progeny ($n_{cb} = 60$).

Heritability of both PB and CC performance was 0.4 ($h^2_p = h^2_c = 0.4$), and the genetic correlation between PB and CC performance was 0.7. The markers used were assumed to be randomly distributed across the genome, reflecting genomic selection; thus, $\rho_{G_{pc}} = \rho_{Q_{pc}} = \rho_{R_{pc}} = 0.7$. This same assumption implies that the expected proportion of genetic variance associated with markers is equal for PB and CC performance. Thus, $q_p = q_c$, which leads to $q_i r_{\hat{M}_i} = q_j r_{\hat{M}_i} = r_{MG_i}$, the correlation of the marker-based EBV with the total genetic value. Under these assumptions, with $r_{MG_i}$ as an input parameter, phenotypic and genetic correlations between phenotypes and marker-based EBV depend only on $r_{MG_i}$, not on its partition into $q_i$ and $r_{\hat{M}_i}$. This makes the results more general: they apply to any combination of $q_i$ and $r_{\hat{M}_i}$ that yields a given accuracy of marker-based EBV ($r_{MG_i}$). The resulting equations are presented in Table 1. The correlation between the two marker-based EBV ($r_{\hat{M}_p\hat{M}_c} = r_{\hat{M}_p} r_{\hat{M}_c}\rho_{Q_{pc}}$) does depend on $r_{\hat{M}_i}$. However, none of the selection criteria considered included both marker-based EBV (PB and CC), so this correlation was not needed in the calculations. To evaluate the impact of marker information, levels of $r_{MG_i}$ ranging from 0.2 to 0.9 were evaluated. In all cases, $r_{MG_i}$ was equal for PB and CC data.
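The mating design above fixes the selected proportions. With 20 sires × 3 dams × 8 offspring per dam, there are 480 PB candidates per generation, assumed to be half male (240) and half female. So 20/240 ≈ 8.3% of males and 60/240 = 25% of females are selected. The Python sketch below converts these proportions into standardized truncation-selection intensities for an infinitely large population, using the standard library's `statistics.NormalDist`. SelAction's finite-population and Bulmer corrections are not reproduced, and the function name is hypothetical.

```python
from statistics import NormalDist


def truncation_intensity(p: float) -> float:
    """Mean of the selected fraction p of a standard normal: i = phi(x) / p,
    where x is the truncation point with upper-tail probability p."""
    std = NormalDist()
    x = std.inv_cdf(1.0 - p)
    return std.pdf(x) / p


n_sires, dams_per_sire, offspring_per_dam = 20, 3, 8
n_candidates = n_sires * dams_per_sire * offspring_per_dam  # 480 PB offspring
males = females = n_candidates // 2                         # half male, half female
p_male = n_sires / males                                    # 20 / 240
p_female = n_sires * dams_per_sire / females                # 60 / 240
i_m, i_f = truncation_intensity(p_male), truncation_intensity(p_female)
print(round(i_m, 2), round(i_f, 2))
```

The stronger selection on males is why sire selection drives most of the response.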
| RESULTS |
|---|
Figures 3 and 4 show the effects of alternative selection criteria on asymptotic responses to selection and rates of inbreeding. Selection based on phenotypic data collected on PB in the nucleus (PBS) resulted in a response of 0.38 $\sigma_p$ in CC performance per generation. Adding phenotypic data from CC half sibs (CCPS) increased the response by 22%, to 0.46 $\sigma_p$ (Figure 3). However, the use of CC phenotypic data also increased the rate of inbreeding from 2.1 to 3.0% per generation (Figure 4). These results confirm those of Bijma et al. (2001) that inclusion of CC data can substantially increase rates of response but also increases rates of inbreeding. The added CC data are on individuals that are paternal half sibs of the PBS candidates. This increases the correlation of EBV among full and half sibs and thus the probability that individuals ranking high for EBV are related to each other (Bijma et al., 2001).
Selection on marker-based EBV derived from PB data ($\hat{M}_p$), alone (PB-MS) or combined with PB phenotypic records (PB-MAS), does not substantially increase genetic improvement of CC performance compared with selection on PB phenotype alone. This is because the marker-based EBV are derived from a trait that differs from the trait in the selection objective (i.e., PB vs. CC performance). Only with an accuracy of PB marker-based EBV ($r_{MG_p}$) greater than 0.7 was the response from PB-MS greater than the response from PBS. Even combining $\hat{M}_p$ with PB phenotypic data (PB-MAS) resulted in a lower response than CCPS when $r_{MG_p}$ was as high as 0.9. The use of marker data derived at the PB level did, however, reduce rates of inbreeding (Figure 4), including when $\hat{M}_p$ was selected on alone (PB-MS). When $\hat{M}_p$ was combined with PB phenotypic data (PB-MAS), rates of inbreeding decreased with increasing values of $r_{MG_p}$ to as low as 1.1%. This is substantially lower than with selection on phenotype alone through either PBS or CCPS. Rates of inbreeding are lower with MS or MAS because marker data provide information on Mendelian sampling terms. This differentiates among relatives, including full sibs, and reduces the probability of coselecting relatives.
Figure 3 clearly shows the potential of marker-based EBV derived from CC phenotypes ($\hat{M}_c$) to increase responses to selection. When selection was exclusively on marker-based EBV from CC data (CC-MS), responses exceeded those of PBS for $r_{MG_c} \geq 0.52$ and exceeded those of CCPS for $r_{MG_c} > 0.63$. For $r_{MG_c} > 0.8$, responses from CC-MS were more than 54 and 25% greater than responses from PBS and CCPS, respectively. Combining $\hat{M}_c$ with PB phenotypic data (CC-MAS) resulted in a greater response than CCPS for $r_{MG_c} > 0.5$. The extra response from adding PB or CC phenotypic data to $\hat{M}_c$ was limited for $r_{MG_c} > 0.8$. In addition, for $r_{MG_c} > 0.6$, adding CC data to selection on PB phenotype and $\hat{M}_c$ (CCPS-MAS compared with CC-MAS) yielded extra responses of less than 6%.
Similar to selection on $\hat{M}_p$, selection on $\hat{M}_c$ substantially reduced rates of inbreeding compared with selection on phenotypic data. This held both when selecting on marker-based EBV alone (CC-MS) and in combination with phenotypic data (CC-MAS; Figure 4). Because $\hat{M}_c$ is more informative than $\hat{M}_p$, reductions in rates of inbreeding were even greater for CC-MAS than for PB-MAS when marker-based EBV were combined with phenotypic data. At $r_{MG_c} = 0.6$, rates of inbreeding were 1.4% for CC-MAS, compared with 1.8% for PB-MAS and 2.1% for PBS. Notably, adding CC phenotypic data to CC-MAS (CCPS-MAS) substantially increased the rate of inbreeding, from 1.4 to 2.1% at $r_{MG_c} = 0.6$.
| DISCUSSION |
|---|
Benefits of MAS for CC Performance
MAS has shown great promise for increasing response to selection when phenotype-based selection strategies are limiting (see Dekkers, 2004, for a review). Its benefit for improving CC performance will nonetheless be limited if marker effects are estimated from PB data collected under nucleus-type environments. That is most frequently the case, because the nucleus is where the required DNA samples, phenotypic data, and pedigrees are available. The resulting PB marker-based EBV, however, are strictly relevant only to the studied population and environment. As demonstrated here, they may not help much to improve selection for CC performance if substantial genotype by environment and genotype by genetic background interactions are present. Thus, for MAS to be effective, the effects of markers must be estimated from CC data. Estimation of such effects is discussed in the next section. If such estimates are available, the results presented here clearly demonstrate the benefit of their use in selection. They outperform both phenotypic records on CC relatives and marker-based EBV estimated from PB data. As demonstrated herein, CC-MAS not only increases response to selection but also reduces rates of inbreeding. The latter holds in particular relative to CCPS, which puts extra emphasis on family information and thereby increases the coselection of relatives. If implemented with LD-markers (see below), CC-MAS would not require routine recording of pedigreed phenotypic data on CC individuals, in contrast to CCPS. Instead, as argued in the next section, the effects of markers on CC performance could be estimated from phenotypes and marker genotypes on a random sample of CC individuals and used for several generations of selection. These effects are what is needed to compute marker-based EBV for PB selection candidates.
One of the benefits demonstrated herein for CC-MAS, and for MAS in general, is a reduction in rates of inbreeding. This occurs because markers improve estimates of Mendelian sampling effects and reduce the emphasis on family information. CCPS, in contrast, increases the emphasis on family information by adding informative records on CC performance of half sibs of the selection candidates. In the analyses conducted here, selection was on EBV regardless of the resulting rates of inbreeding. Efforts to control inbreeding may further increase the emphasis that should be placed on marker data, which would further reduce the benefits of CC phenotypic data. For example, consider adding CC phenotypic data to a strategy based on PB phenotype and CC marker information (CCPS-MAS compared with CC-MAS). For $r_{MG_c} > 0.6$, this yielded extra responses of less than 6%. It also substantially increased the rate of inbreeding, from 1.4 to 2.1% per generation at $r_{MG_c} = 0.6$.
Implementation of CC-MAS
Implementation of CC-MAS requires estimates of the effects on CC performance of QTL or markers that segregate within the PB. This requires phenotypic and genotypic data collected at the CC level. The extent and nature of the data that must be collected depend on whether the markers used are in LD or in linkage equilibrium (LE) with QTL in the PB.

CC-MAS can, in principle, be implemented with markers that are in LE in the PB (LE-markers; Dekkers, 2004). However, their use requires cosegregation of markers and QTL to be modeled, which requires pedigree records on CC individuals. Thus, markers that are in LD with QTL in the PB (LD-markers; Dekkers, 2004) or, ideally, direct markers (Dekkers, 2004) would be preferred. Such markers, which include candidate gene markers, can in principle be analyzed in nonpedigreed samples because their effects tend to be consistent across families. This removes the need for pedigree records on CC animals. In addition, the ability to estimate effects across families reduces the number of phenotypic records that must be collected within and across generations.

Estimating LD-marker effects from CC data is, however, complicated because crossbreds carry a mixture of the marker-QTL associations specific to each breed in the cross. These associations can differ between breeds in the extent and even the direction of LD. Thus, in a 2-way cross, the effect of the 1/2 genotype for a marker may depend on whether allele 1 came from the sire or the dam breed. Knowledge of breed-specific associations is essential for MAS within the pure lines. It requires strategies to trace alleles from the crossbreds back to the PB. This may require genotyping multiple markers within a genomic region, allowing marker haplotypes to be traced back.
Thus, the basic steps for the proposed strategy of CC-MAS using LD-markers are as follows:
This strategy can be applied to targeted QTL or candidate gene regions, or it can be implemented across the genome by using high-density marker genotypes and genomic selection (Meuwissen et al., 2001).
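One simple way to trace a crossbred's alleles to the breed of origin without pedigree records is probabilistic assignment from allele frequencies in the two pure lines. The Python sketch below applies this to a single marker. It is a swapped-in illustration, not the method proposed here. It assumes the crossbred received one gamete from each line, that gametes carry alleles at the line's allele frequencies, and that genotypes are error-free. The line frequencies and function name are hypothetical.

```python
from typing import Dict, Tuple


def p_first_allele_from_sire_line(
    genotype: Tuple[str, str],
    freq_sire_line: Dict[str, float],
    freq_dam_line: Dict[str, float],
) -> float:
    """Probability that genotype[0] is the allele inherited from the sire line.

    Assumes a 2-way crossbred received one gamete from each pure line, that
    gametes carry alleles at the line's allele frequencies, and no genotyping error.
    """
    a, b = genotype
    if a == b:
        return 1.0  # homozygote: the sire-line copy is this allele with certainty
    w_ab = freq_sire_line.get(a, 0.0) * freq_dam_line.get(b, 0.0)  # a from sire line
    w_ba = freq_sire_line.get(b, 0.0) * freq_dam_line.get(a, 0.0)  # b from sire line
    return w_ab / (w_ab + w_ba)  # undefined if neither configuration is possible


sire_line = {"1": 0.9, "2": 0.1}  # hypothetical allele frequencies in the sire line
dam_line = {"1": 0.2, "2": 0.8}   # hypothetical allele frequencies in the dam line
p = p_first_allele_from_sire_line(("1", "2"), sire_line, dam_line)
print(round(p, 3))
```

Heterozygotes are resolved confidently only when allele frequencies differ sharply between lines. This is why multi-marker haplotypes, which can be treated as multi-allelic "alleles" in the same calculation, help.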
Although a 2-way breed cross was considered here, the principles developed can also be applied to 3- or 4-way crosses. In those cases, however, estimation of effects of markers on CC performance will be complicated by the more complex mixture of PB haplotypes that will be present in the CC population, which will make it more difficult to track marker alleles or haplotypes back to the parental lines and may reduce the accuracy of marker-based EBV.
Deterministic Model for MAS
Another important contribution of this paper is a model that uses pseudo-BLUP selection index theory to evaluate the incorporation of marker information in selection strategies. Although developed here for a specific purpose, the same theory can be applied to other cases of MAS. The main concept was to model marker-based EBV as a separate trait with heritability equal to 1 and to use BLUP and path coefficient theory to derive the associated variances and covariances. The resulting methodology allows marker information to be included in standard selection index software such as SelAction (Rutten et al., 2002). Schrooten et al. (2005) also included QTL information in SelAction as a trait with unit heritability, but only for a single-trait situation. Multivariate normality of marker-based EBV was assumed here, but deriving selection index weights does not require this assumption. Such weights can be applied even to MAS with 1 QTL or gene (e.g., Dekkers and Settar, 2003). However, using these indices to predict response to selection and inbreeding does rely on the fundamental assumption of multivariate normality. This assumption will be valid if marker-based EBV are based on a substantial number of markers or QTL regions. In that case, the central limit theorem implies an approximately normal distribution of marker-based EBV, allowing them to be modeled as a polygenic trait. The validity of this assumption depends on the number of markers included in the marker-based EBV and on the distribution of marker effects, but it will hold approximately for genomic selection.
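The central-limit argument can be made concrete for additive biallelic loci. Under Hardy-Weinberg equilibrium, an allele count is Binomial(2, p), with skewness $(1 - 2p)/\sqrt{2p(1 - p)}$. For a sum of n independent loci with equal effects and frequencies, the skewness shrinks as $1/\sqrt{n}$. The Python sketch below tabulates this. The equal effects and frequencies are simplifying assumptions, since unequal effects slow the approach to normality, and the function names are hypothetical.

```python
import math


def locus_skewness(p: float) -> float:
    """Skewness of an allele count ~ Binomial(2, p) under Hardy-Weinberg equilibrium."""
    return (1 - 2 * p) / math.sqrt(2 * p * (1 - p))


def sum_skewness(p: float, n_loci: int) -> float:
    """Skewness of the sum of n independent, identically distributed loci."""
    return locus_skewness(p) / math.sqrt(n_loci)


for n in (1, 10, 100, 1000):
    print(n, round(sum_skewness(0.1, n), 3))
```

A single locus at p = 0.1 is strongly skewed. With hundreds of loci, the skewness is close to zero, consistent with treating marker-based EBV from genomic selection as approximately normal.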
The model also assumes that the accuracy of marker-based EBV remains constant over generations, apart from the Bulmer effect. If LD between markers and QTL is not complete, or if dominance and epistatic effects play a role, marker effects will need to be reestimated regularly to maintain accuracy. The model also assumed that marker effects were estimated from phenotypic data independent of the data used for phenotype-based EBV. This will be approximately true for LD-markers, because marker effects will be estimated from a sample across families, which limits the impact of individual families on marker estimates. Although the model was developed using a single-trait concept, it can in principle be expanded to simultaneous selection on multiple traits.
It is clear that, ultimately, the deterministic predictions developed here must be validated by stochastic simulation. This was beyond the scope of the current study because of the complexity of the simulation and genetic evaluation models that would be required; however, this is the subject of ongoing research. Nevertheless, the developed models are based on established theory that has been validated under the infinitesimal model and should also apply with use of marker-based EBV under the assumption of normality. The developed models therefore allow an initial assessment of the benefit of marker-based information and can provide the basis for further development of deterministic models for MAS that allow rapid assessment of alternate strategies of selection.
The correlation of the marker-based EBV with the total genetic value for PB ($r_{MG_p}$) or CC performance ($r_{MG_c}$) is an important parameter for evaluating the benefit of MAS and was used as an input in the present analyses. Parameter $r_{MG_i}$ is the product of 2 parameters ($r_{MG_i} = q_i r_{\hat{M}_i}$). The first is the square root of the proportion of genetic variance explained by markers ($q_i$), which for LD-markers depends on the average LD between markers and QTL. The second is the accuracy with which the effects of QTL associated with markers are estimated ($r_{\hat{M}_i}$). This accuracy depends on the amount and structure of the phenotypic data and on the statistical model used to estimate marker effects.
Although the model was developed allowing for separate accuracies of marker-based EBV for PB and CC performance and for separate genetic correlations between PB and CC performance for genetic effects associated with markers and residual polygenic effects, genetic correlations were assumed to be equal in the scenarios that were evaluated. This assumption is expected to be valid if the same markers are used in both populations and if markers are random across the genome. If markers are selected based on significance, then genetic correlations may differ between marker-associated and residual polygenic effects.
In the present paper, the trait was assumed to be affected only by additive effects. The proposed strategies for CC-MAS, however, enable selection for improved performance of a breed when mated to the other breed or breeds that contribute to the cross, that is, selection for specific combining ability with the other breed(s). To further capitalize on nonadditive effects at specific loci, the strategies developed by Dekkers and Chakraborty (2004) could be applied, although they will require extension to multiple loci.
In summary, this study shows that MAS can overcome the limitations of current pig breeding programs in improving CC performance, by using estimates of marker effects on CC performance. Estimating these effects has become feasible with recent advances in molecular technology. When implemented with large numbers of markers, CC-MAS will increase the response in CC performance and reduce rates of inbreeding, without requiring extensive pedigree recording at the field level. CC-MAS will also enable selection for disease and survival traits that cannot be recorded under the biosecure environments in which most PB seed stock populations are kept.
| Footnotes |
|---|
2 Corresponding author: jdekkers{at}iastate.edu
Received for publication October 11, 2006. Accepted for publication May 8, 2006.
| LITERATURE CITED |
|---|