ANIMAL GENETICS
* Animal and Dairy Science Department, University of Georgia, 425 River Road, Athens 30602;
Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, León 24071, Spain; and
Instituto Nacional de Investigación Agropecuaria, Las Brujas, Uruguay
| Abstract |
|---|
0.99 for BWT and WWT and
0.98 for YWT); also, some bulls dropped from the top 100 list when these lists were compared across methods. For maternal effects, the estimated correlations were slightly smaller, particularly for YWT; again, some animals dropped out of the top 100. As in regular MT multibreed genetic evaluations, the heterosis effects and the additive genetic effects of the breed could not be estimated from field data, because there were not enough contemporary groups with the proper composition of purebred and crossbred animals; thus, prior information based on literature values had to be included. The inclusion of prior information had a negligible effect on the overall ranking of bulls with more than 20 progeny with birth weight records; however, the effect of prior information for breeds or groups poorly represented in the data was important. The Pearson correlations for direct and maternal WWT and YWT ranged from 0.95 to 0.98 when comparing evaluations of data sets for which the out-of-range age records were removed or retained. Random regression avoids discarding records that fall outside the usual age ranges of measurement; thus, greater accuracies are achieved, and greater genetic progress could be expected.
Key Words: beef cattle, growth, multibreed, random regression model, spline
| INTRODUCTION |
|---|
The conventional approach for multibreed (MB) genetic evaluation of growth traits in the United States has been to use multitrait (MT) models that treat weights recorded at different ages as different traits. The use of MT models creates the need for the establishment of age ranges and the elimination of measurements recorded outside them. This could lead to more data losses as more animals are measured outside these age ranges. Legendre polynomial random regression models have been proposed as an alternative to MT models for the genetic evaluation of growth data of varying ages (Albuquerque and Meyer, 2001; Meyer, 2004). Recently, linear random regression-spline (RR-spline) models have received major attention (Bohmanova et al., 2005; Iwaisaki et al., 2005; Robbins et al., 2005), and Misztal (2006) has shown properties of linear spline functions used in random regression models.
The general aim of this study was to determine the suitability of the RR-spline model for evaluating growth in MB populations. The specific objectives were to investigate the effect of prior information on various estimates and to compare EBV from the RR-spline and MT models.
| MATERIALS AND METHODS |
|---|
Animal Care and Use Committee approval was not obtained, because the data used in this study were provided by the American Gelbvieh Association (AGA) and included the information used to produce the June 2006 AGA Sire Summary.
Two data sets were formed depending on the assumed range of ages for considering weaning weight (WWT) and yearling weight (YWT) as valid. In one case, the ranges of ages were 160 to 250 d and 305 to 410 d for WWT and YWT, respectively (data set D1), and in the other, the ranges were 50 to 285 d and 286 to 600 d (data set D2). Table 1 shows the descriptive statistics for these data sets. There was a 17 and 35% increase in WWT and YWT records, respectively, from D1 to D2. However, nearly 71 and 76% of the records measured outside the D1 age ranges were within 20 d of the age range bounds.
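The two editing rules can be illustrated with a small filter. This is only a sketch under stated assumptions: the names `AGE_RANGES` and `classify_weight` are hypothetical, and the bounds are treated as inclusive, which the text does not state.

```python
# Age windows (days) that define valid WWT and YWT records in each data set.
# Treating the bounds as inclusive is an assumption of this sketch.
AGE_RANGES = {
    "D1": {"WWT": (160, 250), "YWT": (305, 410)},
    "D2": {"WWT": (50, 285), "YWT": (286, 600)},
}


def classify_weight(age_days, data_set):
    """Return 'WWT', 'YWT', or None for a postnatal weight taken at age_days."""
    for trait, (low, high) in AGE_RANGES[data_set].items():
        if low <= age_days <= high:
            return trait
    return None  # record falls outside every window and would be discarded


# A few illustrative ages: records lost under D1 can be retained under D2.
for age in (155, 205, 270, 365, 450):
    print(age, classify_weight(age, "D1"), classify_weight(age, "D2"))
```

Under these rules a weight taken at 270 d is discarded in D1 but kept as a WWT record in D2, which is the kind of record the RR-spline analysis of D2 is meant to recover.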
In our study, D1 was analyzed using both an MT and an RR-spline model, and D2 was analyzed using only the RR-spline model. In both analyses, BWT and WWT were precorrected for age of the dam according to the equations for the Gelbvieh breed (Beef Improvement Federation, 1996).
Because the animals contained in the breed groups in the AGA data set cannot be considered a random sample from the breed, it is more appropriate to term the effect of breed groups as breed of founder (BOF) effects. There were 54 breeds represented in this data set, either as a purebred or as a significant breed proportion (1/8) of a crossbred animal. Due to the large number of breeds and the small number of animals available for several of the breed groups in the data set, many of the breeds were grouped together into common BOF groups based on origin and trait characteristics of the breeds. Thus, the additive direct and maternal genetic BOF group effects fit in the models were Gelbvieh, Angus, Hereford, Shorthorn, Simmental, Limousin, Charolais, British Beef, British Dairy, Continental Beef, Continental Dairy, and Zebu. The BOF effect was fit as unknown parent groups using the rules of Westell et al. (1988). To do this, the pedigree was traced back for all of the animals until purebred parents were obtained; if a crossbred animal was found to be the final known ancestor, phantom parents with the appropriate breed composition were added until purebred (phantom) ancestors were obtained. To account for the trend in BOF effects, they were defined in time intervals of 5 yr beginning before 1979 so that, in essence, BOF × generation group effects were contained in the models. For each of the BOF × generation group effects, prior information was considered. It was assumed that the BOF trend within each breed group follows an autocorrelated process, as described by Klei et al. (1996), with an autocorrelation coefficient of 0.9. Thus, a priori BOF effects for each breed and trait were considered to follow a multivariate normal process with means equal to literature values and variance-covariance matrix $H_t$, defined by
[Equation image missing: definition of $H_t$]
where $\sigma^2_{a,t}$ = the additive genetic variance of trait t for the MT model or the additive genetic variance at knot t for the RR-spline model, and the remaining symbol = a scale parameter that allowed less or more weight to be given to the literature values relative to the data. The BOF × generation group effects between breeds were assumed to be uncorrelated.
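A prior covariance of this kind can be sketched numerically. The form used below, scale × additive variance × 0.9 raised to the number of 5-yr intervals separating two groups, is an assumption consistent with the description (an autocorrelated process scaled by the additive genetic variance and a scale parameter), not a transcription of the original equation; the function name and the variance value are hypothetical.

```python
import numpy as np


def bof_prior_covariance(n_periods, sigma2_a, scale, rho=0.9):
    """Prior covariance among the BOF x generation effects of one breed group.

    Assumed form: scale * sigma2_a * rho**|i - j|, where i and j index
    consecutive 5-yr intervals and rho is the autocorrelation coefficient.
    """
    idx = np.arange(n_periods)
    lags = np.abs(idx[:, None] - idx[None, :])  # number of intervals apart
    return scale * sigma2_a * rho ** lags


# Hypothetical additive variance of 10 (trait units squared), medium prior weight.
H = bof_prior_covariance(n_periods=4, sigma2_a=10.0, scale=1.0e-2)
print(np.round(H, 4))
```

Because effects of different breed groups are assumed to be uncorrelated a priori, the full prior covariance would be block diagonal, with one such block per breed group.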
In the AGA database, there were many breeds involved in crossbred combinations, and most of these combinations had few observations. Also, several crosses present in the AGA data had few or no literature heterosis estimates available. Therefore, it would not be possible to estimate heterosis for each individual F1 breed cross contained in the data. Thus, breeds were grouped into superbreed categories to account for heterosis. In the case of the heterosis effects, 3 groups were formed: Continental, British, and Zebu. Therefore, 6 direct and 6 maternal heterosis effects were possible, in which heterosis was assumed to have occurred when animals of different breeds were crossed, even if the different breeds were in the same superbreed category. For a particular heterosis effect k for trait t or at knot t, the effect was a priori assumed to be normally distributed, with this specification:
[Equation image missing: normal prior distribution of heterosis effect k for trait t or at knot t]
where $\mu_{h_{tk}}$ = the prior mean, which was equal to the literature value; $\sigma^2_{e,t}$ = the residual variance of trait t or the residual variance at knot t; and the remaining symbol = a scale parameter that, as in the case of BOF, allowed less or more weight to be given to the literature values.
The assumed means of the prior distributions both for BOF and heterosis effects are the same as those used by Legarra et al. (2007). They are a summary of previously reported estimates from experiments estimating crossbreeding parameters in beef cattle.
Another source of prior information included in the analysis was the EBV of Angus and Red Angus bulls predicted in each of the 2 breeds within their respective association genetic evaluation program. The EBV and the prediction error variances derived from the accuracy values were considered as prior mean and prior variance for these animals. It was also assumed that all the animals with external breeding values included were unrelated. Quaas and Zhang (2006) and Legarra et al. (2007) describe in detail how to include evaluations from other breed associations as prior information.
Statistical Models
MT. Under the MT model, the record of trait t, belonging to the ith animal, offspring of the dth dam, and produced in the jth contemporary group, was explained by this linear model:
[Equation image missing: MT model equation]
where
[Equation image missing: definition of a term of the MT model]
In the random part of the model, $a_{ti}$ = the additive genetic effect of animal i for trait t; $mg_{td}$ = the maternal additive genetic effect of dam d for trait t; $me_{td}$ = the maternal environmental effect of dam d for trait t; and $e_{tjid}$ = the random residual effect for trait t of animal i.
In the fixed part of the model, $CG_{tj}$ is the contemporary group (CG) effect j for trait t; for BWT, it was defined based on sex, breeder-assigned group, and 90-d group definition; for WWT, it was defined based on sex, breeder-assigned group, and weaning weigh date; and for YWT, it was defined based on WWT CG, breeder-assigned group, and yearling weigh date. Also, all records in single-calf CG were pruned. The age term of the model is a cubic polynomial function of the deviation of the actual age of the animal at the time of weighing from the expected measurement age, $MES_t$ (205 d for WWT and 365 d for YWT). This polynomial function was not included in the model for BWT. In addition, $Dh_{ki}$ = the coefficient of expected fraction of F1 heterosis in individual i for the group combination k. These coefficients were computed based on the assumption that heterosis effects are primarily due to genetic dominance interactions, as indicated by Klei et al. (1996). For the group combination k, which involves the breeds f and h, the equation for computing this coefficient for animal i is:
[Equation image missing: coefficient $Dh_{ki}$ as a sum over sire and dam breed proportions]
where n = the number of different breeds; $p_j^{sire_i}$ = the proportion of breed j in the sire of animal i; $p_g^{dam_i}$ = the proportion of breed g in the dam of animal i; and I(·) = an indicator function that takes the value 1 if the argument is satisfied and the value 0 otherwise. Similarly, $Mh_{kd}$ = the coefficient of expected fraction of F1 heterosis in the dam d of the individual i for the group combination k and was computed in the same way as $Dh_{ki}$, where the breed composition in the parents of the dam was the determining factor. In addition, $D\beta_{tk}$ and $M\beta_{tk}$ were linear regression coefficients for the expected fraction of F1 heterosis for the trait t and the group combination k, for direct and maternal heterosis, respectively.
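The computation of these coefficients can be sketched as follows. The indicator is assumed here to equal 1 when the sire breed and the dam breed differ and their superbreed categories form the combination k (unordered); this reading follows the verbal description of heterosis given earlier, but the exact argument of I(·) is not recoverable from the text. The breed-to-superbreed mapping and the function name are hypothetical.

```python
# Hypothetical mapping of a few breeds to the 3 superbreed categories.
SUPERBREED = {
    "Gelbvieh": "Continental",
    "Simmental": "Continental",
    "Angus": "British",
    "Hereford": "British",
    "Brahman": "Zebu",
}


def heterosis_coefficients(sire_comp, dam_comp):
    """Expected fraction of F1 heterosis per superbreed combination.

    sire_comp and dam_comp map breed name -> breed proportion (summing to 1).
    Assumption: heterosis is expressed whenever the two breeds differ, even
    within the same superbreed category.
    """
    coef = {}
    for breed_j, p_j in sire_comp.items():
        for breed_g, p_g in dam_comp.items():
            if breed_j == breed_g:
                continue  # same breed on both sides: no heterosis
            combo = tuple(sorted((SUPERBREED[breed_j], SUPERBREED[breed_g])))
            coef[combo] = coef.get(combo, 0.0) + p_j * p_g
    return coef


# Purebred Gelbvieh sire mated to a 1/2 Angus, 1/2 Gelbvieh dam.
print(heterosis_coefficients({"Gelbvieh": 1.0}, {"Angus": 0.5, "Gelbvieh": 0.5}))
```

The maternal coefficient would be obtained by calling the same function with the breed compositions of the sire and dam of the dam.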
In the MT model, the variance-covariance structure was:
[Equation image missing: variance-covariance structure of the random effects in the MT model]
where A = the usual matrix of additive relationships between animals; I = the identity matrix; and the variance-covariance matrices between traits G, MG, Cov(a,mg), ME, and R were equal to those reported by Legarra et al. (2004).
RR-spline. Under the RR-spline model, the tth measurement on animal i, offspring of the dam d, and produced in the jth contemporary group, was explained by this linear model:
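[Equation image missing: RR-spline model equation]
where
[Equation image missing: definition of a term of the RR-spline model]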
The random effect part of the model was fitted using linear spline functions, with knots at 1, 205, and 365 d of age. Thus, $a_{ik}$, $mg_{dk}$, $p_{ik}$, and $me_{dk}$ represent the coefficients for the knot k in the linear splines describing the additive genetic value for animal i, the additive genetic maternal value of dam d, the permanent environmental effect of animal i, and the maternal environmental effect of dam d, respectively. In addition, $\phi_k(t_{ti})$ represents the value of the kth covariate at time t in individual i. These covariates were computed as described by Misztal (2006). When $t_{ti}$ was beyond the last knot, the covariate for this last knot was computed assuming an increasing function of time [i.e., ($t_{ti}$ - 365)/(365 - 205)] for direct effects and constant (i.e., 1) for the maternal effects.
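The covariates of a linear spline are the interpolation weights on the 2 knots adjacent to the age of measurement. The sketch below assumes this standard definition; the function name is hypothetical. It clamps ages outside the knot range to the nearest end knot, which matches only the constant rule described for maternal effects; the increasing extrapolation used for direct effects beyond 365 d is not reproduced here.

```python
import numpy as np

KNOTS = (1.0, 205.0, 365.0)  # knot ages in days, as in the text


def linear_spline_covariates(age, knots=KNOTS):
    """Return one covariate per knot for a record taken at the given age.

    Between two knots, the covariates are the linear interpolation weights
    on those knots and all other covariates are zero. Ages outside the knot
    range are clamped (constant rule; an assumption of this sketch).
    """
    knots = np.asarray(knots, dtype=float)
    age = float(np.clip(age, knots[0], knots[-1]))
    cov = np.zeros(len(knots))
    # Index of the left knot of the segment that contains the age.
    left = min(int(np.searchsorted(knots, age, side="right")) - 1, len(knots) - 2)
    width = knots[left + 1] - knots[left]
    cov[left + 1] = (age - knots[left]) / width
    cov[left] = 1.0 - cov[left + 1]
    return cov


# Hypothetical knot coefficients (EBV at 1, 205, and 365 d) for one animal.
ebv_at_knots = np.array([1.0, 20.0, 35.0])
print(linear_spline_covariates(285.0))                 # weights on the 3 knots
print(linear_spline_covariates(285.0) @ ebv_at_knots)  # EBV at 285 d
```

At a knot, the covariate vector reduces to a unit vector, which is why the knot coefficients can be read directly as values at 1, 205, and 365 d.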
Regarding the fixed part of the model, contemporary groups were defined in the same way as in the MT model. The age of the animals at measurement was fitted using a 3-knot linear spline, with covariates as in the direct genetic random part of the model, and coefficients $A\beta_k$. The heterosis effects were considered to be a function of time. This function was again a linear spline, with knots at 1, 205, and 365 d for the direct heterosis, similar to the random direct part of the model. Beyond the last knot, the covariate $\phi_3$ (ages > 365 d) was computed assuming the same increasing function of time. For the maternal heterosis, this function was considered, as for the maternal random part, to be constant.
In the RR-spline model, the variance-covariance structure was:
[Equation image missing: variance-covariance structure of the random effects in the RR-spline model]
The difference with the previous model was that R was split into 2 terms. The P matrix included the residual covariances in the MT model and the permanent environmental variances. The residual variance, which in this model was assumed to be heterogeneous, was described as a function of age:
[Equation image missing: residual variance as a function of age]
This is an exponential function of a linear spline; the coefficients were obtained from Legarra et al. (2004), and the covariates $\phi_k(t_i)$ were computed in the same way as the covariates of the linear splines of the direct genetic random effects. Legarra et al. (2004) obtained these coefficients by splitting the residual (co)variance matrix from the MT definition of growth into a matrix of random permanent environmental (co)variances and a diagonal matrix of residual variances using least squares procedures.
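An exponential function of a linear spline can be sketched as below. Because the spline covariates form a unit vector at each knot, each coefficient equals the logarithm of the residual variance at that knot, and the variance between knots is a geometric interpolation. The variances used are hypothetical placeholders, not the values of Legarra et al. (2004), and clamping outside the knot range is an assumption.

```python
import math

KNOTS = (1.0, 205.0, 365.0)  # knot ages in days
# Hypothetical residual variances (kg^2) at the knots; NOT the published values.
VAR_AT_KNOTS = (8.0, 300.0, 500.0)
LOG_COEF = [math.log(v) for v in VAR_AT_KNOTS]  # spline coefficients on the log scale


def residual_variance(age):
    """Residual variance at a given age: exp(linear spline of the log variances)."""
    age = min(max(age, KNOTS[0]), KNOTS[-1])  # clamp outside the knots (assumption)
    for k in range(len(KNOTS) - 1):
        low, high = KNOTS[k], KNOTS[k + 1]
        if age <= high:
            w = (age - low) / (high - low)  # weight on the right-hand knot
            return math.exp((1.0 - w) * LOG_COEF[k] + w * LOG_COEF[k + 1])
    raise AssertionError("unreachable: age was clamped to the knot range")


for age in (1, 103, 205, 285, 365):
    print(age, round(residual_variance(age), 1))
```

Working on the log scale keeps the residual variance positive at every age, which a linear spline fitted directly to the variances would not guarantee.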
In the case of the RR-spline model, the matrices G, MG, Cov(a,mg), P, and ME refer to variance-covariances between coefficients of the spline functions. But because the knots were placed at 1, 205, and 365 d, which were assumed to be the ages at birth, weaning, and yearling, these matrices were equal to those used in the MT analysis.
For both models, different values of the prior scale parameter were assessed, but the same value was employed for both BOF and heterosis effects. The values were 1.0e+2, 1.0e-2, and 1.0e-5, representing weak, medium, and strong weight on prior information, respectively. All of the analyses were carried out using a version of BLUP90-IOD (Tsuruta et al., 2001) modified to incorporate prior information; details on these modifications were described by Legarra et al. (2007).
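The role of the scale parameter can be seen in a deliberately simplified case: a single effect estimated from n independent records, with a normal prior whose variance is the scale parameter times the residual variance (the form described for heterosis effects). This toy case is an assumption made for illustration; it ignores every other effect in the mixed model, and the data values are hypothetical.

```python
def posterior_mean(data_mean, n_records, prior_mean, scale):
    """Posterior mean of one effect with prior N(prior_mean, scale * sigma2_e).

    With n_records independent observations of residual variance sigma2_e,
    sigma2_e cancels, so the prior acts like 1 / scale pseudo-records.
    """
    prior_weight = 1.0 / scale
    return (n_records * data_mean + prior_weight * prior_mean) / (n_records + prior_weight)


# Hypothetical effect: data suggest 10 kg from 50 records; literature value is 4 kg.
for scale in (1.0e2, 1.0e-2, 1.0e-5):
    estimate = posterior_mean(data_mean=10.0, n_records=50, prior_mean=4.0, scale=scale)
    print(f"scale={scale:g}: estimate={estimate:.3f}")
```

With the largest scale the estimate stays close to the data mean, and with the smallest it is essentially the literature value, which is why the 3 values correspond to weak, medium, and strong weight on the prior.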
Comparisons between models were done in terms of estimated values for some fixed effects, correlations between EBV of bulls, genetic trends, and approximated accuracies. Approximated accuracies were computed using the procedure described by Tier and Meyer (2004), and prior information was not considered.
| RESULTS |
|---|
Pearson correlations between the EBV of sires with more than 20 offspring with BWT records obtained after MT or RR-spline analysis of D1 for each of 3 scenarios of prior information are shown in Table 3. Fairly good agreement was observed between models; these correlations ranged from 0.92 to 0.99. As expected, the lowest values were observed for WWT and YWT, because of the age variation within these 2 traits. For BWT, there was no variation in age; thus, the models were almost the same. When the top 100 ranked bulls were compared across methods (Table 3), for all the scenarios of prior information and for direct WWT and YWT effects, from 12 to 25 bulls dropped from the top 100 when the MT model was used instead of the RR-spline model. For maternal effects, the number of bulls that differed in the top 100 between the 2 models was slightly larger, from 16 to 39. However, in the case of direct effects, many bulls not included in the top 100 were close to the threshold; the average distances of the dropped bulls to position 100 ranged from 3 (BWT, medium weight on the prior) to 45 (YWT, strong weight on the prior) positions. For maternal effects, these distances were larger; for example, with a weak prior for WWT, the average distance was 177 positions.
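The 3 summary statistics used in this comparison (Pearson correlation, number of bulls dropped from the top 100, and their average distance to position 100) can be computed as in the following sketch. The function name is hypothetical, the EBV are simulated rather than taken from the study, and ranking by the highest EBV is an assumption.

```python
import numpy as np


def compare_rankings(ebv_a, ebv_b, top=100):
    """Compare EBV from evaluations A and B for the same bulls.

    Returns the Pearson correlation, the number of bulls in the top list of A
    that are not in the top list of B, and the average number of positions by
    which those bulls fall below the cutoff in B. Rank 1 = highest EBV.
    """
    ebv_a = np.asarray(ebv_a, dtype=float)
    ebv_b = np.asarray(ebv_b, dtype=float)
    pearson = float(np.corrcoef(ebv_a, ebv_b)[0, 1])
    rank_a = np.empty(len(ebv_a), dtype=int)
    rank_b = np.empty(len(ebv_b), dtype=int)
    rank_a[np.argsort(-ebv_a)] = np.arange(1, len(ebv_a) + 1)
    rank_b[np.argsort(-ebv_b)] = np.arange(1, len(ebv_b) + 1)
    dropped = (rank_a <= top) & (rank_b > top)
    n_dropped = int(dropped.sum())
    mean_distance = float((rank_b[dropped] - top).mean()) if n_dropped else 0.0
    return pearson, n_dropped, mean_distance


# Simulated EBV for 2,000 bulls under two evaluations (illustration only).
rng = np.random.default_rng(0)
ebv_model_a = rng.normal(size=2000)
ebv_model_b = ebv_model_a + rng.normal(scale=0.15, size=2000)
print(compare_rankings(ebv_model_a, ebv_model_b))
```

Even with a high overall correlation, a nonzero number of bulls typically changes sides of the cutoff, because the top of the list is a narrow region where small EBV differences reorder animals.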
| DISCUSSION |
|---|
With regard to convergence, our results contrast with a previous finding in a purebred population (Robbins et al., 2005), in which a better convergence rate was observed for RR-spline models than for MT models. Most likely, the convergence rate in the RR-spline model could be improved by a reparameterization of coefficients resulting in diagonal (co)variance submatrices for these coefficients. This strategy was employed previously in maternal models by Bohmanova et al. (2005), although gains from the reparameterization of spline models were much smaller than for polynomial models. The number of rounds until convergence in our case was high; however, a very restrictive convergence criterion (5.0e-13) was employed. One explanation for the observed increase in the number of rounds to reach convergence when prior information was included could be that this information disagreed with the unreliable information contained in the data.
In the analysis of commercial MB populations, 2 points have to be considered. The first is that some breeds have to be clustered or grouped into broad categories to reduce the number of effects. The second is that prior information regarding BOF and heterosis effects has to be included in the analysis, and this information has important consequences for the EBV of breeds poorly represented in the data.
Our results suggest that random regression-linear spline models can also be useful in MB populations. In this case, as under MT models, the major problem is to estimate the heterosis effects and the additive genetic effects of each breed. Our solution was to fit heterosis as a time-dependent effect and to describe this time dependence with a linear spline. This is very easy to implement, mainly because the needed prior function is fully defined by the knot values, which in our case are the same as for an MT model.
The key question is how to define the weights assigned to the prior information; in this study, the weights were assigned arbitrarily, and the same weight was assigned to both BOF and heterosis effects. A desirable feature of an MB analysis system would be the capability of defining the prior variances based on the amount and, most important, the structure of the data for any particular effect. For example, if there are a number of CG containing the purebreds of 2 particular breeds and crossbreds composed of these 2 breeds, then the estimates should be based primarily on the data. If this contemporary structure is not available in the data, then the estimates should be based primarily on prior knowledge. The problem here is how to determine whether the proper structure exists for any particular effect in large data sets. Implementing such a structure-determination procedure might not be worth the effort, because it is very likely that the majority of the CG would not contain the proper purebred and crossbred numbers for the breeds of interest. For example, in the data set used in the current study, there were 80,868 CG for BWT; 33% of them included purebred animals, and 96% included crossbreds. Of the CG that contained crossbreds, 31% (23,772 CG) also included purebred animals; this means that most of the CG containing purebred animals also contained crossbreds. However, the opposite does not hold: 69% (53,828 CG) of the CG containing crossbreds did not include purebreds. Almost three-quarters of the CG (57,096) were not useful at all for the estimation of heterosis effects. This description of the CG structure did not consider the particular breeds, the breed proportions of the crossbreds, or the number of animals in each CG. By considering these points, the number of properly structured CG for estimation of heterosis and BOF effects would be further reduced.
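The CG counts quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that every CG contains at least 1 purebred or 1 crossbred animal, so that CG without crossbreds are purebred-only; the variable names are hypothetical.

```python
total_cg = 80_868                    # CG for BWT
crossbred_with_purebred = 23_772     # CG with crossbreds and purebreds
crossbred_without_purebred = 53_828  # CG with crossbreds only

crossbred_cg = crossbred_with_purebred + crossbred_without_purebred
purebred_only_cg = total_cg - crossbred_cg           # assumption: no empty category
purebred_cg = purebred_only_cg + crossbred_with_purebred
not_useful_cg = total_cg - crossbred_with_purebred   # lack purebreds or lack crossbreds


def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print("CG with crossbreds:", crossbred_cg, f"({pct(crossbred_cg, total_cg)}%)")
print("CG with purebreds:", purebred_cg, f"({pct(purebred_cg, total_cg)}%)")
print("Crossbred CG with purebreds:", f"{pct(crossbred_with_purebred, crossbred_cg)}%")
print("Crossbred CG without purebreds:", f"{pct(crossbred_without_purebred, crossbred_cg)}%")
print("CG not useful for heterosis:", not_useful_cg, f"({pct(not_useful_cg, total_cg)}%)")
```

Under that assumption the counts reproduce the quoted 96, 33, 31, and 69%, and the 57,096 CG without both animal types amount to about 71% of all CG.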
In general, this study indicates that the assumed prior variance does not have a large effect on the ranking of bulls with moderate to large offspring groups with records for direct effects, but for maternal effects, the value of the variance of the priors is more important. However, some of the correlations were as low as 0.87; therefore, changes in the top-ranked bulls are expected to occur, particularly for WWT and YWT maternal and direct effects. For breeds that are poorly represented in the data, contradictory results in the genetic trends and consequently in the rankings for direct effects could occur if prior information is not considered. A pragmatic strategy could be to assume small prior variances, especially for the heterosis effects, because the prior values are expected to be much more reliable than the estimates that could be obtained directly from the data. It has to be noted that this strategy could be equivalent to a precorrection of the data for heterosis and BOF effects. As indicated by Quaas and Pollak (1999), estimation errors for crossbreeding parameters from large field data sets containing crossbred populations are expected to be small given the large number of animals involved, but they most likely are going to show bias due to the collinearity between CG and breeds in the data set. One possibility that should allow the estimation of these parameters would be to create in well-connected herds the diallel structure necessary to properly estimate these parameters, thus avoiding the subjective task of assigning prior variances. However, this implementation would require a vast effort from the breed associations to create these diallel herds. An alternative, based on a close collaboration between breed associations and research institutions, could be to directly include the data from government and university breed characterization experiments that have been used as the information source for the definition of prior means.
The inclusion of animals outside of standard age ranges is relatively straightforward with an RR-spline model; however, this is not the case with MT approaches. Because current age correction factors (Beef Improvement Federation, 1996) were established using specific age ranges, using these adjustment factors for animals very far outside the standard age ranges could lead to biases in EBV predictions. A possible option could be to define additional growth traits at ages other than 205 and 365 d; however, this would increase computational complexity, and it may be difficult to find enough data for other ages to allow adequate estimation of parameters and age correction factors. Moreover, the RR-spline approach provides more information than the usual MT analysis. For example, the RR model provides a curve of the EBV for each animal over the age range, whereas the MT model will only provide EBV at particular points of the growth curve. Our results indicate that minor changes are obtained in the overall rank of bulls after the inclusion of extra information from animals outside the normal measurement age range. Nevertheless, around 25 of the top 100 bulls after the analysis of the reduced data set will not be present when using the expanded data set, although the bulls dropped from the top 100 are going to be, in general, close to the selection threshold.
The MB models proposed in this study should be considered as an operational tool for obtaining EBV in a large-scale MB population. These models were previously proposed in MT settings by other authors (Arnold et al., 1992; Klei et al., 1996). The models are very simplistic in the sense that they are far from what genetic theory proposes (Lo et al., 1993, 1995; Wolf et al., 1995). Recombination loss, heterogeneous variances across breeds, and segregation variances were all ignored. These features have recently been considered by Cardoso and Tempelman (2004) in a small population with only 2 breeds and their crosses. In a large-scale commercial application, these extra features will be difficult to consider, because the data structure will not allow the estimation of recombination loss, and computational requirements may not allow the estimation of segregation and breed group genetic variances. Also, CG could be considered as random with either a diagonal or an autocorrelation structure (Wade et al., 1993). This may improve accuracy, especially when CG are small (Ugarte et al., 1992), but also it could introduce bias. Treating CG as random likely would require adding other fixed effects to account for time trend × herd interactions.
It could be argued that the knot positions selected for use in the RR-spline model were arbitrary. First and most importantly, these knots were chosen because records were distributed around the selected ages. Second, by selecting the knots to be the same as the ages in the MT model, the results of the RR-spline analysis are directly comparable to those provided by an MT model, and the same (co)variance matrices could be used.
In the current study, only early growth traits were considered; however, it would be desirable to fit adult weights to provide genetic evaluations for mature weight. To implement such an evaluation using RR-spline models, at least 1 more knot would need to be added past 365 d. Determining the position of this knot would be more difficult, because the records for mature weight will not be distributed around particular ages.
Previous studies (Legarra et al., 2004) have indicated that the genetic correlations between BWT and later weights decrease very fast, which could be an indication that BWT and postnatal growth are different biological processes; moreover, records before 100 d are quite scarce. Thus, it could make sense to consider BWT as a separate but correlated trait from the rest of the traits along the growth curve. If BWT were considered as a separate trait, it would be necessary to replace the knot at birth in the RR-spline with another knot assigned before 205 d. A decision will need to be made as to which age to place the knot before weaning as well as for mature weight.
The application of random regression using linear splines has been extended to the genetic evaluation of an MB population. The application of these models could be useful in genetic evaluation by allowing the inclusion of out-of-range age records that are eliminated in current MT analyses, which should lead to increases in the accuracy of evaluation and genetic progress. The use of prior information for heterosis and additive genetic effects of breeds is mandatory, and although the overall rankings are relatively robust to this information, the evaluations of particular breeds can be greatly affected. In an applied genetic evaluation framework, a reasonable strategy could be to use prior variances for BOF and heterosis effects that are very small relative to the residual and genetic variances or, alternatively, to precorrect the records for these effects using literature values as correction factors.
| Footnotes |
|---|
2 Corresponding author: juansan@uga.edu
Received for publication January 24, 2007. Accepted for publication October 2, 2007.