ANIMAL GENETICS
Animal and Dairy Science Department, University of Georgia, Athens 30602-2771
Abstract
A simulation study was conducted to compare methods for handling censored records for days to calving in beef cattle data. Days to calving was defined as the time, in days, between when a bull is turned out in the pasture and the subsequent parturition. Simulated data were generated to have a data structure and genetic relationships similar to those of an available field data set. Records were simulated for 33,176 daughters of 4,238 sires. Data were simulated using a mixed linear model that included the fixed effects of contemporary group and sex of calf, linear and quadratic covariates for age at mating, and random effects of animal and residual error. Two methods for handling censored records were evaluated, and two censoring rates (12 and 20%) were applied to assess the influence of higher censoring rates on inferences. Under the first method (DCPEN), censored records were assigned penalty values on a within-contemporary group basis. Under the second method (DCSIM), censored records were drawn from their respective predictive distributions. A Bayesian approach via Gibbs sampling was used to estimate variance components and predict breeding values. Posterior means (PM) and standard deviations (SD) of additive genetic variance for DCPEN at 12 and 20% censoring were 23.2 (3.7) and 21.0 (3.6), respectively, whereas the same estimates for DCSIM were 23.7 (3.3) and 21.9 (3.4), respectively. In all cases, the true value of the genetic variance was within the 95% highest posterior density (HPD) interval. The PM (SD) of residual variance for DCPEN at 12 and 20% censoring were 415.7 (4.7) and 440.0 (4.8), respectively, whereas the same estimates for DCSIM were 371.0 (4.3) and 365.4 (4.4), respectively. The true value of the residual variance was within the 95% HPD interval for DCSIM, but outside it for DCPEN at both censoring rates, indicating a systematic bias for this parameter.
The Bayes factor and the deviance information criterion were used for model comparison, and both criteria indicated the superiority of the DCSIM method. However, little difference was observed between the two methods in the correlations between true breeding values and posterior means of animal effects for sires, indicating that no major reranking of sires would be expected. This finding suggests that either censored data handling technique can be successfully used in a genetic evaluation for days to calving.
Key Words: Beef Cattle, Censored Records, Fertility
Introduction
Fertility or reproductive performance has been reported to be at least twice as important, economically, as production traits under a conventional cow-calf operation (Melton, 1995). Johnston and Bunter (1996) demonstrated that days to calving, defined as the number of days between the time a bull is turned out in the pasture and the subsequent parturition, was a suitable measure of reproductive performance in a large field data set. In the same study, cows with censored records (i.e., cows that failed to calve) were assigned a projected value on a within-contemporary group basis; therefore, all cows within a contemporary group that failed to calve received the same trait value.
An alternative to adding a fixed number of days to censored records would be to assume that the unobserved record of each censored female follows a normal distribution truncated at the largest uncensored record in her contemporary group, and to draw her record at random from this truncated distribution. This would allow the data to determine the trait value for censored females.
Studies in dairy cattle have shown that survival analysis is useful for evaluating longevity (Ducrocq, 1994) and fertility traits (Eicker et al., 1996). Although survival analyses provide better statistical modeling of censored data, the high computational requirements associated with these nonlinear analyses hinder their use with an animal model and large data sets.
The objective of this study was to compare methods for handling censored fertility records in beef cattle data. Two methods were evaluated and compared: assigning a penalty value on a within-contemporary group basis, and simulating records for censored animals from their respective predictive distributions. A data set was simulated, and two levels of censoring and two methods of handling censored records were investigated. The Bayes factor and the deviance information criterion were used to assess the plausibility of the two approaches.
Materials and Methods
Data Simulation
The objective of the project was to assess the suitability of two models, and data were simulated using known parameters obtained from a field data set (Donoghue et al., 2003). The simulated data set had a structure similar to that of the field data and an identical pedigree file spanning three generations. The field data set contained days to calving records on 33,176 first-calf heifers from Australian Angus herds. There were 4,238 sires and 470 herds represented in the data set, and 62,857 animals in the pedigree file. In this study, a trait record was simulated for each of the 33,176 females. The trait of days to calving was defined for natural service matings as by Johnston and Bunter (1996); that is, the time difference, in days, between when a bull is turned out in the pasture and the subsequent parturition of the female. Data were simulated to mimic days to calving records used for genetic evaluations in Australia, using a mixed linear model that included the fixed effects of contemporary group and sex of calf, linear and quadratic covariates for age at mating, and random effects of animal and residual error. Values of the linear and quadratic covariates were obtained from the field data. Contemporary group was defined to include animals from the same herd that were mated in the same month and year to the same sire, and these effects were drawn from a uniform distribution U[210, 510]. The effect of sex of calf was drawn from a normal distribution N(50, 400); for animals with censored records, sex of calf was randomly assigned as male or female. The random effects of animal were sampled from a normal distribution with mean zero and variance $\mathbf{A}\sigma_a^2$, where A is the additive relationship matrix, and the residual terms were sampled from a normal distribution with mean zero and variance $\sigma_e^2$. To assess the influence of higher censoring rates on inferences, two censoring rates of 12 and 20% were applied. Animals were selected to be censored on a contemporary group basis: all animals within a contemporary group were ranked by their trait records, and animals with the highest trait records were chosen to be censored. The number of cows censored within a particular contemporary group was random. After applying censoring rates of 12 and 20%, there were 4,109 and 6,696 females, respectively, with censored records (i.e., they failed to calve).
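To make the censoring scheme concrete, the following is a minimal Python sketch of a much smaller simulation. The function name, group counts, variance values, and the binomial draw of the number censored per group are hypothetical choices for illustration only; the study's true variances, pedigree-based animal effects, and age-at-mating covariates are not reproduced.

```python
import random
from collections import defaultdict

def simulate_days_to_calving(n_groups=50, group_size=20, sigma_a2=24.0,
                             sigma_e2=376.0, censor_rate=0.12, seed=1):
    """Toy version of the simulation design; returns (group, record, censored) tuples.

    Simplifications (assumptions, not the study's design): animal effects are
    independent (no pedigree), age-at-mating covariates are omitted, and every
    calf's sex is assigned at random. sigma_a2 and sigma_e2 are hypothetical
    values chosen only so that h2 = 24 / (24 + 376) = 0.06.
    """
    rng = random.Random(seed)
    # Contemporary-group effects ~ U[210, 510] days, as described in the text.
    cg_effect = [rng.uniform(210.0, 510.0) for _ in range(n_groups)]
    # One effect per sex of calf, each ~ N(50, 400); rng.gauss expects a SD.
    sex_effect = [rng.gauss(50.0, 400.0 ** 0.5) for _ in range(2)]
    records = []
    for g in range(n_groups):
        for _ in range(group_size):
            sex = rng.randrange(2)
            animal = rng.gauss(0.0, sigma_a2 ** 0.5)
            resid = rng.gauss(0.0, sigma_e2 ** 0.5)
            records.append([g, cg_effect[g] + sex_effect[sex] + animal + resid, False])
    # Censoring within contemporary group: draw a random count k (binomial with
    # probability censor_rate, an assumption) and censor the k largest records.
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[0]].append(rec)
    for recs in by_group.values():
        k = sum(rng.random() < censor_rate for _ in recs)
        for rec in sorted(recs, key=lambda r: r[1], reverse=True)[:k]:
            rec[2] = True
    return [tuple(rec) for rec in records]
```

Because the largest records in each group are the ones censored, the records that remain observed are biased downward; this is the problem that both DCPEN and DCSIM address.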
Data Analyses
Penalty Method.
The penalty method (DCPEN) assigned penalty values to each censored record on a within-contemporary group basis. As suggested by Johnston and Bunter (1996), the highest trait record within each contemporary group was identified, and a constant number of days (21 d) was added to this record to generate the projected value for all censored records within that group. This constant equals the length of the estrous cycle in cattle; the rationale is that females failing to calve would have conceived if given an extra cycle with the bull.
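The DCPEN rule can be stated as a short Python function. This is a sketch assuming records are (group, value, censored) tuples in which the value of a censored record is ignored; the function name and record layout are hypothetical.

```python
from collections import defaultdict

def penalty_values(records, penalty=21.0):
    """Apply the within-group penalty rule to (group, value, censored) tuples.

    Uncensored records keep their value; every censored record in a group
    receives (largest uncensored record in that group) + penalty days.
    """
    records = list(records)
    max_obs = defaultdict(lambda: float("-inf"))
    for group, value, censored in records:
        if not censored:
            max_obs[group] = max(max_obs[group], value)
    out = []
    for group, value, censored in records:
        if censored:
            if max_obs[group] == float("-inf"):
                raise ValueError(f"group {group!r} has no uncensored record")
            out.append(max_obs[group] + penalty)
        else:
            out.append(value)
    return out
```

For example, a group with observed records of 280 and 300 d gives each of its censored females 321 d, so all censored females in a group share the same value.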
Simulation Method.
The simulation method (DCSIM) assigned trait values to censored records by simulation from their respective predictive distributions (truncated normal distributions). For all animals in the same contemporary group, the truncation point was the largest observed trait record. The predicted days to calving for a censored record therefore lay between the truncation point and positive infinity, so an animal with a censored record could not receive a simulated record smaller than any noncensored record within her contemporary group. The number of days added to the truncation point for each censored record was determined by drawing samples at random from the truncated distribution, and it depended on the fixed effects in the model as well as on the animal's relationships with other animals, as explained below.
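The core operation of DCSIM is a draw from a normal distribution truncated below at the group's largest observed record. A minimal sketch using inverse-CDF sampling from Python's standard library follows. Within the Gibbs sampler the mean and standard deviation would be the current values of $\mathbf{x}_j'\mathbf{b} + \mathbf{z}_j'\mathbf{u}$ and $\sigma_e$; here they are plain numbers supplied by the caller. The function name is hypothetical, and inverse-CDF sampling is only one valid way to draw from a truncated normal; the sampling algorithm used in the study is not stated.

```python
import random
from statistics import NormalDist

def draw_censored_record(mean, sd, truncation_point, rng=random):
    """Draw from N(mean, sd^2) restricted to (truncation_point, +inf) by inverse CDF."""
    dist = NormalDist(mean, sd)
    lower = dist.cdf(truncation_point)        # probability mass below the truncation point
    u = lower + (1.0 - lower) * rng.random()  # uniform on [lower, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)       # inv_cdf requires 0 < u < 1
    # For extreme truncation the clamped u can map below the truncation point,
    # so the result is floored at it (the method is inaccurate that far in the tail).
    return max(dist.inv_cdf(u), truncation_point)
```

With a mean of 300 d, SD of 20 d, and truncation at 320 d, every draw exceeds 320 d and the draws average about 330 d, the mean of the corresponding truncated normal.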
Data Analysis.
A single-trait mixed linear model was used for analysis of days to calving. In matrix notation, the following model was adopted:
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e},$$
where y is a vector of 33,176 observations, b is the vector of fixed effects, u is the vector of additive genetic values of all animals, e is the vector of residual terms, and X and Z are known incidence matrices. The vector b included 3,568 contemporary group effects, 2 sex of calf effects, and linear and quadratic covariates for age at mating.
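The vector y includes uncensored records $\mathbf{y}_o$ ($m \times 1$) and censored records $\mathbf{y}_c$ ($(n - m) \times 1$), where n is the total number of observations, such that $\mathbf{y} = (\mathbf{y}_o', \mathbf{y}_c')'$. If we let $\mu_i = \mathbf{x}_i'\mathbf{b} + \mathbf{z}_i'\mathbf{u}$ ($i = 1, 2, \ldots, n$), where $\mathbf{x}_i'$ and $\mathbf{z}_i'$ are the ith rows of X and Z, then the density of the conditional distribution of the uncensored and censored records, given the parameters, can be derived easily (see Guo et al., 2001).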
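For reference, under the usual assumption of independent normal residuals with right censoring, this density takes the familiar form below, with the first m records uncensored, $\mu_i = \mathbf{x}_i'\mathbf{b} + \mathbf{z}_i'\mathbf{u}$, $c_j$ the trait value at censoring for record j, and $\phi$ and $\Phi$ the standard normal density and distribution functions. This is a sketch in the notation of this section, not a transcription of Guo et al. (2001):

$$
p(\mathbf{y} \mid \mathbf{b}, \mathbf{u}, \sigma_e^2) = \prod_{i=1}^{m} \frac{1}{\sigma_e}\,\phi\!\left(\frac{y_i - \mu_i}{\sigma_e}\right) \times \prod_{j=m+1}^{n} \left[1 - \Phi\!\left(\frac{c_j - \mu_j}{\sigma_e}\right)\right]
$$

Each censored record contributes only the probability that its value exceeds $c_j$; data augmentation replaces this term with an imputed record $w_j$.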
Augmenting the posterior distribution with the unobserved calving dates corresponding to the censored observations (Tanner, 1996) simplifies the procedure. Let $\mathbf{w} = \{w_j\}$, with $w_j > c_j$ for $j = m + 1, m + 2, \ldots, n$, where $c_j$ is the value of the trait at censoring time. The augmented data vector is then $\mathbf{y}^* = (\mathbf{y}_o', \mathbf{w}')'$, and, taking into account the augmented data w, the vector of unknown parameters of the model becomes $(\mathbf{b}', \mathbf{u}', \sigma_a^2, \sigma_e^2, \mathbf{w}')'$. To complete the Bayesian formulation, the following prior distributions were assumed for the unknown parameters (location parameters in days, d, and variances in d²):
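$$p(\mathbf{b}) \propto \text{constant},$$
$$\mathbf{u} \mid \mathbf{A}, \sigma_a^2 \sim N(\mathbf{0}, \mathbf{A}\sigma_a^2),$$
where A is the matrix of additive relationships between animals and $\sigma_a^2$ is the additive variance,
$$\sigma_a^2 \sim U[\,\cdot\,], \qquad \sigma_e^2 \sim U[\,\cdot\,],$$
where U[.] is the uniform distribution.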
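The augmented joint posterior density can be expressed as
$$p(\mathbf{b}, \mathbf{u}, \sigma_a^2, \sigma_e^2, \mathbf{w} \mid \mathbf{y}_o) \propto \prod_{i=1}^{m} p(y_i \mid \mathbf{b}, \mathbf{u}, \sigma_e^2) \prod_{i=m+1}^{n} p(w_i \mid \mathbf{b}, \mathbf{u}, \sigma_e^2)\, I(w_i > c_i)\; p(\mathbf{b})\, p(\mathbf{u} \mid \mathbf{A}, \sigma_a^2)\, p(\sigma_a^2)\, p(\sigma_e^2),$$
where I(.) is an indicator variable denoting censoring of record i (i = m + 1, m + 2, ..., n).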
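Following the notation of Guo et al. (2001), let $\boldsymbol{\theta} = (\mathbf{b}', \mathbf{u}')'$ and $\mathbf{W} = [\mathbf{X} \;\; \mathbf{Z}]$, so that $\mathbf{y}^* = \mathbf{W}\boldsymbol{\theta} + \mathbf{e}$, where $\mathbf{y}^* = (\mathbf{y}_o', \mathbf{w}')'$. The full conditional distributions required for the implementation of the Gibbs sampling are then easily obtained:
$$w_j \mid \mathbf{b}, \mathbf{u}, \sigma_a^2, \sigma_e^2, \mathbf{w}_{-j}, \mathbf{y}_o \sim N(\mathbf{x}_j'\mathbf{b} + \mathbf{z}_j'\mathbf{u}, \sigma_e^2)\, I(w_j > c_j), \qquad j = m + 1, \ldots, n,$$
where $\mathbf{w}_{-j}$ is w without $w_j$;
$$\boldsymbol{\theta} \mid \sigma_a^2, \sigma_e^2, \mathbf{w}, \mathbf{y}_o \sim N(\hat{\boldsymbol{\theta}}, \mathbf{C}^{-1}\sigma_e^2),$$
$$\sigma_a^2 \mid \mathbf{u} \sim (\mathbf{u}'\mathbf{A}^{-1}\mathbf{u})\, \chi^{-2}_{q-2}, \qquad \sigma_e^2 \mid \boldsymbol{\theta}, \mathbf{w}, \mathbf{y}_o \sim (\mathbf{e}'\mathbf{e})\, \chi^{-2}_{n-2},$$
where $\mathbf{C}^{-1}$ is the inverse of the coefficient matrix of the mixed model equations, $\hat{\boldsymbol{\theta}} = \mathbf{C}^{-1}\mathbf{W}'\mathbf{y}^*$, q is the number of animals in u, $\mathbf{e} = \mathbf{y}^* - \mathbf{W}\boldsymbol{\theta}$, and $\chi^{-2}_{\nu}$ is the inverted chi-squared distribution with $\nu$ degrees of freedom.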
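The data-augmentation idea behind these conditionals can be illustrated on a deliberately reduced model. The Python sketch below assumes records follow $N(\mu, \sigma^2)$ with no fixed or animal effects, a flat prior on $\mu$, and a uniform prior on $\sigma^2$. Each iteration (1) imputes every censored record from a normal distribution truncated below at its censoring point, (2) draws $\mu$, and (3) draws $\sigma^2$ from a scaled inverted chi-squared distribution. All names and settings are hypothetical; a full sampler for this model would also update b, u, and $\sigma_a^2$ through the mixed model equations.

```python
import random
from statistics import NormalDist, fmean

def truncated_normal(mean, sd, lower, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to (lower, +inf)."""
    dist = NormalDist(mean, sd)
    p = dist.cdf(lower)
    u = min(max(p + (1.0 - p) * rng.random(), 1e-12), 1.0 - 1e-12)
    return max(dist.inv_cdf(u), lower)

def gibbs_censored_mean(observed, censor_points, n_iter=3000, burn_in=500, seed=7):
    """Data-augmentation Gibbs sampler for y ~ N(mu, s2) with right censoring.

    observed: uncensored records; censor_points: c_j for each censored record.
    Returns the post-burn-in draws of mu and s2.
    """
    rng = random.Random(seed)
    n = len(observed) + len(censor_points)
    mu = fmean(observed)
    s2 = fmean([(y - mu) ** 2 for y in observed])
    mu_draws, s2_draws = [], []
    for it in range(n_iter):
        sd = s2 ** 0.5
        # (1) Impute each censored record w_j from N(mu, s2) truncated to w_j > c_j.
        w = [truncated_normal(mu, sd, c, rng) for c in censor_points]
        y_aug = list(observed) + w
        # (2) mu | rest ~ N(mean of augmented data, s2 / n) under a flat prior.
        mu = rng.gauss(fmean(y_aug), (s2 / n) ** 0.5)
        # (3) s2 | rest ~ SSE * inverted chi-squared(n - 2) under a uniform prior;
        #     a chi-squared(k) variate is Gamma(shape k/2, scale 2).
        sse = sum((y - mu) ** 2 for y in y_aug)
        s2 = sse / rng.gammavariate((n - 2) / 2.0, 2.0)
        if it >= burn_in:
            mu_draws.append(mu)
            s2_draws.append(s2)
    return mu_draws, s2_draws
```

When the upper tail of the data is censored, the posterior mean of $\mu$ from this sampler lies above the naive mean of the uncensored records, because every imputed $w_j$ lies above its censoring point.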
For each level of censoring (12 and 20%), variance components were estimated and breeding values were predicted using the two methods for handling censored records. Variance components and breeding values were also obtained for the complete noncensored data set. For each analysis, five replicates were simulated, and the averages of the results across replicates were compared with the true variances, fixed effects, and breeding values of the simulated data.
Model Comparison
Bayes Factor.
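The Bayes factor, estimated with the harmonic mean approach of Newton and Raftery (1994), was used to assess the plausibility of the two models. The marginal density of the data under each model was estimated as the harmonic mean of the likelihood values evaluated at the posterior draws:
$$\hat{p}(\mathbf{y} \mid M_i) = \left[\frac{1}{G}\sum_{g=1}^{G} \frac{1}{p(\mathbf{y} \mid \boldsymbol{\theta}_i^{(g)}, M_i)}\right]^{-1},$$
where y is the vector of observed responses, $\boldsymbol{\theta}_i^{(g)}$ is the gth Gibbs sample of the parameters under model $M_i$, and G is the number of retained samples. The estimated Bayes factor between models $M_i$ and $M_j$ is
$$\widehat{BF}_{ij} = \frac{\hat{p}(\mathbf{y} \mid M_i)}{\hat{p}(\mathbf{y} \mid M_j)}.$$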
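Evaluating the harmonic mean estimator directly on the likelihood scale underflows for data sets of this size, so in practice it is computed on the log scale. A minimal Python sketch follows, assuming the log-likelihood $\log p(\mathbf{y} \mid \boldsymbol{\theta}^{(g)}, M_i)$ has already been computed for each retained draw; the function names are hypothetical. The harmonic mean estimator is also known to be numerically unstable, so it is best interpreted alongside other criteria.

```python
import math

def log_marginal_harmonic_mean(log_likelihoods):
    """Harmonic mean estimate of log p(y | M) (Newton and Raftery, 1994).

    log_likelihoods holds log p(y | theta^(g), M) for each retained draw g.
    The average of 1/likelihood is formed on the log scale (log-sum-exp) to
    avoid overflow and underflow.
    """
    G = len(log_likelihoods)
    neg = [-ll for ll in log_likelihoods]        # log of 1 / likelihood
    top = max(neg)
    log_mean_inverse = top + math.log(sum(math.exp(x - top) for x in neg)) - math.log(G)
    return -log_mean_inverse                     # log of the harmonic mean

def log_bayes_factor(loglik_model_i, loglik_model_j):
    """log BF_ij = log p_hat(y | M_i) - log p_hat(y | M_j)."""
    return log_marginal_harmonic_mean(loglik_model_i) - log_marginal_harmonic_mean(loglik_model_j)
```

For example, two draws with likelihoods 1 and 2 give a harmonic mean of 4/3, and the log Bayes factor between two models with constant log-likelihoods of -10 and -12 is 2.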
Deviance Information Criterion.
The deviance information criterion (DIC), as defined by Spiegelhalter et al. (2002), was used to compare models:
$$\text{DIC} = \bar{D} + p_D,$$
where $\bar{D}$ is the posterior expectation of the Bayesian deviance $D(\boldsymbol{\theta}) = -2 \log p(\mathbf{y} \mid \boldsymbol{\theta})$ and $p_D = \bar{D} - D(\bar{\boldsymbol{\theta}})$ is the effective number of parameters.
Thus, DIC combines a measure of goodness of fit, $\bar{D}$, with a penalty, $p_D$, for increasing model complexity. The DIC is easy to estimate within a Markov chain Monte Carlo implementation, as the Bayesian deviance, $D(\boldsymbol{\theta})$, is computed at each iteration. At the end of the iterations, the mean of the Bayesian deviance, $\bar{D}$, and the mean value of the model parameters, $\bar{\boldsymbol{\theta}}$, are calculated, leading directly to the DIC.
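The DIC recipe above translates directly into code. The Python sketch below assumes, for illustration only, independent normal observations with parameters $(\mu, \sigma^2)$, so that the deviance is $\sum_i [\log(2\pi\sigma^2) + (y_i - \mu)^2/\sigma^2]$; the censored mixed model of this study would need its own likelihood in place of this one. Function names are hypothetical.

```python
import math

def normal_deviance(y, mu, s2):
    """Bayesian deviance D = -2 log p(y | mu, s2) for i.i.d. N(mu, s2) data."""
    return sum(math.log(2.0 * math.pi * s2) + (v - mu) ** 2 / s2 for v in y)

def dic_from_draws(y, mu_draws, s2_draws):
    """Return (DIC, Dbar, pD) from retained MCMC draws of (mu, s2)."""
    deviances = [normal_deviance(y, m, s) for m, s in zip(mu_draws, s2_draws)]
    d_bar = sum(deviances) / len(deviances)           # posterior mean deviance
    mu_bar = sum(mu_draws) / len(mu_draws)            # posterior mean of each parameter
    s2_bar = sum(s2_draws) / len(s2_draws)
    p_d = d_bar - normal_deviance(y, mu_bar, s2_bar)  # effective number of parameters
    return d_bar + p_d, d_bar, p_d
```

Lower DIC is preferred; a model can have a larger $p_D$ and still win if its $\bar{D}$ is smaller by more than the difference in penalties.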
Results and Discussion
For all analyses, convergence was assessed using the methodology presented by Raftery and Lewis (1992). The required length of the burn-in period was always less than 2,500 iterations for all parameters. Thus, 75,000 iterations of the sampler were run, with a conservative 20,000 iterations discarded as burn-in; all remaining 55,000 iterations were retained without thinning for post-Gibbs analysis.
Summaries of the posterior distributions of genetic parameters for days to calving under different censoring scenarios are presented in Table 1. Posterior means of the additive variance under both methods (DCPEN and DCSIM) at both levels of censoring (12 and 20%) were similar to the true value. However, there was a tendency to underestimate the true genetic variance, especially at 20% censoring. As the censoring rate increased, the posterior mean of the additive variance decreased slightly: 23.2 and 23.7 d² at 12% censoring vs. 21.0 and 21.9 d² at 20% censoring, for DCPEN and DCSIM, respectively. Guo et al. (2001) assessed the influence of higher censoring rates on parameters for performance and prolificacy traits in swine using an approach similar to that of the present study. They observed decreasing sire variances at higher levels of censoring, similar to the trend in additive variances observed here. In all cases, the true value of the genetic variance was well within the 95% HPD interval. These results indicate that the method of handling censored fertility records did not have a substantial impact on estimation of the additive variance.
For DCSIM, the true value of the residual variance was within the 95% HPD interval at both censoring rates. Estimates of this parameter under the DCPEN method, however, were considerably higher than the true value and beyond the expected Monte Carlo error. In fact, the true value of the residual variance was outside the 95% HPD interval under DCPEN at both censoring rates, indicating a systematic bias in the inference of this parameter. These results imply that the DCSIM method provides a better fit to the data when censored records are present. As the censoring rate increased, the posterior mean of the residual variance under DCPEN increased markedly (415.7 vs. 440.0 d² at 12 and 20% censoring, respectively).

The posterior means of heritability were similar to the true value of the parameter (h² = 0.06) under DCSIM at both levels of censoring. As a result of the overestimation of the residual variance under DCPEN, the posterior means of heritability at both censoring levels were slightly smaller than the true value of the parameter. However, in both cases, the true value was well within the 95% HPD interval.
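As a rough check, plugging the posterior means of the variance components reported in the abstract into $h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$ shows how the inflated residual variance under DCPEN pulls heritability down. This is only an approximation, because the ratio of posterior means is not in general the posterior mean of the ratio:

$$
\begin{aligned}
\text{DCSIM, 12\%:}\quad & \frac{23.7}{23.7 + 371.0} \approx 0.060, &\qquad \text{DCSIM, 20\%:}\quad & \frac{21.9}{21.9 + 365.4} \approx 0.057,\\
\text{DCPEN, 12\%:}\quad & \frac{23.2}{23.2 + 415.7} \approx 0.053, &\qquad \text{DCPEN, 20\%:}\quad & \frac{21.0}{21.0 + 440.0} \approx 0.046.
\end{aligned}
$$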
Pearson correlations between true breeding values and posterior means of animal effects for sires are given in Table 2. As expected, these correlations increased with the number of progeny. Correlations based on the complete data (no censoring) were slightly higher than those for DCPEN and DCSIM at both levels of censoring. Correlations for DCPEN and DCSIM were very similar across both levels of censoring, implying that the level of censoring had little effect on the ranking of sires. Differences in correlations between DCPEN and DCSIM remained small as the number of progeny with records increased. These results suggest that DCSIM may be slightly more accurate than DCPEN in predicting the true genetic values of these sires; however, no major reranking of sires would be expected between the two methods of handling censored records.
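The accuracy comparison in Table 2 amounts to computing a Pearson correlation within classes of sires grouped by number of progeny. A minimal Python sketch follows; the progeny-class boundaries and the (n_progeny, true_bv, posterior_mean) input layout are hypothetical, since the class limits used in Table 2 are not reproduced here.

```python
import math
from collections import defaultdict
from statistics import fmean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_by_progeny_class(sires, class_lower_bounds):
    """sires: iterable of (n_progeny, true_bv, posterior_mean) tuples.

    Each sire goes to the class with the largest lower bound <= n_progeny.
    Returns {lower_bound: r} for classes containing at least two sires.
    """
    classes = defaultdict(lambda: ([], []))
    for n_prog, tbv, pm in sires:
        bounds = [b for b in class_lower_bounds if n_prog >= b]
        if bounds:
            true_vals, post_means = classes[max(bounds)]
            true_vals.append(tbv)
            post_means.append(pm)
    return {b: pearson(t, p) for b, (t, p) in sorted(classes.items()) if len(t) > 1}
```

Running the same function on the posterior means from DCPEN, DCSIM, and the uncensored analysis gives one row of correlations per method.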
Compared with DCPEN, DCSIM had a smaller posterior mean deviance ($\bar{D}$), indicating a better fit to the data, but received a larger penalty value ($p_D$) for being a more complex model. Despite this higher penalty, DCSIM had a lower DIC value, implying that it is the superior model when both goodness of fit and model complexity are considered. The superiority of the DCSIM method was further confirmed by the estimated Bayes factor between the models (3,563.5), which strongly favored DCSIM over DCPEN.
There were small differences in the estimates of additive genetic variance and in the rankings of sire breeding values between the two methods of handling censored records. However, the overestimation of the residual variance and, consequently, the underestimation of heritability under the penalty method, together with the model comparison criteria, indicate that the simulation approach is a better method for handling censored records in beef fertility data, especially at higher levels of censoring. The penalty method appears to overestimate the values assigned to censored records; however, the lack of important differences in the genetic ranking of sires between the two methods suggests that either censored data handling technique can be used successfully in a genetic evaluation for days to calving. Further research applying both methods to days to calving in beef cattle field data should be undertaken to verify the results of this simulation study.
Footnotes
1 Appreciation is extended to the Angus Society of Australia for providing the data; Meat and Livestock Australia for the research scholarship provided to the first author; and D. J. Johnston and C. Teseling for their contributions.
2 Present address: Animal Genetics and Breeding Unit, University of New England, Armidale, NSW 2351 Australia.
3 Correspondence: Edgar L. Rhodes Center for Animal and Dairy Science (phone: 706-542-0964; fax: 706-583-0274; e-mail: jkbert{at}uga.edu).
Received for publication May 2, 2003. Accepted for publication September 29, 2003.
Literature Cited
Donoghue, K. A., R. Rekaya, J. K. Bertrand, D. J. Johnston, and C. Teseling. 2003. Comparison of methods for handling missing fertility records in beef cattle data. J. Anim. Sci. 81(Suppl. 1):350. (Abstr.)
Ducrocq, V. 1994. Statistical analysis of length of productive life for dairy cows of the Normande breed. J. Dairy Sci. 77:855–866.
Eicker, S. W., Y. T. Grohn, and J. A. Hertl. 1996. The association between cumulative milk yield, days open, and day to first breeding in New York Holstein cows. J. Dairy Sci. 79:235–241.
Guo, S-F., D. Gianola, R. Rekaya, and T. Short. 2001. Bayesian analysis of lifetime performance and prolificacy in Landrace sows using a linear mixed model with censoring. Livest. Prod. Sci. 72:243–252.
Johnston, D. J., and K. L. Bunter. 1996. Days to calving in Angus cattle: Genetic and environmental effects, and covariances with other traits. Livest. Prod. Sci. 45:13–22.
Melton, B. E. 1995. Conception to consumption: The economics of genetic improvement. Pages 40–47 in Proc. Beef Improv. Fed. 27th Res. Symp. Annu. Mtg., Sheridan, WY.
Newton, M. A., and A. E. Raftery. 1994. Approximate Bayesian inference with the weighted likelihood bootstrap. J. R. Statist. Soc. B 56:3–48.
Raftery, A. E., and S. Lewis. 1992. How many iterations in the Gibbs sampler? Pages 763–773 in Bayesian Statistics 4. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, ed. Oxford University Press, New York.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde. 2002. Bayesian measures of model complexity and fit. J. R. Statist. Soc. B 64:583–639.
Tanner, M. A. 1996. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. 3rd ed. Springer-Verlag, New York.