ANIMAL GENETICS
Animal and Dairy Science Department, University of Georgia, Athens 30602-2771
| Abstract |
|---|
Key Words: beef cattle, binary data, fertility, fuzzy logic, threshold model
| INTRODUCTION |
|---|
One proposed way to account for this uncertainty is to use fuzzy logic classification. Sapp et al. (2005) evaluated different methods of analyzing such data in a simulation study. They concluded that a threshold model that disregarded misclassification or uncertainty of binary responses could lead to biased inferences. In fact, based on 10 replicates, the genetic variance was severely biased when the data were analyzed ignoring potential misclassifications. Based on these results, they recommended that uncertain or potentially misclassified binary responses be analyzed using a threshold model that contemplates misclassification. Therefore, the objective of the current study was to apply similar methodology to evaluate the usefulness of a threshold model with fuzzy logic classification to account for potential miscoding of the binary trait success or failure of first insemination in a field data set.
| MATERIALS AND METHODS |
|---|
The data consisted of NS mating and calving records of first-parity females from Australian Angus herds. Females having their first mating record between 270 and 625 d of age were retained for analysis. Before editing, there were 36,097 records from first-parity females mated by NS between 1987 and 2000 available in the database. Females whose mating resulted in multiple births (n = 288) or those with incomplete records (n = 460) were removed before analysis. Incomplete records included those in which either the sex of calf was unknown or the mating sire was unknown. Gestation length, computed using AI data, was defined as the difference between the insemination date and the subsequent calving date and was averaged by sex of calf as reported by Donoghue et al. (2004). Average GL (SD) was 279.2 d (5.2 d) and 280.3 d (5.2 d) for female and male calves, respectively. Mating records where GL was >2 SD less than the mean derived from AI data were considered to be outliers and were removed during the editing process (n = 225). All herd groups containing only one record, records from service sires with <3 calves, and all records from herds resulting in extreme category problems were removed from the data set. Finally, all service sire groups resulting in extreme category problems were grouped together in one unknown service sire group. After edits, the final data set included 33,099 first-parity NS mating records representing 4,187 sires. The data structure is presented in Table 1.
There are only 2 sources of information available to ascertain conception, both occurring well after the first 21 d of the breeding season: pregnancy checking by either ultrasound or rectal palpation, or a calving event. In this study, the latter is used, and the underlying assumption is that if conception occurs, a corresponding calving event will occur as well. This may not be true, because a female may conceive but fail to carry her pregnancy to term. The only information available in the current data set was days to calving (DC), which was computed as the time elapsed between the introduction of the bull and the subsequent calving date (Johnston and Bunter, 1996). Success or failure at first insemination (conception during the first 21 d; FIS) was based on the difference between DC and an average GL, where the average GL differed by sex of calf. If the difference between DC and average GL was ≤21 d, then FIS = 1; otherwise, FIS = 0.
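The coding rule can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, the average GL values are those reported above for female and male calves, and the cutoff is assumed to be inclusive (a difference of 21 d or less counts as a success).

```python
# Average gestation length (d) by sex of calf, as reported in the text.
AVERAGE_GL = {"female": 279.2, "male": 280.3}
FIRST_CYCLE_DAYS = 21  # assumed inclusive cutoff for first-cycle conception


def code_fis(days_to_calving, calf_sex):
    """Return (FIS, diff), where diff = DC minus the sex-specific average GL."""
    diff = days_to_calving - AVERAGE_GL[calf_sex]
    fis = 1 if diff <= FIRST_CYCLE_DAYS else 0
    return fis, diff
```

For example, `code_fis(295, "female")` gives a difference of about 15.8 d and FIS = 1, whereas `code_fis(305, "male")` gives about 24.7 d and FIS = 0.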
Given the variation in GL between cows, it was possible that some cows had uncertain or miscoded FIS. Furthermore, the variation in GL and the difference between DC and average GL could be used to assess the uncertainty or probability of miscoding for every FIS record. One way to account for this uncertainty would be to use fuzzy logic classification, which uses imprecise propositions based on fuzzy set theory to assign partial membership of a set (Chen and Pham, 2001). In the current study, fuzzy logic classification, based on the binary response of FIS and the difference between DC and average GL, was used to calculate the probability of miscoding at time $t_i$ as described by Sapp et al. (2005). Two analyses were carried out: 1) analysis without consideration for potential misclassification, using a threshold model (M1); and 2) analysis accounting for potential misclassification, using a threshold model with fuzzy logic classification (M2).
Statistical Analysis and Computations
An animal model was used to investigate 2 methods of analyzing uncertain binary responses for FIS. A linear mixed model on the liability scale, which included the systematic effects of herd, year, and month of mating; linear and quadratic covariates for age at mating; and unrelated service sire, animal, and residual as random effects, was used in the analyses. Threshold models are becoming a standard tool for analysis of discrete data in the field of animal breeding and genetics. Extensive literature on their theoretical basis, implementation, and application has been generated in the last 20 yr (Gianola, 1982; Gianola and Foulley, 1983; Sorensen et al., 1995). More recently, Rekaya et al. (2001) proposed a method for analyzing binary data subject to misclassification using a threshold model. In the current study, an extension of that method, based on fuzzy logic classification as presented by Sapp et al. (2005), was applied to field data.
Threshold Model for Analysis of Uncertain Binary Responses
A detailed description of the methodology can be found in Rekaya et al. (2001) and Sapp et al. (2005). The threshold concept (Falconer, 1981), as applied to data of this type, assumes that FIS is controlled by an underlying normal variable, commonly called a liability, which causes the observed binary response once the liability reaches a threshold level. Here, the basic idea consists of assuming that the observed binary data $\mathbf{m} = (m_1, m_2, \ldots, m_n)'$ are a sample of uncertain (misclassified) binary responses of nonobserved real data $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$, where each $y_i$ was Bernoulli, with success probability $p_i$ that was expressed as a function of some systematic and random effects. Uncertainty or misclassification occurred if some $y_i$ was switched (e.g., $y_i = 0$ became $m_i = 1$; a 0 was coded as 1). Furthermore, for each observation, an indicator variable $\alpha_i$ [$\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n)'$] was assumed, which takes the value of 1 if $y_i$ was switched, and $\alpha_i = 0$ otherwise. Following notation by Rekaya et al. (2001), each $\alpha_i$ was assumed to be Bernoulli with success probability $\pi_{t_i}$ (probability of misclassification or uncertainty) at time $t_i$ such that $p(\alpha_i \mid \pi_{t_i}) = \pi_{t_i}^{\alpha_i}(1 - \pi_{t_i})^{(1 - \alpha_i)}$.
Consequently, the following relationship between $y_i$ and $m_i$, given $\alpha_i$, could be established as $y_i = (1 - \alpha_i)m_i + \alpha_i(1 - m_i)$. Note that for $\alpha_i = 0$ (no misclassification), $y_i$ and $m_i$ are equal as expected. Furthermore, the likelihood function can be written interchangeably as a function of $y_i$ or $m_i$ (Rekaya et al., 2001; Sapp et al., 2005).
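The switching relationship is easy to check numerically. The sketch below is illustrative only; the names `alpha_i` (switch indicator) and `pi_t` (probability of miscoding) are assumed labels for the indicator variable and its Bernoulli probability.

```python
def true_response(m_i, alpha_i):
    """Recover y_i from the observed m_i and the switch indicator alpha_i.

    Implements y_i = (1 - alpha_i) * m_i + alpha_i * (1 - m_i).
    """
    return (1 - alpha_i) * m_i + alpha_i * (1 - m_i)


def indicator_probability(alpha_i, pi_t):
    """Bernoulli probability of the switch indicator: pi^alpha * (1 - pi)^(1 - alpha)."""
    return pi_t ** alpha_i * (1 - pi_t) ** (1 - alpha_i)
```

With `alpha_i = 0` the observed and true responses coincide; with `alpha_i = 1` an observed 1 maps back to a true 0 and vice versa.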
A mixed linear model was used for analysis of the underlying liability of FIS. In matrix notation the model could be written as

$$\boldsymbol{\lambda} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_s\mathbf{s} + \mathbf{Z}_u\mathbf{u} + \mathbf{e},$$

where $\boldsymbol{\lambda}$ was a vector of unobserved liabilities, $\boldsymbol{\beta}$ was the vector of fixed effects, $\mathbf{s}$ was the vector of unrelated random service sire effects, $\mathbf{u}$ was the vector of additive effects, and $\mathbf{e}$ was the vector of residual effects. Furthermore, $\mathbf{X}$, $\mathbf{Z}_s$, and $\mathbf{Z}_u$ were the corresponding incidence matrices with the appropriate dimensions.
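Using this same notation, let $\mathbf{m}$ be defined as the vector of observed uncertain FIS responses. Assuming that $\mathbf{y}$, the vector of unobserved true FIS responses, and $\mathbf{X}$ are independent and using the relationship between $y_i$ and $m_i$ given earlier, the joint probability of $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n)'$ and $\mathbf{m}$, given $\boldsymbol{\theta} = (\boldsymbol{\beta}', \mathbf{s}', \mathbf{u}')'$ and $\boldsymbol{\pi} = (\pi_{t_1}, \pi_{t_2}, \ldots, \pi_{t_n})'$, was equal to

$$p(\boldsymbol{\alpha}, \mathbf{m} \mid \boldsymbol{\theta}, \boldsymbol{\pi}) = \prod_{i=1}^{n} \left[p_i(\boldsymbol{\theta})\right]^{y_i} \left[1 - p_i(\boldsymbol{\theta})\right]^{1 - y_i} \pi_{t_i}^{\alpha_i} (1 - \pi_{t_i})^{(1 - \alpha_i)}, \qquad y_i = (1 - \alpha_i)m_i + \alpha_i(1 - m_i),$$

where $p_i(\boldsymbol{\theta}) = \Phi(\mathbf{x}_i'\boldsymbol{\beta} + \mathbf{z}_{si}'\mathbf{s} + \mathbf{z}_{ui}'\mathbf{u})$ was the probability of FIS for record $i$, with $\Phi$ the standard normal cumulative distribution function. The known row vectors were $\mathbf{x}_i'$, $\mathbf{z}_{si}'$, and $\mathbf{z}_{ui}'$, relating the fixed, service sire, and additive effects to the probability of first insemination success, respectively.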
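As a rough numerical illustration, the joint probability can be evaluated on the log scale. The sketch below assumes a probit link (standard normal CDF) for the success probability and a Bernoulli term for each switch indicator; `eta` stands for the linear predictor of each record, and all function and argument names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_joint(m, alpha, eta, pi):
    """Log joint probability of switch indicators and observed responses.

    m: observed 0/1 responses; alpha: 0/1 switch indicators;
    eta: linear predictors on the liability scale;
    pi: miscoding probabilities, each strictly between 0 and 1
        (a probability of exactly 0 or 1 would need special handling).
    """
    total = 0.0
    for m_i, a_i, eta_i, pi_i in zip(m, alpha, eta, pi):
        p_i = std_normal_cdf(eta_i)
        y_i = (1 - a_i) * m_i + a_i * (1 - m_i)  # implied true response
        total += math.log(p_i if y_i == 1 else 1.0 - p_i)
        total += math.log(pi_i if a_i == 1 else 1.0 - pi_i)
    return total
```

Working on the log scale avoids underflow when the product runs over tens of thousands of records.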
Finally, prior distributions for $\boldsymbol{\theta}$ and $\boldsymbol{\pi}$ would complete the Bayesian formulation; however, in some situations, $\boldsymbol{\pi}$ is known or could be inferred from external information. In this study, a fuzzy logic approach was used to determine the vector $\boldsymbol{\pi}$.
If the absolute difference between DC and average GL was <16 d or >26 d, there was no uncertainty about the observed FIS response. Otherwise, the following fuzzy logic functions were used to compute the probability of miscoding at time $t_i$ (see Figure 1):
![]() |
and
![]() |
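To make the idea of a fuzzy membership function concrete, the following Python sketch uses a purely hypothetical shape: a symmetric, piecewise-linear function that is zero outside the 16 to 26 d window and peaks at the 21 d cutoff. The peak value of 0.5 and the linear form are assumptions chosen for illustration; they are not taken from this study or from Sapp et al. (2005).

```python
LOWER, CUTOFF, UPPER = 16.0, 21.0, 26.0  # window bounds (d) stated in the text
PEAK = 0.5  # hypothetical maximum miscoding probability at the cutoff


def miscoding_probability(diff):
    """Hypothetical triangular membership for |DC - average GL| = diff (d)."""
    t = abs(diff)
    if t < LOWER or t > UPPER:
        return 0.0  # no uncertainty outside the window
    if t <= CUTOFF:
        return PEAK * (t - LOWER) / (CUTOFF - LOWER)  # rising side
    return PEAK * (UPPER - t) / (UPPER - CUTOFF)  # falling side
```

Under these assumptions a record with a difference of 18.5 d would receive a miscoding probability of 0.25, and a record at exactly 21 d would be maximally uncertain.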
To ensure proper posterior distribution, the following priors were assumed for the parameters in the model:
![]() |
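where $\boldsymbol{\beta} = (\boldsymbol{\beta}_h', \boldsymbol{\beta}_{-h}')'$, with $\boldsymbol{\beta}_h$ being the vector of herd effects and $\boldsymbol{\beta}_{-h}$ being the vector of all fixed effects except herd effects, and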
![]() |
where $\mathbf{I}$ was the identity matrix and $\mathbf{A}$ was a known matrix of relationships between animals. Uniform bounded priors were assumed for $\sigma_s^2$, $\sigma_u^2$, and $\sigma_h^2$.
The joint posterior density is proportional to the product of the density of the conditional distribution times the joint prior density. Samples from the conditional posterior distributions were obtained via the Gibbs sampler. After augmentation of the joint posterior with the liabilities (Albert and Chib, 1993; Sorensen et al., 1995), all conditional posterior distributions of model parameters were in closed form and easy to sample from as described by Rekaya et al. (2001) and Sapp et al. (2005). These distributions were normal for the location parameters, truncated normal for each of the liabilities, binomial for the indicator parameters $\alpha_i$, and scaled-inverted $\chi^2$ distributions for the dispersion parameters. Liabilities were sampled from their truncated normal distribution using an inverse cumulative distribution function technique (Devroye, 1986).
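The inverse cumulative distribution function technique for a truncated normal can be sketched with the Python standard library. This is a minimal illustration, assuming a residual variance of 1 and a threshold of 0 on the liability scale (common identifiability constraints, not stated explicitly in the text); the function name and arguments are hypothetical.

```python
import random
from statistics import NormalDist


def sample_liability(mean, y_i, threshold=0.0, rng=random):
    """Draw a liability from N(mean, 1) truncated at the threshold.

    The draw lies above the threshold when y_i == 1 and below it otherwise.
    """
    dist = NormalDist(mu=mean, sigma=1.0)
    p_thr = dist.cdf(threshold)  # probability mass below the threshold
    low, high = (p_thr, 1.0) if y_i == 1 else (0.0, p_thr)
    u = rng.uniform(low, high)  # uniform draw on the allowed CDF range
    # inv_cdf is undefined at exactly 0 or 1, so keep u strictly inside (0, 1).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)
```

Passing a seeded generator, for example `sample_liability(0.3, 1, rng=random.Random(1))`, makes the draws reproducible.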
Convergence
Convergence diagnostics were based on the method of Raftery and Lewis (1992) as implemented in the BOA software (Smith, 2003). The required burn-in period was always <3,000 iterations for all parameters in the analyses. Thus, a total chain length of 150,000 iterations of the Gibbs sampler was run with a conservative burn-in of 50,000 iterations. The remaining 100,000 iterations were retained without thinning for post-Gibbs analysis.
| RESULTS AND DISCUSSION |
|---|
The posterior mean, SD, and the highest posterior density 95% [HPD (95%)] interval for service sire, herd, and additive variance are presented in Table 2. The results suggest that both service sire and herd variance were less affected by misclassification than additive variance. This finding agrees with the findings of Sapp et al. (2005), who used simulated data to determine that both service sire and herd variances were less affected by misclassification due to the large number of records per service sire or within a given herd. However, point estimates of service sire (0.135) and herd (0.128) variance using M1 were slightly greater than those obtained using fuzzy logic classification (0.127 and 0.123, respectively). The point estimate from the M1 analysis for service sire variance was close to the upper bound of the HPD (95%) interval of the M2 analysis, suggesting that service sire variance was overestimated using M1 compared with M2. The point estimate for additive variance obtained using M1 (0.055) was significantly greater than that from M2 (0.031) because the point estimate (0.055) of M1 fell outside the HPD (95%) interval of M2 (0.015 to 0.048). This result indicates that the additive variance was overestimated when potential misclassification was ignored.
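The comparison of a point estimate against an HPD interval can be sketched in Python. The `hpd_interval` helper below is a hypothetical illustration that takes the shortest window containing the requested share of sorted posterior samples; it is not the routine used in the study, and only the final check uses values reported in the text.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = min(n, max(1, math.ceil(mass * n)))  # samples inside the interval
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def outside(point, interval):
    """True if the point estimate falls outside the interval."""
    lower, upper = interval
    return point < lower or point > upper


# Reported values: M1 additive variance against the M2 HPD (95%) interval.
m1_outside_m2 = outside(0.055, (0.015, 0.048))
```

Here `m1_outside_m2` evaluates to `True`, matching the statement that the M1 estimate lies above the upper bound of the M2 interval.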
Heritability
The posterior mean, SD, and HPD (95%) interval for heritability are presented in Table 2. The point estimate of heritability obtained from M1 (0.042) was significantly greater than the estimate obtained from M2 (0.024) and fell outside the HPD (95%) interval for M2. This result was expected, given that the additive variance was significantly greater using M1 compared with M2. The HPD (95%) interval using M2 was narrower than the corresponding interval using M1, suggesting more certainty of the estimate obtained from the analysis that accounted for potential misclassification. The results indicated that ignoring potential misclassifications could result in an overestimation of heritability.
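The link between the variance components and heritability can be illustrated with a short calculation. The formula below is an assumption, not a definition taken from the text: it fixes the residual variance at 1 on the liability scale and places the additive, service sire, and herd variances in the phenotypic variance. Plugging in the reported posterior means gives values close to the reported heritabilities, although the posterior mean of a ratio need not equal the ratio of posterior means.

```python
def heritability(var_additive, var_service_sire, var_herd, var_residual=1.0):
    """Ratio of additive variance to an assumed total phenotypic variance."""
    total = var_additive + var_service_sire + var_herd + var_residual
    return var_additive / total


# Posterior means of the variance components reported for each analysis.
h2_m1 = heritability(0.055, 0.135, 0.128)  # threshold model (M1)
h2_m2 = heritability(0.031, 0.127, 0.123)  # fuzzy logic classification (M2)
```

Rounded to three decimals, these give 0.042 for M1 and 0.024 for M2.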
Pearson Correlations
Pearson correlations between M1 and M2 for estimated service sire effects and predicted breeding values of the animals in the pedigree file were 0.99 and 0.98, respectively. These results suggest that no major reranking would be expected for either service sire effects or breeding values of animals between the 2 methods. In field data with fewer records per service sire, as may be the case with younger sires, a change in the rank correlation may be anticipated because of limited information with which to infer those effects. Furthermore, it is known that a change in the heritability in a univariate analysis generally does not profoundly affect the ranking of animals. Nonetheless, we expect that the change in the genetic parameters could have an effect on the ranking if the binary trait with potential miscoding is jointly analyzed with correlated traits.
| IMPLICATIONS |
|---|
| Footnotes |
|---|
3 The first and second authors contributed equally to this manuscript.
2 Corresponding author: e-mail: mspanky{at}uga.edu
Received for publication August 26, 2004. Accepted for publication August 29, 2005.
| LITERATURE CITED |
|---|