ANIMAL GENETICS |
* Department of Animal and Dairy Science, and
Department of Statistics, University of Georgia, Athens 30602
| Abstract |
|---|
Key Words: genetic evaluation, paternity testing, relationship matrix, uncertain paternity
| INTRODUCTION |
|---|
Sapp (2005) presented a method for predicting breeding values that does not require construction of the inverse relationship matrix. The method proposed by Sapp (2005), which allows for the use of phenotypic information and provides for computation of the probability of paternity based on the likelihood of observing the record(s) and computed breeding values, rather than parental average breeding values of all potential candidate sires, could lead to better parental discrimination.
Therefore, the objective of the current study was to develop a method to enhance the accuracy of paternity prediction in cases in which uncertain paternity exists for some animals, but a limited number of possible sires are identified. The methodology was tested using simulated data for a univariate and multiple-trait situation. For the univariate situation, single and repeated records were simulated using 3 heritabilities. In the multiple-trait scenario, 3 correlated traits were used.
| MATERIALS AND METHODS |
|---|
Methodology
The major problem with ascertaining paternity using phenotypic data stems from the fact that the relationship matrix has to be reconstructed for every possible combination of offspring and sire. Although this problem is theoretically simple to handle, doing so is not computationally feasible for large data sets. The method presented by Sapp (2005) offers a computationally feasible solution that could make genetic evaluation with uncertain paternity possible. The methods of Cardoso and Tempelman (2003) and Sapp (2005) for prediction of genetic merit in the presence of animals with uncertain paternity were compared by running both on the same data with the same chain length. Details regarding the simulated data used for this comparison can be found in the simulation section below. A chain length of 10,000 iterations with a burn-in of 5,000 iterations was used in both methods. Computation time was 4,828 s for the method proposed by Cardoso and Tempelman (2003) and 429 s for the method proposed by Sapp (2005), roughly an 11-fold reduction. Based on this result, the method proposed by Sapp (2005) presents a computationally feasible solution for prediction of genetic merit in cases in which uncertain paternity exists for some animals, but a limited number of possible sires are identified.
Assume that the observed data, conditionally on the model parameters, are normally distributed:
$$y \mid \beta, u, R_0 \sim N(X\beta + Zu,\; I \otimes R_0)$$
where $y$ = the vector of phenotypic observations; $\beta$ = the vector of systematic effects of order $p$; $u$ = the vector of additive animal effects of order $q$; $R_0$ = the residual (co)variance matrix; $I$ = the identity matrix; and $X$ and $Z$ = the corresponding incidence matrices with the appropriate dimensions.
Further, assume that the vector of breeding values ($u$) is a priori normally distributed:
$$u \mid G_0, A \sim N(0,\; A \otimes G_0)$$
where $G_0$ = the genetic (co)variance matrix and $A$ = the relationship matrix between animals.
In the presence of uncertain paternity, A is not completely known. Several methods have been proposed for dealing with this issue, including the use of molecular information (Jamieson, 1965; Garber and Morris, 1983; Jamieson and Taylor, 1997), prior information on parentage probabilities (Foulley et al., 1987; Henderson, 1988; Famula, 1992), and even phenotypic data (Cardoso and Tempelman, 2003), but their usefulness has been limited. Molecular information has been limited by the high cost and amount of time required to genotype numerous animals. Phenotypic information has been limited by low discrimination among candidate males. However, paternity could be ascertained by making inferences on the unknown elements of the A matrix. In other words, the A matrix is treated as an extra parameter in the model.
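Let $S_i = \{s_1, s_2, \ldots, s_n\}$ be a set of $n$ potential sires for animal $i$ with uncertain paternity. The only information available in the phenotypic data to discriminate among these $n$ potential sires is the likelihood of observing the phenotypic record(s) of animal $i$ given each of the possible sires. Thus,
$$p(y_i \mid \mathrm{sire}_i = s_j, \beta, u_{ij}, R_0) \propto \exp\left\{-\tfrac{1}{2}\,(y_i - x_i'\beta - u_{ij})'\, R_0^{-1}\, (y_i - x_i'\beta - u_{ij})\right\} \qquad [1]$$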
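where $\mathrm{sire}_i$ = the sire of animal $i$; $s_j$ = the $j$th potential sire for animal $i$; $y_i$ = the vector of records collected on animal $i$; $x_i'$ = the matrix relating the observed records of animal $i$ to the fixed effects in $\beta$; $u_{ij}$ = the vector of breeding values for animal $i$ given the $j$th potential sire; and $R_0$ = the residual (co)variance matrix. Thus, the probability of $s_j$ being the true sire of animal $i$ is given by:
$$\Pr(\mathrm{sire}_i = s_j \mid y_i, \beta, u_{i1}, \ldots, u_{in}, R_0) = \frac{p(y_i \mid \mathrm{sire}_i = s_j, \beta, u_{ij}, R_0)}{\sum_{k=1}^{n} p(y_i \mid \mathrm{sire}_i = s_k, \beta, u_{ik}, R_0)} \qquad [2]$$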
where the denominator of Eq. [2] is the sum of the likelihoods of observing the phenotypic record(s) of animal $i$ given each of the possible sires, $s_k$ ($k = 1, 2, \ldots, n$), for animal $i$.
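The normalization in Eq. [2] can be illustrated with a minimal sketch. It assumes a univariate trait with a scalar residual variance, and the function and argument names (`sire_probabilities`, `fixed`, `u_candidates`) are hypothetical, not taken from the original implementation. Because every candidate shares the same residual variance and the same records, the normal density constants cancel and only the exponents matter.

```python
import math

def sire_probabilities(y, fixed, u_candidates, residual_var):
    """Normalized likelihoods of one animal's records under each candidate sire.

    y            : records of animal i
    fixed        : fixed-effect contribution (x_i' beta) for each record
    u_candidates : breeding value of animal i drawn under each candidate sire
    residual_var : scalar residual variance (univariate assumption)
    """
    log_liks = []
    for u_ij in u_candidates:
        # Sum of squared residuals of the records given this candidate's draw.
        sse = sum((y_r - f_r - u_ij) ** 2 for y_r, f_r in zip(y, fixed))
        log_liks.append(-0.5 * sse / residual_var)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]

# Toy values, chosen only for illustration.
probs = sire_probabilities(y=[42.8], fixed=[42.0],
                           u_candidates=[0.9, -0.4, 0.1], residual_var=1.0)
```

With a single record and a large residual variance the three probabilities stay close to 1/3, which mirrors the limited discrimination reported for low heritability.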
It is evident from Eq. [1] and [2] that the breeding values of animal i have to be computed assuming that sj (j = 1, 2, ..., n) is the true sire. The methodology proposed by Sapp (2005) facilitates the implementation, because it does not require reconstruction of the relationship matrix for every possible combination of offspring and sire.
Following the notation of Sapp (2005; see Appendix 1), the conditional distribution of breeding values for animal i given that sj is the true sire is proportional to:
![]() | [3] |
where $u_{-i}$ = the vector of breeding values for all animals except animal $i$; $R_0$, $G_0$, $x_i'$, $y_i$, and $\beta$ are as defined above;
= 0.5, 0.75, and 1.0 if both, 1, or no parents are known, respectively; and o = the number of offspring for animal i. Further, µi = a vector with elements
$$\tfrac{1}{2}\,(u_{s_i} + u_{d_i})$$
where $u_{s_i}$ and $u_{d_i}$ = the breeding values of the sire and dam of animal $i$, respectively. Similarly, $\mu_{ik}$ = a vector with elements
$$2u_k - u_{m_i}$$
where $u_k$ and $u_{m_i}$ = the breeding values of the offspring $k$ and mate of animal $i$, respectively.
In the right-hand side of Eq. [3], only the second term, which corresponds to the contribution of the parents to the prediction of the breeding value of animal $i$, changes every time a sire, $s_j$ ($j = 1, 2, \ldots, n$), is assumed to be the true sire.
Thus, in a Bayesian implementation via Markov chain Monte Carlo (MCMC), a draw from the conditional distribution in Eq. [3] is performed for every candidate sire, $s_j$ ($j = 1, 2, \ldots, n$), in every iteration. The resulting draws, $u_{i1}, u_{i2}, \ldots, u_{in}$ (where $u_{ij}$ = the vector of breeding values for animal $i$ assuming that $s_j$ is the true sire), are used to compute the probabilities in Eq. [2].
In each iteration of the MCMC algorithm, the true sire will be sampled from a multinomial distribution with success probabilities calculated as indicated in Eq. [2]. At the end of the sampling process, the probability of each candidate sire being the true sire of a given offspring could be easily computed as:
![]() | [4] |
where PTSij = the probability that sire j is the true sire for animal i.
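The sampling step can be sketched as follows, under the assumption that Eq. [4] is the post-burn-in relative frequency with which each candidate is drawn. The function names and the fixed toy probabilities are hypothetical; in an actual chain the Eq. [2] probabilities would be recomputed in every iteration from the current draws.

```python
import random

def sample_sire(probs, rng):
    """Draw one candidate index from a single-trial multinomial."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def true_sire_frequencies(probs_per_iteration, burn_in, rng):
    """Relative frequency with which each candidate is sampled after burn-in."""
    counts = [0] * len(probs_per_iteration[0])
    for t, probs in enumerate(probs_per_iteration):
        j = sample_sire(probs, rng)  # the sire is sampled in every iteration
        if t >= burn_in:             # only post-burn-in draws are tallied
            counts[j] += 1
    kept = len(probs_per_iteration) - burn_in
    return [c / kept for c in counts]

rng = random.Random(1)
# Toy input: probabilities held fixed across iterations (an assumption made
# only to keep the example short). Chain length and burn-in follow the text.
chain = [[0.7, 0.2, 0.1]] * 10_000
pts = true_sire_frequencies(chain, burn_in=5_000, rng=rng)
```

The returned list plays the role of PTSij for one animal i, with one entry per candidate sire j.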
Simulation
A simulation using an animal model was carried out to investigate a method for assessing paternity using phenotypic records. Data sets were generated under different scenarios: single trait, with 1 record and with 2 repeated records, and multiple trait, with 3 trait records. The pedigree structure was the same for all scenarios. Four overlapping generations were simulated. The base population included 500 unrelated animals, and subsequent generations consisted of 1,000 animals with a total of 3,500 animals generated. The data set consisted of records for animals in generations 2 through 4 (non-base population animals).
One hundred contemporary groups (CG) were simulated, 5 of which were randomly allocated to have all records with uncertain paternity. Additionally, 25 CG were randomly assigned to have a mixture of records with either known or uncertain paternity; the probability of a progeny being assigned as having uncertain paternity was 30% for the 25 CG. The remaining CG (n = 70) contained records with known paternity. Sires were randomly assigned to CG. The 30 CG with uncertain paternity were randomly limited to groups of 2, 3, or 4 candidate sires. Thus, sires could be categorized in 3 different ways: 1) sires having only known progeny; 2) sires having both known and uncertain progeny; and 3) sires having only uncertain progeny.
Single Trait.
A linear mixed model, which included a fixed effect for CG as well as additive breeding values and residuals as random effects, was used to generate the single-trait data. The fixed effect was drawn from a uniform distribution U[41, 43]. Additive breeding values were generated from $N(0, A\sigma_u^2)$, where $A$ = the additive relationship matrix and $\sigma_u^2$ = the genetic variance. The residual terms were generated from a normal distribution, $N(0, I\sigma_e^2)$, where $I$ = the identity matrix and $\sigma_e^2$ = the residual variance. Three different heritabilities were investigated to determine the optimal type of trait when using phenotypic information for assignment of paternity. The genetic parameters used in the single-trait simulation and analyses were as follows:
![]() |
Data sets containing repeated records of a single trait were also created by generating 2 records for each animal in generations 2 through 4 using the method described above. Furthermore, the above genetic parameters were used in the simulation as well as the analyses. Five replicates of the simulated data were generated for each combination of heritability and number of records (1 or 2).
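As a rough illustration of this data-generating model, the sketch below simulates the breeding value and records of one non-base animal. Several choices are assumptions rather than details from the study: the phenotypic variance is set to 1.0, inbreeding is ignored so that the Mendelian sampling variance is half the genetic variance, repeated records share the breeding value and CG effect but have independent residuals, and all function names are hypothetical.

```python
import random

def variance_components(h2, phenotypic_var=1.0):
    """Split an assumed phenotypic variance into genetic and residual parts."""
    genetic_var = h2 * phenotypic_var
    return genetic_var, phenotypic_var - genetic_var

def simulate_offspring_records(u_sire, u_dam, cg_effect, h2, n_records, rng,
                               phenotypic_var=1.0):
    """Return (breeding value, records) for one animal with both parents known."""
    genetic_var, residual_var = variance_components(h2, phenotypic_var)
    # Offspring breeding value = parental average + Mendelian sampling term.
    mendelian = rng.gauss(0.0, (0.5 * genetic_var) ** 0.5)
    u = 0.5 * (u_sire + u_dam) + mendelian
    # Each record = CG effect + breeding value + independent residual.
    records = [cg_effect + u + rng.gauss(0.0, residual_var ** 0.5)
               for _ in range(n_records)]
    return u, records

rng = random.Random(7)
cg = rng.uniform(41.0, 43.0)  # fixed CG effect drawn from U[41, 43]
u, records = simulate_offspring_records(u_sire=0.3, u_dam=-0.1, cg_effect=cg,
                                        h2=0.67, n_records=2, rng=rng)
```

Setting `n_records=1` gives the single-record scenario and `n_records=2` the repeated-records scenario.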
Multiple Trait.
A linear mixed model including the same effects as in the univariate case was used to generate data for 3 correlated traits. The fixed effect for traits 1, 2, and 3 was drawn from a normal distribution with means equal to 27, 225, and 25 and SD equal to 3, 8, and 3, respectively. Additive breeding values were simulated from $N(0, A \otimes G)$, where $A$ = the additive relationship matrix and $G$ = the genetic (co)variance matrix. The residuals were sampled from a normal distribution, $N(0, I \otimes R)$, where $I$ = the identity matrix and $R$ = the residual (co)variance matrix. The heritabilities for traits 1, 2, and 3 were 0.42, 0.30, and 0.50, respectively. A complete summary of the genetic parameters used in the multiple-trait simulation is presented in Table 1. Five replicates of the simulated multiple-trait data were generated.
| RESULTS AND DISCUSSION |
|---|
Single Trait
One Record.
In this scenario, only 1 record of a single trait was used to compute the probability of each candidate sire being the true sire of a given offspring. The average probability of the true sire being identified (PSA) for each of the 3 heritabilities across all CG that had some amount of uncertain paternity is presented in Table 2. Also provided in Table 2 is the percentage difference (PD) between PSA and an equal prior probability of parentage (1/n) assigned to each candidate sire. As expected, the PSA improved with increasing heritability. A low to moderate heritability limits the amount of information available in the phenotypic data to discriminate between candidate sires, because as the heritability decreases, the similarity between parents and offspring is less apparent. Further, it seems that the PD generally increases with the number of candidate sires for all 3 heritabilities. This is perhaps because if the true sire is not selected based on the phenotypic information, the chosen sire will be one of the remaining sires, which could differ from one iteration to another, contrary to what could happen if only 2 candidate sires were considered. For example, using a trait with a heritability of 0.67 and considering 2 candidate sires, the PSA was 0.538 (Table 2), resulting in a 7.70% increase in the probability of identification of the true sire compared with assigning an equal probability of 0.50. For 3 and 4 candidate sires, the PD was slightly over 14%. However, as the heritability of the trait decreased, the PSA and the PD decreased. In fact, when the heritability was 33%, the PD was only 1.24, 3.36, and 2.49% better than using an equal probability for 2, 3, and 4 candidate sires, respectively. Thus, the power of discriminating between candidate sires increased with an increase in the heritability of the trait.
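The PD statistic can be reproduced with a one-line calculation, assuming it is the relative difference between PSA and the equal prior 1/n expressed as a percentage (the function name is hypothetical). Using the rounded PSA of 0.538 with 2 candidate sires gives about 7.6%; the reported 7.70% was presumably computed from the unrounded PSA.

```python
def percentage_difference(psa, n_candidates):
    """Percentage by which PSA exceeds the equal prior probability 1/n."""
    prior = 1.0 / n_candidates
    return 100.0 * (psa - prior) / prior

pd_two_sires = percentage_difference(0.538, 2)
```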
However, when CG containing animals with 30% uncertain paternity were examined, the PSA and PD increased for all 3 heritabilities and numbers of candidate sires (Table 3). This result is not surprising given that more certain information was available to correctly infer the true sire. Candidate sires in this scenario could potentially have progeny with both known and uncertain paternity. Thus, the CG estimates and the breeding value estimate of the sire would be more accurate, thereby increasing the number of animals with uncertain paternity for which the assigned sire is the true sire, compared with a situation in which all records of a given CG come from individuals with uncertain paternity.
Repeated Records.
Presented in Table 4 are the PSA and PD using repeated records (2 records per animal) of a single trait with varying heritability across all CG that had some amount of uncertain paternity. The PSA when 2 candidate sires were considered ranged from 0.530 to 0.558 using heritability from 33 to 67%. This resulted in a 50 to 386% increase in the PD compared with using just 1 record per animal. Likewise, PD was increased by approximately 52 to 240% for 3 candidate sires and approximately 92 to 373% for 4 candidate sires across the 3 heritabilities used in the analyses vs. using just 1 record. These results suggest that phenotypic information was able to more accurately discriminate between candidate sires when more than 1 record was used to determine PSA. This is due to the increase in information leading to more accurate estimation of the systematic and random effects and, more importantly, to a reduction in the variability of the observed records due to the residual (error) contribution.
For the varying number of candidate sires, the greatest benefit of including an additional record was for 4 candidate sires with increases in PD of approximately 92 to 373% across the 3 heritabilities. This result indicates that phenotypic information was able to discriminate between candidate sires more accurately when more sires were present. In the swine industry, in which pooling of semen from up to 5 boars is standard practice for commercial use, the results of the current study could have significant implications for the inclusion of commercial data in genetic evaluations. Increasing the probability of identifying the true boar of each piglet in a litter could lead to increased use of commercial data in genetic evaluations, thus leading to more accurate breeding value estimation.
The PSA and PD using repeated records of a single trait with varying heritability for CG with 30% uncertain paternity are presented in Table 5. The same trend was observed as when a single record was considered. Further, the PD was increased by 11 to 57% compared with the situation in which all uncertain paternity records were considered for the 3 heritabilities using 2 or 3 candidate sires. It is also worth mentioning that across the varying heritabilities for 4 candidate sires, the PSA and corresponding PD decreased slightly compared with the respective PSA and PD when all uncertain paternity records were considered for 4 candidate sires. This slight decrease could have been due to a very small number of progeny with 4 candidate sires in CG with all uncertain paternity records.
Multiple Trait
Presented in Table 6 are the estimates, averaged over 5 replicates, of PSA and PD using 1 record for 3 traits. For all records with uncertain paternity, PSA (PD) was 0.572 (14.31%), 0.419 (25.57%), and 0.320 (27.91%) for 2, 3, and 4 candidate sires, respectively. Using all records with uncertain paternity for 3 correlated traits increased PD by 86, 77, and 98% when compared with using 1 record for all animals with uncertain paternity for a trait with 67% heritability for 2, 3, and 4 candidate sires, respectively. Similarly, the PD was increased by 105, 661, and 1,021% using all records with uncertain paternity for 3 correlated traits when compared with using 1 record for all animals with uncertain paternity for a trait with 33% heritability. Therefore, these results suggest that the probability of identifying the true sire increased when 3 correlated traits were used. The 3 traits used in the multiple-trait scenario ranged in heritability as well as in correlations. The heritabilities were moderate to low, and traits 1 and 3 were negatively correlated, whereas trait 2 was positively correlated with traits 1 and 3. An increase in the PD was observed when records with uncertain paternity from CG with both known and uncertain paternity were used. Similar to the single-trait scenario, the PD was significantly affected by the paternity status of the CG.
Spearman Correlations
Spearman correlations between estimates of genetic merit obtained when an equal prior probability of (1/n) for the n candidate sires and an estimated probability of paternity were used for 3 correlated traits are presented in Table 7. Across the 3 traits, Spearman correlations with the true breeding values were higher using estimated probability of paternity for candidate sires compared with assigning an equal probability to each of the n candidate sires in a CG. In fact, the correlations between true and predicted breeding values of the 3 traits were increased by 6 to 7% for all animals and 64 to 89% for animals with unknown paternity in the pedigree when estimated probability of paternity was used as compared with assigning 1/n to each of the n candidate sires. Furthermore, for animals with uncertain paternity, major differences were observed between correlations obtained using an equal probability and estimated probability of paternity, thus suggesting that assigning an equal probability to candidate sires resulted in biased breeding value estimates for animals with uncertain paternity. Therefore, the use of estimated probability of paternity for each candidate sire based on phenotypic information resulted in more accurate estimation of genetic merit for all animals. Moreover, the accuracy of genetic merit was nearly double for those animals with uncertain paternity when compared with using equal probability of 1/n.
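For readers who want to reproduce this kind of comparison, a Spearman correlation between true and predicted breeding values can be computed as the Pearson correlation of their ranks. The sketch below uses only the standard library, assigns average ranks to ties, and runs on made-up values; none of the numbers come from Table 7.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation."""
    return pearson(ranks(x), ranks(y))

# Illustrative values only.
true_bv = [0.5, -1.2, 0.3, 2.0, -0.4]
predicted_bv = [0.4, -0.9, 0.6, 1.5, -0.5]
rho = spearman(true_bv, predicted_bv)
```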
For example, traits such as birth weight and weaning weight are only measured once in the life of the animal. Thus, use of repeated records of these traits to determine paternity is not possible. Further, the probability of the sire assignment being equal to the true sire was lowest when just 1 record of a single trait with varying heritability was used. However, results from the multiple-trait simulation suggest that the power of assigning the true sire to an animal with uncertain paternity could increase by at least 6% when 3 correlated traits are used to determine the probability of the sire assignment being equal to the true sire, depending on the assumptions made regarding heritability and correlation among traits.
Another limitation of using phenotypic information to assign paternity is the difficulty of discriminating between candidate sires with similar breeding values, as in the case of related sires. The results presented in the current study indicated that when only 2 candidate sires were present, the probability of the sire assignment being equal to the true sire was similar. Furthermore, the results indicated that repeated records of a single trait with varying heritability and 1 record of 3 correlated traits (for 3 or 4 candidate sires) were at least 11% better than using an equal probability of 1/n for n candidate sires within a given mating group.
The heritability of the trait being used could also be a limitation. For traits with a high residual to additive variance ratio (i.e., low heritability), the probability of the sire assignment being equal to the true sire was reduced (Tables 2 through 5) compared with traits with smaller residual to additive variance ratios (traits with higher heritability). Moreover, the presence or absence of records with known paternity in a CG could also affect the probability of the sire assignment being equal to the true sire.
In general, when all animals in a CG had uncertain paternity, differences between accounting for uncertain paternity using the proposed methods and assigning an equal probability to each of the candidate sires were minimal. In contrast, in CG that contained animals with known and uncertain paternity, the proposed methods were better able to account for uncertain paternity than assigning equal probabilities to candidate sires. Therefore, if a CG were to have all animals with uncertain paternity, then it could be beneficial to paternity test a small portion of these animals using marker information, thereby increasing the probability of the sire assignment being equal to the true sire as well as increasing the accuracy of genetic evaluation.
Records from animals with uncertain paternity have typically been excluded from genetic evaluation or assumed to have an unknown sire. Such practice results in loss of information and potentially could compromise expected genetic gain. To remedy this situation, or at least to attenuate its undesirable effect, several methods were developed over the years. The use of genetic grouping (Kennedy and Moxley, 1975; Quaas and Pollak, 1981; Westell et al., 1988) and parentage probabilities, ranging from 0 to 1, combined with the relationship between sires (Foulley et al., 1987; Henderson, 1988; Famula, 1992), have been studied to account for uncertain paternity. The latter approaches require that the relationship matrix be replaced with an average relationship matrix that is weighted by probabilities of parentage. However, in most cases, knowledge of the true parentage probabilities is unavailable, and an equal probability is assumed for each possible sire. The results of the current study indicated that when an equal probability of 1/n was assigned for each candidate sire in a CG, the accuracy of the breeding value was decreased. However, a substantial increase in the accuracy of breeding value prediction was obtained when an estimated probability of paternity based on phenotypic information was used in the analysis. Further research is needed to determine the performance of the proposed method in genetic evaluation of field data, as well as potential implementation for resolving paternity in conjunction with paternity testing using molecular information or DNA testing.
In conclusion, a method that uses phenotypic information to increase the probability of determining the paternity of an animal in multisire mating schemes was presented. This method can enhance the accuracy of genetic value prediction in cases in which unknown paternity exists for some animals. The results showed that when information for 3 traits was available, the proposed method provided improved accuracy of breeding value predictions compared with using an average relationship matrix, which assigns equal sire probabilities to candidate sires. The proposed method could have value for improving the prediction of breeding values in situations in which multisire pastures or pooled semen are used.
| APPENDIX 1 |
|---|
Using laws of probability, the joint distribution of u could be decomposed as follows:
![]() |
where $\sigma_u^2$ = the genetic variance and $u_i$ ($i = 1, 2, \ldots, n$) = the breeding value (BV) of animal $i$.
If the pedigree is ordered from parents to offspring and inbreeding is ignored, as is usually done with large genetic evaluations, it turns out that for any animal i
![]() | [1] |
where usi and udi = the BV of the sire and dam of animal i, respectively. Further, assuming normality for the joint distribution of breeding values,
![]() | [2] |
where µi = the average BV of the parents of animal i and gii = the Mendelian variance for animal i given its parents.
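The two quantities defined here can be computed directly. The sketch below is univariate and ignores inbreeding; it uses the coefficients 0.5, 0.75, and 1.0 given in the Methodology section for both, 1, or no parents known, and treats an unknown parent as contributing 0 to the parental mean, which is an assumption of this example. The function names are hypothetical.

```python
def parental_mean(u_sire=None, u_dam=None):
    """Average parental breeding value; an unknown parent contributes 0."""
    sire = 0.0 if u_sire is None else u_sire
    dam = 0.0 if u_dam is None else u_dam
    return 0.5 * (sire + dam)

def mendelian_variance(genetic_var, n_known_parents):
    """Variance of an animal's breeding value given its known parents."""
    coefficient = {2: 0.5, 1: 0.75, 0: 1.0}[n_known_parents]
    return coefficient * genetic_var

mu_i = parental_mean(u_sire=1.0, u_dam=0.5)          # both parents known
g_ii = mendelian_variance(genetic_var=2.0, n_known_parents=2)
```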
In the context of a mixed linear model and assuming a noninformative flat prior for the fixed effect, the conditional distribution for an animal i could be easily derived as:
![]() | [3] |
where $\sigma_e^2$ = the residual variance; $g_{ii}$, $u_{s_i}$, and $u_{d_i}$ are as before; $u_{-i}$ = the vector of BV for all animals except animal $i$; $y_i$ = the vector of records for animal $i$; and $o_i$ = the number of offspring for animal $i$. In the last term of [3], either $s_k$ (sire of animal $k$) or $d_k$ (dam of animal $k$) is equal to animal $i$.
It is clear from Eq. [3] that the conditional posterior distribution of the BV of animal i is the product of 3 terms corresponding to contributions from data, parents, and offspring.
Data Contribution. Assuming a normal distribution of the data, given the model parameters, it follows that
![]() | [4] |
where $n_i$ = the number of records for animal $i$; $y_{ij}$ = the $j$th record of animal $i$; and $x_{ij}'$ = a row vector for record $j$ of animal $i$ that relates the observation to the fixed effects in $\beta$.
Parental Contribution.
As shown earlier in [1], the conditional distribution of the BV of animal $i$ given its parents, $p(u_i \mid u_{s_i}, u_{d_i}, \sigma_u^2)$, is normal with known mean ($\mu_i$) and variance ($g_{ii}$). Thus, the kernel of the normal distribution for the BV of animal $i$, based on the contribution of the parents, is as follows:
![]() | [5] |
where µi and gii are as before.
Offspring Contribution. The final term in Eq. [3] corresponds to the conditional distribution of an offspring k given the BV of its parents. If animal i is either the sire or the dam of animal k, then
![]() |
where uk = the BV of offspring k and umi = the BV for the mate of animal i that produced offspring k. Thus, if animal i is the sire of progeny k, then umi = udk, otherwise umi = usk . Further, using simple manipulations, the conditional distribution of the BV of progeny k could be rewritten as below:
![]() | [6] |
Viewed as a function of the BV of animal i (ui), given the BV of its mate (umi) and progeny (uk), [6] can be rewritten as:
![]() | [7] |
where µik = the deviation of mate mi from offspring k for animal i and gkk = the variance of the BV of animal i given its mate and offspring. Further, µik and gkk were computed as follows:
![]() | [8a] |
and
![]() | [8b] |
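The step from [6] to [7] amounts to completing the square in $u_i$. A sketch for the univariate case is given below, writing $d_k$ for the Mendelian sampling variance of offspring $k$; the symbol $d_k$ is an assumption introduced here for illustration and is not taken from the source.

```latex
\begin{aligned}
p(u_k \mid u_i, u_{m_i})
  &\propto \exp\left\{ -\frac{\left(u_k - \tfrac{1}{2}u_i - \tfrac{1}{2}u_{m_i}\right)^2}{2\,d_k} \right\} \\
  &= \exp\left\{ -\frac{\left(u_i - (2u_k - u_{m_i})\right)^2}{2 \cdot 4\,d_k} \right\}
\end{aligned}
```

Under this reading, the offspring term behaves like a normal kernel in $u_i$ with mean $\mu_{ik} = 2u_k - u_{m_i}$ and variance $g_{kk} = 4d_k$.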
Consequently, the conditional distribution of the BV of animal i is the product of [4], [5], and [7] as follows:
![]() | [9] |
The conditional distribution in [9] is the product of (o + 2) univariate normal distributions for which the mean and variance are easily derived simply by keeping track of the progeny and mates of animal i. If i is a nonparent animal, the conditional distribution reduces to the product of 2 univariate distributions. The mean (
i) and variance (vi) of the distribution in Eq. [9] could then be obtained as:
![]() |
and
![]() |
where ni and oi are defined as before; µi and gii are as computed in [2]; and µik and gkk are as computed in [8a] and [8b].
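Because the conditional distribution is a product of univariate normal kernels, its mean and variance follow from the usual precision-weighted combination. The sketch below assumes that standard result rather than reproducing the article's own expressions. It is univariate, the function and argument names are hypothetical, and the data contribution is entered as the record deviations $y_{ij} - x_{ij}'\beta$.

```python
def conditional_mean_variance(record_devs, residual_var, mu_i, g_ii,
                              offspring_terms):
    """Mean and variance of the product of the data, parent, and offspring kernels.

    record_devs     : y_ij - x_ij' beta for each record of animal i
    residual_var    : residual variance
    mu_i, g_ii      : parental mean and Mendelian variance
    offspring_terms : list of (mu_ik, g_kk) pairs, one per offspring
    """
    # Each kernel adds its precision, and its mean weighted by that precision.
    precision = len(record_devs) / residual_var + 1.0 / g_ii
    weighted_sum = sum(record_devs) / residual_var + mu_i / g_ii
    for mu_ik, g_kk in offspring_terms:
        precision += 1.0 / g_kk
        weighted_sum += mu_ik / g_kk
    variance = 1.0 / precision
    return weighted_sum * variance, variance

# Nonparent animal: only the data and parental kernels contribute.
mean_np, var_np = conditional_mean_variance([1.0], 1.0, 0.0, 1.0, [])
# Same animal with one offspring term (toy values).
mean_p, var_p = conditional_mean_variance([1.0], 1.0, 0.0, 1.0, [(2.0, 2.0)])
```

In a Gibbs sampler, a draw for animal i would then be taken from a normal distribution with this mean and variance, once for each candidate sire by changing only `mu_i` and `g_ii`.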
The multitrait situation is a straightforward extension of the methodology presented for the univariate case and can be found in Sapp (2005). Furthermore, using simulated and field data, the proposed method gave the same results as the classical implementation (using the inverse of the relationship matrix). In fact, for the univariate and multivariate cases, the correlations between the estimated effects (fixed and random) using the proposed and classical methods were equal to 1 (Sapp, 2005).
| Footnotes |
|---|
2 Corresponding author: rrekaya{at}uga.edu
Received for publication October 4, 2006. Accepted for publication May 14, 2007.
| LITERATURE CITED |
|---|