ANIMAL GENETICS
AgResearch, Invermay Agricultural Centre, Mosgiel 9032, New Zealand
| Abstract |
|---|
Key Words: Breeding Value, DNA Marker, Genetic Evaluation, Parentage, Pedigree
| Introduction |
|---|
An alternative to traditional recording of pedigree is to use DNA marker information to identify parents; however, with extensively farmed livestock, DNA markers commonly have been used only to verify or exclude putative relationships rather than to assign parentage per se. This is due to the cost of DNA marker typing, and because the usually large number of potential parents makes it difficult to unambiguously assign the parents for all progeny (Dodds et al., 1998; Laughlin et al., 2003).
We describe methods that allow the use of DNA information on parentage within a genetic evaluation system. The system is designed for use when the parentage information is incomplete: probabilities that allow for genotyping errors are assigned to the possible parent pairs. We also describe a computing strategy that circumvents the high memory requirements associated with previous methods designed for incomplete pedigree information. We describe these methods in terms of their application to farmed livestock species and the assignment of parent pairs, but the general method also could be applied to plant breeding or to single-parent assignment.
| Materials and Methods |
|---|
Genotyping and Parentage Probabilities
The offspring and their potential parents are to be genotyped with a number of DNA markers. We assume that the markers used are codominant, as is common practice in DNA parentage applications.
Marshall et al. (1998) present formulas for calculating the relative likelihood of the genotype data given that a particular male is the sire of an offspring. These formulas are given both with and without data on the dam and her genotypes, and they allow for a specified level of genotyping errors. The likelihood is calculated relative to the likelihood of the genotypes given that there is no relationship between the individuals involved. We extend these formulas to find the relative likelihood of a particular male-female pair being the parents of an offspring (i.e., simultaneous assignment of both parents).
Suppose there are $m$ markers scored. For a specific marker, let $A_i$ represent the $i$th allele and $p_i$ its frequency, $i = 1, 2, \ldots, v$. For each marker, calculate the likelihood, given parentage, as follows:
[Equation image not recovered from the source.]
and the likelihood, given there is no relationship between the individuals, as follows:
[Equation image not recovered from the source.]
where $e$ is the assumed rate of genotyping errors; $P(g_o)$, $P(g_m)$, $P(g_f)$, and $P(g_p)$ are the probabilities of the offspring, putative mother, putative father, and combined putative parent-pair genotypes, respectively; and the $T(\cdot \mid \cdot)$ are transition probabilities. For example, $T(g_p \mid g_o)$ is the probability that the parents have genotypes $g_p$ given that the offspring is $g_o$. Formulas for these quantities are presented in Tables 1 and 2, assuming Hardy-Weinberg equilibrium. The same results could be obtained by considering transitions in the opposite direction, because, for example, $T(g_o \mid g_p) P(g_p) = T(g_p \mid g_o) P(g_o)$. We use the forms in Tables 1 and 2 because they can be presented and programmed more succinctly. The same error rate is applied to all markers and classes (sire, dam, or offspring) of individuals; however, specific rates for each of these could be applied.
If each of the possible parents is a priori considered to be equally likely, then the posterior probability of parentage given the marker information is the likelihood for that parentage divided by the sum of the likelihoods for all possible parentages. We refer to this as the "parentage probability" and to its sum over dams as the "sire probability." In practice, we consider only those that are more likely than a randomly chosen pair of parents (L > 1) or some other cutoff that does not exclude parents that would otherwise have had moderate parentage probability. In some situations, it may not be possible to determine or DNA sample the complete set of possible parents, or it may not be done with certainty. The set of possible parents for which likelihoods are calculated would then include pairs where one or both parents are unassigned. In the former case, the calculations from Table 2 of Marshall et al. (1998) are used; in the latter case, the relative likelihood to be used is one. A more extreme case of this situation is one in which one sex of parents is not DNA sampled (commonly the mothers), and it is the relationship to only the other sex of parents that is to be considered. An example of parentage probability calculations is shown in the Appendix.
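The normalization described above can be sketched in a few lines of Python. This is an illustrative sketch assuming equal prior probabilities and the L > 1 cutoff; the function names, the dictionary layout, and the example likelihood values are hypothetical.

```python
def parentage_probabilities(likelihoods, cutoff=1.0):
    """Posterior parentage probabilities under equal priors.

    likelihoods maps (sire, dam) -> relative likelihood L.
    Only pairs with L > cutoff are retained before normalizing.
    """
    kept = {pair: lik for pair, lik in likelihoods.items() if lik > cutoff}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {pair: lik / total for pair, lik in kept.items()}


def sire_probabilities(parentage):
    """Sum the parentage probabilities over dams for each sire."""
    sires = {}
    for (sire, _dam), prob in parentage.items():
        sires[sire] = sires.get(sire, 0.0) + prob
    return sires


# Hypothetical relative likelihoods for one offspring.
likelihoods = {("S1", "D1"): 6.0, ("S1", "D2"): 2.0, ("S2", "D1"): 0.5}
probs = parentage_probabilities(likelihoods)
by_sire = sire_probabilities(probs)
```

Here the pair with L = 0.5 is dropped by the cutoff, and the two remaining pairs share the probability in proportion to their likelihoods.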
Estimating Breeding Values
Average Relationship Matrix Method.
Genetic evaluation can be undertaken using the average relationship matrix method (ARM) as described by Henderson (1988). Relationship matrix calculations are undertaken by allowing each of the possible parentages, weighted by their probabilities as calculated above. Genetic evaluation requires the inverse of this matrix, and algorithms are available for calculating this directly (Perez-Enciso and Fernando, 1992).
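To make the weighting concrete, the sketch below builds a probability-weighted relationship matrix with the usual tabular recursion, averaging over the candidate parent pairs of each animal. It is a simplified illustration of the idea rather than the algorithm of Henderson (1988) or Perez-Enciso and Fernando (1992): it forms the matrix itself (not its inverse), uses dense storage, and assumes that animals are indexed so that parents precede offspring. The function name and input layout are hypothetical.

```python
def average_relationship_matrix(candidates):
    """Probability-weighted relationship matrix (dense, tabular recursion).

    candidates[i] is a list of (sire, dam, prob) tuples for animal i, where
    sire and dam are indices smaller than i (or None if unknown) and the
    probabilities sum to one. Founders have an empty list.
    """
    n = len(candidates)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
        for sire, dam, prob in candidates[i]:
            for j in range(i):
                # Half the relationship of animal j with each candidate parent.
                half = 0.0
                if sire is not None:
                    half += 0.5 * A[sire][j]
                if dam is not None:
                    half += 0.5 * A[dam][j]
                A[i][j] += prob * half
            if sire is not None and dam is not None:
                # Weighted inbreeding contribution to the diagonal.
                A[i][i] += prob * 0.5 * A[sire][dam]
        for j in range(i):
            A[j][i] = A[i][j]  # keep the matrix symmetric
    return A


# Animals 0 and 1 are candidate sires, 2 is the dam, 3 is the offspring.
A = average_relationship_matrix([[], [], [], [(0, 2, 0.5), (1, 2, 0.5)]])
```

In this toy case the offspring has a relationship of 0.25 with each candidate sire, rather than 0.5 with one and zero with the other, which illustrates why the resulting matrix is less sparse than one built from a known pedigree.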
An added complication is that genetic evaluation models often include terms that depend on the parentage (particularly the dam). This includes effects such as birth date, birth rank, rearing rank, age of dam, and breed composition. These can be derived as the weighted (by the parentage probabilities) means of the values derived from each possible parentage. If offspring are tagged during or at the end of the rearing period, then the number of offspring reared can be estimated from parentage records. Normally, the parturition date and litter size of the dam are not observed (one of the reasons for using DNA parenting is to allow birthing groups to be undisturbed), but for livestock species, these effects can be estimated through the use of pregnancy scanning. Once these average effects have been derived, they need to be incorporated into the genetic evaluation model. For terms that were modeled as class effects, this will generally require a modification to the model used, as values intermediate to those used in a known-pedigree system are now possible. The simplest solution is to use rounded values, but it may be more accurate to use a suitable functional relationship (e.g., a polynomial or a spline for birth rank). This would require investigation for each specific situation (i.e., trait and effect being modeled). An alternative method that avoids these difficulties is presented in the Pedigree Sampling Method section.

Standard genetic evaluation, using BLUP, involves solving a large set of linear equations (of dimension the number of individuals plus the number of fixed effects levels, if only direct genetic effects are considered), and thus, sparse matrix methods are often used to decrease the computational burden. This is effective because there typically is a large proportion of individuals that are unrelated; however, by allowing multiple parentage possibilities, DNA marker-based pedigree analyses involve a relationship matrix that is much less sparse, thereby increasing the computation required. In particular, the increased memory demands decrease the size of the population that can be evaluated. The Pedigree Sampling Method section describes an alternative computing strategy that circumvents the increased memory demands.
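The weighted means of parent-dependent effects mentioned above, and the rounding used as the simplest way to return to class levels, can be illustrated as follows. The function name and the example ages are hypothetical.

```python
def weighted_effect(values_and_probs):
    """Mean of a parent-dependent effect, weighted by parentage probabilities.

    values_and_probs is a list of (value, probability) pairs, one per
    possible parentage of the offspring.
    """
    total_prob = sum(prob for _value, prob in values_and_probs)
    return sum(value * prob for value, prob in values_and_probs) / total_prob


# Hypothetical offspring: a 2-year-old dam with probability 0.75 and a
# 5-year-old dam with probability 0.25.
age_of_dam = weighted_effect([(2, 0.75), (5, 0.25)])
age_of_dam_class = round(age_of_dam)  # simplest option: nearest class level
```

The intermediate value 2.75 does not correspond to any class level in a known-pedigree model, which is the difficulty that the rounding (or a fitted functional relationship) is meant to address.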
Pedigree Sampling Method.
The method we describe here is a type of multiple-imputation strategy. Missing information (parentage) is filled in by sampling from the possibilities weighted by their probabilities. For example, suppose an individual has one possible sire and two possible dams with equal probability. A set of parents is chosen for that individual by assigning the only possible sire and randomly choosing (with equal probability) one of the two possible dams. Once all the unknown (but partially known) parent relationships have been sampled, the genetic evaluation proceeds identically to the recorded parentage situation, and the EBV is stored. The process is then repeated a suitable number of times (100 is used in the simulations presented below). Finally, the set of EBV from each of these evaluations is averaged to give the final EBV to report. We refer to this method as the pedigree sampling (PS) method. Sen and Churchill (2001) used a similar multiple-imputation strategy in the analysis of QTL data.

Each sampled parentage allows for the calculation of parent-dependent effects in much the same way they are calculated with known parentage. These generally are of the types that are available with full parentage recording and can therefore be handled by standard genetic evaluation software. The breed is the average of the sampled parents' breeds, the age of dam is the age of the sampled dam, and the rearing rank is calculated as the number of progeny with the same sampled dam (in that sampling). Birth rank and approximate birth date can be obtained from the sampled dam's pregnancy scanning information. It will sometimes be the case that a dam is assigned more progeny than her (inferred) litter size. In these cases, the rearing rank could be considered equal to the birth rank for the genetic evaluation.
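The sampling loop described above can be sketched in Python as follows. The genetic evaluation itself is left as a caller-supplied function (here called `evaluate`, a hypothetical placeholder for whatever BLUP software is used); the other names and the input layout are likewise assumptions made for this illustration.

```python
import random
from collections import Counter


def sample_pedigree(candidates, rng):
    """Draw one parent pair per offspring, weighted by parentage probability.

    candidates maps offspring -> list of ((sire, dam), probability).
    """
    pedigree = {}
    for offspring, options in candidates.items():
        pairs = [pair for pair, _prob in options]
        weights = [prob for _pair, prob in options]
        pedigree[offspring] = rng.choices(pairs, weights=weights, k=1)[0]
    return pedigree


def rearing_ranks(pedigree):
    """Rearing rank = number of progeny sharing the same sampled dam."""
    per_dam = Counter(dam for _sire, dam in pedigree.values())
    return {off: per_dam[dam] for off, (_sire, dam) in pedigree.items()}


def pedigree_sampling_ebv(candidates, evaluate, n_samples=100, seed=0):
    """Average the EBV returned by evaluate(pedigree) over sampled pedigrees."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_samples):
        ebv = evaluate(sample_pedigree(candidates, rng))
        for animal, value in ebv.items():
            totals[animal] = totals.get(animal, 0.0) + value
    return {animal: total / n_samples for animal, total in totals.items()}


# One possible sire and two equally likely dams for lamb1, as in the text.
candidates = {
    "lamb1": [(("S1", "D1"), 0.5), (("S1", "D2"), 0.5)],
    "lamb2": [(("S1", "D1"), 1.0)],
}
one_sample = sample_pedigree(candidates, random.Random(1))
ranks = rearing_ranks(one_sample)
```

Each call to `evaluate` sees an ordinary, fully specified pedigree, so standard evaluation software can be used without modification.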
Upper Quartile Pedigree Sampling Method.
When performing pedigree sampling, some samples seem more likely than others, particularly with respect to family sizes. For example, we expect a dam to have no more progeny assigned to her than her (estimated) litter size, we might expect the number of progeny per sire to follow a particular distribution, and we expect litter mates to often be full-sibs. Intuitively, we would prefer samplings where these expectations are mainly met. Here we investigate an ad hoc method, the upper quartile pedigree sampling (UQPS) method, which favors samplings with more likely family sizes, to see whether this might give a more efficient evaluation procedure. A weighting is applied to each sample. The specific details will depend on the species and production system. We use weightings, which might be used with prolific sheep, as follows. Initially, the weight is set to one. For each dam with progeny, the weight is multiplied by $0.75^{s-1}$, where $s$ is the number of different sires assigned for the progeny of that dam. For each dam, the weight is multiplied by $4w$, where $w$ is found in Table 3 according to the number of progeny detected by scanning the dam and the number of progeny assigned (reared). These values approximately represent the (relative) probabilities of rearing the given number of progeny given the scanned number of progeny (and allowing for errors in the scanning process). The values are multiplied by four to aid computation. For combinations not shown, a value of $w = 0.01$ was used. Samples with higher weights have family sizes closer to expectation than those with low weights. Once a set of samples has been taken and their weights calculated, only the upper quartile (on weight) is retained for averaging to give EBV.
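The weighting and selection steps can be sketched as below. The Table 3 values are not reproduced here, so the `w_table` argument is a hypothetical lookup that the caller must fill in; the function names and the assumption that every dam under consideration appears in the `scanned` dictionary are also choices made for this illustration.

```python
def sample_weight(pedigree, scanned, w_table, default_w=0.01):
    """Weight of one sampled pedigree.

    pedigree: offspring -> (sire, dam) for this sampling.
    scanned:  dam -> number of progeny detected at scanning.
    w_table:  (scanned number, assigned number) -> w (hypothetical values).
    """
    sires_by_dam = {}
    for sire, dam in pedigree.values():
        sires_by_dam.setdefault(dam, []).append(sire)
    weight = 1.0
    for dam, n_scanned in scanned.items():
        sires = sires_by_dam.get(dam, [])
        if sires:
            # Penalize litters split across several sires.
            weight *= 0.75 ** (len(set(sires)) - 1)
        # Family-size term, scaled by four as in the text.
        weight *= 4 * w_table.get((n_scanned, len(sires)), default_w)
    return weight


def upper_quartile(samples, weights):
    """Keep the quarter of samples with the highest weights (at least one)."""
    order = sorted(range(len(samples)), key=lambda i: weights[i], reverse=True)
    keep = max(1, len(samples) // 4)
    return [samples[i] for i in order[:keep]]


# Hypothetical w value for a dam scanned with twins that rears two lambs.
w_table = {(2, 2): 0.25}
mixed = sample_weight({"l1": ("S1", "D1"), "l2": ("S2", "D1")}, {"D1": 2}, w_table)
full_sibs = sample_weight({"l1": ("S1", "D1"), "l2": ("S1", "D1")}, {"D1": 2}, w_table)
best = upper_quartile(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.3])
```

With these illustrative numbers, the sampling in which both lambs share a sire receives the higher weight, in line with the expectation that litter mates are often full-sibs.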
Animals.
The pedigree simulated comprised three generations. This allowed some genetic similarities between the parents, as would normally be the case in practice. The first (grandparent) generation consisted of unrelated animals: five grandsires and 20 granddams for the paternal line, and 10 grandsires and 120 granddams for the maternal line. The second (parent) generation was generated from the respective paternal/maternal grandparent lines. The number of progeny produced from each mating was based on a distribution of 0.15, 0.45, 0.35, 0.04, and 0.01 for one to five, respectively. These progeny were randomly assigned a sex, resulting in 20 to 30 potential sires and in excess of 100 potential dams. Of these, 10 were chosen as sires and 100 selected as dams from their respective groups. The third (progeny) generation was generated from these with the same litter size distribution as in the second generation (giving an average litter size of 2.3) and randomly assigned a sex. Progeny were randomly removed (to mimic deaths) according to birth rank, in the proportions 0.15, 0.15, 0.30, 0.5, and 0.5 for one to five lambs, respectively, giving an average of 1.7 progeny reared per dam. True birth rank and rearing rank (assuming no cross-mothering) data were stored for all progeny. Any individuals in the first two generations that were not ancestors of the progeny were removed from analysis.
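The two averages quoted above can be checked directly from the stated distributions. The short calculation below is a sketch that assumes each lamb is removed independently with the probability given for its litter's birth rank; the variable names are chosen for illustration.

```python
# Litter size distribution and removal proportions as stated in the text.
litter_size_probs = {1: 0.15, 2: 0.45, 3: 0.35, 4: 0.04, 5: 0.01}
removal_by_birth_rank = {1: 0.15, 2: 0.15, 3: 0.30, 4: 0.5, 5: 0.5}

# Expected number of lambs born per dam.
mean_born = sum(size * prob for size, prob in litter_size_probs.items())

# Expected number reared per dam: litter size times survival for that rank.
mean_reared = sum(
    size * prob * (1 - removal_by_birth_rank[size])
    for size, prob in litter_size_probs.items()
)
```

Under that assumption the calculation gives about 2.31 lambs born and about 1.73 reared per dam, which agrees with the rounded values of 2.3 and 1.7.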
Genetic Markers.
Marker data were simulated for up to eight markers with allele frequencies as shown in Table 4. These markers are based on a set of sheep markers currently being used by a commercial genotyping facility. The allele frequencies for these markers were estimated from a set of unrelated animals from a single breeding operation. The first six of these markers were used in the simulations (as in the commercial genotyping facility), unless stated otherwise. Genotypes were assigned to grandparents by randomly selecting alleles based on the allele frequencies. Alleles for the subsequent generations were selected randomly from each parental set of alleles. To conservatively model the proportion of results from a commercial high-throughput laboratory, 10% of genotypes were randomly removed from mothers and progeny. No genotypes were removed from the sires. All genotypes were assigned without error. Once a set of genotypes is generated, those from the grandparent generation are ignored, whereas those for the parent and progeny are used to calculate parentage probabilities. The rate of genotype errors was assumed to be 1% when performing the likelihood calculations, and only those parentages with L > 1, and for which the parentage probability exceeded 0.02, were retained. All relevant animals belonged to a single mating or lambing group.
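The genotype simulation steps for a single marker can be sketched as follows. The allele frequencies of Table 4 are not reproduced here, so the frequencies in the example are hypothetical, as are the function names.

```python
import random


def founder_genotype(freqs, rng):
    """Draw two alleles independently according to the allele frequencies."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    return tuple(rng.choices(alleles, weights=weights, k=2))


def offspring_genotype(mother, father, rng):
    """One allele chosen at random from each parent (no genotyping error)."""
    return (rng.choice(mother), rng.choice(father))


def mask_missing(genotypes, rate, rng):
    """Set a random fraction of genotypes to None to mimic failed assays."""
    return {
        animal: (None if rng.random() < rate else genotype)
        for animal, genotype in genotypes.items()
    }


rng = random.Random(1)
freqs = {"A": 0.3, "B": 0.7}            # hypothetical allele frequencies
dam = founder_genotype(freqs, rng)
sire = founder_genotype(freqs, rng)
lamb = offspring_genotype(dam, sire, rng)
# 10% missing for dams and progeny; sire genotypes are left complete.
observed = mask_missing({"dam": dam, "lamb": lamb}, 0.10, rng)
observed["sire"] = sire
```

Repeating these steps for each marker and each animal gives one replicate data set of the kind described above.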
The values used for the BR/RR component were c(1,1) = 0, c(2+,1) = 1.4, c(2,2) = 4, c(2+,1) = 5, c(1+,3) = 7, c(1+,4) = 10, and c(1+,5) = 12, where, for instance, 2+ denotes a value of 2 or greater. Phenotypes were not modified according to sex. The true genetic values of the progeny were retained for comparisons.
Breeding Value Estimation.
The EBV were calculated using ASREML (Gilmour et al., 2002), with the progeny trait values and (estimated) genetic relationships between the parent and progeny generations. For each of the replicate pedigrees, five different analyses were performed using the true pedigree, the ARM method (by supplying ASREML with the inverse relationship matrix), the PS method, the UQPS method, and the best pedigree (BP; i.e., the parentage with the highest parentage probability). The PS and UQPS methods used the same 100 samplings. The model included fixed effects of sex and BR/RR (with BR of three or more grouped and similarly for RR). Analyses were performed both with the true BR/RR and with the estimated values. Estimated BR and RR were rounded to the nearest integer with the ARM method. It was assumed that the litter size of each dam was known. Correlations were found between breeding values estimated by each of the methods and between these and the true genetic values. Results were partitioned into progeny, sires, and dams.
| Results |
|---|
Means (over simulation replicates) of the correlations between breeding values calculated using the true pedigree and by the various methods are shown in Table 5. The ARM and the PS methods were similar when birth and rearing ranks were known. Because ARM uses rounded average birth and rearing ranks, it performed slightly worse than the PS method when these ranks were estimated. Alternative methods of modeling average birth and rearing ranks may improve the ARM method. The UQPS method gave similar results to the PS method, which suggests that it may be possible to devise a weighting scheme or another method that uses prior family size information for improving the process, for example to decrease computation at the genetic evaluation step. The BP method was the poorest of those considered.
| Discussion |
|---|
Intensive recording at birth is often used to gain information on effects to be used in the genetic evaluation in addition to pedigree. If dead offspring are collected and genotyped along with live offspring, then litter size could be estimated as the average number of dead plus live offspring assigned to the dam. This would involve a greater genotyping cost without increasing the number of candidates for selection. It also is likely to be impractical or inaccurate due to unrecovered dead offspring. A more feasible solution is to use pregnancy scanning records to give estimates of parturition date and litter size. Pregnancy scanning is gaining widespread adoption as a management tool in sheep, particularly highly fecund strains, to allow differential feeding of ewes bearing different numbers of lambs. Experienced scanning operators can achieve high accuracies (90 to 97%) in diagnosing fetal numbers, with errors tending to be undercounts for the higher litter sizes (Fowler and Wilkins, 1984; Logue et al., 1987). Using pregnancy scanning results to partition a flock could result in lambing contemporary groups that would contain lambs all of the same parity (i.e., a birth rank could be assigned based only on birthing group membership).
Using DNA marker information to identify parents enables the management constraints of visual recording and confinement of animals during mating and parturition to be relaxed. DNA testing has been widely used in the breeding of high-value animals (such as horses) to monitor the accuracy of pedigree records. Such parentage matching is typically very reliable and effective for an offspring with few possible parents, such as a known mother with two or a few potential fathers. If all but one of the possible parent-pairs can be excluded, the remaining pair is declared the true parents.
With extensively farmed livestock, however, it is difficult to unambiguously assign the parents for all progeny (Dodds et al., 1998; Laughlin et al., 2003). For example, in sheep where 10 sires may be mated with 1,000 ewes, a lamb could have one of 10,000 possible combinations of parents. Parentage assignment requires that all 9,999 incorrect parent-pairs are excluded. Sherman et al. (2004) achieved 86% unambiguous paternity assignment with 11 microsatellite markers when there were 26 possible sires. Sise et al. (2001) investigated management strategies to improve unambiguous assignment.
Parentage matching strategies should allow for genotyping errors. A common method is to require two or more exclusions among the marker tests. Although this works reasonably well in situations where many markers are typed, it is somewhat arbitrary and is less useful when fewer markers are scored. It also makes it more difficult to exclude incorrect parent-pairs, exacerbating the problem of excluding large numbers of false parentage possibilities.
A number of statistical methods have been developed to attempt to assign pedigree in natural populations, but these typically aim only to identify the sire or to assign the most likely sire. The objective of these studies is to provide information on population parameters, such as the distribution of mating success (e.g., Devlin et al., 1988; Dickinson and McCulloch, 1989; Neff et al., 2001). One of these methods (Devlin et al., 1988) allows paternity to be spread over several possible fathers, a method the authors refer to as "fractional paternity." Another application is the generation of putative sibships to allow the estimation of genetic parameters (Thomas and Hill, 2000). In all of these cases, the parentage assignment of specific individuals is of lesser importance, as opposed to the case of selection of breeding individuals, where the parentage may have a large influence on which individuals are chosen. Methods have been proposed to account for multiple possibilities of sire pedigrees during the genetic evaluation process (Henderson, 1988; Foulley et al., 1990; Perez-Enciso and Fernando, 1992; Cardoso and Tempelman, 2003). Although the use of genetic markers to assign probabilities to possible parentages is mentioned as a possibility, these studies have not demonstrated it.
We have described methods that allow the use of DNA information on parentage within a genetic evaluation system. The simulation results show that these methods allow much of the genetic progress that could have been made had the true parents been known. This is particularly true for the groups of animals where the most selection pressure is applied (sires and progeny). The BP also can be used but with lower genetic progress. Relative genetic gain results for the ARM and PS methods are similar if birth/rearing rank is known (both 97% for progeny), but PS performed better than ARM when birth/rearing rank was estimated (e.g., for progeny the values are 94% for ARM compared with 96% for PS). The birth/rearing rank known results also give an indication of the outcome for traits that are not influenced by such effects. If fixed effects can be modeled suitably or are absent from the model, lower memory costs may make the ARM method more practical than with current technology.
The DNA marker-based fractional parentage methods of genetic evaluation work well because the "mistakes" in parentage tend to be conservative. By this we mean that if the most likely parentage is not the correct one, it will often comprise close relatives of the true parents, as they will often have similar genotypes. This also will be the case for other parentages with nonzero probabilities. In this way, the system is tolerant to the incompleteness of the parentage information. In our simulations the best pedigrees had, on average, 84% of the grandparents correct, compared with 77% of parents correct (62% of offspring had both parents correct, and another 30% had one parent correct).
Many factors may affect the amount of genetic progress achieved using DNA markers. These include the number of DNA markers used, missing animals and genotypes from the dataset, and the heritability of the trait under test. We have made a brief assessment of the effect of these factors on genetic progress by undertaking additional simulation runs. Results are presented as the relative genetic progress achieved in progeny using the PS method compared with the true pedigree, which achieved 96% in the situation presented in the results section. Changing the number of DNA markers from six to four or eight (the first four and eight markers, respectively, in Table 4) gave a 2% decrease or 2% increase, respectively, in the relative genetic gain. The methods that gave lower relative genetic gain (Table 6) were more influenced by the number of markers used, so that with eight markers the relative genetic gain ranged from 94 to 98% across the four methods. Changing the heritability from 0.3 to 0.1 or 0.5 gave a 1% decrease or a 2% increase, respectively, in the relative genetic gain, reflecting the lower dependence on information from relatives as the heritability increases. Decreasing the proportion of missing genotypes in the mothers and progeny from 10% to 5% increased the relative genetic gain by 1% with the six DNA markers. When the genotypes for dams and progeny were generated with a 1% error rate, there was a negligible change to the relative genetic gain. The parentage analysis used assumes all animals are present and available for the analysis. Practically speaking, in a farm situation, some animals may be unrecorded and unavailable for inclusion in the analysis. Although the absence of 5 to 10% of mothers does not greatly affect the overall result, if one of the 10 sires used is absent from the analysis, the relative genetic gain decreases by 1%.
The results presented compare the gain against the "gold standard" of perfect pedigree recording. In practical farming systems this is seldom, if ever, achieved. A number of reports have investigated the level of pedigree errors (Table 7). There also have been a number of reports investigating the decrease in genetic gain due to errors in pedigree recording. Israel and Weller (2000), Banos et al. (2001), and Spelman (2002) found decreases of 4 to 15% in genetic gain in dairy cattle due to 10 to 15% errors in pedigree recording. Differences in these results are likely to be due to differences in the selection programs, but the results indicate that the gains with the proposed marker-based methods relative to what can practically (due to pedigree errors) be achieved would be up to 10% greater than those tabulated above.
We have used a somewhat simple approach when considering the possible parent combinations: all are considered equally likely a priori. It may be possible to use additional information to construct unequal prior probabilities for each parentage. For example, if mate sires are only present during certain portions of the mating period, then the expected conception date would favor mate sires present at that time; however, this also would increase the complexity of practical recording systems.
The methods we have described show how DNA marker information could be used to replace traditional pedigree recording. This is associated with lower genetic gain (at the same survival levels) but enables relaxed management strategies. The pedigree sampling system has been implemented for a number of large sheep breeding operations in New Zealand and is being implemented into the Sheep Improvement Limited (Wellington, New Zealand) system as a routine option.
| Implications |
|---|
| Appendix |
|---|
| Footnotes |
|---|
3 Current address: Ovita Ltd., P.O. Box 5520, Dunedin, New Zealand.
2 Correspondence: Private Bag 50034 (phone: +64-3489-9083; fax: +64-3489-9037; e-mail: ken.dodds{at}agresearch.co.nz).
Received for publication February 9, 2005. Accepted for publication June 21, 2005.
| Literature Cited |
|---|