J. Anim Sci. 2009. 87:2510-2518. doi:10.2527/jas.2008-1586
© 2009 American Society of Animal Science
OPEN ACCESS ARTICLE
Product versus additive threshold models for analysis of reproduction outcomes in animal genetics1
I. David*,2,
L. Bodin*,
D. Gianola
,
A. Legarra*,
E. Manfredi* and
C. Robert-Granié*
* Institut National de la Recherche Agronomique UR 631, Station dAmélioration Génétique des Animaux, 31320 Castanet-Tolosan, France; and
Department of Animal Sciences, University of Wisconsin, Madison 53706
 |
Abstract
|
|---|
The phenotypic observation of some reproduction traits (e.g., insemination success, interval from lambing to insemination) is the result of environmental and genetic factors acting on 2 individuals: the male and female involved in a mating couple. In animal genetics, the main approach (called additive model) proposed for studying such traits assumes that the phenotype is linked to a purely additive combination, either on the observed scale for continuous traits or on some underlying scale for discrete traits, of environmental and genetic effects affecting the 2 individuals. Statistical models proposed for studying human fecundability generally consider reproduction outcomes as the product of hypothetical unobservable variables. Taking inspiration from these works, we propose a model (product threshold model) for studying a binary reproduction trait that supposes that the observed phenotype is the product of 2 unobserved phenotypes, 1 for each individual. We developed a Gibbs sampling algorithm for fitting a Bayesian product threshold model including additive genetic effects and showed by simulation that it is feasible and that it provides good estimates of the parameters. We showed that fitting an additive threshold model to data that are simulated under a product threshold model provides biased estimates, especially for individuals with high breeding values. A main advantage of the product threshold model is that, in contrast to the additive model, it provides distinct estimates of fixed effects affecting each of the 2 unobserved phenotypes.
Key Words: additive model Gibbs sampling product model reproduction traits
 |
INTRODUCTION
|
|---|
The phenotypic observation of some reproduction traits is due to effects contributed by the male and the female. For such traits depending on several individuals, a main approach that has been proposed in animal genetics models the observed phenotype (for continuous traits), or an underlying variable (for binary traits analyzed under the threshold model assumption), as the sum of environmental and genetic effects contributed by the different individuals involved (Wilham, 1963
; Varona and Noguera, 2001
; Piles et al., 2005
; Bijma et al., 2007
). In some cases, these models give perplexing results (e.g., high negative estimates of the correlation between maternal and direct genetic effects; Robinson, 1996
; Koerhuis and Thompson, 1997
). Some authors have pointed out that the combination of the different sources of variation may not be purely additive (on either the observed or underlying scales). Actually, statistical models proposed for studying human fecundability generally consider the reproduction result as the product of hypothetical unobservable variables. For instance, Speirs et al. (1983)
proposed a model in which the number of observed embryo implantations is the product of the number of viable embryos and a measure of uterus receptivity. For binary traits, Zhou and Weinberg (1996)
assumed that the probability of conception in a menstrual cycle is the product of the probability that the cycle is viable and the probability that conception would occur. This product combination of factors for studying reproduction results has the advantage of being biologically more natural and easier to construe than the purely additive combination. Nevertheless, such models have not been used for studying reproduction traits in animal genetics. In line with human studies, we propose a product threshold model for studying the outcome of an insemination taking into account the action of the male and female and the genetic relationships between individuals. The objectives of this study are 1) to present the theory of a product threshold model in a genetic context, and 2) to examine, with illustrations, the feasibility of this model.
 |
MATERIALS AND METHODS
|
|---|
Animal Care and Use Committee approval was not obtained for this study because no animals were used.
Methods
Background: Linear Additive Model for a Continuous Trait Depending on 2 Individuals.
Statistical methodology for studying a quantitative genetic trait dependent on several individuals has already been developed under the assumption of the additive model (Wilham, 1963
; Bijma et al., 2007
). To define the notation that will be used in this paper, we recall the corresponding theory as in Bijma et al. (2007)
. The model for a phenotype dependent on 2 individuals {i,j}, consists in expressing the observed phenotypic value, Pij as the sum of 2 unobserved phenotypic effects: a direct effect (PD,i) of individual i and an associative effect (PS,j) of the associated individual j:

| [1.1] |
Phenotypic direct and associative effects can both be decomposed into a heritable component, referred to as the breeding value (A), and a nonheritable environmental component (E). Thus,
where AD,i is the direct breeding value (DBV) and AS,j is the associative breeding value (SBV), which pertain to genetically distinct traits. In matrix notation, the corresponding animal model is as follows:

| [1.2] |
where
is the vector of observed phenotypes; β is a vector of fixed effects, with incidence matrix X; aD and aS are vectors of DBV and SBV, respectively (each having an order equal to the number of animals in the pedigree), with corresponding incidence matrices ZD and ZS. The distributional assumption is
where A is the numerator relationship matrix and
represents the Kronecker product. The residuals
are independent and identically distributed with variance
Threshold Models for a Binary Trait Depending on 2 Individuals.
For dichotomous traits, the observed phenotype
(1 or 0) is modeled using generalized linear mixed model assumptions. The most widely used approach adopts the same linear predictor as for a linear mixed model with a probit link function:
where
is the conditional (given the A and E variables) probability of success for the couple {i,j} and
is the standard cumulative distribution function of the normal distribution. This model is called the additive threshold model. This model is equivalent, under the threshold model theory (Wright, 1934
; Gianola, 1982
), to
where T is a threshold value and lij is an underlying variable linked to the environmental and genetic effect as in model [1.2].
Suppose now that the 2 unobserved phenotypes
are dichotomous traits as well, such that
and
In this case, the relationship between the probability of success for the unobserved phenotypes and the observed phenotype is not obvious with the additive threshold model. A more intuitive way to link observed and unobserved phenotypes in the case of dichotomous traits is to assume that a success is observed only if there is a success for each of the 2 unobserved phenotypes:
For example, conception in the couple is the result of success of each member. If the probabilities of success for the 2 unobserved phenotypes are conditionally independent, then

| [1.3] |
According to the classical threshold model theory, this product threshold model is equivalent to
where
and
are related to 2 underlying continuous variables, d and s, respectively.
where TD and TS are the threshold values for d and s, respectively. Further, let

| [1.4] |
and

| [1.5] |
where the notations and distributional assumptions for the genetic effects are the same as in [1.2]. In addition,
and
are vectors of fixed effects, with incidence matrices XD and XS linking unobserved direct and associative phenotypes to fixed effects, respectively, and
are mutually independent residuals with variances
and
respectively.
Contrary to the additive threshold model, the interpretation of unobserved phenotypes is straightforward in the product threshold model. For instance, in the study of AI success, if individual j is the male,
is the chance that the male produces a sperm capable of fertilizing the ovum if the latter is viable. Conversely,
is the chance that the ovum can be fertilized, provided that the sperm is fecundant.
Product and additive threshold models support different biological assumptions. Figures 1 and 2 depict for the additive and product threshold models, respectively, variations of
with respect to
with one curve for each
In the additive threshold model, the probability of success
can reach very high values as soon as
is sufficiently large in comparison with
In contrast, in the product threshold model, the increase of
with
is penalized by the value of
and reaches a maximum value that is generally less than 1. Assuming a product threshold model may be more realistic in some cases. For instance, a sterile male will never have progeny, even if he is mated to highly fertile females. This case corresponds to the change of the lowest lines in Figures 1 and 2. For intermediate
and
the change of
for different
values when
increases are similar and independent of
values in the additive threshold model. On the other hand, in the product threshold model these changes in probability of success depend on the values of
and increase faster for greater
However, for reduced
and
the models behave similarly because, in this case, the odds ratio is close to the relative risk.
Heritabilities.
In the additive threshold model, the total variance of the underlying variate l is
where r denotes the additive genetic relationship between associates in a couple. The heritability of direct and associative effects on the underlying scale is then
and
respectively. An approximate transformation of these heritabilities to the observed scale (p-scale) is obtained by multiplying each heritability by z2/{Pr(P* = 1)[1 – Pr(P* = 1)]}, where z is the ordinate of a standard normal distribution function corresponding to a threshold =
(Dempster and Lerner, 1950
).
In contrast to the additive threshold model, it is possible to assess the heritabilities of the unobservable individual phenotypes
and
in the product threshold model. The parametric definitions are
and
on the underlying scale. The genetic variances on the observed scale (p-scale) are given by
and
(Dempster and Lerner, 1950
) for the direct and associate genetic effects, respectively, where
and
are the ordinates of a standard normal distribution function corresponding to thresholds equal to
and
respectively. Therefore, the heritabilities of direct and associative effects on the observed scale are
and
respectively. Because
and
are unobservable, one has to estimate
and
Parameter Estimation.
Parameter estimation methods in the additive threshold model are well known (Gianola and Foulley, 1983
; Harville and Mee, 1984
; Gilmour et al., 1985
; Hoeschele and Tier, 1995
; Moreno et al., 1997
) and will not be presented here. Bayesian inference in the product threshold model can be carried out using the Gibbs sampling algorithm with a modification of the data augmentation algorithm described in Sorensen et al. (1995)
. In the classical additive threshold model, the idea is to sample the underlying variable from its conditional posterior distribution (a truncated normal distribution, on the left if P = 1 and on the right otherwise). The next step is to sample each parameter from its conditional posterior distribution, with the underlying variable replacing the observed data, so that the Gibbs sampler is as for the linear model (Sorensen et al., 1995
). Implementation of the product threshold model can be done using the same approach and a supplementary step, which is the sampling of PD,i and PS,j. Consider the following notation,

|
where PD,i and PS,j are the unobserved phenotypes,
are the vectors of parameters (i.e., fixed and random effects) for direct and associative phenotypes, respectively;
are row incidence matrices linking
to the unobserved phenotypes. PD,i and PS,j are sampled in the nth step of the Gibbs sampling from their joint conditional distribution.
where
Given PD,i and PS,j, parameter sampling is identical to the additive model by regarding PD,i and PS,j as the data. We added this supplementary step to the TM program developed by A. Legarra (Institut National de la Recherche Agronomique-Station dAmélioration Génétique des Animaux, Castanet-Tolosan, France, personal communication).
Illustrations
A study with 4 different simulated designs (Table 1) was performed to illustrate the feasibility of the product threshold model. The base population consisted of 60 unrelated sires and 200 or 2,000 unrelated dams without records. Sires were randomly mated to dams to generate 600 or 4,200 progeny. Dam (DBV) and sire (SBV) breeding values were generated using bivariate normal distributions with parameters corresponding to the designs shown in Table 1. Genetic values of the progeny were sampled as the mean of sire and dam genetic values plus the Mendelian sampling. Part of the progeny were randomly assigned to be females and have an unobserved direct phenotype (400 or 4000 individuals i), whereas the other part were considered to be males and have an associative unobserved phenotype (200 individuals j). The observed phenotypes, resulting from the association of individuals i and j, were generated according to the product threshold model. The underlying d and s values were generated for each couple using Eq. [1.4] and [1.5]. Two fixed effects were included in the model. The first effect f1 was a cross-classified effect with 3 levels that affected direct and associative unobserved phenotypes. The second effect f2 was a cross-classified effect with 2 levels affecting associative unobserved phenotypes only. Here,
and
where the 3 first elements in βS corresponded to the effects of f1 and the fourth and fifth to the effects of f2. Levels of the fixed effects were randomly assigned to each couple with equal probability for each level. We fixed
and randomly assigned
D and
S to each couple. Discrete phenotypes
and P* were generated as follows:
if
otherwise;
if
otherwise, and
Values of TD and TS were fixed to 0 in all designs.
The first 2 designs had perfectly balanced data: each of the 400 "female" individuals i was associated to each of the 200 "male" individuals j. The first design assumed a large
variance for the direct genetic effect and a moderate
variance for associative genetic effect; the second employed a very small (0.05) variance for the direct genetic effects. In the third design, there were unbalanced data (males were randomly mated with females) with a small variance for the direct genetic effect. In the fourth design, few observations per individual i were considered with moderate variances for direct and associative effects.
Each design was analyzed using 3 models: the product threshold model, the additive threshold model, and a multiple-trait threshold model, which considers
and
as observed phenotypes. Results obtained with the multiple-trait threshold model were a reference considered as the best parameter estimates (called best) that can be obtained. In practice, this third model cannot be applied to real data, because
and
are unobserved phenotypes.
For all models, the Gibbs sampler analysis was carried out using 1 chain consisting of 300,000 iterations. After discarding the first 30,000 iterations, samples of the parameters of interest were saved every 100 iterations. Flat priors were used for all parameters, and beginning values were randomly sampled.
 |
RESULTS
|
|---|
Except for
in design 2, genetic parameter estimates obtained with the product threshold model were in accordance with the simulated data (Table 2). Similar posterior means and credibility intervals were obtained with the multiple trait model (results not shown). In nearly all designs, heritability estimates obtained with the additive threshold model understated true values, whereas genetic correlation estimates aligned with those obtained with the product threshold model. Fixed effects were well estimated with the product threshold model but not with the additive one (Table 3). However, fixed effects in product and additive threshold models do not have the same interpretation and cannot be properly compared. Even if posterior means were similar, credibility intervals obtained for fixed effects with the product threshold model were 3 to 4 times larger than those obtained with the multiple trait model (results not shown).
View this table:
[in this window]
[in a new window]
|
Table 2. Variance components estimated (posterior mean) with the additive and product threshold models for the different designs (credibility interval in brackets)1
|
|
View this table:
[in this window]
[in a new window]
|
Table 3. Fixed effects estimated with the additive threshold and product threshold models for the different designs (credibility interval in brackets)
|
|
Correlations between "best estimated" and predicted direct or associative breeding values were greater for the product threshold model than for the additive threshold model in all designs (Table 4). A plot between the "best" and estimated breeding values from the product threshold model did not show any trend, whereas the additive model underestimated the largest breeding values (Figure 3). Nevertheless, correlations between breeding values estimated with product and additive threshold models were high.
View this table:
[in this window]
[in a new window]
|
Table 4. Correlations between breeding values predicted with the different models (between direct breeding values above the diagonal, associative breeding values below the diagonal)
|
|

View larger version (14K):
[in this window]
[in a new window]
|
Figure 3. Correlation between direct breeding value obtained with product and additive threshold models in design 1 (200 individuals i, 400 individuals j, balanced data, high direct genetic variance).
|
|
 |
DISCUSSION
|
|---|
For studying reproduction results, we proposed the product threshold model, which supposes that a success for P* is observed when there is a success for
and
A similar approach has been proposed for modeling embryo implantation after in vitro fertilization in humans: the EU model (Speirs et al., 1983
), where E is the number of viable embryos and U is a binary indicator of uterine receptivity. Dukic and Hogan (2002)
showed that in the extreme case where the number of embryos transferred is equal to one for all subjects, none of the model parameters is identifiable from observed data. Our illustrations imply a similar situation where each female event is associated with a unique male event. Nevertheless, our results (with input parameters essentially recovered within high credibility regions) suggest that parameters are identifiable, at least under the conditions considered. In general, identification is very difficult to study in complex hierarchical models such as the one considered here. In any case, if proper priors are adapted for all parameters, this provides unambiguous Bayesian learning (Carlin and Louis, 2000
; Sorensen and Gianola, 2002
). However, it may be the case that the prior could always be influential, even for very large samples. An argument suggesting why we obtained good estimations of the parameters using flat priors is as follows. In the standard linear animal model, the genetic additive (a) and permanent environmental (p) effects cannot be disentangled from each other at the level of the conditional likelihood because only their sum is identified (an infinite number of values a and p produce the same sum). However, when all random effects are integrated out, the corresponding variance components can be identified in the marginal likelihood, provided their covariance matrices are not diagonal (due to the relationship matrix considered for the additive genetic effect). A similar reasoning holds in the product threshold model: at the level of the conditional likelihood, the probabilities Pr(PS = 1) and Pr(PD = 1) are not identified because an infinite number of values yield the same product, Pr(PS = 1) x Pr(PD = 1). However, the explanatory structure in the probit link functions are distinct for the 2 components, and further, the "male" and "female" genetic effects have different covariance matrices. This led to parameter identification in the cases examined. In short, our illustration results suggest that identification was attained, even though improper priors were adopted for fixed effects and variance components.
The product threshold model was proposed for studying reproduction results under a double assumption: the observed phenotype is the product of 2 unobserved phenotypes and these unobserved events are conditionally independent. When events are not independent,
In this case, expressing conditional probabilities as a function of genetic and environmental factors is not trivial and needs further investigations. On the other hand, it may be of interest to include a third unobserved phenotype affecting the reproduction result: the embryo survival effect. Even if, theoretically, the product threshold model can consider more than 2 unobserved phenotypes, including a third one could induce convergence problems. Actually, it is well known that the additive threshold model can produce biased estimates of parameters when the amount of data per level of fixed effect is small, known as the extreme category problem (Misztal et al., 1989
; Moreno et al., 1997
). The risk of an extreme category problem is greater in the product threshold model, where the probabilities of success of the unobserved phenotypes are in essence greater than the probability of success of the observed phenotype. This problem increases as the number of unobserved phenotypes increases. Because
and
are unobservable, it does not seem possible to investigate a priori this kind of problem.
The illustrations aimed to investigate whether or not the product threshold model is feasible and to evaluate the importance of bias from fitting a (wrong) additive threshold model when observations come from a product threshold model. Even though different designs (balanced vs. unbalanced data; low, medium, or high variances; low vs. medium correlation; scant or considerable information per individual) were considered, these represent only a few of the many possible data scenarios. Due to heavy computing time requirement (2 wk per estimation), there were no replications of each design. Therefore, further investigations are needed to evaluate properties of the product threshold model. Nevertheless, we obtained similar results for simulations with some replications and fewer observations (8,000).
The product threshold model can extract more information from the data than the additive threshold model, provided that it holds. Also, it is able to estimate the effect of an environmental factor on either the direct or associative unobserved phenotypes, whereas the additive threshold model only allows estimating a global effect on the observed phenotype. Our results indicated that the additive threshold model tended to temper the effect of a factor by combining the effects on the direct and associative phenotype. The second level of had a slightly positive effect on the observed phenotype in comparison with its first level in the additive threshold model, whereas it had a positive effect on the direct phenotype and a negative one on the associative phenotype. Therefore, strategies for improving P* may be different in the 2 models. For instance, AI success declines with time in some species (Mackey et al., 2007
). In species where fresh semen is used, a product model would estimate year effects on male and female fertility separately and then identify which sex is responsible for the decline. Subsequently, fertility research could be focused on this sex.
Discrepancy between best and predicted breeding values with the additive threshold model was especially marked for individuals with increased breeding values. Consequently, fitting an additive threshold model when data are generated under the product threshold model assumption would lead to a less effective evaluation of the best animals in the population. This issue requires additional study.
The objective of the study was to propose an alternative model to the additive threshold model for studying reproduction result in an animal genetics context to evaluate if estimation procedures can work well on this model and to evaluate if results obtained with this model differ from those obtained with the classical approach. Because parameters seem estimable, the next step of the analysis would be to compare the predictive ability of the 2 models on real data to provide validation of the product threshold model.
 |
Footnotes
|
|---|
1 The authors thank the Ministère de lAgriculture for supporting this study in the framework of a BELIA action, directed by the Association Nationale Insémination Ovine and Institut National de la Recherche Agronomique. The authors thank Wendy Brand-Williams for editing of the English language. The Midi-Pyrénées region is acknowledged for supporting D. Gianola as Chaire DExcellence Pierre de Fermat. 
2 Corresponding author: Ingrid.David{at}toulouse.inra.fr
Received for publication October 27, 2008.
Accepted for publication April 22, 2009.
 |
LITERATURE CITED
|
|---|
Bijma, P., Muir, W. M. & van Arendonk, J. A. M. 2007. Multilevel selection 1: Quantitative genetics of inheritance and response to selection. Genetics 175:277–288.[Abstract/Free Full Text]
Carlin, B. P., and T. A. Louis. 2000. Bayes and Empirical Bayes Methods for Data Analysis. 2nd ed. Chapman and Hall/CRC Press, Boca Raton, FL.
Dempster, E. R. & Lerner, I. M. 1950. Heritability of threshold characters. Genetics 35:212–236.[Free Full Text]
Dukic, V. & Hogan, J. 2002. A hierarchical Bayesian approach to modeling embryo implantation following in vitro fertilization. Biostatistics 3:361–377.[Abstract]
Gianola, D. 1982. Theory and analysis of threshold characters. J. Anim. Sci. 54:1079–1096.[Abstract/Free Full Text]
Gianola, D. & Foulley, J. L. 1983. Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15:201–224.[CrossRef]
Gilmour, A. R., Anderson, R. D. & Rae, A. C. 1985. The analysis of binomial data by a generalized linear mixed model. Biometrika 72:593–599.[Abstract/Free Full Text]
Harville, D. & Mee, R. 1984. A mixed-model procedure for analyzing ordered categorical data. Biometrics 40:393–408.[CrossRef]
Hoeschele, I. & Tier, B. 1995. Estimation of variance components of threshold characters by marginal posterior modes and means via Gibbs sampling. Genet. Sel. Evol. 27:519–540.[CrossRef]
Koerhuis, A. N. M. & Thompson, R. 1997. Models to estimate maternal effects for juvenile body weight in broiler chickens. Genet. Sel. Evol. 29:225–249.[CrossRef]
Mackey, D. R., Gordon, A. W., McCoy, M. A., Verner, M. & Mayne, C. S. 2007. Associations between genetic merit for milk production and animal parameters and the fertility performance of dairy cows. Animal 1:29.[CrossRef]
Misztal, I., Gianola, D. & Foulley, J. L. 1989. Computing aspects of a nonlinear method of sire evaluation for categorical data. J. Dairy Sci. 72:1557–1568.[Abstract/Free Full Text]
Moreno, C., Sorensen, D., Garcia-Cortes, L. A., Varona, L. & Altarriba, J. 1997. On biased inferences about variance components in the binary threshold model. Genet. Sel. Evol. 29:145–160.[CrossRef]
Piles, M., Rafel, O., Ramon, J. & Varona, L. 2005. Genetic parameters of fertility in two lines of rabbits with different reproductive potential. J. Anim. Sci. 83:340–343.[Abstract/Free Full Text]
Robinson, D. L. 1996. Estimation and interpretation of direct and maternal genetic parameters for weights of Australian Angus cattle. Livest. Prod. Sci. 45:1–11.[CrossRef]
Sorensen, D., Andersen, S., Gianola, D. & Korsgaard, I. 1995. Bayesian inference in threshold models using Gibbs sampling. Genet. Sel. Evol. 27:229–249.[CrossRef]
Sorensen, D., and D. Gianola. 2002. Likelihood, Bayesian and MCMC Methods in Quantitative Genetics. Springer-Verlag, New York, NY.
Speirs, A. L., Lopata, A., Grnow, M. J., Kellow, G. N. & Johnston, W. I. H. 1983. Analysis of the benefits and risks of multiple embryo transfer. Fertil. Steril. 39:468–471.[Medline]
Varona, L. & Noguera, J. L. 2001. Variance components of fertility in Spanish Landrace pigs. Livest. Prod. Sci. 67:217–221.[CrossRef]
Wilham, R. L. 1963. The covariance between relatives for characters composed of components contributed by related individuals. Biometrics 19:18–27.[CrossRef]
Wright, S. 1934. An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19:506–536.[Free Full Text]
Zhou, H. & Weinberg, C. R. 1996. Modeling conception as an aggregated Bernouilli outcome with latent variables via the EM algorithm. Biometrics 52:945–954.[CrossRef][Medline]