ANIMAL GENETICS |

* Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences;
and
Aqua Gen AS, 7462 Trondheim, Norway
| Abstract |
|---|
Key Words: genetic contribution, genetic gain, inbreeding, salmon breeding, optimal selection
| INTRODUCTION |
|---|
This paper presents a new algorithm for the calculation of optimal genetic contributions based on the methods of Meuwissen (1997) and Meuwissen and Sonesson (1998). This new algorithm, called OCSELECT, uses an alternative method to calculate the relationship between the selection candidates, which reduces computational time; this is especially important when the number of candidates is large. The OCSELECT algorithm was tested on different data sets, and the solutions were compared with those of the software package GENCONT (Meuwissen, 2002).
| MATERIALS AND METHODS |
|---|
$G_{t+1} = c_t' \mathrm{EBV}_t$ |
where $c_t$ is the vector of genetic contributions of the selection candidates and $\mathrm{EBV}_t$ is the vector of the BLUP estimated breeding values of the candidates. Furthermore, we know that the contributions within each sex sum to 1/2, which could be written as
$Q' c_t = \tfrac{1}{2}\mathbf{1}$ | [1] |
where Q is a known incidence matrix for sex. To control the future increase of inbreeding the optimum genetic contribution theory restricts the average coancestry between selected animals to
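$\bar{C}_{t+1} = c_t' A_t c_t / 2$ | [2] |
where $A_t$ is the additive relationship matrix of the selection candidates and $\bar{C}_{t+1}$ is set to $1 - (1 - \Delta F)^{t+1}$, with $\Delta F$ being the desired rate of inbreeding.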
Now the optimal $c_t$ that maximizes $G_{t+1}$ under constraints [1] and [2] is obtained by introducing 2 Lagrangian multipliers, $\lambda_0$ and $\lambda$, and maximizing the following objective function:
![]() | [3] |
![]() | [4] |
and
![]() | [5] |
A more detailed description of optimum genetic contribution selection and the derivation of the equations for the 2 Lagrangian multipliers is given by Meuwissen (1997). In addition, Meuwissen and Sonesson (1998) presented the extension of optimum contribution selection to breeding schemes with overlapping generations.
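For orientation, the Lagrangian problem described above is commonly written as follows in the optimum contribution literature. This is a sketch under assumed notation, not a transcription of the numbered equations of this paper: $A_t$ denotes the additive relationship matrix of the candidates, $\bar{C}_{t+1}$ the constrained average coancestry, and $\mathbf{1}$ a vector of ones with one element per sex.

```latex
\begin{align}
H_t &= c_t'\,\mathrm{EBV}_t
      - \lambda_0\left(c_t' A_t c_t - 2\bar{C}_{t+1}\right)
      - \left(c_t' Q - \tfrac{1}{2}\mathbf{1}'\right)\lambda \\
c_t &= \frac{A_t^{-1}\left(\mathrm{EBV}_t - Q\lambda\right)}{2\lambda_0} \\
\lambda_0^2 &= \frac{\mathrm{EBV}_t'\left(A_t^{-1}
      - A_t^{-1}Q\,(Q'A_t^{-1}Q)^{-1}Q'A_t^{-1}\right)\mathrm{EBV}_t}
      {8\bar{C}_{t+1} - \mathbf{1}'(Q'A_t^{-1}Q)^{-1}\mathbf{1}} \\
\lambda &= (Q'A_t^{-1}Q)^{-1}\left(Q'A_t^{-1}\mathrm{EBV}_t - \lambda_0\mathbf{1}\right)
\end{align}
```

Setting the derivative of $H_t$ with respect to $c_t$ to zero gives the second line, and substituting it into the 2 constraints gives the expressions for $\lambda$ and $\lambda_0$. Every term involves $A_t^{-1}$ multiplied by a vector or by Q.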
Algorithm To Calculate $A^{-1}$
The calculation of the optimal genetic contribution of an animal to future generations requires the relationship between the selection candidates; i.e., the additive relationship matrix and its inverse (see Equations [3], [4], and [5]). If the number of selection candidates is large, the inversion of the relationship matrix in particular requires substantial memory and computational time. Furthermore, the computational time does not increase linearly but rather in proportion to $N^3$, where N is the number of selection candidates.
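The relationship between 2 animals, i and j, is equal to
$A_{ij} = \tfrac{1}{4}\left(A_{s_i s_j} + A_{s_i d_j} + A_{d_i s_j} + A_{d_i d_j}\right)$ |
where $A_{s_i s_j}$ is the relationship between the sire of i and the sire of j, $A_{s_i d_j}$ is the relationship between the sire of i and the dam of j, $A_{d_i s_j}$ is the relationship between the dam of i and the sire of j, and $A_{d_i d_j}$ is the relationship between the dam of i and the dam of j. In addition, the relationship of an animal with itself is
$A_{ii} = \tfrac{1}{4}\left(A_{s_i s_i} + A_{s_i d_i} + A_{d_i s_i} + A_{d_i d_i}\right) + \tfrac{1}{2}(1 - \bar{F})$ |
where $\bar{F}$ is the average inbreeding of the parents $s_i$ and $d_i$.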
In matrix form, these equations could be written as
$A = Z A_P Z' + D$ | [6] |
where Z is a matrix in which element $Z_{ij}$ is 1/2 if j is a parent of i and 0 otherwise, $A_P$ is the relationship matrix of the parents, and D is a diagonal matrix with the Mendelian sampling variances on the diagonal; i.e., $\tfrac{1}{2}(1 - \bar{F})$. By use of the algebra of partitioned matrices, the inverse of A is equal to $A^{-1} = D^{-1} - D^{-1}Z(Z'D^{-1}Z + A_P^{-1})^{-1}Z'D^{-1}$. Because D is a diagonal matrix, $D^{-1}$ is easy to compute.
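The decomposition and its inverse can be checked numerically on a toy pedigree. The following is a minimal sketch, not the OCSELECT code: the function names, the 4 unrelated non-inbred parents, and the 5 candidates are all made up for illustration.

```python
import numpy as np

def build_z_and_d(sire_idx, dam_idx, A_p):
    """Build Z and the diagonal of D for candidates with known parents.

    sire_idx, dam_idx: positions of each candidate's sire and dam in A_p.
    A_p: relationship matrix of the parents (diagonal = 1 + inbreeding).
    """
    sire_idx = np.asarray(sire_idx)
    dam_idx = np.asarray(dam_idx)
    rows = np.arange(len(sire_idx))
    Z = np.zeros((len(sire_idx), A_p.shape[0]))
    Z[rows, sire_idx] += 0.5          # half of the genes come from the sire
    Z[rows, dam_idx] += 0.5           # and half from the dam
    f_parent = np.diag(A_p) - 1.0     # inbreeding coefficients of the parents
    f_bar = 0.5 * (f_parent[sire_idx] + f_parent[dam_idx])
    d = 0.5 * (1.0 - f_bar)           # Mendelian sampling variances
    return Z, d

def inverse_from_parents(Z, d, A_p):
    """A^{-1} = D^{-1} - D^{-1} Z (Z'D^{-1}Z + A_p^{-1})^{-1} Z' D^{-1}."""
    d_inv_Z = Z / d[:, None]                    # D^{-1} Z, using that D is diagonal
    M = Z.T @ d_inv_Z + np.linalg.inv(A_p)      # only parents x parents in size
    return np.diag(1.0 / d) - d_inv_Z @ np.linalg.solve(M, d_inv_Z.T)

# Hypothetical toy data: parents 0 and 1 are sires, 2 and 3 are dams.
A_p = np.eye(4)
sires = [0, 0, 1, 1, 0]
dams = [2, 3, 2, 3, 2]

Z, d = build_z_and_d(sires, dams, A_p)
A = Z @ A_p @ Z.T + np.diag(d)        # relationship matrix of the candidates
A_inv = inverse_from_parents(Z, d, A_p)
print(np.allclose(A_inv, np.linalg.inv(A)))
```

In this toy pedigree candidates 0 and 4 are full sibs and candidates 0 and 1 are paternal half sibs, so the corresponding elements of A can be compared with the familiar values for those relationships.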
The inverse of A derived from Equation [6] requires the inverse of the relationship matrix between the parents, which is a relatively small matrix in most breeding schemes where few parents are selected. Furthermore, $A_P^{-1}$ needs to be calculated only once, because even if the number of selection candidates is reduced due to the discarding of selection candidates with zero contributions, the number of parents stays the same (although some may have no offspring). The inverse of $(Z'D^{-1}Z + A_P^{-1})$ has the same dimension as the number of parents, so it is fast to calculate, but it does change if the list of selection candidates becomes smaller because of the rejection of selection candidates with negative contributions (because Z changes). However, if the change is small (e.g., only 1 selection candidate is rejected), the Sherman-Morrison formula can be used to calculate the new inverse of $(Z'D^{-1}Z + A_P^{-1})$ from that of the previous iteration (Press et al., 1992).
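Removing 1 candidate with row $z$ in Z and diagonal element $d$ in D subtracts the rank-one term $zz'/d$ from $(Z'D^{-1}Z + A_P^{-1})$, which is the case the Sherman-Morrison formula handles. The sketch below illustrates this on made-up data; the function name and the toy matrices are assumptions for illustration only.

```python
import numpy as np

def remove_candidate(M_inv, z, d_i):
    """Return the inverse of (M - z z'/d_i) given M_inv = M^{-1}.

    Sherman-Morrison for a symmetric rank-one downdate:
    (M - z z'/d)^{-1} = M^{-1} + (M^{-1} z)(M^{-1} z)' / (d - z' M^{-1} z).
    """
    u = M_inv @ z
    return M_inv + np.outer(u, u) / (d_i - z @ u)

# Hypothetical toy data: 4 unrelated parents, 5 candidates.
A_p_inv = np.linalg.inv(np.eye(4))
Z = 0.5 * np.array([[1, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=float)
d = np.full(5, 0.5)

# Inverse with all 5 candidates present.
M_inv = np.linalg.inv(Z.T @ (Z / d[:, None]) + A_p_inv)

# Reject the last candidate and update the inverse without re-inverting.
updated = remove_candidate(M_inv, Z[4], d[4])

# Reference: rebuild and invert from scratch with the remaining 4 candidates.
Z_rest, d_rest = Z[:4], d[:4]
direct = np.linalg.inv(Z_rest.T @ (Z_rest / d_rest[:, None]) + A_p_inv)
print(np.allclose(updated, direct))
```

The update costs on the order of the square of the number of parents per rejected candidate, compared with the cube for a fresh inversion.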
Therefore, in the presented algorithm, we reject only 1 selection candidate per iteration (i.e., the selection candidate with the most negative contribution). This algorithm is expected to be more robust against rejecting animals with a negative contribution that should have had a positive contribution in the ultimate solution, compared with algorithms that reject all negative contributions simultaneously (e.g., GENCONT). The actual setting up of $A^{-1}$ is avoided as follows: Equations [3], [4], and [5] contain terms of the form $A^{-1}R$, where R is a matrix to be multiplied by $A^{-1}$. Instead of setting up $A^{-1}$, we compute $D^{-1}R - D^{-1}Z(Z'D^{-1}Z + A_P^{-1})^{-1}Z'D^{-1}R$ directly, where $D^{-1}$ is easy to calculate because D is diagonal.
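A product of the form $A^{-1}R$ can be evaluated from right to left so that no matrix of dimension candidates by candidates is ever stored. The following sketch shows one way to do this; the function name, the toy pedigree, the EBV values, and the sex assignment are hypothetical.

```python
import numpy as np

def apply_a_inverse(R, Z, d, M_inv):
    """Compute A^{-1} R without forming A^{-1}.

    Z: candidates x parents matrix, d: diagonal of D,
    M_inv: inverse of (Z'D^{-1}Z + A_p^{-1}), R: vector or matrix.
    """
    d_inv_R = R / d[:, None] if R.ndim == 2 else R / d      # D^{-1} R
    correction = (Z / d[:, None]) @ (M_inv @ (Z.T @ d_inv_R))
    return d_inv_R - correction

# Hypothetical toy data: 4 unrelated parents, 5 candidates.
A_p = np.eye(4)
Z = 0.5 * np.array([[1, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=float)
d = np.full(5, 0.5)
M_inv = np.linalg.inv(Z.T @ (Z / d[:, None]) + np.linalg.inv(A_p))

ebv = np.array([12.0, 15.5, 9.8, 17.1, 13.3])                 # made-up EBV
Q = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0]], float)  # made-up sexes

# Reference values from the explicitly formed relationship matrix.
A = Z @ A_p @ Z.T + np.diag(d)
print(np.allclose(apply_a_inverse(ebv, Z, d, M_inv), np.linalg.solve(A, ebv)))
print(np.allclose(apply_a_inverse(Q, Z, d, M_inv), np.linalg.solve(A, Q)))
```

The explicit A is built here only to check the result; with many thousands of candidates only Z, the diagonal of D, and the parent-sized inverse would be held in memory.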
Data
The new algorithm was tested on salmon breeding data (discrete generations, large number of candidates) from practical selection schemes. In the fish-breeding program, all selection was to be based on optimum contribution selection, with no other objectives; e.g., marker genotypes.
The EBV were based on BW, measured at 8 months of age. The file of EBV contained 39,214 selection candidates from 226 sires and 227 dams.
For the comparison of GENCONT and OCSELECT, this data set was split into 19 smaller data sets. Each of the smaller data sets 1 to 18 contained 2,000 selection candidates, whereas smaller data set 19 contained 3,215 selection candidates. The smaller data sets were created by randomly allocating families to the subset until the total size of the subset exceeded 2,000 animals, in which case the remaining family members were allocated to subset 2, etc. In subset 19, the total size of the data set was not limited, resulting in 3,215 selection candidates for this subset. In all analyses with the smaller data sets, the acceptable rate of inbreeding was set at 0.005.
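One reading of this allocation procedure is sketched below: families are taken in random order, a subset is closed as soon as it reaches its size limit, the rest of the current family spills into the next subset, and the last subset is left unlimited. The function name, the seed, and the toy family sizes are assumptions for illustration and not the procedure actually coded by the authors.

```python
import random

def split_by_family(families, subset_size, n_subsets, seed=0):
    """Split animals into subsets, family by family, in random family order.

    families: dict mapping a family id to a list of animal ids.
    All subsets except the last are capped at subset_size animals.
    """
    rng = random.Random(seed)
    order = list(families)
    rng.shuffle(order)                      # random allocation of families
    subsets = [[] for _ in range(n_subsets)]
    k = 0
    for fam in order:
        for animal in families[fam]:
            # Move on when the current subset is full; the last one is unlimited.
            if k < n_subsets - 1 and len(subsets[k]) >= subset_size:
                k += 1
            subsets[k].append(animal)
    return subsets

# Hypothetical toy data: 5 families of 7 animals, subsets of 10, 3 subsets.
toy_families = {f: [f"fam{f}_animal{a}" for a in range(7)] for f in range(5)}
toy_subsets = split_by_family(toy_families, subset_size=10, n_subsets=3)
print([len(s) for s in toy_subsets])
```

With this reading, at most 1 family is split across each boundary between consecutive subsets.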
First, both programs were used to minimize the relationship between the selected animals. Further analyses were carried out with different restrictions on the minimal and maximal contributions that a selected animal could make to the next generation. Minimal contributions implied that the selected animals had to contribute at least this amount or they were not used at all. The minimal contributions were fixed at 0.0025 or 0.005%, and the maximal contributions were fixed at 1, 2, 3, 4, or 5%. In total, all 19 of the smaller data sets were analyzed with different combinations of restrictions, first with GENCONT and thereafter with OCSELECT.
The number of male and female selection candidates and the average breeding values and their standard deviations for the different smaller data sets and the whole data set are given in Table 1. The entire data set of 39,214 candidates was analyzed by OCSELECT only because GENCONT could not handle such a large data set. The maximum number of selection candidates GENCONT could handle was approximately 4,000; i.e., smaller data set 19 was close to this maximum (Table 1). The pedigree file contained 45,846 animals, which descended from 589 sires and 703 dams. The number of offspring ranged from 1 to 267 per sire and from 1 to 315 per dam.
| RESULTS |
|---|
After the minimization, the 19 smaller data sets were analyzed with a restriction on the acceptable rate of inbreeding, which was equal to $\Delta F$ = 0.005, but no constraints on minimum nonzero contribution (Cmin) or maximum contribution (Cmax). For smaller data set 1, both programs estimated a $\Delta G$ of 16.19 and selected 61 males and 65 females. The $\Delta G$ in smaller data set 19 was 17.67, and the selected number of animals was 44 males and 45 females. Also in the smaller data sets 2 to 18, OCSELECT and GENCONT gave exactly the same results with respect to $\Delta G$, the number of selected males, the number of selected females, and the estimated genetic contributions of the selection candidates.
Table 2 summarizes the comparison of the 2 algorithms, with application of constraints, using smaller data set 1, and Table 3 shows the results of the 2 programs using smaller data set 19. Where the 2 methods differed, the results in parentheses are those of GENCONT.
The results summarized in Table 3 showed again that a restriction on the maximal contribution had a higher impact than a restriction on the minimal contribution. In addition, Figure 1 shows the EBV of the selected male candidates plotted against their contributions without any restriction on minimal or maximal contributions. Figure 2 shows the impact of a restriction on the minimal contribution of selected animals. There is little difference between Figures 1 and 2 except that some of the lowest contributions in Figure 1 have moved to zero in Figure 2. Figure 3 (Cmax = 0.01) and Figure 4 (Cmax = 0.05) are plots from analyses with a restriction on maximal contribution. In Figure 3, the Cmax constraint was so stringent that all animals had a contribution of 1% or no contribution at all. In Figure 4, the Cmax constraint was less stringent, and the highest contributions in Figure 1 are reduced here to 5%. All results presented in Figures 1 to 4 are based on OCSELECT analysis. Furthermore, in smaller data set 19, in some cases (italic font in Table 3) neither program could satisfy the constraint. However, in those cases OCSELECT presented a solution with a smaller difference between the constraint and the achieved relationship. This could also be observed in some cases in the smaller data sets 2 to 18 (not shown).
Calculation of Optimal Genetic Contributions for All 39,214 Candidates
The current relationship between the selection candidates in the complete salmon data set was 0.04223 and the minimal reachable relationship between the selected animals was 0.03055. To reach this relationship OCSELECT selected 10,142 males and 10,179 females, and the computational time for the minimization of inbreeding was 7 minutes.
Using the minimal pairwise relationship as a base and assigning an acceptable rate of inbreeding $\Delta F$ = 0.005, the pairwise relationship of the parents of the next generation was restricted to
![]() |
Using this constraint on the average pairwise relationship of the parents of the next generation OCSELECT estimated a genetic gain of 19.63 and 159 males and 140 females were selected (23 min computational time on a PC). This result suggests that it is possible to achieve genetic gain and reduce the average population relationship at the same time.
Table 4 shows the results of the calculation for the complete salmon data set with different restrictions on the minimal contribution, maximal contribution, or both. Also with the complete salmon data set, a restriction on the maximal contribution of the selected animals had a higher impact than a restriction on the minimal contribution of a selected animal, with respect to the genetic gain and the computational time. As expected, the highest genetic gain was again achieved without any restrictions on the contributions of the selected animals.
| DISCUSSION |
|---|
The calculation of optimal genetic contributions in OCSELECT can be divided into 2 steps. The first step is the preparation of the $A_P$, $A_P^{-1}$, and Z matrices, and in the second step the optimal genetic contributions are calculated in an iterative process, in which negative contributions are excluded 1 by 1.
Previously presented algorithms used the additive relationship matrix between the selection candidates and the inverse of this matrix, which is computationally infeasible if the number of selection candidates is large; e.g., in fish breeding it is common to have between 10,000 and 100,000 selection candidates per selection round. In particular, the inversion of the additive relationship matrix required a lot of computational time, and this time increases with the third power of the number of selection candidates. Henderson (1976) and Quaas (1976) derived fast methods to calculate the inverse of very large A matrices, but these methods assume that the $A^{-1}$ of all animals in the pedigree is needed, whereas in optimum contribution selection the $A^{-1}$ of a list of selection candidates is needed. In addition, this matrix had to be calculated and thereafter inverted several times because the candidates with negative contributions need to be removed from the solution.
In this paper we used the relationship matrix between the parents, $A_P$, and its inverse, $A_P^{-1}$. Our results showed that it is possible to use the additive relationship between the parents of the selection candidates to calculate the optimal genetic contribution of the selection candidates to the next generation. The main advantages of using the relationship between the parents instead of the relationship between the selection candidates are that this matrix has a smaller dimension and that it is not necessary to recalculate this matrix, because in most cases the number of parents did not change with a decreasing number of selection candidates, and otherwise some of the parents simply did not have any offspring. It should be noted that Equation [6] assumes that all selection candidates come from the same generation. However, in practical breeding schemes it can occur that parents as well as offspring are selection candidates. This is accommodated in Equation [6] by having such parents of candidates directly included in $A_P$ and setting their diagonal element in D to a small positive value (e.g., $10^{-6}$). Therefore, it is possible to use OCSELECT also in the case of overlapping generations.
Alternatively, genetic and evolutionary optimization algorithms have been proposed to calculate optimal contributions (Kinghorn et al., 2002). These algorithms are general optimization algorithms and thus are very flexible in the choice of the objective function and the constraints that can be applied. However, these algorithms do not take advantage of the structure of the optimization problem and thus will have difficulty when applied to problems of high dimension, such as was the case here. Thus, it is expected that methods that take advantage of the structure of the solution are faster than general optimization algorithms when the number of candidates becomes very large; e.g., 39,214 as in the salmon breeding program.
The disadvantage of deleting all candidates with a negative contribution simultaneously is that there is no guarantee that all these candidates will have a negative contribution in the ultimate optimal solution; i.e., some of the deleted candidates might have a positive optimal contribution, but this solution will not be found by GENCONT once the animal has been removed. Therefore, OCSELECT is less likely to reject animals with negative contributions that should have had a positive contribution in the ultimate solution. This resulted in some situations where GENCONT could not achieve the constraint because it had removed too many individuals from the solution vector in a previous iteration (Tables 2 and 3).
The only way to be sure of obtaining an optimal solution is simultaneous handling of all constraints, including Cmax and Cmin. In view of the size of the optimization problem considered, such an approach was not attempted. Instead, a heuristic approach was taken; namely, the constraints that are violated by the optimal solution are included in an iterative manner. This probably results in a close to optimal solution, which is confirmed by the current result that different strategies of including the constraints (OCSELECT and GENCONT) led to the same or very similar solutions in the majority of the situations investigated in Tables 2 and 3. The optimality of the solution may be improved (but not guaranteed) by some heuristic that permits reentry of excluded candidates. It is, however, not clear what the criterion for reentry should be. Once an animal is excluded, it is not included in $A^{-1}$, and thus Equation [3] cannot be evaluated for such an animal in order to check what its contribution should have been. In practical breeding schemes, many practical constraints need to be considered involving mating logistics and possible migration across farms. Hence, the solutions presented by OCSELECT are expected to serve only as targets to aim at while accommodating such constraints.
Figures 1 and 2 showed that there was only a small effect when the analysis included a restriction on the minimal nonzero contributions of selection candidates. In practical situations it could be important that the algorithm can deal with minimal nonzero contributions because this ensures that a selected animal is used for at least 1 mating. The reasons for the higher impact of maximal contributions (Figures 3 and 4) are that 1) the contributions of the animals with the highest EBV are reduced, and 2) with stringent restrictions on Cmax, more animals need to be selected than would otherwise have been the case.
Genetic gains increased with the number of selection candidates (Table 2 and Table 3), and the highest genetic gain was achieved when the contributions of all selection candidates were optimized simultaneously (Table 4), which is in agreement with the results of the computer simulations of Sonesson (2005). These computer simulations also showed that selection with a restriction on the rate of inbreeding is essential for populations with large numbers of candidates and large family sizes, such as fish populations, because the selected parents may otherwise come from very few families. Therefore, the method presented in this paper is a further step toward the implementation of optimum contribution selection in practical breeding schemes, because it makes it possible to treat all animals in large selection schemes as selection candidates simultaneously, without preselecting the candidates (e.g., on EBV) before entering them into the optimal contribution selection procedure.
| IMPLICATIONS |
|---|
1 Corresponding author: dirk.hinrichs{at}umb.no
Received for publication March 15, 2006. Accepted for publication May 26, 2006.
| LITERATURE CITED |
|---|