ANIMAL GENETICS
Animal and Dairy Science Department, Department of Statistics, and Institute of Bioinformatics, University of Georgia, Athens 30602-2771
Abstract
Key Words: beef cattle, best linear unbiased prediction, gene-assisted selection
INTRODUCTION
Soller (1978) was among the first to discuss the uses of molecular information and suggested preselecting animals on the basis of molecular information before progeny testing. Fernando and Grossman (1989) proposed a BLUP methodology for obtaining breeding values under a mixed inheritance model. Although several attempts have been made (e.g., Meuwissen and Goddard, 1996) to combine phenotypic and molecular information, their practical impact has been limited by theoretical and computational complexity.
Currently in beef production, traits like marbling and tenderness are recorded on a limited basis due to the difficulty and expense of collecting these carcass measurements. In the case of sparsely recorded traits, additional information can be garnered not only through molecular information but also through genetic relationships with already available production traits. Although the collection of carcass traits can be expensive and cumbersome, the collection of correlated traits like birth weight or weaning weight is easily accomplished and routinely done. With this in mind, additional information about difficult-to-measure traits may be available, through genetic correlations, at no additional cost.
Consequently, the objective of the current study was to devise a method of combining molecular and phenotypic sources of information to compute a single breeding value that can be used in a straightforward manner.
MATERIALS AND METHODS
Let $\mathbf{y} = (\mathbf{y}_1', \mathbf{y}_2', \ldots, \mathbf{y}_t')'$ be a vector of phenotypic records for $t$ quantitative traits. Without loss of generality, assume that a major gene is segregating only for the last trait. The major gene is assumed to have $l$ different codominant alleles, $A_1, A_2, \ldots, A_l$, with relative frequencies $p_1, p_2, \ldots, p_l$, where $\sum_{i=1}^{l} p_i = 1$. For a population in Hardy-Weinberg equilibrium, the frequencies of the $l(l+1)/2$ distinguishable genotypes are products of the corresponding allelic frequencies: $p_i^2$ for the homozygote $A_iA_i$ and $2p_ip_j$ for the heterozygote $A_iA_j$ ($i \neq j$).
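As an illustration of the Hardy-Weinberg genotype frequencies, the following Python sketch enumerates the $l(l+1)/2$ unordered genotypes for any number of codominant alleles. The function name `hwe_genotype_frequencies` is hypothetical, and alleles are indexed from 0 rather than from 1.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies for codominant alleles.

    allele_freqs: allele frequencies p_1..p_l (must sum to 1).
    Returns {(i, j): frequency} for i <= j: p_i**2 for homozygotes
    and 2 * p_i * p_j for heterozygotes.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        pi, pj = allele_freqs[i], allele_freqs[j]
        freqs[(i, j)] = pi * pi if i == j else 2.0 * pi * pj
    return freqs


# Two alleles with frequencies 0.4 (F) and 0.6 (U)
g = hwe_genotype_frequencies([0.4, 0.6])
print(len(g))  # 3, i.e., l(l + 1)/2 with l = 2
print({k: round(v, 2) for k, v in g.items()})  # {(0, 0): 0.16, (0, 1): 0.48, (1, 1): 0.36}
```

With $l = 3$ alleles the same call returns 6 genotypes, and the frequencies always sum to 1 because they are the terms of $(p_1 + \cdots + p_l)^2$.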
Following the structure presented by Wu et al. (2002), the statistical model that describes the relationship between phenotypes, genotypes, and polygenic effects could be presented in the following hierarchical structure:
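$$
\begin{aligned}
\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}, \mathbf{u}, \mathbf{g} &\sim p(\mathbf{y} \mid \mathbf{X}\boldsymbol{\beta} + \mathbf{W}\mathbf{a} + \mathbf{Z}\mathbf{u}) \\
p(g_i \mid g_{s_i}, g_{d_i}) &= f(g_i \mid g_{s_i}, g_{d_i}) \quad \text{(animals with known parents)} \\
p(g_i) &= h(g_i \mid p_1, \ldots, p_l) \quad \text{(base-population animals)}
\end{aligned}
$$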
where $\boldsymbol{\beta}$, $\mathbf{a}$, and $\mathbf{u}$ are the vectors of systematic effects, major gene effects, and polygenic effects, respectively; $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{Z}$ are the corresponding known incidence matrices; and $g_i$, $g_{s_i}$, and $g_{d_i}$ are the major-gene genotypes of animal $i$, the sire of animal $i$, and the dam of animal $i$, respectively. Function $f$ describes the Mendelian segregation pattern between progeny and parents, and $h$ is a multinomial distribution with a known probability vector determined by the frequencies of the $l$ alleles of the major gene.
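Given that the major gene was assumed to segregate only for the last trait, the additive genetic effects can be written as

$$
\begin{pmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_{t-1} \\ \mathbf{W}\mathbf{a} + \mathbf{u}_t \end{pmatrix},
$$

with $\mathbf{a}$ and $\mathbf{u}_t$ representing the major gene and polygenic effects for trait $t$, respectively.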
Furthermore, assuming independence between the major gene and polygenic effects, the following equality holds true:
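$$
\operatorname{Var}(\mathbf{W}\mathbf{a} + \mathbf{u}_t) = \operatorname{Var}(\mathbf{W}\mathbf{a}) + \operatorname{Var}(\mathbf{u}_t).
$$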
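Additionally, it was assumed that the major gene effect was independent of the polygenic effects for the first $(t-1)$ traits. Thus, if $\mathbf{G}_0$ is the matrix of genetic (co)variances among all traits and the major gene explains a proportion $c$ of the total additive variance $\sigma^2_{A_t}$ of the last trait, then the resulting genetic (co)variance matrix among the 3 polygenic effects, $\mathbf{G}_0^*$, is $\mathbf{G}_0$ with the variance of the last trait replaced by the polygenic variance $(1-c)\sigma^2_{A_t}$ and with the covariances involving the last trait reduced accordingly.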
In other words, the variance and associated (co)variances of the trait for which there is a causative gene are reduced according to the amount of variation explained by the gene.
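A small numeric sketch of this variance partition for the last trait follows. It assumes Hardy-Weinberg equilibrium and purely additive genotype means (the heterozygote exactly midway between the homozygotes), so that the variance of the major-gene genotype means equals the additive variance the gene contributes. The genotype means and the total additive variance are hypothetical values, not the parameters in Table 1, and the function names are illustrative only.

```python
def major_gene_variance(p_f, means):
    """Variance of the major-gene genotype means under Hardy-Weinberg equilibrium.

    p_f: frequency of the favorable F allele.
    means: dict of genotype means with keys "FF", "FU", "UU" (hypothetical values).
    """
    q = 1.0 - p_f
    freqs = {"FF": p_f * p_f, "FU": 2.0 * p_f * q, "UU": q * q}
    mu = sum(freqs[g] * means[g] for g in freqs)
    return sum(freqs[g] * (means[g] - mu) ** 2 for g in freqs)


def split_additive_variance(total_additive_var, gene_var):
    """Return (c, polygenic variance): c is the proportion of the total additive
    variance of the last trait explained by the gene; the rest is polygenic."""
    c = gene_var / total_additive_var
    if not 0.0 <= c < 1.0:
        raise ValueError("gene variance must be smaller than the total additive variance")
    return c, (1.0 - c) * total_additive_var


# Hypothetical additive genotype means with F at frequency 0.4
gene_var = major_gene_variance(0.4, {"FF": 0.5, "FU": 0.0, "UU": -0.5})
c, polygenic_var = split_additive_variance(0.40, gene_var)
print(round(gene_var, 4), round(c, 3), round(polygenic_var, 3))  # 0.12 0.3 0.28
```

Under these assumptions the gene variance also equals $2pq\alpha^2$ with $p = 0.4$, $q = 0.6$, and allele substitution effect $\alpha = 0.5$.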
Unknown Genotypes
Given the hierarchical structure of the genotypes presented above, and given that genotypes in this study were assigned at random in the population, additional genotypic information can be extracted from the pedigree. Animals with missing genotypic information can be assigned 1 or both alleles from parental, progeny, or mate information. Using these 3 sources of information and an algorithm similar to those of Qian and Beckmann (2002) and Tapadar et al. (2000), missing genotypes were imputed, yielding additional genotypic information.
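The kind of pedigree deduction described above can be sketched with simple genotype elimination over sire-dam-offspring trios at a biallelic locus: each animal keeps the set of genotypes still compatible with its relatives, and the sets are pruned until nothing changes. This is a generic technique offered for illustration, not the Qian and Beckmann (2002) or Tapadar et al. (2000) algorithm itself. The allele labels F and U follow the text; the function names are hypothetical, and only trios with both parents known are used.

```python
from itertools import product

GENOTYPES = ("FF", "FU", "UU")


def can_produce(sire_g, dam_g, child_g):
    """True if child_g can arise from one allele of the sire and one of the dam."""
    return any("".join(sorted(a + b)) == child_g for a in sire_g for b in dam_g)


def eliminate_genotypes(observed, pedigree):
    """Genotype elimination over sire-dam-offspring trios at a biallelic locus.

    observed: dict animal -> "FF"/"FU"/"UU", or None if untyped.
    pedigree: dict animal -> (sire, dam), with None for unknown parents.
    Returns dict animal -> set of genotypes still consistent with all trios.
    """
    possible = {a: {g} if g else set(GENOTYPES) for a, g in observed.items()}
    changed = True
    while changed:
        changed = False
        for child, (sire, dam) in pedigree.items():
            if sire is None or dam is None:
                continue  # this sketch only uses complete trios
            trios = [(s, d, c)
                     for s, d, c in product(possible[sire], possible[dam], possible[child])
                     if can_produce(s, d, c)]
            if not trios:
                raise ValueError(f"genotypes inconsistent with pedigree around {child}")
            for animal, idx in ((sire, 0), (dam, 1), (child, 2)):
                keep = {t[idx] for t in trios}
                if keep != possible[animal]:
                    possible[animal] = keep
                    changed = True
    return possible


observed = {"S1": "FF", "D1": "UU", "C1": None,   # calf of 2 homozygous parents
            "S2": None, "D2": "FF", "C2": "FU"}   # untyped sire with typed progeny
pedigree = {"C1": ("S1", "D1"), "C2": ("S2", "D2")}
result = eliminate_genotypes(observed, pedigree)
print(sorted(result["C1"]), sorted(result["S2"]))  # ['FU'] ['FU', 'UU']
```

A set with a single genotype means the genotype is fully determined; a set such as {"FU", "UU"} means that 1 allele (here U) is known and the other is not.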
Simulation
A multiple-trait simulation was carried out using 3 traits. The simulation was designed to mimic a beef cattle data set. The traits included were birth weight, postweaning gain, and the trait of interest, marbling score (MS). The genetic parameters for these 3 traits were within the bounds of published values (Woodward et al., 1992; Marshall, 1994; Splan et al., 1998). The genetic and residual correlations among traits are given in Table 1.
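The simulation of correlated polygenic effects can be sketched as follows: base animals receive multivariate normal values with covariance G, and each later animal receives its parent average plus a Mendelian sampling term with covariance G/2. This assumes non-inbred parents listed before their offspring. The G matrix below, built from hypothetical standard deviations and correlations for birth weight, postweaning gain, and MS, is not the Table 1 parameter set, and the function name is illustrative.

```python
import numpy as np


def simulate_polygenic_effects(G, sires, dams, n_base, rng):
    """Simulate multiple-trait polygenic effects down a pedigree.

    G: t x t genetic (co)variance matrix (must be positive definite).
    sires, dams: parent indices for every animal; entries for base animals are ignored.
    Animals must be ordered so that parents precede their offspring.
    """
    n, t = len(sires), G.shape[0]
    L = np.linalg.cholesky(G)  # G = L @ L.T
    u = np.zeros((n, t))
    # Base animals: u ~ N(0, G)
    u[:n_base] = rng.standard_normal((n_base, t)) @ L.T
    for i in range(n_base, n):
        # Parent average plus Mendelian sampling ~ N(0, G / 2) (non-inbred parents)
        mendelian = np.sqrt(0.5) * (L @ rng.standard_normal(t))
        u[i] = 0.5 * (u[sires[i]] + u[dams[i]]) + mendelian
    return u


# Hypothetical parameters (birth weight, postweaning gain, MS); not Table 1 values
sd = np.array([2.0, 8.0, 0.25])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
G = np.outer(sd, sd) * corr

rng = np.random.default_rng(2024)
n_base, n_calves = 1000, 3000
# Hypothetical mating: base animals 0-499 act as sires, 500-999 as dams
sires = np.concatenate([np.full(n_base, -1), rng.integers(0, 500, n_calves)])
dams = np.concatenate([np.full(n_base, -1), rng.integers(500, 1000, n_calves)])
u = simulate_polygenic_effects(G, sires, dams, n_base, rng)
print(u.shape)  # (4000, 3)
```

Because the offspring variance is $\frac{1}{4}\mathbf{G} + \frac{1}{4}\mathbf{G} + \frac{1}{2}\mathbf{G} = \mathbf{G}$ for unrelated, non-inbred parents, the genetic (co)variances are preserved from one generation to the next before selection.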
For continuous traits, as is the case in this study, a 3-trait mixed linear model (Henderson and Quaas, 1976) was implemented. For the first 2 traits, no molecular information was assumed available and only the polygenic effects were included in the model. For the third trait, a major gene and the remaining polygenic effects were considered.
In matrix notation, the following mixed inheritance model was used:
$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{W}\mathbf{a} + \mathbf{Z}\mathbf{u} + \mathbf{e}
$$
where $\mathbf{y}$ is the vector of phenotypic observations for the 3 traits; $\boldsymbol{\beta}$ and $\mathbf{u}$ are the vectors of fixed and random effects, respectively; $\mathbf{a}$ is the vector of genotype means (included in the model only for the third trait); and $\mathbf{e}$ is the vector of residual terms. $\mathbf{X}$ and $\mathbf{Z}$ are known incidence matrices, and $\mathbf{W}$ is a matrix relating each animal to its major-gene genotype. For the random effects, it was assumed that:
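$$
\begin{bmatrix} \mathbf{u} \\ \mathbf{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} \mathbf{G} \otimes \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \otimes \mathbf{I} \end{bmatrix} \right)
$$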
where $\mathbf{G}$ and $\mathbf{R}$ are 3 × 3 genetic and residual (co)variance matrices, respectively, and $\mathbf{A}$ is the additive relationship matrix.
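To make the covariance structure concrete, the sketch below builds the additive relationship matrix A for a tiny pedigree with the standard tabular method and forms Var(u) = G ⊗ A and Var(e) = R ⊗ I with `numpy.kron`. It assumes records are ordered by trait and then by animal, and that every animal has a record for every trait; with missing MS records, the corresponding rows and columns of Var(e) would be dropped. The pedigree and the G and R values are hypothetical.

```python
import numpy as np


def relationship_matrix(sires, dams):
    """Additive relationship matrix A by the tabular method.

    Parents must precede offspring; unknown parents are coded -1.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            # a_ij = 0.5 * (a_j,sire + a_j,dam); unknown parents contribute 0
            a_ij = 0.5 * (A[j, s] if s >= 0 else 0.0) + 0.5 * (A[j, d] if d >= 0 else 0.0)
            A[i, j] = A[j, i] = a_ij
        # a_ii = 1 + F_i, where the inbreeding coefficient F_i = 0.5 * a_sire,dam
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


# Hypothetical pedigree: animals 0 and 1 are base; 2 = 0 x 1; 3 = 0 x 2 (inbred)
sires = [-1, -1, 0, 0]
dams = [-1, -1, 1, 2]
A = relationship_matrix(sires, dams)

G = np.array([[4.0, 1.0, 0.1],    # hypothetical genetic (co)variances
              [1.0, 25.0, 0.5],
              [0.1, 0.5, 0.09]])
R = np.diag([9.0, 100.0, 0.25])   # hypothetical residual (co)variances
var_u = np.kron(G, A)             # 12 x 12, trait blocks of size 4
var_e = np.kron(R, np.eye(len(sires)))
print(A[3, 3])  # 1.25
```

Each 4 × 4 block of `var_u` is $g_{kk'}\mathbf{A}$, so the genetic covariance between trait $k$ of animal $i$ and trait $k'$ of animal $j$ is $g_{kk'} a_{ij}$.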
Data Sets for Analysis
A detailed description of the data sets created for analysis can be found in Table 2. The allelic frequencies, the amount of phenotypic data for the trait of interest, and the number of animals with molecular information were varied. Animals with missing records or missing genotypes were chosen at random. Each data set was labeled by scenario: which allele (F or U) was more frequent, the percentage (100, 5, or 0) of recorded phenotypes for the trait of interest, and the percentage (100, 50, 25, or 0) of animals with molecular information. For example, data set U-5-100 represents the case in which the U allele is at a frequency of 0.6, 5% of the MS records are observed, and all animals in the pedigree have molecular information. There is an infinite number of scenarios that could be explored by changing allele frequencies, observed genotypes, and observed phenotypes. However, the purpose of the current study was to explore the differences between extreme scenarios regarding the availability of molecular and phenotypic information, and to explore what appears to be a plausible scenario for the availability of MS records.
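The naming scheme can be decoded mechanically. The hypothetical helper below maps a label such as U-5-100 to the scenario it denotes, using the frequencies stated in the text (the more frequent allele at 0.6, the other at 0.4) and rejecting percentages that were not part of the design.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    frequent_allele: str   # "F" or "U": the allele at frequency 0.6
    freq_f: float          # frequency of the favorable F allele
    pct_ms_records: int    # percentage of MS records observed
    pct_genotyped: int     # percentage of animals with molecular information


def parse_label(label: str) -> Scenario:
    """Decode a data set label such as "U-5-100"."""
    allele, pheno, geno = label.split("-")
    if allele not in ("F", "U"):
        raise ValueError(f"unknown allele code in {label!r}")
    pheno_pct, geno_pct = int(pheno), int(geno)
    if pheno_pct not in (100, 5, 0) or geno_pct not in (100, 50, 25, 0):
        raise ValueError(f"percentages not among those studied: {label!r}")
    return Scenario(allele, 0.6 if allele == "F" else 0.4, pheno_pct, geno_pct)


print(parse_label("U-5-100"))
# Scenario(frequent_allele='U', freq_f=0.4, pct_ms_records=5, pct_genotyped=100)
```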
The pedigree file consisted of 4 generations: 1,000 animals in the base population and 3 subsequent generations of 3,000 animals each, for a total of 10,000 animals, of which 9,000 had records distributed across 300 herd classes. The correlation estimates were averaged over 10 replications. Gibbs sampling was used to estimate the unknown genetic parameters, with a single long chain of 20,000 iterations and a burn-in period of 5,000 iterations. A post-Gibbs analysis showed that the remaining 15,000 rounds were sufficient for the prediction of breeding values in this study. As indicated earlier, missing genotypes were predicted from known parental genotypes and from known progeny genotypes. Animals in the base population whose genotype could not be predicted from progeny information were assigned alleles based on the assumed allelic frequencies.
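As a generic illustration of how a post-burn-in chain can be summarized and its length checked, the sketch below discards the first 5,000 of 20,000 samples, takes the posterior mean of the remaining 15,000, and estimates its Monte Carlo standard error by batch means. Batch means is one common diagnostic chosen here for illustration; it is not necessarily the post-Gibbs analysis used in the study, and the chain is a synthetic autocorrelated series rather than real sampler output.

```python
import numpy as np


def posterior_summary(chain, burn_in=5000, n_batches=50):
    """Posterior mean and batch-means Monte Carlo standard error (MCSE).

    The first `burn_in` samples are discarded. The rest are split into
    `n_batches` equal batches; leftover samples at the end are ignored.
    """
    kept = np.asarray(chain, dtype=float)[burn_in:]
    size = len(kept) // n_batches
    if size < 2:
        raise ValueError("chain too short for the requested number of batches")
    batch_means = kept[: size * n_batches].reshape(n_batches, size).mean(axis=1)
    mcse = batch_means.std(ddof=1) / np.sqrt(n_batches)
    return kept.mean(), mcse


# Synthetic autocorrelated chain (AR(1), stationary mean 0.3) started far from the mean
rng = np.random.default_rng(7)
chain = np.empty(20_000)
chain[0] = 2.0
for k in range(1, len(chain)):
    chain[k] = 0.3 + 0.9 * (chain[k - 1] - 0.3) + rng.normal(0.0, 0.1)

mean, mcse = posterior_summary(chain)
print(f"posterior mean {mean:.3f}, MCSE {mcse:.4f}")
```

A chain is usually judged long enough when the MCSE is small relative to the posterior standard deviation of the parameter of interest.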
Genetic Progress Analysis
Genetic progress over 5 generations of selection was assessed using the simulated data sets created from the scenario in which the U allele was more frequent. Although 5 generations may appear to be short-term selection, studying genetic progress over very long periods could be misleading because breeding goals and objectives change rapidly. Further, after 5 generations of selection pressure for increased MS, the F allele is close to fixation. In each generation, 3,000 more animal records were created, so that by the end of generation 5 the pedigree included 25,000 animals and the data file included 24,000 animals with records; 100 herd classes were also added each generation. For data sets U-100-100, U-5-100, U-5-50, and U-5-25, dams for the next generation were selected from the 3 previous generations (9,000 animals) and were further screened on their predicted MS breeding value.
It is logical to assume that producers would set a threshold breeding value for the trait of interest and select animals from within that group, perhaps on the basis of other traits (e.g., health, structural correctness, or disposition), rather than simply selecting the top 3,000 females. Although these traits were not simulated, this scenario was mimicked by choosing dams at random from among those in the top 67% for predicted MS breeding value. This random draw from the pool of 6,000 dams could be thought of as selection among females of acceptable genetic merit for other traits (e.g., weight, structural correctness, or disposition). Consequently, 6,000 females were candidates for selection, of which 3,000 contributed to the next generation. Approximately 30% of these females were from the last generation and could be considered replacement females. The replacement rate of 30% used here is slightly greater than, but reasonably close to, the equilibrium replacement rate of 21.19% described by Greer et al. (1980). These females were then mated to the top 100 sires, chosen by predicted MS breeding value from all possible sires outside the base population (i.e., the first 1,000 animals). Selection of males and females was done without considering the accuracy of the predicted breeding values.
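The dam and sire selection steps can be sketched as follows: dams are drawn at random, without replacement, from the candidates in the top 67% for predicted MS breeding value, and sires are the 100 candidates with the highest predicted value. The random assignment of each dam to one of the selected sires is an assumption, because the text does not describe the mating design; the function name, its arguments, and the stand-in breeding values are hypothetical.

```python
import numpy as np


def select_and_mate(ebv, dam_candidates, sire_candidates, rng,
                    n_dams=3000, n_sires=100, top_frac=0.67):
    """One generation of the selection scheme (a sketch).

    ebv: predicted MS breeding values for all animals in the pedigree.
    dam_candidates, sire_candidates: integer index arrays of eligible animals.
    Returns (dams, sires, sire_of_each_mating).
    """
    ranked_dams = dam_candidates[np.argsort(ebv[dam_candidates])[::-1]]
    pool = ranked_dams[: int(round(top_frac * len(ranked_dams)))]  # about 6,000 of 9,000
    dams = rng.choice(pool, size=n_dams, replace=False)            # random draw within pool
    sires = sire_candidates[np.argsort(ebv[sire_candidates])[::-1][:n_sires]]
    # Assumption: each selected dam is mated to a randomly chosen selected sire
    mates = rng.choice(sires, size=n_dams)
    return dams, sires, mates


rng = np.random.default_rng(11)
ebv = rng.normal(size=10_000)                 # stand-in predicted breeding values
dam_candidates = np.arange(1_000, 10_000)     # the 3 most recent generations
sire_candidates = np.arange(1_000, 10_000)    # all animals outside the base population
dams, sires, mates = select_and_mate(ebv, dam_candidates, sire_candidates, rng)
print(len(dams), len(sires))  # 3000 100
```

Drawing at random within the pool, rather than taking the top 3,000, is what mimics selection on unrecorded traits among females that pass the MS threshold.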
For tandem selection, dams and sires were chosen on the basis of their genotype for the causative gene alone, such that animals with the undesirable genotype were not considered regardless of their polygenic merit. In the early stages of selection there were not enough animals with the desirable genotype, so candidates for selection included animals homozygous for the F allele and heterozygotes in the top 67% for predicted MS breeding value.
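The tandem candidate rule can be sketched as a filter on genotype followed, only when too few FF animals exist, by a breeding-value screen on heterozygotes. Two details are assumptions because the text does not specify them: the 67% cutoff is computed over all animals considered, and FF animals alone are used once there are at least as many of them as parents needed. The function name is hypothetical.

```python
import numpy as np


def tandem_candidates(genotype, ebv, n_needed, top_frac=0.67):
    """Indices of selection candidates under the tandem rule (a sketch).

    genotype: array of "FF", "FU", "UU"; ebv: predicted MS breeding values.
    UU animals are never candidates, whatever their polygenic merit.
    """
    genotype = np.asarray(genotype)
    ebv = np.asarray(ebv, dtype=float)
    ff = np.flatnonzero(genotype == "FF")
    if len(ff) >= n_needed:
        return ff
    # Too few FF animals: add heterozygotes in the top `top_frac` for predicted MS
    cutoff = np.quantile(ebv, 1.0 - top_frac)
    fu = np.flatnonzero((genotype == "FU") & (ebv >= cutoff))
    return np.concatenate([ff, fu])


genotype = ["FF", "FU", "UU", "FU", "FF", "FU"]
ebv = [0.1, 0.9, 2.0, -1.0, -0.5, 0.5]
print(tandem_candidates(genotype, ebv, n_needed=5))  # [0 4 1 5]
print(tandem_candidates(genotype, ebv, n_needed=2))  # [0 4]
```

Animal 2 has the highest predicted value but is excluded because it carries the UU genotype, which is the feature that distinguishes tandem selection from selection on the combined breeding value.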
The analysis was carried out as previously described, with the exception of U-5-50 and U-5-25. In these 2 cases, the allele frequency used to assign genotypes to base-population animals with missing genotypes was updated each generation, so that the original frequency of 0.6 for the U allele was changed to reflect the allelic frequency in the entire population in that generation. This change was made because the underlying assumption was that information about allelic frequencies in previous generations is unknown and the only known information is the molecular information currently available in the updated data file.
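The per-generation update for U-5-50 and U-5-25 can be sketched as counting F alleles among the animals genotyped in the current data file and then sampling genotypes for untyped base animals from that frequency under Hardy-Weinberg assumptions. The helper names and the use of Python's `random.Random` are illustrative choices, not the study's code.

```python
import random


def f_allele_frequency(genotypes):
    """Frequency of the F allele among genotyped animals (None = untyped)."""
    typed = [g for g in genotypes if g is not None]
    if not typed:
        raise ValueError("no genotyped animals in the data file")
    return sum(g.count("F") for g in typed) / (2 * len(typed))


def sample_genotype(p_f, rng):
    """Draw 2 alleles independently with P(F) = p_f; return e.g. "FU"."""
    return "".join(sorted("F" if rng.random() < p_f else "U" for _ in range(2)))


current = ["FF", "FU", None, "UU", "FU", None]
p_f = f_allele_frequency(current)          # 4 F alleles out of 8 -> 0.5
rng = random.Random(3)
filled = [g if g is not None else sample_genotype(p_f, rng) for g in current]
print(p_f, filled)
```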
The results, MS breeding values and allelic frequencies, were averaged over 10 replicates. The average breeding values and allelic frequencies reported by generation are the average of the true values from all animals in the last generation (the last 3,000 animals) with the exception of generation 0, which is the average over the 10,000 animals in the pedigree before selection.
RESULTS AND DISCUSSION
Similarly, Spearman rank correlations were estimated among sires that had fewer than 20 offspring. These 60 sires would generally be younger animals with less reliable predicted breeding values; on average, they had 15 progeny. Again, the estimates declined as the available information declined. The rank correlation was approximately 4.5% higher for U-5-50 than for U-5-25, and 5.3% higher for U-5-25 than for U-5-0. This shows that including even limited molecular information could aid in the accurate selection of young sires and perhaps allow their use at earlier ages, thus reducing the generation interval.
The top sires (those in the top 20% for true MS breeding value) and the undesirable sires (those in the bottom 20% for MS breeding value) are of most interest to those selecting future parent stock. Again, the results show that the correlation increases with the amount of information. Most notable is the 11.3% increase in the rank correlation between estimated and true breeding values among top sires when only 25% of the genotypes are known, compared with the case in which no genotypic information is available. This shows that including molecular information in the prediction of breeding values can substantially aid in the selection of the best candidate sires.
Avoiding the sires that offer the least potential for genetic improvement in the trait of interest, in this case MS, is also important. In this comparison, the largest differences in rank correlations were 7.3 and 7.0%, between U-5-100 and U-5-50 and between U-5-50 and U-5-25, respectively.
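The rank-correlation comparisons above can be computed for a single replicate with a few lines of NumPy: a Spearman correlation (here the Pearson correlation of ranks, assuming no ties) among sires with fewer than 20 offspring, and within the top and bottom 20% of sires ranked by true MS breeding value. The function names are hypothetical, and the small arrays in the example are made-up numbers.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks (no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]


def young_sire_correlation(true_bv, est_bv, n_offspring, max_offspring=20):
    """Rank correlation for sires with fewer than `max_offspring` progeny."""
    mask = np.asarray(n_offspring) < max_offspring
    return spearman(np.asarray(true_bv)[mask], np.asarray(est_bv)[mask])


def top_bottom_correlations(true_bv, est_bv, frac=0.20):
    """Rank correlations within the top and bottom `frac` of sires by TRUE value."""
    true_bv, est_bv = np.asarray(true_bv, float), np.asarray(est_bv, float)
    k = max(2, int(round(frac * len(true_bv))))
    order = np.argsort(true_bv)
    top, bottom = order[-k:], order[:k]
    return spearman(true_bv[top], est_bv[top]), spearman(true_bv[bottom], est_bv[bottom])


true_bv = np.array([0.5, -0.2, 1.1, 0.3, -0.7])
est_bv = np.array([0.4, -0.1, 0.6, 0.5, -0.3])
n_offspring = np.array([15, 40, 12, 18, 9])
print(round(young_sire_correlation(true_bv, est_bv, n_offspring), 3))  # 0.8
```

Subgroups are defined on the true values so that the correlation measures how well the predictions order the sires that actually are best (or worst).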
Complete Absence of Marbling Score Records
The scenario in which only 5% of the records for the trait of interest are observed is not ideal; however, a much less desirable, but possible, scenario is a complete absence of records for the trait of interest in a particular data set. If genetic correlations exist between the missing trait and other available traits, it is still possible to obtain breeding values for the missing trait. The results obtained under this scenario are in Table 4.
Rank Correlations.
In general, the rank correlations in all subgroups follow the same patterns as when records are available for the trait of interest, but the estimates are lower overall than when limited MS records are available. The lower top- and bottom-sire rank correlations compared with U-100-100 can be explained by the fact that when all phenotypes are observed, the breeding values for MS can be easily differentiated. However, when records are missing, animals that are superior in true breeding value but have no record themselves and very little progeny information are regressed toward the mean.
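The regression toward the mean can be illustrated with a simplified selection index: when an animal has no MS record, its MS breeding value can still be predicted from its own record on a genetically correlated trait, but the weight on that record is small. This single-record index is a textbook simplification of what the multiple-trait BLUP model does with all relatives' records; it assumes the residual of the correlated trait is uncorrelated with the MS breeding value, and all parameter values in the example are hypothetical.

```python
import math


def ebv_from_correlated_record(y1, mean1, var_p1, var_a1, var_a_ms, r_g):
    """Predict an MS breeding value from one record on a correlated trait.

    b = Cov(A_MS, P_1) / Var(P_1), where Cov(A_MS, P_1) = r_g * sd(A_MS) * sd(A_1).
    Returns (predicted value, accuracy); accuracy = |r_g| * h_1.
    """
    cov = r_g * math.sqrt(var_a_ms * var_a1)
    b = cov / var_p1
    accuracy = abs(r_g) * math.sqrt(var_a1 / var_p1)
    return b * (y1 - mean1), accuracy


# Hypothetical: record 2 phenotypic SD above the mean on the correlated trait
ebv, acc = ebv_from_correlated_record(y1=120.0, mean1=100.0, var_p1=100.0,
                                      var_a1=30.0, var_a_ms=0.09, r_g=0.3)
print(round(ebv, 3), round(acc, 3))  # 0.099 0.164
```

Even for an animal 2 phenotypic SD above the mean, the predicted MS value is about one-third of the genetic SD of MS (0.3), and the accuracy is only 0.164: a truly superior animal without its own MS record is pulled strongly toward the mean.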
Minimal Records for Marbling Score and Increased Incidence of the Favorable Allele
In the previous scenarios the F allele was assumed to have a frequency of 0.4. In the following scenario the frequencies are switched such that the F allele is at a frequency of 0.6. Either allele could plausibly be more frequent. If there has been phenotypic selection for increased marbling, then it is reasonable to assume that the F allele would be at higher frequency. However, if this gene also unfavorably affects external fat (i.e., increases it), then the gene may have been selected against. The results from scenarios in which the F allele has a frequency of 0.6 are in Table 5. The changes in the Pearson correlations are minimal because the 2 allele frequencies (0.6 and 0.4) are close to equal. The Pearson correlation estimates are slightly lower than those in Table 3, but the differences are negligible.
The differences between the cases in which the F allele is more or less frequent are small. However, here the frequencies are only 0.6 and 0.4; as the frequencies diverge, these differences may become more apparent. As selection occurs over time, the initial frequencies may change because of rapid or slow fixation of the F allele, or possible loss of the allele, depending on the selection strategy.
Genetic Progress
Average MS breeding values by generation can be found in Table 6. The changes in MS breeding values for the 10 replications are depicted in Figure 1. Across all generations, U-100-100 had higher true breeding values for MS than the other procedures. In the early stages of selection (generations 1 and 2), the tandem procedure led to higher breeding values, with its greatest advantage in generation 1. However, this advantage had dissipated by generation 3, and by generation 5 the tandem procedure yielded the least desirable results. At generation 5, the averages of the true MS breeding values in the youngest generation of U-5-25, U-5-50, and U-5-100 were 24.7, 37.0, and 43.6% larger, respectively, than that of the tandem procedure.
The average frequency of the F allele can be found in Table 7, and its changes averaged over the 10 replicates are depicted in Figure 2. The frequency of the F allele changed most rapidly under the tandem procedure, and the rate of change generally declined as the percentage of animals with known genotypes declined; U-100-100 was intermediate between U-5-100 and U-5-50. At the fifth generation, the frequency of the F allele in the youngest generation was 0.92, 0.74, 0.82, 0.87, and 0.84 for tandem, U-5-25, U-5-50, U-5-100, and U-100-100, respectively. The high frequency of the F allele under the tandem procedure is not surprising, given that molecular information was the main selection criterion. That the tandem procedure had both the highest frequency of the F allele and the lowest MS breeding value at generation 5 shows that rapid fixation of an allele in a population is not conducive to genetic improvement. Dekkers (2004) explained that the loss of genetic response due to tandem selection decreases to zero as the effect of the gene becomes larger. However, for polygenic traits of economic value in livestock, this is unlikely because a major gene may, at most, account for 10% of the genetic variation in the targeted trait. Furthermore, single genes with large effects, such as the one simulated here, may have deleterious pleiotropic effects or may be closely linked to other genes with detrimental effects (Lande and Thompson, 1990). Consequently, ignoring phenotypes in selection can cause unexpected and undesirable changes.
IMPLICATIONS
1 Corresponding author: mspanky{at}uga.edu
Received for publication September 8, 2006. Accepted for publication November 1, 2006.
LITERATURE CITED