ANIMAL GENETICS
* Animal and Dairy Science Department, University of Georgia, Athens 30602-2771;
and
USDA-ARS, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT 59301;
and
Department of Statistics, and
Institute of Bioinformatics, University of Georgia, Athens 30602-2771
| Abstract |
|---|
Key Words: genotype sampling, marker-assisted selection, simulation
| INTRODUCTION |
|---|
Once the selected animals are genotyped, several methods have been applied for the assignment of alleles to other animals in the population via allelic peeling (Wang et al., 1996; Thallman et al., 2001) or Gibbs sampling (Fernandez et al., 2001). The problem of calculating genotypic probabilities for nongenotyped animals in the presence of sparsely recorded genotypes, as is the case for genetic disorders, is complex and has been addressed in Henshall et al. (2001). However, it could be possible to infer genotypes of all other animals in the population with relatively high accuracy. Therefore, the objectives of the current study were to investigate sampling techniques for genotyping a selection of animals and to determine the impact of estimating allele frequencies of selected animals using simulated pedigrees and genotypes. Selected procedures were tested using actual beef cattle pedigrees with simulated genotypes.
| MATERIALS AND METHODS |
|---|
Selecting Animals for Genotyping
Random Sample. To determine the animals for genotyping, a random sample from the population was taken. It was assumed that either 5 or 15% of the population would be randomly selected for genotyping. The random selection scenario served as the reference against which the selection scenarios based on the relationship between animals in the population were compared.
Relationship Matrix. The inverse of the relationship matrix (A–1) was used for selecting animals for genotyping. Once A–1 was computed, males and females were separated and sorted by their diagonal element of A–1 and the number of progeny. Females were additionally sorted by their number of mates. In the current study, it was assumed that 5% of the population would be selected for genotyping using the relationship between animals, the number of progeny, and the number of mates (females only).
For the scenario in which males and females were selected for genotyping, an equal number of each sex was selected. In other words, 5% of the population, half being males and half being females, were selected for genotyping. Within sexes, animals were ranked by their corresponding diagonal element of A–1 and tied ranks were broken using numbers of progeny (males and females) and number of mates (females only). This was done to maximize the number of alleles known through half-sib relationships. When the number of females within a diagonal element-number of progeny-number of mates group exceeded the number to be selected, females were then selected randomly within that group. Similarly, males were randomly selected within a diagonal element-number of progeny group when the number of males in that group exceeded the number of males to be selected.
When only males were selected, the method of selecting males as described previously was used. For this scenario, 5% of the population selected for genotyping consisted of only males (males with the highest diagonal elements). In other words, the top 5% of males based on their diagonal element of A–1 and number of progeny were selected for genotyping.
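The ranking described above can be sketched as follows. This is only an illustration under stated assumptions, not the program used in the study: the pedigree layout `(animal, sire, dam)`, the function names, and the use of Henderson's rules without inbreeding to obtain the diagonal of A–1 are choices made for the example.

```python
import random
from collections import defaultdict


def diag_a_inverse(pedigree):
    """Diagonal of A-inverse from Henderson's rules, ignoring inbreeding
    (an assumption of this sketch).

    pedigree: list of (animal, sire, dam) tuples; unknown parents are None.
    """
    diag = defaultdict(float)
    for animal, sire, dam in pedigree:
        parents = [p for p in (sire, dam) if p is not None]
        # Contribution is 1, 4/3 or 2 for 0, 1 or 2 known parents.
        b = {0: 1.0, 1: 4.0 / 3.0, 2: 2.0}[len(parents)]
        diag[animal] += b
        for p in parents:
            diag[p] += b / 4.0
    return dict(diag)


def select_by_rank(pedigree, sex, n_select, rng):
    """Pick n_select animals, half per sex, ranked by diagonal element,
    then number of progeny, then number of mates (females only);
    remaining ties are broken at random."""
    diag = diag_a_inverse(pedigree)
    progeny = defaultdict(int)
    mates = defaultdict(set)
    for _, sire, dam in pedigree:
        for p in (sire, dam):
            if p is not None:
                progeny[p] += 1
        if sire is not None and dam is not None:
            mates[dam].add(sire)

    def key(a):
        n_mates = len(mates[a]) if sex[a] == "F" else 0
        # The random last element breaks ties within a group at random.
        return (diag[a], progeny[a], n_mates, rng.random())

    chosen = []
    for s in ("M", "F"):
        ranked = sorted((a for a in sex if sex[a] == s), key=key, reverse=True)
        chosen.extend(ranked[: n_select // 2])
    return chosen


# Hypothetical 6-animal pedigree, used only to exercise the functions.
ped = [(1, None, None), (2, None, None), (3, 1, 2),
       (4, 1, 2), (5, None, None), (6, 5, 2)]
sex = {1: "M", 2: "F", 3: "F", 4: "M", 5: "M", 6: "F"}
chosen = select_by_rank(ped, sex, 2, random.Random(1))
```

For the males-only scenario, the same key could be applied to the male group alone with the full number of animals to be selected.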
Absorption. Selection of animals was based on the diagonal element of either the relationship matrix, A, or the inverse of the relationship matrix, A–1. At each iteration, the animal with the largest diagonal element was selected, so that only one animal was selected per iteration. The iterative sampling process was run until a total of n animals were selected, where n corresponded to genotyping 5% of the animals in the population. In situations where more than one animal had the largest diagonal element, one of them was selected at random using a draw from a uniform distribution, U[0, 1].
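The absorption procedure used in the current study is described below. For the selected animal i,

$$\mathbf{P} = \mathbf{C}\,\frac{1}{a_{ii}}\,\mathbf{R},$$

where P was a matrix with dimension n × n (n = 1, ..., na), na was the total number of animals in the population, aii was the diagonal element of animal i in A–1, and C and R were column and row vectors, respectively, of the selected animal i. Further,

$$\mathbf{C} = [a_{1i}, a_{2i}, \ldots, a_{n_a i}]'$$

and

$$\mathbf{R} = [a_{i1}, a_{i2}, \ldots, a_{i n_a}].$$

After absorption of animal i, the new A–1, NA–1, was computed as follows:

$$\mathbf{NA}^{-1} = \mathbf{A}^{-1} - \mathbf{P}.$$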
The equations presented here were for selection of animals using A–1. The procedure could be easily converted to selection of animals based on the diagonal element of A. However, forming A (or inverting A–1 to get A) could be time consuming, depending on the structure and size of the pedigree.
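The iterative selection by absorption can be sketched in a few lines. The sketch assumes the usual absorption update, in which the new matrix equals A–1 minus the outer product of the selected animal's column and row divided by its diagonal element; the function names and the small example matrix are hypothetical.

```python
import random


def absorb(a_inv, i):
    """Return the matrix after absorbing animal i:
    new[r][c] = a[r][c] - a[r][i] * a[i][c] / a[i][i]."""
    n = len(a_inv)
    a_ii = a_inv[i][i]
    col = [a_inv[r][i] for r in range(n)]  # column vector of animal i
    row = list(a_inv[i])                   # row vector of animal i
    return [[a_inv[r][c] - col[r] * row[c] / a_ii for c in range(n)]
            for r in range(n)]


def select_by_absorption(a_inv, n_select, rng):
    """Select one animal per iteration: the one with the largest current
    diagonal element, ties broken by a uniform draw, then absorb it."""
    current = [list(r) for r in a_inv]
    remaining = set(range(len(a_inv)))
    selected = []
    for _ in range(n_select):
        best = max(current[i][i] for i in remaining)
        ties = sorted(i for i in remaining
                      if abs(current[i][i] - best) < 1e-12)
        pick = ties[int(rng.random() * len(ties))]  # draw from U[0, 1)
        selected.append(pick)
        remaining.discard(pick)
        current = absorb(current, pick)
    return selected


# A-inverse for a noninbred sire (0), dam (1) and their offspring (2).
a_inv = [[1.5, 0.5, -1.0],
         [0.5, 1.5, -1.0],
         [-1.0, -1.0, 2.0]]
selected = select_by_absorption(a_inv, 2, random.Random(0))
```

In this small example the offspring has the largest diagonal element and is selected first; absorbing it leaves an identity block for the two unrelated parents, which then tie.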
Peeling
Given that genotypes in this study were assigned at random from the parental genotypes in the population, it is possible to extract additional genotypic information from the pedigree. Animals with missing genotypic information can be assigned one or both alleles given parental, progeny, or mate information. Given this trio of information sources and following an algorithm similar to Qian and Beckmann (2002) and Tapadar et al. (2000), imputations of missing genotypes were made and additional genotypic information was garnered. The peeling process used in the current study to determine known alleles in the population given the genotypes of animals selected was implemented in 3 steps. For the current study, it was assumed that there were no errors in the recorded pedigree, resulting in all animals having known paternity and maternity. Whenever possible, maternal and paternal alleles were identified based on inheritance. For the purpose of this study, the first allele was inherited from the sire and the second allele was inherited from the dam. If the parental origin of an allele was unclear, then the known allele was arbitrarily assigned as either the paternal or maternal allele.
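The kind of inference involved can be illustrated with a deliberately reduced sketch. It is not the 3-step procedure of the study: it applies only two rules that hold with certainty (a homozygous parent transmits its allele, and a homozygous progeny received that allele from each parent), ignores mate information, and uses a hypothetical data layout and function name.

```python
def peel(pedigree, genotyped):
    """Infer alleles of nongenotyped animals from relatives.

    pedigree: list of (animal, sire, dam); unknown parents are None.
    genotyped: dict animal -> (allele, allele) for genotyped animals.
    Returns dict animal -> [paternal, maternal], with None for unknown.
    """
    known = {animal: [None, None] for animal, _, _ in pedigree}
    for animal, genotype in genotyped.items():
        known[animal] = list(genotype)
    changed = True
    while changed:  # repeat until no new allele can be assigned
        changed = False
        for animal, sire, dam in pedigree:
            own = known[animal]
            for pos, parent in ((0, sire), (1, dam)):
                if parent is None:
                    continue
                par = known[parent]
                # Parent to progeny: a homozygous parent must transmit
                # its allele (first position from sire, second from dam).
                if own[pos] is None and par[0] is not None and par[0] == par[1]:
                    own[pos] = par[0]
                    changed = True
                # Progeny to parent: a homozygous progeny received that
                # allele from each parent; its position in the parent is
                # assigned arbitrarily (first free slot).
                if (own[0] is not None and own[0] == own[1]
                        and own[0] not in par and None in par):
                    par[par.index(None)] = own[0]
                    changed = True
    return known


# Hypothetical family: sire "s", dam "d", two progeny "o" and "h".
ped = [("s", None, None), ("d", None, None),
       ("o", "s", "d"), ("h", "s", "d")]
from_progeny = peel(ped, {"o": ("b", "b")})
from_sire = peel(ped, {"s": ("B", "B")})
```

With only progeny `o` genotyped as bb, each parent gains one known allele; with only the sire genotyped as BB, both progeny gain their paternal allele.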
Statistical Analysis and Computation
After selection of animals for genotyping, the number of animals with 1 or 2 alleles known was computed by counting the animals that were assigned either 1 or 2 alleles by the peeling procedure described above. The percentage of alleles known based on the peeling procedure (AKP) was computed as follows:

$$\mathrm{AKP} = \frac{2n_1 + n_2}{2n_a} \times 100,$$

where n1 and n2 were the number of animals with 2 and 1 allele(s) known, respectively, and na was the total number of animals in the population. Both n1 and na were multiplied by 2 because each animal has 2 alleles.
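As a small worked check of this definition, the function below computes AKP under the reading that n1 and na are each doubled because every animal carries 2 alleles. The function name and the counts passed to it are hypothetical and are not results from the study.

```python
def akp(n1, n2, na):
    """Percentage of alleles known after peeling.

    n1: animals with 2 alleles known; n2: animals with 1 allele known;
    na: total number of animals in the population.
    """
    return 100.0 * (2 * n1 + n2) / (2 * na)


# Hypothetical counts: 250 animals with both alleles known and
# 500 with one allele known, in a population of 5,000.
example_akp = akp(250, 500, 5000)
```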
In this step, an animal with either 1 or 2 allele(s) known was not penalized if the position of the allele(s) was incorrectly assigned. For example, suppose animal i was genotyped as bb and no information was available about the parents' genotypes. Given that each parent had to have passed allele b to its progeny, animal i, each parent's genotype could then be assigned as _b or b_, where _ was the unknown allele, b was the known allele, _b indicated that allele b was inherited from the dam, and b_ indicated that allele b was inherited from the sire. If the true genotype of animal i's sire was b_ but was assigned as _b, then the sire was still included in the computations of the number of animals with 1 or 2 alleles known and AKP.
Gibbs Sampling.
After the known alleles were determined by the peeling process described above, these alleles were used as prior information in the Gibbs sampler (Wang et al., 1993; Sorenson et al., 1994; Sheehan, 2000; Fernandez et al., 2001) to assign genotypes to the remaining animals in the population. For the base population animals, the unknown allele(s) were randomly sampled given the frequency of alleles in the population and the assumption of Hardy-Weinberg equilibrium. Unknown alleles for nonbase population animals were randomly sampled from the parents' genotypes according to Mendelian rules. An equal weight was assumed for inheriting either the first or second allele from a parent. For a nonbase population animal that had only one unknown allele, the unknown allele was sampled approximately half of the time from the sire's genotype and the remaining time from the dam's genotype. This was to compensate for incorrect assignment of the known allele as illustrated in the above example.
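One pass of the sampling rules just described might look like the sketch below. It is a simplified illustration rather than the sampler used in the study: the function name, the allele labels `B` and `b`, and the pedigree layout are assumptions, and animals are assumed to be listed after their parents.

```python
import random


def sample_unknown_alleles(pedigree, known, freq_b, rng):
    """Fill in unknown alleles once.

    pedigree: list of (animal, sire, dam), parents listed before progeny.
    known: dict animal -> [paternal, maternal], None where unknown.
    freq_b: frequency of allele 'B' (the other allele is 'b').
    """
    sampled = {}

    def draw(parent):
        if parent is None:
            # Base animal: independent draws, as under Hardy-Weinberg.
            return "B" if rng.random() < freq_b else "b"
        # Mendelian sampling: either parental allele with equal weight.
        return rng.choice(sampled[parent])

    for animal, sire, dam in pedigree:
        g = list(known[animal])
        unknown = [pos for pos in (0, 1) if g[pos] is None]
        if len(unknown) == 1 and sire is not None and dam is not None:
            # One unknown allele: sample from the sire or the dam with
            # equal chance, to allow for a mispositioned known allele.
            g[unknown[0]] = draw(rng.choice((sire, dam)))
        else:
            for pos in unknown:
                g[pos] = draw((sire, dam)[pos])
        sampled[animal] = g
    return sampled


# Hypothetical trio with both parents known to be bb.
ped = [("s", None, None), ("d", None, None), ("o", "s", "d")]
known = {"s": ["b", "b"], "d": ["b", "b"], "o": [None, None]}
one_pass = sample_unknown_alleles(ped, known, 0.3, random.Random(0))
```

Repeating such passes and storing the sampled genotypes gives the counts from which the probabilities defined later in this section are computed.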
At the end of the sampling process, a benefit function that described the total number of alleles known in the population was computed. This function was computed from a combination of known alleles and the probability of unknown alleles assigned during the sampling process. To be included in the benefit function, an allele in a particular position had to be equal to the true allele of the same position (i.e., Bb and bB were not equal). The probability of allele ai,j (j = 1 or 2) being assigned as the true allele j for animal i was calculated as:

$$p(a_{i,j}) = \frac{\text{number of samples in which } a_{i,j} \text{ equaled the true allele } j}{\text{total number of samples}}$$
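Using p(ai,j) and the number of known alleles, the benefit function was then computed as

$$\text{benefit} = 2n_1 + \sum_{i=1}^{n_2}\left[1 + p(a_{i,j})\right] + \sum_{i=1}^{n_3}\sum_{j=1}^{2} p(a_{i,j}),$$

where n1, n2, and n3 were the number of animals with 2, 1, or 0 alleles known, respectively, j in the second term was the position of the unknown allele, and p(ai,j) was as previously defined. The percentage of alleles known after the Gibbs sampling process (AKG) was such that

$$\mathrm{AKG} = \frac{\text{benefit}}{2n_a} \times 100,$$

where benefit was the benefit function computed above and na was the total number of animals in the population.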
During each round of the sampling process, only one genotype for any given animal was assigned as the true genotype. Thus, at the end of the sampling process every animal had a probability of having the true genotype, PTGig, assigned as

$$\mathrm{PTG}_{ig} = \frac{\text{number of samples in which genotype } g \text{ was assigned to animal } i}{\text{total number of samples}},$$

where genotype g was the true genotype of animal i. The average probability of the true genotype being identified for every animal in the population (APTG) was computed using the following:

$$\mathrm{APTG} = \frac{1}{n_a}\sum_{i=1}^{n_a}\mathrm{PTG}_{ig},$$

where PTGig was defined as above and na was the total number of animals in the population. In contrast to the benefit function, APTG only required that the animal have the correct genotype (Bb was considered the same genotype as bB) and therefore was able to compensate for an incorrect allele position when the correct unknown allele was sampled.
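The difference between the position-specific measure (AKG) and the genotype-level measure (APTG) can be made concrete with a short sketch. It assumes that known alleles contribute 1 to the benefit function and unknown alleles contribute the share of samples in which the allele at that position matched the true allele; the function names, data layout, and the two stored samples are hypothetical.

```python
def allele_probabilities(samples, truth):
    """Share of samples in which the allele at each position equals
    the true allele at the same position."""
    n = len(samples)
    return {a: [sum(s[a][j] == truth[a][j] for s in samples) / n
                for j in (0, 1)]
            for a in truth}


def akg(samples, truth, known):
    """Percentage of alleles known after sampling (assumed form:
    known alleles count 1, unknown alleles count their probability)."""
    p = allele_probabilities(samples, truth)
    benefit = 0.0
    for a in truth:
        for j in (0, 1):
            benefit += 1.0 if known[a][j] is not None else p[a][j]
    return 100.0 * benefit / (2 * len(truth))


def aptg(samples, truth):
    """Average share of samples with the true genotype; allele order
    is ignored, so Bb and bB count as the same genotype."""
    n = len(samples)
    ptg = [sum(sorted(s[a]) == sorted(truth[a]) for s in samples) / n
           for a in truth]
    return sum(ptg) / len(ptg)


# Two hypothetical animals and two stored samples.
truth = {1: ["B", "b"], 2: ["b", "b"]}
known = {1: [None, None], 2: ["b", None]}
samples = [{1: ["B", "b"], 2: ["b", "b"]},
           {1: ["b", "B"], 2: ["b", "B"]}]
example_akg = akg(samples, truth, known)
example_aptg = aptg(samples, truth)
```

In the second sample, animal 1 has the right genotype with the alleles in swapped positions: it counts fully toward APTG but not toward the position-specific probabilities used for AKG.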
Simulation
A pedigree with 4 overlapping generations was simulated. The base population included 500 unrelated animals and subsequent generations consisted of 1,500 animals with a total of 5,000 animals generated. Approximately 10% of the animals were sires with approximately 8 progeny per sire and 42% dams with approximately 1.9 progeny per dam. One SNP with 2 alleles was simulated for every animal in the pedigree file. Genotypes of the base population animals were assigned based on allele frequencies. For the 3 subsequent generations, genotypes were randomly assigned using the parents' genotypes, where an equal chance of passing either the first or second allele was assumed. Five replicates of the simulated data were generated.
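The genotype assignment can be sketched as a simple gene-dropping routine. The pedigree layout, allele labels, and function name are assumptions of the example; nonbase animals are assumed to have both parents known and to be listed after them.

```python
import random


def simulate_genotypes(pedigree, freq_b, rng):
    """Assign one biallelic SNP genotype to every animal.

    pedigree: list of (animal, sire, dam), parents listed before progeny;
    base animals have both parents set to None.
    freq_b: frequency of allele 'B' in the base population.
    """
    geno = {}
    for animal, sire, dam in pedigree:
        if sire is None and dam is None:
            # Base animal: two independent draws from the allele frequency.
            geno[animal] = ["B" if rng.random() < freq_b else "b"
                            for _ in range(2)]
        else:
            # First allele from the sire, second from the dam, each
            # parental allele passed with equal chance.
            geno[animal] = [rng.choice(geno[sire]), rng.choice(geno[dam])]
    return geno


# Hypothetical trio; a frequency of 1.0 makes the outcome deterministic.
ped = [("s", None, None), ("d", None, None), ("o", "s", "d")]
fixed = simulate_genotypes(ped, 1.0, random.Random(0))
```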
Three different frequencies for the favorable allele were used in the simulation and analyses. The frequencies were 0.30, 0.50, and 0.80. Allele frequencies used in the analyses were either the true frequency (equal to the allele frequency used in the simulation) or estimated from the animals that were selected for genotyping. For the analyses using Gibbs sampling, a total chain length of 25,000 iterations of the Gibbs sampler was run, where the first 5,000 iterations were discarded as burn-in.
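Estimating the allele frequency from the genotyped animals and discarding the burn-in amount to simple counting and slicing. The sketch below is illustrative only; the function names and the three genotypes are hypothetical.

```python
def estimate_allele_frequency(genotypes, allele="B"):
    """Share of `allele` among all alleles of the genotyped animals."""
    copies = sum(list(g).count(allele) for g in genotypes.values())
    return copies / (2 * len(genotypes))


def discard_burn_in(chain, burn_in):
    """Keep only the samples drawn after the burn-in period."""
    return chain[burn_in:]


# Three hypothetical genotyped animals: 3 of 6 alleles are 'B'.
genotyped = {1: ("B", "b"), 2: ("B", "B"), 3: ("b", "b")}
estimated = estimate_allele_frequency(genotyped)

# A chain of 25,000 iterations with the first 5,000 discarded.
retained = discard_burn_in(list(range(25000)), 5000)
```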
Two real beef cattle pedigrees were used to validate the selection scenarios using simulated genotypes. The first pedigree was obtained from a Gelbvieh field data set and was similar to, but slightly smaller than, the pedigree used by Sapp et al. (2003) and consisted of 29,101 animals of which approximately 16.4 and 54.8% were sires and dams, respectively. There were approximately 5.7 offspring per sire and 1.7 offspring per dam. The second pedigree was a smaller research pedigree obtained from the USDA-ARS research station at Ft. Keogh (Montana) from the Line 1 Hereford selection project started in 1934 (Kealey et al., 2006) and consisted of 8,688 animals. It comprised approximately 6.6% sires and 33.0% dams. Each sire had 14.6 offspring on average, and each dam had 2.9 offspring on average. For the 2 beef cattle pedigrees, all animals with both parents unknown were assumed to comprise the base population. For these animals, genotypes were assigned based on allele frequencies. For all other animals, genotypes were randomly assigned using the parents' genotypes, where there was an equal chance of passing either the first or second allele. Frequencies for the favorable allele were assumed to be either 0.3 or 0.5. The case in which the frequency of the favorable allele was 0.8 was omitted in the field data pedigrees due to the similarity of results in the simulated pedigrees between assuming a frequency of 0.3 or 0.8 for the favorable allele. The same Gibbs sampling procedure mentioned above was used.
| RESULTS AND DISCUSSION |
|---|
For all selection scenarios and allele frequencies, estimated allele frequencies were similar to their corresponding true frequencies. The number of animals with either 1 or 2 alleles known and AKP (percentage of alleles known before Gibbs sampling) were identical when the true or estimated allele frequencies were used. This was because these parameters were computed before the Gibbs sampling procedure and thereby depended only on the allele frequency used in the simulation. Across the 3 allele frequencies, the parameters that depend on the allele frequency used in the analysis (benefit function, AKG, and APTG) showed very small differences between the true and estimated allele frequency, suggesting that the estimated allele frequency had little impact on population parameters when different sampling strategies were implemented. Therefore, the results of the current study will be reported using estimated allele frequencies. Given that the estimated frequencies were similar to the true frequencies in all pedigrees, allele frequency will be referred to as the true frequency (e.g., an estimated frequency of 0.79 will be referred to as 0.80). Because genotypes were randomly assigned in the base population and as such are not linked to any trait, they are not influenced by selection. In practice, one would expect larger differences between estimated and known allele frequencies if selection pressure has been applied to the trait with which the marker is associated. As the magnitude of this difference increases, the measures of AKG and APTG would be adversely affected. The correct allele frequency in a population that has undergone artificial selection would be dependent on the amount and duration of selection pressure applied, the magnitude of the association between the marker and trait under selection, and the effect of the marker on fitness traits.
Based on the results of the current study, the allele frequency had an effect on population parameters regardless of the method of selecting animals for genotyping. For all selection scenarios, estimates of all parameters tended to be lowest when an allele frequency of 0.50 was used. Similarly, results indicated that estimates of parameters tended to be greatest when using an allele frequency of 0.80. Further, the results suggest that genotyping strategy depends on the structure of the pedigree and the relative influence of males and females in a particular pedigree.
Random Sample
Five Percent Selected.
A description of the number of animals with 1 or 2 alleles known, percentage of alleles known, benefit function, and APTG based on randomly selecting 5% of the population for genotyping is presented in Table 1. Based on the number of animals with 1 or 2 alleles known, the percentage of alleles known before the Gibbs sampling procedure (AKP) ranged from 10.05 to 10.94. The percentage of alleles known after Gibbs sampling (AKG) ranged from 60.18 to 73.10, suggesting that 60 to 73% of the alleles in the population were known when the probability that the true allele j (j = 1, 2) of animal i was assigned [p(ai,j)] was taken into account. This result suggests that the Gibbs sampler in conjunction with the peeling process was able to identify a larger number of alleles in the population than the peeling process alone. To determine the (dis)advantage of using the Gibbs sampling and peeling procedure (AKG) compared with using the peeling procedure alone (AKP), a percentage difference was computed as [(AKG – AKP)/AKP] x 100. Using this percentage difference, the Gibbs sampling procedure increased the percentage of alleles known in the population by over 500% across allele frequencies when compared with using the peeling procedure alone.
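The percentage difference defined above is a one-line computation. The function name is hypothetical, and the values passed to it are made up to show the arithmetic; they are not paired results from Table 1.

```python
def percentage_difference(akg, akp):
    """Relative change from AKP to AKG, in percent."""
    return (akg - akp) / akp * 100.0


# Hypothetical values: AKP of 10% and AKG of 65%.
example_difference = percentage_difference(65.0, 10.0)
```

A value above 0 indicates an advantage of peeling plus Gibbs sampling over peeling alone; with these made-up inputs the increase is 550%.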
The average probability of the true genotype being identified for every animal in the population, APTG, ranged from 0.44 to 0.58 for the 3 allele frequencies used in the current study. This result indicates that 44 to 58% of the animals in the population had their true genotype assigned after the peeling and Gibbs sampling processes. The parameter APTG is greatly affected by the number of animals with either one or no alleles known. If a large proportion of animals have no alleles known, then APTG would be expected to be lower.
Fifteen Percent Selected.
A description of the number of animals with 1 or 2 alleles known, percentage of alleles known, benefit function, and APTG when 15% of the population was randomly selected for genotyping is presented in Table 2. Randomly sampling an additional 10% of the population increased the number of animals with 1 or 2 alleles known compared with only sampling 5% of the population. The parameter AKP was increased by 172.32, 171.74, and 166.91% for allele frequency 0.30, 0.50, and 0.80, respectively, when 15% of the animals were genotyped compared with sampling 5% of the population for genotyping.
An increase in APTG of approximately 15 to 27% was observed when 15% of animals in the population were randomly selected compared with randomly selecting 5%. Thus, 56 to 68% of the animals in the population had their true genotype assigned. This result indicates that more animals were assigned, with high probability, their true genotype than when only 5% were randomly selected.
Relationship Matrix
Selection of Males and Females.
A description of the number of animals with 1 or 2 alleles known, percentage of alleles known, benefit function, and APTG based on selecting 2.5% of males and 2.5% of females in the population using A–1 is presented in Table 3. Because of the large number of animals with 1 or 2 alleles known, AKP ranged from 34.57 to 37.70 across the 3 allele frequencies used in the current study.
The average probability of assigning the true genotype for every animal in the population, APTG, was 0.62, 0.56, and 0.68 for frequencies of 0.30, 0.50, and 0.80, respectively, suggesting that 56 to 68% of the animals in the population had their true genotype assigned depending on the allele frequency.
When compared with randomly sampling 5% of the population for genotyping, selection of 2.5% of males and 2.5% of females based on the diagonal element of A–1 and the number of progeny or mates increased AKP by 243.98 to 245.11% depending on the allele frequency. Likewise, AKG was increased by 12.34 to 28.93% across the 3 allele frequencies when A–1 was used instead of randomly sampling 5% of the population. When animals were selected based on A–1, APTG was increased by 21.57, 27.27, and 17.24% for allele frequencies of 0.30, 0.50, and 0.80, respectively, compared with randomly selecting 5% of the population.
When compared with randomly sampling 15% of the population for genotyping, selection of 2.5% of males and 2.5% of females based on the diagonal element of A–1 increased AKP by 26.58 to 29.11% depending on the allele frequency. Likewise, AKG was increased by 3.05 to 6.29% across the 3 allele frequencies when A–1 was used instead of randomly sampling 15% of the population. When 2.5% of males and 2.5% of females were selected based on A–1, APTG was virtually identical compared with randomly selecting 15% of the population. The results comparing a relationship-based selection scheme versus random sampling should not be surprising. Kinghorn (1999) described selection based on average numerator relationship as being superior to random selection using a much smaller pedigree (1,260 animals). The results from Kinghorn (1999) did not show the same magnitude of separation between random sampling and the use of connectedness as the current study, presumably due to differences in pedigrees, particularly size.
Selection of Males.
A description of the number of animals with 1 or 2 alleles known, percentage of alleles known, benefit function, and APTG when 5% of the population, consisting only of males, was selected for genotyping using A–1 is presented in Table 4. Because only males were selected for genotyping, the number of animals with 2 alleles known was approximately 250 across the 3 allele frequencies used. Yet, the number of animals with 1 allele known ranged from 2,793 to 3,115, which was higher than with any of the other selection scenarios using the simulated pedigrees. However, because selecting both males and females resulted in over twice the number of animals with both alleles known, selecting equal numbers of both sexes yielded greater values for AKP, AKG, and APTG. For the measures of AKP, and AKG in particular, selecting only males is still more desirable than selecting 5 or even 15% of the animals at random.
Inverse of the Relationship Matrix.
A description of the parameters estimated when 5% of the population was selected for genotyping using absorption of A–1 is presented in Table 5. The method of absorption was only performed on the simulated pedigrees. The results are similar when the allele frequency is known compared with when it is estimated from the selected animals. This was due to the fact that the estimated allele frequencies are close to the true values. The scenario when the allelic frequencies are 0.8/0.2 gives the most desirable results. Although the differences in the number of animals with both alleles known are negligible across allele frequencies, differences in the number of animals with one allele known are more prominent. Consequently, there are observable differences in the benefit function, AKP, AKG, and APTG across allele frequencies. From these results it appears that, in situations with more extreme allele frequencies (0.8/0.2), it is easier to infer unknown genotypes.
Relationship Matrix.
The results for animals selected based on the absorption of A are not reported because the trends were similar to those reported using absorption of A–1 across the 3 allele frequencies. Selection of animals based on absorption of A was inferior to both methods of selecting animals based on their diagonal elements of A–1 (Tables 3 and 4). The absorption of A still has advantages, albeit slight, in regard to AKP over the method of selecting 5% of the animals at random.
Real Beef Cattle Pedigrees
The results using a field data pedigree of 29,101 animals are presented in Table 6. Similar patterns to the results using the simulated pedigrees were observed. Three ways of selecting candidates for genotyping (5% of the population) were compared: random selection, selection of males with the greatest diagonal element of A–1, and selection of both males and females based on their diagonal element of A–1. As expected from the simulation results, selection of candidates based on the relationship matrix yielded more desirable results compared with random selection. The advantages in AKP for selection of both males and females based on their diagonal element of A–1 over random selection were 163.6 and 160.4% for allele frequencies 0.3/0.7 and 0.5/0.5, respectively. Similarly, AKG increased by 65.3 and 69.1% and APTG increased by 12.8 and 14.3% for the more extreme (0.3/0.7) and intermediate (0.5/0.5) allele frequencies, respectively.
Compared with the simulated pedigrees, the beef cattle pedigree used here appears to be best suited for selection based on animals with the greatest diagonal element of A–1 as opposed to selection of equal proportions of males and females. This can be explained by the fact that in the field data pedigree, numerous females had a small number of mates and offspring. This is in agreement with Koudande et al. (1999), who determined that when the reproductive rate of males is sufficiently high compared with that of females, genotyping costs can be reduced by genotyping males only. Although the results in Table 6 show that the differences between selecting both males and females or just males are slight, they do show that pedigrees with varying levels of complexity (or livestock species) might respond differently to these selection methods.
Table 7 displays the results of selection using a research pedigree. The results show that selection of males based on their diagonal element of A–1 led to increases in AKP, AKG, and APTG of 213.7, 45.1, and 18.6% for allele frequency of 0.5 and 265.5, 45.2, and 38.0% for allele frequency of 0.3 when compared with randomly selecting 5% of the population. Selection of both males and females based on their diagonal element of A–1 was also superior to randomly selecting 5%, showing increases in AKP, AKG, and APTG of 230.0, 43.0, and 20.9% for allele frequency of 0.5 and increases of 294.5, 41.4, and 36.0% for 0.3 allele frequency. The methods of selecting only males or both males and females from their diagonal element were similar in performance, with the selection of both sexes having an advantage before the Gibbs method and the selection of males having a slight advantage after the Gibbs method.
| Footnotes |
|---|
2 Current address: Aviagen, Huntsville, AL 35805.
3 Corresponding author: rrekaya{at}uga.edu
Received for publication August 3, 2007. Accepted for publication April 16, 2008.