ANIMAL GENETICS |


* Department of Animal & Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway; AKVAFORSK, PO Box 5010, 1432 Ås, Norway; and Roslin Institute, Roslin, Midlothian, EH25 9PS, UK
| Abstract |
|---|
|
|
|---|
Key Words: backcross design, inbred line, introgression, quantitative trait loci mapping
| INTRODUCTION |
|---|
|
|
|---|
Introgression of a detected QTL using genetic markers has been successful in practice (Jefferies et al., 2003; Koudandé et al., 2005). In all of these studies, QTL identification preceded introgression, and the timescale of improvement was the time taken for QTL detection plus the time taken for introgression, limiting the practicality of the approach.
One approach to shorten the time needed for introgression is to combine the 2 steps, QTL identification and introgression, into a single step. This would combine the strengths of fine-mapping and backcrossing and pave the way for introgression of desirable but unknown QTL into recipient lines in animals and plants simultaneously.
The objective of this study is to present a method that performs QTL mapping and introgression of desirable genes from a donor line into a recipient line simultaneously. The effectiveness of this method will be investigated through simulation considering 2 inbred lines.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Base Populations
Two inbred divergent lines (recipient and donor) were simulated using Monte Carlo simulation. The genome structure of individuals was diploid, with size 100 cM in 1 chromosome (carrier chromosome). The genome was assumed to have 1 QTL affecting the trait of interest and 101 markers. The markers were positioned at equal distances on the chromosome. Each locus, either QTL or marker, was assumed to be biallelic, with additive gene effects for the QTL and no effects for the markers. The 2 lines were fixed for the alternative alleles at each locus, which is consistent with the assumption of inbred lines. The assumption of additivity for the QTL is not essential but simplifies the description, and this will be addressed in the discussion section. The genotypes for the recipient and donor lines were qq and QQ for the QTL and m1m1, ... , m101m101 and M1M1, ..., M101M101 for the markers, respectively. In this study, the QTL was placed at 76.5 cM from the beginning of the chromosome. This position was chosen because it was away from the center and ends of the chromosome and not located at a marker position. Location at the center was avoided, because this would be the mean location of false positive QTL. The ends of the chromosome were avoided, because it would result in truncated likelihood peaks, which are unsatisfactory for assessing the procedures proposed.
Selection and Mating
The introgression proceeded by crossing the inbred lines to produce an F1 generation and then by recurrent backcrossing of the selected individuals from the cross-bred population to the recipient line, to produce generations BC1, BC2, BC3, and BC4. In this study, BC4 was the end point, and the selected parents from BC4 were used to report the results. All generations consisted of N individuals, which varied across different scenarios.
In the F1 generation, all individuals had identical markers and QTL genotypes. Hence, selection of parents was random. In each subsequent generation, selection was based on the probability of the candidate being heterozygous for the QTL, conditional on the markers and the phenotypes observed in that generation. Individuals were selected if the probability of being heterozygous exceeded a predetermined threshold value (PSelection). As a consequence, a variable number of candidates were selected and given an opportunity to breed. The calculation of this selection criterion will be described in the section on QTL mapping. The selected F1, BC1, BC2, and BC3 individuals were considered to be nonrecurrent parents.
Mating took place at random to produce N offspring (1/2 N males, 1/2 N females). Each offspring was produced by random sampling with replacement of 1 sire and 1 dam from the selected individuals. In each generation, crossing-over events were generated according to the mapping function of Haldane (1919). The chromosome string for an offspring began with either the paternal or maternal chromosome sequence in the parent with equal chance and crossed over to read from the other string when a recombination occurred.
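The gamete construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function names `haldane_r` and `make_gamete` are hypothetical, only the 101 markers are modeled (the QTL locus is left out for brevity), and recombination is drawn independently per marker interval, which is what Haldane's no-interference assumption implies.

```python
import math
import random

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def make_gamete(hap_a, hap_b, positions_cm, rng):
    """Copy one parental strand, switching to the other at each recombination."""
    strands = (hap_a, hap_b)
    current = rng.randrange(2)            # start on either strand with equal chance
    gamete = [strands[current][0]]
    for k in range(1, len(positions_cm)):
        r = haldane_r(positions_cm[k] - positions_cm[k - 1])
        if rng.random() < r:              # recombination between loci k-1 and k
            current = 1 - current
        gamete.append(strands[current][k])
    return gamete

rng = random.Random(1)                    # fixed seed, chosen arbitrarily
positions = [float(p) for p in range(101)]  # 101 markers spaced 1 cM apart
donor = ["M"] * 101                       # donor line fixed for M alleles
recipient = ["m"] * 101                   # recipient line fixed for m alleles
f1_gamete = make_gamete(donor, recipient, positions, rng)
```

With 1 cM spacing the recombination fraction between adjacent markers is just under 0.01, so most gametes from an F1 parent carry only one or two switches between donor and recipient segments.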
Genetic Models
The difference between the 2 lines for the trait of interest was described entirely by the 1 QTL, with the superiority of the donor line being $2\alpha$, where $\alpha$ = the QTL allele substitution effect. In a backcross population, where individuals are either Qq or qq with equal frequency, the segregation of 1 QTL contributes an amount $\alpha^2/4$ to the genetic variance, $\sigma_G^2 = \alpha^2/4$, when the phenotypic difference between lines is $2\alpha$ (Wright, 1968).
Therefore, the phenotypic value for each individual was simulated based on the following model

$$y_i = \mu + b_i\alpha + e_i,$$

where $y_i$ = the phenotypic value of the ith individual (i = 1, ..., N); $\mu$ = the population mean; $b_i$ = an indicator variable equal to the number of favorable QTL alleles, which will take values of 0 or 1 only; and $e_i$ = a random normal variable, with mean 0.0 and variance $\sigma_e^2$ = 4.95. Values of $\alpha$ used in the simulations are described in a later section.
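As a rough illustration of this phenotype model, the sketch below draws backcross phenotypes as the population mean plus the QTL effect for carriers plus a normal residual. The function name `simulate_phenotypes` is hypothetical, and the population mean of 0 is an assumption, because the source does not state the value used; the residual variance of 4.95 and the substitution effect of 1.4832 are taken from the text.

```python
import random

def simulate_phenotypes(b, mu, alpha, sigma2_e, rng):
    """Return mu + b_i * alpha + e_i for each individual, with e_i ~ N(0, sigma2_e)."""
    sd_e = sigma2_e ** 0.5
    return [mu + b_i * alpha + rng.gauss(0.0, sd_e) for b_i in b]

rng = random.Random(7)                          # fixed seed, chosen arbitrarily
n = 1000
b = [rng.randrange(2) for _ in range(n)]        # 0 = qq, 1 = Qq, equal frequency
y = simulate_phenotypes(b, mu=0.0,              # mu = 0 is an assumed value
                        alpha=1.4832, sigma2_e=4.95, rng=rng)
```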
QTL Mapping
The single-interval mapping model (Lander and Botstein, 1989) was applied for QTL mapping. In this model, 1 marker interval at a time was used to construct a putative QTL likelihood at the midpoint location of the interval. By using all marker information, denoted by $M_i$, and the phenotypic value $y_i$ of the recorded trait, the likelihood function for a putative QTL in an interval with midpoint $x_i$ in the backcross program was

$$L = \prod_{i=1}^{N}\left[\pi(qq \mid M_i)\,\phi_i(y_i, \mu, \sigma_e^2) + \pi(Qq \mid M_i)\,\phi_i(y_i, \mu + \alpha, \sigma_e^2)\right],$$

where $\phi_i(y_i, \mu, \sigma_e^2)$ and $\phi_i(y_i, \mu + \alpha, \sigma_e^2)$ = the density functions for a normal distribution, with means $\mu$ and $\mu + \alpha$ and variance $\sigma_e^2$, and $\pi(qq \mid M_i)$ and $\pi(Qq \mid M_i)$ = the probabilities of the QTL genotypes conditional on marker genotypes and position of the flanking markers.
The probability of the QTL genotype, $\pi(qq \mid M_i)$ or $\pi(Qq \mid M_i)$, was calculated based on the marker genotype of the individual and its nonrecurrent parent at flanking markers in each interval. Nonrecurrent parents were assumed to be heterozygous irrespective of the marker genotypes given that they were selected as heterozygous parents. If a marker locus of the nonrecurrent parent was homozygous (noninformative), then the interval was expanded until the next heterozygous locus. For example, if the genotypes at 4 marker loci of a nonrecurrent parent were (..., M11m11 m12m12 m13m13 M14m14, ...), then the left (M11m11) and right (M14m14) marker loci were taken as flanking markers for the expanded interval between markers 11 and 14, which are located at 10 and 13 cM, respectively. The probability of putative QTL locations at 10.5, 11.5, and 12.5 cM was calculated with the corresponding recombination fractions $\theta_1$ and $\theta_2$ for markers 11 and 14, respectively. For calculation of the probability of the putative QTL genotype, an approach for outbreeding populations was used to make extension to outbreeding populations possible. There are other possible genotypes of flanking markers associated with chromosome ends, and these are dealt with as described by Lander and Botstein (1989).
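The flanking-marker calculation can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, the phase of the nonrecurrent parent is assumed known (donor alleles on one strand), both QTL alleles are assumed equally likely to be transmitted a priori, and recombination in the two sub-intervals is assumed independent (Haldane). The example uses the expanded interval from the text, with flanking markers at 10 and 13 cM.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def prob_carrier(left_donor, right_donor, d_left_cm, d_right_cm):
    """P(offspring is Qq | alleles inherited at the two flanking markers).

    left_donor / right_donor: True if the offspring received the donor allele
    from its nonrecurrent parent at the left / right flanking marker.
    d_left_cm / d_right_cm: distances from the putative QTL to each marker.
    """
    t1 = haldane_r(d_left_cm)
    t2 = haldane_r(d_right_cm)
    # Likelihood of the marker alleles if the donor Q allele was transmitted
    like_q_donor = ((1 - t1) if left_donor else t1) * ((1 - t2) if right_donor else t2)
    # Likelihood of the marker alleles if the recipient q allele was transmitted
    like_q_recip = (t1 if left_donor else (1 - t1)) * (t2 if right_donor else (1 - t2))
    return like_q_donor / (like_q_donor + like_q_recip)

# Flanking markers at 10 and 13 cM, putative QTL at 10.5, 11.5, and 12.5 cM
for qtl_cm in (10.5, 11.5, 12.5):
    p = prob_carrier(True, False, qtl_cm - 10.0, 13.0 - qtl_cm)
    print(qtl_cm, round(p, 3))
```

When both flanking markers carry donor alleles the carrier probability is close to 1, whereas a recombinant pair of flanking markers leaves the genotype uncertain, with the probability depending on which marker is closer to the putative QTL.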
The likelihood function was maximized for each interval using the expectation-maximization algorithm, which deals with the missing genotypes (Lander and Botstein, 1989). This iterative algorithm combines both genetic marker information and phenotypic value for the calculation of the QTL genotype probabilities for each individual and maximizes the likelihood for each interval based on the estimated parameters. The interval with the greatest maximized likelihood value was taken as the estimated location of the QTL, and the estimate of $\alpha$ for this interval was taken as the estimate of the QTL allele substitution effect. For the estimation of the QTL location and effect from the second backcross generation onward, the accumulated information from the previous backcross generations was used by including the families from the previous generations. It was assumed that the selected parents were indeed carriers of the QTL, although there was a possibility that noncarrier parents had been mistakenly selected.
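A minimal sketch of an expectation-maximization fit for a single interval is given below. It treats the data as a 2-component normal mixture with known, individual-specific prior probabilities of being Qq and a common residual variance. It is an illustration under stated assumptions rather than the authors' code: the function names, starting values, convergence tolerance, the simulated example data, and the population mean of 10 are all hypothetical.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_interval(y, prior_het, max_iter=500, tol=1e-8):
    """Estimate mu, alpha, and residual variance for one putative QTL position."""
    n = len(y)
    mu = sum(y) / n                                  # crude starting values
    var = sum((v - mu) ** 2 for v in y) / n
    alpha = math.sqrt(var)
    loglik_old = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each individual is Qq
        weights, loglik = [], 0.0
        for yi, pi in zip(y, prior_het):
            f_het = pi * normal_pdf(yi, mu + alpha, var)
            f_hom = (1.0 - pi) * normal_pdf(yi, mu, var)
            weights.append(f_het / (f_het + f_hom))
            loglik += math.log(f_het + f_hom)
        if loglik - loglik_old < tol:                # stop once the gain is negligible
            break
        loglik_old = loglik
        # M-step: weighted means for each genotype class and pooled variance
        sw = sum(weights)
        mean_het = sum(w * yi for w, yi in zip(weights, y)) / sw
        mean_hom = sum((1.0 - w) * yi for w, yi in zip(weights, y)) / (n - sw)
        var = sum(w * (yi - mean_het) ** 2 + (1.0 - w) * (yi - mean_hom) ** 2
                  for w, yi in zip(weights, y)) / n
        mu, alpha = mean_hom, mean_het - mean_hom
    return mu, alpha, var, loglik

# Hypothetical example data: informative markers, true alpha = 1.4832
rng = random.Random(3)
priors = [rng.choice((0.01, 0.99)) for _ in range(2000)]
carrier = [rng.random() < p for p in priors]
y = [10.0 + c * 1.4832 + rng.gauss(0.0, math.sqrt(4.95)) for c in carrier]
mu_hat, alpha_hat, var_hat, loglik = em_interval(y, priors)
```

Running this fit once per marker interval and keeping the interval with the greatest log-likelihood would give the location estimate described in the text.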
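The selection criterion for parents of the next generation was the probability that the individual was a carrier of the donor Q allele at the estimated QTL location. This was calculated conditional on the phenotype and markers and assumed the maximum likelihood estimates for $\alpha$, $\mu$, and $\sigma_e^2$:

$$\Pr\nolimits_i(Qq \mid y_i, M_i) = \frac{\pi(Qq \mid M_i)\,\phi_i(y_i, \hat{\mu} + \hat{\alpha}, \hat{\sigma}_e^2)}{\pi(qq \mid M_i)\,\phi_i(y_i, \hat{\mu}, \hat{\sigma}_e^2) + \pi(Qq \mid M_i)\,\phi_i(y_i, \hat{\mu} + \hat{\alpha}, \hat{\sigma}_e^2)},$$

where $\pi(\cdot \mid M_i)$ = the marker-based QTL genotype probabilities and $\phi_i$ = the normal density functions defined above. Individuals that were heterozygous at the estimated QTL location with probability $\Pr_i(Qq \mid y_i, M_i) \geq P_{Selection}$ were selected. The values used in simulations are given in the next section.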
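The selection step can be sketched as a Bayes-rule calculation followed by a threshold. The code below is an illustration only: the function names are hypothetical, the parameter values stand in for maximum likelihood estimates, and the 2 example candidates are made up.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_het(y, prior_het, mu, alpha, var):
    """P(Qq | phenotype, markers), given the marker-based prior P(Qq | markers)."""
    f_het = prior_het * normal_pdf(y, mu + alpha, var)
    f_hom = (1.0 - prior_het) * normal_pdf(y, mu, var)
    return f_het / (f_het + f_hom)

def select_parents(ys, priors, mu, alpha, var, p_selection):
    """Indices of candidates whose carrier probability reaches the threshold."""
    return [i for i, (y, p) in enumerate(zip(ys, priors))
            if posterior_het(y, p, mu, alpha, var) >= p_selection]

# Two made-up candidates: one with donor flanking markers, one without
selected = select_parents(ys=[3.0, -1.0], priors=[0.99, 0.01],
                          mu=0.0, alpha=1.4832, var=4.95, p_selection=0.95)
```

Because the number of candidates passing the threshold is not fixed in advance, the number of selected parents varies from generation to generation, as noted in the Selection and Mating section.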
Simulations, Parameters, and Summary Statistics
In this study, different values of N, $\sigma_G^2$, and Pselection were considered. The number of candidates in each generation (N) was 100, 500, or 1,000. The genetic variance of the QTL ($\sigma_G^2$) was 0.05, 0.26, or 0.55, corresponding to values of $\alpha$ = 0.4472, 1.0208, and 1.4832 and to heritabilities 0.01, 0.05, and 0.10, respectively. The selection criterion Pselection was 0.75, 0.95, or 0.99. Schemes were simulated with all combinations of these parameters, with each combination replicated 100 times.
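The relationship between these parameter values can be checked with a few lines of arithmetic. The sketch below assumes, consistent with the Genetic Models section, that the QTL variance in a backcross is the squared substitution effect divided by 4 and that heritability is the QTL variance divided by the sum of the QTL variance and the residual variance of 4.95. The variable names are hypothetical.

```python
import itertools
import math

SIGMA2_E = 4.95                                   # residual variance from the text

def qtl_variance(h2, sigma2_e=SIGMA2_E):
    """QTL variance implied by heritability h2 = s2g / (s2g + s2e)."""
    return h2 * sigma2_e / (1.0 - h2)

def substitution_effect(sigma2_g):
    """Allele substitution effect implied by s2g = alpha**2 / 4 in a backcross."""
    return 2.0 * math.sqrt(sigma2_g)

for h2 in (0.01, 0.05, 0.10):
    s2g = qtl_variance(h2)
    print(h2, round(s2g, 2), round(substitution_effect(s2g), 4))

# All combinations of population size, heritability, and selection threshold
schemes = list(itertools.product((100, 500, 1000),
                                 (0.01, 0.05, 0.10),
                                 (0.75, 0.95, 0.99)))
n_runs = len(schemes) * 100                       # 100 replicates per scheme
```

The 3 levels of each factor give 27 schemes, or 2,700 simulated replicates in total.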
For each replicate, the donor genome contribution, linkage, and obligate drags at backcross generations were calculated from direct examination of the marker sequence along the genome of individuals with respect to the estimated QTL location. Figure 1 illustrates these terms in the context of our backcross design, although examples are not from our empirical results. Efficiency of selection was calculated as the ratio of the number of selected individuals who were heterozygous at the true QTL location to the total number of selected individuals. The donor genome contribution is the fraction of the genome that derives from the donor genome and includes all segments of the donor genome regardless of QTL position (multiple segments result from multiple recombinations occurring in the same individuals); the linkage drag is the average length of the intact segment of the donor genome flanking the QTL; and the obligate drag is the minimum segment length of the donor genome to the left and to the right of the QTL, which represents that part of the donor genome that cannot be removed from an intercross formed from the final generation.
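Two of these summary statistics can be approximated directly from a marker sequence, as in the sketch below. It is a simplified illustration, not the authors' procedure: the function names are hypothetical, breakpoints are assumed to fall halfway between a donor and a recipient marker when computing donor length, the intact segment is measured from marker to marker, and the QTL position is assumed to lie strictly inside the marker map. Obligate drag is not computed here.

```python
def donor_length_cm(is_donor, positions_cm):
    """Donor-derived length; an interval with one donor end counts for half."""
    total = 0.0
    for k in range(len(positions_cm) - 1):
        width = positions_cm[k + 1] - positions_cm[k]
        total += width * 0.5 * (is_donor[k] + is_donor[k + 1])
    return total

def intact_segment_cm(is_donor, positions_cm, qtl_cm):
    """Marker-to-marker length of the unbroken donor run bracketing qtl_cm."""
    # Index of the first marker to the right of the QTL position
    right = next(k for k, p in enumerate(positions_cm) if p > qtl_cm)
    left = right - 1
    if not (is_donor[left] and is_donor[right]):
        return 0.0                                # no donor segment spans the QTL
    while left > 0 and is_donor[left - 1]:
        left -= 1
    while right < len(is_donor) - 1 and is_donor[right + 1]:
        right += 1
    return positions_cm[right] - positions_cm[left]

# Made-up chromosome: donor alleles at markers 70 to 80 cM, recipient elsewhere
positions = [float(p) for p in range(101)]
is_donor = [70 <= k <= 80 for k in range(101)]
total_donor = donor_length_cm(is_donor, positions)          # 11.0 cM
segment = intact_segment_cm(is_donor, positions, 76.5)      # 10.0 cM
```

Averaging the intact segment length over the selected individuals of a generation would give a linkage drag figure comparable in kind to the one defined above.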
| RESULTS |
|---|
|
|
|---|
Table 1 presents results for 3 QTL allele substitution effects ($\alpha$ = 0.4472, 1.0208, or 1.4832) and 3 population sizes (N = 100, 500, or 1,000) from the BC4, in which the selection of parents in BC1, BC2, and BC3 was based on Pselection = 0.75. The estimate of the QTL location was unbiased when the QTL allele substitution effect was $\alpha$ = 1.4832, and the SE of the estimate decreased with increasing population size. There was a trend toward increasing bias with smaller QTL as the population size decreased. Closer examination of these results suggests that the data were insufficient for mapping this small QTL effect for N = 100. The efficiency of selection of 0.58 (Table 1) we considered to be by chance. In general, the SE for location were inversely associated with population size.
The estimate of the QTL allele substitution effect was unbiased when the population size and allele substitution effect were 500 and 1.0208, respectively, or more, but for the smallest population size of 100, there was a trend to slight underestimation of the substitution effect. The precision of the estimate of the QTL allele substitution effect increased as the population size increased.
Estimates of the residual variance were close to the true value ($\sigma_e^2$ = 4.95) in most cases and ranged from 4.84 to 4.98. Standard errors of the estimated residual variance were very small and ranged from 0.01 to 0.04 across all scenarios. The genome contribution of the donor line after 4 backcross generations ranged from 44.17 to 44.72 cM across the different QTL allele substitution effects and population sizes. It should be noted that there was no background selection in this study. Linkage drag ranged from 39.70 to 42.75 cM across all scenarios. With the small QTL allele substitution effect ($\alpha$ = 0.4472), the linkage drag across the different population sizes was greater than the corresponding value with the large QTL allele substitution effect ($\alpha$ = 1.4832). The obligate drag ranged from 2.33 to 6.48 cM. The SE of the obligate drag was similar across the different QTL allele substitution effects except for the scenario with small allele substitution effect ($\alpha$ = 0.4472) and small population size of 100. The efficiency of selection varied across different scenarios. The efficiency increased with increasing population size and QTL allele substitution effect. The number of selected individuals was usually a little less than 50% as expected due to uncertainty about the QTL segregation. The distribution of the calculated Pselection values was U-shaped, either close to 0 or 1. Therefore, truncation values of Pselection = 0.75 or 0.95 had little effect on the numbers of selected parents. The exception to this was when N was small, where the U-shaped distribution of Pselection was less extreme, so, when Pselection increased, the number of selected parents decreased slightly.
In Table 2, results are shown from BC4, when Pselection = 0.95 in BC1, BC2, and BC3. These results are consistent with the results presented in Table 1 for Pselection = 0.75, except that the efficiency of selection was greater, ranging from 0.68 to 0.99. Results for Pselection = 0.99 are not shown, because they were similar to the results for Pselection = 0.95 (Table 2). However, the estimate of the QTL location was biased at very small QTL allele substitution effects ($\alpha$ = 0.4472) regardless of Pselection.
$\alpha$ = 1.4832 is illustrated in Figure 2.
| DISCUSSION |
|---|
|
|
|---|
To investigate the efficiency of retaining the donor favorable QTL allele across several backcross generations, results of BC4 were presented. The estimates of QTL location and effect were comparable with the true values in most cases considered in this study, particularly with the larger $\alpha$ and increased N. Only with the small QTL allele substitution effect of $\alpha$ = 0.4472, corresponding to the heritability of h2 = 0.01, and small N were there indications of inadequate power with insufficient data to provide reliable estimates of QTL location.
The research emphasis on minimizing generations of backcrossing to reach specific tolerance levels for the fraction of donor genome is because the timescale for introgression tends to be long. Yet, before the backcrossing, there is the need to identify the QTL for introgression, which also takes several generations. The use of fore- and background selection (Hospital and Charcosset, 1997) or a dense marker map to get a short distance between the QTL and the marker (Hospital, 2001) accelerates the reduction of the remaining donor segment. As expected, the proportional donor contribution to the backcross population decreases as the number of generations increases. The retained donor chromosome segment was close to the expected length (44.09 and 39.06 cM for total donor genome contribution and linkage drag, respectively) calculated from Eq. 1, 3, 4, and 5 of Wall et al. (2005). Hospital (2001) concluded that the most important factor for reducing the donor contribution is the distance between the flanking markers and the introgressed gene, and the best way is to use flanking markers that are as close as possible to the QTL. In this study, we used a dense marker map. It is also expected that the donor contribution is reduced further when foreground and background selection are combined (Visscher et al., 1996; Hospital, 2001). However, Han et al. (1997) stated that background selection should be applied cautiously on carrier chromosomes when the QTL location is not accurate, otherwise the QTL could be lost. In this study, there was no background selection, and the QTL was not lost in any generation or replicate, probably due to large numbers of selected parents who carried the QTL.
The method of single-interval mapping was used for gene mapping. This method evaluates the association of 1 QTL with a marker interval and ignores the effects of other segregating QTL in the mapping population. Because there was no other QTL in this study, the estimates were unbiased, which was verified when there were sufficient records from genotyped individuals. Using a dense genome map for simultaneous detection and introgression of a desirable QTL or gene was found to be most instructive if the QTL allele substitution effect and population size were 1.0208 and 500 individuals, respectively, in each generation.
Calculation of the probability of heterozygosity at the estimated QTL location for selection, Pri(Qq|yi,Mi), was based on the phenotypic and marker information of individuals. Results in Tables 1 and 2 suggest that there is a discrepancy between the Pri(Qq|yi,Mi) estimate and the efficiency of selection. This discrepancy probably originates from the inaccuracy of the estimate of QTL location and the assumptions made in obtaining it, for example, that all parents were heterozygous. As the estimate of QTL location became more accurate, the efficiency of selection was closer to unity.
In this study, we focused on a simple case of only 1 QTL in the whole genome with additive effects as the only source of genetic variation affecting the trait of interest. The model was chosen to provide a proof of principle. If the combined detection and introgression was ineffective in this case, then it would be unlikely to be extended to outbred populations with additional polygenic variation. The inclusion of a dominance effect in the model requires no further work: when the effect of the QTL shows dominance, the difference between heterozygous (Qq) and homozygous (qq) individuals is a – d instead of a, where a and d = the genotypic values of homozygote (qq) and heterozygote (Qq), respectively. Extension to additional polygenic variation and multiple QTL would require further investigation.
There is evidence of gene × environment interaction, as well as gene × genetic background interaction, on quantitative traits in the literature (Beavis, 1994; Beavis and Keim, 1995; Valdar et al., 2006). Dudley (1993) reported inconsistency in means of marker genotypes among mapping populations and environments. Lecomte et al. (2004) described effects of epistatic interactions between QTL regions and genetic background for fruit quality traits in the half-diallel cross design among 3 tomato lines. When combined approaches of QTL mapping and gene introgression are used, the effects of gene × recipient genome interaction will be implicitly taken into consideration in the analysis, because the QTL effects are estimated in the presence of the recipient genome.
In general, the combined detection and introgression of genes underlying desirable traits may have several advantages over separate programs of QTL mapping and gene introgression. This scheme ensures that the desirable QTL is introgressed while its function is simultaneously tested in a planned environment and in the expected recipient genome structure. Hence, it may be preferred, in particular, for species with long generation intervals, high rearing costs, and traits that are difficult to measure. Also, this method saves at least 1 generation of time and related expenses compared with separate QTL mapping and introgression. In this study, we considered inbred lines with 1 QTL, which is mostly applicable to plant species and to some animal species for which inbred lines are available, such as mice. However, further research is needed where there are multiple QTL and outbreeding populations.
1 Corresponding author: hossein.yazdi{at}umb.no
Received for publication August 28, 2007. Accepted for publication January 22, 2008.
| LITERATURE CITED |
|---|
|
|
|---|