Theor Appl Genet. 2014 Jun;127(6):1331-41.
doi: 10.1007/s00122-014-2300-4. Epub 2014 Mar 26.

Association studies using family pools of outcrossing crops based on allele-frequency estimates from DNA sequencing

Bilal H Ashraf et al. Theor Appl Genet. 2014 Jun.

Abstract

We propose a method in which genotyping-by-sequencing (GBS) data can be conveniently analyzed without calling genotypes. F2 families are frequently used in breeding of outcrossing species, for instance to obtain trait measurements on plots. We propose to perform association studies by obtaining a matching "family genotype" from sequencing a pooled sample of the family, and to use allele frequencies computed from sequence read counts directly for mapping. We show that, under additivity assumptions, there is a linear relationship between the family phenotype and the family allele frequency, and that a regression of family phenotype on family allele frequency estimates twice the allele substitution effect at a locus. However, medium-to-low sequencing depth causes underestimation of the true allele substitution effect. An expression for this underestimation is derived for the case where parents are diploid, such that F2 families have up to four dosages of every allele. Simulation studies verified the estimation of the allele effect from F2-family pools and showed that the derived expression correctly describes the underestimation. For a fixed sequencing budget, the optimal design for an association study uses a large sample size at lower sequencing depth, and a higher SNP density (resulting in higher LD with causative mutations) at lower sequencing depth. Association studies using genotyping by sequencing are therefore best performed at low sequencing depth per sample. The developed framework for association studies using allele frequencies from sequencing can be modified for other types of family pools and is also directly applicable to association studies in polyploids.
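The two central claims of the abstract (a regression slope of twice the allele substitution effect, and shrinkage of that slope at low depth) can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names, Hardy-Weinberg sampling of the two diploid parents, a fixed read depth per family, and pure binomial read sampling with no sequencing error. The shrinkage factor depth / (depth + 3) is the classical errors-in-variables reliability ratio worked out under these assumptions only; it is not claimed to be the expression derived in the paper.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def expected_attenuation(depth):
    """Reliability ratio var(p) / var(p_hat) under the sketch's assumptions.

    With dosage ~ Binomial(4, q) and p = dosage / 4, var(p) = q(1-q)/4 and
    E[p(1-p)] = 3q(1-q)/4, so the ratio reduces to depth / (depth + 3).
    """
    return depth / (depth + 3)


def simulate_slopes(n_families, depth, allele_effect=1.0, pop_freq=0.5,
                    env_sd=4.0, seed=1):
    """Return (slope on true family frequency, slope on read-count frequency)."""
    rng = random.Random(seed)
    true_freq, read_freq, phenotype = [], [], []
    for _ in range(n_families):
        # Assumption: two diploid parents in Hardy-Weinberg proportions, so the
        # family carries 0..4 copies of allele A among its 4 parental alleles.
        dosage = sum(rng.random() < pop_freq for _ in range(4))
        p = dosage / 4
        # Additive model: family mean is 2 * allele_effect * p, plus plot noise.
        y = 2 * allele_effect * p + rng.gauss(0.0, env_sd)
        # Pool sequencing: each of `depth` reads shows allele A with probability p.
        reads_a = sum(rng.random() < p for _ in range(depth))
        true_freq.append(p)
        read_freq.append(reads_a / depth)
        phenotype.append(y)
    return ols_slope(true_freq, phenotype), ols_slope(read_freq, phenotype)


for depth in (3, 7, 15, 30):
    slope_true, slope_reads = simulate_slopes(5000, depth)
    print(depth, round(slope_true, 2), round(slope_reads, 2),
          round(2 * expected_attenuation(depth), 2))
```

With a true allele effect of 1, the slope on the true family frequencies should scatter around 2, while the slope on read-count frequencies should scatter around 2 × depth / (depth + 3), approaching 2 as depth grows.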

Figures

Fig. 1
Schematic representation for the creation of family pools used to measure phenotypes such as yield in grasses: three crosses are shown with parents that segregate at a biallelic locus with alleles a and A; the created F2 families will segregate in five distinct segregation ratios with allele frequencies within the families of 0, ¼, ½, ¾, and 1, which corresponds to the combined allele dosage in the two parents of each family
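The five family allele frequencies named in this caption can be enumerated with a short sketch. It assumes, as the caption states, that the family allele frequency equals the combined dosage of allele A in the two diploid parents divided by their four alleles; the function name `family_allele_frequency` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def family_allele_frequency(parent1_dosage, parent2_dosage):
    """Expected frequency of allele A in a family from two diploid parents.

    Each dosage is the number of A alleles a parent carries (0, 1 or 2).
    """
    for dosage in (parent1_dosage, parent2_dosage):
        if dosage not in (0, 1, 2):
            raise ValueError("a diploid parent carries 0, 1 or 2 copies")
    return Fraction(parent1_dosage + parent2_dosage, 4)


# All parental genotype pairs collapse onto five distinct family frequencies.
frequencies = sorted(
    {family_allele_frequency(p1, p2) for p1, p2 in product(range(3), repeat=2)}
)
print([str(f) for f in frequencies])
```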
Fig. 2
Average estimated allele effects in a one-locus model. The estimated allele effect (three uncorrected lines) was computed at three allele frequencies (0.1, 0.3 and 0.5) with an environmental standard deviation of 4. The three corrected lines were obtained by applying the derived theoretical expression (10) for the bias from using GBS. The true simulated allele effect was 1
Fig. 3
Power to detect a single gene associated with a marker using GBS. The x-axis shows the sample size and the y-axis shows the power (estimate of the probability) from 1,000 replicates. We used four sequencing depths (3, 7, 15 and 30). The red, blue and black lines show the number of significant results at allele frequencies 0.5, 0.3 and 0.1, respectively, at an environmental standard deviation of 4. Here, we used the observed frequencies in the families when applying the regression of F2 phenotype on F2 pool genotype, expression (9)
Fig. 4
Power to detect a single gene associated with a marker at (almost) equal sequencing efforts in simulation studies. The x-axis is the sample size times sequencing depth and y-axis is the power (estimate of probability) from 1,000 replicates. Three lines, red, blue and black, depict the power at three levels of allele frequencies (0.5, 0.3 and 0.1). (Subset of results presented in Fig. 3)
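The trade-off between number of families and sequencing depth at a roughly fixed sequencing effort can be approximated analytically. The sketch below is a rough normal approximation, not the simulation used for the figure. It assumes Hardy-Weinberg sampling of diploid parents, a fixed depth per family, pure binomial read sampling (which gives a slope shrinkage of depth / (depth + 3) under these assumptions), and a two-sided test; the function name `approx_power` is hypothetical. The pairs of family count and depth are borrowed from the Fig. 5 caption.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_families, depth, allele_effect=1.0, pop_freq=0.5,
                 env_sd=4.0, alpha_level=0.05):
    """Approximate power of the regression of family phenotype on read frequency."""
    # Variance of the true family allele frequency (dosage ~ Binomial(4, q)).
    var_p = pop_freq * (1 - pop_freq) / 4
    # Share of read-frequency variance that is true signal (assumed model).
    reliability = depth / (depth + 3)
    # Phenotypic variance explained by the locus: slope 2 * allele_effect on p.
    signal = (2 * allele_effect) ** 2 * var_p
    # Squared correlation between read-count frequency and family phenotype.
    rho2 = reliability * signal / (signal + env_sd ** 2)
    # Approximate mean of the test statistic under the alternative.
    delta = sqrt(n_families * rho2 / (1 - rho2))
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha_level / 2)
    return normal.cdf(delta - z_crit) + normal.cdf(-delta - z_crit)


designs = [(4000, 3), (2000, 7), (1000, 15), (500, 30)]
powers = [approx_power(n, depth) for n, depth in designs]
for (n, depth), power in zip(designs, powers):
    print(n, depth, round(power, 3))
```

Under these assumptions the approximate power falls as families are traded for depth, which agrees with the direction of the abstract's conclusion that many samples at low depth are preferable.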
Fig. 5
Simulated power, at the 5 % level of significance, to detect a single gene associated with a marker in the presence of three error sources: binomial sampling error only; binomial sampling plus sequencing error (10 % error rate per read); and binomial sampling, sequencing and genotype-calling errors combined. The x-axis is the number of families by sequencing depth per family. Unequal sequencing depth was simulated assuming a Poisson distribution with mean depths of 3, 7, 15 and 30× for 4,000, 2,000, 1,000 and 500 families, respectively. The black dotted line shows the power with binomial sampling error only; the blue line shows the power with binomial sampling and sequencing errors; the red line shows the power with all three errors. The environmental standard deviation was set to 4
Fig. 6
Power, as a function of LD and sample size, to obtain a significant association when the measured SNP is not causal and has different levels of LD with a causal locus. The LD levels are chosen so that they represent a halving of the SNP density at each step, allowing (almost) equal sequencing efforts. Three allele frequencies (0.5, 0.3 and 0.1) were used at an environmental standard deviation of 4, with 1,000 replicates. Complete results (also with an environmental standard deviation of 2) are supplied in the supplementary material (Table 3)
