A comparison of 27 Arabidopsis thaliana genomes and the path toward an unbiased characterization of genetic polymorphism
- PMID: 40830656
- PMCID: PMC12425826
- DOI: 10.1038/s41588-025-02293-0
Abstract
Making sense of whole-genome polymorphism data is challenging, but it is essential for overcoming the biases in SNP data. Here we analyze 27 genomes of Arabidopsis thaliana to illustrate these issues. Genome size variation is mostly due to tandem repeat regions that are difficult to assemble. However, while the rest of the genome varies little in length, it is full of structural variants, mostly due to transposon insertions. Because of this, the pangenome coordinate system grows rapidly with sample size and ultimately becomes 70% larger than the size of any single genome, even for n = 27. Finally, we show how short-read data are biased by read mapping. SNP calling is biased by the choice of reference genome, and both transcriptome and methylome profiling results are affected by mapping reads to a reference genome rather than to the genome of the assayed individual.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: D.W. holds equity in Computomics, which advises plant breeders. D.W. also consults for KWS SE, a plant breeder and seed producer with activities throughout the world. J.F. is an employee of Tropic TI, Lda. The other authors declare no competing interests.
Grants and funding
- EPICLINES/EC | EC Seventh Framework Programme | FP7 Ideas: European Research Council (FP7-IDEAS-ERC - Specific Programme: "Ideas" Implementing the Seventh Framework Programme of the European Community for Research, Technological Development and Demonstration Activities (2007 to 2013))
- 847548/EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council)
- 1001GenomesPlus/Deutsche Forschungsgemeinschaft (German Research Foundation)
- BB/S004661/1/RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
