Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Apr 20;22(1):285.
doi: 10.1186/s12864-021-07577-3.

Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias

Affiliations

Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias

Shuhua Zhan et al. BMC Genomics. .

Abstract

Background: Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated if mapping bias occurred and if chromosomal features associated with mapping bias. Zea mays presents a model species to assess these questions, given it has genotypically distinct and well-studied genetic lines.

Results: In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2-4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters.

Conclusions: Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.

Keywords: Gene coexpression analysis; Genetic diversity; Maize; Mapping bias; RNA-Seq; Sequence divergence; Transcriptome variation; eQTL analysis.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Depiction of a correlation test to determine if preferential alignment affected gene transcript estimates. a A chromosomal region whose genes transcripts are all inferred to be correlated across RILs indicates preferential read alignment. A hypothetical region has five linked genes (A-E) that are in complete linkage disequilibrium within the B73 x Mo17 RIL population. Individual members of the RIL population are arbitrarily labelled 1–97. Preferential read alignment causes the B73 allele to have a positive effect on all genes’ transcript abundance estimates relative to the Mo17 allele, leading to one group of coexpressed genes. b A chromosomal region with two groups of genes coexpressed across RILs indicates unbiased read alignments across the regions’ genes. Relative to Mo17, B73 alleles increase transcript abundances for genes A, C, and D and decrease transcript abundances for genes B and E. Genes A, C, and D are coexpressed, and genes B and E are also coexpressed
Fig. 2
Fig. 2
The relationship between the proportion of cis-eQTL that had positive B73 effects when aligned to the B73 reference genome and the number of SNPs between B73 and Mo17 genes. The line plots are a lowess smooth of the number of SNPs per gene per 2 Mb of chromosome 3 sequence and the proportion of B73 positive cis-eQTL per gene per 2 Mb of chromosome 3 sequence. The x-axis is the physical location of intervals in Mb along chromosome 3. The left y-axis is the average number of SNPs within cis-eQTL genes in the intervals. The right y-axis is proportion of B73 positive cis-eQTL out of the total cis-eQTL in the interval. An arrow indicates the centromere location
Fig. 3
Fig. 3
For genes positively affected by B73 (A) and Mo17 (B) cis-eQTL, the number of genic SNPs and expression level differences of homozygous B73 and Mo17 lines were plotted. Exon sequence divergence was calculated as the number of SNPs matching exons in a gene divided by the total exon lengths and multiplied by 1000. The numbers in parenthesis represent the numbers of genes in each group. The y-axis is the absolute value of the allelic expression level difference of B73 minus Mo17 in FPKM, or two times the expected additive effect of the locus. a Expression differences between B73 and Mo17 were highly correlated with numbers of SNPs per 1 kb exon for the 5559 B73 positive cis-eQTL genes with SNP information. b Expression level differences between Mo17 and B73 were weakly correlated with numbers of SNPs per 1 kb exon for the 2561 Mo17 positive cis-eQTL genes
Fig. 4
Fig. 4
Comparison of alignment criteria on cis-eQTL effect size. Each cis-eQTL for which the B73 allele increased a target gene’s expression was labelled “yes” if its effect estimated using the most relaxed alignment criteria was lower than its effect using the default alignment criteria. The proportion of “yes” genes was calculated. The cis-eQTL effect was estimated as the average of the mean expression of RILs with the B73 genotype minus the mean expression of RILs with the Mo17 genotype. A large proportion of B73 positive cis-eQTLs had greater effects using default alignment criteria (binomial test, P < 2.2e-16)

Similar articles

Cited by

References

    1. Staal J, Kaliff M, Dewaele E, Persson M, Dixelius C. RLM3, a TIR domain encoding gene involved in broad-range immunity of Arabidopsis to necrotrophic fungal pathogens. Plant J. 2008;55(2):188–200. doi: 10.1111/j.1365-313X.2008.03503.x. - DOI - PubMed
    1. Becker MG, Zhang X, Walker PL, Wan JC, Millar JL, Khan D, Granger MJ, Cavers JD, Chan AC, Fernando DWG, Belmonte MF. Transcriptome analysis of the Brassica napus–Leptosphaeria maculans pathosystem identifies receptor, signaling and structural genes underlying plant resistance. Plant J. 2017;90(3):573–586. doi: 10.1111/tpj.13514. - DOI - PubMed
    1. Wang X, Wang H, Liu S, Ferjani A, Li J, Yan J, Yang X, Qin F. Genetic variation in ZmVPP1 contributes to drought tolerance in maize seedlings. Nat Genet. 2016;48(10):1233–1241. doi: 10.1038/ng.3636. - DOI - PubMed
    1. Avila LM, Obeidat W, Earl H, Niu X, Hargreaves W, Lukens L. Shared and genetically distinct Zea mays transcriptome responses to ongoing and past low temperature exposure. BMC Genomics. 2018;19(1):761. doi: 10.1186/s12864-018-5134-7. - DOI - PMC - PubMed
    1. Taylor CM, Kamphuis LG, Zhang W, Garg G, Berger JD, Mousavi-Derazmahalleh M, Bayer PE, Edwards D, Singh KB, Cowling WA, Nelson MN. INDEL variation in the regulatory region of the major flowering time gene LanFTc1 is associated with vernalization response and flowering time in narrow-leafed lupin (Lupinus angustifolius L.) Plant Cell Environ. 2019;42(1):174–187. doi: 10.1111/pce.13320. - DOI - PMC - PubMed

LinkOut - more resources