Proc Natl Acad Sci U S A. 2016 Jul 19;113(29):7949-56.
doi: 10.1073/pnas.1608775113. Epub 2016 Jun 27.

Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads


Jiaqiang Dong et al. Proc Natl Acad Sci U S A.

Abstract

Haplotype variation involves not only SNPs but also insertions and deletions, in particular gene copy number variations. However, comparing individual genomes has been difficult because traditional sequencing methods produce reads that are too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41 to 48 gene copies of the alpha zein gene family, spread over six loci whose chromosomal regions span 30 to 500 kb, had been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions of the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of the SS lines reveals differences between these two heterotic groups, some of them dramatic.
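The error estimate above comes from comparing assembled sequence with full-length cDNAs. The exact counting rule is not given here, so the sketch below assumes a simple per-column definition: mismatched columns plus base-versus-gap columns, divided by all informative columns of one pairwise alignment. The function name `alignment_error_rate` is hypothetical. Under this definition, a rate below 0.1% means fewer than one differing column per 1,000 aligned bases.

```python
def alignment_error_rate(assembly_aln: str, cdna_aln: str) -> float:
    """Fraction of alignment columns that are mismatches or gaps.

    Both arguments are rows of the same pairwise alignment, using '-' for gaps.
    Assumption: this simple per-column definition, not the paper's exact method.
    """
    if len(assembly_aln) != len(cdna_aln):
        raise ValueError("aligned rows must have equal length")
    columns = 0
    errors = 0
    for a, b in zip(assembly_aln.upper(), cdna_aln.upper()):
        if a == "-" and b == "-":
            continue  # skip all-gap columns, which carry no information
        columns += 1
        if a != b:  # counts both base mismatches and base-vs-gap columns
            errors += 1
    return errors / columns if columns else 0.0


# Toy alignment (not W22 data): one indel column and one mismatch in 10 columns.
rate = alignment_error_rate("ACGT-ACGTA", "ACGTTACCTA")
print(f"{rate:.1%}")
```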

Keywords: gene copy number; haplotype variation; maize genome; shotgun DNA sequencing; transposable elements.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Genomic distribution of alpha zein loci in three maize inbred lines. Zein gene copies at each locus are shown as yellow diamonds on a blue (19-kDa clusters) or red (22-kDa clusters) background. When copy number differs among the three inbreds, the zein genes are numbered accordingly. Vertical bars represent maize chromosomes; from left to right: chromosome 1, chromosome 4, and chromosome 7.
Fig. S1.
Total size of selected PacBio self-corrected reads. Combinations of percent-identity and alignment-length cutoffs are listed on the x axis.
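Fig. S1 reports how much sequence survives different identity and alignment-length cutoffs. A minimal sketch of that bookkeeping, assuming each self-corrected read is summarized by its length, percent identity, and aligned length. The `CorrectedRead` record, its fields, and `total_selected_size` are hypothetical, and the three reads are toy values, not W22 data.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CorrectedRead:
    # Hypothetical per-read summary; field names are illustrative only.
    read_id: str
    length: int              # corrected read length (bp)
    percent_identity: float  # identity of the correction alignment (%)
    aligned_length: int      # length of that alignment (bp)


def total_selected_size(reads, min_identity, min_aligned_length):
    """Sum the lengths of reads that pass both cutoffs."""
    return sum(
        r.length
        for r in reads
        if r.percent_identity >= min_identity
        and r.aligned_length >= min_aligned_length
    )


reads = [
    CorrectedRead("r1", 12_000, 99.5, 11_000),
    CorrectedRead("r2", 8_000, 97.0, 7_500),
    CorrectedRead("r3", 15_000, 99.9, 4_000),
]

# Sweep cutoff combinations, one bar per combination as on the Fig. S1 x axis.
for identity, min_len in product([95.0, 99.0], [5_000, 10_000]):
    total = total_selected_size(reads, identity, min_len)
    print(f"{identity}% / {min_len} bp: {total} bp")
```

Raising either cutoff can only shrink the selected total, which is the trade-off such a plot makes visible.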
Fig. 2.
Haplotype variability at the four larger alpha zein loci. Zein genes are shown as red bars and numbered from left to right. Sequence conservation between these inbreds is represented by vertical gray lines. DNA TEs are depicted as green blocks and REs as blue blocks. Insertions in either DNA TEs or REs are marked as black side blocks. Nesting of REs is illustrated by gray and pink side blocks. (A) Haplotype variability of the z1A1 locus includes gene CNV. (B) The z1B locus of the NSS line W22 differs from those of the SS lines B73 and BSSS53, whereas the latter two are similar. (C) The W22 z1C1 locus is a recombinant of the other two haplotypes. (D) The z1D locus is the oldest one and contains many transposable element insertions.
Fig. 3.
Alignment of BioNano contigs with assembled PacBio scaffolds. BNG contigs serve as the reference (blue bar), to which the scaffolds (green bars) are aligned. The black lines inside the green and blue bars mark the GCTCTTC sequences recognized by the nickase Nt.BspQI. The colored lines on the green bars represent the contigs supporting each assembly. Junctions between colored bars can shift the alignments because of gaps in the scaffolds. Contigs are selected with an empirical confidence-score cutoff. For instance, the cyan and yellow contigs (third row) contain z1B zein gene copies. Because these contigs are rather short, each has a rather low score, so the threshold was set to 4. However, the two contigs are contiguous, because both contain z1B zein gene copies in the right order; the score of the combined scaffold is therefore much higher.
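The label pattern compared in Fig. 3 can be predicted in silico from an assembled sequence. A minimal sketch, assuming labels arise wherever the GCTCTTC motif occurs on either strand (so its reverse complement, GAAGAGC, is scanned too). For simplicity it reports the motif's start coordinate rather than the exact nick position; the function names and the toy sequence are hypothetical.

```python
import re

MOTIF = "GCTCTTC"  # Nt.BspQI recognition sequence, as given in the caption


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def label_sites(seq: str, motif: str = MOTIF) -> list[int]:
    """Sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        # A zero-width lookahead lets overlapping matches be reported.
        hits.update(m.start() for m in re.finditer(f"(?={pattern})", seq))
    return sorted(hits)


def label_intervals(sites: list[int]) -> list[int]:
    """Distances between consecutive labels, the quantity optical maps compare."""
    return [b - a for a, b in zip(sites, sites[1:])]


toy = "AAGCTCTTCAAAAAGAAGAGCTTTTGCTCTTCA"  # toy sequence, not W22
sites = label_sites(toy)
print(sites, label_intervals(sites))
```

Aligning a scaffold to the BNG map then amounts to matching these interval patterns, which is why gaps at scaffold junctions can shift the alignment.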
Fig. S2.
Haplotype variability at the z1A2 and z1C2 loci. Zein genes are shown as red bars and numbered from left to right. Sequence conservation between these inbreds is represented by vertical gray lines. DNA TEs are depicted as green blocks and REs as blue blocks. Insertions of either DNA TEs or REs are marked as black side blocks. Nesting of REs is illustrated by gray side blocks. (A) The W22 z1A2 locus is similar to those of the two SS inbreds. (B) The W22 z1C2 locus is quite different from those of the SS inbreds. Note that the sequence gap in BSSS53 is due to the lack of overlapping BAC clones.
Fig. S3.
Flowchart of guided local assembly based on the BioNano genome map. Stringent and loose RefAligner parameters are shown as red and green arrows, respectively. The starting Celera-assembled contigs are highlighted in orange boxes.
Fig. S4.
Confidence scores for the alignment of PacBio assembled contigs with the BNG map using loose RefAligner parameters. Red points represent the mapping scores of selected contigs.
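The stringent and loose passes in Fig. S3 and the score-based selection in Fig. S4 can be sketched as a two-tier filter. This is an assumption about how the two passes combine, not a description of RefAligner itself: the score dictionaries stand in for hypothetical alignment outputs, `two_pass_selection` and the contig names are hypothetical, and the cutoffs are illustrative (the loose cutoff of 4 echoes the threshold mentioned in the Fig. 3 caption).

```python
def two_pass_selection(stringent_scores, loose_scores, stringent_cutoff, loose_cutoff):
    """Return {contig_id: pass_name} for contigs accepted in either pass.

    stringent_scores / loose_scores map contig_id -> confidence score obtained
    under stringent or loose alignment parameters (hypothetical inputs).
    """
    accepted = {}
    # Pass 1: keep contigs that already align confidently under stringent settings.
    for contig, score in stringent_scores.items():
        if score >= stringent_cutoff:
            accepted[contig] = "stringent"
    # Pass 2: retry only the remaining contigs against the looser cutoff.
    for contig, score in loose_scores.items():
        if contig not in accepted and score >= loose_cutoff:
            accepted[contig] = "loose"
    return accepted


stringent = {"ctg1": 15.2, "ctg2": 3.1, "ctg3": 2.0}
loose = {"ctg1": 18.0, "ctg2": 6.5, "ctg3": 3.2}
print(two_pass_selection(stringent, loose, stringent_cutoff=10, loose_cutoff=4))
```

Checking the stringent pass first means a contig's recorded pass reflects the strictest setting it satisfied.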
Fig. S5.
Neighbor-joining (NJ) trees constructed for each W22 alpha zein locus. Wheat gliadin-7 is used as the outgroup. Genomic sequences were aligned with ClustalW, and the trees were generated with MEGA4. Bootstrap values from 1,000 replicates are shown on the branches. Trees are displayed from top to bottom: (A) the z1A1 cluster; (B) the z1A2 cluster; (C) the z1B cluster; (D) the z1C cluster; and (E) the z1D cluster.
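To illustrate what the NJ step in Fig. S5 computes, here is a from-scratch sketch of the standard Saitou-Nei neighbor-joining algorithm on a precomputed distance matrix. It is not MEGA4's implementation, computes no bootstrap values, and resolves ties in the Q criterion by taking the first pair found; `neighbor_joining` is a hypothetical name, and the five-taxon matrix is a textbook toy example, not zein data.

```python
def neighbor_joining(names, dist):
    """Build an unrooted NJ tree and return it as a Newick string.

    names: taxon labels (at least 3); dist: symmetric matrix as a list of lists.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Pick the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[row][col] for col in keep] + [new_row[idx]] for idx, row in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Resolve the last three nodes as a star around one central node.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
```

In practice the matrix would come from pairwise distances computed on the aligned sequences, and an outgroup such as gliadin-7 would be used afterwards to root the unrooted tree.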
Fig. S6.
Verification of the z1D locus. Red bars indicate the z1D genes and the numbers indicate the intergenic size. Primers are listed in Table S2.
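The intergenic sizes shown in Fig. S6 follow directly from gene coordinates on the assembled contig. A minimal sketch, assuming 1-based inclusive coordinates on a single contig; the gene names and positions are hypothetical placeholders, not the actual z1D annotation, and overlapping genes would produce a negative size.

```python
def intergenic_sizes(genes):
    """genes: iterable of (name, start, end) with 1-based inclusive coordinates.

    Returns (left_gene, right_gene, gap_bp) for each pair of neighbors
    after sorting by start position.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    return [
        (left, right, next_start - prev_end - 1)
        for (left, _, prev_end), (right, next_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical coordinates, listed out of order on purpose.
genes = [("z1D-2", 5_201, 6_000), ("z1D-1", 1, 800), ("z1D-3", 9_001, 9_800)]
for left, right, gap in intergenic_sizes(genes):
    print(f"{left} -> {right}: {gap} bp")
```

Sizes computed this way from the assembly are what PCR with flanking primers can confirm independently.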
