Comparative Study
Plant Physiol. 2006 Apr;140(4):1169-82.
doi: 10.1104/pp.105.073981.

Partial shotgun sequencing of the Boechera stricta genome reveals extensive microsynteny and promoter conservation with Arabidopsis

Aaron J Windsor et al. Plant Physiol. 2006 Apr.

Abstract

Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10^-30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5' to annotated coding regions. On average, the 500 nucleotides upstream of coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5' noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks.

Figures

Figure 1.
A rooted cladogram, with Cleome as an outgroup, depicting evolutionary relationships within the Brassicaceae (after Yang et al., 1999; Koch et al., 2001a; Beilstein et al., 2006). Currently, significant genome sequencing projects exist for A. lyrata, C. rubella (Weigel et al., 2005), and Brassica (http://www.brassica.info/).
Figure 2.
Summary of B. stricta sequence-indexed library end sequencing. Redundant sequences are end sequences that have been excluded from further analysis as they represent duplicated inserts (i.e. library amplification artifacts). Filtered sequences display significant similarity to repetitive DNA species or organellar genomes. Sequences with low similarity may resemble Arabidopsis sequences but failed to meet our significance threshold (e-value ≤ 10^-30). The two remaining categories consist of B. stricta end sequences with highly significant similarity (e-value ≤ 10^-30) to Arabidopsis genomic regions. The designation “paired end sequences” indicates that both T3 and T7 sequencing reads are available for a given insert; “solo end sequences” indicates that only one sequencing read is available for a given insert, the second read having been placed into one of the initial four categories.
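
The categories in this caption can be read as a small decision procedure. The following is a minimal sketch of that reading, not the authors' code: the function names, the boolean inputs, and the order in which the checks are applied are all assumptions made for illustration. Only the e-value threshold comes from the caption.

```python
# Hypothetical sketch of the end-sequence categories described in Figure 2.
# Only the 1e-30 threshold is taken from the caption; names and the order
# of the checks are illustrative assumptions.
E_VALUE_THRESHOLD = 1e-30


def categorize_read(is_redundant, is_repeat_or_organellar, best_e_value):
    """Place a single end sequence into one of the caption's categories."""
    if is_redundant:
        return "redundant"
    if is_repeat_or_organellar:
        return "filtered"
    # None stands for "no BLASTn hit reported" (an assumption of this sketch).
    if best_e_value is None or best_e_value > E_VALUE_THRESHOLD:
        return "low similarity"
    return "significant"


def categorize_insert(t3_category, t7_category):
    """Label an insert by how many of its two reads are significant."""
    significant_reads = [t3_category, t7_category].count("significant")
    return {2: "paired", 1: "solo", 0: "none"}[significant_reads]
```

Under these assumptions, an insert whose T3 read passes the threshold while its T7 read matches repetitive DNA would be labeled "solo".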
Figure 3.
Distribution of nonredundant SAD12.4 sequence-indexed library end-sequence homologies across chromosome III (NC_003074.4) of Arabidopsis. The frequency of end sequences with homology to a given physical interval has been plotted along the 200 kb intervals comprising the Arabidopsis chromosome III pseudomolecule. B. stricta sequences are placed according to the physical position of the Arabidopsis nucleotide that is homologous to the 5′-most B. stricta nucleotide of a given BLAST HSP. While B. stricta end sequences with multiple Arabidopsis homologs are indicated, B. stricta end sequences are only mapped according to the most significant HSP. Color coding indicates the copy number of homologous sequences in the Arabidopsis genome: green, 1×; yellow, 2×; light orange, 3×; orange, 4×; and red, 5×.
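
The binning behind such a plot can be sketched in a few lines. This is an illustrative example only: the function name is hypothetical, and it assumes each end sequence has already been reduced to a single 1-based Arabidopsis coordinate taken from its most significant HSP, as the caption describes.

```python
from collections import Counter

BIN_SIZE = 200_000  # 200 kb intervals, as stated in the caption


def bin_hits(positions, bin_size=BIN_SIZE):
    """Count mapped end sequences per interval.

    positions: 1-based chromosome coordinates, one per end sequence
    (assumed input format). Returns {interval_index: count}, where
    interval 0 covers positions 1..bin_size.
    """
    counts = Counter((pos - 1) // bin_size for pos in positions)
    return dict(sorted(counts.items()))
```

For instance, positions 1 and 200,000 fall into interval 0, while position 200,001 opens interval 1.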
Figure 4.
Flowchart of the algorithm used by syntenyFinder.py to identify B. stricta sequence-indexed inserts whose end sequences display microsynteny relative to the Arabidopsis genome. Output from dupCloneFinder.py for 6,334 inserts with paired end sequences is fed to syntenyFinder.py (top). End-sequence pairs then move down the vertical axis and are tested for the criteria indicated. If the end-sequence pair satisfies a given criterion, the pair continues down the vertical axis toward the designation SYNTENIC. If a given criterion is not satisfied, the end-sequence pair moves along the horizontal axis to the designation indicated. The designation FILTERED indicates that the physical length of a region in Arabidopsis identified by an end-sequence pair was less than twice the average read length of all B. stricta end sequences; LINKED indicates that the Arabidopsis region is greater than 50 kb. Percentage and total number (in parentheses) of B. stricta inserts placed in each designation are indicated. Descriptions of both dupCloneFinder.py and syntenyFinder.py can be found in Supplemental Text 1.
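
The two size criteria named in this caption can be sketched as follows. This is not syntenyFinder.py itself, which is described in Supplemental Text 1 and tests further criteria that the caption does not spell out. The function name, its arguments, and the order of the two tests are assumptions, and the sketch presumes that both end sequences of a pair already map to the same Arabidopsis chromosome.

```python
MAX_SYNTENIC_SPAN = 50_000  # regions longer than 50 kb are LINKED (caption)


def classify_pair(start, end, mean_read_length, max_span=MAX_SYNTENIC_SPAN):
    """Classify one end-sequence pair by the Arabidopsis span it brackets.

    start, end: 1-based Arabidopsis coordinates of the two mapped ends
    (hypothetical inputs). mean_read_length: average read length of all
    B. stricta end sequences.
    """
    span = abs(end - start) + 1
    if span < 2 * mean_read_length:
        return "FILTERED"  # region shorter than twice the mean read length
    if span > max_span:
        return "LINKED"    # same neighborhood, but too far apart
    return "SYNTENIC"
```

With a mean read length of 600 bp, a pair spanning about 12 kb would be called SYNTENIC, which is in line with the insert sizes reported for the λ library.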
Figure 5.
Frequency of B. stricta inserts versus the physical size of the corresponding microsyntenic regions in Arabidopsis. The mean physical size of the B. stricta λ-library inserts is 13,187 bp (n = 27; sd = 1,716 bp). The mean Arabidopsis interval length is 11,949 bp (n = 4,691; sd = 4,637 bp).
Figure 6.
Dot-plot alignments of B. stricta inserts and the homologous Arabidopsis genomic regions identified by syntenyFinder.py. Annotated Arabidopsis CDSs with exon-intron structure are indicated along the horizontal axes. B. stricta CDSs, as predicted by Twinscan, are shown on the vertical axes. Watson-strand CDSs are positioned closer to the relevant axis. CDSs are presented in varying gray scale to highlight homologous relationships between B. stricta and Arabidopsis. A, B. stricta genomic inserts are larger than the homologous region(s) of Arabidopsis. This is generally attributable to one or more large indel polymorphisms. In this instance, two of the predicted CDSs are of unknown function; the predicted translation product of the 3′-most Watson-strand CDS, however, shares similarity to ping/pong/SNOOPY family transposases. B, Microsyntenic block composed of two B. stricta inserts compared to the homologous, class I chitinase (CHI-I)-containing Arabidopsis region. This region is of similar size in the two species and contains many small indel polymorphisms. C, An example of a comparison in which the Arabidopsis homolog is larger than the B. stricta region. The Arabidopsis indel in this region contains four tandem duplications (At5g02330–60) of a putative CDS encoding a DC-1 domain-containing protein.
Figure 7.
Differences in the log e-value for B. stricta BLASTn hits to Arabidopsis gene pairs (n = 1,440 gene pairs). E-values were taken from the selected dataset (Table III; Supplemental Data 4). E-values of 0.0 in the native BLASTn output have been adjusted to 10^-180; e-values for unreported paralogs have been adjusted to 10^-10, the maximum e-value allowable for reporting in our analysis. As the Δlog e-value approaches 0, the ability to distinguish between orthologs and paralogs diminishes. Δlog e-value scores of −32, −15, and −5 correspond to the 95th, 97th, and 98th percentiles, respectively.
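
The adjustments described in this caption can be illustrated with a short sketch. It is not the authors' code. It assumes a base-10 logarithm, represents an unreported paralog as `None`, and computes the difference as the better hit minus the other hit so that scores come out negative, as in the caption. All names are hypothetical.

```python
import math

ZERO_E_VALUE = 1e-180       # substitute for e-values reported as 0.0 (caption)
UNREPORTED_E_VALUE = 1e-10  # substitute for unreported paralogs (caption)


def adjusted_log_e(e_value):
    """Return log10 of an e-value after the caption's two adjustments."""
    if e_value is None:          # paralog not reported by BLASTn (assumed encoding)
        e_value = UNREPORTED_E_VALUE
    elif e_value == 0.0:         # BLASTn underflow
        e_value = ZERO_E_VALUE
    return math.log10(e_value)


def delta_log_e(best_e, other_e):
    """Δlog e-value between the best hit and the other member of a gene pair."""
    return adjusted_log_e(best_e) - adjusted_log_e(other_e)
```

Under these assumptions the most extreme possible score is −170 (a 0.0 e-value against an unreported paralog), and a score near 0 means the two members of the gene pair are hit about equally well.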
Figure 8.
Summary of the per-nucleotide identity shared between homologous 5′ regulatory regions. Values along the x axis correspond to Arabidopsis nucleotide positions 5′ to CDSs; position 1 is the most 5′ nucleotide, position 500 is the nucleotide adjacent to the translation initiation codon of a given CDS. For each position, the proportion of Arabidopsis nucleotides scored as identity is indicated as a jagged black line; the proportion of nucleotides from noninformative alignments (see Supplemental Text 1, UntransID.py, Test1 for details) at each position is designated by the dashed black line (17.7% and 34.8% of all nucleotides analyzed at every position for B. stricta and B. oleracea, respectively). The thick, black line at approximately 0.01 is the mean proportion of nucleotides scored as identities with error bars (se of each mean) for each nucleotide position as determined via a permutation test (100 iterations; see Supplemental Text 1, UntransID.py, Test2 for details). The gray line indicates the proportion identity at each position when noninformative alignments are excluded from the calculation. A, n = 657 5′-noncoding sequence alignments for the B. stricta by Arabidopsis comparison. B, n = 1,208 5′-noncoding sequence alignments for the B. oleracea by Arabidopsis comparison.
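
The two identity curves described in this caption (all alignments versus informative alignments only) can be sketched as below. This is an illustration, not UntransID.py: the input encoding is an assumption, with each informative alignment given as a list of 0/1 identity flags per Arabidopsis position and each noninformative alignment given as `None`.

```python
def identity_profiles(alignments, length=500):
    """Per-position proportion of identities across upstream alignments.

    alignments: list whose items are either a list of 0/1 flags of size
    `length` (1 = identical nucleotide) or None for a noninformative
    alignment (assumed encoding).
    Returns (profile_all, profile_informative).
    """
    if not alignments:
        raise ValueError("at least one alignment is required")
    informative = [a for a in alignments if a is not None]
    profile_all, profile_informative = [], []
    for i in range(length):
        matches = sum(a[i] for a in informative)
        # Noninformative alignments count as non-identity in this profile.
        profile_all.append(matches / len(alignments))
        # Here they are excluded from the denominator instead.
        profile_informative.append(
            matches / len(informative) if informative else 0.0
        )
    return profile_all, profile_informative
```

The gap between the two profiles grows with the share of noninformative alignments, which the caption reports as 17.7% for B. stricta and 34.8% for B. oleracea.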
