DNA Res. 2016 Jun;23(3):215-24.
doi: 10.1093/dnares/dsw012. Epub 2016 Apr 2.

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes

Yasuo Yasui et al. DNA Res. 2016 Jun.

Abstract

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits.

Keywords: GBS marker; buckwheat; database usage; draft sequence; heteromorphic self-incompatibility.
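
The abstract reports an N50 for both the scaffolds and the CDSs. As a minimal sketch of how that statistic is conventionally computed (not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions), the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)   # longest sequences first
    half_total = sum(ordered) / 2             # half of the total assembled length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # this sequence crosses the halfway point
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy example with made-up lengths (not data from the paper).
toy_scaffolds = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds))
```

For FES_r1.0, the reported totals imply a mean scaffold length of roughly 3,038 bp (1,177,687,305 bp over 387,594 scaffolds), well below the scaffold N50 of 25,109 bp, which is consistent with an assembly that contains many short scaffolds.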

Figures

Figure 1.
Assignment of proteins to KOG functional categories in the three plant species. Genes from F. esculentum (red), B. vulgaris (blue), and A. thaliana (green) were classified based on NCBI's KOG database by performing BLAST searches with an E-value cut-off of 1E−4. KOGs were classified into functional categories. The percentage of KOGs in each functional category is plotted, and percentages are arranged in ascending order within each group. (A) RNA processing and modification; (B) chromatin structure and dynamics; (C) energy production and conversion; (D) cell cycle control, cell division, and chromosome partitioning; (E) amino acid transport and metabolism; (F) nucleotide transport and metabolism; (G) carbohydrate transport and metabolism; (H) coenzyme transport and metabolism; (I) lipid transport and metabolism; (J) translation, ribosomal structure, and biogenesis; (K) transcription; (L) replication, recombination, and repair; (M) cell wall/membrane/envelope biogenesis; (O) posttranslational modification, protein turnover, and chaperones; (P) inorganic ion transport and metabolism; (Q) secondary metabolites biosynthesis, transport, and catabolism; (R) general function prediction only; (S) function unknown; (T) signal transduction mechanisms; (U) intracellular trafficking, secretion, and vesicular transport; (V) defense mechanisms; (W) extracellular structures; (Y) nuclear structure; and (Z) cytoskeleton. Note that KOGs in Groups 1–3 are shown, and that fewer than 10 KOGs were assigned to category N (cell motility) in the three species and were excluded.
Figure 2.
Alignment of amino acid sequences of Fag e 2 (buckwheat 16 kDa allergen of 2S albumin; GenBank accession number: DQ304682), BW 8 kDa (buckwheat 8 kDa allergen of 2S albumin; GenBank accession number: AB055892) and predicted sequences of homologues obtained from the BGDB. The following amino acid colour code is used: orange, small nonpolar (Gly, Ala, Ser, and Thr); green, hydrophobic (Cys, Val, Ile, Leu, Pro, Phe, Tyr, Met, and Trp); magenta, polar (Asn, Gln, and His); red, negatively charged (Asp and Glu); and blue, positively charged (Lys and Arg). Asterisks indicate the eight characteristic Cys residues present in Fag e 2 and 2S albumin family proteins, as described by Satoh et al. Note that Fes_sc0007211.1.g000003.aua.1 had low similarity with other amino acid sequences and lacked four of the eight characteristic Cys residues.
Figure 3.
NJ tree based on amino acid sequences of GBSS from buckwheat and other plant species. Bootstrap values (500 replicates) of 50 or higher are shown next to the branches. The scale bar corresponds to 0.05 substitutions per site. Two GBSSs from Physcomitrella patens were used as outgroup sequences. Species names are coloured according to their order: Poales, grey; Ranunculales, cyan; Vitales, purple; Cucurbitales, blue grey; Fabales, green; Malpighiales, blue; Rosales, pink; Myrtales, teal; Brassicales, indigo; Malvales, brown; Sapindales, orange; Caryophyllales, red; Lamiales, yellow; and Solanales, lime. Four GBSSs from F. esculentum are indicated by red circles next to the sequence names. Sequences obtained from BGDB are abbreviated (sc0002521: Fes_sc0002521.1.g000007; sc0004292: Fes_sc0004292.1.g000004; and sc0005258: Fes_sc0005258.1.g000004). Two sequences of Fagopyrum species (AHA36967 and HW041459) were obtained from GenBank. Sequences excluding those from Fagopyrum species were obtained using Phytozome 10.3 and the accession numbers are in parentheses.
Figure 4.
The number of non-LS sites and non-SS sites identified in GBS reads. Sites at which >50 reads were mapped in long-styled plants but not in short-styled plants were defined as ‘non-SS’. Sites at which >50 reads were mapped in short-styled plants but not in long-styled plants were defined as ‘non-LS’. The number of non-LS (blue bar) and non-SS sites (red bar) was plotted against the number of short-styled plants sharing the non-LS sites and of long-styled plants sharing the non-SS sites, respectively.
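
The non-SS and non-LS definitions in the Figure 4 caption can be sketched as a simple rule on per-site read counts. This is an illustration only, not the authors' code: the function name `classify_site` is hypothetical, and reading "but not in" as zero mapped reads in the other morph is an assumption, as is the use of a single read count per morph.

```python
def classify_site(long_styled_reads, short_styled_reads, min_reads=50):
    """Label a site from read counts in long- and short-styled plants.

    Assumption: "not mapped" in the other morph means exactly zero reads.
    """
    if long_styled_reads > min_reads and short_styled_reads == 0:
        return "non-SS"   # covered in long-styled plants, absent in short-styled
    if short_styled_reads > min_reads and long_styled_reads == 0:
        return "non-LS"   # covered in short-styled plants, absent in long-styled
    return None           # site fits neither definition


# Toy read counts (made up for illustration, not data from the paper).
print(classify_site(120, 0))
print(classify_site(0, 75))
print(classify_site(60, 60))
```

The threshold is strict (more than 50 reads), matching the ">50" wording of the caption, so a site with exactly 50 reads is left unlabelled.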
