Identification and analysis of gene families from the duplicated genome of soybean using EST sequences
- PMID: 16899135
- PMCID: PMC1557498
- DOI: 10.1186/1471-2164-7-204
Abstract
Background: Large-scale gene analysis of most organisms is hampered by incomplete genomic sequences. In many organisms, such as soybean, the best source of sequence information is expressed sequence tag (EST) libraries. Soybean has a large (1115 Mbp) genome that has yet to be fully sequenced. However, it has the sixth-largest EST collection, composed of ESTs from a variety of soybean genotypes. Many EST libraries were constructed from RNA extracted from various genetic backgrounds; thus, gene identification from these sources is complicated by the existence of both gene and allele sequence differences. We used the ESTminer suite of programs to identify potential soybean gene transcripts from a single genetic background, allowing us to observe functional classifications across gene families as well as structural differences between genes and gene paralogs within families. The identification of potential gene sequences (pHaps) from soybean allows us to begin to reconstruct the genomic history of the organism and to observe the evolutionary fates of gene copies in this highly duplicated genome.
Results: We identified approximately 45,000 potential gene sequences (pHaps) from EST sequences of Williams/Williams82, an inbred genotype of soybean (Glycine max (L.) Merr.), using a redundancy criterion to identify reproducible sequence differences between related genes within gene families. Analysis of these sequences revealed that single-base substitutions and single-base indels are the most frequently observed forms of sequence variation between genes within families in the dataset. Genomic sequencing of selected loci indicates that intron-like intervening sequences are numerous and are approximately 220 bp in length. Functional annotation of gene sequences indicates that functional classifications are not randomly distributed among gene families containing few or many genes.
Conclusion: The predominance of single-nucleotide insertions/deletions and substitution events between genes within families (individual genes and gene paralogs) is consistent with a model of gene amplification followed by random single-base mutational events, as expected under the classical model of duplicated gene evolution. Molecular functions of small and large gene families appear to be non-randomly distributed, possibly indicating a difference in retention of duplicates or local expansion.