Proteomes. 2019 Apr 29;7(2):19.
doi: 10.3390/proteomes7020019.

A Preliminary Metagenome Analysis Based on a Combination of Protein Domains


Yoji Igarashi et al. Proteomes.

Abstract

Metagenomic data have mainly been analyzed by showing the composition of organisms based on a small, well-characterized part of the genome, such as ribosomal RNA genes or mitochondrial DNA. In contrast, whole metagenomic data obtained by shotgun sequencing have rarely been fully analyzed through homology searches because the genomic data available in databases are insufficient to cover the living organisms on earth. To complement the results of homology-search-based methods for shotgun metagenome data, we focused on the composition of protein domains deduced from genome and metagenome sequences and used it to characterize genomes and metagenomes. First, we compared relationships based on similarity in protein domain composition with relationships based on sequence similarity. We searched for protein domains in 325 bacterial species using the Pfam database. Next, we examined the correlation coefficients of protein domain compositions between every pair of bacteria. Every pairwise genetic distance was also calculated from 16S rRNA or DNA gyrase subunit B sequences. We compared the results of these methods and found a moderate correlation between them. Essentially the same results were obtained when we used random partial 100 bp DNA sequences from the bacterial genomes, which simulated the raw data obtained from short-read next-generation sequencers. We then applied the method to actual environmental data obtained by shotgun sequencing. The method showed that a transition of the microbial phase occurred because of the seasonal change in water temperature. These results demonstrate the usability of the method for characterizing metagenomic data based on protein domain compositions.
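The pairwise comparison of protein domain compositions described in the abstract can be illustrated with a minimal sketch. It assumes each genome is summarized as a mapping from domain identifier to count, and it uses the Pearson correlation coefficient; the abstract does not state which coefficient was used, so that choice is an assumption. The domain names (`dom_A` and so on), genome names, counts, and function names are hypothetical. The "binary" and "log" options correspond to the presence/absence and ln(count + 1) conversions mentioned in the Figure 1 caption.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

def domain_vector(counts, domains, transform="raw"):
    """Turn a {domain: count} mapping into a vector over a fixed domain order."""
    raw = [counts.get(d, 0) for d in domains]  # missing domains count as 0
    if transform == "raw":
        return [float(c) for c in raw]
    if transform == "binary":                  # 0 (absence) / 1 (presence)
        return [1.0 if c > 0 else 0.0 for c in raw]
    if transform == "log":                     # ln(count + 1)
        return [math.log(c + 1) for c in raw]
    raise ValueError(f"unknown transform: {transform}")

def pairwise_correlations(profiles, transform="raw"):
    """Correlation of domain composition for every pair of genomes."""
    domains = sorted({d for counts in profiles.values() for d in counts})
    vectors = {name: domain_vector(counts, domains, transform)
               for name, counts in profiles.items()}
    return {(a, b): pearson(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)}

# Hypothetical domain counts for three genomes (illustrative values only).
profiles = {
    "genome_1": {"dom_A": 4, "dom_B": 1, "dom_D": 2},
    "genome_2": {"dom_A": 8, "dom_B": 2, "dom_D": 4},
    "genome_3": {"dom_C": 5, "dom_D": 1},
}

for pair, r in pairwise_correlations(profiles, transform="log").items():
    print(pair, round(r, 3))
```

In a comparison like the one in the abstract, each such coefficient would be paired with the genetic distance between the same two genomes, and the two sets of values would then be correlated with each other.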

Keywords: correlation coefficient; environmental DNA; metagenomics; phylogenetic analysis; protein domain.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1
Dot plots for correlation coefficients of domain combinations and pairwise distances of DNA sequences: (a) The pairwise distances were calculated based on the 16S rRNA sequence. The correlation coefficient was 0.4285, P < 2.2e−16; (b) Domain counts were converted to 0 (absence)/1 (presence), and pairwise distances were calculated based on the 16S rRNA sequence. The correlation coefficient was 0.5967, P < 2.2e−16; (c) Domain counts were converted to ln(number of domains + 1), and pairwise distances were calculated based on the 16S rRNA sequence. The correlation coefficient was 0.5993, P < 2.2e−16; and (d) The pairwise distances were calculated based on the DNA gyrase subunit B sequence. The correlation coefficient was 0.4723, P < 2.2e−16.
Figure 2
Heatmap analysis of the protein domains using 30 samples of the environmental metagenomic data. The heatmap is divided into two large clusters: the left cluster contains the 5 μm and 0.8 μm samples, while the right cluster contains the 0.2 μm samples. See Supplementary Figure S7 for an analysis of the results using all of the data sets.
Figure 3
Cluster analysis based on the protein domains using environmental metagenomic data. The distance between samples was calculated as a correlation distance, and the samples were clustered using the “ward.D2” method. The dendrogram is divided into four clusters. The black bars and arrows indicate 5 μm filter samples that fall among the 0.8 μm filter samples. See Supplementary Figure S8 for the high-resolution version.
Figure 4
Principal component analysis of the protein domains in the environmental data. The data of the 0.8 μm filter samples were examined under three conditions: sea depth, namely surface (1 m) vs. SCM (10–20 m); location, namely the bay vs. the offshore area; and season, namely December to April vs. May to November. The red and green circles show samples from December to April and from May to November, respectively.
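The clustering step named in the Figure 3 caption can be sketched in the same spirit. The sketch assumes the distance between two samples is 1 - r, where r is the Pearson correlation of their domain-count vectors; the caption does not give the exact definition, so this is an assumption. "ward.D2" is a method name from R's `hclust`; the code below is not that implementation but a small reimplementation of Ward's criterion using the Lance-Williams update on squared dissimilarities, which is how that variant is generally described. The sample vectors and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_distance_matrix(vectors):
    """Symmetric matrix of 1 - r between every pair of samples (assumed definition)."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 1.0 - pearson(vectors[i], vectors[j])
    return dist

def ward_d2(dist):
    """Agglomerative clustering with Ward's criterion on squared dissimilarities.

    Returns a list of (sorted member indices, merge height), one per merge.
    """
    n = len(dist)
    # Squared dissimilarity for each pair of active clusters, keyed as (low id, high id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}
    members = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        (i, j), best = min(d2.items(), key=lambda kv: kv[1])
        ni, nj = size[i], size[j]
        updated = {}
        for k, nk in size.items():
            if k in (i, j):
                continue
            dik = d2[(min(i, k), max(i, k))]
            djk = d2[(min(j, k), max(j, k))]
            # Lance-Williams update for Ward's method.
            updated[k] = ((ni + nk) * dik + (nj + nk) * djk - nk * best) / (ni + nj + nk)
        # Drop every pair that involves the two merged clusters.
        d2 = {pair: v for pair, v in d2.items() if i not in pair and j not in pair}
        for k, v in updated.items():
            d2[(k, next_id)] = v  # existing ids are always smaller than next_id
        members[next_id] = members.pop(i) + members.pop(j)
        del size[i], size[j]
        size[next_id] = ni + nj
        # Height is reported on the original (unsquared) scale.
        merges.append((sorted(members[next_id]), math.sqrt(max(best, 0.0))))
        next_id += 1
    return merges

# Hypothetical domain-count vectors for four samples (illustrative values only).
samples = [
    [10, 1, 0, 5],
    [9, 2, 1, 5],
    [0, 8, 9, 1],
    [1, 9, 8, 0],
]

for cluster, height in ward_d2(correlation_distance_matrix(samples)):
    print(cluster, round(height, 3))
```

Cutting the resulting merge sequence at a chosen height, or stopping at a chosen number of clusters, gives groupings comparable in kind to the four clusters described in the caption.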
