Angew Chem Int Ed Engl. 2020 Aug 24;59(35):14788-14795.
doi: 10.1002/anie.202000421. Epub 2020 May 11.

Haruspex: A Neural Network for the Automatic Identification of Oligonucleotides and Protein Secondary Structure in Cryo-Electron Microscopy Maps

Philipp Mostosi et al. Angew Chem Int Ed Engl.

Abstract

In recent years, three-dimensional density maps reconstructed from single particle images obtained by electron cryo-microscopy (cryo-EM) have reached unprecedented resolution. However, map interpretation can be challenging, in particular if the constituting structures require de-novo model building or are very mobile. Herein, we demonstrate the potential of convolutional neural networks for the annotation of cryo-EM maps: our network Haruspex has been trained on a carefully curated set of 293 experimentally derived reconstruction maps to automatically annotate RNA/DNA as well as protein secondary structure elements. It can be straightforwardly applied to newly reconstructed maps in order to support domain placement or as a starting point for main-chain placement. Due to its high recall and precision rates of 95.1 % and 80.3 %, respectively, on an independent test set of 122 maps, it can also be used for validation during model building. The trained network will be available as part of the CCP-EM suite.

Keywords: DNA structures; RNA structures; electron microscopy; neural networks; protein structures.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Typical example of Haruspex annotation. A) Reconstruction map for the human ribonuclease P holoenzyme (EMDB entry 9627). Manual assignment of secondary structure features can be difficult, in particular if the composition of a macromolecular complex is unknown. The surface shown corresponds to an r.m.s.d. of 0.04 with no carving. B) Secondary structure, as identified by our network in the map, is projected onto the surface. Orange corresponds to RNA/DNA; red to helices and blue to sheets. This was a fairly typical test case with 70.5 % true positives, 18.8 % false positives, and 10.7 % false negatives. Recall was 86.8 % and precision 79.0 %. Region I) depicts a well‐predicted α‐helical structure, II) a β‐sheet, and III) RNA misinterpreted as an α‐helix. C) The deposited model PDB 6AHU for this map is shown in comparison. The regions depicted in Figure 2 C and 2 D are marked # and *, respectively.
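
The recall and precision quoted for this test case follow from the true-positive, false-positive, and false-negative percentages given in the caption. A minimal Python sketch of that arithmetic, assuming the standard definitions recall = TP/(TP+FN) and precision = TP/(TP+FP); the function name `recall_precision` is illustrative and not part of Haruspex:

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) from counts or percentages.

    Assumes the standard definitions:
      recall    = TP / (TP + FN)
      precision = TP / (TP + FP)
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return recall, precision


# Percentages reported in the caption for EMDB entry 9627
recall, precision = recall_precision(tp=70.5, fp=18.8, fn=10.7)
print(f"recall = {recall:.1%}, precision = {precision:.1%}")
```

With the rounded percentages from the caption this gives roughly 86.8 % recall and 78.9 % precision, which agrees with the reported 86.8 % and 79.0 % to within the rounding of the inputs.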
Figure 2
Network performance. A) Network precision vs. recall rates, with one marker per EMDB entry (training set entries are shown as orange, test set entries as blue markers). Both perform similarly well, with the training set producing a few more outliers. B) Frequency vs. map r.m.s.d. level for EMDB 9627 on a per‐residue basis: True positives (green), false positives (orange), and false negatives (blue). This plot is typical: false negatives often occur in low‐density map regions. C) α‐Helical false positives (PDB 6AHU, residues 131–139 in chain J): The model partly occupies the conformational space of a polyproline type II helix (PII), which is often misinterpreted as α‐helical and may have been modelled incorrectly (given that the model does not completely fit the density). D) False positives in a β‐sheet (6AHU, residues 215–221 in chain B). The deposited model does not maintain the hydrogen bonding that defines a regular β‐sheet; to the network, however, the fold still “looks” like a β‐sheet and a third segment (top) is also assumed to be part of it.
Figure 3
Additional examples from the test set. Top: Annotated map. Bottom: Deposited structure for comparison. Orange corresponds to RNA/DNA; red to helices; blue to sheets and grey regions were not assigned any secondary structure. A) Nucleosome from Xenopus laevis, average map resolution 3.8 Å (map: EMDB 4297, model: PDB 6FQ5): recall 98.5 %, precision 94.0 %. B) Flavobacterium johnsoniae Type 9 protein translocon, average map resolution 3.5 Å (map: EMDB 0133, model: PDB 6H3I): recall 96.3 %, precision 49.3 %. C) Leucine dehydrogenase from Geobacillus stearothermophilus, average map resolution 3.0 Å (map: EMDB 9590, model: PDB 6ACF): recall 89.8 %, precision 85.7 %. D) Escherichia coli Type VI secretion system, average map resolution 4.0 Å (map: EMDB 9747, model: PDB 6IXH): recall 95.9 %, precision 70.9 %. E) Homo sapiens metabotropic glutamate receptor 5, average map resolution 4.0 Å (map: EMDB 0345, model: PDB 6N51): recall 95.9 %, precision 71.7 %. F) Bacterial RNA polymerase‐sigma54 holoenzyme transcription open complex, average map resolution 3.4 Å (map: EMDB 0001, model: PDB 6GH5): recall 94.2 %, precision 67.5 %.
