J Biomol NMR. 2020 Nov;74(10-11):565-577.
doi: 10.1007/s10858-020-00321-1. Epub 2020 Jul 7.

CcpNmr AnalysisScreen, a new software programme with dedicated automated analysis tools for fragment-based drug discovery by NMR

Luca G Mureddu et al. J Biomol NMR. 2020 Nov.

Abstract

Fragment-based drug discovery (FBDD) is one of the main methods used by industry and academia to identify drug-like candidates in the early stages of drug discovery. NMR has a significant impact at every stage of the drug discovery process, from the primary identification of small molecules to the elucidation of binding modes that guide optimisation. The nature of NMR as an analytical tool, however, requires the processing and analysis of relatively large numbers of individual data items, such as spectra, which can be daunting to manage manually. One bottleneck in FBDD by NMR is the lack of adequate, well-integrated resources for NMR data analysis that are freely available to the community. Scientists therefore typically resort to manually inspecting large datasets and rely predominantly on subjective interpretation. In this manuscript, we present CcpNmr AnalysisScreen, a software package that provides computational tools for the automated analysis of FBDD data by NMR. We outline how the quality of collected spectra can be evaluated quickly, and how robust workflows can be optimised for reliable and rapid hit identification. With an intuitive graphical user interface and powerful algorithms, AnalysisScreen enables easy analysis of the large datasets needed in the early stages of drug discovery by NMR.

Keywords: CCPN; CcpNmr software; FBDD; Fragment-based drug discovery; NMR; Screening.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1
Ligand-detected NMR methods. Common techniques for detecting ligand binding (Sugiki et al. 2018) to a large macromolecular target (blue motif). Binding and non-binding compounds (small molecules) are displayed as green hexagons and red squares, respectively. a 1H relaxation-edited experiment. The peaks of both compounds in the control spectrum have narrow resonance lines. In the presence of a target, a binding compound partially acquires the NMR properties of the macromolecule, which broadens its resonance line (green peak). A non-binding compound is unaffected. b In the on-resonance experiment of a saturation transfer difference (STD) experiment, a saturating RF field is applied to the target and saturation is transferred to the binding compound, slightly lowering the intensity of its resonance line. No such effect occurs in the off-resonance control experiment; consequently, only the resonance of the binding compound remains visible in the STD (difference) spectrum. c In the WaterLOGSY experiment, saturation of the bulk water molecules is transferred to the target and passed on to the binding compound. In the presence of the target, the resonance line of the binding compound has the opposite sign compared with the control spectrum. d In T1ρ experiments, a series of spectra is recorded with different relaxation durations. The spectral intensities of the binding compound attenuate at a faster rate than those of the non-binding compound
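To make panel b concrete, the sketch below simulates an off-resonance and an on-resonance spectrum for a hypothetical two-compound mixture and takes their difference. It assumes Lorentzian line shapes and an arbitrary 5% saturation-transfer effect on the binder; the peak positions, widths and the 5% figure are illustrative values, not data from the paper.

```python
import numpy as np

def lorentzian(ppm, centre, width, height=1.0):
    """Lorentzian line shape with half-width at half-maximum `width` (ppm)."""
    return height * width**2 / ((ppm - centre) ** 2 + width**2)

ppm = np.linspace(0.0, 10.0, 5001)

# Hypothetical mixture: a binder at 7.2 ppm and a non-binder at 2.1 ppm.
binder_ppm, nonbinder_ppm = 7.2, 2.1

# Off-resonance (control) spectrum: both compounds at full intensity.
off_res = lorentzian(ppm, binder_ppm, 0.005) + lorentzian(ppm, nonbinder_ppm, 0.005)

# On-resonance spectrum: saturation transferred from the target lowers the
# binder's intensity slightly (assumed 5% effect); the non-binder is unchanged.
on_res = 0.95 * lorentzian(ppm, binder_ppm, 0.005) + lorentzian(ppm, nonbinder_ppm, 0.005)

# STD spectrum = off-resonance minus on-resonance: only the binder survives.
std = off_res - on_res

def intensity_at(spectrum, position):
    """Intensity at the grid point closest to `position` (ppm)."""
    return spectrum[np.argmin(np.abs(ppm - position))]

print(round(intensity_at(std, binder_ppm), 3))     # 0.05
print(round(intensity_at(std, nonbinder_ppm), 3))  # 0.0
```

The non-binder's line cancels exactly in the subtraction, which is why a real STD spectrum shows only ligands that receive saturation from the target.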
Fig. 2
CcpNmr AnalysisScreen sidebar and various pop-ups. a Screenshot of the sidebar after parsing and loading an Excel file containing spectral metadata. Objects are created automatically and listed on various branches. The regex-enabled search widget (blue rectangle) allows quick scanning of project metadata through the tree, an essential feature when handling the several hundred entries of a typical NMR screening dataset. b Small-molecule metadata are stored in the CcpNmr software as Substances, which represent the chemical properties of the reference compounds. They can be visualised and edited in the Substances pop-up; if SMILES strings are provided, molecular structures are also shown in this window. c The Samples properties pop-up enables users to insert and edit information about particular experimental conditions, such as concentration and pH, or other sample identifiers. d The SpectrumGroup editor pop-up allows users to group spectra quickly and easily using drag-and-drop. SpectrumGroups can be displayed as single entities or used as input data for several tools throughout the programme
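The regex search in panel a can be pictured as filtering a flattened list of tree entries by a pattern. The sketch below is a minimal illustration under that assumption; the `(branch, name)` pairs, the naming scheme and the `search_tree` helper are hypothetical and do not reflect the actual CcpNmr data model or widget code.

```python
import re

# Hypothetical flattened view of a project tree as (branch, name) pairs.
# Names and branches are illustrative only.
project_items = [
    ("Spectra", "STD_mix01_on"),
    ("Spectra", "STD_mix01_off"),
    ("Spectra", "WLOGSY_mix01"),
    ("Substances", "cpd_0042"),
    ("Samples", "mix01"),
]

def search_tree(items, pattern):
    """Return the items whose name matches `pattern` (case-insensitive regex)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(branch, name) for branch, name in items if regex.search(name)]

# Select both STD spectra of mixture 1 with one pattern.
print(search_tree(project_items, r"^std_.*_o(n|ff)$"))
# [('Spectra', 'STD_mix01_on'), ('Spectra', 'STD_mix01_off')]
```

With several hundred entries, a single pattern such as `mix01` pulls every spectrum, sample and group belonging to one mixture across all branches at once.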
Fig. 3
Principal component analysis (PCA) of 1760 reference spectra. Most spectra were grouped uniformly around the PCA origin (blue rectangle, panel a); large phasing errors were observed for spectra in the region 3 < PC1 < 7 (purple rectangle, panel b); spectra in the region PC1 > 8 (green rectangle, panel c) appeared highly distorted, probably owing to inadequate solvent suppression. Finally, spectra containing only noise were found in the region indicated by the red square (panel d)
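The quality check in this figure relies on distorted spectra separating from the main cluster in PC space. The sketch below reproduces that idea on synthetic data: it computes PCA scores via a singular value decomposition of the mean-centred spectra (one standard way to do PCA, not necessarily the paper's implementation) and shows three hypothetical spectra with a broad baseline hump landing far out along PC1.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Project spectra (one per row) onto their leading principal components."""
    centred = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

rng = np.random.default_rng(0)
ppm = np.linspace(0, 10, 500)
base = np.exp(-((ppm - 3.0) ** 2) / 0.01)  # one peak shared by all spectra

# 50 "good" spectra: the shared peak plus small random noise.
good = base + 0.01 * rng.standard_normal((50, ppm.size))
# 3 hypothetical distorted spectra: an added broad hump near the water region,
# mimicking poor solvent suppression.
hump = 5.0 * np.exp(-((ppm - 4.7) ** 2) / 0.5)
bad = base + hump + 0.01 * rng.standard_normal((3, ppm.size))

scores = pca_scores(np.vstack([good, bad]))
pc1 = scores[:, 0]
# The sign of PC1 is arbitrary, but the distorted spectra sit far from the
# cluster of good spectra along it.
print(np.abs(pc1[:50]).max(), np.abs(pc1[50:]).min())
```

Because the sign of each principal component is arbitrary, outlier regions such as "PC1 > 8" depend on the specific dataset and decomposition; the useful signal is the distance from the main cluster.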
Fig. 4
CcpNmr AnalysisScreen Pipeline and Hit Analysis module. a Schematic representation of a pipeline. A pipeline can take SpectrumGroups as well as single spectra as input data. Each pipe performs a dedicated action on the spectra and returns a new set of spectra, which is used as input for the next pipe. Finally, a result or report pipe provides information on the actions performed. b Current graphical user interface for assembling and executing a Pipeline. The left side shows the available settings affecting the execution of the pipeline. Pipelines are constructed by selecting pipes from the main pull-down; the grey area underneath displays the selected pipes. On the right side, a pop-up is shown that can be used to customise the main selection pull-down. Pipelines, including the last-used parameters, can also be saved to and restored from a JSON file that can be shared with other AnalysisScreen users. c A pipeline for STD hit identification. Each green header represents a pipe action. A pipe can be as simple as the Peak Detector, which has no user-adjustable parameters, or contain a list of complex widgets, such as the Noise Threshold pipe, which allows direct interaction with displayed spectra. d Current Hit Analysis module graphical user interface containing a report of 1000 simulated samples for three different experiment types. The Hit Analysis module allows interactive inspection and assessment of SpectrumHits, showing spectra, scores and associated metadata. Custom peak tables (bottom) allow quick navigation through the peak hits in the selected spectrum display. A summary of the sample and SpectrumHit properties is shown in the bottom right corner
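The chaining in panel a, where each pipe receives the previous pipe's output, can be sketched as a simple fold over a list of functions. Everything here is hypothetical: the `Pipe` signature, the two example pipes and the JSON layout are illustrations of the pattern, not the AnalysisScreen pipe API or its saved-pipeline format.

```python
import json
from typing import Callable, List

import numpy as np

# Hypothetical pipe signature: a list of 1D spectra in, a new list out.
Pipe = Callable[[List[np.ndarray]], List[np.ndarray]]

def run_pipeline(spectra: List[np.ndarray], pipes: List[Pipe]) -> List[np.ndarray]:
    """Feed the output of each pipe into the next one, in order."""
    for pipe in pipes:
        spectra = pipe(spectra)
    return spectra

def subtract_baseline(spectra):
    # Illustrative pipe: remove a constant offset (the median) from each spectrum.
    return [s - np.median(s) for s in spectra]

def scale_to_max(spectra):
    # Illustrative pipe: normalise each spectrum to unit maximum absolute intensity.
    return [s / np.max(np.abs(s)) for s in spectra]

def pipeline_to_json(pipes: List[Pipe]) -> str:
    """Serialise the pipe order so a pipeline could be shared (hypothetical format)."""
    return json.dumps({"pipes": [pipe.__name__ for pipe in pipes]})

pipes = [subtract_baseline, scale_to_max]
result = run_pipeline([np.array([1.0, 1.0, 3.0, 1.0])], pipes)
print(result[0])                # [0. 0. 1. 0.]
print(pipeline_to_json(pipes))  # {"pipes": ["subtract_baseline", "scale_to_max"]}
```

Returning new spectra rather than modifying the inputs keeps each pipe independent, so pipes can be reordered or reused without hidden side effects.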
Fig. 5
Peak and hit detection assessment using simulated spectra. a Simulated 1H spectra at different signal-to-noise ratios (S/N), with estimated positive noise thresholds (NTh) calculated using Eq. 1 with α set to 1.5 (blue), and relative adjustments NTh+10 = NTh + 10% (green) and NTh−10 = NTh − 10% (red). The left panel shows typical spectral peaks with an S/N greater than 2.5; peak intensities are well above the threshold values and the peaks are correctly identified. At an S/N of around 1.5, most peaks are still identified, although more artefacts can be mistakenly included as real peaks. At very low S/N it is generally difficult to distinguish genuine peak shapes from noise distortions. b Total count of correctly identified observations for 100 simple spectra simulated at over 20,000 different S/N variations. c Total accuracy of the peak picker on simulated spectra at different delta values. Accuracy (A) was defined as A = (TP + TN)/(TP + FN + FP + TN). d Total sensitivity of the peak picker on simulated spectra. Sensitivity (S) was calculated as S = TP/(TP + FN), where TP, TN, FP and FN denote true positive, true negative, false positive and false negative counts, respectively
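The sketch below illustrates threshold-based peak picking and the accuracy and sensitivity definitions from panels c and d. Eq. 1 is not reproduced in this section, so the threshold here is a stand-in assumption (mean plus α times the standard deviation of a signal-free region); `noise_threshold` and `pick_peaks` are hypothetical helpers, and the counts passed to `accuracy` are made-up numbers used only to show the arithmetic.

```python
import numpy as np

def noise_threshold(noise_region, alpha=1.5):
    """Stand-in threshold: mean + alpha * std of a signal-free region.
    An assumption for illustration, not necessarily the paper's Eq. 1."""
    return noise_region.mean() + alpha * noise_region.std()

def pick_peaks(y, threshold):
    """Indices of local maxima whose intensity exceeds `threshold`."""
    inner = y[1:-1]
    is_max = (inner > y[:-2]) & (inner >= y[2:]) & (inner > threshold)
    return np.flatnonzero(is_max) + 1

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fn + fp + tn)

def sensitivity(tp, fn):
    return tp / (tp + fn)

rng = np.random.default_rng(1)
y = 0.1 * rng.standard_normal(1000)  # noise-only baseline
y[300] += 5.0                        # two hypothetical single-point peaks
y[700] += 5.0
threshold = noise_threshold(y[:200], alpha=1.5)
picked = set(pick_peaks(y, threshold).tolist())

true_peaks = {300, 700}
tp = len(true_peaks & picked)
fp = len(picked - true_peaks)  # noise maxima above threshold, i.e. artefacts
fn = len(true_peaks - picked)
print(tp, fn, sensitivity(tp, fn))           # 2 0 1.0
print(accuracy(tp=90, tn=80, fp=20, fn=10))  # 0.85 (made-up counts)
```

With α = 1.5 the strong peaks are always found, but some noise maxima also exceed the threshold, which mirrors the artefacts described for low S/N in panel a.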
Fig. 6
Re-referencing of spectral datasets. a and c show an example of an STD SpectrumHit and its best-matched reference before and after applying a re-referencing pipe, respectively. b and d show the distributions of peak shifts between the experimental STD spectra and their reference spectra before and after the re-referencing pipe was applied. The maximum of the distribution in b (~0.0075 ppm) was used to calculate the total adjustment needed to re-reference the STD spectra to their references. d The new distribution after the adjustment was applied, with its maximum centred at ~0.000 ppm
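The adjustment described here amounts to locating the maximum (mode) of the shift distribution and subtracting it. The sketch below estimates the mode as the centre of the fullest histogram bin; the bin width, the synthetic shift data and both helper names are assumptions for illustration, not the re-referencing pipe's actual code.

```python
import numpy as np

def shift_mode(shifts_ppm, bin_width=0.0005):
    """Estimate the most populated peak shift as the centre of the fullest histogram bin."""
    lo, hi = float(shifts_ppm.min()), float(shifts_ppm.max())
    n_bins = max(1, int(np.ceil((hi - lo) / bin_width)))
    counts, edges = np.histogram(shifts_ppm, bins=n_bins, range=(lo, hi))
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def rereference(ppm_values, offset):
    """Subtract the offset so that matched shifts centre near 0 ppm."""
    return ppm_values - offset

rng = np.random.default_rng(0)
# Hypothetical shifts (experimental minus reference, ppm): a systematic offset
# near 0.0075 ppm, as in panel b, plus some unrelated matches spread uniformly.
shifts = np.concatenate([
    0.0075 + 0.0005 * rng.standard_normal(500),
    rng.uniform(-0.02, 0.02, 50),
])
offset = shift_mode(shifts)
corrected = rereference(shifts, offset)
print(f"offset: {offset:.4f} ppm, corrected median: {np.median(corrected):.4f} ppm")
```

Using the mode rather than the mean keeps the correction insensitive to the spurious matches in the tails of the distribution.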
Fig. 7
Automated versus manual hit detection results. a Total number of SpectrumHits obtained by visual inspection using manually picked peaks (light green bar); SpectrumHits obtained by the hit detection pipeline before and after re-referencing, using the same manually picked peaks (blue and yellow bars); and SpectrumHits obtained after re-referencing and automatic peak detection using default parameters (dark green bar). b Counts of newly detected and lost SpectrumHits between the four methods. Notably, the automatic approach found 15 new potential SpectrumHits that were missed during the manual analysis. c Example of an STD SpectrumHit and the best-matched reference (compound 3) for the mixture. Although all the references in the mixture appeared to have at least one peak matching the SpectrumHit, the Hit Analysis module was able to score the references and identify compound 3 as the top hit. d Total number of overlaps for the original, randomly created mixtures and for the new, optimised mixtures generated by the mixture generation module. Overlaps and other mixture scores were calculated as in NmrMix (Stark et al. 2016). The red circle highlights the SpectrumHit shown in Fig. 7c; it lies close to the maximum (top horizontal bar) and the outliers (coloured dots) because it scored a large number of overlapping peaks. The rectangular boxes represent the interquartile range (IQR); the “X” symbol inside each box represents the mean; the horizontal bar in the middle of the box represents the median (second quartile, Q2), and the lower and upper edges of the box mark the first (Q1) and third (Q3) quartiles. Q1, Q2 and Q3 are also referred to as the 25th, 50th and 75th percentiles. The maximum is calculated as Q3 + 1.5 × IQR and the minimum as Q1 − 1.5 × IQR (Galarnyk 2018)
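The box-plot quantities in panel d follow directly from the quartiles. The sketch below computes them with NumPy's default (linear-interpolation) percentiles and flags points outside the Q1 − 1.5 × IQR and Q3 + 1.5 × IQR limits as outliers; the overlap counts are made-up values, and other percentile methods can give slightly different quartiles.

```python
import numpy as np

def box_stats(values):
    """Quartiles, mean, IQR and the 1.5 x IQR limits used for the box plots."""
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return {
        "Q1": q1, "median": q2, "Q3": q3,
        "mean": float(np.mean(values)),
        "IQR": iqr,
        "maximum": q3 + 1.5 * iqr,  # upper limit as defined in the caption
        "minimum": q1 - 1.5 * iqr,  # lower limit as defined in the caption
    }

overlaps = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 12])  # hypothetical overlap counts
stats = box_stats(overlaps)
outliers = overlaps[(overlaps > stats["maximum"]) | (overlaps < stats["minimum"])]
print(stats["Q1"], stats["median"], stats["Q3"], stats["IQR"])  # 2.25 3.0 4.0 1.75
print(outliers)  # [12]
```

Here Q3 + 1.5 × IQR = 4.0 + 2.625 = 6.625, so only the mixture with 12 overlaps falls outside the limits, analogous to the coloured outlier dots in panel d.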
