Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Jan 29;36(21):5129-5132.
doi: 10.1093/bioinformatics/btaa686.

Persistent minimal sequences of SARS-CoV-2

Affiliations

Persistent minimal sequences of SARS-CoV-2

Diogo Pratas et al. Bioinformatics. .

Abstract

Motivation: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused more than 14 million cases and more than half million deaths. Given the absence of implemented therapies, new analysis, diagnosis and therapeutics are of great importance.

Results: Analysis of SARS-CoV-2 genomes from the current outbreak reveals the presence of short persistent DNA/RNA sequences that are absent from the human genome and transcriptome (PmRAWs). For the PmRAWs with length 12, only four exist at the same location in all SARS-CoV-2. At the gene level, we found one PmRAW of size 13 at the Spike glycoprotein coding sequence. This protein is fundamental for binding in human ACE2 and further use as an entry receptor to invade target cells. Applying protein structural prediction, we localized this PmRAW at the surface of the Spike protein, providing a potential targeted vector for diagnostics and therapeutics. In addition, we show a new pattern of relative absent words (RAWs), characterized by the progressive increase of GC content (Guanine and Cytosine) according to the decrease of RAWs length, contrarily to the virus and host genome distributions. New analysis shows the same property during the Ebola virus outbreak. At a computational level, we improved the alignment-free method to identify pathogen-specific signatures in balance with GC measures and removed previous size limitations.

Availability and implementation: https://github.com/cobilab/eagle.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Publication types

Substances