Sci Rep. 2022 Jul 29;12(1):13058.
doi: 10.1038/s41598-022-17318-5.

Using HPV-meta for human papillomavirus RNA quality detection


Agustin Ure et al. Sci Rep. 2022.

Abstract

In the era of cervical cancer elimination, accurate and validated pipelines to detect human papillomavirus (HPV) are essential to elucidate and understand the association of HPV with human cancers. We aimed to provide an open-source pipeline, "HPV-meta", to detect HPV transcripts in RNA sequencing data, including several steps that warn operators of possible viral contamination. The "HPV-meta" pipeline automatically performs several steps, starting with quality trimming and human genome filtering, followed by HPV detection (blastx) and application of cut-offs (10 reads and 690 bp coverage to make an HPV call), and finishing with fasta sequence generation for HPV-positive samples. The fasta sequences can then be aligned to assess sequence diversity among HPV-positive samples. All RNA sequencing files (n = 10,908) present in The Cancer Genome Atlas (TCGA) were analyzed. "HPV-meta" identified 25 different HPV types in 488/10,904 specimens. Validation of the results showed 99.98% agreement (10,902/10,904). Multiple alignment of the fasta files flagged high sequence identity between several HPV 18- and HPV 38-positive samples, whose contamination had previously been reported. The "HPV-meta" pipeline is a robust and validated pipeline that detects HPV in RNA sequencing data. Obtaining the fasta files enables investigation of contamination, which is not uncommon in next-generation sequencing.
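The cut-off step (10 reads and 690 bp coverage per HPV call) can be illustrated with a minimal Python sketch. It assumes that, for each HPV type, the read hits are available as half-open (start, end) intervals on that type's reference, that "coverage" means the number of reference positions covered by at least one read, and that both thresholds are inclusive. The function names and the interval representation are hypothetical; this is not the HPV-meta implementation.

```python
from typing import Iterable, List, Tuple

# Cut-offs quoted in the abstract; treating them as inclusive (>=) is an assumption.
MIN_READS = 10
MIN_COVERED_BP = 690


def covered_bases(hits: Iterable[Tuple[int, int]]) -> int:
    """Number of reference positions covered by at least one half-open hit."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(hits):
        if cur_end is None or start > cur_end:
            # New disjoint block: add the finished block's length first.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching hit: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def is_hpv_positive(hits: List[Tuple[int, int]]) -> bool:
    """Hypothetical HPV call for one type: enough reads AND enough covered reference."""
    return len(hits) >= MIN_READS and covered_bases(hits) >= MIN_COVERED_BP


if __name__ == "__main__":
    # 12 hypothetical 100 bp reads tiled every 60 bp span positions 0-760.
    tiled = [(i * 60, i * 60 + 100) for i in range(12)]
    print(covered_bases(tiled), is_hpv_positive(tiled))  # 760 True
    # 12 reads stacked on the same 100 bp fail the coverage cut-off.
    stacked = [(0, 100)] * 12
    print(covered_bases(stacked), is_hpv_positive(stacked))  # 100 False
```

Requiring both a read count and a breadth of coverage means a pile of reads on one short stretch of the genome is not enough for a positive call.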


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
“HPV-meta” pipeline for detecting HPV transcripts in RNA sequencing data. Flowchart of the pipeline steps included in “HPV-meta”: removal of human reads; sorting and conversion to FASTQ (.fq) files (samtools v. 1.10); quality trimming (Trimmomatic v. 0.39, https://github.com/usadellab/Trimmomatic); extra trimming with Cutadapt v. 3.3, needed for specific library preparation kits (e.g., removing 3 bp of R2 from libraries prepared with the SMARTer Stranded Total RNA-Seq kit from Takara, USA); re-mapping to the human reference genome (double human cleaning using NextGenMap v. 0.5.5) and filtering out of human reads; mapping of non-human reads to an HPV protein database (DIAMOND v. 2.0.7); and coverage calculation. If a sample is HPV positive, a fasta file is generated by mapping the reads to HPV genome references and subjecting them to variant calling using GATK v. 4.2.3.0. (Image created using https://app.diagrams.net/ and Inkscape v. 1.1, https://inkscape.org/.)
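The extra trimming step can be pictured with a small pure-Python sketch that removes the first 3 bases, and their quality scores, from every R2 read in a FASTQ stream. It assumes the 3 bp are removed from the 5′ start of R2 and that records are plain, unwrapped 4-line FASTQ; the function names are hypothetical. In the pipeline itself this step is performed with Cutadapt, whose `-U` option removes a fixed number of bases from the second read of a pair.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield unwrapped 4-line FASTQ records: header, sequence, '+', qualities."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def trim_r2_5prime(src: TextIO, dst: TextIO, n: int = 3) -> None:
    """Remove the first n bases and matching quality scores from every R2 read."""
    for header, seq, plus, qual in read_fastq(src):
        dst.write(f"{header}\n{seq[n:]}\n{plus}\n{qual[n:]}\n")


if __name__ == "__main__":
    # One hypothetical R2 record; the leading GGG and its qualities are dropped.
    r2 = io.StringIO("@read1/2\nGGGACGTACGT\n+\nIIIJJJJJJJJ\n")
    out = io.StringIO()
    trim_r2_5prime(r2, out)
    print(out.getvalue(), end="")
```

Trimming the sequence and quality strings with the same slice keeps each record valid, since FASTQ requires both to have equal length.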
Figure 2
HPV pairwise comparison among HPV-positive specimens detected in TCGA. Pairwise comparison performed for HPV 16-positive RNA sequences in the TCGA database (a), for HPV 38 sequences (b) and for HPV 18 sequences (c). BLCA: Bladder Urothelial Carcinoma; BRCA: Breast Invasive Carcinoma; CESC: Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma; COAD: Colon Adenocarcinoma; HNSC: Head and Neck Squamous Cell Carcinoma; KIRC: Kidney Renal Clear Cell Carcinoma; KIRP: Kidney Renal Papillary Cell Carcinoma; LGG: Brain Lower Grade Glioma; LIHC: Liver Hepatocellular Carcinoma; LUSC: Lung Squamous Cell Carcinoma; MESO: Mesothelioma; OV: Ovarian Serous Cystadenocarcinoma; PAAD: Pancreatic Adenocarcinoma; PRAD: Prostate Adenocarcinoma; READ: Rectum Adenocarcinoma; SARC: Sarcoma; SKCM: Skin Cutaneous Melanoma; STAD: Stomach Adenocarcinoma; UCEC: Uterine Corpus Endometrial Carcinoma. (Image created using Python v. 3.8.10 and Seaborn library v. 0.11.2, https://seaborn.pydata.org/.)
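A pairwise comparison like the one in Figure 2 can be approximated with a short Python sketch that computes percent identity for every pair of sequences taken from one multiple alignment and flags near-identical pairs for a contamination check. Skipping gap columns and the 99.5% default threshold are assumptions chosen for illustration; the exact identity metric and cut-off used by the authors are not stated here, and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical bases, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_identity(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Identity for every unordered pair of sample IDs in the alignment."""
    return {
        (p, q): percent_identity(aln[p], aln[q])
        for p, q in combinations(sorted(aln), 2)
    }


def flag_high_identity(aln: Dict[str, str], threshold: float = 99.5) -> List[Tuple[str, str]]:
    """Pairs at or above a hypothetical identity threshold, to review for contamination."""
    return [pair for pair, ident in pairwise_identity(aln).items() if ident >= threshold]


if __name__ == "__main__":
    # Toy alignment of three hypothetical samples of the same HPV type.
    aln = {"s1": "ACGT-ACGT", "s2": "ACGTTACGA", "s3": "ACGT-ACGT"}
    print(pairwise_identity(aln))
    print(flag_high_identity(aln))  # [('s1', 's3')]
```

Independent infections with the same HPV type usually carry some sequence variation, so pairs that are identical across the alignment are candidates for cross-sample contamination rather than proof of it.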
