Sci Rep. 2023 Dec 7;13(1):21705.
doi: 10.1038/s41598-023-47488-9.

Diversity of symptom phenotypes in SARS-CoV-2 community infections observed in multiple large datasets

Martyn Fyles et al. Sci Rep. 2023.

Abstract

Variability in case severity and in the range of symptoms experienced has been apparent from the earliest months of the COVID-19 pandemic. From a clinical perspective, symptom variability might indicate various routes/mechanisms by which infection leads to disease, with different routes requiring potentially different treatment approaches. For public health and control of transmission, symptoms in community cases were the prompt upon which action such as PCR testing and isolation was taken. However, interpreting symptoms presents challenges, for instance, in balancing the sensitivity and specificity of individual symptoms with the need to maximise case finding, whilst managing demand for limited resources such as testing. For both clinical and transmission control reasons, we require an approach that allows for the possibility of distinct symptom phenotypes, rather than assuming variability along a single dimension. Here we address this problem by bringing together four large and diverse datasets deriving from routine testing, a population-representative household survey and participatory smartphone surveillance in the United Kingdom. Through the use of cutting-edge unsupervised classification techniques from statistics and machine learning, we characterise symptom phenotypes among symptomatic SARS-CoV-2 PCR-positive community cases. We first analyse each dataset in isolation and across age bands, before using methods that allow us to compare multiple datasets. While we observe separation due to the total number of symptoms experienced by cases, we also see a separation of symptoms into gastrointestinal, respiratory and other types, and different symptom co-occurrence patterns at the extremes of age. In this way, we are able to demonstrate the deep structure of symptoms of COVID-19 without usual biases due to study design. This is expected to have implications for the identification and management of community SARS-CoV-2 cases and could be further applied to symptom-based management of other diseases and syndromes.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Histograms showing the age density for each dataset. (a) Pillar 2, (b) SGSS, (c) COVID Symptom Study, (d) COVID-19 Infection Survey.
Figure 2
(a) The proportion of the Bernoulli deviance explained using an LPCA model with k components. (b) The proportion of the Bernoulli deviance explained by adding the kth component to the model. In this example, we would select k=2 as the true number of components, as indicated by the vertical dashed red line.
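The two quantities plotted in Figure 2 can be illustrated with a minimal sketch. It assumes that "proportion of deviance explained" means one minus the ratio of the model deviance to the deviance of a null model; the caption does not state which null model or selection rule the authors used. The function names and the deviance values are hypothetical and are not taken from the paper.

```python
import math

def bernoulli_deviance(x, p, eps=1e-12):
    """Return -2 times the Bernoulli log-likelihood of binary data x under probabilities p."""
    total = 0.0
    for row_x, row_p in zip(x, p):
        for xi, pi in zip(row_x, row_p):
            pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0)
            total += xi * math.log(pi) + (1 - xi) * math.log(1.0 - pi)
    return -2.0 * total

def deviance_explained(null_dev, model_devs):
    """Return (cumulative, marginal) proportions of deviance explained for k = 1..K."""
    # Assumption: proportion explained = 1 - D_k / D_null.
    cumulative = [1.0 - d / null_dev for d in model_devs]
    # Marginal gain of the kth component is the increase over the (k-1)-component model.
    marginal = [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]
    return cumulative, marginal

# Hypothetical deviances for k = 1..4, for illustration only.
null_dev = 1000.0
model_devs = [700.0, 500.0, 480.0, 470.0]
cumulative, marginal = deviance_explained(null_dev, model_devs)
```

With these made-up numbers, the marginal gain drops sharply after the second component, which is the kind of pattern panel (b) is designed to reveal.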
Figure 3
Jaccard distance matrices between symptoms adjacent to associated dendrograms obtained through hierarchical clustering under complete linkage. The symptom category is denoted using coloured points at the roots of the dendrogram. The central columns give the name of the symptom with the percentage of symptomatic cases who exhibit the symptom in the dataset. (a) Pillar 2, (b) SGSS, (c) COVID Symptom Study, (d) COVID-19 Infection Survey.
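As a rough illustration of the procedure named in the Figure 3 caption, the sketch below computes Jaccard distances between symptoms and merges them under complete linkage. It is a plain-Python sketch rather than the authors' implementation; the function names and the toy symptom data are hypothetical.

```python
def jaccard_distance(a, b):
    """Return 1 - (cases with both symptoms) / (cases with either symptom)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0

def complete_linkage(names, dist):
    """Agglomerate clusters; return merge steps as (members_a, members_b, height)."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical toy data: one indicator vector per symptom, one entry per case.
symptoms = {
    "cough":       [1, 1, 1, 0, 0, 1],
    "sore_throat": [1, 1, 0, 0, 0, 1],
    "nausea":      [0, 0, 0, 1, 1, 0],
    "diarrhoea":   [0, 0, 1, 1, 1, 0],
}
names = list(symptoms)
dist = {a: {b: jaccard_distance(symptoms[a], symptoms[b]) for b in names} for a in names}
merges = complete_linkage(names, dist)
```

In this toy example the two respiratory symptoms merge first and the two gastrointestinal symptoms merge second, and the merge heights are what a dendrogram would draw as branch heights.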
Figure 4
Logistic Principal Components Analysis (LPCA) results. For each dataset, elements of the principal components are visualised as vertical bar plots. Each vector is insensitive to overall multiplication by -1. Symptom categories are labelled by colours. (a) Pillar 2, (b) SGSS, (c) COVID Symptom Study, (d) COVID-19 Infection Survey.
Figure 5
AlignedUMAP embeddings of SARS-CoV-2 symptoms. For each dataset, an optimal embedding of the symptoms into 2D Euclidean space is found, subject to the following loose constraint: if a symptom is common to all datasets, then it should be placed in roughly the same position across all datasets. This alignment allows for easier comparison and investigation of shared symptom structures across all datasets. Point size is proportional to the proportion of cases that develop a given symptom. Symptoms that are common to all datasets, and are aligned between distinct datasets, are plotted as triangles. For this embedding, the parameters were chosen to capture more of the global structure of symptoms, which produces less well-defined clusters. (a) Pillar 2, (b) SGSS, (c) COVID Symptom Study, (d) COVID-19 Infection Survey.
Figure 6
AlignedUMAP embeddings of SARS-CoV-2 symptoms across several datasets. Each dataset has been age-stratified into 10-year strata. For each stratum, an optimal two-dimensional embedding of the symptoms into Euclidean space is found, subject to the loose constraint that each symptom is placed in a similar location in adjacent embeddings. Linear interpolation is used to connect the embeddings of successive strata, allowing for a three-dimensional visualisation of how the co-occurrence patterns of symptoms change with age. For each 3D embedding, we take three images at 45-degree rotations. (a) Pillar 2, (b) SGSS, (c) COVID Symptom Study, (d) COVID-19 Infection Survey.
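The interpolation step described in the Figure 6 caption might look like the following sketch. It assumes the third axis is the (fractional) index of the age stratum and that each symptom's track is interpolated independently; neither detail is stated in the caption. The function name and the coordinates are hypothetical.

```python
def interpolate_track(points, steps_between=4):
    """Linearly interpolate one symptom's 2D positions across ordered age strata.

    points: list of (x, y) positions, one per stratum, ordered by age.
    Returns (x, y, z) triples, where z is the fractional stratum index.
    """
    track = []
    for k in range(len(points) - 1):
        (x0, y0), (x1, y1) = points[k], points[k + 1]
        for s in range(steps_between):
            t = s / steps_between  # fraction of the way to the next stratum
            track.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0), k + t))
    # Close the track with the exact position in the final stratum.
    x_last, y_last = points[-1]
    track.append((x_last, y_last, float(len(points) - 1)))
    return track

# Hypothetical 2D positions of one symptom in three consecutive 10-year strata.
cough_positions = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0)]
track = interpolate_track(cough_positions, steps_between=2)
```

Plotting one such track per symptom in 3D would give a curve whose shape shows how that symptom's position drifts as age increases.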
