PLoS One. 2020 Nov 11;15(11):e0236849.
doi: 10.1371/journal.pone.0236849. eCollection 2020.

Sample pooling methods for efficient pathogen screening: Practical implications

Tara N Furstenau et al. PLoS One. 2020.

Abstract

Due to the large number of negative tests, individually screening large populations for rare pathogens can be wasteful and expensive. Sample pooling methods improve the efficiency of large-scale pathogen screening campaigns by reducing the number of tests and reagents required to accurately categorize positive and negative individuals. Such methods rely on group testing theory which mainly focuses on minimizing the total number of tests; however, many other practical concerns and tradeoffs must be considered when choosing an appropriate method for a given set of circumstances. Here we use computational simulations to determine how several theoretical approaches compare in terms of (a) the number of tests, to minimize costs and save reagents, (b) the number of sequential steps, to reduce the time it takes to complete the assay, (c) the number of samples per pool, to avoid the limits of detection, (d) simplicity, to reduce the risk of human error, and (e) robustness, to poor estimates of the number of positive samples. We found that established methods often perform very well in one area but very poorly in others. Therefore, we introduce and validate a new method which performs fairly well across each of the above criteria making it a good general use approach.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. DNA Sudoku pooling example.
In this example, there are a total of N = 96 samples. The 96-well plates show which samples are combined into each pool (Pi) for the two different window sizes (W1 = 10 and W2 = 11, which are greater than √N and co-prime). By using two different window sizes, the weight of this pooling design is w = 2, meaning that k = w − 1 = 1 positive sample can be unambiguously identified in a single step using T = W1 + W2 = 21 tests. The positive samples are decoded by finding the samples that appear most often in the positive pools. For example, if G10 is the only positive sample, we can detect this from the pooling results by noticing that G10 was added to both of the positive (red) pools while the other samples in those pools were added to only one or the other. Alternatively, if both G10 and D4 are positive, four samples occur with equal frequency (D4, G10, E12, and F2) in the positive pools (red and purple) and it is impossible to determine which are the true positive samples. This ambiguity is introduced because the test was designed to handle only one positive sample.
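The decoding described in the Fig 1 caption can be illustrated with a minimal sketch. It assumes that wells are numbered row by row starting from 0 (A1 = 0, A2 = 1, ..., H12 = 95) and that sample i joins pool i mod W in each window; these are assumptions for illustration, although they do reproduce the ambiguous set named in the caption. The function names are hypothetical.

```python
ROWS = "ABCDEFGH"      # 96-well plate: rows A-H, columns 1-12
N_COLS = 12
WINDOWS = (10, 11)     # co-prime window sizes W1 and W2 from the caption


def well_to_index(well):
    """Map a well label such as 'G10' to a 0-based row-major index (assumed)."""
    return ROWS.index(well[0]) * N_COLS + int(well[1:]) - 1


def index_to_well(i):
    """Inverse of well_to_index."""
    return f"{ROWS[i // N_COLS]}{i % N_COLS + 1}"


def positive_pools(positive_wells):
    """Return the (window size, pool number) pairs that would test positive."""
    return {(w, well_to_index(p) % w) for p in positive_wells for w in WINDOWS}


def decode(pos_pools, n=96):
    """Return wells that sit in a positive pool in every window."""
    return [index_to_well(i) for i in range(n)
            if all((w, i % w) in pos_pools for w in WINDOWS)]


print(sum(WINDOWS))                            # total tests T = W1 + W2
print(decode(positive_pools(["G10"])))         # one positive: unambiguous
print(decode(positive_pools(["G10", "D4"])))   # two positives: four candidates
```

With one positive sample the decoder returns a single well; with two it returns four equally supported candidates, which is the ambiguity the caption describes for a design of weight w = 2.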
Fig 2. Two-Dimensional pooling example.
A total of 96 samples are arrayed in symmetrical 5x5 grids (with 4 empty wells in the last grid) and k = 9 of the samples are positive (red wells). The pooling procedure combines each row and each column of a grid into separate pools for a total of T = 2 × 5 × 4 = 40 tests. Samples that are at the intersection of a positive row and a positive column (marked with an “X”) are potentially positive samples. When more than one row and more than one column are positive, some of the samples at the intersections are likely false positives (e.g. the top left and bottom right grids). Otherwise, the results are unambiguous and the correct positive samples can be identified (e.g. the top right and bottom left grids).
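A short sketch of the row and column pooling in the Fig 2 caption follows. The sample numbering (filling each 5x5 grid row by row) and the function name are assumptions made for illustration; only the grid size and the test count come from the caption.

```python
def two_d_pooling(positives, n=96, side=5):
    """Pool every row and column of each side x side grid.

    Returns (number of tests, candidate samples), where candidates lie at the
    intersection of a positive row and a positive column of the same grid.
    """
    per_grid = side * side
    n_grids = -(-n // per_grid)        # ceiling division: 4 grids for 96 samples
    n_tests = 2 * side * n_grids       # one test per row and per column

    def position(s):
        """(grid, row, column) of sample s under the assumed numbering."""
        return s // per_grid, (s % per_grid) // side, s % side

    pos_rows, pos_cols = set(), set()
    for s in positives:
        g, r, c = position(s)
        pos_rows.add((g, r))
        pos_cols.add((g, c))

    candidates = []
    for s in range(n):
        g, r, c = position(s)
        if (g, r) in pos_rows and (g, c) in pos_cols:
            candidates.append(s)
    return n_tests, candidates


print(two_d_pooling({0, 3}))   # same row: both positives recovered exactly
print(two_d_pooling({0, 6}))   # two rows and two columns: extra candidates
```

In the second call the two true positives share neither a row nor a column, so the two additional intersections cannot be ruled out without further testing.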
Fig 3. S-Stage pooling example.
For 96 samples with an estimate of 3 positive samples, the S-Stage algorithm requires 4 steps. In the first step (top 96 well plate), 96 samples are tested in 6 groups (black outline) of 16. In the next step, the samples in the positive pools from the previous step are arbitrarily redivided into 5 groups of 6 or 7 samples and tested. In the third step, the samples from positive pools from step 2 are redivided into 4 groups of 3 or 4. In the final step, individual testing is performed on samples from the positive pools in step 3. The number of tests required depends on the initial arrangement of positive samples within the pools but in this example 21 tests are required to identify 3 positive samples (red wells). The number of tests is lower than the upper bound in this case due to the fortunate placement of two positive samples in the same pool in steps 1-3.
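The test count in the Fig 3 caption can be reproduced with a small simulation. This sketch takes the number of groups per step (6, 5, 4) directly from the caption rather than deriving it; the rule S-Stage uses to choose those numbers is not stated here. Splitting the remaining samples into near-equal consecutive groups is an assumption standing in for the "arbitrary" redivision, and the function names are hypothetical.

```python
def split_evenly(items, n_groups):
    """Split a list into n_groups consecutive groups whose sizes differ by at most 1."""
    q, r = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = q + (1 if i < r else 0)
        groups.append(items[start:start + size])
        start += size
    return [g for g in groups if g]    # drop empty groups if items run short


def s_stage(positives, n=96, group_counts=(6, 5, 4)):
    """Pool in successive steps, then test the survivors individually."""
    positives = set(positives)
    remaining = list(range(n))
    tests = 0
    for n_groups in group_counts:
        if not remaining:
            break
        pools = split_evenly(remaining, n_groups)
        tests += len(pools)
        # Keep only the samples that were in a positive pool.
        remaining = [s for pool in pools if positives.intersection(pool)
                     for s in pool]
    tests += len(remaining)            # final step: individual tests
    found = [s for s in remaining if s in positives]
    return found, tests


print(s_stage({4, 5, 16}))   # two positives share a pool in steps 1-3
print(s_stage({0, 1, 16}))   # a slightly less fortunate arrangement
```

The first arrangement needs 6 + 5 + 4 + 6 = 21 tests, matching the count quoted in the caption, while the second needs one more because a group of four survives to the final step.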
Fig 4. Binary splitting by halving pooling example.
In this example, there are N = 96 samples and two of the samples are positive (red wells). To begin, all of the samples are pooled and tested (Step 1). If the first test is negative, testing is complete and all samples are considered negative. Otherwise, half of the samples are pooled and tested (Step 2). If the tested half is negative, then all of the samples in the tested half are considered to be negative and at least one positive sample is known to be present in the other non-tested half of the samples. If the tested half is positive, then it contains at least one positive sample and no information is gained about the other untested half. In either case, the method continues by halving and testing whichever group is known to contain a positive sample until a single positive sample is identified (either by individual testing, as seen in Step 7, or by elimination, as seen in Step 16). Once a single positive sample is identified, the remaining unresolved samples (non-grey wells) are pooled and tested to determine if any positive samples remain and the process continues until all positive samples are identified. Only one test is required per round, and in this example, it takes 17 sequential rounds to recover both positive samples.
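The procedure in the Fig 4 caption can be sketched as a short simulation that counts one test per round. Always testing the first half of the current group is an assumption (the caption does not say which half is tested), and the function name is hypothetical.

```python
def binary_splitting_by_halving(n, positives):
    """Simulate binary splitting by halving; return (found samples, tests used)."""
    positives = set(positives)
    tests = 0

    def is_positive(pool):
        nonlocal tests
        tests += 1                     # one pooled test per sequential round
        return any(s in positives for s in pool)

    unresolved = list(range(n))
    found = []
    # Pool everything still unresolved; stop once that pool tests negative.
    while unresolved and is_positive(unresolved):
        group = unresolved             # known to contain at least one positive
        while len(group) > 1:
            mid = len(group) // 2
            half, rest = group[:mid], group[mid:]
            if is_positive(half):
                group = half           # nothing learned about the untested half
            else:
                cleared = set(half)    # tested half is entirely negative
                unresolved = [s for s in unresolved if s not in cleared]
                group = rest           # so the positive is in the other half
        found.append(group[0])         # identified by direct test or elimination
        unresolved.remove(group[0])
    return found, tests


print(binary_splitting_by_halving(96, {0}))
print(binary_splitting_by_halving(96, {10, 70}))
```

Because every round depends on the previous result, the number of tests equals the number of sequential steps, which is why this method scores well on test count but poorly on turnaround time.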
Fig 5. Modified 3-Stage pooling example.
For 96 samples and an estimate of 2 positive samples, the Modified 3-Stage approach begins by creating 6 pools with 16 samples each. The positive pools from the first step are then subdivided into 4 groups of 4 in the second step. In the final step, the samples from the positive pools in step 2 are tested individually. In the modified 3-Stage approach, the pools are recursively subdivided into groups instead of arbitrarily redividing the remaining samples at each step. This is simpler and keeps the samples for each subsequent pool in close proximity. The total number of tests depends on the arrangement of the positive samples, but in this example, the modified 3-stage algorithm requires 22 tests.
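The recursive subdivision in the Fig 5 caption can be sketched as three nested loops. The pool sizes (16, then 4) are taken from the caption and passed in as parameters; how the method chooses them from the estimated number of positives is not covered here. The function name and the consecutive sample numbering are assumptions.

```python
def modified_three_stage(positives, n=96, pool1=16, pool2=4):
    """Simulate Modified 3-Stage pooling; return (found samples, tests used)."""
    positives = set(positives)
    tests = 0
    found = []
    for start in range(0, n, pool1):
        block = range(start, min(start + pool1, n))
        tests += 1                              # step 1: pool of pool1 samples
        if not positives.intersection(block):
            continue
        for sub in range(block.start, block.stop, pool2):
            group = range(sub, min(sub + pool2, block.stop))
            tests += 1                          # step 2: sub-pool of pool2 samples
            if not positives.intersection(group):
                continue
            for s in group:
                tests += 1                      # step 3: individual test
                if s in positives:
                    found.append(s)
    return found, tests


print(modified_three_stage({5, 50}))   # positives in different first-step pools
print(modified_three_stage({5, 6}))    # positives in the same sub-pool
```

Two positives in different first-step pools cost 6 + 8 + 8 = 22 tests, the figure quoted in the caption; when both fall in the same sub-pool the count drops to 14.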
Fig 6. Comparison of five pooling methods.
The radar charts show the average number of tests, number of steps, maximum number of samples per pool, number of pipettings (for 1-, 8- and 16-channel pipettes), and the number of additional tests and steps required when the number of positive samples is overestimated (k = 1, k̂ = 20) or underestimated (k = 20, k̂ = 1). The left column shows results from simulations with one positive sample and the right column shows simulations with 20 positive samples. The rows are different sample sizes from top to bottom: 96, 384, and 1,536. In each plot the values for each feature have been Min-Max normalized. In each category, methods with points at the center performed the best while methods with points near the edge performed the worst.
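For readers unfamiliar with the Min-Max normalization mentioned in the Fig 6 caption, a minimal sketch follows. The input numbers are made-up placeholders, not results from the paper, and mapping a constant feature to 0 is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # assumed convention for a constant feature
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical average test counts for three methods (illustrative only).
avg_tests = [10, 20, 40]
print(min_max_normalize(avg_tests))
```

After rescaling, the method with the fewest tests sits at 0 (the center of the radar chart) and the method with the most sits at 1 (the edge), which lets features with very different units share one chart.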
Fig 7. Comparison of the number of tests required for each pooling method.
The swarm plots show the distribution of the number of tests required for each method (100 simulations each). The left column shows simulations with one positive sample and the right column shows simulations with 20 positive samples. The rows are different sample sizes from top to bottom: 96, 384, and 1,536. For the S-Stage, Modified 3-Stage, and Generalized Binary Splitting approaches, the results shown are for simulations where the expected number of positive samples was the same as the true number of positives. For DNA Sudoku and 2D Pooling, the results shown are for simulations with parameters that resulted in the lowest average number of tests (DNA Sudoku: w = 2 when k = 1, and when k = 20, w = 3 for 96 samples and w = 4 for 384 and 1,536 samples; 2D Pooling: when k = 1, the grid sizes shown are 1x10x10 for 96 samples, 1x20x20 for 384 samples, and 1x40x40 for 1,536 samples, and when k = 20 the grid sizes are 11x3x3 for 96 samples, 24x4x4 for 384 samples, and 96x4x4 for 1,536 samples).
Fig 8. Comparison of the number of steps required for each pooling method.
The swarm plots show the distribution of the number of steps required for each method (100 simulations each). The left column shows simulations with one positive sample and the right column shows simulations with 20 positive samples. The rows are different sample sizes from top to bottom: 96, 384, and 1,536. The results are from the same set of simulations as shown in Fig 7.
Fig 9. Comparison of the maximum number of samples per pool for each pooling method.
The plots show the maximum number of samples in a single pool for each method. The left column shows simulations with one positive sample and the right column shows simulations with 20 positive samples. The rows are different sample sizes from top to bottom: 96, 384, and 1,536. These results are from the same set of simulations as shown in Fig 7.
Fig 10. Comparison of the number of pipettings for each pooling method.
The number of pipettings required for each pooling method is an indicator of method simplicity and reproducibility. The swarm plots show the distribution of the number of pipettings required to create pools for each method using 1-, 8-, and 16-channel pipettes (columns). The DNA Sudoku method does not benefit from the use of multichannel pipettes so the number of pipettings is the same across each row. These results are from the same set of simulations as shown in Fig 7.
Fig 11. Changes in the number of tests and steps given different estimated positive rates for the adaptive methods.
The figure shows the number of tests (y-axis) and the number of steps (marker size) required to recover all positive samples (x-axis) in simulations with N = 384 samples using each of the adaptive methods. For each method, except for Binary Splitting by Halving, the pooling scheme was optimized around the expected number of positive samples (marker color) provided to each simulation. Each point represents a single simulation and the lines are the average number of tests for a given number of expected positives. The black dashed line in the S-Stage, 3-Stage, and Binary Splitting by Halving figures represents the upper bound of the number of tests (assuming that the number of positive samples is estimated correctly, where applicable). For the Generalized Binary Splitting figure, the number of tests approaches the lower bound (black dashed line) when N/k is large.
Fig 12. Changes in the number of tests and steps given different estimated positive rates for the non-adaptive methods.
The figure shows the number of tests (y-axis) and the number of steps (marker size) required to recover all positive samples (x-axis) in simulations with N = 384 samples using DNA Sudoku and 2D Pooling methods. Each point is the average number of tests required for 100 simulations and the width of the bands is the standard deviation. The simulations were run using different weights for DNA Sudoku and different symmetrical 2D grid sizes for 2D Pooling. Small markers indicate unambiguous results that required only a single round of testing and the larger markers indicate ambiguous results that required a second validation step to correctly identify the positive samples. The grey dashed line is the number of tests required for individual testing.

