BMC Genomics. 2018 May 8;19(1):332.
doi: 10.1186/s12864-018-4703-0.

Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms

Maura Costello et al. BMC Genomics.

Abstract

Background: Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps.

Results: Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross-contamination by utilizing non-redundant dual indexing for complete filtering of index-swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods, we provide greater insight into the mechanism of index swapping.

Conclusions: Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping-induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.
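The filtering idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the sample names, the index sequences (placeholders, not the 96 validated indexes), the function names, and the swap-rate definition used here (swapped reads divided by swapped plus correctly assigned reads) are all assumptions made for illustration. The point it shows is that when every sample owns an i7 and an i5 index that no other sample uses, a read carrying two known indexes that do not belong together can be recognized as swapped and discarded.

```python
from collections import Counter

# Hypothetical sample sheet: each sample owns one i7 and one i5 index, and
# no index is shared between samples (non-redundant dual indexing).
# The sequences are placeholders, not the validated indexes from the paper.
SAMPLE_SHEET = {
    "sample_A": ("ATTACTCG", "TATAGCCT"),
    "sample_B": ("TCCGGAGA", "ATAGAGGC"),
    "sample_C": ("CGCTCATT", "CCTATCCT"),
}


def classify(i7, i5, sheet=SAMPLE_SHEET):
    """Return the sample name, 'swapped', or 'unknown' for one index pair."""
    pair_to_sample = {pair: name for name, pair in sheet.items()}
    known_i7 = {pair[0] for pair in sheet.values()}
    known_i5 = {pair[1] for pair in sheet.values()}
    if (i7, i5) in pair_to_sample:
        return pair_to_sample[(i7, i5)]   # expected pair: keep the read
    if i7 in known_i7 and i5 in known_i5:
        return "swapped"                  # both indexes known, wrong pairing
    return "unknown"                      # at least one index not in the pool


def swap_rate(index_pairs, sheet=SAMPLE_SHEET):
    """Assumed definition: swapped / (swapped + correctly assigned)."""
    counts = Counter(classify(i7, i5, sheet) for i7, i5 in index_pairs)
    swapped = counts["swapped"]
    assigned = sum(n for label, n in counts.items() if label in sheet)
    total = swapped + assigned
    return swapped / total if total else 0.0


reads = [
    ("ATTACTCG", "TATAGCCT"),  # sample_A
    ("ATTACTCG", "ATAGAGGC"),  # i7 of sample_A with i5 of sample_B: swapped
    ("TCCGGAGA", "ATAGAGGC"),  # sample_B
    ("GGGGGGGG", "TATAGCCT"),  # unrecognized i7: unknown
]
labels = [classify(i7, i5) for i7, i5 in reads]
```

With single or combinatorial indexing the second read would be indistinguishable from a legitimate read, which is why the index pairs need to be unique per sample for this check to work.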

Keywords: Barcodes; Exclusion amplification; Illumina sequencing; Index; Index hopping; Index swapping; Indexes; Massively parallel sequencing; Multiplexing; Next generation sequencing.

Conflict of interest statement

Ethics approval and consent to participate

Only sequencing metric values automatically calculated by the Picard analysis pipeline (% contamination, etc.) and library index read data (% demultiplexed reads, % index swapping, etc.) were examined. For the E. coli mixture study, the human DNA used was obtained from the Coriell Biorepository (NA12878) and was consented for research.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Percent contamination over time for whole genomes sequenced on HiSeqX. Panel at left: Single indexed PCR-free library contamination by month. Contamination significantly increased when we began 8-plex pooling and worsened as we introduced 24-plex pooling. Panel at right: Single indexed PCR-plus library contamination by month. Although overall contamination was lower for PCR-plus, rates also increased significantly when we began pooling.
Fig. 2
Contamination for single versus dual indexed pooled PCR-free libraries on HiSeqX. Month-by-month run chart of percent contamination, as measured by VerifyBamID [3], for 24-plexed PCR-free genomes, demonstrating the drop in mean contamination after implementation of unique dual indexing. The red reference line is the mean; the green reference lines are the upper and lower control limits of the data, generated by JMP statistical software.
Fig. 3
Index swapping leads to incorrect assignment of reads from fusion transcripts in cell line RNA-seq data. Counts of reads spanning fusion transcripts for 5 different gene fusions in 3 different cell lines using STAR-Fusion software. Four RNA-seq libraries were pooled for each cell line, for a total of 12 libraries, and sequenced on a HiSeq 4000 lane. Only the K562 cell line should contain the BCR-ABL1 translocation; however, reads containing BCR-ABL1 (blue and black striped) were also found in data files for the other two cell lines due to index swapping.
Fig. 4
Variability of index swap rates from pool to pool and flow cell to flow cell. Index swapping rates plotted for seven 24-plex pools, each sequenced on at least two HiSeqX flow cells and prepared using identical automated methods on a Hamilton MiniStar. Each data point represents a flow cell lane. The data show variability between different pools, and also variability for the same pools run on different flow cells, indicating that flow cell and/or ExAmp reagents also influence swap rate variability.
Fig. 5
Characterization of index swapping mechanism. a Diagram of a HiSeqX flow cell lane colored by number of index swaps detected at each surface tile, showing relatively uniform distribution of swapping across the entire lane and both surfaces. b Read counts for all 36 index combinations in a 6-plex pool of uniquely dual indexed libraries. The combinations in heavy bordered cells with blue text along the diagonal are the correct index combinations; read counts for all other combinations are due to index swapping. Note that all indexes participate in swapping relatively equally. c Mean insert size (bp) and percent chimerism calculated by Picard for both swapped and non-swapped reads. Swapped reads have shorter inserts and higher rates of chimeric read pairs. d Normalized human coverage across GC content bins, indicating that there are fewer high-GC reads in the swapped population (blue) compared to the non-swapped (red) and all other non-demultiplexed (green) populations.
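
The index-combination table described for Fig. 5b lends itself to a small worked calculation. The Python sketch below assumes a square matrix of read counts in which row i is the i7 index and column j is the i5 index of the j-th library, so the diagonal holds the correct pairings and every off-diagonal cell counts swapped reads. The function names and the counts are invented for illustration and are not data from the paper.

```python
def swap_fraction(counts):
    """Fraction of reads that fall in off-diagonal (swapped) cells."""
    n = len(counts)
    if any(len(row) != n for row in counts):
        raise ValueError("expected a square matrix of read counts")
    total = sum(sum(row) for row in counts)
    on_diagonal = sum(counts[i][i] for i in range(n))
    return (total - on_diagonal) / total if total else 0.0


def per_index_swaps(counts):
    """Swapped-read count per row index, to check that all indexes take part."""
    return [sum(row) - row[i] for i, row in enumerate(counts)]


# Made-up 3-plex example (the figure itself uses a 6-plex pool, 36 cells).
example = [
    [980, 10, 10],
    [5, 990, 5],
    [10, 10, 980],
]
fraction = swap_fraction(example)   # 50 off-diagonal reads out of 3000
by_index = per_index_swaps(example)
```

Roughly equal values in `by_index` would correspond to the caption's observation that all indexes participate in swapping about equally.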

References

    1. Shen MR, Boutell JM, Stephens KM, Ronaghi M, Gunderson K, Venkatesan BM, Bowen MS, Vijayan K. Kinetic exclusion amplification of nucleic acid libraries. USPTO 20160053310:A1. US Patent, filed October 9, 2015, and issued February 25, 2016.
    1. Illumina, Inc. Illumina HiSeqX series specification sheet. 2017.
    1. Illumina, Inc. Illumina NovaSeq specification sheet. 2017.
    1. Sinha R, Stanley G, Gulati GS, Ezran C, Travaglini KJ, Wei E, et al. Index switching causes ‘spreading-of-signal’ among multiplexed samples in illumina HiSeq 4000 DNA sequencing. bioRxiv. 2017; 10.1101/125724.
    1. Illumina, Inc. Effects of index Misassignment on multiplexing and downstream analysis. 2017.