Gigascience. 2022 Oct 30:11:giac100.
doi: 10.1093/gigascience/giac100.

Best genome sequencing strategies for annotation of complex immune gene families in wildlife

Emma Peel et al. Gigascience. 2022.

Abstract

Background: The biodiversity crisis and the increasing impact of wildlife disease on animal and human health provide impetus for studying immune genes in wildlife. Despite the recent boom in genomes for wildlife species, immune genes are poorly annotated in nonmodel species owing to their high level of polymorphism and complex genomic organisation. Our research over the past decade and a half on Tasmanian devils and koalas highlights the importance of genomics and accurate immune annotations for investigating disease in wildlife. Given this, we have increasingly been asked what minimum level of genome quality is required to effectively annotate immune genes in order to study immunogenetic diversity. Here we set out to answer this question by manually annotating immune genes in 5 marsupial genomes and 1 monotreme genome to determine the impact of sequencing data type, assembly quality, and automated annotation on accurate immune annotation.

Results: Genome quality is directly linked to our ability to annotate complex immune gene families, with long reads and scaffolding technologies required to reassemble immune gene clusters and elucidate evolution, organisation, and true gene content of the immune repertoire. Draft-quality genomes generated from short reads with HiC or 10× Chromium linked reads were unable to achieve this. Despite mammalian BUSCOv5 scores of up to 94.1% amongst the 6 genomes, automated annotation pipelines incorrectly annotated up to 59% of manually annotated immune genes regardless of assembly quality or method of automated annotation.

Conclusions: Our results demonstrate that long reads and scaffolding technologies, alongside manual annotation, are required to accurately study the immune gene repertoire of wildlife species.

Keywords: MHC; annotation; disease; genome; immune gene; quality; wildlife.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1:
Percentage overlap of genomic coordinates between manual and automated annotations of immune genes in 6 genomes. *Denotes automated annotation by NCBI and ^denotes automated annotation by MAKER. The remaining genomes were annotated using Fgenesh++. Colours indicate proportion of immune genes with 0% to 100% overlap between manual and automated annotations, with 0 indicating manually annotated genes with no overlap of genomic coordinates with the automated annotation.
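The overlap measure in Figure 1 can be illustrated with a minimal sketch. The caption does not state how the percentage was calculated, so this example assumes it is the share of the manually annotated gene's span that is covered by the automated gene model on the same scaffold, with 1-based inclusive coordinates. The function name `percent_overlap` and the example coordinates are hypothetical.

```python
def percent_overlap(manual, automated):
    """Percentage of the manual interval covered by the automated interval.

    Both intervals are (start, end) tuples with 1-based inclusive coordinates
    and are assumed to lie on the same scaffold. Using the manual gene length
    as the denominator is an assumption, not the paper's stated method.
    """
    m_start, m_end = manual
    a_start, a_end = automated
    # Number of positions shared by the two intervals (may be <= 0 if disjoint).
    shared = min(m_end, a_end) - max(m_start, a_start) + 1
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (m_end - m_start + 1)


# Hypothetical coordinates for one manually annotated gene and three automated models.
partial = percent_overlap((100, 199), (150, 300))
missing = percent_overlap((100, 199), (300, 400))
complete = percent_overlap((100, 199), (50, 500))
print(partial, missing, complete)
```

Under this assumption, a value of 0 corresponds to a manually annotated gene that the automated annotation missed entirely, matching the 0 category described in the caption.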
Figure 2:
L50 and L90 immune gene metric for 7 genomes from 6 species, compared to log10 contig N50.
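For readers unfamiliar with the x-axis of Figure 2, the following sketch computes the standard contig N50 and its log10 value. It does not reproduce the immune gene L50 and L90 metrics, which the caption does not define. The function name `contig_n50` and the contig lengths are hypothetical.

```python
import math


def contig_n50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of lengths,
    taken from longest to shortest, first reaches half of the assembly size.
    L50 is the number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical contig lengths in base pairs.
n50, l50 = contig_n50([80, 70, 50, 40, 30, 20])
log10_n50 = math.log10(n50)
print(n50, l50, round(log10_n50, 3))
```

A higher contig N50 means that longer contigs account for half of the assembly, which is the sense in which the figure uses it as a measure of contiguity.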
Figure 3:
Genomic organisation and gene content of the LRC (A) and MHC region (B) in 6 genomes. The number of genes within each cluster is given, as well as scaffold counts of orphan genes (genes on single scaffolds). In A, LRC genes are purple, and extended LRC genes are teal. In B, MHC class I genes are red, class II blue, class III green, extended class I pink, extended class II yellow, and framework genes orange. Large distances between genes are given below the scaffold; otherwise, the distance between genes and/or clusters was within the expected range for each family. Figure created with BioRender.com.
Figure 4:
Impact of different sequencing technologies on the assembly of immune gene clusters such as the MHC. The impact of long-read (A: platypus, koala, and woylie), short-read (B: wombat), and 10× Chromium linked-read (C: antechinus and numbat) sequencing technologies, alone or in combination with HiC scaffolding (i: koala and platypus; ii: wombat), on the assembly of complex and repetitive immune gene clusters such as the MHC. Colour gradient represents gene orientation. (A) Long-read sequencing generates reads that span complex and repetitive sequences, resulting in long contigs and scaffolds that contain multiple immune genes with complete coding sequences. (B) Short-read sequencing generates reads that are unable to span immune genes; hence, reads are assembled into multiple short contigs that end where the algorithm is unable to assemble a repetitive and complex immune gene sequence. (C) In linked-read sequencing, individual DNA molecules are partitioned into gel beads, tagged with identical barcodes, and then sequenced using short-read technology, resulting in read clouds [103]. As no individual read within the cloud spans the entire length of the DNA molecule, the algorithm is unable to assemble repetitive and complex sequences, resulting in multiple short contigs similar to a short-read assembly. Short contigs in B and C result in fragmentation of immune genes, leading to false pseudogenization and “missing” genes. (i) HiC sequencing provides contact information for DNA sequences located in close proximity within the nucleus, as contact frequency decreases with increasing linear distance within the genome assembly [104]. This contact information can be used to cluster, order, and orient contigs into chromosome-size scaffolds [105]. Long contigs scaffolded with HiC result in near-complete reconstruction of immune gene clusters. (ii) Short contigs scaffolded with HiC generate what appear to be long scaffolds, but complex immune gene clusters are incomplete. As multiple HiC contacts can span the length of a short contig, the correct contig orientation is not apparent, leading to inversions and misplaced contigs during scaffolding. This leads to incorrect orientation of genes, which can cause pseudogenization and/or gene fragmentation. Manual immune gene annotation reveals that the true gene complement of the immune cluster is not contained within the scaffolded sequence. Figure created with BioRender.com.
