Science. 2020 Dec 18;370(6523):eabc6617.

doi: 10.1126/science.abc6617.

Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility

Wesley C Warren^{1,2,3}, R Alan Harris⁴, Marina Haukness⁵, Ian T Fiddes⁶, Shwetha C Murali^{7,8}, Jason Fernandes⁹, Philip C Dishuck⁷, Jessica M Storer^{10,11}, Muthuswamy Raveendran⁴, LaDeana W Hillier⁷, David Porubsky⁷, Yafei Mao⁷, David Gordon^{7,8}, Mitchell R Vollger⁷, Alexandra P Lewis⁷, Katherine M Munson⁷, Elizabeth DeVogelaere⁵, Joel Armstrong⁵, Mark Diekhans⁵, Jerilyn A Walker¹⁰, Chad Tomlinson¹², Tina A Graves-Lindsay¹², Milinn Kremitzki¹², Sofie R Salama⁹, Peter A Audano⁷, Merly Escalona⁹, Nicholas W Maurer⁹, Francesca Antonacci¹³, Ludovica Mercuri¹³, Flavia A M Maggiolini¹³, Claudia Rita Catacchio¹³, Jason G Underwood¹⁴, David H O'Connor¹⁵, Ashley D Sanders¹⁶, Jan O Korbel¹⁶, Betsy Ferguson¹⁷, H Michael Kubisch¹⁸, Louis Picker¹⁹, Ned H Kalin²⁰, Douglas Rosene²¹, Jon Levine^{22,23}, David H Abbott^{23,24}, Stanton B Gray²⁵, Mar M Sanchez^{26,27}, Zsofia A Kovacs-Balint²⁶, Joseph W Kemnitz^{23,28}, Sara M Thomasy^{29,30}, Jeffrey A Roberts³¹, Erin L Kinnally^{31,32}, John P Capitanio^{31,32}, J H Pate Skene³³, Michael Platt³⁴, Shelley A Cole³⁵, Richard E Green⁹, Mario Ventura¹³, Roger W Wiseman¹⁵, Benedict Paten⁵, Mark A Batzer¹⁰, Jeffrey Rogers³⁶, Evan E Eichler^{37,8}

Affiliations

¹ Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA. warrenwc@missouri.edu jr13@bcm.edu eee@gs.washington.edu.
² Department of Surgery, School of Medicine, University of Missouri, Columbia, MO 65211, USA.
³ Institute of Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA.
⁴ Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
⁵ Computational Genomics Laboratory, University of California-Santa Cruz, Santa Cruz, CA 95064, USA.
⁶ Inscripta Inc., Boulder, CO 80301, USA.
⁷ Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
⁸ Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
⁹ Department of Biomolecular Engineering, University of California-Santa Cruz, Santa Cruz, CA 95064, USA.
¹⁰ Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
¹¹ Institute for Systems Biology, Seattle, WA 98109, USA.
¹² McDonnell Genome Institute, Washington University, St. Louis, MO 63108, USA.
¹³ Department of Biology, University of Bari 'Aldo Moro', 70125 Bari, Italy.
¹⁴ Pacific Biosciences of California, Seattle, WA 94025, USA.
¹⁵ Department of Pathology and Laboratory Medicine, Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI 53711, USA.
¹⁶ European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
¹⁷ Division of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA.
¹⁸ Tulane National Primate Research Center, Covington, LA 70433, USA.
¹⁹ Oregon National Primate Research Center and Vaccine and Gene Therapy Institute, Oregon Health Sciences University, Beaverton, OR 97006, USA.
²⁰ Department of Psychiatry, University of Wisconsin School of Medicine and Public Health, Madison, WI 53719, USA.
²¹ Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA.
²² Department of Neuroscience, University of Wisconsin, Madison, WI 53175, USA.
²³ Wisconsin National Primate Research Center, University of Wisconsin, Madison, WI 53171, USA.
²⁴ Department of Obstetrics and Gynecology, Wisconsin National Primate Research Center, University of Wisconsin, Madison, WI 53715, USA.
²⁵ The University of Texas MD Anderson Cancer Center, Michale E. Keeling Center for Comparative Medicine and Research, Bastrop, TX 78602, USA.
²⁶ Yerkes National Primate Research Center, Atlanta, GA 30329, USA.
²⁷ Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA 30329, USA.
²⁸ Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI 53706, USA.
²⁹ Department of Surgical and Radiological Sciences, School of Veterinary Medicine, University of California-Davis, Davis, CA 95616, USA.
³⁰ Department of Ophthalmology and Vision Science, School of Medicine, University of California-Davis, Davis, CA 95817, USA.
³¹ California National Primate Research Center, Davis, CA 95616, USA.
³² Department of Psychology, University of California, Davis, CA 95616, USA.
³³ Department of Neurobiology, Duke University School of Medicine, Durham, NC 27710, USA.
³⁴ Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
³⁵ Population Health Program, Texas Biomedical Research Institute and Southwest National Primate Research Center, San Antonio, TX 78227, USA.
³⁶ Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA. warrenwc@missouri.edu jr13@bcm.edu eee@gs.washington.edu.
³⁷ Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA. warrenwc@missouri.edu jr13@bcm.edu eee@gs.washington.edu.

PMID: 33335035
PMCID: PMC7818670
DOI: 10.1126/science.abc6617

Wesley C Warren et al. Science. 2020.

Abstract

The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.
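
For readers unfamiliar with the contiguity metric quoted above, contig N50 is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not code or values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running total of
    contig lengths, taken from longest to shortest, first reaches
    half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units, not Mmul_10 data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted lengths, a few very long contigs can raise N50 sharply, which is why it is usually reported alongside the number of gaps, as in Figure 1.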

Conflict of interest statement

Competing interests: The authors declare no competing financial interests.

Figures

**Figure 1. Rhesus macaque genome assembly quality and contiguity.**
(A) The number of gaps and contig N50 lengths are compared among mammalian genomes (color-coded based on sequencing technology). The contiguity of macaque (Mmul_10) is comparable to human (GRCh38) and mouse (GRCm38.p6) reference genomes. (B) The number of gaps (red ticks) is compared against a synteny plot of Chinese (rheMacS) and Indian (Mmul_10) macaque chromosome 3 assemblies. (C) Comparison of potential orientation misassemblies based on Strand-seq analysis (18). Mmul_10 shows far fewer inversions (yellow; n = 13) than an earlier macaque assembly, Mmul_8.0.1 (blue; n = 82), predicting 34 times fewer misoriented bases; 99.7% of gaps (white rectangles) in the earlier assembly are now closed.

**Figure 2. Novel genes and gene models.**
(A) A novel gene model with homology to the cytochrome P450 protein family is predicted by the AugustusPB mode of the CAT. The gene structure and protein domain architecture of three isoforms are shown (top). The predictions arose from supporting Iso-Seq reads from five tissues (middle). Orthologous novel genes are also predicted in marmoset, orangutan, and gorilla assemblies; a protein alignment (bottom) of those genes along with a human CYP2C18 protein is shown. (B) Two macaque isoforms in *ELN* (tropoelastin) are predicted by the AugustusPB mode of CAT and are supported by macaque Iso-Seq data but differ significantly from human by two exons. The gene structure and functional domains for the last seven exons of this gene are shown (top), along with a comparison to a human transcript model. These two protein-encoding exons are also observed in marmoset, owl monkey, and mouse, but not in apes, as a result of an ape-specific deletion (bottom) that changed the gene structure of tropoelastin.

**Figure 3. Macaque *ZNF669* gene family expansion.**
(A) A 68 kbp region of collapsed assembly corresponding to the *ZNF669* gene family, as indicated by the excess read depth and increased number of paralogous sequence variants (PSVs, red dots) that are diverged when compared to the consensus sequence (black dots). The highly identical copies were, thus, unresolved in Mmul_10 and predicted to be present in about 50 copies in macaque (left). Segmental Duplication Assembler (SDA) partitions the long reads into 19 distinct paralog clusters (colored and numbered) based on shared PSVs and assembled these clusters into 18 contigs. Vertices reflect individual PSVs and edges represent long-read sequences that contain both of the connected PSVs (right). SDA partitioned and assembled the remaining *ZNF669* collapses into 35 additional contigs. The outlined PSV cluster corresponds to contig 2 in panel B. (B) Mapping of FLNC transcripts shows they align better to SDA-resolved contigs than the original assembly. (C) Annotation of these genes shows that these three contigs encode a highly expanded *ZNF669* gene family where there is FLNC data supporting complete open reading frames that differ by only a few amino acids. (D) FISH with BAC CH250–540H16 as a probe corresponding to a *ZNF669* locus demonstrates interchromosomal duplications (red) on an interphase nucleus (right) and metaphase chromosomes (left), labeled by chromosome (human syntenic chromosome in parentheses).

**Figure 4. Full-length LINE1 analyses.**
A LINE1 subfamily network analysis comparing (A) an earlier macaque assembly, Mmul_8.0.1 (61 subfamilies), to (B) the new assembly, Mmul_10 (58 subfamilies) (18). Related subfamilies are connected by lines and clustered by color: L1RS37 (purple), L1PA7/8 (blue), L1PA6 (green), L1RS36 (pink), L1RS2 (red), L1RS10/16/21 (orange), and L1RS25 (teal). The size of each node corresponds to the relative number of LINE1 elements. There is an increase in annotated younger elements (orange), although the number of subfamilies has decreased in the L1RS36 cluster as a result of reassignment based on a higher-quality assembly. (C) The plot depicts the number of full-length L1 elements (y-axis) that have been assigned to a new chromosome in Mmul_10 (key) when compared to Mmul_8.0.1 (x-axis). (D) A similar analysis depicting the number of full-length LINE1 elements previously unplaced (n = 92) but now assigned to a chromosomal location in Mmul_10.

**Figure 5. Evolution of L1RS elements.**
(A) (Left) All full-length L1RS elements (>6000 nt, top schematic) were grouped by families and mapped to a consensus version of L1PA5 (the ancestral LINE-1 element from which they derive) with the first 700 nt (red) of the 5’ UTR analyzed further. Site 1 (brown) experiences a coverage drop that is found in the majority of L1RS16 and younger families. Coverage drops at Site 2 (blue) and Site 3 (yellow) occur in the L1RS21 family at nearly the same time. (Right) Percentage of individual instances that do not map to the L1PA5 consensus for each L1RS family. Coverage drops are not found in old L1RS elements but found in nearly all young elements, suggesting a fitness advantage for the changes at each site. (B) (Left) All full-length elements (>6000 nt) of the youngest L1RS families in four OWM genomes (L1RS10 in Rrox_v1/rhiRox1 [golden snub-nosed monkey] and L1RS2 in Panu_3.0/papAnu4 [baboon], Macaca_fascicularis_5.0/macFas5 [crab-eating macaque], and Mmul_10/rheMac10) were aligned to the L1PA5 consensus to generate coverage plots. The youngest human L1 (L1HS) was also aligned to L1PA5 as an outgroup. Drops in coverage (Site 1, Site 2 and Site 3) were seen in OWM, although golden snub-nosed monkeys (Rrox_v1/rhiRox1) display distinct patterns from other OWM suggesting convergent but distinct changes in the 5’ UTR, possibly to escape repressive elements. (Right) An evolutionary model for shared and convergent changes in L1RS elements. Site 1 changes are shared amongst all OWM while Site 2 and 3 changes experience similar but not exact changes in Rrox_v1/rhiRox1 compared to other OWM. Coverage drops at Sites 1 and 3 are also observed in human while Site 2 changes are OWM specific. (C) Schematic of Site 1, 2, and 3 (brown, blue, yellow) changes on the L1 5’ UTR in representative lineages: human, golden snub-nosed monkey, and rhesus. Rhesus macaque and golden snub-nosed monkey have identical coverage drops at Sites 1 and 2 that arose in the OWM common ancestor; golden snub-nosed monkeys also experience larger changes (larger bars) spanning these sites that most likely occurred after the *Colobinae* divergence as they are not observed in rhesus. Humans experience a unique coverage drop at Site 1 larger than rhesus but smaller than the large golden snub-nosed monkey-specific changes. All three species experience unique changes resulting in differing length elements at Site 3.

**Figure 6. Rhesus macaque population structure and developing macaque models of disease.**
(A) A 3D principal component analysis (PCA) based on SNVs filtered for missing call rates > 0.05 or minor allele frequency (MAF) < 0.1 from sequencing 853 macaque genomes shows clear separation of Chinese (PC1) genomes (red) and a gradient for Cayo macaques (green) with respect to other Indian macaques (PC2). (B) A PCA excluding Chinese and Cayo populations comparing 771 macaques from different NPRCs. The Cattell–Nelson–Gorsuch (CNG) scree test retained the top three principal components in both PCAs and the percent variance explained calculations are based on those three components. (C) Allele frequency distribution of likely gene-disruptive (LGD) variants, including splice acceptor, splice donor, stop gained, stop loss and start loss variants (red), and missense variants (blue) compared to synonymous changes (green). (D) Genes implicated in human neurodevelopmental disorders (NDDs) showing naturally occurring putatively damaging variants in macaque orthologs. A schematic of damaging missense (blue) variants (CADD ≥ 25) for NDD genes: *MBD5*, *ARID1B*, and *SHANK3*. For each variant, we indicate the amino acid change | CADD score | allele count. All potentially deleterious mutations are low frequency.
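
The variant filtering described for panel A can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the thresholds are applied per variant, that variants with a missing call rate above 0.05 or a minor allele frequency below 0.1 are removed, and that genotypes are coded as alternate-allele counts (0, 1, 2) with NaN for missing calls. The function names and the toy matrix are hypothetical, and the PCA uses a plain SVD with mean imputation.

```python
import numpy as np


def filter_variants(genotypes, max_missing=0.05, min_maf=0.1):
    """Return a boolean mask of variants (columns) to keep.

    genotypes: samples x variants array of alt-allele counts (0/1/2),
    with np.nan marking a missing call (assumed encoding).
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    called = (~np.isnan(genotypes)).sum(axis=0)
    alt_sum = np.nansum(genotypes, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_sum / (2 * called)  # diploid: two alleles per call
    maf = np.minimum(alt_freq, 1 - alt_freq)
    return (missing_rate <= max_missing) & (maf >= min_maf)


def top_pcs(genotypes, n_components=3):
    """Project samples onto the leading principal components via SVD."""
    col_mean = np.nanmean(genotypes, axis=0)
    filled = np.where(np.isnan(genotypes), col_mean, genotypes)
    centered = filled - filled.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical toy data: 4 samples x 4 variants.
toy = np.array([
    [0, 0, 2, 0],
    [1, 0, 2, np.nan],
    [2, 0, 2, 1],
    [1, 1, 2, 2],
], dtype=float)

keep = filter_variants(toy)
pcs = top_pcs(toy[:, keep], n_components=2)
print(keep, pcs.shape)
```

In the toy matrix the third variant is monomorphic (MAF of 0) and the fourth has a 25% missing call rate, so only the first two variants pass the filter before the projection is computed.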

References

1. Bailey JA, Eichler EE, Primate segmental duplications: crucibles of evolution, diversity and disease. Nat Rev Genet 7, 552–564 (2006).
2. Rhesus Macaque Genome S et al., Evolutionary and biomedical insights from the rhesus macaque genome. Science 316, 222–234 (2007).
3. Xue C et al., The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences. Genome Res 26, 1651–1662 (2016).
4. Bimber BN et al., Whole genome sequencing predicts novel human disease models in rhesus macaques. Genomics 109, 214–220 (2017).
5. Kronenberg ZN et al., High-resolution comparative analysis of great ape genomes. Science 360, (2018).
