Nature. 2021 Apr;592(7856):737-746.

doi: 10.1038/s41586-021-03451-0. Epub 2021 Apr 28.

Towards complete and error-free genome assemblies of all vertebrate species

Arang Rhie^#¹, Shane A McCarthy^#^{2,3}, Olivier Fedrigo^#⁴, Joana Damas⁵, Giulio Formenti^{4,6}, Sergey Koren¹, Marcela Uliano-Silva^{7,8}, William Chow³, Arkarachai Fungtammasan⁹, Juwan Kim¹⁰, Chul Lee¹⁰, Byung June Ko¹¹, Mark Chaisson¹², Gregory L Gedman⁶, Lindsey J Cantin⁶, Francoise Thibaud-Nissen¹³, Leanne Haggerty¹⁴, Iliana Bista^{2,3}, Michelle Smith³, Bettina Haase⁴, Jacquelyn Mountcastle⁴, Sylke Winkler^{15,16}, Sadye Paez^{4,6}, Jason Howard¹⁷, Sonja C Vernes^{18,19,20}, Tanya M Lama²¹, Frank Grutzner²², Wesley C Warren²³, Christopher N Balakrishnan²⁴, Dave Burt²⁵, Julia M George²⁶, Matthew T Biegler⁶, David Iorns²⁷, Andrew Digby²⁸, Daryl Eason²⁸, Bruce Robertson²⁹, Taylor Edwards³⁰, Mark Wilkinson³¹, George Turner³², Axel Meyer³³, Andreas F Kautt^{33,34}, Paolo Franchini³³, H William Detrich 3rd³⁵, Hannes Svardal^{36,37}, Maximilian Wagner³⁸, Gavin J P Naylor³⁹, Martin Pippel^{15,40}, Milan Malinsky^{3,41}, Mark Mooney⁴², Maria Simbirsky⁹, Brett T Hannigan⁹, Trevor Pesout⁴³, Marlys Houck⁴⁴, Ann Misuraca⁴⁴, Sarah B Kingan⁴⁵, Richard Hall⁴⁵, Zev Kronenberg⁴⁵, Ivan Sović^{45,46}, Christopher Dunn⁴⁵, Zemin Ning³, Alex Hastie⁴⁷, Joyce Lee⁴⁷, Siddarth Selvaraj⁴⁸, Richard E Green^{43,49}, Nicholas H Putnam⁵⁰, Ivo Gut^{51,52}, Jay Ghurye^{49,53}, Erik Garrison⁴³, Ying Sims³, Joanna Collins³, Sarah Pelan³, James Torrance³, Alan Tracey³, Jonathan Wood³, Robel E Dagnew¹², Dengfeng Guan^{2,54}, Sarah E London⁵⁵, David F Clayton⁵⁶, Claudio V Mello⁵⁷, Samantha R Friedrich⁵⁷, Peter V Lovell⁵⁷, Ekaterina Osipova^{15,40,58}, Farooq O Al-Ajli^{59,60,61}, Simona Secomandi⁶², Heebal Kim^{10,11,63}, Constantina Theofanopoulou⁶, Michael Hiller^{64,65,66}, Yang Zhou⁶⁷, Robert S Harris⁶⁸, Kateryna D Makova^{68,69,70}, Paul Medvedev^{69,70,71,72}, Jinna Hoffman¹³, Patrick Masterson¹³, Karen Clark¹³, Fergal Martin¹⁴, Kevin Howe¹⁴, Paul Flicek¹⁴, Brian P Walenz¹, Woori Kwak^{63,73}, Hiram Clawson⁴³, Mark Diekhans⁴³, Luis Nassar⁴³, Benedict Paten⁴³, Robert H S Kraus^{33,74}, Andrew J Crawford⁷⁵, M Thomas P Gilbert^{76,77}, Guojie Zhang^{78,79,80,81}, Byrappa Venkatesh⁸², Robert W Murphy⁸³, Klaus-Peter Koepfli⁸⁴, Beth Shapiro^{85,86}, Warren E Johnson^{84,87,88}, Federica Di Palma⁸⁹, Tomas Marques-Bonet^{90,91,92,93}, Emma C Teeling⁹⁴, Tandy Warnow⁹⁵, Jennifer Marshall Graves⁹⁶, Oliver A Ryder^{44,97}, David Haussler^{43,85}, Stephen J O'Brien^{98,99}, Jonas Korlach⁴⁵, Harris A Lewin^{5,100,101}, Kerstin Howe¹⁰², Eugene W Myers^{103,104,105}, Richard Durbin^{106,107}, Adam M Phillippy¹⁰⁸, Erich D Jarvis^{109,110,111}

Affiliations

¹ Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
² Department of Genetics, University of Cambridge, Cambridge, UK.
³ Wellcome Sanger Institute, Cambridge, UK.
⁴ Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.
⁵ The Genome Center, University of California Davis, Davis, CA, USA.
⁶ Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA.
⁷ Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany.
⁸ Berlin Center for Genomics in Biodiversity Research, Berlin, Germany.
⁹ DNAnexus Inc., Mountain View, CA, USA.
¹⁰ Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
¹¹ Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea.
¹² University of Southern California, Los Angeles, CA, USA.
¹³ National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA.
¹⁴ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.
¹⁵ Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
¹⁶ DRESDEN-concept Genome Center, Dresden, Germany.
¹⁷ Novogene, Durham, NC, USA.
¹⁸ Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
¹⁹ Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
²⁰ School of Biology, University of St Andrews, St Andrews, UK.
²¹ University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA.
²² School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia.
²³ Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
²⁴ Department of Biology, East Carolina University, Greenville, NC, USA.
²⁵ UQ Genomics, University of Queensland, Brisbane, Queensland, Australia.
²⁶ Department of Biological Sciences, Clemson University, Clemson, SC, USA.
²⁷ The Genetic Rescue Foundation, Wellington, New Zealand.
²⁸ Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand.
²⁹ Department of Zoology, University of Otago, Dunedin, New Zealand.
³⁰ University of Arizona Genetics Core, Tucson, AZ, USA.
³¹ Department of Life Sciences, Natural History Museum, London, UK.
³² School of Natural Sciences, Bangor University, Gwynedd, UK.
³³ Department of Biology, University of Konstanz, Konstanz, Germany.
³⁴ Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
³⁵ Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA.
³⁶ Department of Biology, University of Antwerp, Antwerp, Belgium.
³⁷ Naturalis Biodiversity Center, Leiden, The Netherlands.
³⁸ Institute of Biology, Karl-Franzens University of Graz, Graz, Austria.
³⁹ Florida Museum of Natural History, University of Florida, Gainesville, FL, USA.
⁴⁰ Center for Systems Biology, Dresden, Germany.
⁴¹ Zoological Institute, University of Basel, Basel, Switzerland.
⁴² Tag.bio, San Francisco, CA, USA.
⁴³ UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
⁴⁴ San Diego Zoo Global, Escondido, CA, USA.
⁴⁵ Pacific Biosciences, Menlo Park, CA, USA.
⁴⁶ Digital BioLogic, Ivanić-Grad, Croatia.
⁴⁷ Bionano Genomics, San Diego, CA, USA.
⁴⁸ Arima Genomics, San Diego, CA, USA.
⁴⁹ Dovetail Genomics, Santa Cruz, CA, USA.
⁵⁰ Independent Researcher, Santa Cruz, CA, USA.
⁵¹ CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain.
⁵² Universitat Pompeu Fabra, Barcelona, Spain.
⁵³ Department of Computer Science, University of Maryland College Park, College Park, MD, USA.
⁵⁴ School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China.
⁵⁵ Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA.
⁵⁶ Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.
⁵⁷ Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA.
⁵⁸ Max Planck Institute for the Physics of Complex Systems, Dresden, Germany.
⁵⁹ Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia.
⁶⁰ Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia.
⁶¹ Qatar Falcon Genome Project, Doha, Qatar.
⁶² Department of Biosciences, University of Milan, Milan, Italy.
⁶³ eGnome, Inc., Seoul, Republic of Korea.
⁶⁴ LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany.
⁶⁵ Senckenberg Research Institute, Frankfurt, Germany.
⁶⁶ Goethe-University, Faculty of Biosciences, Frankfurt, Germany.
⁶⁷ BGI-Shenzhen, Shenzhen, China.
⁶⁸ Department of Biology, Pennsylvania State University, University Park, PA, USA.
⁶⁹ Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA.
⁷⁰ Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA.
⁷¹ Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA.
⁷² Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA.
⁷³ Hoonygen, Seoul, Korea.
⁷⁴ Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany.
⁷⁵ Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia.
⁷⁶ Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark.
⁷⁷ University Museum, NTNU, Trondheim, Norway.
⁷⁸ China National Genebank, BGI-Shenzhen, Shenzhen, China.
⁷⁹ Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
⁸⁰ State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
⁸¹ Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
⁸² Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore.
⁸³ Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada.
⁸⁴ Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA.
⁸⁵ Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA.
⁸⁶ Howard Hughes Medical Institute, Chevy Chase, MD, USA.
⁸⁷ The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA.
⁸⁸ Walter Reed Army Institute of Research, Silver Spring, MD, USA.
⁸⁹ Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK.
⁹⁰ Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain.
⁹¹ Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain.
⁹² Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain.
⁹³ Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain.
⁹⁴ School of Biology and Environmental Science, University College Dublin, Dublin, Ireland.
⁹⁵ Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA.
⁹⁶ School of Life Science, La Trobe University, Melbourne, Victoria, Australia.
⁹⁷ Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA.
⁹⁸ Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation.
⁹⁹ Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA.
¹⁰⁰ Department of Evolution and Ecology, University of California Davis, Davis, CA, USA.
¹⁰¹ John Muir Institute for the Environment, University of California Davis, Davis, CA, USA.
¹⁰² Wellcome Sanger Institute, Cambridge, UK. kj2@sanger.ac.uk.
¹⁰³ Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany. gene@mpi-cbg.de.
¹⁰⁴ Center for Systems Biology, Dresden, Germany. gene@mpi-cbg.de.
¹⁰⁵ Faculty of Computer Science, Technical University Dresden, Dresden, Germany. gene@mpi-cbg.de.
¹⁰⁶ Department of Genetics, University of Cambridge, Cambridge, UK. rd109@cam.ac.uk.
¹⁰⁷ Wellcome Sanger Institute, Cambridge, UK. rd109@cam.ac.uk.
¹⁰⁸ Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. adam.phillippy@nih.gov.
¹⁰⁹ Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
¹¹⁰ Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
¹¹¹ Howard Hughes Medical Institute, Chevy Chase, MD, USA. ejarvis@rockefeller.edu.

^# Contributed equally.

PMID: 33911273
PMCID: PMC8081667
DOI: 10.1038/s41586-021-03451-0

Arang Rhie et al. Nature. 2021 Apr.

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species^1-4. To address this issue, the international Genome 10K (G10K) consortium^5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Conflict of interest statement

During the contributing period, B.T.H., M. Simbirsky, A.F. and M. Mooney were employees of DNAnexus Inc. S.B.K., R.H., Z.K., J. Korlach, I.S. and C.D. were full-time employees at Pacific Biosciences, a company developing single-molecule long-read sequencing technologies. R.E.G., N.H.P. and J.G. were affiliated with Dovetail Genomics, a company developing genome assembly tools, including Hi-C. I.G. was affiliated with Oxford Nanopore Technologies, a company generating long-read sequencing technologies. A.H. and J.L. were employees of Bionano Genomics, a company developing optical maps for genome assembly. S. Selvaraj was an employee of Arima Genomics, a company developing Hi-C data for genome assemblies. R.D. is a scientific advisory board member of Dovetail Inc. P. Flicek is a member of the Scientific Advisory Boards of Fabric Genomics, Inc., and Eagle Genomics, Ltd. H.C. receives royalties from the sale of UCSC Genome Browser source code, LiftOver, GBiB, and GBiC licenses to commercial entities. S.K. has received travel funds to speak at symposia organized by Oxford Nanopore. M.D. and L.N. receive royalties from licensing of UCSC Genome Browser. For W.E.J., the content here is not to be construed as the views of the DA or DOD. All other authors declare no competing interests.

Figures

**Fig. 1. Comparative analyses of Anna’s hummingbird genome assemblies with various data types.**
a, Contig NG50 values of the primary pseudo-haplotype. b, Scaffold NG50 values. c, Number of joins (gaps). d, Number of mis-join errors compared with the curated assembly. The curated assembly has no remaining conflicts with the raw data and thus no known mis-joins. *Same as CLR + linked + Opt. + Hi-C, but with contigs generated with an updated FALCON version and earlier Hi-C Salsa version (v2.0 versus v2.2; Supplementary Table 2) for less aggressive contig joining. e, f, Hi-C interaction heat maps before and after manual curation, which identified 34 chromosomes. Grid lines indicate scaffold boundaries. Red arrow, example mis-join that was corrected during curation. g, Karyotype of the identified chromosomes (n = 36 + ZW), consistent with previous findings. h, Correlation between estimated chromosome sizes (in Mb) based on karyotype images in g and assembled scaffolds in Supplementary Table 4 (bCalAna1) on a log–log scale. v1.0, VGP assembly v1.0 pipeline; linked, 10X Genomics linked reads; Hi-C, Hi-C proximity ligation; 1D, 2D, Oxford Nanopore long reads; NRGene, NRGene paired-end Illumina reads; SR, paired-end Illumina short reads.
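
The NG50 values reported in panels a and b can be illustrated with a short calculation. The sketch below assumes the standard definition (the length L such that sequences of length L or longer cover at least half of the estimated genome size); the function name and the contig lengths are hypothetical and are not taken from the study.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the NG50 is the
    length at which the running total first reaches half of the
    estimated genome size (not half of the assembly size, as in N50).
    """
    half = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # the assembly covers less than half of the genome estimate


# Hypothetical contig lengths (Mb) and genome size estimate (Mb).
contigs = [50, 40, 30, 20, 10]
print(ng50(contigs, genome_size=200))
```

Because NG50 is normalized by the estimated genome size, it stays comparable between assemblies of the same genome even when their total assembled lengths differ.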

**Fig. 2. Impact of repeats and heterozygosity on assembly quality.**
a, Correlation between scaffold NG50 and genome size of the curated assemblies. b, Nonlinear correlation between contig NG50 and repeat content, before and after curation. c, Correlation between number of gaps per Gb assembled and repeat content. d, Correlation between primary assembly size relative to estimated genome size (y axis) and genome heterozygosity (x axis), before and after purging of false duplications. Assembly sizes above 100% indicate the presence of false duplications and those below 100% indicate collapsed repeats. e, f, Correlations between genome duplication rate using k-mers (e) and conserved BUSCO vertebrate gene set (f), and genome heterozygosity before and after purging of false duplications. g, h, As in e, f, but with whole-genome repeat content before and after purging of false duplications. Genome size, heterozygosity, and repeat content were estimated from 31-mer counts using GenomeScope, except for the channel bull blenny, as the estimates were unreliable (see Methods). Repeat content was measured by modelling the k-mer multiplicity from sequencing reads. Sequence duplication rates were estimated with Merqury using 21-mers. *P < 0.05; **P < 0.01; ***P < 0.001, of the correlation coefficient: P values and adjusted r² from F-statistics. n = 17 assemblies of 16 species.
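
The interpretation of panel d can be sketched as a simple ratio. This is only an illustration of the rule stated in the caption (above 100% suggests false duplications, below 100% suggests collapsed repeats); the function names and the sizes used are hypothetical.

```python
def relative_assembly_size(assembly_bp, estimated_genome_bp):
    """Primary assembly size as a percentage of the estimated genome size."""
    return 100.0 * assembly_bp / estimated_genome_bp


def interpret(percent):
    """Apply the reading of panel d given in the caption."""
    if percent > 100:
        return "possible false duplications"
    if percent < 100:
        return "possible collapsed repeats"
    return "matches estimate"


# Hypothetical sizes: a 1.1-Gb primary assembly for a 1.0-Gb genome estimate.
pct = relative_assembly_size(1_100_000_000, 1_000_000_000)
print(pct, interpret(pct))
```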

**Fig. 3. Improvements to alignments and annotations in VGP assemblies relative to prior references.**
a, b, Average percentage of RNA-seq transcriptome samples (a; n = 44, mean ± s.e.m.) and ATAC–seq genome reads (b; n = 12) that align to the previous and VGP zebra finch assemblies. Unique reads mapped to only one location in the assembly. Total is the sum of unique and multi-mapped reads. P values are from paired t-test. c, d, Total number of coding sequence (CDS) transcripts (full bar) and portion fully supported (inner bar) (c) and the number of RefSeq coding genes annotated as partial (d) in the previous and VGP assemblies using the same input data. e–h, Examples of assembly and associated annotation errors in previous reference assemblies corrected in the new VGP assemblies. See main text for descriptions. i, Gene synteny around the *VTR2C* receptor in the platypus shows completely missing genes (*NUDT16*), truncated and duplicated *ARHGAP4*, and many gaps in the earlier Sanger-based assembly compared with the filled in and expanded gene lengths in the new VGP assembly. Assembly accessions are in Supplementary Table 19.

**Fig. 4. VGP assemblies reveal GC content patterns in protein-coding genes.**
a, Average GC content (n = 14,000–18,000 annotated coding genes; Extended Data Table 2) in VGP assemblies (black) and the percentage of genes with missing sequence in the earlier references (red) based on a Cactus alignment, in 100-bp blocks, 2 kb on either side of all protein-coding genes (left and right), and for UTRs, exons, and introns (middle). b, Average GC content (mean ± s.d. for lineages with more than one species) of the six major vertebrate lineages sequenced, for 30 kb upstream and downstream (in 100-bp blocks, log scale; left and right) and of the UTR, exons, and introns (middle). c, d, Left, specialized expression (arrows) shown by in situ hybridization of *DRD1B* in the zebra finch striatum (c) and *ER81* in the arcopallium (d), from Jarvis et al.; the cerebellum was removed from the ER81 image. Right, ATAC–seq profiles in the GC-rich promoter regions of these genes, showing each gene’s GC content (red is high), the ATAC–seq peaks in striatum (purple) or arcopallium (yellow) neurons, and portions of missing sequence (black) in the previous reference assembly (grey).
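
The per-block GC content plotted in panels a and b can be computed along the following lines. This is a minimal sketch, assuming plain uppercase or lowercase DNA strings and non-overlapping 100-bp blocks; the function name and the toy sequence are hypothetical, and the study's own analysis was based on a Cactus alignment rather than on this code.

```python
def gc_blocks(sequence, block=100):
    """Return the GC fraction of each non-overlapping block of a sequence.

    A trailing block shorter than `block` is included and normalized by
    its own length.
    """
    sequence = sequence.upper()
    fractions = []
    for start in range(0, len(sequence), block):
        chunk = sequence[start:start + block]
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / len(chunk))
    return fractions


# Toy sequence: one fully GC block followed by one fully AT block.
toy = "GC" * 50 + "AT" * 50
print(gc_blocks(toy))
```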

**Fig. 5. Chromosome evolution among bats and other vertebrates.**
a, Chromosome synteny maps across the species sequenced based on BUSCO gene alignments. Chromosome sizes (bar lengths) are normalized to genome size, to make visualization easier. Genes (lines) are coloured according to the human chromosome to which they belong; those on human chromosome 6 are highlighted in blue and other chromosomes are in lighter shades. The cladogram is from the TimeTree database. b, Phylogenetic relationship of the mammalian species sequenced and their inferred chromosome EBR rates (breaks per Myr) on different branches. Red, higher rates than average (0.84); blue, lower than average. c, Summary of alignment, gene organization, and functional gene status surrounding a bat interchromosomal EBR involving the homologue of human chromosome 6. End of scaffold (S) or chromosome (Chr.) means that the breakpoint is located at a chromosome arm end; middle means that it is located within a scaffold or chromosome. Scale is relevant for human Chr. 6 only. Actual gene sizes in the non-human species may differ and were drawn to match the annotated human gene sizes for simplicity.

**Extended Data Fig. 1. Assessment of completeness of the Anna’s hummingbird assembly.**
a, b, Steps and NG50 continuity values of the VGP assembly pipeline that gave the highest quality assembly for Anna’s hummingbird (a) and Canada lynx (b) in this study. The specific steps are outlined further in Extended Data Fig. 2a, and Methods. c, Whole-genome alignment of CLR (red), linked reads (green), optical maps (blue), and Hi-C reads (purple) of the Anna’s hummingbird, along with telomere motif (TTAGGG and its reverse complement, yellow) and gaps (grey) using Asset software. For each data type, the first row shows the mapped coverage, and the second shows the number of counts of low coverage or signs of collapsed repeats. Larger chromosomal scaffolds (1–19) have fewer gaps and low coverage or collapsed regions compared with the micro chromosomes (20–33). Chromosomes 14, 15 and 19 of the Anna’s hummingbird were the most structurally reliable scaffolds, having only one gap each with no low-support regions. We defined reliable blocks as those supported by at least two technologies. Reliable blocks excluded regions with structural assembly errors, such as collapsed repeats or unresolved segmental duplications. Low-support regions are those where the reliable blocks row has a peak.
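
The telomere track in panel c is based on the motif TTAGGG and its reverse complement. The sketch below shows one simple way to count both orientations in a sequence; it is an illustration only, not the method used by the Asset software, and the function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_telomere_motifs(seq, motif="TTAGGG"):
    """Count non-overlapping occurrences of the motif on either strand."""
    seq = seq.upper()
    return seq.count(motif) + seq.count(reverse_complement(motif))


# Toy sequence with two forward copies and one reverse-complement copy.
toy = "TTAGGGTTAGGGCCCTAA"
print(reverse_complement("TTAGGG"), count_telomere_motifs(toy))
```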

**Extended Data Fig. 2. VGP assembly pipeline applied across multiple species.**
a, Iterative assembly pipeline of sequence data types (coloured as in b) with increasing chromosomal distance. Thin bars, sequence reads; thick black bars, assembled contigs; black bars with space and arcing links, scaffolds; grey bars, gaps placed by previous steps; thick red border, tracking of an example contig in the pipeline. The curation step shows an example of a mis-assembly break identified by sequence coverage (grey, left) and an example of an inversion error (right) detected by the optical map. b, Intra-molecule length distribution of the four data types used to generate the assemblies of 16 vertebrate species, weighted by the fraction of bases in each length bin (log scaled). Molecule length above 1 kb was measured from read length for CLR, estimated molecule coverage for linked reads, raw molecule length for optical maps, and interaction distance for Hi-C reads. For each species, the fragment length distribution of each data type was similar to those for the Anna’s hummingbird, with differences primarily influenced by tissue type, preservation method, and collection or storage conditions (unpublished data).
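The weighting described in panel b (each length bin weighted by the fraction of bases it holds, on a log scale) can be sketched as follows. The function name, the number of bins per decade, and the example lengths are assumptions for illustration; the legend only states that lengths above 1 kb were considered.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable


def base_weighted_length_distribution(
    lengths: Iterable[int],
    bins_per_decade: int = 4,
    min_length: int = 1000,
) -> Dict[int, float]:
    """Map each log10 bin index to the fraction of all bases falling in that bin.

    Bin `i` covers lengths from 10 ** (i / bins_per_decade) up to, but not
    including, 10 ** ((i + 1) / bins_per_decade).
    """
    bases_per_bin: Dict[int, int] = defaultdict(int)
    total_bases = 0
    for length in lengths:
        if length <= min_length:  # keep only molecules longer than 1 kb
            continue
        index = math.floor(math.log10(length) * bins_per_decade)
        bases_per_bin[index] += length  # weight by bases, not by molecule count
        total_bases += length
    if total_bases == 0:
        return {}
    return {index: bases / total_bases for index, bases in sorted(bases_per_bin.items())}


# Hypothetical molecule lengths in bp; the 500-bp molecule is filtered out.
distribution = base_weighted_length_distribution([2000, 3000, 20000, 500], bins_per_decade=1)
print(distribution)
```

Weighting by bases rather than by molecule count means that a few long molecules can dominate the distribution, which is the relevant view when asking how much of the data spans long distances.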

**Extended Data Fig. 3. Flow charts of assembly pipelines used to generate high-quality assemblies in this study.**
a, Standard VGP assembly pipeline for sequencing data from one individual, which generated the highest-quality assemblies: generate primary pseudo-haplotype and alternate haplotype contigs with CLR using FALCON-Unzip; generate scaffolds with linked reads using Scaff10x; break mis-joins and further scaffold with optical maps using Solve; generate chromosome-scale scaffolds with Hi-C reads using Salsa2; fill in gaps and polish base errors with CLR using Arrow (Pacific Biosciences); perform two or more rounds of short-read polishing with linked reads using FreeBayes; and perform expert manual curation to correct potential assembly errors using gEVAL. b, Standard VGP trio assembly pipeline when DNA is available for a child and parents. Dashed line indicates that the other haplotype went through the same steps before curation. In addition to the curated assemblies of both haplotypes, a representative haplotype with both sex chromosomes is submitted. c, Mitochondrial assembly pipeline. Figure key applies to a–c. Steps newly introduced in v1.5–v1.6 are highlighted in light blue. c, contigs; p, purged false duplications from primary contigs; q, purged alternate contigs; s, scaffolds; t, polished scaffolds. Further details and instructions are available elsewhere and at https://github.com/VGP/vgp-assembly.

**Extended Data Fig. 4. Relationship between collapses and genomic characteristics.**
a, Correlation between the total number of collapses and percentage repeat content estimated in the submitted curated versions of n = 17 genomes from 16 species. b, Correlation between total number of bases in collapsed regions per Gb and repeat content. c, Correlation between total missing bases collapsed per Gb and repeat content. d, Correlation between total number of genes (coding and non-coding) in the collapsed regions and repeat content. e, Lack of correlation between the average collapsed size and repeat content. f, Lack of correlation between the total number of collapses and percentage heterozygosity. g, Lack of correlation between the total number of collapses per Gb and genome size. Genome size, heterozygosity, and repeat content were estimated from 31-mer counts using GenomeScope. Reported are adjusted r² and P values from F-statistics. h, Cumulative collapsed bases per Gb in each collapse and percentage repeat masked. Each circle is coloured by species with its size relative to the length of the collapse as it appears in the assembly. Collapses above the horizontal bar (>90%) are further classified as collapsed high-copy repeats, and those below the horizontal bar are classified as segmental duplications (low-copy repeats). i, Major repeat types in collapsed high-copy repeats. Most of the repeats were masked only with WindowMasker, with no annotation available by RepeatMasker. j, Minor repeat types in collapsed repeats. This is a breakdown of the repeats categorized as ‘Others’ in i, owing to the smaller scale. Bar colours in i and j are as in h. Note smaller scale in j compared with i. Collapsed satellite arrays were almost exclusively found in the platypus, comprising ~2.5 Mb. Collapsed simple repeats were the major source in the thorny skate (~400 kb). There was a higher proportion of LTRs in birds, LINEs and SINEs in mammals, and DNA repeats in the amphibian. Among the genes in the collapses, many were repetitive short non-coding RNAs. 
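The classification rule in panel h (collapses that are more than 90% repeat masked are treated as collapsed high-copy repeats, the rest as segmental duplications) can be written out as a short sketch. The `Collapse` record, the function names, and the example values are hypothetical; only the 90% cut-off and the per-Gb normalization come from the legend.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Collapse:
    length_bp: int                # length of the collapse as it appears in the assembly
    percent_repeat_masked: float  # 0 to 100


def classify_collapse(collapse: Collapse, threshold: float = 90.0) -> str:
    """Label a collapse using the repeat-masked percentage cut-off."""
    if not 0.0 <= collapse.percent_repeat_masked <= 100.0:
        raise ValueError("percent_repeat_masked must lie between 0 and 100")
    if collapse.percent_repeat_masked > threshold:
        return "high-copy repeat"
    return "segmental duplication"


def collapsed_bases_per_gb(collapses: Iterable[Collapse], genome_size_bp: int) -> Dict[str, float]:
    """Sum collapsed bases per class, normalized to one gigabase of genome."""
    totals = {"high-copy repeat": 0, "segmental duplication": 0}
    for collapse in collapses:
        totals[classify_collapse(collapse)] += collapse.length_bp
    genome_gb = genome_size_bp / 1e9
    return {label: bases / genome_gb for label, bases in totals.items()}


# Hypothetical collapses in a 2-Gb genome; exactly 90% falls below the cut-off.
example_collapses = [Collapse(1000, 95.0), Collapse(3000, 40.0), Collapse(2000, 90.0)]
summary = collapsed_bases_per_gb(example_collapses, genome_size_bp=2_000_000_000)
print(summary)
```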

**Extended Data Fig. 5. False duplication mechanisms in genome assembly.**
a, False heterotype (haplotype) duplications occur when more divergent sequence reads from each haplotype A (blue) and B (red) (maternal and paternal) form more divergent paths in the assembly graph (bubbles), while nearly identical homozygous sequences (black) become collapsed. When the assembly graph is properly formed and correctly resolved (green arrow), one of the haplotype-specific paths (red or blue) is chosen for building a ‘primary’ pseudo-haplotype assembly and the other is set apart as an ‘alternate’ assembly. When the graph is not correctly resolved (purple arrow), one of four types of pattern is formed in the contigs and subsequent scaffolds. Depending on the supporting evidence, the scaffolder either keeps these haplotype contigs on separate scaffolds or brings them together on the same scaffold, often separated by gaps: 1. Separate contigs: both contigs are retained in the primary contig set, an error often observed when haplotype-specific sequences are highly diverged. 2. Flanking contigs: the assembly graph is partially formed, connecting the homozygous sequence of the 5′ side to one haplotype (blue) and the 3′ side to the other haplotype (red). 3. Partial flanking contigs: only one haplotype (blue) flanks one side of the homozygous sequence. 4. Failed connecting of contigs: all haplotype sequences fail to properly connect to flanking homozygous sequences. b, False homotype duplications occur where a sequence from the same genomic locus is duplicated, and are of two types: 1. Overlapping sequences at contig boundaries: in current overlap-layout-consensus assemblers, branching sequences in assembly graphs that are not selected as the primary path have a small overlapping sequence (purple), dovetailing to the primary path where the branch originated. The size of the duplicated sequence is often the length of a corrected read. Subsequent scaffolding results in tandem duplicated sequences with a gap between. 2. Under-collapsed sequences: sequencing errors in reads (red x) randomly or systematically pile up, forming under-collapsed sequences. Subsequent duplication errors in the scaffolding are similar to the heterotype duplications. Purge_haplotigs aligns sequences to themselves to find a smaller sequence that aligns fully to a larger contig or scaffold, and removes heterotype duplication types 1, 3, and 4. Purge_dups additionally uses coverage information to detect heterotype duplication type 2 and homotype duplications. We distinguished the two types of duplications by: 1) haplotype-specific variants in reads aligning at half coverage to each heterotype duplication; 2) differing consensus quality that resulted from read coverage fluctuations when aligning reads to homotype duplications; and 3) k-mer copy number anomalies in which homotype duplications were observed in the assembly with more than the expected number of copies.
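The third criterion above (k-mers present in the assembly in more copies than expected) can be illustrated with a toy k-mer count. This is a minimal sketch and not the tool used in the study: the function names are hypothetical, and the expected copy numbers are supplied directly as an assumed input, whereas in practice they would have to be estimated from read data.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all sequences, skipping k-mers with non-ACGT bases."""
    counts: Counter = Counter()
    for sequence in sequences:
        sequence = sequence.upper()
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def overrepresented_kmers(assembly_counts: Counter, expected_copies: Dict[str, int]) -> Dict[str, int]:
    """Return k-mers seen more often than expected, with the number of excess copies."""
    return {
        kmer: observed - expected_copies[kmer]
        for kmer, observed in assembly_counts.items()
        if kmer in expected_copies and observed > expected_copies[kmer]
    }


# Toy assembly: the second contig repeats the end of the first (a false duplication).
assembly_counts = count_kmers(["AACCGT", "ACCGT"], k=4)
# Assumed expectation: every k-mer should occur once in the haploid genome.
expected = {"AACC": 1, "ACCG": 1, "ACGG": 1}
excess = overrepresented_kmers(assembly_counts, expected)
print(excess)
```

Counting canonical k-mers makes the comparison strand-independent, so a duplicated sequence is detected regardless of the orientation in which it was assembled.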

**Extended Data Fig. 6. False duplication examples fixed during manual curation.**
a, An example of a heterotype duplication in the female zebra finch, non-trio assembly. Left, a self-dot plot of this region generated with Gepard, with sequences coloured by haplotypes. Gaps, duplicated sequences (green and purple), and haplotype-specific marker densities are indicated at the top. Right, a detailed alignment view of the green haplotype duplication with paternal and maternal markers, self-alignment components, transcripts annotated, contigs, bionano maps, and repeat components displayed in gEVAL. b, Example of a homotype duplication found in the hummingbird assembly. These were caused by an algorithm bug in FALCON, which was later fixed. c, Example of a combined duplication involving both heterotype (green) and homotype (orange) duplications. Assembly graph structure is shown on the left for clarity, highlighting the overlapping sites at the contig boundary shaded following the duplication type. Assembly errors including the above false duplications were detected and fixed during the curation process.

**Extended Data Fig. 7. Evidence of near-complete chromosome scaffolds in the VGP assemblies.**
Shown are Hi-C interaction heat maps for each species after curation, visualized with PretextView. A scaffold is considered a putative arm-to-arm chromosome when all Hi-C read pairs in a row and column map to a square (that is, an assembled chromosome) on the diagonal without any other interactions off the diagonal. Those with remaining off-diagonal matches to smaller scaffolds are not linked because of ambiguous order or orientation, and are instead submitted as ‘unlocalized’ belonging to the relevant chromosome. Bands at the top of each heat map show scaffolds identified as X, Z (blue) or Y, W (red) sex chromosomes. The Hi-C map of fAstCal1 is not included because no tissue remained from the animal used to generate Hi-C reads.

**Extended Data Fig. 8. Comparison of chromosomal organization between previous and new VGP assemblies.**
a, Zebra finch male compared to a previous reference assembly of the same animal. b, Platypus male compared with a previous reference female assembly (so the Y chromosomes are absent in the previous reference). c, Hummingbird female compared to a previous reference of the same animal. d, Climbing perch compared to a previous reference. Each row represents a VGP-generated chromosome for the target species. Colours depict identity with the reference (see key to the right); more than one colour indicates reorganization in the VGP assembly relative to the reference. The lines within each block depict orientation relative to the reference; a positive slope is the same orientation as the reference, whereas a negative slope is the inverse orientation. Gaps in the reference relative to the VGP assembly are shown as white boxes with no lines. A white box for the entire chromosome means a newly identified chromosome in the VGP assembly. ‘Top 20’ denotes the 20 longest scaffolds of the hummingbird and climbing perch assemblies. Accession numbers of the assemblies compared are listed in Supplementary Table 19.

**Extended Data Fig. 9. Haplotype-resolved sex chromosomes and mitochondrial genomes.**
a, Alignment scatterplot, generated with MUMmer NUCmer, visualized with dot, of maternal and paternal chromosomes from the female zebra finch trio-based assembly. Blue, same orientation; red, inversion; orange, repeats between haplotypes. The paternal Z chromosome is highly divergent from the maternal W, and thus mostly unaligned. b, Alignment scatterplot of assembled Z and W chromosomes across the three bird species, approximated with MashMap2. Segments of 300 kb (green), 500 kb (blue), and 1 Mb (purple) are shaded darker with higher sequence identity, with a minimum of 85%. The smaller size and higher repeat content of the W chromosome are clearly visible. c, X and Y chromosome segments of the mammals (platypus, Canada lynx, pale spear-nosed bat, and greater horseshoe bat) showing a higher density of repeats within the mammalian X chromosome than the avian Z chromosome. d, VGP kākāpō mitochondrial genome assembly reveals previously missing repetitive sequences (adding 2,232 bp) in the origin of replication region, containing an 83-bp repeat unit. e, VGP climbing perch mitochondrial genome assembly showing a duplication of *trnL2* and partial duplication of *Nad1*, which were absent from the prior reference. Orange arrows and red lines, tRNA genes and their alignments; dark grey arrows and grey shading, all other genes and their alignments; black, non-coding regions; green line, conventional starting point of the circular sequence.

**Extended Data Fig. 10. Large haplotype inversions with direct evidence in the zebra finch trio assembly.**
a, Two inversions (green and red) in chromosome 5 found from the MUMmer NUCmer alignment of the maternal and paternal haplotype assemblies, visualized with dot. b, Hi-C interaction plot showing that the trio-binned Hi-C data remove most of the interactions from the other haplotype (red arrows), which could be erroneously classified as a mis-assembly if only one haplotype was used as a reference. c, An 8.5-Mb inversion found on chromosome 11 and a complicated 8.1-Mb rearrangement on chromosome 13 between maternal and paternal haplotypes. d, No mis-assembly signals were detected from the binned Hi-C interaction plots, indicating that the haplotype-specific inversions are real. e, Half the PacBio CLR span and Bionano optical maps agree with the inversion breakpoints in chromosome 11, supporting the haplotype-specific inversion.

**Extended Data Fig. 11. Polishing artefacts.**
a, An example of uneven mapping coverage in the primary and alternate sequence pair of the Anna’s hummingbird assembly. In this example, the alternate (alt) sequence was built at higher quality, attracting all linked reads for polishing. The matching locus in the primary (pri) assembly was left unpolished, resulting in frameshift errors in the *TLK1* gene. b, Haplotype-specific markers (red for maternal, blue for paternal) and error markers found in the assembly on the Z chromosome (inherited from the paternal side) of the trio-binned female zebra finch assembly. Each row shows markers before short-read polishing, mapping all reads to both haplotype assemblies, and polishing by mapping paternally binned reads to the paternal assembly. Polishing improves QV, but introduces haplotype switch errors when using reads from both haplotypes, as shown in row 2. This can be avoided by using haplotype-binned reads for polishing. c, Example of over-polishing. The nuclear mitochondrial (NuMT) sequence was transformed into a full mitochondrial (MT) sequence during long-read polishing owing to the absence of the MT contig, where the NuMT attracted all long reads from the MT. In comparison, the trio-binned assembly had the MT sequence assembled in place, preventing misplacement of MT reads during read mapping.

**Extended Data Fig. 12. Chromosome evolution among the bat species sequenced.**
a, Genes surrounding an inversion in the greater horseshoe bat, relative to human chromosome 15 (red highlight). The *STARD5* gene is directly disrupted by this inversion, which separates exons 1–5 from exon 6 in the greater horseshoe bat. b, RNA-seq tracks showing the lack of RNA splicing evidence of *STARD5* transcripts in the greater horseshoe bat (bottom) in comparison to the pale spear-nosed bat, where the *STARD5* gene is not disrupted (top). c, Circos plots of chromosome organization relationships between each of the analysed bats and segments of human chromosomes 1, 2, 6 and 10. Red star, breakpoint location in human chromosome 6, depicting the fission of the boreoeutherian chromosome 5 in the bat ancestor; blue star, the region upstream of the breakpoint in the bats; green star, the region downstream of the breakpoint in the bats. The red-starred breakpoint was confirmed as reused, as opposed to resulting from assembly errors, in chromosomal rearrangements of the pale spear-nosed bat, Egyptian fruit bat, and greater horseshoe bat. There is no evidence of reuse for the velvety free-tailed bat. We could not confirm breakpoint reuse in the greater mouse-eared bat or Kuhl’s pipistrelle at the chromosomal scale because they were on small scaffolds that may not be completely assembled.

Comment in

Assembling vertebrate genomes.
Wrighton KH. Nat Rev Genet. 2021 Jul;22(7):413. doi: 10.1038/s41576-021-00379-z. PMID: 34017104. No abstract available.
