1
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J, Genereux D, Johnson J, Marinescu VD, Alföldi J, Harris RS, Lindblad-Toh K, Haussler D, Karlsson E, Jarvis ED, Zhang G, Paten B. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020; 587:246-251. [PMID: 33177663 PMCID: PMC7673649 DOI: 10.1038/s41586-020-2871-y] [Citation(s) in RCA: 166] [Impact Index Per Article: 41.5] [Received: 08/24/2019] [Accepted: 07/27/2020] [Indexed: 12/11/2022]
Abstract
New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
Affiliation(s)
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Adam M Novak
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Alden Deran
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Qi Fang
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Duo Xie
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
- Shaohong Feng
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Josefin Stiller
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Diane Genereux
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Jeremy Johnson
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Voichita Dana Marinescu
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Jessica Alföldi
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Robert S Harris
  - Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Kerstin Lindblad-Toh
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- David Haussler
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Elinor Karlsson
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
  - Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Erich D Jarvis
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Guojie Zhang
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Benedict Paten
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
2
Gönen M, Weir BA, Cowley GS, Vazquez F, Guan Y, Jaiswal A, Karasuyama M, Uzunangelov V, Wang T, Tsherniak A, Howell S, Marbach D, Hoff B, Norman TC, Airola A, Bivol A, Bunte K, Carlin D, Chopra S, Deran A, Ellrott K, Gopalacharyulu P, Graim K, Kaski S, Khan SA, Newton Y, Ng S, Pahikkala T, Paull E, Sokolov A, Tang H, Tang J, Wennerberg K, Xie Y, Zhan X, Zhu F, Aittokallio T, Mamitsuka H, Stuart JM, Boehm JS, Root DE, Xiao G, Stolovitzky G, Hahn WC, Margolin AA. A Community Challenge for Inferring Genetic Predictors of Gene Essentialities through Analysis of a Functional Screen of Cancer Cell Lines. Cell Syst 2017; 5:485-497.e3. [PMID: 28988802 DOI: 10.1016/j.cels.2017.09.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/30/2016] [Revised: 06/18/2017] [Accepted: 09/07/2017] [Indexed: 12/18/2022]
Abstract
We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison with this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.
Affiliation(s)
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, Koç University, İstanbul, Turkey; School of Medicine, Koç University, İstanbul, Turkey; Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Glenn S Cowley
  - Genetic Perturbation Platform, The Broad Institute, Boston, MA, USA; Janssen R&D US, Spring House, PA, USA
- Francisca Vazquez
  - Cancer Program, The Broad Institute, Boston, MA, USA; Dana-Farber Cancer Institute, Boston, MA, USA
- Yuanfang Guan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Alok Jaiswal
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Masayuki Karasuyama
  - Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan
- Vladislav Uzunangelov
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Tao Wang
  - Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Sara Howell
  - Cancer Program, The Broad Institute, Boston, MA, USA; Brandeis University, Waltham, MA, USA
- Daniel Marbach
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Antti Airola
  - Department of Information Technology, University of Turku, Turku, Finland
- Adrian Bivol
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Kerstin Bunte
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland; School of Computer Science, The University of Birmingham, Birmingham, UK
- Daniel Carlin
  - Department of Bioengineering, University of California, San Diego, CA, USA
- Sahil Chopra
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Alden Deran
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Kyle Ellrott
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Kiley Graim
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Samuel Kaski
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland; Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
- Suleiman A Khan
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Yulia Newton
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Sam Ng
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Tapio Pahikkala
  - Department of Information Technology, University of Turku, Turku, Finland
- Evan Paull
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Artem Sokolov
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Hao Tang
  - Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Tang
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Krister Wennerberg
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Yang Xie
  - Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Simons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Xiaowei Zhan
  - Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Fan Zhu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Tero Aittokallio
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland; Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Joshua M Stuart
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Jesse S Boehm
  - Cancer Program, The Broad Institute, Boston, MA, USA
- David E Root
  - Genetic Perturbation Platform, The Broad Institute, Boston, MA, USA
- Guanghua Xiao
  - Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA; Simons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Gustavo Stolovitzky
  - Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- William C Hahn
  - Cancer Program, The Broad Institute, Boston, MA, USA; Dana-Farber Cancer Institute, Boston, MA, USA
- Adam A Margolin
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA; Computational Biology Program, Oregon Health & Science University, Portland, OR, USA
3
Rice ES, Kohno S, John JS, Pham S, Howard J, Lareau LF, O'Connell BL, Hickey G, Armstrong J, Deran A, Fiddes I, Platt RN, Gresham C, McCarthy F, Kern C, Haan D, Phan T, Schmidt C, Sanford JR, Ray DA, Paten B, Guillette LJ, Green RE. Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res 2017; 27:686-696. [PMID: 28137821 PMCID: PMC5411764 DOI: 10.1101/gr.213595.116] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 08/01/2016] [Accepted: 12/13/2016] [Indexed: 12/12/2022]
Abstract
The American alligator, Alligator mississippiensis, like all crocodilians, has temperature-dependent sex determination, in which the sex of an embryo is determined by the incubation temperature of the egg during a critical period of development. The lack of genetic differences between male and female alligators leaves open the question of how the genes responsible for sex determination and differentiation are regulated. Insight into this question comes from the fact that exposing an embryo incubated at male-producing temperature to estrogen causes it to develop ovaries. Because estrogen response elements are known to regulate genes over long distances, a contiguous genome assembly is crucial for predicting and understanding their impact. We present an improved assembly of the American alligator genome, scaffolded with in vitro proximity ligation (Chicago) data. We use this assembly to scaffold two other crocodilian genomes based on synteny. We perform RNA sequencing of tissues from American alligator embryos to find genes that are differentially expressed between embryos incubated at male- versus female-producing temperature. Finally, we use the improved contiguity of our assembly along with the current model of CTCF-mediated chromatin looping to predict regions of the genome likely to contain estrogen-responsive genes. We find that these regions are significantly enriched for genes with female-biased expression in developing gonads after the critical period during which sex is determined by incubation temperature. We thus conclude that estrogen signaling is a major driver of female-biased gene expression in the post-temperature sensitive period gonads.
Affiliation(s)
- Edward S Rice
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Satomi Kohno
  - Department of Biology, St. Cloud State University, St. Cloud, Minnesota 56301, USA
- John St John
  - Driver Group, LLC, San Francisco, California 94158, USA
- Son Pham
  - BioTuring, Incorporated, San Diego, California 92121, USA
- Jonathan Howard
  - Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Liana F Lareau
  - California Institute for Quantitative Biosciences, University of California, Berkeley, California 94720, USA
- Brendan L O'Connell
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA; Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
- Glenn Hickey
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Joel Armstrong
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Alden Deran
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Ian Fiddes
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Roy N Platt
  - Department of Biological Sciences, Texas Tech University, Lubbock, Texas 79409, USA
- Cathy Gresham
  - Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi 39762, USA
- Fiona McCarthy
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Colin Kern
  - Department of Animal Science, University of California, Davis, California 95616, USA
- David Haan
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Tan Phan
  - HCM University of Science, Ho Chí Minh, Vietnam 748500
- Carl Schmidt
  - Department of Animal and Food Sciences, University of Delaware, Newark, Delaware 19717, USA
- Jeremy R Sanford
  - Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, California 95064, USA
- David A Ray
  - Department of Biological Sciences, Texas Tech University, Lubbock, Texas 79409, USA
- Benedict Paten
  - Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Louis J Guillette
  - Department of Obstetrics and Gynecology, Marine Biomedicine and Environmental Science Center, Hollings Marine Laboratory, Medical University of South Carolina, Charleston, South Carolina 29412, USA
- Richard E Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA; California Institute for Quantitative Biosciences, University of California, Berkeley, California 94720, USA; Dovetail Genomics, LLC, Santa Cruz, California 95060, USA