1
|
Füllgrabe J, Gosal WS, Creed P, Liu S, Lumby CK, Morley DJ, Ost TWB, Vilella AJ, Yu S, Bignell H, Burns P, Charlesworth T, Fu B, Fordham H, Harding NJ, Gandelman O, Golder P, Hodson C, Li M, Lila M, Liu Y, Mason J, Mellad J, Monahan JM, Nentwich O, Palmer A, Steward M, Taipale M, Vandomme A, San-Bento RS, Singhal A, Vivian J, Wójtowicz N, Williams N, Walker NJ, Wong NCH, Yalloway GN, Holbrook JD, Balasubramanian S. Simultaneous sequencing of genetic and epigenetic bases in DNA. Nat Biotechnol 2023; 41:1457-1464. [PMID: 36747096 PMCID: PMC10567558 DOI: 10.1038/s41587-022-01652-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 08/12/2022] [Accepted: 12/16/2022] [Indexed: 02/08/2023]
Abstract
DNA comprises molecular information stored in genetic and epigenetic bases, both of which are vital to our understanding of biology. Most DNA sequencing approaches address either genetics or epigenetics and thus capture incomplete information. Methods widely used to detect epigenetic DNA bases fail to capture common C-to-T mutations or distinguish 5-methylcytosine from 5-hydroxymethylcytosine. We present a single base-resolution sequencing methodology that sequences complete genetics and the two most common cytosine modifications in a single workflow. DNA is copied and bases are enzymatically converted. Coupled decoding of bases across the original and copy strand provides a phased digital readout. Methods are demonstrated on human genomic DNA and cell-free DNA from a blood sample of a patient with cancer. The approach is accurate, requires low DNA input and has a simple workflow and analysis pipeline. Simultaneous, phased reading of genetic and epigenetic bases provides a more complete picture of the information stored in genomes and has applications throughout biomedicine.
Affiliation(s)
- Jens Füllgrabe
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Walraj S Gosal
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Páidí Creed
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Sidong Liu
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Casper K Lumby
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - David J Morley
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Tobias W B Ost
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Albert J Vilella
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Shirong Yu
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Helen Bignell
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Philippa Burns
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Tom Charlesworth
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Beiyuan Fu
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Howerd Fordham
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Nicolas J Harding
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Olga Gandelman
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Paula Golder
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Christopher Hodson
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Mengjie Li
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Marjana Lila
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Yang Liu
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Joanne Mason
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Jason Mellad
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Jack M Monahan
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Oliver Nentwich
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Alexandra Palmer
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Michael Steward
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Minna Taipale
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Audrey Vandomme
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Rita Santo San-Bento
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Ankita Singhal
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Julia Vivian
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Natalia Wójtowicz
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Nathan Williams
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Nicolas J Walker
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Nicola C H Wong
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Gary N Yalloway
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK
| | - Joanna D Holbrook
- Cambridge Epigenetix Ltd, The Trinity Building, Chesterford Research Park, Cambridge, UK.
| | - Shankar Balasubramanian
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK.
|
2
|
Cullen JN, Martin J, Vilella AJ, Treeful A, Sargan D, Bradley A, Friedenberg SG. Development and application of a next-generation sequencing protocol and bioinformatics pipeline for the comprehensive analysis of the canine immunoglobulin repertoire. PLoS One 2022; 17:e0270710. [PMID: 35802654 PMCID: PMC9269486 DOI: 10.1371/journal.pone.0270710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Accepted: 06/15/2022] [Indexed: 11/18/2022] Open
Abstract
Profiling the adaptive immune repertoire using next generation sequencing (NGS) has become common in human medicine, showing promise in characterizing clonal expansion of B cell clones through analysis of B cell receptors (BCRs) in patients with lymphoid malignancies. In contrast, most work evaluating BCR repertoires in dogs has employed traditional PCR-based approaches analyzing the IGH locus only. The objectives of this study were to: (1) describe a novel NGS protocol to evaluate canine BCRs; (2) develop a bioinformatics pipeline for processing canine BCR sequencing data; and (3) apply these methods to derive insights into BCR repertoires of healthy dogs and dogs undergoing treatment for B-cell lymphoma. RNA from peripheral blood mononuclear cells of healthy dogs (n = 25) and dogs newly diagnosed with intermediate-to-large B-cell lymphoma (n = 18) with intent to pursue chemotherapy was isolated, converted into cDNA and sequenced by NGS. The BCR repertoires were identified and quantified using a novel analysis pipeline. The IGK repertoires of the healthy dogs were far less diverse compared to IGL which, as with IGH, was highly diverse. Strong biases at key positions within the CDR3 sequence were identified within the healthy dog BCR repertoire. For a subset of the dogs with B-cell lymphoma, clonal expansion of specific IGH sequences pre-treatment and reduction post-treatment was observed. The degree of expansion and reduction correlated with the clinical outcome in this subset. Future studies employing these techniques may improve disease monitoring, provide earlier recognition of disease progression, and ultimately lead to more targeted therapeutics.
Affiliation(s)
- Jonah N. Cullen
- Department of Veterinary Clinical Sciences, University of Minnesota College of Veterinary Medicine, St. Paul, Minnesota, United States of America
| | - Jolyon Martin
- Wellcome Trust Genome Campus, Hinxton, Saffron Walden, United Kingdom
- PetMedix Ltd, Glenn Berge Building, Babraham Research Campus, Cambridge, United Kingdom
| | - Albert J. Vilella
- PetMedix Ltd, Glenn Berge Building, Babraham Research Campus, Cambridge, United Kingdom
| | - Amy Treeful
- Department of Veterinary Clinical Sciences, University of Minnesota College of Veterinary Medicine, St. Paul, Minnesota, United States of America
| | - David Sargan
- Department of Veterinary Medicine, Madingley Road, Cambridge, United Kingdom
| | - Allan Bradley
- Wellcome Trust Genome Campus, Hinxton, Saffron Walden, United Kingdom
- PetMedix Ltd, Glenn Berge Building, Babraham Research Campus, Cambridge, United Kingdom
- Department of Medicine, Jeffrey Cheah Biomedical Centre, Cambridge, United Kingdom
| | - Steven G. Friedenberg
- Department of Veterinary Clinical Sciences, University of Minnesota College of Veterinary Medicine, St. Paul, Minnesota, United States of America
|
3
|
Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M, Cook L, Delehaunty KD, Fronick C, Schmidt H, Fulton LA, Fulton RS, Nelson JO, Magrini V, Pohl C, Graves TA, Markovic C, Cree A, Dinh HH, Hume J, Kovar CL, Fowler GR, Lunter G, Meader S, Heger A, Ponting CP, Marques-Bonet T, Alkan C, Chen L, Cheng Z, Kidd JM, Eichler EE, White S, Searle S, Vilella AJ, Chen Y, Flicek P, Ma J, Raney B, Suh B, Burhans R, Herrero J, Haussler D, Faria R, Fernando O, Darré F, Farré D, Gazave E, Oliva M, Navarro A, Roberto R, Capozzi O, Archidiacono N, Della Valle G, Purgato S, Rocchi M, Konkel MK, Walker JA, Ullmer B, Batzer MA, Smit AFA, Hubley R, Casola C, Schrider DR, Hahn MW, Quesada V, Puente XS, Ordoñez GR, López-Otín C, Vinar T, Brejova B, Ratan A, Harris RS, Miller W, Kosiol C, Lawson HA, Taliwal V, Martins AL, Siepel A, RoyChoudhury A, Ma X, Degenhardt J, Bustamante CD, Gutenkunst RN, Mailund T, Dutheil JY, Hobolth A, Schierup MH, Ryder OA, Yoshinaga Y, de Jong PJ, Weinstock GM, Rogers J, Mardis ER, Gibbs RA, Wilson RK. Author Correction: Comparative and demographic analysis of orang-utan genomes. Nature 2022; 608:E36. [PMID: 35962045 PMCID: PMC9402433 DOI: 10.1038/s41586-022-04799-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Affiliation(s)
- Devin P. Locke
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - LaDeana W. Hillier
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Wesley C. Warren
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Kim C. Worley
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Lynne V. Nazareth
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Donna M. Muzny
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Shiaw-Pyng Yang
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Zhengyuan Wang
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Asif T. Chinwalla
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Pat Minx
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Makedonka Mitreva
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Lisa Cook
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Kim D. Delehaunty
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Catrina Fronick
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Heather Schmidt
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Lucinda A. Fulton
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Robert S. Fulton
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Joanne O. Nelson
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Vincent Magrini
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Craig Pohl
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Tina A. Graves
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Chris Markovic
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Andy Cree
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Huyen H. Dinh
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Jennifer Hume
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Christie L. Kovar
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Gerald R. Fowler
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Gerton Lunter
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK ,grid.270683.80000 0004 0641 4511Wellcome Trust Centre for Human Genetics, Oxford, UK
| | - Stephen Meader
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK
| | - Andreas Heger
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK
| | - Chris P. Ponting
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK
| | - Tomas Marques-Bonet
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA ,grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Can Alkan
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Lin Chen
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Ze Cheng
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Jeffrey M. Kidd
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Evan E. Eichler
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA ,grid.413575.10000 0001 2167 1581Howard Hughes Medical Institute, Seattle, Washington USA
| | - Simon White
- grid.10306.340000 0004 0606 5382Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Stephen Searle
- grid.10306.340000 0004 0606 5382Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Albert J. Vilella
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - Yuan Chen
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - Paul Flicek
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - Jian Ma
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA ,grid.35403.310000 0004 1936 9991Present Address: Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois USA
| | - Brian Raney
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA
| | - Bernard Suh
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA
| | - Richard Burhans
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Javier Herrero
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - David Haussler
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA
| | - Rui Faria
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain ,grid.5808.50000 0001 1503 7226CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
| | - Olga Fernando
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain ,grid.10772.330000000121511713Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Oeiras, Portugal
| | - Fleur Darré
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Domènec Farré
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Elodie Gazave
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Meritxell Oliva
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Arcadi Navarro
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain ,grid.425902.80000 0000 9601 989XICREA (Institució Catalana de Recerca i Estudis Avançats) and INB (Instituto Nacional de Bioinformática) PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Roberta Roberto
- grid.7644.10000 0001 0120 3326Department of Biology, University of Bari, Bari, Italy
| | - Oronzo Capozzi
- grid.7644.10000 0001 0120 3326Department of Biology, University of Bari, Bari, Italy
| | | | - Giuliano Della Valle
- grid.6292.f0000 0004 1757 1758Department of Biology, University of Bologna, Bologna, Italy
| | - Stefania Purgato
- grid.6292.f0000 0004 1757 1758Department of Biology, University of Bologna, Bologna, Italy
| | - Mariano Rocchi
- grid.7644.10000 0001 0120 3326Department of Biology, University of Bari, Bari, Italy
| | - Miriam K. Konkel
- grid.64337.350000 0001 0662 7451Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Jerilyn A. Walker
- grid.64337.350000 0001 0662 7451Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Brygg Ullmer
- grid.64337.350000 0001 0662 7451Center for Computation and Technology, Department of Computer Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Mark A. Batzer
- grid.64337.350000 0001 0662 7451Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Arian F. A. Smit
- grid.64212.330000 0004 0463 2320Institute for Systems Biology, Seattle, Washington USA
| | - Robert Hubley
- grid.64212.330000 0004 0463 2320Institute for Systems Biology, Seattle, Washington USA
| | - Claudio Casola
- grid.411377.70000 0001 0790 959XDepartment of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana USA
| | - Daniel R. Schrider
- grid.411377.70000 0001 0790 959XDepartment of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana USA
| | - Matthew W. Hahn
- grid.411377.70000 0001 0790 959XDepartment of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana USA
| | - Victor Quesada
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Xose S. Puente
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Gonzalo R. Ordoñez
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Carlos López-Otín
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Tomas Vinar
- grid.7634.60000000109409708Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynska Dolina, Bratislava, Slovakia
| | - Brona Brejova
- grid.7634.60000000109409708Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynska Dolina, Bratislava, Slovakia
| | - Aakrosh Ratan
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Robert S. Harris
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Webb Miller
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Carolin Kosiol
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
| | - Heather A. Lawson
- grid.4367.60000 0001 2355 7002Department of Anatomy and Neurobiology, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Vikas Taliwal
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - André L. Martins
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Adam Siepel
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Arindam RoyChoudhury
- grid.21729.3f0000000419368729Department of Biostatistics, Columbia University, New York, New York USA
| | - Xin Ma
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Jeremiah Degenhardt
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Carlos D. Bustamante
- grid.168010.e0000000419368956Department of Genetics, Stanford University, Stanford, California USA
| | - Ryan N. Gutenkunst
- grid.134563.60000 0001 2168 186XDepartment of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona USA
| | - Thomas Mailund
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Julien Y. Dutheil
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Asger Hobolth
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Mikkel H. Schierup
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Oliver A. Ryder
- grid.452788.40000 0004 0458 5309San Diego Zoo’s Institute for Conservation Research, Escondido, California USA
| | - Yuko Yoshinaga
- grid.414016.60000 0004 0433 7727Children’s Hospital Oakland Research Institute, Oakland, California USA
| | - Pieter J. de Jong
- grid.414016.60000 0004 0433 7727Children’s Hospital Oakland Research Institute, Oakland, California USA
| | - George M. Weinstock
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Jeffrey Rogers
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Elaine R. Mardis
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Richard A. Gibbs
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Richard K. Wilson
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
|
4
|
Lapp H, Bala S, Balhoff JP, Bouck A, Goto N, Holder M, Holland R, Holloway A, Katayama T, Lewis PO, Mackey AJ, Osborne BI, Piel WH, Pond SLK, Poon AF, Qiu WG, Stajich JE, Stoltzfus A, Thierer T, Vilella AJ, Vos RA, Zmasek CM, Zwickl DJ, Vision TJ. The 2006 NESCent Phyloinformatics Hackathon: A Field Report. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300016] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/15/2022] Open
Abstract
In December, 2006, a group of 26 software developers from some of the most widely used life science programming toolkits and phylogenetic software projects converged on Durham, North Carolina, for a Phyloinformatics Hackathon, an intense five-day collaborative software coding event sponsored by the National Evolutionary Synthesis Center (NESCent). The goal was to help researchers to integrate multiple phylogenetic software tools into automated workflows. Participants addressed deficiencies in interoperability between programs by implementing “glue code” and improving support for phylogenetic data exchange standards (particularly NEXUS) across the toolkits. The work was guided by use-cases compiled in advance by both developers and users, and the code was documented as it was developed. The resulting software is freely available for both users and developers through incorporation into the distributions of several widely-used open-source toolkits. We explain the motivation for the hackathon, how it was organized, and discuss some of the outcomes and lessons learned. We conclude that hackathons are an effective mode of solving problems in software interoperability and usability, and are underutilized in scientific software development.
Affiliation(s)
- Hilmar Lapp
- National Evolutionary Synthesis Center, 2024 W. Main St., Suite A200, Durham NC 27705, U.S.A
| | - Sendu Bala
- Dunn Human Nutrition Unit, Medical Research Council, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - James P. Balhoff
- National Evolutionary Synthesis Center, 2024 W. Main St., Suite A200, Durham NC 27705, U.S.A
| | - Amy Bouck
- Department of Biology, CB 3280, University of North Carolina, Chapel Hill, NC 27599
- Department of Biology, Duke University, P.O. Box 90338, Durham, NC 27708, U.S.A
| | - Naohisa Goto
- Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565–0871, Japan
| | - Mark Holder
- School of Computational Science, 150-F Dirac Science Library, Florida State University, Tallahassee, Florida 32306–4120, U.S.A
| | - Richard Holland
- EMBL—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
| | - Alisha Holloway
- Section of Evolution and Ecology, Center for Population Biology, 3347 Storer Hall, University of California, Davis, CA 95616, U.S.A
| | - Toshiaki Katayama
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108–0071, Japan
| | - Paul O. Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Unit 3043, Storrs, CT 06269-3043, U.S.A
| | - Aaron J. Mackey
- GlaxoSmithKline, 1250 S. Collegeville Road, Collegeville, PA 19426, U.S.A
| | | | - William H. Piel
- Peabody Museum of Natural History, Yale University, 170 Whitney Ave., New Haven CT 06511, U.S.A
| | - Sergei L. Kosakovsky Pond
- University of California, San Diego, Division of Comparative Pathology and Antiviral Research Center, 150 West Washington Street, San Diego, CA 92103
| | - Art F.Y. Poon
- University of California, San Diego, Division of Comparative Pathology and Antiviral Research Center, 150 West Washington Street, San Diego, CA 92103
| | - Wei-Gang Qiu
- Department of Biological Sciences, Hunter College, City University of New York, 695 Park Ave, New York, NY 10021, U.S.A
| | - Jason E. Stajich
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, U.S.A
| | - Arlin Stoltzfus
- Biochemical Science Division, National Institute of Standards and Technology, 100 Bureau Drive, Mail Stop 8310, Gaithersburg, MD, 20899-8310
| | - Tobias Thierer
- Biomatters Ltd, Level 6, 220 Queen St, Auckland, New Zealand
| | - Albert J. Vilella
- EMBL—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
| | - Rutger A. Vos
- Department of Zoology, University of British Columbia, #2370-6270 University Blvd., Vancouver, B.C. V6T 1Z4, Canada
| | | | - Derrick J. Zwickl
- National Evolutionary Synthesis Center, 2024 W. Main St., Suite A200, Durham NC 27705, U.S.A
| | - Todd J. Vision
- National Evolutionary Synthesis Center, 2024 W. Main St., Suite A200, Durham NC 27705, U.S.A
- Department of Biology, CB 3280, University of North Carolina, Chapel Hill, NC 27599
| |
Collapse
|
5
|
Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, Vilella AJ, Searle SMJ, Amode R, Brent S, Spooner W, Kulesha E, Yates A, Flicek P. Ensembl comparative genomics resources. Database (Oxford) 2016; 2016:baw053. [PMID: 27141089 PMCID: PMC4852398 DOI: 10.1093/database/baw053] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 11/25/2022]
|
6
|
Pignatelli M, Vilella AJ, Muffato M, Gordon L, White S, Flicek P, Herrero J. ncRNA orthologies in the vertebrate lineage. Database (Oxford) 2016; 2016:bav127. [PMID: 26980512 PMCID: PMC4792531 DOI: 10.1093/database/bav127] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/02/2015] [Revised: 12/23/2015] [Accepted: 12/23/2015] [Indexed: 01/07/2023]
Abstract
Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes and evolutionary models used by these methods normally assume the positions in the protein evolve independently. However, as our appreciation for the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families represented in Rfam and for which a specific covariance model is provided. For each family ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics ('Compara') API, on our FTP site and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org.
Affiliation(s)
- Miguel Pignatelli
  - European Molecular Biology Laboratory, European Bioinformatics Institute
- Albert J Vilella
  - European Molecular Biology Laboratory, European Bioinformatics Institute
- Matthieu Muffato
  - European Molecular Biology Laboratory, European Bioinformatics Institute
- Leo Gordon
  - European Molecular Biology Laboratory, European Bioinformatics Institute
- Simon White
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Javier Herrero
  - European Molecular Biology Laboratory, European Bioinformatics Institute
  - UCL Cancer Institute, University College London, London WC1E 6BT, UK
|
7
|
Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, Vilella AJ, Searle SMJ, Amode R, Brent S, Spooner W, Kulesha E, Yates A, Flicek P. Ensembl comparative genomics resources. Database (Oxford) 2016; 2016:bav096. [PMID: 26896847 PMCID: PMC4761110 DOI: 10.1093/database/bav096] [Citation(s) in RCA: 191] [Impact Index Per Article: 23.9] [Received: 05/02/2015] [Revised: 08/10/2015] [Accepted: 09/04/2015] [Indexed: 01/08/2023]
Abstract
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.
Affiliation(s)
- Javier Herrero
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
  - Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London WC1E 6DD
- Matthieu Muffato
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Kathryn Beal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Stephen Fitzgerald
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Leo Gordon
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Miguel Pignatelli
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Albert J. Vilella
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Ridwan Amode
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
- Simon Brent
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
- William Spooner
  - Eagle Genomics Ltd., Babraham Research Campus, Cambridge, CB22 3AT, UK
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Eugene Kulesha
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
- Andrew Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
|
8
|
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 2015; 31:2032-4. [PMID: 25697820 PMCID: PMC4765878 DOI: 10.1093/bioinformatics/btv098] [Citation(s) in RCA: 1056] [Impact Index Per Article: 117.3] [Received: 12/13/2014] [Accepted: 02/10/2015] [Indexed: 11/13/2022]
Abstract
Summary: Sambamba is a high-performance robust tool and library for working with SAM, BAM and CRAM sequence alignment files; the most common file formats for aligned next generation sequencing data. Sambamba is a faster alternative to samtools that exploits multi-core processing and dramatically reduces processing time. Sambamba is being adopted at sequencing centers, not only because of its speed, but also because of additional functionality, including coverage analysis and powerful filtering capability. Availability and implementation: Sambamba is free and open source software, available under a GPLv2 license. Sambamba can be downloaded and installed from http://www.open-bio.org/wiki/Sambamba. Sambamba v0.5.0 was released with doi:10.5281/zenodo.13200. Contact: j.c.p.prins@umcutrecht.nl
Affiliation(s)
- Artem Tarasov
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia, Illumina Cambridge, Cambridge, UK, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands, Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Albert J Vilella
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia, Illumina Cambridge, Cambridge, UK, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands, Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Edwin Cuppen
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia, Illumina Cambridge, Cambridge, UK, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands, Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Isaac J Nijman
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia, Illumina Cambridge, Cambridge, UK, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands, Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Pjotr Prins
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia, Illumina Cambridge, Cambridge, UK, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands, Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands and Department of Nematology, Wageningen University, Wageningen, The Netherlands
|
9
|
Abstract
MOTIVATION: Accurate alignment of large numbers of sequences is demanding and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their full re-computation and maintaining the relative matching of existing sequences is an attractive option. Another current challenge is the extension of reference alignments with fragmented sequences, as those coming from next-generation metagenomics, that contain relatively little information. Widely used methods for alignment extension are based on profile representation of reference sequences. These do not incorporate and use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of query sequences. RESULTS: We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses. AVAILABILITY: PAGAN is written in C++, licensed under the GPL and its source code is available at http://code.google.com/p/pagan-msa.
Affiliation(s)
- Ari Löytynoja
  - EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK.
|
10
|
Murchison EP, Schulz-Trieglaff OB, Ning Z, Alexandrov LB, Bauer MJ, Fu B, Hims M, Ding Z, Ivakhno S, Stewart C, Ng BL, Wong W, Aken B, White S, Alsop A, Becq J, Bignell GR, Cheetham RK, Cheng W, Connor TR, Cox AJ, Feng ZP, Gu Y, Grocock RJ, Harris SR, Khrebtukova I, Kingsbury Z, Kowarsky M, Kreiss A, Luo S, Marshall J, McBride DJ, Murray L, Pearse AM, Raine K, Rasolonjatovo I, Shaw R, Tedder P, Tregidgo C, Vilella AJ, Wedge DC, Woods GM, Gormley N, Humphray S, Schroth G, Smith G, Hall K, Searle SMJ, Carter NP, Papenfuss AT, Futreal PA, Campbell PJ, Yang F, Bentley DR, Evers DJ, Stratton MR. Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer. Cell 2012; 148:780-91. [PMID: 22341448 PMCID: PMC3281993 DOI: 10.1016/j.cell.2011.11.065] [Citation(s) in RCA: 238] [Impact Index Per Article: 19.8] [Received: 09/09/2011] [Revised: 11/03/2011] [Accepted: 11/29/2011] [Indexed: 01/23/2023]
Abstract
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
|
11
|
Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, McCarthy S, Montgomery SH, Schwalie PC, Tang YA, Ward MC, Xue Y, Yngvadottir B, Alkan C, Andersen LN, Ayub Q, Ball EV, Beal K, Bradley BJ, Chen Y, Clee CM, Fitzgerald S, Graves TA, Gu Y, Heath P, Heger A, Karakoc E, Kolb-Kokocinski A, Laird GK, Lunter G, Meader S, Mort M, Mullikin JC, Munch K, O'Connor TD, Phillips AD, Prado-Martinez J, Rogers AS, Sajjadian S, Schmidt D, Shaw K, Simpson JT, Stenson PD, Turner DJ, Vigilant L, Vilella AJ, Whitener W, Zhu B, Cooper DN, de Jong P, Dermitzakis ET, Eichler EE, Flicek P, Goldman N, Mundy NI, Ning Z, Odom DT, Ponting CP, Quail MA, Ryder OA, Searle SM, Warren WC, Wilson RK, Schierup MH, Rogers J, Tyler-Smith C, Durbin R. Insights into hominid evolution from the gorilla genome sequence. Nature 2012; 483:169-75. [PMID: 22398555 PMCID: PMC3303130 DOI: 10.1038/nature10842] [Citation(s) in RCA: 457] [Impact Index Per Article: 38.1] [Received: 06/16/2011] [Accepted: 01/10/2012] [Indexed: 12/13/2022]
Abstract
Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago (Mya). In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
Affiliation(s)
- Aylwyn Scally
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
|
12
|
Bateman A, Agrawal S, Birney E, Bruford EA, Bujnicki JM, Cochrane G, Cole JR, Dinger ME, Enright AJ, Gardner PP, Gautheret D, Griffiths-Jones S, Harrow J, Herrero J, Holmes IH, Huang HD, Kelly KA, Kersey P, Kozomara A, Lowe TM, Marz M, Moxon S, Pruitt KD, Samuelsson T, Stadler PF, Vilella AJ, Vogel JH, Williams KP, Wright MW, Zwieb C. RNAcentral: A vision for an international database of RNA sequences. RNA 2011; 17:1941-6. [PMID: 21940779 PMCID: PMC3198587 DOI: 10.1261/rna.2750811] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 05/07/2023]
Abstract
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.
Affiliation(s)
- Alex Bateman
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
  - Corresponding author.
- Shipra Agrawal
  - Institute of Bioinformatics and Applied Biotechnology (IBAB), Bangalore 560 100, India
  - BioCOS Life Sciences Private Limited, Bangalore 560 100, India
- Ewan Birney
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Elspeth A. Bruford
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Janusz M. Bujnicki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Trojdena 4, 02-109 Warsaw, Poland
  - Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Umultowska 89, 61-614 Poznan, Poland
- Guy Cochrane
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- James R. Cole
  - Microbial Ecology Center, Michigan State University, East Lansing, Michigan 48824-1319, USA
- Marcel E. Dinger
  - Institute for Molecular Bioscience, The University of Queensland, St Lucia QLD 4072, Australia
- Anton J. Enright
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Paul P. Gardner
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Daniel Gautheret
  - Institut de Génétique et Microbiologie–UMR CNRS 8621, Université Paris-Sud–Bâtiment 400, 91405 Orsay Cedex, France
- Sam Griffiths-Jones
  - Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, M13 9PT, United Kingdom
- Jen Harrow
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Javier Herrero
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Ian H. Holmes
  - Department of Bioengineering, University of California, Berkeley, California 94720-1762, USA
- Hsien-Da Huang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu, 30050, Taiwan
- Krystyna A. Kelly
  - Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
- Paul Kersey
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Ana Kozomara
  - Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, M13 9PT, United Kingdom
- Todd M. Lowe
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Manja Marz
  - RNA Bioinformatics Group, Institute of Pharmaceutical Chemistry, Marbacher Weg 6, 35037 Marburg, Germany
- Simon Moxon
  - University of East Anglia, Norwich, NR4 7TJ, United Kingdom
- Kim D. Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA
- Tore Samuelsson
  - Department of Medical Biochemistry, University of Goteborg, Medicinareg. 9A, S-405 30 Goteborg, Sweden
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, 04009 Leipzig, Germany
- Albert J. Vilella
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Jan-Hinnerk Vogel
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Kelly P. Williams
  - Sandia National Laboratories, MS 9291, Livermore, California 94551-0969, USA
- Mathew W. Wright
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Christian Zwieb
  - Department of Biochemistry, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3901, USA
|
13
|
Abstract
A response to 2x genomes - depth does matter by MC Milinkovitch, R Helaers, E Depiereux, AC Tzika and T Gabaldón. Genome Biol 2010, 11:R16.
|
14
|
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P, Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Rios D, Ritchie GRS, Ruffier M, Schuster M, Sobral D, Spudich G, Tang YA, Trevanion S, Vandrovcova J, Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJP, Parker A, Proctor G, Vogel J, Searle SMJ. Ensembl 2011. Nucleic Acids Res 2011; 39:D800-6. [PMID: 21045057 PMCID: PMC3013672 DOI: 10.1093/nar/gkq1064] [Citation(s) in RCA: 564] [Impact Index Per Article: 43.4] [Received: 10/07/2010] [Accepted: 10/13/2010] [Indexed: 11/13/2022]
Abstract
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
Affiliation(s)
- Paul Flicek
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
15
|
Arensburger P, Megy K, Waterhouse RM, Abrudan J, Amedeo P, Antelo B, Bartholomay L, Bidwell S, Caler E, Camara F, Campbell CL, Campbell KS, Casola C, Castro MT, Chandramouliswaran I, Chapman SB, Christley S, Costas J, Eisenstadt E, Feschotte C, Fraser-Liggett C, Guigo R, Haas B, Hammond M, Hansson BS, Hemingway J, Hill SR, Howarth C, Ignell R, Kennedy RC, Kodira CD, Lobo NF, Mao C, Mayhew G, Michel K, Mori A, Liu N, Naveira H, Nene V, Nguyen N, Pearson MD, Pritham EJ, Puiu D, Qi Y, Ranson H, Ribeiro JMC, Roberston HM, Severson DW, Shumway M, Stanke M, Strausberg RL, Sun C, Sutton G, Tu ZJ, Tubio JMC, Unger MF, Vanlandingham DL, Vilella AJ, White O, White JR, Wondji CS, Wortman J, Zdobnov EM, Birren B, Christensen BM, Collins FH, Cornel A, Dimopoulos G, Hannick LI, Higgs S, Lanzaro GC, Lawson D, Lee NH, Muskavitch MAT, Raikhel AS, Atkinson PW. Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics. Science 2010; 330:86-8. [PMID: 20929810 DOI: 10.1126/science.1191864] [Citation(s) in RCA: 343] [Impact Index Per Article: 24.5] [Indexed: 01/28/2023]
Abstract
Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. The ability of C. quinquefasciatus to take blood meals from birds, livestock, and humans contributes to its ability to vector pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: Its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.
Affiliation(s)
- Peter Arensburger
- Center for Disease Vector Research, University of California Riverside, Riverside, CA 92521, USA.
16
Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Ann Blomberg L, Bouffard P, Burt DW, Crasta O, Crooijmans RPMA, Cooper K, Coulombe RA, De S, Delany ME, Dodgson JB, Dong JJ, Evans C, Frederickson KM, Flicek P, Florea L, Folkerts O, Groenen MAM, Harkins TT, Herrero J, Hoffmann S, Megens HJ, Jiang A, de Jong P, Kaiser P, Kim H, Kim KW, Kim S, Langenberger D, Lee MK, Lee T, Mane S, Marcais G, Marz M, McElroy AP, Modise T, Nefedov M, Notredame C, Paton IR, Payne WS, Pertea G, Prickett D, Puiu D, Qioa D, Raineri E, Ruffier M, Salzberg SL, Schatz MC, Scheuring C, Schmidt CJ, Schroeder S, Searle SMJ, Smith EJ, Smith J, Sonstegard TS, Stadler PF, Tafer H, Tu ZJ, Van Tassell CP, Vilella AJ, Williams KP, Yorke JA, Zhang L, Zhang HB, Zhang X, Zhang Y, Reed KM. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 2010; 8:e1000475. [PMID: 20838655 PMCID: PMC2935454 DOI: 10.1371/journal.pbio.1000475] [Citation(s) in RCA: 320] [Impact Index Per Article: 22.9] [Received: 12/21/2009] [Accepted: 07/27/2010] [Indexed: 12/11/2022] Open
Abstract
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.
Affiliation(s)
- Rami A. Dalloul
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Julie A. Long
- Animal Biosciences and Biotechnology Laboratory, USDA Agricultural Research Service, Beltsville, Maryland, United States of America
- Aleksey V. Zimin
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
- Luqman Aslam
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
- Kathryn Beal
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Le Ann Blomberg
- Animal Biosciences and Biotechnology Laboratory, USDA Agricultural Research Service, Beltsville, Maryland, United States of America
- Pascal Bouffard
- Roche Applied Science, Indianapolis, Indiana, United States of America
- David W. Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
- Oswald Crasta
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Chromatin Inc., Champaign, Illinois, United States of America
- Kristal Cooper
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Roger A. Coulombe
- Department of Veterinary Sciences, Utah State University, Logan, Utah, United States of America
- Supriyo De
- Gene Expression and Genomics Unit, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
- Mary E. Delany
- Department of Animal Science, University of California, Davis, California, United States of America
- Jerry B. Dodgson
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- Jennifer J. Dong
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
- Clive Evans
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Liliana Florea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Otto Folkerts
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Chromatin Inc., Champaign, Illinois, United States of America
- Martien A. M. Groenen
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
- Tim T. Harkins
- Roche Applied Science, Indianapolis, Indiana, United States of America
- Javier Herrero
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Steve Hoffmann
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- LIFE Project, University of Leipzig, Leipzig, Germany
- Hendrik-Jan Megens
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
- Andrew Jiang
- Department of Animal Science, University of California, Davis, California, United States of America
- Pieter de Jong
- Children's Hospital and Research Center at Oakland, Oakland, California, United States of America
- Pete Kaiser
- Institute for Animal Health, Compton, Berkshire, United Kingdom
- Heebal Kim
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
- Kyu-Won Kim
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
- Sungwon Kim
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- David Langenberger
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Mi-Kyung Lee
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
- Taeheon Lee
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
- Shrinivasrao Mane
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Guillaume Marcais
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
- Manja Marz
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Philipps-Universität Marburg, Pharmazeutische Chemie, Marburg, Germany
- Audrey P. McElroy
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Thero Modise
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Mikhail Nefedov
- Children's Hospital and Research Center at Oakland, Oakland, California, United States of America
- Cédric Notredame
- Comparative Bioinformatics, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra, Barcelona, Spain
- Ian R. Paton
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
- William S. Payne
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- Geo Pertea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Dennis Prickett
- Institute for Animal Health, Compton, Berkshire, United Kingdom
- Daniela Puiu
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Dan Qioa
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Emanuele Raineri
- Comparative Bioinformatics, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra, Barcelona, Spain
- Magali Ruffier
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Steven L. Salzberg
- Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Michael C. Schatz
- Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Chantel Scheuring
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
- Carl J. Schmidt
- Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
- Steven Schroeder
- Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
- Stephen M. J. Searle
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Edward J. Smith
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
- Tad S. Sonstegard
- Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
- Peter F. Stadler
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Fraunhofer Institut für Zelltherapie und Immunologie, Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
- Hakim Tafer
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Zhijian (Jake) Tu
- Department of Biochemistry, Virginia Tech, Blacksburg, Virginia, United States of America
- Curtis P. Van Tassell
- Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
- Animal Improvement Programs Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
- Albert J. Vilella
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Kelly P. Williams
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- James A. Yorke
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
- Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Hong-Bin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
- Xiaojun Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
- Yang Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
- Kent M. Reed
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, St. Paul, Minnesota, United States of America
17
Severin J, Beal K, Vilella AJ, Fitzgerald S, Schuster M, Gordon L, Ureta-Vidal A, Flicek P, Herrero J. eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinformatics 2010; 11:240. [PMID: 20459813 PMCID: PMC2885371 DOI: 10.1186/1471-2105-11-240] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 10/21/2009] [Accepted: 05/11/2010] [Indexed: 12/03/2022] Open
Abstract
Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results: We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
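The quadratic growth mentioned in this abstract can be illustrated with a quick count. The sketch below assumes, purely for illustration, that every unordered pair of species requires one comparison; the abstract does not state that all pairs are computed, and the function name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n_species: int) -> int:
    """Count unordered pairs among n_species, i.e. n * (n - 1) / 2.

    Illustrative assumption: one comparison per species pair.
    """
    if n_species < 0:
        raise ValueError("n_species must be non-negative")
    return n_species * (n_species - 1) // 2


if __name__ == "__main__":
    # Doubling the number of species roughly quadruples the pair count.
    for n in (25, 50, 100):
        print(n, pairwise_comparisons(n))
```

Under that assumption, the 50 species mentioned in the abstract would give 1,225 pairs, and doubling to 100 species would give 4,950.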
Affiliation(s)
- Jessica Severin
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
18
Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Künstner A, Searle S, White S, Vilella AJ, Fairley S, Heger A, Kong L, Ponting CP, Jarvis ED, Mello CV, Minx P, Lovell P, Velho TAF, Ferris M, Balakrishnan CN, Sinha S, Blatti C, London SE, Li Y, Lin YC, George J, Sweedler J, Southey B, Gunaratne P, Watson M, Nam K, Backström N, Smeds L, Nabholz B, Itoh Y, Whitney O, Pfenning AR, Howard J, Völker M, Skinner BM, Griffin DK, Ye L, McLaren WM, Flicek P, Quesada V, Velasco G, Lopez-Otin C, Puente XS, Olender T, Lancet D, Smit AFA, Hubley R, Konkel MK, Walker JA, Batzer MA, Gu W, Pollock DD, Chen L, Cheng Z, Eichler EE, Stapley J, Slate J, Ekblom R, Birkhead T, Burke T, Burt D, Scharff C, Adam I, Richard H, Sultan M, Soldatov A, Lehrach H, Edwards SV, Yang SP, Li X, Graves T, Fulton L, Nelson J, Chinwalla A, Hou S, Mardis ER, Wilson RK. The genome of a songbird. Nature 2010; 464:757-62. [PMID: 20360741 PMCID: PMC3187626 DOI: 10.1038/nature08819] [Citation(s) in RCA: 597] [Impact Index Per Article: 42.6] [Received: 09/30/2009] [Accepted: 01/06/2010] [Indexed: 01/16/2023]
Abstract
The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.
Affiliation(s)
- Wesley C Warren
- The Genome Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.
19
Kersey PJ, Lawson D, Birney E, Derwent PS, Haimel M, Herrero J, Keenan S, Kerhornou A, Koscielny G, Kähäri A, Kinsella RJ, Kulesha E, Maheswari U, Megy K, Nuhn M, Proctor G, Staines D, Valentin F, Vilella AJ, Yates A. Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res 2009; 38:D563-9. [PMID: 19884133 PMCID: PMC2808935 DOI: 10.1093/nar/gkp871] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022] Open
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is a new portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualisation platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the availability of vertebrate genomes in Ensembl. Many of the databases supporting the portal have been built in close collaboration with the scientific community, which we consider as essential for maintaining the accuracy and usefulness of the resource. A common set of user interfaces (which include a graphical genome browser, FTP, BLAST search, a query optimised data warehouse, programmatic access, and a Perl API) is provided for all domains. Data types incorporated include annotation of (protein and non-protein coding) genes, cross references to external resources, and high throughput experimental data (e.g. data from large scale studies of gene expression and polymorphism visualised in their genomic context). Additionally, extensive comparative analysis has been performed, both within defined clades and across the wider taxonomy, and sequence alignments and gene trees resulting from this can be accessed through the site.
Affiliation(s)
- P J Kersey
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
20
Abstract
Building momentum to coordinate and leverage community orthology prediction resources. Better orthology-prediction resources would be beneficial for the whole biological community. A recent meeting discussed how to coordinate and leverage current efforts.
21
Vilella AJ, Severin J, Ureta-Vidal A, Li H, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 2008; 19:327-35. [PMID: 19029536 DOI: 10.1101/gr.073585.107] [Citation(s) in RCA: 841] [Impact Index Per Article: 52.6] [Indexed: 01/17/2023]
Abstract
We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.
Affiliation(s)
- Albert J Vilella
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
22
Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y, Hériché JK, Hu Y, Kristiansen K, Li R, Liu T, Moses A, Qin J, Vang S, Vilella AJ, Ureta-Vidal A, Bolund L, Wang J, Durbin R. TreeFam: 2008 Update. Nucleic Acids Res 2007; 36:D735-40. [PMID: 18056084 PMCID: PMC2238856 DOI: 10.1093/nar/gkm1005] [Citation(s) in RCA: 243] [Impact Index Per Article: 14.3] [Indexed: 11/16/2022] Open
Abstract
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.
Affiliation(s)
- Jue Ruan
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Heng Li
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Zhongzhong Chen
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Avril Coghlan
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Lachlan James M. Coin
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Yiran Guo
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Jean-Karim Hériché
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Yafeng Hu
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Karsten Kristiansen
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Ruiqiang Li
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Tao Liu
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Alan Moses
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Junjie Qin
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Søren Vang
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Albert J. Vilella
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Abel Ureta-Vidal
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Lars Bolund
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Jun Wang
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
| | - Richard Durbin
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- *To whom correspondence should be addressed.+44 (0) 1223 834244+44 (0) 1223 494919
|
23
|
Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Gräf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kähäri A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJP, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S. Ensembl 2008. Nucleic Acids Res 2007; 36:D707-14. [PMID: 18000006 PMCID: PMC2238821 DOI: 10.1093/nar/gkm988] [Citation(s) in RCA: 370] [Impact Index Per Article: 21.8] [Indexed: 12/21/2022]
Abstract
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
Affiliation(s)
- P Flicek
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
24
|
Abstract
Background: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis.
Results: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers.
Conclusion: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Affiliation(s)
- Stephan Hutter
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain
- Department Biology II – Evolutionary Biology, University of Munich, Munich, Germany
- Albert J Vilella
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain
- Julio Rozas
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain
|
25
|
Abstract
Summary: VariScan is a software package for the analysis of DNA sequence polymorphisms at the whole genome scale. Among other features, the software (1) can conduct many population genetic analyses; (2) incorporates a multiresolution wavelet transform-based method that allows capturing relevant information from DNA polymorphism data; (3) facilitates the visualization of the results in the most commonly used genome browsers.
Affiliation(s)
- Albert J Vilella
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain
|