1
Alvarez-Jarreta J, Amos B, Aurrecoechea C, Bah S, Barba M, Barreto A, Basenko EY, Belnap R, Blevins A, Böhme U, Brestelli J, Brown S, Callan D, Campbell LI, Christophides GK, Crouch K, Davison HR, DeBarry JD, Demko R, Doherty R, Duan Y, Dundore W, Dyer S, Falke D, Fischer S, Gajria B, Galdi D, Giraldo-Calderón GI, Harb OS, Harper E, Helb D, Howington C, Hu S, Humphrey J, Iodice J, Jones A, Judkins J, Kelly SA, Kissinger JC, Kittur N, Kwon DK, Lamoureux K, Li W, Lodha D, MacCallum RM, Maslen G, McDowell MA, Myers J, Nural MV, Roos DS, Rund SSC, Shanmugasundram A, Sitnik V, Spruill D, Starns D, Tomko SS, Wang H, Warrenfeltz S, Wieck R, Wilkinson PA, Zheng J. VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center in 2023. Nucleic Acids Res 2024; 52:D808-D816. [PMID: 37953350 PMCID: PMC10767879 DOI: 10.1093/nar/gkad1003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/09/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023]
Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.
Affiliation(s)
- Beatrice Amos
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Saikou Bah
- School of Infection and Immunity, University of Glasgow, Glasgow, UK
- Ana Barreto
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Evelina Y Basenko
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Ann Blevins
- University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA 19104, USA
- Stuart Brown
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Kathryn Crouch
- School of Infection and Immunity, University of Glasgow, Glasgow, UK
- Helen R Davison
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Richard Demko
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Ryan Doherty
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Yikun Duan
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah Dyer
- European Bioinformatics Institute, Hinxton CB10 1SD, UK
- Dave Falke
- University of Georgia, Athens, GA 30602, USA
- Steve Fischer
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Bindu Gajria
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Daniel Galdi
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Omar S Harb
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Danica Helb
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Sufen Hu
- University of Pennsylvania, Philadelphia, PA 19104, USA
- John Iodice
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- John Judkins
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah A Kelly
- Imperial College London, South Kensington, London SW7 2BU, UK
- Dae Kun Kwon
- University of Notre Dame, Notre Dame, IN 46556, USA
- Wei Li
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Disha Lodha
- European Bioinformatics Institute, Hinxton CB10 1SD, UK
- Gareth Maslen
- Imperial College London, South Kensington, London SW7 2BU, UK
- Jeremy Myers
- University of Pennsylvania, Philadelphia, PA 19104, USA
- David S Roos
- University of Pennsylvania, Philadelphia, PA 19104, USA
- Achchuthan Shanmugasundram
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Genomics England Limited, London E14 5AB, UK
- Vasily Sitnik
- European Bioinformatics Institute, Hinxton CB10 1SD, UK
- David Starns
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Robert Wieck
- University of Notre Dame, Notre Dame, IN 46556, USA
- Paul A Wilkinson
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Jie Zheng
- University of Pennsylvania, Philadelphia, PA 19104, USA
2
Amos B, Aurrecoechea C, Barba M, Barreto A, Basenko E, Bażant W, Belnap R, Blevins AS, Böhme U, Brestelli J, Brunk BP, Caddick M, Callan D, Campbell L, Christensen M, Christophides G, Crouch K, Davis K, DeBarry J, Doherty R, Duan Y, Dunn M, Falke D, Fisher S, Flicek P, Fox B, Gajria B, Giraldo-Calderón GI, Harb OS, Harper E, Hertz-Fowler C, Hickman M, Howington C, Hu S, Humphrey J, Iodice J, Jones A, Judkins J, Kelly SA, Kissinger JC, Kwon DK, Lamoureux K, Lawson D, Li W, Lies K, Lodha D, Long J, MacCallum RM, Maslen G, McDowell MA, Nabrzyski J, Roos DS, Rund SC, Schulman S, Shanmugasundram A, Sitnik V, Spruill D, Starns D, Stoeckert C, Tomko SS, Wang H, Warrenfeltz S, Wieck R, Wilkinson PA, Xu L, Zheng J. VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center. Nucleic Acids Res 2022; 50:D898-D911. [PMID: 34718728 PMCID: PMC8728164 DOI: 10.1093/nar/gkab929] [Citation(s) in RCA: 185] [Impact Index Per Article: 92.5] [Received: 09/15/2020] [Revised: 09/21/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022]
Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, and improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.
Affiliation(s)
- Beatrice Amos
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Cristina Aurrecoechea
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ana Barreto
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Evelina Y Basenko
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Wojciech Bażant
- Wellcome Centre for Integrative Parasitology, University of Glasgow, Glasgow G12 8TA, UK
- Robert Belnap
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Ann S Blevins
- Department of Pathology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ulrike Böhme
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- John Brestelli
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brian P Brunk
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Mark Caddick
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Danielle Callan
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lahcen Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mikkel B Christensen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- George K Christophides
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Kathryn Crouch
- Wellcome Centre for Integrative Parasitology, University of Glasgow, Glasgow G12 8TA, UK
- Kristina Davis
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Jeremy DeBarry
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Ryan Doherty
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yikun Duan
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Michael Dunn
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Dave Falke
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Steve Fisher
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Brett Fox
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Bindu Gajria
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Gloria I Giraldo-Calderón
- Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Departamento de Ciencias Biológicas y Departamento de Ciencias Básicas Médicas, Universidad Icesi, Calle 18 No. 122-135, Cali, Colombia
- Omar S Harb
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Elizabeth Harper
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Christiane Hertz-Fowler
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Mark J Hickman
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Connor Howington
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Sufen Hu
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jay Humphrey
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- John Iodice
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew Jones
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- John Judkins
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah A Kelly
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Jessica C Kissinger
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Dae Kun Kwon
- Department of Civil & Environmental Engineering & Earth Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Kristopher Lamoureux
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Daniel Lawson
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Wei Li
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kallie Lies
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jamie Long
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Robert M MacCallum
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mary Ann McDowell
- Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Jaroslaw Nabrzyski
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- David S Roos
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Samuel S C Rund
- Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Drew Spruill
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- David Starns
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Christian J Stoeckert
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sheena Shah Tomko
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Haiming Wang
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Susanne Warrenfeltz
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Robert Wieck
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Paul A Wilkinson
- Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Lin Xu
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jie Zheng
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
3
Campos M, Rona LDP, Willis K, Christophides GK, MacCallum RM. Unravelling population structure heterogeneity within the genome of the malaria vector Anopheles gambiae. BMC Genomics 2021; 22:422. [PMID: 34103015 PMCID: PMC8185951 DOI: 10.1186/s12864-021-07722-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2020] [Accepted: 05/18/2021] [Indexed: 12/26/2022]
Abstract
Background: Whole genome re-sequencing provides powerful data for population genomic studies, allowing robust inferences of population structure, gene flow and evolutionary history. For the major malaria vector in Africa, Anopheles gambiae, other genetic aspects such as selection and adaptation are also important. In the present study, we explore population genetic variation from genome-wide sequencing of 765 An. gambiae and An. coluzzii specimens collected from across Africa. We used t-SNE, a recently popularized dimensionality reduction method, to create a 2D-map of An. gambiae and An. coluzzii genes that reflect their population structure similarities.
Results: The map allows intuitive navigation among genes distributed throughout the so-called “mainland” and numerous surrounding “island-like” gene clusters. These gene clusters of various sizes correspond predominantly to low recombination genomic regions such as inversions and centromeres, and also to recent selective sweeps. Because this mosquito species complex has been studied extensively, we were able to support our interpretations with previously published findings. Several novel observations and hypotheses are also made, including selective sweeps and a multi-locus selection event in Guinea-Bissau, a known intense hybridization zone between An. gambiae and An. coluzzii.
Conclusions: Our results present a rich dataset that could be utilized in functional investigations aiming to shed light onto An. gambiae s.l. genome evolution and eventual speciation. In addition, the methodology presented here can be used to further characterize other species not so well studied as An. gambiae, shortening the time required to progress from field sampling to the identification of genes and genomic regions under unique evolutionary processes.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07722-y.
Affiliation(s)
- Melina Campos
- Department of Life Sciences, Imperial College London, London, UK
- Luisa D P Rona
- Department of Life Sciences, Imperial College London, London, UK
- Department of Cell Biology, Embryology and Genetics, Federal University of Santa Catarina (UFSC), Florianópolis, Brazil
- National Institute of Science and Technology in Molecular Entomology, National Council for Scientific and Technological Development (INCT-EM, CNPq), Rio de Janeiro, Brazil
- Katie Willis
- Department of Life Sciences, Imperial College London, London, UK
4
Rund SSC, Braak K, Cator L, Copas K, Emrich SJ, Giraldo-Calderón GI, Johansson MA, Heydari N, Hobern D, Kelly SA, Lawson D, Lord C, MacCallum RM, Roche DG, Ryan SJ, Schigel D, Vandegrift K, Watts M, Zaspel JM, Pawar S. MIReAD, a minimum information standard for reporting arthropod abundance data. Sci Data 2019; 6:40. [PMID: 31024009 PMCID: PMC6484025 DOI: 10.1038/s41597-019-0042-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/19/2018] [Accepted: 03/20/2019] [Indexed: 11/29/2022]
Abstract
Arthropods play a dominant role in natural and human-modified terrestrial ecosystem dynamics. Spatially-explicit arthropod population time-series data are crucial for statistical or mathematical models of these dynamics and assessment of their veterinary, medical, agricultural, and ecological impacts. Such data have been collected world-wide for over a century, but remain scattered and largely inaccessible. In particular, with the ever-present and growing threat of arthropod pests and vectors of infectious diseases, there are numerous historical and ongoing surveillance efforts, but the data are not reported in consistent formats and typically lack sufficient metadata to make reuse and re-analysis possible. Here, we present the first-ever minimum information standard for arthropod abundance, Minimum Information for Reusable Arthropod Abundance Data (MIReAD). Developed with broad stakeholder collaboration, it balances sufficiency for reuse with the practicality of preparing the data for submission. It is designed to optimize data (re)usability from the "FAIR," (Findable, Accessible, Interoperable, and Reusable) principles of public data archiving (PDA). This standard will facilitate data unification across research initiatives and communities dedicated to surveillance for detection and control of vector-borne diseases and pests.
Affiliation(s)
- Samuel S C Rund
- VectorBase, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA.
- Kyle Braak
- Global Biodiversity Information Facility (GBIF) Secretariat, Copenhagen, Denmark
- Lauren Cator
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot, Berkshire, United Kingdom
- Kyle Copas
- Global Biodiversity Information Facility (GBIF) Secretariat, Copenhagen, Denmark
- Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
- Gloria I Giraldo-Calderón
- VectorBase, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
- Universidad Icesi, Facultad de Ciencias Naturales, Calle 18 No. 122-135, Cali, Colombia
- Michael A Johansson
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, 1324 Calle Cañada, San Juan, PR, USA
- Department of Epidemiology, Harvard School of Public Health, 677 Huntington Ave, Boston, MA, USA
- Naveed Heydari
- Center for Global Health and Translational Science, State University of New York Upstate Medical University, Syracuse, NY, USA
- Donald Hobern
- Global Biodiversity Information Facility (GBIF) Secretariat, Copenhagen, Denmark
- Sarah A Kelly
- VectorBase and Vector Immunogenomics and Infection Laboratory, Department of Life Sciences, Imperial College London, London, United Kingdom
- Daniel Lawson
- VectorBase and Vector Immunogenomics and Infection Laboratory, Department of Life Sciences, Imperial College London, London, United Kingdom
- Cynthia Lord
- Florida Medical Entomology Lab, University of Florida-IFAS, Vero Beach, FL, USA
- Robert M MacCallum
- VectorBase and Vector Immunogenomics and Infection Laboratory, Department of Life Sciences, Imperial College London, London, United Kingdom
- Dominique G Roche
- Institute of Biology, University of Neuchâtel, 2000, Neuchâtel, Switzerland
- Sadie J Ryan
- Quantitative Disease Ecology and Conservation Lab, Department of Geography, University of Florida, Gainesville, FL, USA
- Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
- College of Life Sciences, University of Kwa-Zulu Natal, Durban, South Africa
- Dmitry Schigel
- Global Biodiversity Information Facility (GBIF) Secretariat, Copenhagen, Denmark
- Kurt Vandegrift
- Center for Infectious Disease Dynamics, Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Matthew Watts
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot, Berkshire, United Kingdom
- Samraat Pawar
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot, Berkshire, United Kingdom
5
Mauch M, MacCallum RM, Levy M, Leroi AM. The evolution of popular music: USA 1960-2010. R Soc Open Sci 2015; 2:150081. [PMID: 26064663 PMCID: PMC4453253 DOI: 10.1098/rsos.150081] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/17/2015] [Accepted: 04/09/2015] [Indexed: 05/19/2023]
Abstract
In modern societies, cultural change seems ceaseless. The flux of fashion is especially obvious for popular music. While much has been written about the origin and evolution of pop, most claims about its history are anecdotal rather than scientific in nature. To rectify this, we investigate the US Billboard Hot 100 between 1960 and 2010. Using music information retrieval and text-mining tools, we analyse the musical properties of approximately 17 000 recordings that appeared in the charts and demonstrate quantitative trends in their harmonic and timbral properties. We then use these properties to produce an audio-based classification of musical styles and study the evolution of musical diversity and disparity, testing, and rejecting, several classical theories of cultural change. Finally, we investigate whether pop musical evolution has been gradual or punctuated. We show that, although pop music has evolved continuously, it did so with particular rapidity during three stylistic 'revolutions' around 1964, 1983 and 1991. We conclude by discussing how our study points the way to a quantitative science of cultural change.
Affiliation(s)
- Matthias Mauch
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
- Mark Levy
- Last.fm, 5-11 Lavingdon Street, London SE1 0NZ, UK
- Armand M. Leroi
- Division of Life Sciences, Imperial College London, London SW7 2AZ, UK
6
Giraldo-Calderón GI, Emrich SJ, MacCallum RM, Maslen G, Dialynas E, Topalis P, Ho N, Gesing S, Madey G, Collins FH, Lawson D. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res 2014; 43:D707-13. [PMID: 25510499 PMCID: PMC4383932 DOI: 10.1093/nar/gku1117] [Citation(s) in RCA: 433] [Impact Index Per Article: 43.3] [Indexed: 11/24/2022]
Abstract
VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/.
Affiliation(s)
- Scott J Emrich
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Robert M MacCallum
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emmanuel Dialynas
- Institute of Molecular Biology and Biotechnology (IMBB), FORTH, Vassilika Vouton, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
- Pantelis Topalis
- Institute of Molecular Biology and Biotechnology (IMBB), FORTH, Vassilika Vouton, Nikolaou Plastira 100, 70013 Heraklion, Crete, Greece
- Nicholas Ho
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Sandra Gesing
- Center for Research Computing, University of Notre Dame, Notre Dame, IN 46556, USA
- Gregory Madey
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Frank H Collins
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Daniel Lawson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
7
Neafsey DE, Waterhouse RM, Abai MR, Aganezov SS, Alekseyev MA, Allen JE, Amon J, Arcà B, Arensburger P, Artemov G, Assour LA, Basseri H, Berlin A, Birren BW, Blandin SA, Brockman AI, Burkot TR, Burt A, Chan CS, Chauve C, Chiu JC, Christensen M, Costantini C, Davidson VLM, Deligianni E, Dottorini T, Dritsou V, Gabriel SB, Guelbeogo WM, Hall AB, Han MV, Hlaing T, Hughes DST, Jenkins AM, Jiang X, Jungreis I, Kakani EG, Kamali M, Kemppainen P, Kennedy RC, Kirmitzoglou IK, Koekemoer LL, Laban N, Langridge N, Lawniczak MKN, Lirakis M, Lobo NF, Lowy E, MacCallum RM, Mao C, Maslen G, Mbogo C, McCarthy J, Michel K, Mitchell SN, Moore W, Murphy KA, Naumenko AN, Nolan T, Novoa EM, O'Loughlin S, Oringanje C, Oshaghi MA, Pakpour N, Papathanos PA, Peery AN, Povelones M, Prakash A, Price DP, Rajaraman A, Reimer LJ, Rinker DC, Rokas A, Russell TL, Sagnon N, Sharakhova MV, Shea T, Simão FA, Simard F, Slotman MA, Somboon P, Stegniy V, Struchiner CJ, Thomas GWC, Tojo M, Topalis P, Tubio JMC, Unger MF, Vontas J, Walton C, Wilding CS, Willis JH, Wu YC, Yan G, Zdobnov EM, Zhou X, Catteruccia F, Christophides GK, Collins FH, Cornman RS, Crisanti A, Donnelly MJ, Emrich SJ, Fontaine MC, Gelbart W, Hahn MW, Hansen IA, Howell PI, Kafatos FC, Kellis M, Lawson D, Louis C, Luckhart S, Muskavitch MAT, Ribeiro JM, Riehle MA, Sharakhov IV, Tu Z, Zwiebel LJ, Besansky NJ. Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 2014; 347:1258522. [PMID: 25554792 DOI: 10.1126/science.1258522] [Citation(s) in RCA: 416] [Impact Index Per Article: 41.6] [Indexed: 11/02/2022]
Abstract
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.
Affiliation(s)
- Daniel E Neafsey
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA.
| | - Robert M Waterhouse
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA. Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Mohammad R Abai
- Department of Medical Entomology and Vector Control, School of Public Health and Institute of Health Researches, Tehran University of Medical Sciences, Tehran, Iran
| | - Sergey S Aganezov
- George Washington University, Department of Mathematics and Computational Biology Institute, 45085 University Drive, Ashburn, VA 20147, USA
| | - Max A Alekseyev
- George Washington University, Department of Mathematics and Computational Biology Institute, 45085 University Drive, Ashburn, VA 20147, USA
| | - James E Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - James Amon
- National Vector Borne Disease Control Programme, Ministry of Health, Tafea Province, Vanuatu
| | - Bruno Arcà
- Department of Public Health and Infectious Diseases, Division of Parasitology, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Peter Arensburger
- Department of Biological Sciences, California State Polytechnic-Pomona, 3801 West Temple Avenue, Pomona, CA 91768, USA
| | - Gleb Artemov
- Tomsk State University, 36 Lenina Avenue, Tomsk, Russia
| | - Lauren A Assour
- Department of Computer Science and Engineering, Eck Institute for Global Health, 211B Cushing Hall, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Hamidreza Basseri
- Department of Medical Entomology and Vector Control, School of Public Health and Institute of Health Researches, Tehran University of Medical Sciences, Tehran, Iran
| | - Aaron Berlin
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Bruce W Birren
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Stephanie A Blandin
- Inserm, U963, F-67084 Strasbourg, France. CNRS, UPR9022, IBMC, F-67084 Strasbourg, France
| | - Andrew I Brockman
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Thomas R Burkot
- Faculty of Medicine, Health and Molecular Science, Australian Institute of Tropical Health Medicine, James Cook University, Cairns 4870, Australia
| | - Austin Burt
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Clara S Chan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Joanna C Chiu
- Department of Entomology and Nematology, One Shields Avenue, University of California-Davis, Davis, CA 95616, USA
| | - Mikkel Christensen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlo Costantini
- Institut de Recherche pour le Développement, Unités Mixtes de Recherche Maladies Infectieuses et Vecteurs Écologie, Génétique, Évolution et Contrôle, 911, Avenue Agropolis, BP 64501 Montpellier, France
| | - Victoria L M Davidson
- Division of Biology, Kansas State University, 271 Chalmers Hall, Manhattan, KS 66506, USA
| | - Elena Deligianni
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Nikolaou Plastira 100 GR-70013, Heraklion, Crete, Greece
| | - Tania Dottorini
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Vicky Dritsou
- Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Stacey B Gabriel
- Genomics Platform, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Wamdaogo M Guelbeogo
- Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou 01 BP 2208, Burkina Faso
| | - Andrew B Hall
- Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Mira V Han
- School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
| | - Thaung Hlaing
- Department of Medical Research, No. 5 Ziwaka Road, Dagon Township, Yangon 11191, Myanmar
| | - Daniel S T Hughes
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
| | - Adam M Jenkins
- Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
| | - Xiaofang Jiang
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Irwin Jungreis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Evdoxia G Kakani
- Harvard School of Public Health, Department of Immunology and Infectious Diseases, Boston, MA 02115, USA. Dipartimento di Medicina Sperimentale e Scienze Biochimiche, Università degli Studi di Perugia, Perugia, Italy
| | - Maryam Kamali
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Petri Kemppainen
- Computational Evolutionary Biology Group, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Ryan C Kennedy
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Ioannis K Kirmitzoglou
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. Bioinformatics Research Laboratory, Department of Biological Sciences, New Campus, University of Cyprus, CY 1678 Nicosia, Cyprus
| | - Lizette L Koekemoer
- Wits Research Institute for Malaria, Faculty of Health Sciences, and Vector Control Reference Unit, National Institute for Communicable Diseases of the National Health Laboratory Service, Sandringham 2131, Johannesburg, South Africa
| | - Njoroge Laban
- National Museums of Kenya, P.O. Box 40658-00100, Nairobi, Kenya
| | - Nicholas Langridge
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mara K N Lawniczak
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Manolis Lirakis
- Department of Biology, University of Crete, 700 13 Heraklion, Greece
| | - Neil F Lobo
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA
| | - Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert M MacCallum
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Chunhong Mao
- Virginia Bioinformatics Institute, 1015 Life Science Circle, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Charles Mbogo
- Kenya Medical Research Institute-Wellcome Trust Research Programme, Centre for Geographic Medicine Research - Coast, P.O. Box 230-80108, Kilifi, Kenya
| | - Jenny McCarthy
- Department of Biological Sciences, California State Polytechnic-Pomona, 3801 West Temple Avenue, Pomona, CA 91768, USA
| | - Kristin Michel
- Division of Biology, Kansas State University, 271 Chalmers Hall, Manhattan, KS 66506, USA
| | - Sara N Mitchell
- Harvard School of Public Health, Department of Immunology and Infectious Diseases, Boston, MA 02115, USA
| | - Wendy Moore
- Department of Entomology, 1140 East South Campus Drive, Forbes 410, University of Arizona, Tucson, AZ 85721, USA
| | - Katherine A Murphy
- Department of Entomology and Nematology, One Shields Avenue, University of California-Davis, Davis, CA 95616, USA
| | - Anastasia N Naumenko
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Tony Nolan
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Eva M Novoa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Samantha O'Loughlin
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Chioma Oringanje
- Department of Entomology, 1140 East South Campus Drive, Forbes 410, University of Arizona, Tucson, AZ 85721, USA
| | - Mohammad A Oshaghi
- Department of Medical Entomology and Vector Control, School of Public Health and Institute of Health Researches, Tehran University of Medical Sciences, Tehran, Iran
| | - Nazzy Pakpour
- Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
| | - Philippos A Papathanos
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Ashley N Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Michael Povelones
- Department of Pathobiology, University of Pennsylvania School of Veterinary Medicine, 3800 Spruce Street, Philadelphia, PA 19104, USA
| | - Anil Prakash
- Regional Medical Research Centre NE, Indian Council of Medical Research, P.O. Box 105, Dibrugarh-786 001, Assam, India
| | - David P Price
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA. Molecular Biology Program, New Mexico State University, Las Cruces, NM 88003, USA
| | - Ashok Rajaraman
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Lisa J Reimer
- Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
| | - David C Rinker
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37235, USA
| | - Antonis Rokas
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37235, USA. Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
| | - Tanya L Russell
- Faculty of Medicine, Health and Molecular Science, Australian Institute of Tropical Health Medicine, James Cook University, Cairns 4870, Australia
| | - N'Fale Sagnon
- Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou 01 BP 2208, Burkina Faso
| | - Maria V Sharakhova
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Terrance Shea
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Felipe A Simão
- Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Frederic Simard
- Institut de Recherche pour le Développement, Unités Mixtes de Recherche Maladies Infectieuses et Vecteurs Écologie, Génétique, Évolution et Contrôle, 911, Avenue Agropolis, BP 64501 Montpellier, France
| | - Michel A Slotman
- Department of Entomology, Texas A&M University, College Station, TX 77807, USA
| | - Pradya Somboon
- Department of Parasitology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
| | | | - Claudio J Struchiner
- Fundação Oswaldo Cruz, Avenida Brasil 4365, RJ Brazil. Instituto de Medicina Social, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Gregg W C Thomas
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
| | - Marta Tojo
- Department of Physiology, School of Medicine, Center for Research in Molecular Medicine and Chronic Diseases, Instituto de Investigaciones Sanitarias, University of Santiago de Compostela, Santiago de Compostela, A Coruña, Spain
| | - Pantelis Topalis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Nikolaou Plastira 100 GR-70013, Heraklion, Crete, Greece
| | - José M C Tubio
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Maria F Unger
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA
| | - John Vontas
- Department of Biology, University of Crete, 700 13 Heraklion, Greece
| | - Catherine Walton
- Computational Evolutionary Biology Group, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Craig S Wilding
- School of Natural Sciences and Psychology, Liverpool John Moores University, Liverpool L3 3AF, UK
| | - Judith H Willis
- Department of Cellular Biology, University of Georgia, Athens, GA 30602, USA
| | - Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA. Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Guiyun Yan
- Program in Public Health, College of Health Sciences, University of California, Irvine, Hewitt Hall, Irvine, CA 92697, USA
| | - Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Xiaofan Zhou
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
| | - Flaminia Catteruccia
- Harvard School of Public Health, Department of Immunology and Infectious Diseases, Boston, MA 02115, USA. Dipartimento di Medicina Sperimentale e Scienze Biochimiche, Università degli Studi di Perugia, Perugia, Italy
| | - George K Christophides
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Frank H Collins
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA
| | - Robert S Cornman
- Department of Cellular Biology, University of Georgia, Athens, GA 30602, USA
| | - Andrea Crisanti
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Martin J Donnelly
- Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK. Malaria Programme, Wellcome Trust Sanger Institute, Cambridge CB10 1SJ, UK
| | - Scott J Emrich
- Department of Computer Science and Engineering, Eck Institute for Global Health, 211B Cushing Hall, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Michael C Fontaine
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA. Centre of Evolutionary and Ecological Studies (Marine Evolution and Conservation group), University of Groningen, Nijenborgh 7, NL-9747 AG Groningen, Netherlands
| | - William Gelbart
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA. School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
| | - Immo A Hansen
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA. Molecular Biology Program, New Mexico State University, Las Cruces, NM 88003, USA
| | - Paul I Howell
- Centers for Disease Control and Prevention, 1600 Clifton Road NE MSG49, Atlanta, GA 30329, USA
| | - Fotis C Kafatos
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Daniel Lawson
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christos Louis
- Department of Biology, University of Crete, 700 13 Heraklion, Greece. Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Nikolaou Plastira 100 GR-70013, Heraklion, Crete, Greece. Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Shirley Luckhart
- Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
| | - Marc A T Muskavitch
- Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA. Biogen Idec, 14 Cambridge Center, Cambridge, MA 02142, USA
| | - José M Ribeiro
- Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, 12735 Twinbrook Parkway, Rockville, MD 20852, USA
| | - Michael A Riehle
- Department of Entomology, 1140 East South Campus Drive, Forbes 410, University of Arizona, Tucson, AZ 85721, USA
| | - Igor V Sharakhov
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Zhijian Tu
- Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Laurence J Zwiebel
- Departments of Biological Sciences and Pharmacology, Institutes for Chemical Biology, Genetics and Global Health, Vanderbilt University and Medical Center, Nashville, TN 37235, USA
| | - Nora J Besansky
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA.
|
8
|
Nelson DW, Rudehill A, MacCallum RM, Holst A, Wanecek M, Weitzberg E, Bellander BM. Multivariate outcome prediction in traumatic brain injury with focus on laboratory values. J Neurotrauma 2012; 29:2613-24. [PMID: 22994879 DOI: 10.1089/neu.2012.2468] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open
Abstract
Traumatic brain injury (TBI) is a major cause of morbidity and mortality. Identifying factors relevant to outcome can provide a better understanding of TBI pathophysiology, in addition to aiding prognostication. Many common laboratory variables have been related to outcome but may not be independent predictors in a multivariate setting. In this study, 757 patients were identified in the Karolinska TBI database who had retrievable early laboratory variables. These were analyzed towards a dichotomized Glasgow Outcome Scale (GOS) with logistic regression and relevance vector machines, a non-linear machine learning method, univariately and controlled for the known important predictors in TBI outcome: age, Glasgow Coma Score (GCS), pupil response, and computed tomography (CT) score. Accuracy was assessed with Nagelkerke's pseudo R². Of the 18 investigated laboratory variables, 15 were found significant (p<0.05) towards outcome in univariate analyses. In contrast, when adjusting for other predictors, few remained significant. Creatinine was found an independent predictor of TBI outcome. Glucose, albumin, and osmolarity levels were also identified as predictors, depending on analysis method. A worse outcome related to increasing osmolarity may warrant further study. Importantly, hemoglobin was not found significant when adjusted for post-resuscitation GCS as opposed to an admission GCS, and timing of GCS can thus have a major impact on conclusions. In total, laboratory variables added an additional 1.3-4.4% to pseudo R².
Affiliation(s)
- David W Nelson
- Department of Physiology and Pharmacology, Section of Anesthesiology and Intensive Care, Karolinska Institutet, Stockholm, Sweden.
|
9
|
Martínez-Barnetche J, Gómez-Barreto RE, Ovilla-Muñoz M, Téllez-Sosa J, López DEG, Dinglasan RR, Mohien CU, MacCallum RM, Redmond SN, Gibbons JG, Rokas A, Machado CA, Cazares-Raga FE, González-Cerón L, Hernández-Martínez S, López MHR. Transcriptome of the adult female malaria mosquito vector Anopheles albimanus. BMC Genomics 2012; 13:207. [PMID: 22646700 PMCID: PMC3442982 DOI: 10.1186/1471-2164-13-207] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 02/15/2012] [Accepted: 05/30/2012] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Human Malaria is transmitted by mosquitoes of the genus Anopheles. Transmission is a complex phenomenon involving biological and environmental factors of humans, parasites and mosquitoes. Among more than 500 anopheline species, only a few species from different branches of the mosquito evolutionary tree transmit malaria, suggesting that their vectorial capacity has evolved independently. Anopheles albimanus (subgenus Nyssorhynchus) is an important malaria vector in the Americas. The divergence time between Anopheles gambiae, the main malaria vector in Africa, and the Neotropical vectors has been estimated to be 100 My. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to explore the mosquito biology beyond the An. gambiae complex.
RESULTS We sequenced the transcriptome of the An. albimanus adult female. By combining Sanger, 454 and Illumina sequences from cDNA libraries derived from the midgut, cuticular fat body, dorsal vessel, salivary gland and whole body, we generated a single, high-quality assembly containing 16,669 transcripts, 92% of which mapped to the An. darlingi genome and covered 90% of the core eukaryotic genome. Bidirectional comparisons between the An. gambiae, An. darlingi and An. albimanus predicted proteomes allowed the identification of 3,772 putative orthologs. More than half of the transcripts had a match to proteins in other insect vectors and had an InterPro annotation. We identified several protein families that may be relevant to the study of Plasmodium-mosquito interaction. An open source transcript annotation browser called GDAV (Genome-Delinked Annotation Viewer) was developed to facilitate public access to the data generated by this and future transcriptome projects.
CONCLUSIONS We have explored the adult female transcriptome of one important New World malaria vector, An. albimanus. We identified protein-coding transcripts involved in biological processes that may be relevant to the Plasmodium lifecycle and can serve as the starting point for searching targets for novel control strategies. Our data increase the available genomic information regarding An. albimanus several hundred-fold, and will facilitate molecular research in medical entomology, evolutionary biology, genomics and proteomics of anopheline mosquito vectors. The data reported in this manuscript is accessible to the community via the VectorBase website (http://www.vectorbase.org/Other/AdditionalOrganisms/).
Affiliation(s)
- Jesús Martínez-Barnetche
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
| | - Rosa E Gómez-Barreto
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
| | - Marbella Ovilla-Muñoz
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
| | - Juan Téllez-Sosa
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
| | - David E García López
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
| | - Rhoel R Dinglasan
- Johns Hopkins Bloomberg School of Public Health. Department of Molecular Microbiology & Immunology, Johns Hopkins Malaria Research Institute, Baltimore, MD, 21205, USA
| | - Ceereena Ubaida Mohien
- Johns Hopkins Bloomberg School of Public Health. Department of Molecular Microbiology & Immunology, Johns Hopkins Malaria Research Institute, Baltimore, MD, 21205, USA
- Department of Molecular & Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Robert M MacCallum
- Division of Cell and Molecular Biology, Department of Life Sciences, Imperial College London, London, United Kingdom
| | - Seth N Redmond
- Pasteur Institut, 28 Rue Du Docteur Roux, Paris, 75015, France
| | - John G Gibbons
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Carlos A Machado
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Febe E Cazares-Raga
- Departamento de Infectómica y Patogénesis Molecular, Cinvestav-IPN, México, DF, México
| | - Lilia González-Cerón
- Centro Regional de Investigación en Salud Pública, Instituto Nacional de Salud Pública, Tapachula, Chiapas, México
| | - Salvador Hernández-Martínez
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
| | - Mario H Rodríguez López
- Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
|
10
|
Chaerkady R, Kelkar DS, Muthusamy B, Kandasamy K, Dwivedi SB, Sahasrabuddhe NA, Kim MS, Renuse S, Pinto SM, Sharma R, Pawar H, Sekhar NR, Mohanty AK, Getnet D, Yang Y, Zhong J, Dash AP, MacCallum RM, Delanghe B, Mlambo G, Kumar A, Keshava Prasad TS, Okulate M, Kumar N, Pandey A. A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry. Genome Res 2011; 21:1872-81. [PMID: 21795387 DOI: 10.1101/gr.127951.111] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 01/10/2023]
Abstract
Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
Affiliation(s)
- Raghothama Chaerkady
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry, Johns Hopkins University, Baltimore, Maryland 21205, USA
|
11
|
Nelson DW, Thornquist B, MacCallum RM, Nyström H, Holst A, Rudehill A, Wanecek M, Bellander BM, Weitzberg E. Analyses of cerebral microdialysis in patients with traumatic brain injury: relations to intracranial pressure, cerebral perfusion pressure and catheter placement. BMC Med 2011; 9:21. [PMID: 21366904 PMCID: PMC3056807 DOI: 10.1186/1741-7015-9-21] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 12/08/2010] [Accepted: 03/02/2011] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Cerebral microdialysis (MD) is used to monitor local brain chemistry of patients with traumatic brain injury (TBI). Despite an extensive literature on cerebral MD in the clinical setting, it remains unclear how individual levels of real-time MD data are to be interpreted. Intracranial pressure (ICP) and cerebral perfusion pressure (CPP) are important continuous brain monitors in neurointensive care. They are used as surrogate monitors of cerebral blood flow and have an established relation to outcome. The purpose of this study was to investigate the relations between MD parameters and ICP and/or CPP in patients with TBI.
METHODS Cerebral MD, ICP and CPP were monitored in 90 patients with TBI. Data were extensively analyzed, using over 7,350 samples of complete (hourly) MD data sets (glucose, lactate, pyruvate and glycerol) to seek representations of ICP, CPP and MD that were best correlated. MD catheter positions were located on computed tomography scans as pericontusional or nonpericontusional. MD markers were analyzed for correlations to ICP and CPP using time series regression analysis, mixed effects models and nonlinear (artificial neural networks) computer-based pattern recognition methods.
RESULTS Despite much data indicating highly perturbed metabolism, MD shows weak correlations to ICP and CPP. In contrast, the autocorrelation of MD is high for all markers, even at up to 30 future hours. Consequently, subject identity alone explains 52% to 75% of MD marker variance. This indicates that the dominant metabolic processes monitored with MD are long-term, spanning days or longer. In comparison, short-term (differenced or Δ) changes of MD vs. CPP are significantly correlated in pericontusional locations, but with less than 1% explained variance. Moreover, CPP and ICP were significantly related to outcome based on Glasgow Outcome Scale scores, while no significant relations were found between outcome and MD.
CONCLUSIONS The multitude of highly perturbed local chemistry seen with MD in patients with TBI predominately represents long-term metabolic patterns and is weakly correlated to ICP and CPP. This suggests that disturbances other than pressure and/or flow have a dominant influence on MD levels in patients with TBI.
Affiliation(s)
- David W Nelson
- Neurointensive Care Unit, Karolinska University Hospital, Stockholm, Sweden.
|
12
|
Nelson DW, Nyström H, MacCallum RM, Thornquist B, Lilja A, Bellander BM, Rudehill A, Wanecek M, Weitzberg E. Extended analysis of early computed tomography scans of traumatic brain injured patients and relations to outcome. J Neurotrauma 2010; 27:51-64. [PMID: 19698072 DOI: 10.1089/neu.2009.0986] [Citation(s) in RCA: 104] [Impact Index Per Article: 7.4] [Indexed: 11/13/2022]
Abstract
Traumatic brain injury (TBI) is responsible for up to 45% of in-hospital trauma mortality. Computed tomography (CT) is central to acute TBI diagnostics, and millions of brain CT scans are conducted yearly worldwide. Though many studies have addressed individual predictors of outcome from findings on CT scans, few have done so from a multivariate perspective. As these parameters are interrelated in a complex manner, there is a need for a better understanding of them in this context. CT scans from 861 TBI patients were reviewed according to an extensive protocol. An extended analysis of CT parameters with respect to outcome was performed using linear and non-linear methods. We identified complex interactions and mutual information in many of the parameters. Variables predicting death differ from those predicting unfavorable versus favorable outcomes (Glasgow Outcome Scale scores of 1-3 versus 4-5 [GOS]). The most important parameter for prediction of unfavorable outcome is the magnitude of midline shift. In fact, this parameter, as a continuous variable, is by itself a better predictor and is better calibrated than the Marshall CT score, even for predicting death. In addition, hematoma volumes are nearly co-linear with midline shift and can be substituted for it. A score of traumatic subarachnoid/intraventricular blood components adds substantially to model calibration. A CT scoring system geared toward dichotomous GOS scores is suggested. CT parameters were found to add 6-10% additional estimated explained variance in the presence of the important clinical variables of age, Glasgow Coma Scale score, and pupillary response. Finally we present a practical clinical "rule of thumb" to help predict the probability of unfavorable outcome using clinical and CT variables.
Affiliation(s)
- David W Nelson
- Department of Physiology and Pharmacology, Karolinska Institute, Stockholm, Sweden.
|
13
|
Lawson D, Arensburger P, Atkinson P, Besansky NJ, Bruggner RV, Butler R, Campbell KS, Christophides GK, Christley S, Dialynas E, Hammond M, Hill CA, Konopinski N, Lobo NF, MacCallum RM, Madey G, Megy K, Meyer J, Redmond S, Severson DW, Stinson EO, Topalis P, Birney E, Gelbart WM, Kafatos FC, Louis C, Collins FH. VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Res 2008; 37:D583-7. [PMID: 19028744 PMCID: PMC2686483 DOI: 10.1093/nar/gkn857] [Citation(s) in RCA: 204] [Impact Index Per Article: 12.8] [Indexed: 11/19/2022]
Abstract
VectorBase (http://www.vectorbase.org) is an NIAID-funded Bioinformatic Resource Center focused on invertebrate vectors of human pathogens. VectorBase annotates and curates vector genomes providing a web accessible integrated resource for the research community. Currently, VectorBase contains genome information for three mosquito species: Aedes aegypti, Anopheles gambiae and Culex quinquefasciatus, a body louse Pediculus humanus and a tick species Ixodes scapularis. Since our last report VectorBase has initiated a community annotation system, a microarray and gene expression repository and controlled vocabularies for anatomy and insecticide resistance. We have continued to develop both the software infrastructure and tools for interrogating the stored data.
Affiliation(s)
- Daniel Lawson
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
|
14
|
Abstract
MOTIVATION: Current approaches to contact map prediction in proteins have focused on amino acid conservation and patterns of mutation at sequentially distant positions. This sequence information is poorly understood and very little progress has been made in this area during recent years.
RESULTS: In this study, an observation of 'striped' sequence patterns across beta-sheets prompted the development of a new type of contact map predictor. Computer program code was evolved with an evolutionary algorithm (genetic programming) to select residues and residue pairs likely to make contacts based solely on local sequence patterns extracted with the help of self-organizing maps. The mean prediction accuracy is 27% on a validation set of 156 domains up to 400 residues in length, where contacts are separated by at least 8 residues and length/10 pairs are predicted. The retrospective accuracy on a set of 15 CASP5 targets is 27% and 14% for length/10 and length/2 predicted pairs, respectively (both using a minimum residue separation of 24). This compares favourably to the equivalent 21% and 13% obtained for the best automated contact prediction methods at CASP5. The results suggest that protein architectures impose regularities in local sequence environments. Other sources of information, such as correlated/compensatory mutations, may further improve accuracy.
AVAILABILITY: A web-based prediction service is available at http://www.sbc.su.se/~maccallr/contactmaps
Affiliation(s)
- Robert M MacCallum
- Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden.
|
15
|
Koutsos AC, Blass C, Meister S, Schmidt S, MacCallum RM, Soares MB, Collins FH, Benes V, Zdobnov E, Kafatos FC, Christophides GK. Life cycle transcriptome of the malaria mosquito Anopheles gambiae and comparison with the fruitfly Drosophila melanogaster. Proc Natl Acad Sci U S A 2007; 104:11304-9. [PMID: 17563388 PMCID: PMC2040894 DOI: 10.1073/pnas.0703988104] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022]
Abstract
The African mosquito Anopheles gambiae is the major vector of human malaria. We report a genome-wide survey of mosquito gene expression profiles clustered temporally into developmental programs and spatially into adult tissue-specific patterns. Global expression analysis shows that genes that belong to related functional categories or that encode the same or functionally linked protein domains are associated with characteristic developmental programs or tissue patterns. Comparative analysis of our data together with published data from Drosophila melanogaster reveals an overall strong and positive correlation of developmental expression between orthologous genes. The degree of correlation varies, depending on association of orthologs with certain developmental programs or functional groups. Interestingly, the similarity of gene expression is not correlated with the coding sequence similarity of orthologs, indicating that expression profiles and coding sequences evolve independently. In addition to providing a comprehensive view of temporal and spatial gene expression during the A. gambiae life cycle, this large-scale comparative transcriptomic analysis has detected important evolutionary features of insect transcriptomes.
Affiliation(s)
- Anastasios C. Koutsos
- Division of Cell and Molecular Biology, Faculty of Natural Sciences, Imperial College London, SW7 2AZ London, United Kingdom
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany

- Claudia Blass
- Division of Cell and Molecular Biology, Faculty of Natural Sciences, Imperial College London, SW7 2AZ London, United Kingdom
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany

- Stephan Meister
- Division of Cell and Molecular Biology, Faculty of Natural Sciences, Imperial College London, SW7 2AZ London, United Kingdom
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany

- Sabine Schmidt
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany

- Robert M. MacCallum
- Division of Cell and Molecular Biology, Faculty of Natural Sciences, Imperial College London, SW7 2AZ London, United Kingdom

- Frank H. Collins
- Center for Tropical Disease Research and Training, University of Notre Dame, Notre Dame, IN 46556

- Vladimir Benes
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany

- Evgeny Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland

- Fotis C. Kafatos
- Division of Cell and Molecular Biology, Faculty of Natural Sciences, Imperial College London, SW7 2AZ London, United Kingdom
- To whom correspondence may be addressed at: Division of Cell and Molecular Biology, Imperial College London, South Kensington Campus, Room 6167, Sir Alexander Fleming Building, SW7 2AZ London, United Kingdom.

- George K. Christophides
- Division of Cell and Molecular Biology, Faculty of Natural Sciences, Imperial College London, SW7 2AZ London, United Kingdom
- To whom correspondence may be addressed at: Division of Cell and Molecular Biology, Imperial College London, South Kensington Campus, Room 6167, Sir Alexander Fleming Building, SW7 2AZ London, United Kingdom.
|
16
|
Abstract
NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression matching and multiple program classifiers induced by genetic programming. A likelihood score is derived from the programs for each input sequence and each residue position. Different forms of visualization are provided to assist the detection of nuclear localization signals (NLSs). The NucPred server also provides access to additional sources of biological information (real and predicted) for a better validation and interpretation of results.
AVAILABILITY: The web interface to the NucPred tool is provided at http://www.sbc.su.se/~maccallr/nucpred. In addition, the Perl code is made freely available under the GNU Public Licence (GPL) for simple incorporation into other tools and web servers.
Affiliation(s)
- Markus Brameier
- Bioinformatics Research Center (BiRC), University of Aarhus, Aarhus C, Denmark
|
17
|
Ohlson T, Aggarwal V, Elofsson A, MacCallum RM. Improved alignment quality by combining evolutionary information, predicted secondary structure and self-organizing maps. BMC Bioinformatics 2006; 7:357. [PMID: 16869963 PMCID: PMC1562450 DOI: 10.1186/1471-2105-7-357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/04/2006] [Accepted: 07/25/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is plausible that prediction of other structural features could also provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment.
RESULTS: The addition of self-organizing map locations as inputs to a profile-profile scoring function improves the alignment quality of distantly related proteins slightly. The improvement is slightly smaller than that gained from the inclusion of predicted secondary structure. However, the information seems to be complementary as the two prediction schemes can be combined to improve the alignment quality by a further small but significant amount.
CONCLUSION: It has been observed in many studies that predicted secondary structure significantly improves the alignments. Here we have shown that the addition of self-organizing map locations can further improve the alignments as the self-organizing map locations seem to contain some information that is not captured by the predicted secondary structure.
Affiliation(s)
- Tomas Ohlson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden

- Varun Aggarwal
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden

- Robert M MacCallum
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Division of Cell and Molecular Biology, Imperial College London, London, UK
|
18
|
Abstract
Here we present the evaluation results of the Critical Assessment of Protein Structure Prediction (CASP6) contact prediction category. Contact prediction was assessed with standard measures well known in the field and the performance of specialist groups was evaluated alongside groups that submitted models with 3D coordinates. The evaluation was mainly focused on long range contact predictions for the set of new fold targets, although we analyzed predictions for all targets. Three groups with similar levels of accuracy and coverage performed a little better than the others. Comparisons of the predictions of the three best methods with those of CASP5/CAFASP3 suggested some improvement, although there were not enough targets in the comparisons to make this statistically significant.
Affiliation(s)
- Osvaldo Graña
- Protein Design Group, Centro Nacional de Biotecnologia (CNB-CSIC), C/Darwin 3, Cantoblanco, Madrid, Spain
|
19
|
Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. The proteome: structure, function and evolution. Philos Trans R Soc Lond B Biol Sci 2006; 361:441-51. [PMID: 16524832 PMCID: PMC1609342 DOI: 10.1098/rstb.2005.1802] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family.
Affiliation(s)
- Keiran Fleming
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK

- Lawrence A Kelley
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK

- Suhail A Islam
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK

- Robert M MacCallum
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK

- Arne Muller
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK

- Florencio Pazos
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK

- Michael J.E Sternberg
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Author for correspondence
|
20
|
Brameier M, Haan J, Krings A, MacCallum RM. Automatic discovery of cross-family sequence features associated with protein function. BMC Bioinformatics 2006; 7:16. [PMID: 16409628 PMCID: PMC1395344 DOI: 10.1186/1471-2105-7-16] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/18/2005] [Accepted: 01/12/2006] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed.
RESULTS: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location.
CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.
Affiliation(s)
- Markus Brameier
- Stockholm Bioinformatics Center, Stockholm University, 106 91 Stockholm, Sweden
- Bioinformatics Research Center, University of Aarhus, 8000 Aarhus C, Denmark

- Josien Haan
- Stockholm Bioinformatics Center, Stockholm University, 106 91 Stockholm, Sweden

- Andrea Krings
- Stockholm Bioinformatics Center, Stockholm University, 106 91 Stockholm, Sweden

- Robert M MacCallum
- Stockholm Bioinformatics Center, Stockholm University, 106 91 Stockholm, Sweden
- Division of Cell and Molecular Biology, Imperial College London, London SW7 2AZ, UK
|
21
|
Fleming K, Müller A, MacCallum RM, Sternberg MJE. 3D-GENOMICS: a database to compare structural and functional annotations of proteins between sequenced genomes. Nucleic Acids Res 2004; 32:D245-50. [PMID: 14681404 PMCID: PMC308798 DOI: 10.1093/nar/gkh064] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
The 3D-GENOMICS database (http://www.sbg.bio.ic.ac.uk/3dgenomics/) provides structural annotations for proteins from sequenced genomes. In August 2003 the database included data for 93 proteomes. The annotations stored in the database include homologous sequences from various sequence databases, domains from SCOP and Pfam, patterns from Prosite and other predicted sequence features such as transmembrane regions and coiled coils. In addition to annotations at the sequence level, several precomputed cross-proteome comparative analyses are available based on SCOP domain superfamily composition. Annotations are available to the user via a web interface to the database. Multiple points of entry are available so that a user is able to: (i) directly access annotations for a single protein sequence via keywords or accession codes, (ii) examine a sequence of interest chosen from a summary of annotations for a particular proteome, or (iii) access precomputed frequency-based cross-proteome comparative analyses.
Affiliation(s)
- Keiran Fleming
- Department of Biological Sciences and Centre for Bioinformatics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
|
22
|
Abstract
MOTIVATION: Graphical representations of proteins in online databases generally give default views orthogonal to the PDB file coordinate system. These views are often uninformative in terms of protein structure and/or function. Here we discuss the development of a simple automatic algorithm to provide a 'good' view of a protein domain with respect to its structural features.
RESULTS: We used dimension reduction with the preservation of topology (using Kohonen's self organising map) to map 3D carbon alpha coordinates into 2D. The original protein structure was then rotated to the view which corresponded most closely to the 2D mapping. This procedure, which we call OVOP, was evaluated in a public blind trial on the web against random views and a 'flattest' view. The OVOP views were consistently rated 'better' than the other views by our volunteers.
AVAILABILITY: The source code is available from the OVOP homepage: http://www.sbc.su.se/~oscar/ovop.
Affiliation(s)
- Oscar Sverud
- Stockholm Bioinformatics Center, Stockholm University, Sweden
|
23
|
MacCallum RM. Introducing a Perl Genetic Programming System - and Can Meta-evolution Solve the Bloat Problem? Lecture Notes in Computer Science 2003. [DOI: 10.1007/3-540-36599-0_34] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/09/2023]
|
24
|
Abstract
This paper reports an analysis of the encoded proteins (the proteome) of the genomes of human, fly, worm, yeast, and representatives of bacteria and archaea in terms of the three-dimensional structures of their globular domains together with a general sequence-based study. We show that 39% of the human proteome can be assigned to known structures. We estimate that for 77% of the proteome, there is some functional annotation, but only 26% of the proteome can be assigned to standard sequence motifs that characterize function. Of the human protein sequences, 13% are transmembrane proteins, but only 3% of the residues in the proteome form membrane-spanning regions. There are substantial differences in the composition of globular domains of transmembrane proteins between the proteomes we have analyzed. Commonly occurring structural superfamilies are identified within the proteome. The frequencies of these superfamilies enable us to estimate that 98% of the human proteome evolved by domain duplication, with four of the 10 most duplicated superfamilies specific for multicellular organisms. The zinc-finger superfamily is massively duplicated in human compared to fly and worm, and occurrence of domains in repeats is more common in metazoa than in single cellular organisms. Structural superfamilies over- and underrepresented in human disease genes have been identified. Data and results can be downloaded and analyzed via web-based applications at http://www.sbg.bio.ic.ac.uk.
Affiliation(s)
- Arne Müller
- Biomolecular Modelling Laboratory, Cancer Research UK, London, United Kingdom
|
25
|
Bates PA, Kelley LA, MacCallum RM, Sternberg MJ. Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 2002; Suppl 5:39-46. [PMID: 11835480 DOI: 10.1002/prot.1168] [Citation(s) in RCA: 406] [Impact Index Per Article: 18.5] [Indexed: 11/11/2022]
Abstract
Fourteen models were constructed and analyzed for the comparative modeling section of Critical Assessment of Techniques for Protein Structure Prediction (CASP4). Sequence identity between each target and the best possible parent(s) ranged between 55 and 13%, and the root-mean-square deviation between model and target was from 0.8 to 17.9 Å. In the fold recognition section, 10 of the 11 remote homologues were recognized. The modeling protocols are a combination of automated computer algorithms, 3D-JIGSAW (for comparative modeling) and 3D-PSSM (for fold recognition), with human intervention at certain critical stages. In particular, intervention is required to check superfamily assignment, best possible parents from which to model, sequence alignments to those parents and take-off regions for modeling variable regions. There now is a convergence of algorithms for comparative modeling and fold recognition, particularly in the region of remote homology.
Affiliation(s)
- P A Bates
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, United Kingdom.
|
26
|
Abstract
A method (three-dimensional position-specific scoring matrix, 3D-PSSM) to recognise remote protein sequence homologues is described. The method combines the power of multiple sequence profiles with knowledge of protein structure to provide enhanced recognition and thus functional assignment of newly sequenced genomes. The method uses structural alignments of homologous proteins of similar three-dimensional structure in the structural classification of proteins (SCOP) database to obtain a structural equivalence of residues. These equivalences are used to extend multiply aligned sequences obtained by standard sequence searches. The resulting large superfamily-based multiple alignment is converted into a PSSM. Combined with secondary structure matching and solvation potentials, 3D-PSSM can recognise structural and functional relationships beyond state-of-the-art sequence methods. In a cross-validated benchmark on 136 homologous relationships unambiguously undetectable by position-specific iterated basic local alignment search tool (PSI-Blast), 3D-PSSM can confidently assign 18 %. The method was applied to the remaining unassigned regions of the Mycoplasma genitalium genome and an additional 13 regions were assigned with 95 % confidence. 3D-PSSM is available to the community as a web server: http://www.bmm.icnet.uk/servers/3dpssm
Affiliation(s)
- L A Kelley
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London, WC2A 3PX, England
|
27
|
MacCallum RM, Kelley LA, Sternberg MJ. SAWTED: structure assignment with text description--enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics 2000; 16:125-9. [PMID: 10842733 DOI: 10.1093/bioinformatics/16.2.125] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Sequence database search methods often identify putative sub-threshold hits of known function or structure for a given query sequence. It is widespread practice to filter these hits by hand using knowledge of function and other factors; to the expert, some hits may appear more sensible than others. SAWTED (Structure Assignment With Text Description) is an automated solution to this post-filtering problem which will be applicable to large scale genome assignments.
RESULTS: A standard document comparison algorithm is applied to text descriptions extracted from SWISS-PROT annotations. The added value of SAWTED in combination with PSI-BLAST has been shown with a benchmark of difficult remote homologues taken from the SCOP structure database.
AVAILABILITY: A SAWTED PSI-BLAST Web server is available to perform sensitive searches against the protein structure database (http://www.bmm.icnet.uk/servers/sawted).
CONTACT: R.MacCallum@icrf.icnet.uk
Affiliation(s)
- R M MacCallum
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, UK.
|
28
|
Abstract
The recognition of remote protein homologies is a major aspect of the structural and functional annotation of newly determined genomes. Here we benchmark the coverage and error rate of genome annotation using the widely used homology-searching program PSI-BLAST (position-specific iterated basic local alignment search tool). This study evaluates the one-to-many success rate for recognition, as often there are several homologues in the database and only one needs to be identified for annotating the sequence. In contrast, previous benchmarks considered one-to-one recognition in which a single query was required to find a particular target. The benchmark constructs a model genome from the full sequences of the structural classification of proteins (SCOP) database and searches against a target library of remote homologous domains (<20 % identity). The structural benchmark provides a reliable list of correct and false homology assignments. PSI-BLAST successfully annotated 40 % of the domains in the model genome that had at least one homologue in the target library. This coverage is more than three times that obtained when one-to-one recognition is evaluated (11 % coverage of domains). Although a structural benchmark was used, the results apply equally to sequence-only homology searches. Accordingly, structural and sequence assignments were made to the sequences of Mycoplasma genitalium and Mycobacterium tuberculosis (see http://www.bmm.icnet.uk). The extent of missed assignments and of new superfamilies can be estimated for these genomes for both structural and functional annotations.
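The difference between one-to-one and one-to-many coverage can be made concrete with a small sketch. It assumes that one-to-one coverage is the fraction of true (query, target) pairs detected, and that one-to-many coverage is the fraction of queries for which at least one true homologue is detected. These definitions are a reading of the abstract rather than the benchmark's exact protocol, and the function names and toy pairs are hypothetical.

```python
def one_to_one_coverage(true_pairs, detected_pairs):
    """Fraction of true (query, target) homologous pairs that were detected."""
    if not true_pairs:
        return 0.0
    return len(true_pairs & detected_pairs) / len(true_pairs)


def one_to_many_coverage(true_pairs, detected_pairs):
    """Fraction of queries with a true homologue for which >= 1 was detected."""
    queries = {query for query, _ in true_pairs}
    if not queries:
        return 0.0
    annotated = {query for query, _ in true_pairs & detected_pairs}
    return len(annotated) / len(queries)


# Hypothetical benchmark: three queries with six true homologous pairs.
toy_true_pairs = {
    ("q1", "t1"), ("q1", "t2"), ("q1", "t3"),
    ("q2", "t4"), ("q2", "t5"),
    ("q3", "t6"),
}
# Detected pairs; ("q9", "t1") is a false hit and counts toward neither measure.
toy_detected_pairs = {("q1", "t2"), ("q2", "t4"), ("q9", "t1")}

toy_one_to_one = one_to_one_coverage(toy_true_pairs, toy_detected_pairs)
toy_one_to_many = one_to_many_coverage(toy_true_pairs, toy_detected_pairs)
```

With these toy pairs, two of six true pairs are found but two of three queries are annotated, so the one-to-many figure is higher, in the same direction as the 40 % versus 11 % reported above.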
Affiliation(s)
- A Müller
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London, WC2A 3PX, England
|
29
|
Fischer D, Barret C, Bryson K, Elofsson A, Godzik A, Jones D, Karplus KJ, Kelley LA, MacCallum RM, Pawłowski K, Rost B, Rychlewski L, Sternberg M. CAFASP-1: critical assessment of fully automated structure prediction methods. Proteins 1999; Suppl 3:209-17. [PMID: 10526371 DOI: 10.1002/(sici)1097-0134(1999)37:3+<209::aid-prot27>3.3.co;2-p] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
The results of the first Critical Assessment of Fully Automated Structure Prediction (CAFASP-1) are presented. The objective was to evaluate the success rates of fully automatic web servers for fold recognition which are available to the community. This study was based on the targets used in the third meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP-3). However, unlike CASP-3, the study was not a blind trial, as it was held after the structures of the targets were known. The aim was to assess the performance of methods without the user intervention that several groups used in their CASP-3 submissions. Although it is clear that "human plus machine" predictions are superior to automated ones, this CAFASP-1 experiment is extremely valuable for users of our methods; it provides an indication of the performance of the methods alone, and not of the "human plus machine" performance assessed in CASP. This information may aid users in choosing which programs they wish to use and in evaluating the reliability of the programs when applied to their specific prediction targets. In addition, evaluation of fully automated methods is particularly important to assess their applicability at genomic scales. For each target, groups submitted the top-ranking folds generated from their servers. In CAFASP-1 we concentrated on fold-recognition web servers only and evaluated only recognition of the correct fold, and not, as in CASP-3, alignment accuracy. Although some performance differences appeared within each of the four target categories used here, overall, no single server has proved markedly superior to the others. The results showed that current fully automated fold recognition servers can often identify remote similarities when pairwise sequence search methods fail. 
Nevertheless, in only a few cases outside the family-level targets has the score of the top-ranking fold been significant enough to allow for a confident fully automated prediction. Because the goals, rules, and procedures of CAFASP-1 were different from those used at CASP-3, the results reported here are not comparable with those reported in CASP-3. Nevertheless, it is clear that current automated fold recognition methods cannot yet compete with "human-expert plus machine" predictions. Finally, CAFASP-1 has been useful in identifying the requirements for a future blind trial of automated server-based protein structure prediction.
Affiliation(s)
- D Fischer
- Department of Mathematics and Computer Science, Ben Gurion University, Beer-Sheva, Israel.
|
30
|
Abstract
The third comparative assessment of techniques of protein structure prediction (CASP3) was held during 1998. This is a blind trial in which structures are predicted prior to having knowledge of the coordinates, which are then revealed to enable the assessment. Three sections at the meeting evaluated different methodologies - comparative modelling, fold recognition and ab initio methods. For some, but not all of the target coordinates, high quality models were submitted in each of these sections. There have been improvements in prediction techniques since CASP2 in 1996, most notably for ab initio methods.
Affiliation(s)
- M J Sternberg
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, UK.
|
31
|
Fischer D, Barret C, Bryson K, Elofsson A, Godzik A, Jones D, Karplus KJ, Kelley LA, MacCallum RM, Pawłowski K, Rost B, Rychlewski L, Sternberg M. CAFASP-1: Critical assessment of fully automated structure prediction methods. Proteins 1999. [DOI: 10.1002/(sici)1097-0134(1999)37:3+<209::aid-prot27>3.0.co;2-y] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.3] [Indexed: 11/10/2022]
|
32
|
Abstract
We have analysed antigen-contacting residues and combining site shape in the antibody Fv and Fab crystal structures now available from the Protein Data Bank. Antigen-contacting propensities are presented for each antibody residue, allowing a new definition for the complementarity determining regions (CDRs) to be proposed based on observed antigen contacts. Contacts are more common at CDR residues which are located centrally within the combining site; some less central CDR residues are only contacted by large antigens. Non-contacting residues within the CDRs coincide with residues identified by Chothia and co-workers as important in defining "canonical" conformations. An objective means of classifying protein surfaces by gross topography has been developed and applied to the antibody combining site surfaces. The surfaces have been clustered into four topographic classes: concave and moderately concave (mostly hapten binders), ridged (mostly peptide binders) and planar (mostly protein binders). We have determined the topographic classes for ten pairs of complexed and uncomplexed antibody-antigen crystal structures; four change topographic class on complexation. The results will be of use in antibody engineering, antigen docking and in clinical immunology. To demonstrate one application, we show how the data can be used to locate the antigen binding pocket on antibody structures.
Affiliation(s)
- R M MacCallum
- Department of Biochemistry and Molecular Biology, University College London, UK
|