1
Asma H, Liu L, Halfon MS. SCRMshaw: Supervised cis-regulatory module prediction for insect genomes. PLoS One 2024; 19:e0311752. [PMID: 39637210 PMCID: PMC11620701 DOI: 10.1371/journal.pone.0311752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Accepted: 09/24/2024] [Indexed: 12/07/2024]
Abstract
As the number of sequenced insect genomes continues to grow, there is a pressing need for rapid and accurate annotation of their regulatory component. SCRMshaw is a computational tool designed to predict cis-regulatory modules ("enhancers") in the genomes of various insect species. A key advantage of SCRMshaw is its accessibility. It requires minimal resources: just a genome sequence and training data from known Drosophila regulatory sequences, which are readily available for download. Even users with modest computational skills can run SCRMshaw on a desktop computer for basic applications, although a high-performance computing cluster is recommended for optimal results. SCRMshaw can be tailored to specific needs: users can employ a single set of training data to predict enhancers associated with a particular gene expression pattern, or utilize multiple sets to provide a first-pass regulatory annotation for a newly-sequenced genome. This protocol provides an extensive update to the previously published SCRMshaw protocol and aligns with the methods used in a recent annotation of over 30 insect regulatory genomes. It includes the most recent modifications to the SCRMshaw protocol and details an end-to-end pipeline that begins with a sequenced genome and ends with a fully-annotated regulatory genome. Relevant scripts are available via GitHub, and a living protocol that will be updated as necessary is linked to this article at protocols.io.
Affiliation(s)
- Hasiba Asma
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Luna Liu
- Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Marc S. Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY, United States of America
2
Asma H, Tieke E, Deem KD, Rahmat J, Dong T, Huang X, Tomoyasu Y, Halfon MS. Regulatory genome annotation of 33 insect species. eLife 2024; 13:RP96738. [PMID: 39392676 PMCID: PMC11469670 DOI: 10.7554/elife.96738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Annotation of newly sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences along with the tissues where they are expected to be active, in a set of insect species ranging over 360 million years of evolution. Extensive analysis and validation of the data provides several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. One, we show that our predictions target specific loci, rather than random genomic locations. Two, we predict enhancers in orthologous loci across a diverged set of species to a significantly higher degree than random expectation would allow. Three, we demonstrate that our predictions are highly enriched for regions of accessible chromatin. Four, we achieve a validation rate in excess of 70% using in vivo reporter gene assays. As we continue to annotate both new tissues and new species, our regulatory annotation resource will provide a rich source of data for the research community and will have utility for both small-scale (single gene, single species) and large-scale (many genes, many species) studies of gene regulation. In particular, the ability to search for functionally related regulatory elements in orthologous loci should greatly facilitate studies of enhancer evolution even among distantly related species.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
- Ellen Tieke
- Department of Biology, Miami University, Oxford, United States
- Kevin D Deem
- Department of Biology, Miami University, Oxford, United States
- Jabale Rahmat
- Department of Biology, Miami University, Oxford, United States
- Tiffany Dong
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Xinbo Huang
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Marc S Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, United States
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, United States
3
Schember I, Reid W, Sterling-Lentsch G, Halfon MS. Conserved and novel enhancers in the Aedes aegypti single-minded locus recapitulate embryonic ventral midline gene expression. PLoS Genet 2024; 20:e1010891. [PMID: 38683842 PMCID: PMC11081499 DOI: 10.1371/journal.pgen.1010891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 05/09/2024] [Accepted: 04/16/2024] [Indexed: 05/02/2024]
Abstract
Transcriptional cis-regulatory modules, e.g., enhancers, control the time and location of metazoan gene expression. While changes in enhancers can provide a powerful force for evolution, there is also significant deep conservation of enhancers for developmentally important genes, with function and sequence characteristics maintained over hundreds of millions of years of divergence. Not well understood, however, is how the overall regulatory composition of a locus evolves, with important outstanding questions such as how many enhancers are conserved vs. novel, and to what extent the locations of conserved enhancers within a locus are maintained. We begin here to address these questions with a comparison of the respective single-minded (sim) loci in the two dipteran species Drosophila melanogaster (fruit fly) and Aedes aegypti (mosquito). sim encodes a highly conserved transcription factor that mediates development of the arthropod embryonic ventral midline. We identify two enhancers in the A. aegypti sim locus and demonstrate that they function equivalently in both transgenic flies and transgenic mosquitoes. One A. aegypti enhancer is highly similar to known Drosophila counterparts in its activity, location, and autoregulatory capability. The other differs from any known Drosophila sim enhancers with a novel location, failure to autoregulate, and regulation of expression in a unique subset of midline cells. Our results suggest that the conserved pattern of sim expression in the two species is the result of both conserved and novel regulatory sequences. Further examination of this locus will help to illuminate how the overall regulatory landscape of a conserved developmental gene evolves.
Affiliation(s)
- Isabella Schember
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- William Reid
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Geyenna Sterling-Lentsch
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Marc S. Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- New York State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, New York, United States of America
4
Cheatle Jarvela AM, Trelstad CS, Pick L. Anterior-posterior patterning of segments in Anopheles stephensi offers insights into the transition from sequential to simultaneous segmentation in holometabolous insects. J Exp Zool B Mol Dev Evol 2023; 340:116-130. [PMID: 34734470 PMCID: PMC9061899 DOI: 10.1002/jez.b.23102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 10/13/2021] [Accepted: 10/16/2021] [Indexed: 11/10/2022]
Abstract
The gene regulatory network for segmentation in arthropods offers valuable insights into how networks evolve owing to the breadth of species examined and the extremely detailed knowledge gained in the model organism Drosophila melanogaster. These studies have shown that Drosophila's network represents a derived state that acquired changes to accelerate segment patterning, whereas most insects specify segments gradually as the embryo elongates. Such heterochronic shifts in segmentation have potentially emerged multiple times within holometabolous insects, resulting in many mechanistic variants and difficulties in isolating underlying commonalities that permit such shifts. Recent studies identified regulatory genes that work as timing factors, coordinating gene expression transitions during segmentation. These studies predict that changes in timing factor deployment explain shifts in segment patterning relative to other developmental events. Here, we test this hypothesis by characterizing the temporal and spatial expression of the pair-rule patterning genes in the malaria vector mosquito, Anopheles stephensi. This insect is a Dipteran (fly), like Drosophila, but represents an ancient divergence within this clade, offering a useful counterpart for evo-devo studies. In mosquito embryos, we observe anterior to posterior sequential addition of stripes for many pair-rule genes and a wave of broad timer gene expression across this axis. Segment polarity gene stripes are added sequentially in the wake of the timer gene wave and the full pattern is not complete until the embryo is fully elongated. This "progressive segmentation" mode in Anopheles displays commonalities with both Drosophila's rapid segmentation mechanism and sequential modes used by more distantly related insects.
Affiliation(s)
- Alys M. Cheatle Jarvela
- Department of Entomology, University of Maryland, College Park, 4291 Fieldhouse Drive, College Park, MD 20742, U.S.A
- Catherine S. Trelstad
- Department of Entomology, University of Maryland, College Park, 4291 Fieldhouse Drive, College Park, MD 20742, U.S.A
- Leslie Pick
- Department of Entomology, University of Maryland, College Park, 4291 Fieldhouse Drive, College Park, MD 20742, U.S.A
5
Perkins ML, Gandara L, Crocker J. A synthetic synthesis to explore animal evolution and development. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200517. [PMID: 35634925 PMCID: PMC9149795 DOI: 10.1098/rstb.2020.0517] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Identifying the general principles by which genotypes are converted into phenotypes remains a challenge in the post-genomic era. We still lack a predictive understanding of how genes shape interactions among cells and tissues in response to signalling and environmental cues, and hence how regulatory networks generate the phenotypic variation required for adaptive evolution. Here, we discuss how techniques borrowed from synthetic biology may facilitate a systematic exploration of evolvability across biological scales. Synthetic approaches permit controlled manipulation of both endogenous and fully engineered systems, providing a flexible platform for investigating causal mechanisms in vivo. Combining synthetic approaches with multi-level phenotyping (phenomics) will supply a detailed, quantitative characterization of how internal and external stimuli shape the morphology and behaviour of living organisms. We advocate integrating high-throughput experimental data with mathematical and computational techniques from a variety of disciplines in order to pursue a comprehensive theory of evolution. This article is part of the theme issue ‘Genetic basis of adaptation and speciation: from loci to causative mutations’.
Affiliation(s)
- Mindy Liu Perkins
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Lautaro Gandara
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Justin Crocker
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
6
Common Themes and Future Challenges in Understanding Gene Regulatory Network Evolution. Cells 2022; 11:cells11030510. [PMID: 35159319 PMCID: PMC8834487 DOI: 10.3390/cells11030510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2021] [Revised: 01/26/2022] [Accepted: 01/29/2022] [Indexed: 12/18/2022]
Abstract
A major driving force behind the evolution of species-specific traits and novel structures is alterations in gene regulatory networks (GRNs). Comprehending evolution therefore requires an understanding of the nature of changes in GRN structure and the responsible mechanisms. Here, we review two insect pigmentation GRNs in order to examine common themes in GRN evolution and to reveal some of the challenges associated with investigating changes in GRNs across different evolutionary distances at the molecular level. The pigmentation GRN in Drosophila melanogaster and other drosophilids is a well-defined network for which studies from closely related species illuminate the different ways co-option of regulators can occur. The pigmentation GRN for butterflies of the Heliconius species group is less fully detailed but it is emerging as a useful model for exploring important questions about redundancy and modularity in cis-regulatory systems. Both GRNs serve to highlight the ways in which redeployment of trans-acting factors can lead to GRN rewiring and network co-option. To gain insight into GRN evolution, we discuss the importance of defining GRN architecture at multiple levels both within and between species and of utilizing a range of complementary approaches.
7
Schember I, Halfon MS. Identification of new Anopheles gambiae transcriptional enhancers using a cross-species prediction approach. Insect Mol Biol 2021; 30:410-419. [PMID: 33866636 PMCID: PMC8266755 DOI: 10.1111/imb.12705] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/05/2020] [Revised: 02/09/2021] [Accepted: 03/31/2021] [Indexed: 06/12/2023]
Abstract
The success of transgenic mosquito vector control approaches relies on well-targeted gene expression, requiring the identification and characterization of a diverse set of mosquito promoters and transcriptional enhancers. However, few enhancers have been characterized in Anopheles gambiae to date. Here, we employ the SCRMshaw method we previously developed to predict enhancers in the A. gambiae genome, preferentially targeting vector-relevant tissues such as the salivary glands, midgut and nervous system. We demonstrate a high overall success rate, with at least 8 of 11 (73%) tested sequences validating as enhancers in an in vivo xenotransgenic assay. Four tested sequences drive expression in either the salivary gland or the midgut, making them directly useful for probing the biology of these infection-relevant tissues. The success of our study suggests that computational enhancer prediction should serve as an effective means for identifying A. gambiae enhancers with activity in tissues involved in malaria propagation and transmission.
Affiliation(s)
- Isabella Schember
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203
- Marc S. Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263
8
Lezcano ÓM, Sánchez-Polo M, Ruiz JL, Gómez-Díaz E. Chromatin Structure and Function in Mosquitoes. Front Genet 2020; 11:602949. [PMID: 33365050 PMCID: PMC7750206 DOI: 10.3389/fgene.2020.602949] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/04/2020] [Accepted: 10/29/2020] [Indexed: 12/27/2022]
Abstract
The principles and function of chromatin and nuclear architecture have been extensively studied in model organisms, such as Drosophila melanogaster. However, little is known about the role of these epigenetic processes in transcriptional regulation in other insects including mosquitoes, which are major disease vectors and a worldwide threat to human health. These life-threatening diseases include malaria, which is caused by protozoan parasites of the genus Plasmodium and transmitted by Anopheles mosquitoes; dengue fever, which is caused by an arbovirus mainly transmitted by Aedes aegypti; and West Nile fever, which is caused by an arbovirus transmitted by Culex spp. In this contribution, we review what is known about chromatin-associated mechanisms and the 3D genome structure in various mosquito vectors, including Anopheles, Aedes, and Culex spp. We also discuss the similarities between epigenetic mechanisms in mosquitoes and the model organism Drosophila melanogaster, and argue that the field could benefit from the cross-application of state-of-the-art functional genomic technologies that are well-developed in the fruit fly. Uncovering the mosquito regulatory genome can lead to the discovery of unique regulatory networks associated with the parasitic life-style of these insects. It is also critical to understand the molecular interactions between the vectors and the pathogens that they transmit, which could hold the key to major breakthroughs in the fight against mosquito-borne diseases. Finally, it is clear that epigenetic mechanisms controlling mosquito environmental plasticity and evolvability are also of utmost importance, particularly in the current context of globalization and climate change.
Affiliation(s)
- José L. Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, Granada, Spain
- Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, Granada, Spain
9
Rivera J, Keränen SVE, Gallo SM, Halfon MS. REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Res 2020; 47:D828-D834. [PMID: 30329093 PMCID: PMC6323911 DOI: 10.1093/nar/gky957] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 09/06/2018] [Accepted: 10/04/2018] [Indexed: 12/21/2022]
Abstract
The REDfly database provides a comprehensive curation of experimentally-validated Drosophila transcriptional cis-regulatory elements and includes information on DNA sequence, experimental evidence, patterns of regulated gene expression, and more. Now in its thirteenth year, REDfly has grown to over 23 000 records of tested reporter gene constructs and 2200 tested transcription factor binding sites. Recent developments include the start of curation of predicted cis-regulatory modules in addition to experimentally-verified ones, improved search and filtering, and increased interaction with the authors of curated papers. An expanded data model that will capture information on temporal aspects of gene regulation, regulation in response to environmental and other non-developmental cues, sexually dimorphic gene regulation, and non-endogenous (ectopic) aspects of reporter gene expression is under development and expected to be in place within the coming year. REDfly is freely accessible at http://redfly.ccr.buffalo.edu, and news about database updates and new features can be followed on Twitter at @REDfly_database.
Affiliation(s)
- John Rivera
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Steven M Gallo
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Marc S Halfon
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
10
On the specificity of gene regulatory networks: How does network co-option affect subsequent evolution? Curr Top Dev Biol 2020; 139:375-405. [PMID: 32450967 DOI: 10.1016/bs.ctdb.2020.03.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 02/04/2023]
Abstract
The process of multicellular organismal development hinges upon the specificity of developmental programs: for different parts of the organism to form unique features, processes must exist to specify each part. This specificity is thought to be hardwired into gene regulatory networks, which activate cohorts of genes in particular tissues at particular times during development. However, the evolution of gene regulatory networks sometimes occurs by mechanisms that sacrifice specificity. One such mechanism is network co-option, in which existing gene networks are redeployed in new developmental contexts. While network co-option may offer an efficient mechanism for generating novel phenotypes, losses of tissue specificity at redeployed network genes could restrict the ability of the affected traits to evolve independently. At present, there has not been a detailed discussion regarding how tissue specificity of network genes might be altered due to gene network co-option at its initiation, as well as how trait independence can be retained or restored after network co-option. A lack of clarity about network co-option makes it more difficult to speculate on the long-term evolutionary implications of this mechanism. In this review, we will discuss the possible initial outcomes of network co-option, outline the mechanisms by which networks may retain or subsequently regain specificity after network co-option, and comment on some of the possible evolutionary consequences of network co-option. We place special emphasis on the need to consider selectively-neutral outcomes of network co-option to improve our understanding of the role of this mechanism in trait evolution.
11
Tomoyasu Y, Halfon MS. How to study enhancers in non-traditional insect models. J Exp Biol 2020; 223(Suppl 1):jeb212241. [PMID: 32034049 DOI: 10.1242/jeb.212241] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
Transcriptional enhancers are central to the function and evolution of genes and gene regulation. At the organismal level, enhancers play a crucial role in coordinating tissue- and context-dependent gene expression. At the population level, changes in enhancers are thought to be a major driving force that facilitates evolution of diverse traits. An amazing array of diverse traits seen in insect morphology, physiology and behavior has been the subject of research for centuries. Although enhancer studies in insects outside of Drosophila have been limited, recent advances in functional genomic approaches have begun to make such studies possible in an increasing selection of insect species. Here, instead of comprehensively reviewing currently available technologies for enhancer studies in established model organisms such as Drosophila, we focus on a subset of computational and experimental approaches that are likely applicable to non-Drosophila insects, and discuss the pros and cons of each approach. We discuss the importance of validating enhancer function and evaluate several possible validation methods, such as reporter assays and genome editing. Key points and potential pitfalls when establishing a reporter assay system in non-traditional insect models are also discussed. We close with a discussion of how to advance enhancer studies in insects, both by improving computational approaches and by expanding the genetic toolbox in various insects. Through these discussions, this Review provides a conceptual framework for studying the function and evolution of enhancers in non-traditional insect models.
Affiliation(s)
- Marc S Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
12
Asma H, Halfon MS. Computational enhancer prediction: evaluation and improvements. BMC Bioinformatics 2019; 20:174. [PMID: 30953451 PMCID: PMC6451241 DOI: 10.1186/s12859-019-2781-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/15/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches provide a useful complement to empirical methods for CRM discovery, but it is critical that we develop effective means to evaluate their performance by estimating their sensitivity and specificity.
RESULTS: We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tool flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares the result of predictions with the extensive existing knowledge of experimentally-validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. In the case of supervised prediction methods (when training data composed of validated CRMs are used), pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval through evaluation of our SCRMshaw CRM prediction method and training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that improves the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to increasingly less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.
CONCLUSION: Our pCRMeval pipeline provides a general framework for evaluation that can be applied to any CRM prediction method, particularly a supervised method. While we make use of it here primarily to test and improve a particular method for CRM prediction, SCRMshaw, pCRMeval should provide a valuable platform to the research community not only for evaluating individual methods, but also for comparing between competing methods.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA
- Marc S Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA.
- Department of Biochemistry, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA.
- Department of Biological Sciences, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA.
- Department of Biomedical Informatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA.
- NY State Center of Excellence in Bioinformatics and Life Sciences, 701 Ellicott St, Buffalo, NY, 14203, USA.
- Molecular and Cellular Biology Department and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA.
13
Abstract
Although sequenced insect genomes number in the hundreds, little is known about gene regulatory sequences in any species other than the well-studied Drosophila melanogaster. We provide here a detailed protocol for using SCRMshaw, a computational method for predicting cis-regulatory modules (CRMs, also called "enhancers") in sequenced insect genomes. SCRMshaw is effective for CRM discovery throughout the range of holometabolous insects and potentially in even more diverged species, with true-positive prediction rates of 75% or better. Minimal requirements for using SCRMshaw are a genome sequence and training data in the form of known Drosophila CRMs; a comprehensive set of the latter can be obtained from the SCRMshaw download site. For basic applications, a user with only modest computational know-how can run SCRMshaw on a desktop computer. SCRMshaw can be run with a single, narrow set of training data to predict CRMs regulating a specific pattern of gene expression, or with multiple sets of training data covering a broad range of CRM activities to provide an initial rough regulatory annotation of a complete, newly-sequenced genome.
Affiliation(s)
- Majid Kazemian
- Departments of Biochemistry and Computer Science, Purdue University, West Lafayette, IN, USA.
- Marc S Halfon
- Departments of Biochemistry, Biomedical Informatics, and Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY, USA.
- NY State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY, USA.
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.
14
Saul MC, Blatti C, Yang W, Bukhari SA, Shpigler HY, Troy JM, Seward CH, Sloofman L, Chandrasekaran S, Bell AM, Stubbs L, Robinson GE, Zhao SD, Sinha S. Cross-species systems analysis of evolutionary toolkits of neurogenomic response to social challenge. Genes Brain Behav 2018; 18:e12502. [PMID: 29968347 DOI: 10.1111/gbb.12502] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/18/2018] [Revised: 06/18/2018] [Accepted: 06/20/2018] [Indexed: 12/15/2022]
Abstract
Social challenges like territorial intrusions evoke behavioral responses in widely diverging species. Recent work has shown that evolutionary "toolkits" (genes and modules with lineage-specific variations but deep conservation of function) participate in the behavioral response to social challenge. Here, we develop a multispecies computational-experimental approach to characterize such a toolkit at a systems level. Brain transcriptomic responses to social challenge were probed via RNA-seq profiling in three diverged species (honey bees, mice and three-spined stickleback fish) following a common methodology, allowing fair comparisons across species. Data were collected from multiple brain regions and multiple time points after social challenge exposure, achieving anatomical and temporal resolution substantially greater than previous work. We developed statistically rigorous analyses equipped to find homologous functional groups among these species at the levels of individual genes, functional and coexpressed gene modules, and transcription factor subnetworks. We identified six orthogroups involved in response to social challenge, including groups represented by mouse genes Npas4 and Nr4a1, as well as common modulation of systems such as transcriptional regulators, ion channels, G-protein-coupled receptors and synaptic proteins. We also identified conserved coexpression modules enriched for mitochondrial fatty acid metabolism and heat shock that constitute the shared neurogenomic response. Our analysis suggests a toolkit wherein nuclear receptors, interacting with chaperones, induce transcriptional changes in mitochondrial activity, neural cytoarchitecture and synaptic transmission after social challenge. It shows systems-level mechanisms that have been repeatedly co-opted during evolution of analogous behaviors, thus advancing the genetic toolkit concept beyond individual genes.
Affiliation(s)
- Michael C Saul
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Wei Yang
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Syed A Bukhari
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Interdisciplinary Informatics Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Hagai Y Shpigler
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Ecology, Evolution and Behavior, Hebrew University, Jerusalem, Israel
- Joseph M Troy
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Interdisciplinary Informatics Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Christopher H Seward
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Laura Sloofman
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Genetics and Genomic Sciences, Mount Sinai Health System, New York, New York
- Alison M Bell
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Interdisciplinary Informatics Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Animal Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Lisa Stubbs
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Interdisciplinary Informatics Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Gene E Robinson
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Sihai D Zhao
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, Illinois
15
Carrillo-Baltodano AM, Meyer NP. Decoupling brain from nerve cord development in the annelid Capitella teleta: Insights into the evolution of nervous systems. Dev Biol 2017; 431:134-144. [DOI: 10.1016/j.ydbio.2017.09.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/21/2017] [Revised: 09/17/2017] [Accepted: 09/17/2017] [Indexed: 10/18/2022]
16
Perspectives on Gene Regulatory Network Evolution. Trends Genet 2017; 33:436-447. [PMID: 28528721 DOI: 10.1016/j.tig.2017.04.005] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 03/07/2017] [Revised: 04/24/2017] [Accepted: 04/25/2017] [Indexed: 11/23/2022]
Abstract
Animal development proceeds through the activity of genes and their cis-regulatory modules (CRMs) working together in sets of gene regulatory networks (GRNs). The emergence of species-specific traits and novel structures results from evolutionary changes in GRNs. Recent work in a wide variety of animal models, and particularly in insects, has started to reveal the modes and mechanisms of GRN evolution. I discuss here various aspects of GRN evolution and argue that developmental system drift (DSD), in which conserved phenotype is nevertheless a result of changed genetic interactions, should regularly be viewed from the perspective of GRN evolution. Advances in methods to discover related CRMs in diverse insect species, a critical requirement for detailed GRN characterization, are also described.