1
Asma H, Liu L, Halfon MS. SCRMshaw: Supervised cis-regulatory module prediction for insect genomes. PLoS One 2024; 19:e0311752. [PMID: 39637210 PMCID: PMC11620701 DOI: 10.1371/journal.pone.0311752] [Received: 06/12/2024] [Accepted: 09/24/2024] [Indexed: 12/07/2024]
Abstract
As the number of sequenced insect genomes continues to grow, there is a pressing need for rapid and accurate annotation of their regulatory component. SCRMshaw is a computational tool designed to predict cis-regulatory modules ("enhancers") in the genomes of various insect species. A key advantage of SCRMshaw is its accessibility. It requires minimal resources: just a genome sequence and training data from known Drosophila regulatory sequences, which are readily available for download. Even users with modest computational skills can run SCRMshaw on a desktop computer for basic applications, although a high-performance computing cluster is recommended for optimal results. SCRMshaw can be tailored to specific needs: users can employ a single set of training data to predict enhancers associated with a particular gene expression pattern, or utilize multiple sets to provide a first-pass regulatory annotation for a newly sequenced genome. This protocol provides an extensive update to the previously published SCRMshaw protocol and aligns with the methods used in a recent annotation of over 30 insect regulatory genomes. It includes the most recent modifications to the SCRMshaw protocol and details an end-to-end pipeline that begins with a sequenced genome and ends with a fully annotated regulatory genome. Relevant scripts are available via GitHub, and a living protocol that will be updated as necessary is linked to this article at protocols.io.
Affiliation(s)
- Hasiba Asma
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Luna Liu
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Marc S. Halfon
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY, United States of America
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY, United States of America
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY, United States of America
2
Asma H, Tieke E, Deem KD, Rahmat J, Dong T, Huang X, Tomoyasu Y, Halfon MS. Regulatory genome annotation of 33 insect species. eLife 2024; 13:RP96738. [PMID: 39392676 PMCID: PMC11469670 DOI: 10.7554/elife.96738] [Indexed: 10/12/2024]
Abstract
Annotation of newly sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences along with the tissues where they are expected to be active, in a set of insect species ranging over 360 million years of evolution. Extensive analysis and validation of the data provide several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. One, we show that our predictions target specific loci, rather than random genomic locations. Two, we predict enhancers in orthologous loci across a diverged set of species to a significantly higher degree than random expectation would allow. Three, we demonstrate that our predictions are highly enriched for regions of accessible chromatin. Four, we achieve a validation rate in excess of 70% using in vivo reporter gene assays. As we continue to annotate both new tissues and new species, our regulatory annotation resource will provide a rich source of data for the research community and will have utility for both small-scale (single gene, single species) and large-scale (many genes, many species) studies of gene regulation. In particular, the ability to search for functionally related regulatory elements in orthologous loci should greatly facilitate studies of enhancer evolution even among distantly related species.
Affiliation(s)
- Hasiba Asma
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
- Ellen Tieke
  - Department of Biology, Miami University, Oxford, United States
- Kevin D Deem
  - Department of Biology, Miami University, Oxford, United States
- Jabale Rahmat
  - Department of Biology, Miami University, Oxford, United States
- Tiffany Dong
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Xinbo Huang
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Marc S Halfon
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, United States
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, United States
3
Dyer NA, Lucas ER, Nagi SC, McDermott DP, Brenas JH, Miles A, Clarkson CS, Mawejje HD, Wilding CS, Halfon MS, Asma H, Heinz E, Donnelly MJ. Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele-specific expression. Proc Biol Sci 2024; 291:20241142. [PMID: 39288798 PMCID: PMC11407855 DOI: 10.1098/rspb.2024.1142] [Received: 01/15/2024] [Revised: 07/05/2024] [Accepted: 07/24/2024] [Indexed: 09/19/2024]
Abstract
Malaria control relies on insecticides targeting the mosquito vector, but this is increasingly compromised by insecticide resistance, which can be achieved by elevated expression of detoxifying enzymes that metabolize the insecticide. In diploid organisms, gene expression is regulated both in cis, by regulatory sequences on the same chromosome, and by trans acting factors, affecting both alleles equally. Differing levels of transcription can be caused by mutations in cis-regulatory modules (CRM), but few of these have been identified in mosquitoes. We crossed bendiocarb-resistant and susceptible Anopheles gambiae strains to identify cis-regulated genes that might be responsible for the resistant phenotype using RNAseq, and CRM sequences controlling gene expression in insecticide resistance relevant tissues were predicted using machine learning. We found 115 genes showing allele-specific expression (ASE) in hybrids of insecticide susceptible and resistant strains, suggesting cis-regulation is an important mechanism of gene expression regulation in A. gambiae. The genes showing ASE included a higher proportion of Anopheles-specific genes on average younger than genes with balanced allelic expression.
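To make the idea of allele-specific expression concrete, here is a minimal sketch of how a single gene might be tested for allelic imbalance in a hybrid. It assumes an exact two-sided binomial test of allele read counts against the 50:50 ratio expected under balanced expression. The abstract does not say which statistical test the authors used, so the test itself, the function name `binomial_two_sided_p`, and the read counts are all hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value (hypothetical helper).

    k: reads carrying one parental allele, n: total allele-informative reads,
    p: expected fraction of reads from that allele under balanced expression.
    Sums the probability of every outcome at least as unlikely as the observed one.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties between symmetric outcomes are counted.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical gene: 15 of 20 reads come from the resistant-strain allele.
p_imbalanced = binomial_two_sided_p(15, 20)
# Hypothetical gene with perfectly balanced counts.
p_balanced = binomial_two_sided_p(10, 20)
```

In a genome-wide screen the per-gene p-values would also need correction for multiple testing, which this sketch leaves out.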
Affiliation(s)
- Naomi A. Dyer
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Eric R. Lucas
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Sanjay C. Nagi
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Daniel P. McDermott
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Jon H. Brenas
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Alistair Miles
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Chris S. Clarkson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Henry D. Mawejje
  - Infectious Diseases Research Collaboration (IDRC), Plot 2C Nakasero Hill Road, PO Box 7475, Kampala, Uganda
- Craig S. Wilding
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
- Marc S. Halfon
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences, University at Buffalo-State University of New York, 955 Main Street, Buffalo, NY 14203, USA
- Hasiba Asma
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences, University at Buffalo-State University of New York, 955 Main Street, Buffalo, NY 14203, USA
- Eva Heinz
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
  - Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, UK
  - Department of Clinical Sciences, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Martin J. Donnelly
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
4
Dyer NA, Lucas ER, Nagi SC, McDermott DP, Brenas JH, Miles A, Clarkson CS, Mawejje HD, Wilding CS, Halfon MS, Asma H, Heinz E, Donnelly MJ. Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression. bioRxiv 2023:2023.11.22.568226. [PMID: 38045426 PMCID: PMC10690255 DOI: 10.1101/2023.11.22.568226] [Indexed: 12/05/2023]
Abstract
Malaria control relies on insecticides targeting the mosquito vector, but this is increasingly compromised by insecticide resistance, which can be achieved by elevated expression of detoxifying enzymes that metabolize the insecticide. In diploid organisms, gene expression is regulated both in cis, by regulatory sequences on the same chromosome, and by trans acting factors, affecting both alleles equally. Differing levels of transcription can be caused by mutations in cis-regulatory modules (CRM), but few of these have been identified in mosquitoes. We crossed bendiocarb resistant and susceptible Anopheles gambiae strains to identify cis-regulated genes that might be responsible for the resistant phenotype using RNAseq, and cis-regulatory module sequences controlling gene expression in insecticide resistance relevant tissues were predicted using machine learning. We found 115 genes showing allele specific expression in hybrids of insecticide susceptible and resistant strains, suggesting cis regulation is an important mechanism of gene expression regulation in Anopheles gambiae. The genes showing allele specific expression included a higher proportion of Anopheles specific genes on average younger than those with balanced allelic expression.
Affiliation(s)
- Naomi A Dyer
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Eric R Lucas
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Sanjay C Nagi
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Daniel P McDermott
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Jon H Brenas
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Alistair Miles
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Chris S Clarkson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Henry D Mawejje
  - Infectious Diseases Research Collaboration (IDRC), Plot 2C Nakasero Hill Road, P.O. Box 7475, Kampala, Uganda
- Craig S Wilding
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
- Marc S Halfon
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences, University at Buffalo-State University of New York, 955 Main Street, Buffalo, New York 14203, USA
- Hasiba Asma
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences, University at Buffalo-State University of New York, 955 Main Street, Buffalo, New York 14203, USA
- Eva Heinz
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
  - Department of Clinical Sciences, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Martin J Donnelly
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
5
Weinstein ML, Jaenke CM, Asma H, Spangler M, Kohnen KA, Konys CC, Williams ME, Williams AV, Rebeiz M, Halfon MS, Williams TM. A novel role for trithorax in the gene regulatory network for a rapidly evolving fruit fly pigmentation trait. PLoS Genet 2023; 19:e1010653. [PMID: 36795790 PMCID: PMC9977049 DOI: 10.1371/journal.pgen.1010653] [Received: 11/08/2022] [Revised: 03/01/2023] [Accepted: 02/03/2023] [Indexed: 02/17/2023]
Abstract
Animal traits develop through the expression and action of numerous regulatory and realizator genes that comprise a gene regulatory network (GRN). For each GRN, its underlying patterns of gene expression are controlled by cis-regulatory elements (CREs) that bind activating and repressing transcription factors. These interactions drive cell-type and developmental stage-specific transcriptional activation or repression. Most GRNs remain incompletely mapped, and a major barrier to this daunting task is CRE identification. Here, we used an in silico method to identify predicted CREs (pCREs) that comprise the GRN which governs sex-specific pigmentation of Drosophila melanogaster. Through in vivo assays, we demonstrate that many pCREs activate expression in the correct cell-type and developmental stage. We employed genome editing to demonstrate that two CREs control the pupal abdomen expression of trithorax, whose function is required for the dimorphic phenotype. Surprisingly, trithorax had no detectable effect on this GRN's key trans-regulators, but shapes the sex-specific expression of two realizator genes. Comparison of sequences orthologous to these CREs supports an evolutionary scenario where these trithorax CREs predated the origin of the dimorphic trait. Collectively, this study demonstrates how in silico approaches can shed novel insights on the GRN basis for a trait's development and evolution.
Affiliation(s)
- Michael L. Weinstein
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
- Chad M. Jaenke
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
- Hasiba Asma
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Matthew Spangler
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
- Katherine A. Kohnen
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
- Claire C. Konys
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
- Melissa E. Williams
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
- Ashley V. Williams
  - West Carrollton High School, 5833 Student St., Dayton, Ohio, United States of America
- Mark Rebeiz
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Marc S. Halfon
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, New York, United States of America
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Thomas M. Williams
  - Department of Biology, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
  - The Integrative Science and Engineering Center, University of Dayton, 300 College Park, Dayton, Ohio, United States of America
6
Schember I, Halfon MS. Identification of new Anopheles gambiae transcriptional enhancers using a cross-species prediction approach. Insect Mol Biol 2021; 30:410-419. [PMID: 33866636 PMCID: PMC8266755 DOI: 10.1111/imb.12705] [Received: 11/05/2020] [Revised: 02/09/2021] [Accepted: 03/31/2021] [Indexed: 06/12/2023]
Abstract
The success of transgenic mosquito vector control approaches relies on well-targeted gene expression, requiring the identification and characterization of a diverse set of mosquito promoters and transcriptional enhancers. However, few enhancers have been characterized in Anopheles gambiae to date. Here, we employ the SCRMshaw method we previously developed to predict enhancers in the A. gambiae genome, preferentially targeting vector-relevant tissues such as the salivary glands, midgut and nervous system. We demonstrate a high overall success rate, with at least 8 of 11 (73%) tested sequences validating as enhancers in an in vivo xenotransgenic assay. Four tested sequences drive expression in either the salivary gland or the midgut, making them directly useful for probing the biology of these infection-relevant tissues. The success of our study suggests that computational enhancer prediction should serve as an effective means for identifying A. gambiae enhancers with activity in tissues involved in malaria propagation and transmission.
Affiliation(s)
- Isabella Schember
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203
- Marc S. Halfon
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203
  - NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203
  - Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263
7
Tomoyasu Y, Halfon MS. How to study enhancers in non-traditional insect models. J Exp Biol 2020; 223(Suppl_1):jeb212241. [PMID: 32034049 DOI: 10.1242/jeb.212241] [Indexed: 12/12/2022]
Abstract
Transcriptional enhancers are central to the function and evolution of genes and gene regulation. At the organismal level, enhancers play a crucial role in coordinating tissue- and context-dependent gene expression. At the population level, changes in enhancers are thought to be a major driving force that facilitates evolution of diverse traits. An amazing array of diverse traits seen in insect morphology, physiology and behavior has been the subject of research for centuries. Although enhancer studies in insects outside of Drosophila have been limited, recent advances in functional genomic approaches have begun to make such studies possible in an increasing selection of insect species. Here, instead of comprehensively reviewing currently available technologies for enhancer studies in established model organisms such as Drosophila, we focus on a subset of computational and experimental approaches that are likely applicable to non-Drosophila insects, and discuss the pros and cons of each approach. We discuss the importance of validating enhancer function and evaluate several possible validation methods, such as reporter assays and genome editing. Key points and potential pitfalls when establishing a reporter assay system in non-traditional insect models are also discussed. We close with a discussion of how to advance enhancer studies in insects, both by improving computational approaches and by expanding the genetic toolbox in various insects. Through these discussions, this Review provides a conceptual framework for studying the function and evolution of enhancers in non-traditional insect models.
Affiliation(s)
- Marc S Halfon
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
8
Asma H, Halfon MS. Computational enhancer prediction: evaluation and improvements. BMC Bioinformatics 2019; 20:174. [PMID: 30953451 PMCID: PMC6451241 DOI: 10.1186/s12859-019-2781-x] [Received: 02/15/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches provide a useful complement to empirical methods for CRM discovery, but it is critical that we develop effective means to evaluate their performance in terms of estimating their sensitivity and specificity.
RESULTS: We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tools that are flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares the result of predictions with the extensive existing knowledge of experimentally validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. In the case of supervised prediction methods, where training data composed of validated CRMs are used, pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval through evaluation of our SCRMshaw CRM prediction method and training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that improves the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to increasingly less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.
CONCLUSION: Our pCRMeval pipeline provides a general framework for evaluation that can be applied to any CRM prediction method, particularly a supervised method. While we make use of it here primarily to test and improve a particular method for CRM prediction, SCRMshaw, pCRMeval should provide a valuable platform to the research community not only for evaluating individual methods, but also for comparing between competing methods.
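The precision and sensitivity estimates described in this abstract can be illustrated with a minimal sketch. It assumes that predictions and validated CRMs are both given as genomic intervals and that any overlap counts as a hit. This is not pCRMeval's actual implementation, which the abstract does not detail; the function names, the overlap rule, and the coordinates are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def precision_and_sensitivity(
    predicted: List[Interval], known: List[Interval]
) -> Tuple[float, float]:
    """Precision: fraction of predictions that overlap a known CRM.
    Sensitivity: fraction of known CRMs recovered by at least one prediction."""
    hit_predictions = sum(
        1 for p in predicted if any(overlaps(p, k) for k in known)
    )
    recovered_known = sum(
        1 for k in known if any(overlaps(p, k) for p in predicted)
    )
    precision = hit_predictions / len(predicted) if predicted else 0.0
    sensitivity = recovered_known / len(known) if known else 0.0
    return precision, sensitivity


# Hypothetical toy data: four predictions, three validated CRMs.
predicted = [("2L", 100, 600), ("2L", 5000, 5500), ("3R", 200, 700), ("X", 10, 510)]
known = [("2L", 400, 900), ("3R", 650, 1200), ("3R", 5000, 5600)]
precision, sensitivity = precision_and_sensitivity(predicted, known)
```

Because no genome has a complete catalog of validated CRMs, a prediction that overlaps nothing known is not necessarily false, which is presumably why the abstract speaks of estimated precision and relative sensitivity.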
Affiliation(s)
- Hasiba Asma
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA
- Marc S Halfon
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA
  - Department of Biochemistry, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA
  - Department of Biological Sciences, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA
  - NY State Center of Excellence in Bioinformatics and Life Sciences, 701 Ellicott St, Buffalo, NY, 14203, USA
  - Molecular and Cellular Biology Department and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA