1
Ma S, Su T, Lu X, Qi Q. Bacterial genome reduction for optimal chassis of synthetic biology: a review. Crit Rev Biotechnol 2024; 44:660-673. [PMID: 37380345 DOI: 10.1080/07388551.2023.2208285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/13/2022] [Accepted: 02/20/2023] [Indexed: 06/30/2023]
Abstract
Bacteria with streamlined genomes, that harbor full functional genes for essential metabolic networks, are able to synthesize the desired products more effectively and thus have advantages as production platforms in industrial applications. To obtain streamlined chassis genomes, a large amount of effort has been made to reduce existing bacterial genomes. This work falls into two categories: rational and random reduction. The identification of essential gene sets and the emergence of various genome-deletion techniques have greatly promoted genome reduction in many bacteria over the past few decades. Some of the constructed genomes possessed desirable properties for industrial applications, such as: increased genome stability, transformation capacity, cell growth, and biomaterial productivity. The decreased growth and perturbations in physiological phenotype of some genome-reduced strains may limit their applications as optimized cell factories. This review presents an assessment of the advancements made to date in bacterial genome reduction to construct optimal chassis for synthetic biology, including: the identification of essential gene sets, the genome-deletion techniques, the properties and industrial applications of artificially streamlined genomes, the obstacles encountered in constructing reduced genomes, and the future perspectives.
Affiliation(s)
- Shuai Ma
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
- Tianyuan Su
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
- Xuemei Lu
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
- Qingsheng Qi
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
2
Tiani KA, Stover PJ. DTYMK is an essential gene in mice and heterozygosity does not cause neural tube defects. Arch Biochem Biophys 2024; 755:109991. [PMID: 38621447 DOI: 10.1016/j.abb.2024.109991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 04/03/2024] [Accepted: 04/11/2024] [Indexed: 04/17/2024]
Abstract
Regulation of nucleotide biosynthesis is necessary for maintaining cellular processes including DNA replication and repair. A key enzyme in this process is deoxythymidylate kinase (dTYMK), which catalyzes the initial step in the production of dTTP from dTMP. This gene constitutes the first merged step of dTTP synthesis from the de novo and salvage pathways which regulate dTMP biosynthesis. Decreased de novo dTMP biosynthesis causes dysregulated dTTP:dUTP pools, and leads to increased uracil in DNA and neural tube closure defect (NTD) development in mice. The goal of this research was to investigate if dTYMK, the downstream enzyme in dTTP production, is an essential gene in mice and if impairments in dTYMK play a causal role in development including NTD pathology in mice. Dtymk+/- C57BL/6J females were weaned onto either a control, excess folic acid, or folic acid deficient diet and timed breeding was performed after 8 weeks on diet. The offspring were analyzed for NTDs and other reproductive outcomes at embryonic day 12.5 (E12.5). Dtymk-/- mice were confirmed to be embryonic lethal before E12.5, and Dtymk+/- mice on all three experimental diets did not show the presence of open neural tube defects, spina bifida or exencephaly. However, the expression of dTYMK in Dtymk+/- mouse embryos was confirmed to be decreased by approximately 3-fold compared to Dtymk+/+ embryos. Although dTYMK was demonstrated to be an essential gene in mice and is required for the regulation of nucleotide pools in vitro, there was no evidence of increased risk of NTDs because of a reduction in expression of this enzyme during embryonic development. It is possible that a further reduction in expression may be required to see developmental anomalies in C57BL/6J mice.
Affiliation(s)
- Kendra A Tiani
- Division of Nutritional Sciences, Cornell University, Ithaca, NY, 14853, USA
- Patrick J Stover
- College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, 77843-2142, USA.
3
Yin R, Gutierrez A, Kobren SN, Avillach P. VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders. medRxiv 2024:2024.04.15.24305876. [PMID: 38699371 PMCID: PMC11065012 DOI: 10.1101/2024.04.15.24305876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Rare and ultra-rare genetic conditions are estimated to impact nearly 1 in 17 people worldwide, yet accurately pinpointing the diagnostic variants underlying each of these conditions remains a formidable challenge. Because comprehensive, in vivo functional assessment of all possible genetic variants is infeasible, clinicians instead consider in silico variant pathogenicity predictions to distinguish plausibly disease-causing from benign variants across the genome. However, in the most difficult undiagnosed cases, such as those accepted to the Undiagnosed Diseases Network (UDN), existing pathogenicity predictions cannot reliably discern true etiological variant(s) from other deleterious candidate variants that were prioritized through N-of-1 efforts. Pinpointing the disease-causing variant from a pool of plausible candidates remains a largely manual effort requiring extensive clinical workups, functional and experimental assays, and eventual identification of genotype- and phenotype-matched individuals. Here, we introduce VarPPUD, a tool trained on prioritized variants from UDN cases, that leverages gene-, amino acid-, and nucleotide-level features to discern pathogenic variants from other deleterious variants that are unlikely to be confirmed as disease relevant. VarPPUD achieves a cross-validated accuracy of 79.3% and precision of 77.5% on a held-out subset of uniquely challenging UDN cases, respectively representing an average 18.6% and 23.4% improvement over nine traditional pathogenicity prediction approaches on this task. We validate VarPPUD's ability to discriminate likely from unlikely pathogenic variants on synthetic, GAN-generated candidate variants as well. Finally, we show how VarPPUD can be probed to evaluate each input feature's importance and contribution toward prediction, an essential step toward understanding the distinct characteristics of newly-uncovered disease-causing variants.
Significance Statement: Patients with chronic, undiagnosed and underdiagnosed genetic conditions often endure expensive and excruciating years-long diagnostic odysseys without clear results. In many instances, clinical genome sequencing of patients and their family members fails to reveal known disease-causing variants, although compelling variants of uncertain significance are frequently encountered. Existing computational tools struggle to reliably differentiate truly disease-causing variants from other plausible candidate variants within these prioritized sets. Consequently, the confirmation of disease-causing variants often necessitates extensive experimental follow-up, including studies in model organisms and identification of other similarly presenting genotype-matched individuals, a process that can extend for several years. Here, we present VarPPUD, a tool trained specifically to distinguish likely from unlikely to be confirmed pathogenic variants that were prioritized across cases in the Undiagnosed Diseases Network. By evaluating the importance and impact of different input feature values on prediction, we gain deeper insights into the distinctive attributes of difficult-to-identify diagnostic variants. For patients who remain undiagnosed following comprehensive whole genome sequencing, our new method VarPPUD may reveal pathogenic variants amid a pool of candidate variants, thereby advancing diagnostic efforts where progress has otherwise stalled.
4
Mu W, Luo T, Barrera A, Bounds LR, Klann TS, Ter Weele M, Bryois J, Crawford GE, Sullivan PF, Gersbach CA, Love MI, Li Y. Machine learning methods for predicting guide RNA effects in CRISPR epigenome editing experiments. bioRxiv 2024:2024.04.18.590188. [PMID: 38659894 PMCID: PMC11042384 DOI: 10.1101/2024.04.18.590188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
CRISPR epigenomic editing technologies enable functional interrogation of non-coding elements. However, current computational methods for guide RNA (gRNA) design do not effectively predict the power potential or the molecular and cellular impact of gRNAs, which makes it difficult to select the efficient gRNAs that are crucial for successful applications of these technologies. We present "launch-dCas9" (machine LeArning based UNified CompreHensive framework for CRISPR-dCas9) to predict gRNA impact from multiple perspectives, including cell fitness, wildtype abundance (gauging power potential), and gene expression in single cells. Our launch-dCas9, built and evaluated using experiments involving >1 million gRNAs targeted across the human genome, demonstrates relatively high prediction accuracy (AUC up to 0.81) and generalizes across cell lines. Method-prioritized top gRNA(s) are 4.6-fold more likely to exert effects, compared to other gRNAs in the same cis-regulatory region. Furthermore, launch-dCas9 identifies the most critical sequence-related features and functional annotations from >40 features considered. Our results establish launch-dCas9 as a promising approach to design gRNAs for CRISPR epigenomic experiments.
Affiliation(s)
- Wancen Mu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Tianyou Luo
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alejandro Barrera
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Lexi R Bounds
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Tyler S Klann
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Maria Ter Weele
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Julien Bryois
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Gregory E Crawford
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Pediatrics, Division of Medical Genetics, Duke University Medical Center, Durham, NC, USA
- Patrick F Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles A Gersbach
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
5
Wang HT, Xiao FH, Gao ZL, Guo LY, Yang LQ, Li GH, Kong QP. Methylation entropy landscape of Chinese long-lived individuals reveals lower epigenetic noise related to human healthy aging. Aging Cell 2024:e14163. [PMID: 38566438 DOI: 10.1111/acel.14163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 03/12/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024] Open
Abstract
The transition from ordered to noisy is a significant epigenetic signature of aging and age-related disease. As a paradigm of healthy human aging and longevity, long-lived individuals (LLI, >90 years old) may possess characteristic strategies in coping with the disordered epigenetic regulation. In this study, we constructed high-resolution blood epigenetic noise landscapes for this cohort by a methylation entropy (ME) method using whole genome bisulfite sequencing (WGBS). Although a universal increase in global ME occurred with chronological age in general control samples, this trend was suppressed in LLIs. Importantly, we identified 38,923 genomic regions with LLI-specific lower ME (LLI-specific lower entropy regions, for short, LLI-specific LERs). These regions were overrepresented in promoters, which likely function in transcriptional noise suppression. Genes associated with LLI-specific LERs have a considerable impact on SNP-based heritability of some aging-related disorders (e.g., asthma and stroke). Furthermore, neutrophil was identified as the primary cell type sustaining LLI-specific LERs. Our results highlight the stability of epigenetic order in promoters of genes involved with aging and age-related disorders within LLI epigenomes. This unique epigenetic feature reveals a previously unknown role of epigenetic order maintenance in specific genomic regions of LLIs, which helps open a new avenue on the epigenetic regulation mechanism in human healthy aging and longevity.
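The abstract does not spell out how methylation entropy is computed. As a rough illustration only, the sketch below assumes one commonly used definition: the Shannon entropy of the read-level methylation patterns observed in a window of CpG sites, divided by the number of sites in the window. The function name and the '0'/'1' string encoding of reads are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import log2


def methylation_entropy(patterns):
    """Shannon entropy of read-level methylation patterns, per CpG site.

    `patterns` is a list of equal-length strings, one per sequencing read,
    where '1' marks a methylated CpG and '0' an unmethylated CpG.
    This is an assumed, commonly used definition, not the authors' code.
    """
    if not patterns:
        raise ValueError("at least one read pattern is required")
    width = len(patterns[0])
    if width == 0 or any(len(p) != width for p in patterns):
        raise ValueError("all patterns must share the same non-zero length")

    counts = Counter(patterns)
    total = len(patterns)
    # Sum of -p * log2(p) over the observed patterns.
    entropy_bits = sum(-(c / total) * log2(c / total) for c in counts.values())
    # Normalise by window width so a 4-CpG window ranges from 0 to 1.
    return entropy_bits / width


if __name__ == "__main__":
    ordered = ["1111"] * 8                 # every read shows the same pattern
    mixed = ["1111"] * 4 + ["0000"] * 4    # two patterns, equally frequent
    print(methylation_entropy(ordered))
    print(methylation_entropy(mixed))
```

Under this definition, a window in which every read carries the same pattern scores 0 (fully ordered), and a window in which all possible patterns are equally frequent scores 1 (maximally noisy).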
Affiliation(s)
- Hao-Tian Wang
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Fu-Hui Xiao
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Zong-Liang Gao
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Li-Yun Guo
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Li-Qin Yang
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Gong-Hua Li
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Qing-Peng Kong
- Key Laboratory of Genetic Evolution & Animal Models (Chinese Academy of Sciences), Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Key Laboratory of Healthy Aging Study, KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
6
Shanley HT, Taki AC, Nguyen N, Wang T, Byrne JJ, Ang CS, Leeming MG, Williamson N, Chang BCH, Jabbar A, Sleebs BE, Gasser RB. Comparative structure activity and target exploration of 1,2-diphenylethynes in Haemonchus contortus and Caenorhabditis elegans. Int J Parasitol Drugs Drug Resist 2024; 25:100534. [PMID: 38554597 PMCID: PMC10992699 DOI: 10.1016/j.ijpddr.2024.100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 03/14/2024] [Accepted: 03/17/2024] [Indexed: 04/01/2024]
Abstract
Infections and diseases caused by parasitic nematodes have a major adverse impact on the health and productivity of animals and humans worldwide. The control of these parasites often relies heavily on the treatment with commercially available chemical compounds (anthelmintics). However, the excessive or uncontrolled use of these compounds in livestock animals has led to major challenges linked to drug resistance in nematodes. Therefore, there is a need to develop new anthelmintics with novel mechanism(s) of action. Recently, we identified a small molecule, designated UMW-9729, with nematocidal activity against the free-living model organism Caenorhabditis elegans. Here, we evaluated UMW-9729's potential as an anthelmintic in a structure-activity relationship (SAR) study in C. elegans and the highly pathogenic, blood-feeding Haemonchus contortus (barber's pole worm), and explored the compound-target relationship using thermal proteome profiling (TPP). First, we synthesised and tested 25 analogues of UMW-9729 for their nematocidal activity in both H. contortus (larvae and adults) and C. elegans (young adults), establishing a preliminary nematocidal pharmacophore for both species. We identified several compounds with marked activity against either H. contortus or C. elegans which had greater efficacy than UMW-9729, and found a significant divergence in compound bioactivity between these two nematode species. We also identified a UMW-9729 analogue, designated 25, that moderately inhibited the motility of adult female H. contortus in vitro. Subsequently, we inferred three H. contortus proteins (HCON_00134350, HCON_00021470 and HCON_00099760) and five C. elegans proteins (F30A10.9, F15B9.8, B0361.6, DNC-4 and UNC-11) that interacted directly with UMW-9729; however, no conserved protein target was shared between the two nematode species. 
Future work aims to extend the SAR investigation in these and other parasitic nematode species, and validate individual proteins identified here as possible targets of UMW-9729. Overall, the present study evaluates this anthelmintic candidate and highlights some challenges associated with early anthelmintic investigation.
Affiliation(s)
- Harrison T Shanley
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia; Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Aya C Taki
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Nghi Nguyen
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Tao Wang
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Joseph J Byrne
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Ching-Seng Ang
- Melbourne Mass Spectrometry and Proteomics Facility, The Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Michael G Leeming
- Melbourne Mass Spectrometry and Proteomics Facility, The Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Nicholas Williamson
- Melbourne Mass Spectrometry and Proteomics Facility, The Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Bill C H Chang
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Abdul Jabbar
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Brad E Sleebs
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia; Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia.
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria, 3010, Australia.
7
Ye C, Wu Q, Chen S, Zhang X, Xu W, Wu Y, Zhang Y, Yue Y. ECDEP: identifying essential proteins based on evolutionary community discovery and subcellular localization. BMC Genomics 2024; 25:117. [PMID: 38279081 PMCID: PMC10821549 DOI: 10.1186/s12864-024-10019-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 01/15/2024] [Indexed: 01/28/2024] Open
Abstract
BACKGROUND: In cellular activities, essential proteins play a vital role and are instrumental in comprehending fundamental biological necessities and identifying pathogenic genes. Current deep learning approaches for predicting essential proteins underutilize the potential of gene expression data and are inadequate for the exploration of dynamic networks with limited evaluation across diverse species. RESULTS: We introduce ECDEP, an essential protein identification model based on evolutionary community discovery. ECDEP integrates temporal gene expression data with a protein-protein interaction (PPI) network and employs the 3-Sigma rule to eliminate outliers at each time point, constructing a dynamic network. Next, we utilize edge birth and death information to establish an interaction streaming source to feed into the evolutionary community discovery algorithm and then identify overlapping communities during the evolution of the dynamic network. SVM recursive feature elimination (RFE) is applied to extract the most informative communities, which are combined with subcellular localization data for classification predictions. We assess the performance of ECDEP by comparing it against ten centrality methods, four shallow machine learning methods with RFE, and two deep learning methods that incorporate multiple biological data sources on Saccharomyces cerevisiae (S. cerevisiae), Homo sapiens (H. sapiens), Mus musculus, and Caenorhabditis elegans. ECDEP achieves an AP value of 0.86 on the H. sapiens dataset and the contribution ratio of community features in classification reaches 0.54 on the S. cerevisiae (Krogan) dataset. CONCLUSIONS: Our proposed method adeptly integrates network dynamics and yields outstanding results across various datasets. Furthermore, the incorporation of evolutionary community discovery algorithms amplifies the capacity of gene expression data in classification.
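The abstract mentions a 3-Sigma rule for turning a static PPI network plus time-course expression into a dynamic network, without giving details. The sketch below shows one simple reading of that idea, under stated assumptions: a protein counts as active at a time point when its expression exceeds its own mean plus `k` standard deviations, and an interaction is kept at a time point only when both partners are active. The function names, the plain `mean + k * sigma` threshold, and the toy data are illustrative assumptions and may differ from what ECDEP actually does.

```python
from statistics import mean, pstdev


def active_time_points(expression, k=3.0):
    """Return the set of time indices where expression exceeds mean + k * sigma.

    Assumed thresholding rule for illustration; published variants often
    dampen the k * sigma term, which is not modelled here.
    """
    threshold = mean(expression) + k * pstdev(expression)
    return {t for t, value in enumerate(expression) if value > threshold}


def dynamic_network(edges, expression_by_protein, k=3.0):
    """Build one edge list per time point from a static edge list.

    An edge (u, v) is present at time t only if both u and v are active at t.
    """
    active = {
        protein: active_time_points(values, k)
        for protein, values in expression_by_protein.items()
    }
    n_points = max((len(v) for v in expression_by_protein.values()), default=0)
    snapshots = [[] for _ in range(n_points)]
    for u, v in edges:
        for t in active.get(u, set()) & active.get(v, set()):
            snapshots[t].append((u, v))
    return snapshots


if __name__ == "__main__":
    def spike(at, n=36):
        # Toy profile: flat baseline with a single expression peak.
        return [10.0 if i == at else 0.0 for i in range(n)]

    expression = {"A": spike(5), "B": spike(5), "C": spike(20)}
    snapshots = dynamic_network([("A", "B"), ("B", "C")], expression)
    for t, snapshot in enumerate(snapshots):
        if snapshot:
            print(t, snapshot)
```

Comparing consecutive snapshots gives the edge birth and death events that the abstract describes feeding into the community discovery step.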
Affiliation(s)
- Chen Ye
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Qi Wu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Shuxia Chen
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Xuemei Zhang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Wenwen Xu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Yunzhi Wu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Youhua Zhang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China
- Yi Yue
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China.
- Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, 230036, China.
8
Cacheiro P, Lawson S, Van den Veyver IB, Marengo G, Zocche D, Murray SA, Duyzend M, Robinson PN, Smedley D. Lethal phenotypes in Mendelian disorders. medRxiv 2024:2024.01.12.24301168. [PMID: 38260283 PMCID: PMC10802756 DOI: 10.1101/2024.01.12.24301168] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/24/2024]
Abstract
Essential genes are those whose function is required for cell proliferation and/or organism survival. A gene's intolerance to loss-of-function can be allocated within a spectrum, as opposed to being considered a binary feature, since this function might be essential at different stages of development, genetic backgrounds or other contexts. Existing resources that collect and characterise the essentiality status of genes are based on either proliferation assessment in human cell lines, embryonic and postnatal viability evaluation in different model organisms, and gene metrics such as intolerance to variation scores derived from human population sequencing studies. There are also several repositories available that document phenotypic annotations for rare disorders in humans such as the Online Mendelian Inheritance in Man (OMIM) and the Human Phenotype Ontology (HPO) knowledgebases. This raises the prospect of being able to use clinical data, including lethality as the most severe phenotypic manifestation, to further our characterisation of gene essentiality. Here we queried OMIM for terms related to lethality and classified all Mendelian genes into categories, according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. To showcase this curated catalogue of human essential genes, we developed the Lethal Phenotypes Portal (https://lethalphenotypes.research.its.qmul.ac.uk), where we also explore the relationships between these lethality categories, constraint metrics and viability in cell lines and mouse. Further analysis of the genes in these categories reveals differences in the mode of inheritance of the associated disorders, physiological systems affected and disease class. We highlight how the phenotypic similarity between genes in the same lethality category combined with gene family/group information can be used for novel disease gene discovery. 
Finally, we explore the overlaps and discrepancies between the lethal phenotypes observed in mouse and human and discuss potential explanations that include differences in transcriptional regulation, functional compensation and molecular disease mechanisms. We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.
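The classification step described in this abstract, assigning each gene the earliest age of death recorded across its associated disorders, can be sketched as a minimum over an ordered set of categories. Only the two endpoints ("prenatal death" and "no reports of premature death") come from the abstract; the intermediate category names, the record format, and the gene identifiers below are hypothetical placeholders for illustration.

```python
from enum import IntEnum


class LethalityCategory(IntEnum):
    """Ordered from earliest to latest; lower value means earlier death.

    PRENATAL and NONE_REPORTED mirror the endpoints named in the abstract.
    The categories in between are hypothetical labels for this sketch.
    """
    PRENATAL = 0
    NEONATAL = 1        # hypothetical
    INFANT = 2          # hypothetical
    CHILDHOOD = 3       # hypothetical
    ADULT = 4           # hypothetical
    NONE_REPORTED = 5


def classify_genes(disorder_records):
    """Map each gene to the earliest lethality category among its disorders.

    `disorder_records` is an iterable of (gene, LethalityCategory) pairs,
    one pair per gene-disorder association.
    """
    earliest = {}
    for gene, category in disorder_records:
        current = earliest.get(gene)
        if current is None or category < current:
            earliest[gene] = category
    return earliest


if __name__ == "__main__":
    records = [
        ("GENE1", LethalityCategory.CHILDHOOD),
        ("GENE1", LethalityCategory.PRENATAL),
        ("GENE2", LethalityCategory.NONE_REPORTED),
    ]
    for gene, category in classify_genes(records).items():
        print(gene, category.name)
```

Using an ordered enumeration keeps the "earliest wins" rule explicit and lets the same ordering be reused when comparing categories against constraint metrics or viability data.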
Affiliation(s)
- Pilar Cacheiro
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Ignatia B. Van den Veyver
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX, USA
- Gabriel Marengo
- William Harvey Research Institute, Queen Mary University of London, London, UK
- David Zocche
- North West Thames Regional Genetics Service, Northwick Park & St Mark’s Hospitals, London, UK
- Peter N. Robinson
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, UK
9
Giordano M, Falbo E, Maddalena L, Piccirillo M, Granata I. Untangling the Context-Specificity of Essential Genes by Means of Machine Learning: A Constructive Experience. Biomolecules 2023; 14:18. [PMID: 38254618 PMCID: PMC10813179 DOI: 10.3390/biom14010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/29/2023] [Accepted: 12/20/2023] [Indexed: 01/24/2024] Open
Abstract
Gene essentiality is a genetic concept crucial for a comprehensive understanding of life and evolution. In the last decade, many essential genes (EGs) have been determined using different experimental and computational approaches, and this information has been used to reduce the genomes of model organisms. A growing amount of evidence highlights that essentiality is a property that depends on the context. Because of their importance in vital biological processes, recognising context-specific EGs (csEGs) could help to identify new potential pharmacological targets and to improve precision therapeutics. Since most of the computational procedures proposed to identify and predict EGs neglect their context-specificity, we focused on this aspect, providing a theoretical and experimental overview of the literature, data and computational methods dedicated to recognising csEGs. To this end, we adapted existing computational methods to exploit a specific context (the kidney tissue) and experimented with four different prediction methods using the labels provided by four different identification approaches. The considerations derived from the analysis of the obtained results, confirmed and validated by further experiments for a different tissue context, provide the reader with guidance on exploiting existing tools for csEG identification and prediction.
Affiliation(s)
- Maurizio Giordano
  - Institute for High-Performance Computing and Networking (ICAR), National Research Council (CNR), V. Pietro Castellino 111, 80131 Naples, Italy; (E.F.); (L.M.); (M.P.); (I.G.)
|
10
|
Abstract
Epigenetic machinery contributes to gene regulation in eukaryotic species. However, this machinery, which includes more than 600 epigenetic regulator (ER) genes responsible for reading, writing, and erasing histone and DNA modifications, remains largely uncharacterized across species. We compile a comprehensive list of ERs based on an evolutionary analysis across 23 species, which is, to our knowledge, the most comprehensive multi-species ER list to date. We further perform comparative transcriptomic analyses across different tissues in humans, mice, and other amniote species. We observe a consistent tissue-of-origin expression specificity pattern of duplicated ER genes across species and suggest links between expression specificity and ER gene evolution as well as ER function. Additional analyses further suggest that ER duplication can generate tissue-specific ER genes with the same epigenetic substrates, which may be closely related to their regulatory specificity in tissue development. Our work can serve as a foundation to better comprehend the tissue-specific expression patterns of ER genes from an evolutionary perspective and also the functional implications of ERs in tissue-specific epigenetic regulation.
Affiliation(s)
- Jilu Wang
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China
- Aiai Shi
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China
- Jie Lyu
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China
  - Joint Centre of Translational Medicine, the First Affiliated Hospital of Wenzhou Medical University, Wenzhou, People's Republic of China
  - Joint Centre of Translational Medicine, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, People's Republic of China
|
11
|
Ma J, Song J, Young ND, Chang BCH, Korhonen PK, Campos TL, Liu H, Gasser RB. 'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data. Brief Bioinform 2023; 25:bbad472. [PMID: 38152979 PMCID: PMC10753293 DOI: 10.1093/bib/bbad472] [Received: 07/13/2023] [Revised: 10/22/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023] Open
Abstract
The identification and characterization of essential genes are central to our understanding of the core biological functions in eukaryotic organisms, and have important implications for the treatment of diseases caused by, for example, cancers and pathogens. Given the major constraints in testing the functions of genes of many organisms in the laboratory, due to the absence of in vitro cultures and/or gene perturbation assays for most metazoan species, there has been a need to develop in silico tools for the accurate prediction or inference of essential genes to underpin systems biological investigations. Major advances in machine learning approaches provide unprecedented opportunities to overcome these limitations and accelerate the discovery of essential genes on a genome-wide scale. Here, we developed and evaluated a large language model- and graph neural network (LLM-GNN)-based approach, called 'Bingo', to predict essential protein-coding genes in the metazoan model organisms Caenorhabditis elegans and Drosophila melanogaster as well as in Mus musculus and Homo sapiens (a HepG2 cell line) by integrating LLM and GNNs with adversarial training. Bingo predicts essential genes under two 'zero-shot' scenarios with transfer learning, showing promise to compensate for a lack of high-quality genomic and proteomic data for non-model organisms. In addition, attention mechanisms and GNNExplainer were employed to reveal the functional sites and structural domains that contribute most to essentiality. In conclusion, Bingo provides the prospect of being able to accurately infer the essential genes of little- or under-studied organisms of interest, and provides a biological explanation for gene essentiality.
Affiliation(s)
- Jiani Ma
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jiangning Song
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Neil D Young
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Bill C H Chang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Pasi K Korhonen
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Tulio L Campos
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
  - Bioinformatics Core Facility, Instituto Aggeu Magalhaes, Fundaçao Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
|
12
|
Bay ÖF, Hayes KS, Schwartz JM, Grencis RK, Roberts IS. A genome-scale metabolic model of parasitic whipworm. Nat Commun 2023; 14:6937. [PMID: 37907472 PMCID: PMC10618284 DOI: 10.1038/s41467-023-42552-4] [Received: 03/14/2023] [Accepted: 10/13/2023] [Indexed: 11/02/2023] Open
Abstract
Genome-scale metabolic models are widely used to enhance our understanding of metabolic features of organisms and host-pathogen interactions, and to identify therapeutics for diseases. Here we present iTMU798, the genome-scale metabolic model of the mouse whipworm Trichuris muris. The model demonstrates the metabolic features of T. muris and allows the prediction of metabolic steps essential for its survival. Specifically, it predicts that the thioredoxin reductase (TrxR) enzyme is essential, a prediction we validate in vitro with the drug auranofin. Furthermore, our observation that the T. muris genome lacks gsr-1, which encodes glutathione reductase (GR), but has GR activity that can be inhibited by auranofin indicates a mechanism for the reduction of glutathione by the TrxR enzyme in T. muris. In addition, iTMU798 predicts seven essential amino acids that cannot be synthesised by T. muris, a prediction we validate for the amino acid tryptophan. Overall, iTMU798 is a powerful tool for studying the metabolism of T. muris and of other Trichuris spp., for understanding host-parasite interactions, and for the rational design of new intervention strategies.
Affiliation(s)
- Ömer F Bay
  - Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - Bioinformatics, Abdullah Gül University, Kayseri, Türkiye
  - The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Kelly S Hayes
  - Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - The Wellcome Trust Centre for Cell-Matrix Research, University of Manchester, Manchester, UK
- Jean-Marc Schwartz
  - Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Richard K Grencis
  - Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - The Wellcome Trust Centre for Cell-Matrix Research, University of Manchester, Manchester, UK
- Ian S Roberts
  - Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
|
13
|
Abstract
Protein-coding genes exhibit different degrees of intolerance to loss-of-function variation. The most intolerant genes, whose function is essential for cell and/or organism survival, inform on fundamental biological processes related to cell proliferation and organism development and provide a window on the molecular mechanisms of human disease. Here we present a brief overview of the resources and knowledge gathered around gene essentiality, from cancer cell lines to model organisms to human development. We outline the implications of using different sources of evidence and definitions to determine which genes are essential and highlight how information on the essentiality status of a gene can inform novel disease gene discovery and therapeutic target identification.
Affiliation(s)
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
|
14
|
Zhou JB, Tang D, He L, Lin S, Lei JH, Sun H, Xu X, Deng CX. Machine learning model for anti-cancer drug combinations: Analysis, prediction, and validation. Pharmacol Res 2023; 194:106830. [PMID: 37343647 DOI: 10.1016/j.phrs.2023.106830] [Received: 04/30/2023] [Revised: 06/10/2023] [Accepted: 06/17/2023] [Indexed: 06/23/2023]
Abstract
Drug combination therapy is a highly effective approach for enhancing the therapeutic efficacy of anti-cancer drugs and overcoming drug resistance. However, the innumerable possible drug combinations make it impractical to screen all synergistic drug pairs. Moreover, biological insights into synergistic drug pairs are still lacking. To address this challenge, we systematically analyzed drug combination datasets curated from multiple databases to identify drug pairs more likely to show synergy. We classified drug pairs based on their mechanism of action (MoA) and discovered that 110 MoA pairs were significantly enriched in synergy in at least one type of cancer. To improve the accuracy of predicting synergistic effects of drug pairs, we developed a suite of machine learning models that achieve better predictive performance. Unlike most previous methods, which were rarely validated by wet-lab experiments, our models were validated using two-dimensional cell lines and three-dimensional tumor slice culture (3D-TSC) models, implying their practical utility. Our prediction and validation results indicated that the combination of the RTK inhibitors Lapatinib and Pazopanib exhibited a strong therapeutic effect in breast cancer by blocking the downstream PI3K/AKT/mTOR signaling pathway. Furthermore, we incorporated molecular features to identify potential biomarkers for synergistic drug pairs, and for almost all potential biomarkers we found connections between drug targets and the corresponding molecular features using a protein-protein interaction network. Overall, this study provides valuable insights to complement and guide rational efforts to develop drug combination treatments.
Affiliation(s)
- Jing-Bo Zhou
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Dongyang Tang
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Lin He
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Shiqi Lin
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Josh Haipeng Lei
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Heng Sun
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Xiaoling Xu
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - MOE Frontier Science Center for Precision Oncology, University of Macau, Macau SAR, China
- Chu-Xia Deng
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - Centre for Precision Medicine Research and Training, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - MOE Frontier Science Center for Precision Oncology, University of Macau, Macau SAR, China
|
15
|
Cesur MF, Basile A, Patil KR, Çakır T. A new metabolic model of Drosophila melanogaster and the integrative analysis of Parkinson's disease. Life Sci Alliance 2023; 6:e202201695. [PMID: 37236669 PMCID: PMC10215973 DOI: 10.26508/lsa.202201695] [Received: 08/27/2022] [Revised: 05/11/2023] [Accepted: 05/12/2023] [Indexed: 05/28/2023] Open
Abstract
High conservation of the disease-associated genes between flies and humans facilitates the common use of Drosophila melanogaster to study metabolic disorders under controlled laboratory conditions. However, metabolic modeling studies are highly limited for this organism. We here report a comprehensively curated genome-scale metabolic network model of Drosophila using an orthology-based approach. The gene coverage and metabolic information of the draft model derived from a reference human model were expanded via Drosophila-specific KEGG and MetaCyc databases, with several curation steps to avoid metabolic redundancy and stoichiometric inconsistency. Furthermore, we performed literature-based curations to improve gene-reaction associations, subcellular metabolite locations, and various metabolic pathways. The performance of the resulting Drosophila model (8,230 reactions, 6,990 metabolites, and 2,388 genes), iDrosophila1 (https://github.com/SysBioGTU/iDrosophila), was assessed using flux balance analysis in comparison with the other currently available fly models leading to superior or comparable results. We also evaluated the transcriptome-based prediction capacity of iDrosophila1, where differential metabolic pathways during Parkinson's disease could be successfully elucidated. Overall, iDrosophila1 is promising to investigate system-level metabolic alterations in response to genetic and environmental perturbations.
Affiliation(s)
- Müberra Fatma Cesur
  - Systems Biology and Bioinformatics Program, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Arianna Basile
  - Medical Research Council Toxicology Unit, University of Cambridge, Cambridge, UK
- Kiran Raosaheb Patil
  - Medical Research Council Toxicology Unit, University of Cambridge, Cambridge, UK
- Tunahan Çakır
  - Systems Biology and Bioinformatics Program, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
|
16
|
Itai T, Jia P, Dai Y, Chen J, Chen X, Zhao Z. De novo mutations disturb early brain development more frequently than common variants in schizophrenia. Am J Med Genet B Neuropsychiatr Genet 2023; 192:62-70. [PMID: 36863698 DOI: 10.1002/ajmg.b.32932] [Received: 07/13/2022] [Revised: 12/08/2022] [Accepted: 01/29/2023] [Indexed: 03/04/2023]
Abstract
Investigating functional, temporal, and cell-type expression features of mutations is important for understanding a complex disease. Here, we collected and analyzed common variants and de novo mutations (DNMs) in schizophrenia (SCZ). We collected 2,636 missense and loss-of-function (LoF) DNMs in 2,263 genes across 3,477 SCZ patients (SCZ-DNMs). We curated three gene lists: (a) SCZ-neuroGenes (159 genes), which are intolerant to LoF and missense DNMs and are neurologically important, (b) SCZ-moduleGenes (52 genes), which were derived from network analyses of SCZ-DNMs, and (c) SCZ-commonGenes (120 genes) from a recent GWAS as reference. To compare temporal gene expression, we used the BrainSpan dataset. We defined a fetal effect score (FES) to quantify the involvement of each gene in prenatal brain development. We further employed the specificity indexes (SIs) to evaluate cell-type expression specificity from single-cell expression data in cerebral cortices of humans and mice. Compared with SCZ-commonGenes, SCZ-neuroGenes and SCZ-moduleGenes were highly expressed in the prenatal stage, had higher FESs, and had higher SIs in fetal replicating cells and undifferentiated cell types. Our results suggested that gene expression patterns in specific cell types in early fetal stages might have impacts on the risk of SCZ during adulthood.
Affiliation(s)
- Toshiyuki Itai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Jingchun Chen
  - Nevada Institute of Personalized Medicine, University of Nevada Las Vegas, Las Vegas, Nevada, USA
- Xiangning Chen
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Faillace Department of Psychiatry and Behavioral Sciences, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
|
17
|
Gorlov IP, Conway K, Edmiston SN, Parrish EA, Hao H, Amos CI, Tsavachidis S, Gorlova OY, Begg C, Hernando E, Cheng C, Shen R, Orlow I, Luo L, Ernstoff MS, Kuan PF, Ollila DW, Tsai YS, Berwick M, Thomas NE. Methylation of nonessential genes in cutaneous melanoma - Rule Out hypothesis. Melanoma Res 2023; 33:163-172. [PMID: 36805567 PMCID: PMC10148896 DOI: 10.1097/cmr.0000000000000881] [Indexed: 02/23/2023]
Abstract
Differential methylation plays an important role in melanoma development and is associated with survival, progression and response to treatment. However, the mechanisms by which methylation promotes melanoma development are poorly understood. The traditional explanation of selective advantage provided by differential methylation postulates that hypermethylation of regulatory 5'-cytosine-phosphate-guanine-3' dinucleotides (CpGs) downregulates the expression of tumor suppressor genes and therefore promotes tumorigenesis. We believe that other (not necessarily alternative) explanations of the selective advantages of methylation are also possible. Here, we hypothesize that melanoma cells use methylation to shut down transcription of nonessential genes - those not required for cell survival and proliferation. Suppression of nonessential genes allows tumor cells to be more efficient in terms of energy and resource usage, providing them with a selective advantage over the tumor cells that transcribe and subsequently translate genes they do not need. We named the hypothesis the Rule Out (RO) hypothesis. The RO hypothesis predicts higher methylation of CpGs located in regulatory regions (CpG islands) of nonessential genes. It also predicts the higher methylation of regulatory CpGs linked to nonessential genes in melanomas compared to nevi and lower expression of nonessential genes in malignant (derived from melanoma) versus normal (derived from nonaffected skin) melanocytes. The analyses conducted using in-house and publicly available data found that all predictions derived from the RO hypothesis hold, providing observational support for the hypothesis.
Affiliation(s)
- Ivan P Gorlov
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
- Kathleen Conway
  - Department of Dermatology, University of North Carolina
  - Department of Epidemiology
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Sharon N Edmiston
  - Department of Dermatology, University of North Carolina
  - Department of Epidemiology
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Eloise A Parrish
  - Department of Applied Mathematics and Statistics, State University of New York, Stony Brook
- Honglin Hao
  - Department of Dermatology, University of North Carolina
- Olga Y Gorlova
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
- Colin Begg
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York
- Eva Hernando
  - Department of Pathology, New York University School of Medicine, New York
- Chao Cheng
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
- Ronglai Shen
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York
- Irene Orlow
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York
- Li Luo
  - Department of Internal Medicine, University of New Mexico, Albuquerque, New Mexico
- Marc S Ernstoff
  - Roswell Park Comprehensive Cancer Center, Elm and Carlton, Buffalo
- Pei Fen Kuan
  - Department of Applied Mathematics and Statistics, State University of New York, Stony Brook
- David W Ollila
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  - Department of Surgery, University of North Carolina, Chapel Hill, North Carolina, USA
- Yihsuan S Tsai
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Marianne Berwick
  - Department of Internal Medicine, University of New Mexico, Albuquerque, New Mexico
- Nancy E Thomas
  - Department of Dermatology, University of North Carolina
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
|
18
|
Mugwanda K, Hamese S, Van Zyl WF, Prinsloo E, Du Plessis M, Dicks LMT, Thimiri Govinda Raj DB. Recent advances in genetic tools for engineering probiotic lactic acid bacteria. Biosci Rep 2023; 43. [PMID: 36597861 DOI: 10.1042/BSR20211299] [Received: 09/08/2022] [Revised: 12/19/2022] [Accepted: 01/03/2023] [Indexed: 01/05/2023] Open
Abstract
Synthetic biology has grown exponentially in the last few years, with a variety of biological applications. One of the emerging applications of synthetic biology is to exploit the link between microorganisms, biologics, and human health. To exploit this link, it is critical to select effective synthetic biology tools for use in appropriate microorganisms that would address unmet needs in human health through the development of new game-changing applications and by complementing existing technological capabilities. Lactic acid bacteria (LAB) are considered appropriate chassis organisms that can be genetically engineered for therapeutic and industrial applications. Here, we comprehensively review various synthetic biology techniques for engineering probiotic LAB strains, such as clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated genome editing, homologous recombination, and recombineering. In addition, we discuss heterologous protein expression systems used in engineering probiotic LAB. Combining computational biology with genetic engineering offers considerable potential to develop next-generation synthetic LAB with capabilities to address bottlenecks in industrial scale-up and complex biologics production. Recently, we started working on the Lactochassis project, in which we aim to develop next-generation synthetic LAB for biomedical applications.
|
19
|
Liao W, Nie W, Ahmad I, Chen G, Zhu B. The occurrence, characteristics, and adaptation of A-to-I RNA editing in bacteria: A review. Front Microbiol 2023; 14:1143929. [PMID: 36960293 PMCID: PMC10027721 DOI: 10.3389/fmicb.2023.1143929] [Received: 01/13/2023] [Accepted: 02/15/2023] [Indexed: 03/09/2023] Open
Abstract
A-to-I RNA editing is a very important post-transcriptional or co-transcriptional modification that creates isoforms and increases the diversity of proteins. In this process, adenosine (A) in RNA molecules is hydrolyzed and deaminated into inosine (I). It is well known that ADAR (adenosine deaminase acting on RNA)-dependent A-to-I mRNA editing is widespread in animals. Subsequently, ADAR-independent A-to-I mRNA editing mediated by TadA (tRNA-specific adenosine deaminase) was discovered in Escherichia coli. In Xoc, the editing event S128P on the flagellar structural protein FliC was previously shown to enhance bacterial tolerance to oxidative stress. In addition, the editing event T408A on the enterobactin iron receptor protein XfeA acts as a switch by controlling the uptake of Fe3+ in response to the concentration of iron in the environment. Even though bacteria have fewer editing events, the great majority of those that are currently preserved have adaptive benefits. Interestingly, the A-to-I RNA editing event T408A on xfeA was found to be TadA-independent, indicating that there may be other enzymes that perform a function like that of TadA. Here, we review recent advances in the characteristics, functions, and adaptations of editing in bacteria.
Affiliation(s)
- Weixue Liao
- Shanghai Yangtze River Delta Eco-Environmental Change and Management Observation and Research Station, Shanghai Cooperative Innovation Center for Modern Seed Industry, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Wenhan Nie
- Shanghai Yangtze River Delta Eco-Environmental Change and Management Observation and Research Station, Shanghai Cooperative Innovation Center for Modern Seed Industry, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- *Correspondence: Wenhan Nie,
| | - Iftikhar Ahmad
- Shanghai Yangtze River Delta Eco-Environmental Change and Management Observation and Research Station, Shanghai Cooperative Innovation Center for Modern Seed Industry, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Department of Environmental Sciences, COMSATS University Islamabad, Vehari Campus, Vehari, Pakistan
| | - Gongyou Chen
- Shanghai Yangtze River Delta Eco-Environmental Change and Management Observation and Research Station, Shanghai Cooperative Innovation Center for Modern Seed Industry, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Bo Zhu
- Shanghai Yangtze River Delta Eco-Environmental Change and Management Observation and Research Station, Shanghai Cooperative Innovation Center for Modern Seed Industry, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- *Correspondence: Bo Zhu
20
Manzo M, Giordano M, Maddalena L, Guarracino MR, Granata I. Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis. Studies in Computational Intelligence 2023:117-145. [DOI: 10.1007/978-3-031-24453-7_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
21
Möller S, Saul N, Projahn E, Barrantes I, Gézsi A, Walter M, Antal P, Fuellen G. Gene co-expression analyses of health(span) across multiple species. NAR Genom Bioinform 2022; 4:lqac083. [PMID: 36458022 PMCID: PMC9706456 DOI: 10.1093/nargab/lqac083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2021] [Revised: 08/20/2022] [Accepted: 10/31/2022] [Indexed: 12/03/2022]
Abstract
Health(span)-related gene clusters/modules were recently identified based on knowledge about the cross-species genetic basis of health, to interpret transcriptomic datasets describing health-related interventions. However, the cross-species comparison of health-related observations reveals a lot of heterogeneity, not least due to widely varying health(span) definitions and study designs, posing a challenge for the exploration of conserved healthspan modules and, specifically, their transfer across species. To improve the identification and exploration of conserved/transferable healthspan modules, here we apply an established workflow based on gene co-expression network analyses employing GEO/ArrayExpress data for human and animal models, and perform a comprehensive meta-study of the resulting modules related to health(span), yielding a small set of literature backed health(span) candidate genes. For each experiment, WGCNA (weighted gene correlation network analysis) was used to infer modules of genes which correlate in their expression with a 'health phenotype score' and to determine the most-connected (hub) genes (and their interactions) for each such module. After mapping these hub genes to their human orthologs, 12 health(span) genes were identified in at least two species (ACTN3, ANK1, MRPL18, MYL1, PAXIP1, PPP1CA, SCN3B, SDCBP, SKIV2L, TUBG1, TYROBP, WIPF1), for which enrichment analysis by g:profiler found an association with actin filament-based movement and associated organelles, as well as muscular structures. We conclude that a meta-study of hub genes from co-expression network analyses for the complex phenotype health(span), across multiple species, can yield molecular-mechanistic insights and can direct experimentalists to further investigate the contribution of individual genes and their interactions to health(span).
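The cross-species filtering step described in this abstract (keeping hub genes whose human orthologs appear in at least two species) can be illustrated with a minimal sketch. The function name, the species labels, and the gene identifiers below are hypothetical placeholders, not data from the study, and the sketch assumes the ortholog mapping has already been done.

```python
from collections import Counter


def genes_in_multiple_species(hubs_by_species, min_species=2):
    """Return ortholog names that occur as hub genes in at least `min_species` species.

    `hubs_by_species` maps a species label to the set of hub genes for that
    species, already mapped to human ortholog names (assumed done upstream).
    """
    counts = Counter()
    for orthologs in hubs_by_species.values():
        # set() guards against a gene being counted twice within one species
        counts.update(set(orthologs))
    return sorted(gene for gene, n in counts.items() if n >= min_species)


# Placeholder input: species labels and gene names are invented for illustration.
hubs_by_species = {
    "species_1": {"GENE_A", "GENE_B", "GENE_X"},
    "species_2": {"GENE_A", "GENE_C", "GENE_Y"},
    "species_3": {"GENE_B", "GENE_C", "GENE_Z"},
}
shared = genes_in_multiple_species(hubs_by_species)
print(shared)
```

Counting per species rather than per experiment is a design choice in this sketch; a study with several experiments per species would first need to merge their hub-gene sets.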
Affiliation(s)
- Steffen Möller
- To whom correspondence should be addressed. Tel: +49 381 494 7361; Fax: +49 381 494 7203;
- Nadine Saul
- Humboldt-University of Berlin, Institute of Biology, Berlin, Germany
- Elias Projahn
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock, Germany
- Israel Barrantes
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock, Germany
- András Gézsi
- Budapest University of Technology and Economics, Department of Measurement and Information Systems, Budapest, Hungary
- Michael Walter
- Rostock University Medical Center, Institute for Clinical Chemistry and Laboratory Medicine, Rostock, Germany
- Péter Antal
- Budapest University of Technology and Economics, Department of Measurement and Information Systems, Budapest, Hungary
- Georg Fuellen
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock, Germany
22
Banik A, Podder S, Saha S, Chatterjee P, Halder AK, Nasipuri M, Basu S, Plewczynski D. Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:2648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022]
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through different biological experiments is relatively more laborious and time-consuming than the computational approaches used in recent times. However, practical implementation of conventional scientific methods sometimes becomes challenging due to poor performance impact in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. An effective methodology is proposed in this research, capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement is done using protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identification and pruning of non-essential proteins are equally crucial here. In the initial phase, careful assessment is performed by applying node and edge weights to identify and discard the non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight for pruning the non-essential proteins. Once the PPIN has been filtered out, the second phase starts with two centralities-based approaches: (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), which are successively implemented to identify the essential proteins in the yeast PPIN. Our proposed methodology achieves better performance in comparison to the existing state-of-the-art techniques.
23
Yue Y, Ye C, Peng PY, Zhai HX, Ahmad I, Xia C, Wu YZ, Zhang YH. A deep learning framework for identifying essential proteins based on multiple biological information. BMC Bioinformatics 2022; 23:318. [PMID: 35927611 PMCID: PMC9351218 DOI: 10.1186/s12859-022-04868-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/07/2022] [Accepted: 07/29/2022] [Indexed: 11/15/2022]
Abstract
Background: Essential proteins exert vital functions in cellular processes and are indispensable for the survival and reproduction of the organism. Traditional centrality methods perform poorly on complex protein–protein interaction (PPI) networks. Machine learning approaches based on high-throughput data do not exploit the temporal and spatial dimensions of biological information. Results: We put forward a deep learning framework to predict essential proteins by integrating features obtained from the PPI network, subcellular localization, and gene expression profiles. In our model, the node2vec method is applied to learn continuous feature representations for proteins in the PPI network, which capture the diversity of connectivity patterns in the network. Depthwise separable convolution is applied to gene expression profiles to extract properties and observe the trends of gene expression over time under different experimental conditions. Subcellular localization information is mapped into a long one-dimensional vector to capture its characteristics. Additionally, we use a sampling method to mitigate the impact of imbalanced learning when training the model. In experiments on Saccharomyces cerevisiae data, our model outperforms traditional centrality methods and machine learning methods. Likewise, comparative experiments show that our way of processing the various kinds of biological information is preferable. Conclusions: Our proposed deep learning framework effectively identifies essential proteins by integrating multiple biological data, showing that a broader selection of subcellular localization information significantly improves prediction results and that depthwise separable convolution applied to gene expression profiles enhances performance.
Affiliation(s)
- Yi Yue
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China; School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China; State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, China
- Chen Ye
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
- Pei-Yun Peng
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
- Hui-Xin Zhai
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
- Iftikhar Ahmad
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
- Chuan Xia
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
- Yun-Zhi Wu
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China; State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, China
- You-Hua Zhang
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China; School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China
24
Trastulla L, Noorbakhsh J, Vazquez F, McFarland J, Iorio F. Computational estimation of quality and clinical relevance of cancer cell lines. Mol Syst Biol 2022; 18:e11017. [PMID: 35822563 PMCID: PMC9277610 DOI: 10.15252/msb.202211017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2022] [Revised: 06/10/2022] [Accepted: 06/13/2022] [Indexed: 12/12/2022]
Abstract
Immortal cancer cell lines (CCLs) are the most widely used system for investigating cancer biology and for the preclinical development of oncology therapies. Pharmacogenomic and genome‐wide editing screenings have facilitated the discovery of clinically relevant gene–drug interactions and novel therapeutic targets via large panels of extensively characterised CCLs. However, tailoring pharmacological strategies in a precision medicine context requires bridging the existing gaps between tumours and in vitro models. Indeed, intrinsic limitations of CCLs such as misidentification, the absence of tumour microenvironment and genetic drift have highlighted the need to identify the most faithful CCLs for each primary tumour while addressing their heterogeneity, with the development of new models where necessary. Here, we discuss the most significant limitations of CCLs in representing patient features, and we review computational methods aiming at systematically evaluating the suitability of CCLs as tumour proxies and identifying the best patient representative in vitro models. Additionally, we provide an overview of the applications of these methods to more complex models and discuss future machine‐learning‐based directions that could resolve some of the arising discrepancies.
Affiliation(s)
- Lucia Trastulla
- Human Technopole, Milano, Italy; Open Targets, Cambridge, UK
- Francisca Vazquez
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Francesco Iorio
- Human Technopole, Milano, Italy; Open Targets, Cambridge, UK
25
Zhang Y, Zhang W, Xin X, Du P. dbEssLnc: A manually curated database of human and mouse essential lncRNA genes. Comput Struct Biotechnol J 2022. [PMID: 35685362 PMCID: PMC9162909 DOI: 10.1016/j.csbj.2022.05.043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/14/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 02/07/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles in many biological processes. Knocking out or knocking down some lncRNAs leads to lethality or infertility. These lncRNAs are called essential lncRNAs. Knowledge of essential lncRNAs is important for establishing minimal genomes of living cells and for developing drug therapies and early diagnostic approaches for complex diseases. However, existing databases focus on collecting essential coding genes, and records of essential non-coding genes are rare. A comprehensive collection of essential non-coding genes, particularly essential lncRNA genes, is needed. We manually curated 207 essential lncRNAs from the literature to establish a database of essential lncRNAs, named dbEssLnc (Database of essential lncRNAs). The dbEssLnc database has a user-friendly web interface for browsing, searching, visualizing, and BLAST-searching its records. The dbEssLnc database is freely accessible at https://esslnc.pufengdu.org. All data and source code for mirroring the dbEssLnc database have been deposited in GitHub (https://github.com/yyZhang14/dbEssLnc).
26
Hütter CVR, Sin C, Müller F, Menche J. Network cartographs for interpretable visualizations. Nat Comput Sci 2022; 2:84-89. [PMID: 38177513 PMCID: PMC10766564 DOI: 10.1038/s43588-022-00199-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/11/2021] [Accepted: 01/20/2022] [Indexed: 01/06/2024]
Abstract
Networks offer an intuitive visual representation of complex systems. Important network characteristics can often be recognized by eye and, in turn, patterns that stand out visually often have a meaningful interpretation. In conventional network layout algorithms, however, the precise determinants of a node's position within a layout are difficult to decipher and to control. Here we propose an approach for directly encoding arbitrary structural or functional network characteristics into node positions. We introduce a series of two- and three-dimensional layouts, benchmark their efficiency for model networks, and demonstrate their power for elucidating structure-to-function relationships in large-scale biological networks.
Affiliation(s)
- Christiane V R Hütter
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Vienna BioCenter PhD Program, a Doctoral School of the University of Vienna and the Medical University of Vienna, Vienna, Austria
- Celine Sin
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Felix Müller
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Jörg Menche
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria.
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.
- Faculty of Mathematics, University of Vienna, Vienna, Austria.
27
Soubise B, Jiang Y, Douet-Guilbert N, Troadec MB. RBM22, a Key Player of Pre-mRNA Splicing and Gene Expression Regulation, Is Altered in Cancer. Cancers (Basel) 2022; 14:643. [PMID: 35158909 PMCID: PMC8833553 DOI: 10.3390/cancers14030643] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/14/2021] [Revised: 01/19/2022] [Accepted: 01/22/2022] [Indexed: 01/05/2023]
Abstract
RNA-Binding Proteins (RBP) are very diverse and cover a large number of functions in the cells. This review focuses on RBM22, a gene encoding an RBP and belonging to the RNA-Binding Motif (RBM) family of genes. RBM22 presents a Zinc Finger like and a Zinc Finger domain, an RNA-Recognition Motif (RRM), and a Proline-Rich domain with a general structure suggesting a fusion of two yeast genes during evolution: Cwc2 and Ecm2. RBM22 is mainly involved in pre-mRNA splicing, playing the essential role of maintaining the conformation of the catalytic core of the spliceosome and acting as a bridge between the catalytic core and other essential protein components of the spliceosome. RBM22 is also involved in gene regulation, and is able to bind DNA, acting as a bona fide transcription factor on a large number of target genes. Undoubtedly due to its wide scope in the regulation of gene expression, RBM22 has been associated with several pathologies and, notably, with the aggressiveness of cancer cells and with the phenotype of a myelodysplastic syndrome. Mutations, enforced expression level, and haploinsufficiency of RBM22 gene are observed in those diseases. RBM22 could represent a potential therapeutic target in specific diseases, and, notably, in cancer.
Affiliation(s)
- Benoît Soubise
- Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Yan Jiang
- Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Department of Hematology, The First Hospital of Jilin University, Changchun 130021, China
- Nathalie Douet-Guilbert
- Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- CHRU Brest, Service de Génétique, Laboratoire de Génétique Chromosomique, F-29200 Brest, France
- Marie-Bérengère Troadec
- Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- CHRU Brest, Service de Génétique, Laboratoire de Génétique Chromosomique, F-29200 Brest, France
- Correspondence: Tel.: +33-2-98-01-64-55
28
Krauze AV, Camphausen K. Molecular Biology in Treatment Decision Processes-Neuro-Oncology Edition. Int J Mol Sci 2021; 22:13278. [PMID: 34948075 PMCID: PMC8703419 DOI: 10.3390/ijms222413278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 12/02/2021] [Accepted: 12/03/2021] [Indexed: 11/30/2022]
Abstract
Computational approaches including machine learning, deep learning, and artificial intelligence are growing in importance in all medical specialties as large data repositories are increasingly being optimised. Radiation oncology as a discipline is at the forefront of large-scale data acquisition and well positioned towards both the production and analysis of large-scale oncologic data with the potential for clinically driven endpoints and advancement of patient outcomes. Neuro-oncology is comprised of malignancies that often carry poor prognosis and significant neurological sequelae. The analysis of radiation therapy mediated treatment and the potential for computationally mediated analyses may lead to more precise therapy by employing large scale data. We analysed the state of the literature pertaining to large scale data, computational analysis, and the advancement of molecular biomarkers in neuro-oncology with emphasis on radiation oncology. We aimed to connect existing and evolving approaches to realistic avenues for clinical implementation focusing on low grade gliomas (LGG), high grade gliomas (HGG), management of the elderly patient with HGG, rare central nervous system tumors, craniospinal irradiation, and re-irradiation to examine how computational analysis and molecular science may synergistically drive advances in personalised radiation therapy (RT) and optimise patient outcomes.
Affiliation(s)
- Andra V. Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Building 10, Bethesda, MD 20892, USA
29
Beder T, Aromolaran O, Dönitz J, Tapanelli S, Adedeji E, Adebiyi E, Bucher G, Koenig R. Identifying essential genes across eukaryotes by machine learning. NAR Genom Bioinform 2021; 3:lqab110. [PMID: 34859210 PMCID: PMC8634067 DOI: 10.1093/nargab/lqab110] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/21/2021] [Revised: 10/09/2021] [Accepted: 11/29/2021] [Indexed: 02/07/2023]
Abstract
Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, and particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
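The leave-one-organism-out cross-validation mentioned in this abstract can be sketched as a splitting routine: each fold holds out every gene of one organism for testing and trains on the genes of all the others. This is an illustrative sketch only, not the authors' pipeline. The function name and the organism labels are hypothetical, and scikit-learn's `LeaveOneGroupOut` provides comparable splitting for real use.

```python
def leave_one_organism_out(organisms):
    """Yield (held_out, train_idx, test_idx) for each distinct organism.

    `organisms` is a list giving the organism label of each gene (one label
    per sample). For every fold, all samples of one organism form the test
    set and the samples of the remaining organisms form the training set.
    """
    for held_out in sorted(set(organisms)):
        train_idx = [i for i, o in enumerate(organisms) if o != held_out]
        test_idx = [i for i, o in enumerate(organisms) if o == held_out]
        yield held_out, train_idx, test_idx


# Placeholder labels: five genes drawn from three hypothetical organisms.
organisms = ["org_A", "org_A", "org_B", "org_C", "org_B"]
folds = list(leave_one_organism_out(organisms))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no gene from the held-out organism is seen during training, the accuracy on each fold estimates how well a classifier transfers to a species it was not trained on.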
Affiliation(s)
- Thomas Beder
- Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
- Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
- Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Campus Kiel, 24105 Kiel, Germany
- Olufemi Aromolaran
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jürgen Dönitz
- Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
- Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), 37099 Göttingen, Germany
- Sofia Tapanelli
- Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Eunice O Adedeji
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
- Ezekiel Adebiyi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Gregor Bucher
- Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
- Rainer Koenig
- Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
- Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
30
Abstract
The translation of mRNAs that contain a premature termination codon (PTC) generates truncated proteins that may have toxic dominant negative effects. Nonsense-mediated decay (NMD) is an mRNA surveillance pathway that degrades PTC-containing mRNAs to limit the production of truncated proteins. NMD activation requires a ribosome terminating translation at a PTC, but what happens to the polypeptides synthesized during the translation cycle needed to activate NMD is incompletely understood. Here, by establishing reporter systems that encode the same polypeptide sequence before a normal termination codon or PTC, we show that termination of protein synthesis at a PTC is sufficient to selectively destabilize polypeptides in mammalian cells. Proteasome inhibition specifically rescues the levels of nascent polypeptides produced from PTC-containing mRNAs within an hour, but also disrupts mRNA homeostasis within a few hours. PTC-terminated polypeptide destabilization is also alleviated by depleting the central NMD factor UPF1 or SMG1, the kinase that phosphorylates UPF1 to activate NMD, but not by inhibiting SMG1 kinase activity. Our results suggest that polypeptide degradation is linked to PTC recognition in mammalian cells and clarify a framework to investigate these mechanisms.
Affiliation(s)
- Vincent Chu
- Department of Cell Biology, Harvard Medical School, Blavatnik Institute, Boston, MA 02115; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138
- Qing Feng
- Department of Cell Biology, Harvard Medical School, Blavatnik Institute, Boston, MA 02115
- Yang Lim
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Blavatnik Institute, Boston, MA 02115
- Sichen Shao
- Department of Cell Biology, Harvard Medical School, Blavatnik Institute, Boston, MA 02115
31
Ringwald M, Richardson JE, Baldarelli RM, Blake JA, Kadin JA, Smith C, Bult CJ. Mouse Genome Informatics (MGI): latest news from MGD and GXD. Mamm Genome 2021; 33:4-18. [PMID: 34698891 PMCID: PMC8913530 DOI: 10.1007/s00335-021-09921-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/16/2021] [Accepted: 09/21/2021] [Indexed: 12/01/2022]
Abstract
The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI's mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models with human diseases. MGI maintains mouse anatomy and phenotype ontologies and contributes to the development of the Gene Ontology and Disease Ontology and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI's two major knowledgebases. Here, we highlight some of the recent changes and enhancements to MGD and GXD that have been implemented in response to changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org .
32
Campos TL, Korhonen PK, Hofmann A, Gasser RB, Young ND. Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications. Biotechnol Adv 2021; 54:107822. [PMID: 34461202 DOI: 10.1016/j.biotechadv.2021.107822] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less-studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake an historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML-approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows a reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to fundamental and applied research areas. Evidence of some commonality in the essential gene-complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets become/are available for proper feature engineering, and for the robust training and validation of algorithms. 
This area warrants detailed exploration to, for example, facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control and with drug resistance in parasite populations.
Affiliation(s)
- Tulio L Campos: Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Pasi K Korhonen: Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas Hofmann: Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser: Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young: Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
33
Zahra NUA, Jamil F, Uddin R. Protein Integrated Network Analysis to Reveal Potential Drug Targets Against Extended Drug-Resistant Mycobacterium tuberculosis XDR1219. Mol Biotechnol 2021. [PMID: 34382159 DOI: 10.1007/s12033-021-00377-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
The reconstruction and analysis of the protein-protein interaction (PPI) network is a powerful approach to understand the complex biological and molecular functions in normal and disease states of the cell. The interactome of most organisms is largely unidentified except some model organisms. The current study focused on the construction of PPI network for the human pathogen Mycobacterium tuberculosis (MTB)-resistant strain XDR1219 using computational methods. In this work, a bioinformatics approach was employed to reveal potential drug targets. The pipeline adopted the combination of an extensive integrated network analysis that led to identify 22 key proteins involved in drug resistance, resistant metabolic pathways, virulence, pathogenesis and persistency of the infection. The MTB XDR1219 interactome consists of 11,383 non-redundant PPIs among 1499 proteins covering 38% of the entire MTB XDR1219 proteome. The overall quality of the network was assessed and topological parameters of the PPI were calculated. The predicted interactions were functionally annotated and their relevance was assessed with the functional similarity. The study attempts to present the interactome of previously unidentified MTB XDR1219 and revealed potential drug targets that can be further explored by scientific community.
34
Xu D, Lyon S, Bu CH, Hildebrand S, Choi JH, Zhong X, Liu A, Turer EE, Zhang Z, Russell J, Ludwig S, Mahrt E, Nair-Gill E, Shi H, Wang Y, Zhang D, Yue T, Wang KW, SoRelle JA, Su L, Misawa T, McAlpine W, Sun L, Wang J, Zhan X, Choi M, Farokhnia R, Sakla A, Schneider S, Coco H, Coolbaugh G, Hayse B, Mazal S, Medler D, Nguyen B, Rodriguez E, Wadley A, Tang M, Li X, Anderton P, Keller K, Press A, Scott L, Quan J, Cooper S, Collie T, Qin B, Cardin J, Simpson R, Tadesse M, Sun Q, Wise CA, Rios JJ, Moresco EMY, Beutler B. Thousands of induced germline mutations affecting immune cells identified by automated meiotic mapping coupled with machine learning. Proc Natl Acad Sci U S A 2021; 118:e2106786118. [PMID: 34260399 PMCID: PMC8285956 DOI: 10.1073/pnas.2106786118] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
Forward genetic studies use meiotic mapping to adduce evidence that a particular mutation, normally induced by a germline mutagen, is causative of a particular phenotype. Particularly in small pedigrees, cosegregation of multiple mutations, occasional unawareness of mutations, and paucity of homozygotes may lead to erroneous declarations of cause and effect. We sought to improve the identification of mutations causing immune phenotypes in mice by creating Candidate Explorer (CE), a machine-learning software program that integrates 67 features of genetic mapping data into a single numeric score, mathematically convertible to the probability of verification of any putative mutation-phenotype association. At this time, CE has evaluated putative mutation-phenotype associations arising from screening damaging mutations in ∼55% of mouse genes for effects on flow cytometry measurements of immune cells in the blood. CE has therefore identified more than half of genes within which mutations can be causative of flow cytometric phenovariation in Mus musculus. The majority of these genes were not previously known to support immune function or homeostasis. Mouse geneticists will find CE data informative in identifying causative mutations within quantitative trait loci, while clinical geneticists may use CE to help connect causative variants with rare heritable diseases of immunity, even in the absence of linkage information. CE displays integrated mutation, phenotype, and linkage data, and is freely available for query online.
Affiliation(s)
- Darui Xu: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Stephen Lyon: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Chun Hui Bu: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Sara Hildebrand: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jin Huk Choi: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390; Department of Immunology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Xue Zhong: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Aijie Liu: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Emre E Turer: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390; Department of Internal Medicine, Division of Gastroenterology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Zhao Zhang: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jamie Russell: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Sara Ludwig: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Elena Mahrt: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Evan Nair-Gill: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Hexin Shi: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Ying Wang: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Duanwu Zhang: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Tao Yue: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Kuan-Wen Wang: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jeffrey A SoRelle: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lijing Su: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Takuma Misawa: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- William McAlpine: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lei Sun: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jianhui Wang: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Xiaoming Zhan: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Mihwa Choi: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Roxana Farokhnia: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Andrew Sakla: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Sara Schneider: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Hannah Coco: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Gabrielle Coolbaugh: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Braden Hayse: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Sara Mazal: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Dawson Medler: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Brandon Nguyen: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Edward Rodriguez: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Andrew Wadley: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Miao Tang: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Xiaohong Li: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Priscilla Anderton: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Katie Keller: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Amanda Press: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lindsay Scott: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jiexia Quan: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Sydney Cooper: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Tiffany Collie: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Baifang Qin: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jennifer Cardin: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Rochelle Simpson: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Meron Tadesse: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Qihua Sun: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Carol A Wise: Center for Pediatric Bone Biology and Translational Research, Scottish Rite for Children, Dallas, TX 75219; McDermott Center for Human Growth & Development, University of Texas Southwestern Medical Center, Dallas, TX 75390; Department of Orthopaedic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390; Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jonathan J Rios: Center for Pediatric Bone Biology and Translational Research, Scottish Rite for Children, Dallas, TX 75219; McDermott Center for Human Growth & Development, University of Texas Southwestern Medical Center, Dallas, TX 75390; Department of Orthopaedic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390; Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Eva Marie Y Moresco: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Bruce Beutler: Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390
35
Daniels MW, Dvorkin D, Powers RK, Kechris K. Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study. MCA 2021; 26:40. [DOI: 10.3390/mca26020040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Abstract
Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. There was also improved prediction performance when genes are incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework is standard for problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels.
36
Galperin MY, Wolf YI, Garushyants SK, Vera Alvarez R, Koonin EV. Non-essential ribosomal proteins in bacteria and archaea identified using COGs. J Bacteriol 2021; 203:JB. [PMID: 33753464 DOI: 10.1128/JB.00058-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 02/05/2023] Open
Abstract
Ribosomal proteins (RPs) are highly conserved across the bacterial and archaeal domains. Although many RPs are essential for survival, genome analysis demonstrates the absence of some RP genes in many bacterial and archaeal genomes. Furthermore, global transposon mutagenesis and/or targeted deletion showed that elimination of some RP genes had only a moderate effect on the bacterial growth rate. Here, we systematically analyze the evolutionary conservation of RPs in prokaryotes by compiling the list of the ribosomal genes that are missing from one or more genomes in the recently updated version of the Clusters of Orthologous Genes (COG) database. Some of these absences occurred because the respective genes carried frameshifts, presumably, resulting from sequencing errors, while others were overlooked and not translated during genome annotation. Apart from these annotation errors, we identified multiple genuine losses of RP genes in a variety of bacteria and archaea. Some of these losses are clade-specific, whereas others occur in symbionts and parasites with dramatically reduced genomes. The lists of computationally and experimentally defined non-essential ribosomal genes show a substantial overlap, revealing a common trend in prokaryote ribosome evolution that could be linked to the architecture and assembly of the ribosomes. Thus, RPs that are located at the surface of the ribosome and/or are incorporated at a late stage of ribosome assembly are more likely to be non-essential and to be lost during microbial evolution, particularly, in the course of genome compaction.
IMPORTANCE: In many prokaryote genomes, one or more ribosomal protein (RP) genes are missing. Analysis of 1,309 prokaryote genomes included in the COG database shows that only about half of the RPs are universally conserved in bacteria and archaea. In contrast, up to 16 other RPs are missing in some genomes, primarily, tiny (<1 Mb) genomes of host-associated bacteria and archaea. Ten universal and nine archaea-specific ribosomal proteins show clear patterns of lineage-specific gene loss. Most of the RPs that are frequently lost from bacterial genomes are located on the ribosome periphery and are non-essential in Escherichia coli and Bacillus subtilis. These results reveal general trends and common constraints in the architecture and evolution of ribosomes in prokaryotes.
|