1
Sun L, Zhang M, Xie L, Xu X, Xu P, Xu L. Computational prediction of Lee retention indices of polycyclic aromatic hydrocarbons by using machine learning. Chem Biol Drug Des 2023; 101:380-394. [PMID: 36102275 DOI: 10.1111/cbdd.14137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/15/2022] [Accepted: 08/28/2022] [Indexed: 01/14/2023]
Abstract
Given the difficulty of experimental determination, quantitative structure-property relationship (QSPR) modeling and deep learning (DL) provide important tools for predicting the physicochemical properties of chemical compounds. In this paper, partial least squares (PLS), genetic function approximation (GFA), and deep neural network (DNN) models were used to predict the Lee retention index (Lee-RI) of polycyclic aromatic hydrocarbons (PAHs) on SE-52 and DB-5 stationary phases. Four molecular descriptors, molecular weight (MW), quantitative estimate of drug-likeness (QED), atomic charge weighted negative surface area (Jurs_PNSA_3), and relative negative charge (Jurs_RNCG), were selected to construct regression models based on a genetic algorithm. For SE-52, the PLS model showed the best predictive power, followed by DNN and GFA. The relative error (RE), root mean square error (RMSE), and regression coefficient (R²) of the best PLS regression model are 1.228%, 5.407, and 0.980, respectively. For DB-5, the DNN model showed the best predictive power, followed by GFA and PLS. The RE, RMSE, and R² of the best DNN regression model are 1.058%, 4.325, and 0.976 for DB-5-1 and 0.821%, 3.795, and 0.970 for DB-5-2, respectively. The three regression models not only show good predictive ability but also highlight the stability and ductility of the models.
Affiliation(s)
- Linkang Sun
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Min Zhang
  - School of Computer Engineering, Jiangsu University of Technology, Changzhou, China
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Peng Xu
  - Department of Orthopedics, Second Military Medical University Affiliated Changzheng Hospital, Shanghai, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
3
Pourhaghighi R, Ash PEA, Phanse S, Goebels F, Hu LZM, Chen S, Zhang Y, Wierbowski SD, Boudeau S, Moutaoufik MT, Malty RH, Malolepsza E, Tsafou K, Nathan A, Cromar G, Guo H, Abdullatif AA, Apicco DJ, Becker LA, Gitler AD, Pulst SM, Youssef A, Hekman R, Havugimana PC, White CA, Blum BC, Ratti A, Bryant CD, Parkinson J, Lage K, Babu M, Yu H, Bader GD, Wolozin B, Emili A. BraInMap Elucidates the Macromolecular Connectivity Landscape of Mammalian Brain. Cell Syst 2020; 10:333-350.e14. [PMID: 32325033 PMCID: PMC7938770 DOI: 10.1016/j.cels.2020.03.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 11/13/2018] [Revised: 11/25/2019] [Accepted: 03/20/2020] [Indexed: 12/12/2022]
Abstract
Connectivity webs mediate the unique biology of the mammalian brain. Yet, while cell circuit maps are increasingly available, knowledge of their underlying molecular networks remains limited. Here, we applied multi-dimensional biochemical fractionation with mass spectrometry and machine learning to survey endogenous macromolecules across the adult mouse brain. We defined a global "interactome" comprising over one thousand multi-protein complexes. These include hundreds of brain-selective assemblies that have distinct physical and functional attributes, show regional and cell-type specificity, and have links to core neurological processes and disorders. Using reciprocal pull-downs and a transgenic model, we validated a putative 28-member RNA-binding protein complex associated with amyotrophic lateral sclerosis, suggesting a coordinated function in alternative splicing in disease progression. This brain interaction map (BraInMap) resource facilitates mechanistic exploration of the unique molecular machinery driving core cellular processes of the central nervous system. It is publicly available and can be explored here https://www.bu.edu/dbin/cnsb/mousebrain/.
Affiliation(s)
- Reza Pourhaghighi
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Peter E A Ash
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, USA
- Sadhna Phanse
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Biochemistry, University of Regina, Regina, SK, Canada; Center for Network Systems Biology, Boston University, Boston, MA, USA
- Florian Goebels
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Lucas Z M Hu
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Siwei Chen
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA
- Yingying Zhang
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA
- Shayne D Wierbowski
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA
- Samantha Boudeau
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, USA
- Ramy H Malty
  - Department of Biochemistry, University of Regina, Regina, SK, Canada
- Edyta Malolepsza
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Broad Institute of Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Kalliopi Tsafou
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Broad Institute of Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Aparna Nathan
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Broad Institute of Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Graham Cromar
  - Program in Molecular Medicine, Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Hongbo Guo
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Ali Al Abdullatif
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, USA
- Daniel J Apicco
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, USA
- Lindsay A Becker
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Aaron D Gitler
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Stefan M Pulst
  - Department of Neurology, University of Utah, Salt Lake City, UT, USA
- Ahmed Youssef
  - Program in Bioinformatics, Boston University, Boston, MA, USA; Center for Network Systems Biology, Boston University, Boston, MA, USA; Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, MA, USA
- Ryan Hekman
  - Center for Network Systems Biology, Boston University, Boston, MA, USA; Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, MA, USA
- Pierre C Havugimana
  - Center for Network Systems Biology, Boston University, Boston, MA, USA; Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, MA, USA; Departments of Biochemistry and Biology, Boston University, Boston, MA, USA
- Carl A White
  - Center for Network Systems Biology, Boston University, Boston, MA, USA; Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, MA, USA
- Benjamin C Blum
  - Center for Network Systems Biology, Boston University, Boston, MA, USA; Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, MA, USA
- Antonia Ratti
  - Department of Neurology and Laboratory of Neuroscience, IRCCS, Milan, Italy
- Camron D Bryant
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, USA
- John Parkinson
  - Program in Molecular Medicine, Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Kasper Lage
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Broad Institute of Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, SK, Canada
- Haiyuan Yu
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA
- Gary D Bader
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Benjamin Wolozin
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA; Program in Neuroscience, Boston University, Boston, MA, USA
- Andrew Emili
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Program in Bioinformatics, Boston University, Boston, MA, USA; Center for Network Systems Biology, Boston University, Boston, MA, USA; Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, MA, USA; Departments of Biochemistry and Biology, Boston University, Boston, MA, USA
4
Sanchez-Taltavull D, Perkins TJ, Dommann N, Melin N, Keogh A, Candinas D, Stroka D, Beldi G. Bayesian correlation is a robust gene similarity measure for single-cell RNA-seq data. NAR Genom Bioinform 2020; 2:lqaa002. [PMID: 33575552 PMCID: PMC7671344 DOI: 10.1093/nargab/lqaa002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/30/2019] [Revised: 11/30/2019] [Accepted: 01/09/2020] [Indexed: 02/07/2023]
Abstract
Assessing similarity is highly important for bioinformatics algorithms to determine correlations between biological information. A common problem is that similarity can appear by chance, particularly for lowly expressed entities. This is especially relevant in single-cell RNA-seq (scRNA-seq) data because read counts are much lower than in bulk RNA-seq. Recently, a Bayesian correlation scheme that assigns low similarity to genes that have low-confidence expression estimates has been proposed to assess similarity for bulk RNA-seq. Our goal is to extend the properties of the Bayesian correlation to scRNA-seq data by considering three ways to compute similarity. First, we compute the similarity of pairs of genes over all cells. Second, we identify specific cell populations and compute the correlation in those populations. Third, we compute the similarity of pairs of genes over all clusters, by considering the total mRNA expression. We demonstrate that Bayesian correlations are more reproducible than Pearson correlations. Compared to Pearson correlations, Bayesian correlations have a smaller dependence on the number of input cells. We show that the Bayesian correlation algorithm assigns high similarity values to genes with a biological relevance in a specific population. We conclude that Bayesian correlation is a robust similarity measure in scRNA-seq data.
Affiliation(s)
- Daniel Sanchez-Taltavull
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
- Theodore J Perkins
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, Ontario, ON K1H8L6, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, ON K1H8L6, Canada
- Noelle Dommann
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
- Nicolas Melin
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
- Adrian Keogh
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
- Daniel Candinas
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
- Deborah Stroka
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
- Guido Beldi
  - Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
5
Hu LZ, Goebels F, Tan JH, Wolf E, Kuzmanov U, Wan C, Phanse S, Xu C, Schertzberg M, Fraser AG, Bader GD, Emili A. EPIC: software toolkit for elution profile-based inference of protein complexes. Nat Methods 2019; 16:737-742. [PMID: 31308550 PMCID: PMC7995176 DOI: 10.1038/s41592-019-0461-4] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 02/16/2018] [Accepted: 05/15/2019] [Indexed: 11/08/2022]
Abstract
Protein complexes are key macromolecular machines of the cell, but their description remains incomplete. We and others previously reported an experimental strategy for global characterization of native protein assemblies based on chromatographic fractionation of biological extracts coupled to precision mass spectrometry analysis (chromatographic fractionation-mass spectrometry, CF-MS), but the resulting data are challenging to process and interpret. Here, we describe EPIC (elution profile-based inference of complexes), a software toolkit for automated scoring of large-scale CF-MS data to define high-confidence multi-component macromolecules from diverse biological specimens. As a case study, we used EPIC to map the global interactome of Caenorhabditis elegans, defining 612 putative worm protein complexes linked to diverse biological processes. These included novel subunits and assemblies unique to nematodes that we validated using orthogonal methods. The open source EPIC software is freely available as a Jupyter notebook packaged in a Docker container (https://hub.docker.com/r/baderlab/bio-epic/).
Affiliation(s)
- Lucas ZhongMing Hu
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Florian Goebels
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- June H Tan
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Eric Wolf
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Uros Kuzmanov
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Cuihong Wan
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - School of Life Science, Central China Normal University, Wuhan, China
- Sadhna Phanse
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Changjiang Xu
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Mike Schertzberg
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Andrew G Fraser
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Gary D Bader
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Andrew Emili
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Departments of Biochemistry and Biology, Boston University, Boston, MA, USA