1
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022]
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
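As an illustration of how a classifier can learn from positive and unlabeled samples only, the sketch below uses one widely cited idea from the PU literature, the label-frequency correction of Elkan and Noto. The abstract does not say which algorithms the review covers in detail, so this choice is an assumption made for illustration. It also assumes that labeled positives are a random sample of all positives, and that some classifier g(x) has already been trained to separate labeled from unlabeled samples. All function names and scores here are hypothetical.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(labeled | positive) as the mean of g(x) over
    held-out labeled positives (assumes positives are labeled at random)."""
    if not scores_on_labeled_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def corrected_positive_probability(score, c):
    """Turn g(x) = P(labeled | x) into an estimate of P(positive | x)
    by dividing by c, then clip the result to the interval [0, 1]."""
    return min(1.0, max(0.0, score / c))


# Hypothetical outputs g(x) of a "labeled vs. unlabeled" classifier.
held_out_positive_scores = [0.5, 0.25, 0.75, 0.5]
c_hat = estimate_label_frequency(held_out_positive_scores)

# Scores for three unlabeled samples, rescaled into positive-class estimates.
unlabeled_scores = [0.05, 0.25, 0.6]
posterior = [corrected_positive_probability(g, c_hat) for g in unlabeled_scores]
```

The design point is that the unlabeled set is never treated as a clean negative class: the raw scores are rescaled by the estimated labeling rate instead.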
Affiliation(s)
- Fuyi Li
  - Monash University, Australia
- André Leier
  - Department of Genetics, UAB School of Medicine, USA
- Meiya Han
  - Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jing Xu
  - Computer Science and Technology, Nankai University, China
- Xiaoyu Wang
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
- Shirui Pan
  - University of Technology Sydney (UTS), Ultimo, NSW, Australia
- Cangzhi Jia
  - College of Science, Dalian Maritime University, China
- Yang Zhang
  - Northwestern Polytechnical University, China
- Geoffrey I Webb
  - Faculty of Information Technology, Monash University, Australia
- Lachlan J M Coin
  - Department of Clinical Pathology, University of Melbourne, Australia
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
2
Napolitano F, Rapakoulia T, Annunziata P, Hasegawa A, Cardon M, Napolitano S, Vaccaro L, Iuliano A, Wanderlingh LG, Kasukawa T, Medina DL, Cacchiarelli D, Gao X, di Bernardo D, Arner E. Automatic identification of small molecules that promote cell conversion and reprogramming. Stem Cell Reports 2021; 16:1381-1390. [PMID: 33891873 PMCID: PMC8185468 DOI: 10.1016/j.stemcr.2021.03.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/13/2020] [Revised: 03/24/2021] [Accepted: 03/25/2021] [Indexed: 12/04/2022]
Abstract
Controlling cell fate has great potential for regenerative medicine, drug discovery, and basic research. Although transcription factors are able to promote cell reprogramming and transdifferentiation, methods based on their upregulation often show low efficiency. Small molecules that can facilitate conversion between cell types can ameliorate this problem working through safe, rapid, and reversible mechanisms. Here, we present DECCODE, an unbiased computational method for identification of such molecules based on transcriptional data. DECCODE matches a large collection of drug-induced profiles for drug treatments against a large dataset of primary cell transcriptional profiles to identify drugs that either alone or in combination enhance cell reprogramming and cell conversion. Extensive validation in the context of human induced pluripotent stem cells shows that DECCODE is able to prioritize drugs and drug combinations enhancing cell reprogramming. We also provide predictions for cell conversion with single drugs and drug combinations for 145 different cell types.
Affiliation(s)
- Francesco Napolitano
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli (NA) 80078, Italy
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Trisevgeni Rapakoulia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
  - Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Patrizia Annunziata
  - Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli (NA) 80078, Italy
- Akira Hasegawa
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Melissa Cardon
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Sara Napolitano
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli (NA) 80078, Italy
- Lorenzo Vaccaro
  - Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli (NA) 80078, Italy
- Antonella Iuliano
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli (NA) 80078, Italy
- Takeya Kasukawa
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Diego L Medina
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli (NA) 80078, Italy
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Davide Cacchiarelli
  - Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli (NA) 80078, Italy
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Xin Gao
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Diego di Bernardo
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli (NA) 80078, Italy
  - Department of Chemical, Materials and Industrial Production Engineering, University of Naples Federico II, 80125 Naples, Italy
- Erik Arner
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Kagamiyama, Higashi-Hiroshima, 739-8528, Japan
3
Sanford EM, Emert BL, Coté A, Raj A. Gene regulation gravitates toward either addition or multiplication when combining the effects of two signals. eLife 2020; 9:e59388. [PMID: 33284110 PMCID: PMC7771960 DOI: 10.7554/elife.59388] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/28/2020] [Accepted: 12/04/2020] [Indexed: 01/07/2023]
Abstract
Two different cell signals often affect transcription of the same gene. In such cases, it is natural to ask how the combined transcriptional response compares to the individual responses. The most commonly used mechanistic models predict additive or multiplicative combined responses, but a systematic genome-wide evaluation of these predictions is not available. Here, we analyzed the transcriptional response of human MCF-7 cells to retinoic acid and TGF-β, applied individually and in combination. The combined transcriptional responses of induced genes exhibited a range of behaviors, but clearly favored both additive and multiplicative outcomes. We performed paired chromatin accessibility measurements and found that increases in accessibility were largely additive. There was some association between super-additivity of accessibility and multiplicative or super-multiplicative combined transcriptional responses, while sub-additivity of accessibility associated with additive transcriptional responses. Our findings suggest that mechanistic models of combined transcriptional regulation must be able to reproduce a range of behaviors.
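To make the additive and multiplicative predictions concrete, the sketch below computes both from expression levels under control, signal A alone, and signal B alone. It assumes the usual reading of the two models: additive means the absolute changes over control sum, and multiplicative means the fold changes over control multiply. The function names, the example values, and the nearest-prediction comparison are hypothetical and do not reproduce the paper's own statistical procedure.

```python
def additive_prediction(control, signal_a, signal_b):
    """Combined level if the absolute changes over control simply add."""
    return control + (signal_a - control) + (signal_b - control)


def multiplicative_prediction(control, signal_a, signal_b):
    """Combined level if the fold changes over control multiply."""
    if control <= 0:
        raise ValueError("control expression must be positive to form fold changes")
    return control * (signal_a / control) * (signal_b / control)


def closer_model(observed, control, signal_a, signal_b):
    """Name the prediction nearer to the observed combined level
    (a crude comparison used only for illustration)."""
    add = additive_prediction(control, signal_a, signal_b)
    mult = multiplicative_prediction(control, signal_a, signal_b)
    if abs(observed - add) <= abs(observed - mult):
        return "additive"
    return "multiplicative"


# Hypothetical expression levels in arbitrary units.
control, with_a, with_b = 10.0, 20.0, 40.0
add_pred = additive_prediction(control, with_a, with_b)         # 10 + 10 + 30
mult_pred = multiplicative_prediction(control, with_a, with_b)  # 10 * 2 * 4
label = closer_model(75.0, control, with_a, with_b)
```

The two predictions diverge more as the individual fold changes grow, which is why strongly induced genes are the informative cases for telling the models apart.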
Affiliation(s)
- Eric M Sanford
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Benjamin L Emert
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Allison Coté
  - Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, United States
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Arjun Raj
  - Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, United States
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
4
Prediction Methods of Herbal Compounds in Chinese Medicinal Herbs. Molecules 2018; 23:2303. [PMID: 30201875 PMCID: PMC6225236 DOI: 10.3390/molecules23092303] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/12/2018] [Revised: 09/06/2018] [Accepted: 09/07/2018] [Indexed: 12/12/2022]
Abstract
Chinese herbal medicine has recently gained worldwide attention. Its curative mechanism is compared with that of Western medicine at the molecular level, yet the treatment mechanism of most Chinese herbal medicines is still not clear. How can Chinese herbal medicine compounds be integrated with modern medicine? Methods for predicting the drug-likeness of these compounds are particularly important. A growing number of compounds from Chinese herbal sources are now widely used as drug-like candidates, and discovering potentially active compounds in related herbs is an important way for pharmaceutical companies to develop drugs. The methods for predicting the drug-like properties of Chinese herbal compounds include virtual screening, pharmacophore modeling and machine learning. In this paper, we focus on the prediction methods for the medicinal properties of Chinese herbal medicines. We analyze the advantages and disadvantages of these three methods, and then introduce the specific steps of the virtual screening method. Finally, we discuss the prospects for the joint application of the various methods.