1
Hu F, Zhang W, Huang H, Li W, Li Y, Yin P. A Transferability-Based Method for Evaluating the Protein Representation Learning. IEEE J Biomed Health Inform 2024; 28:3158-3166. [PMID: 38416611 DOI: 10.1109/jbhi.2024.3370680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
Self-supervised pre-trained language models have recently risen as a powerful approach in learning protein representations, showing exceptional effectiveness in various biological tasks, such as drug discovery. Amidst the evolving trend in protein language model development, there is an observable shift towards employing large-scale multimodal and multitask models. However, the predominant reliance on empirical assessments using specific benchmark datasets for evaluating these models raises concerns about the comprehensiveness and efficiency of current evaluation methods. Addressing this gap, our study introduces a novel quantitative approach for estimating the performance of transferring multi-task pre-trained protein representations to downstream tasks. This transferability-based method is designed to quantify the similarities in latent space distributions between pre-trained features and those fine-tuned for downstream tasks. It encompasses a broad spectrum, covering multiple domains and a variety of heterogeneous tasks. To validate this method, we constructed a diverse set of protein-specific pre-training tasks. The resulting protein representations were then evaluated across several downstream biological tasks. Our experimental results demonstrate a robust correlation between the transferability scores obtained using our method and the actual transfer performance observed. This significant correlation highlights the potential of our method as a more comprehensive and efficient tool for evaluating protein representation learning.
2
Bernardo L, Lomagno A, Mauri PL, Di Silvestre D. Integration of Omics Data and Network Models to Unveil Negative Aspects of SARS-CoV-2, from Pathogenic Mechanisms to Drug Repurposing. BIOLOGY 2023; 12:1196. [PMID: 37759595 PMCID: PMC10525644 DOI: 10.3390/biology12091196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 08/25/2023] [Accepted: 08/30/2023] [Indexed: 09/29/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the COVID-19 health emergency, affecting and killing millions of people worldwide. Following SARS-CoV-2 infection, COVID-19 patients show a spectrum of symptoms ranging from asymptomatic to very severe manifestations. In particular, bronchial and pulmonary cells, involved at the initial stage, trigger a hyper-inflammation phase, damaging a wide range of organs, including the heart, brain, liver, intestine and kidney. Due to the urgent need for solutions to limit the virus' spread, most efforts were initially devoted to mapping outbreak trajectories and variant emergence, as well as to the rapid search for effective therapeutic strategies. Samples collected from hospitalized or deceased COVID-19 patients from the early stages of the pandemic have been analyzed over time, and to date they still represent an invaluable source of information to shed light on the molecular mechanisms underlying the organ/tissue damage, the knowledge of which could offer new opportunities for diagnostics and therapeutic designs. For these purposes, in combination with clinical data, omics profiles and network models play a key role by providing a holistic view of the pathways, processes and functions most affected by viral infection. In fact, in addition to epidemiological purposes, networks are being increasingly adopted for the integration of multiomics data, and recently their use has expanded to the identification of drug targets or the repositioning of existing drugs. These topics will be covered here by exploring the landscape of SARS-CoV-2 survey-based studies using systems biology approaches derived from omics data, paying particular attention to those that have considered samples of human origin.
Affiliation(s)
- Dario Di Silvestre
- Institute for Biomedical Technologies—National Research Council (ITB-CNR), 20054 Segrate, Italy; (L.B.); (A.L.); (P.L.M.)
4
Zhang W, Zhang Y, Min Z, Mo J, Ju Z, Guan W, Zeng B, Liu Y, Chen J, Zhang Q, Li H, Zeng C, Wei Y, Chan GCF. COVID19db: a comprehensive database platform to discover potential drugs and targets of COVID-19 at whole transcriptomic scale. Nucleic Acids Res 2021; 50:D747-D757. [PMID: 34554255 PMCID: PMC8728200 DOI: 10.1093/nar/gkab850] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/07/2021] [Revised: 09/08/2021] [Accepted: 09/11/2021] [Indexed: 12/26/2022]
Abstract
Many open-access transcriptomic datasets on coronavirus disease 2019 (COVID-19) have been generated, but they are highly heterogeneous and difficult to analyze. To utilize these invaluable data for a better understanding of COVID-19, additional software should be developed. A user-friendly platform is essential, especially for researchers without bioinformatics skills. We developed the COVID19db platform (http://hpcc.siat.ac.cn/covid19db & http://www.biomedical-web.com/covid19db) that provides 39 930 drug–target–pathway interactions and 95 COVID-19 related datasets, which include transcriptomes of 4127 human samples across 13 body sites associated with the exposure of 33 microbes and 33 drugs/agents. To facilitate data application, each dataset was standardized and annotated with rich clinical information. The platform further provides 14 different analytical applications to analyze various mechanisms underlying COVID-19. Moreover, the 14 applications enable researchers to customize grouping and settings for different analyses and allow them to perform analyses using their own data. Furthermore, a Drug Discovery tool is designed to identify potential drugs and targets at whole transcriptomic scale. For proof of concept, we used COVID19db and identified multiple potential drugs and targets for COVID-19. In summary, COVID19db provides user-friendly web interfaces to freely analyze and download data and to submit new data for further integration; it can accelerate the identification of effective strategies against COVID-19.
Affiliation(s)
- Wenliang Zhang
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China
- Yan Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Department of Clinical Oncology, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China
- Zhuochao Min
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jing Mo
- Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China
- Zhen Ju
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; CAS Key Laboratory of Health Informatics, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Wen Guan
- Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China; Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
- Binghui Zeng
- Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China; Hospital of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou 510055, China
- Yang Liu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Department of Clinical Oncology, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China
- Jianliang Chen
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China
- Qianshen Zhang
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China
- Hanguang Li
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China
- Chunxia Zeng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; CAS Key Laboratory of Health Informatics, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; CAS Key Laboratory of Health Informatics, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Godfrey Chi-Fung Chan
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, Guangdong 518053, China; Department of Pediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Hong Kong 999077, China