1
Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024; 64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Indexed: 08/31/2023]
Abstract
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
Affiliation(s)
- Lewis Mervin
  - AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
- Tobias Morawietz
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Noé Sturm
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Lukas Friedrich
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Adam Zalewski
  - Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
- Anastasia Pentina
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Lina Humbeck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Martijn Oldenhof
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ritsuya Niwayama
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Jaak Simm
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Adam Arany
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Rama Jabal
  - Iktos, 65 rue de Prony, Paris 75017, France
- Arina Afanasyeva
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Regis Loeb
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Shlok Verma
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Simon Harnqvist
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Matthew Holmes
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Balazs Pejo
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Nicholas Holway
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Arne Dieckmann
  - Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
- Nicola Rieke
  - NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
- Djork-Arné Clevert
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Michael Krug
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Christopher Luscombe
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Darren Green
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Peter Ertl
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Peter Antal
  - Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
- David Marcus
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Hideyoshi Fuji
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Stephen Pickett
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Gergely Acs
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Eric Boniface
  - Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
- Bernd Beck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Yax Sun
  - Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
- Arnaud Gohier
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Friedrich Rippmann
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Ola Engkvist
  - AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
- Andreas H. Göller
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Yves Moreau
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ansgar Schuffenhauer
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Hugo Ceulemans
  - Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
2
Huang D, Ye X, Sakurai T. Multi-party collaborative drug discovery via federated learning. Comput Biol Med 2024; 171:108181. [PMID: 38428094 DOI: 10.1016/j.compbiomed.2024.108181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 01/28/2024] [Accepted: 02/18/2024] [Indexed: 03/03/2024]
Abstract
In the field of drug discovery and pharmacology research, precise and rapid prediction of drug-target binding affinity (DTA) and drug-drug interaction (DDI) is essential for drug efficacy and safety. However, pharmacological data are often distributed across different institutions. Moreover, due to concerns regarding data privacy and intellectual property, the sharing of pharmacological data is often restricted. It is difficult for institutions to achieve the desired performance by solely utilizing their own data. This urgent challenge calls for a solution that not only enhances collaboration between multiple institutions to improve prediction accuracy but also safeguards data privacy. In this study, we propose a novel federated learning (FL) framework to advance the prediction of DTA and DDI, namely FL-DTA and FL-DDI. The proposed framework enables multiple institutions to collaboratively train a predictive model without the need to share their local data. Moreover, to ensure data privacy, we employ secure multi-party computation (MPC) during the federated learning model aggregation phase. We evaluated the proposed method on two DTA and one DDI benchmark datasets and compared it with centralized learning and local learning. The experimental results indicate that the proposed method performs close to centralized learning and significantly outperforms local learning. Moreover, the proposed framework ensures data security while promoting collaboration among institutions, thereby accelerating the drug discovery process.
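As a rough illustration of the model aggregation phase mentioned in this abstract, the sketch below shows a plain FedAvg-style weighted average of client parameters. It is an assumption-laden toy, not the FL-DTA/FL-DDI implementation: the function name `federated_average`, the flat-list parameter format, and the weighting by local sample count are all hypothetical choices, and the MPC protection of the aggregation step is deliberately left out.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   list of local training-set sizes (hypothetical weighting).
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    # Each global parameter is the size-weighted mean of the client parameters.
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]


if __name__ == "__main__":
    # Two toy clients; the second holds three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the parameter vectors and sample counts reach the aggregator in this sketch; the raw training data stay with each client, which is the property the abstract relies on.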
Affiliation(s)
- Dong Huang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
3
Smajić A, Grandits M, Ecker GF. Privacy-preserving techniques for decentralized and secure machine learning in drug discovery. Drug Discov Today 2023; 28:103820. [PMID: 37935330 DOI: 10.1016/j.drudis.2023.103820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/17/2023] [Accepted: 11/01/2023] [Indexed: 11/09/2023]
Abstract
Data availability, data security, and privacy concerns often hamper optimal performance efficiency of machine learning (ML) techniques. Therefore, novel techniques for the utilization of private/sensitive data in the field of drug discovery have been proposed for ML model-building tasks. Some examples of the different techniques are secure multiparty computation, distributed deep learning, homomorphic encryption, blockchain-based peer-to-peer networking, differential privacy, and federated learning, as well as combinations of such techniques. In this paper, we present an overview of these techniques for decentralized ML to illustrate its benefits and drawbacks in the field of drug discovery.
Affiliation(s)
- Aljoša Smajić
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Melanie Grandits
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Gerhard F Ecker
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
4
Huang D, Ye X, Zhang Y, Sakurai T. Collaborative analysis for drug discovery by federated learning on non-IID data. Methods 2023; 219:1-7. [PMID: 37689121 DOI: 10.1016/j.ymeth.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 08/23/2023] [Accepted: 09/05/2023] [Indexed: 09/11/2023] Open
Abstract
With the increasing availability of large-scale QSAR (Quantitative Structure-Activity Relationship) datasets, collaborative analysis has become a promising approach for drug discovery. Traditional centralized analysis which typically concentrates data on a central server for training faces challenges such as data privacy and security. Distributed analysis such as federated learning offers a solution by enabling collaborative model training without sharing raw data. However, it may fail when the training data in the local devices are non-independent and identically distributed (non-IID). In this paper, we propose a novel framework for collaborative drug discovery using federated learning on non-IID datasets. We address the difficulty of training on non-IID data by globally sharing a small subset of data among all institutions. Our framework allows multiple institutions to jointly train a robust predictive model while preserving the privacy of their individual data. We leverage the federated learning paradigm to distribute the model training process across local devices, eliminating the need for data exchange. The experimental results on 15 benchmark datasets demonstrate that the proposed method achieves competitive predictive accuracy to centralized analysis while respecting data privacy. Moreover, our framework offers benefits such as reduced data transmission and enhanced scalability, making it suitable for large-scale collaborative drug discovery efforts.
Affiliation(s)
- Dong Huang
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Ying Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
5
Pan L, Xiao X, Liu S, Peng S. An Integration Framework of Secure Multiparty Computation and Deep Neural Network for Improving Drug-Drug Interaction Predictions. J Comput Biol 2023; 30:1034-1045. [PMID: 37707993 DOI: 10.1089/cmb.2023.0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023] Open
Abstract
Drug-drug interaction (DDI) is a key concern in drug development and pharmacovigilance. It is important to improve DDI predictions by integrating multisource data from various pharmaceutical companies. Unfortunately, data privacy and financial interest issues seriously hinder interinstitutional collaborations on DDI prediction. We propose multiparty computation DDI (MPCDDI), a secure MPC-based deep learning framework for DDI predictions. MPCDDI leverages secret-sharing techniques to incorporate drug-related feature data from multiple institutions and develops a deep learning model for DDI predictions. In MPCDDI, all data transmission and deep learning operations are integrated into secure MPC frameworks to enable high-quality collaboration among pharmaceutical institutions without divulging private drug-related information. The results suggest that MPCDDI outperforms eight other baselines and achieves performance similar to that of the corresponding plaintext collaborations. More interestingly, MPCDDI significantly outperforms methods that use private data from a single institution. In summary, MPCDDI is an effective framework for promoting collaborative and privacy-preserving drug discovery.
Affiliation(s)
- Liang Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xia Xiao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
  - The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
6
Federated machine learning in data-protection-compliant research. Nat Mach Intell 2023. [DOI: 10.1038/s42256-022-00601-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
7
Wirth FN, Kussel T, Müller A, Hamacher K, Prasser F. EasySMPC: a simple but powerful no-code tool for practical secure multiparty computation. BMC Bioinformatics 2022; 23:531. [PMID: 36494612 PMCID: PMC9733077 DOI: 10.1186/s12859-022-05044-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/31/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Modern biomedical research is data-driven and relies heavily on the re-use and sharing of data. Biomedical data, however, is subject to strict data protection requirements. Due to the complexity of the data required and the scale of data use, obtaining informed consent is often infeasible. Other methods, such as anonymization or federation, in turn have their own limitations. Secure multi-party computation (SMPC) is a cryptographic technology for distributed calculations, which brings formally provable security and privacy guarantees and can be used to implement a wide range of analytical approaches. As a relatively new technology, SMPC is still rarely used in real-world biomedical data sharing activities due to several barriers, including its technical complexity and lack of usability. RESULTS To overcome these barriers, we have developed the tool EasySMPC, which is implemented in Java as a cross-platform, stand-alone desktop application provided as open-source software. The tool makes use of the SMPC method Arithmetic Secret Sharing, which allows pre-defined sets of variables to be summed securely among different parties in two rounds of communication (input sharing and output reconstruction), and integrates this method into a graphical user interface. No additional software services need to be set up or configured, as EasySMPC uses the most widespread digital communication channel available: e-mails. No cryptographic keys need to be exchanged between the parties and e-mails are exchanged automatically by the software. To demonstrate the practicability of our solution, we evaluated its performance in a wide range of data sharing scenarios. The results of our evaluation show that our approach is scalable (summing up 10,000 variables between 20 parties takes less than 300 s) and that the number of participants is the essential factor.
CONCLUSIONS We have developed an easy-to-use "no-code solution" for performing secure joint calculations on biomedical data using SMPC protocols, which is suitable for use by scientists without IT expertise and which has no special infrastructure requirements. We believe that innovative approaches to data sharing with SMPC are needed to foster the translation of complex protocols into practice.
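The two-round secure sum described in this abstract (input sharing, then output reconstruction) can be sketched with additive secret sharing over a prime modulus. This is a minimal, single-process simulation under stated assumptions, not EasySMPC's Java code or its e-mail transport: the names `share` and `secure_sum`, the modulus `PRIME`, and the restriction to non-negative integers smaller than the modulus are all illustrative choices.

```python
import secrets

PRIME = 2**61 - 1  # illustrative modulus; inputs are assumed to be in [0, PRIME)


def share(value, n_parties, modulus=PRIME):
    """Split value into n_parties additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=PRIME):
    """Simulate the two communication rounds for one variable per party."""
    n = len(private_values)
    # Round 1 (input sharing): party i splits its value and sends share j to party j.
    distributed = [share(v, n, modulus) for v in private_values]
    # Each party j locally adds the shares it received from all parties.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Round 2 (output reconstruction): partial sums are exchanged and combined.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    print(secure_sum([12, 30, 7]))
```

In this sketch no single share or partial sum equals any party's input; only the combined total is reconstructed. A real deployment would additionally need authenticated channels between the parties, which the simulation omits.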
Affiliation(s)
- Felix Nikolaus Wirth
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID: grid.484013.a; ISNI: 0000 0004 6879 971X)
- Tobias Kussel
  - Computational Biology and Simulation, TU Darmstadt, Darmstadt, Germany (GRID: grid.6546.1; ISNI: 0000 0001 0940 1669)
- Armin Müller
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID: grid.484013.a; ISNI: 0000 0004 6879 971X)
- Kay Hamacher
  - Computational Biology and Simulation, TU Darmstadt, Darmstadt, Germany (GRID: grid.6546.1; ISNI: 0000 0001 0940 1669)
- Fabian Prasser
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID: grid.484013.a; ISNI: 0000 0004 6879 971X)
8
Zhu W, Luo J, White AD. Federated learning of molecular properties with graph neural networks in a heterogeneous setting. Patterns (N Y) 2022; 3:100521. [PMID: 35755872 PMCID: PMC9214329 DOI: 10.1016/j.patter.2022.100521] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/04/2022] [Revised: 04/15/2022] [Accepted: 05/06/2022] [Indexed: 12/04/2022]
Abstract
Chemistry research has both high material and computational costs to conduct experiments. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular learning. Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated-learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving artificial intelligence (AI) in chemistry that mitigates concerns about sharing valuable chemical data.
Affiliation(s)
- Wei Zhu
  - Department of Computer Science, University of Rochester, Rochester, NY, USA
- Jiebo Luo
  - Department of Computer Science, University of Rochester, Rochester, NY, USA
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
9
Desmet C, Cook DJ. Recent Developments in Privacy-Preserving Mining of Clinical Data. ACM/IMS Trans Data Sci 2021; 2:28. [PMID: 35018368 PMCID: PMC8746818 DOI: 10.1145/3447774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2020] [Accepted: 01/01/2021] [Indexed: 06/14/2023]
Abstract
With the dramatic increases in both the capability to collect personal data and the capability to analyze large amounts of data, increasingly sophisticated and personal insights are being drawn. These insights are valuable for clinical applications but also open up possibilities for identification and abuse of personal information. In this paper, we survey recent research on classical methods of privacy-preserving data mining. Looking at dominant techniques and recent innovations to them, we examine the applicability of these methods to the privacy-preserving analysis of clinical data. We also discuss promising directions for future research in this area.
10
Xiong Z, Cheng Z, Lin X, Xu C, Liu X, Wang D, Luo X, Zhang Y, Jiang H, Qiao N, Zheng M. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci China Life Sci 2021; 65:529-539. [PMID: 34319533 DOI: 10.1007/s11427-021-1946-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2021] [Accepted: 05/16/2021] [Indexed: 12/11/2022]
Abstract
Artificial intelligence (AI) models usually require large amounts of high-quality training data, which is in striking contrast to the situation of small and biased data faced by current drug discovery pipelines. The concept of federated learning has been proposed to utilize distributed data from different sources without leaking sensitive information of the data. This emerging decentralized machine learning paradigm is expected to dramatically improve the success rate of AI-powered drug discovery. Here, we simulated the federated learning process with different property and activity datasets from different sources, among which overlapping molecules with high or low biases exist in the recorded values. Beyond the benefit of gaining more data, we also demonstrated that federated training has a regularization effect superior to centralized training on the pooled datasets with high biases. Moreover, different network architectures for clients and aggregation algorithms for coordinators have been compared on the performance of federated learning, where personalized federated learning shows promising results. Our work demonstrates the applicability of federated learning in predicting drug-related properties and highlights its promising role in addressing the small and biased data dilemma in drug discovery.
Affiliation(s)
- Zhaoping Xiong
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Ziqiang Cheng
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China
  - School of Information Science and Technology, University of Science and Technology of China, Hefei, 230000, China
- Xinyuan Lin
  - Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Chi Xu
  - Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Xiaohong Liu
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Yong Zhang
  - Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Hualiang Jiang
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Nan Qiao
  - Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
11
Chen R, Zhang Y, Dou Z, Chen F, Xie K, Wang S. Data Sharing and Privacy in Pharmaceutical Studies. Curr Pharm Des 2021; 27:911-918. [PMID: 33438533 DOI: 10.2174/1381612827999210112204732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/01/2020] [Accepted: 09/30/2020] [Indexed: 11/22/2022]
Abstract
Adverse drug events have been a long-standing concern because of their wide-ranging harms to public health and the substantial disease burden they impose. The key to diminishing or eliminating these impacts is to build a comprehensive pharmacovigilance system. The "big data" approach has been shown to assist the detection of adverse drug events by involving previously unavailable data sources and promoting health information exchange. Even so, challenges and potential risks remain. The lack of effective privacy-preserving measures in the flow of medical data is the most important one, and urgent action is required to prevent the threats and facilitate the construction of pharmacovigilance systems. Several privacy protection methods that may help break this barrier are reviewed in this article.
Affiliation(s)
- Rufan Chen
  - Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
- Yi Zhang
  - Department of Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Zuochao Dou
  - Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
- Feng Chen
  - Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
- Kang Xie
  - Key Lab of Information Network Security of Ministry of Public Security, the Third Research Institute of Ministry of Public Security, Shanghai, China
- Shuang Wang
  - Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
12
Chen S, Xue D, Chuai G, Yang Q, Liu Q. FL-QSAR: a federated learning-based QSAR prototype for collaborative drug discovery. Bioinformatics 2021; 36:5492-5498. [PMID: 33289524 DOI: 10.1093/bioinformatics/btaa1006] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/15/2020] [Revised: 10/25/2020] [Accepted: 11/19/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaborations among pharmaceutical institutions can lead to better performance in QSAR prediction; however, intellectual property and related financial interests still substantially hinder inter-institutional collaborations in QSAR modeling for drug discovery. RESULTS For the first time, we verified the feasibility of applying horizontal federated learning (HFL), a recently developed collaborative and privacy-preserving learning framework, to perform QSAR analysis. A prototype platform of federated-learning-based QSAR modeling for collaborative drug discovery, i.e. FL-QSAR, is presented accordingly. We first compared the HFL framework with a classic privacy-preserving computation framework, i.e. secure multiparty computation, to indicate how the two differ from various perspectives. Then we compared FL-QSAR with the public collaboration in terms of QSAR modeling. Our extensive experiments demonstrated that (i) collaboration by FL-QSAR outperforms a single client using only its private data, and (ii) collaboration by FL-QSAR achieves almost the same performance as that of collaboration via cleartext learning algorithms using all shared information. Taken together, our results indicate that FL-QSAR under the HFL framework provides an efficient solution to break the barriers between pharmaceutical institutions in QSAR modeling, and can therefore promote the development of collaborative and privacy-preserving drug discovery, with the ability to extend to other privacy-related biomedical areas. AVAILABILITY AND IMPLEMENTATION The source codes of FL-QSAR are available on the GitHub: https://github.com/bm2-lab/FL-QSAR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shaoqi Chen
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Dongyu Xue
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Guohui Chuai
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Qiang Yang
  - Department of AI, WeBank, Shenzhen 518055, China
  - Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Qi Liu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
13
Putative cell type discovery from single-cell gene expression data. Nat Methods 2020; 17:621-628. [DOI: 10.1038/s41592-020-0825-9] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 07/15/2019] [Accepted: 04/02/2020] [Indexed: 12/15/2022]