1
Fu J, Liu X, Deng R, Jiang X, Cai W, Fu H, Shao X. Accurate Prediction of CRISPR/Cas13a Guide Activity Using Feature Selection and Deep Learning. J Chem Inf Model 2025; 65:3380-3387. [PMID: 40091632 DOI: 10.1021/acs.jcim.4c02438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2025]
Abstract
CRISPR/Cas13a serves as a key tool for nucleic acid tests; therefore, accurate prediction of its activity is essential for creating robust and sensitive diagnostics. In this study, we create a dual-branch neural network model that achieves high prediction accuracy and classification performance across two independent CRISPR/Cas13a data sets, outperforming previously published models relying solely on sequence features. The model integrates direct sequence encoding with descriptive features and yields 99 key descriptive features out of 1553, extracted through statistical analysis, which critically influence guide-target interactions and Cas13a guide activity. By employing Shapley Additive Explanations and Integrated Gradients for feature importance analysis, we show that sequence composition, mismatch type and frequency, and the protospacer flanking site region are primary features. These findings underscore the importance of using descriptive features as complementary inputs to deep learning-based encoding and provide valuable insights into the mechanisms underlying guide-target interaction. Overall, this study not only introduces a reliable and efficient model for Cas13a guide activity prediction but also offers a foundation for future rational design efforts.
Affiliation(s)
- Jiashun Fu
  - Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
  - Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Ruijie Deng
  - College of Biomass Science and Engineering, Healthy Food Evaluation Research Center, Sichuan University, Chengdu 610065, China
- Xiue Jiang
  - Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Wensheng Cai
  - Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
  - Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
2
Lin Y, Gao B, Tang J, Zhang Q, Qian H, Wu H. Deep Bayesian active learning using in-memory computing hardware. Nature Computational Science 2025; 5:27-36. [PMID: 39715830 PMCID: PMC11774754 DOI: 10.1038/s43588-024-00744-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 11/19/2024] [Indexed: 12/25/2024]
Abstract
Labeling data is a time-consuming, labor-intensive and costly procedure for many artificial intelligence tasks. Deep Bayesian active learning (DBAL) boosts labeling efficiency exponentially, substantially reducing costs. However, DBAL demands high-bandwidth data transfer and probabilistic computing, posing great challenges for conventional deterministic hardware. Here we propose a memristor stochastic gradient Langevin dynamics in situ learning method that uses the stochasticity of memristor modulation to learn efficiently, enabling DBAL within the computation-in-memory (CIM) framework. To prove the feasibility and effectiveness of the proposed method, we implemented in-memory DBAL on a memristor-based stochastic CIM system and successfully demonstrated a robot's skill learning task. The inherent stochastic characteristics of memristors allow a four-layer memristor Bayesian deep neural network to efficiently identify and learn from uncertain samples. Compared with cutting-edge conventional complementary metal-oxide-semiconductor-based hardware implementation, the stochastic CIM system achieves a remarkable 44% boost in speed and could conserve 153 times more energy.
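To make the stochastic gradient Langevin dynamics (SGLD) idea named in this abstract concrete, here is a minimal software sketch of the generic update: half a step along the log-posterior gradient plus Gaussian noise whose variance equals the step size. It is not the authors' memristor method. The toy model (Gaussian data with unit variance and a flat prior on the mean), the function name `sgld_sample_mean`, and all parameter values are assumptions chosen for illustration, and a software random number generator stands in for the memristor noise.

```python
import math
import random


def sgld_sample_mean(data, step_size=1e-3, n_steps=20000, burn_in=1000, seed=0):
    """Sample the posterior of a Gaussian mean with full-batch Langevin updates.

    Assumed toy model (not from the paper): x_i ~ Normal(theta, 1), flat prior.
    """
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for step in range(n_steps):
        # Gradient of the log-posterior with respect to theta.
        grad_log_post = sum(x - theta for x in data)
        # Injected Gaussian noise with variance equal to the step size.
        noise = rng.gauss(0.0, math.sqrt(step_size))
        # Langevin update: half a gradient step plus the noise term.
        theta += 0.5 * step_size * grad_log_post + noise
        if step >= burn_in:
            samples.append(theta)
    return samples


data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
samples = sgld_sample_mean(data)
posterior_mean = sum(samples) / len(samples)
data_mean = sum(data) / len(data)
```

Under these assumptions the samples should scatter around the data mean with a spread close to the analytical posterior standard deviation, 1/sqrt(100) = 0.1. That spread is the kind of uncertainty signal a Bayesian active learner can use to decide which samples to label next.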
Affiliation(s)
- Yudeng Lin
  - School of Integrated Circuits, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Bin Gao
  - School of Integrated Circuits, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Jianshi Tang
  - School of Integrated Circuits, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Qingtian Zhang
  - School of Integrated Circuits, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- He Qian
  - School of Integrated Circuits, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Huaqiang Wu
  - School of Integrated Circuits, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
3
Fralish Z, Reker D. Finding the most potent compounds using active learning on molecular pairs. Beilstein J Org Chem 2024; 20:2152-2162. [PMID: 39224230 PMCID: PMC11368049 DOI: 10.3762/bjoc.20.185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 08/02/2024] [Indexed: 09/04/2024]
Abstract
Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations, but actively trained models might still exhibit poor performance during early project stages where the training data is limited and model exploitation might lead to analog identification with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements from the current best training compound to prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning for 99 Ki benchmarking datasets. We show that both ActiveDelta implementations excel at identifying more potent inhibitors compared to the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach is also able to identify more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time-splits. Overall, this study highlights the large potential for molecular pairing approaches to further improve popular active learning strategies in low data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
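The pairing idea described in this abstract can be sketched as follows: build training examples from ordered pairs of compounds, with the potency difference as the target, then rank each candidate by its predicted improvement over the current best training compound. This is a rough illustration based only on the abstract, not the ActiveDelta code. The function names, the concatenated-feature representation, and the `predict_delta` callable (a stand-in for a trained Chemprop or XGBoost pair model) are all hypothetical.

```python
from itertools import permutations


def build_training_pairs(features, potencies):
    """Pair every ordered couple of training compounds.

    Each pair's input is the two feature vectors concatenated; its target
    is the potency difference (second compound minus first).
    """
    pairs, deltas = [], []
    for i, j in permutations(range(len(features)), 2):
        pairs.append(list(features[i]) + list(features[j]))
        deltas.append(potencies[j] - potencies[i])
    return pairs, deltas


def select_next(features, potencies, candidates, predict_delta):
    """Return the index of the candidate with the largest predicted gain
    over the most potent training compound."""
    best = max(range(len(features)), key=lambda i: potencies[i])
    gains = [predict_delta(list(features[best]) + list(c)) for c in candidates]
    return max(range(len(candidates)), key=lambda k: gains[k])


def toy_predict_delta(pair):
    # Hypothetical stand-in for a trained pair model: potency is assumed
    # to equal the sum of a compound's features.
    half = len(pair) // 2
    return sum(pair[half:]) - sum(pair[:half])


train_x = [[0, 1], [1, 1], [2, 0]]
train_y = [1.0, 2.0, 2.5]
pairs, deltas = build_training_pairs(train_x, train_y)
chosen = select_next(train_x, train_y, [[0, 0], [3, 1], [1, 0]], toy_predict_delta)
```

With n training compounds this produces n(n-1) ordered training pairs, which suggests one reason pairing may help when the training set is small.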
Affiliation(s)
- Zachary Fralish
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Daniel Reker
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
4
Wang L, Zhou Z, Yang X, Shi S, Zeng X, Cao D. The present state and challenges of active learning in drug discovery. Drug Discov Today 2024; 29:103985. [PMID: 38642700 DOI: 10.1016/j.drudis.2024.103985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/22/2024]
Abstract
Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even with limited labeled data. This characteristic renders it a valuable approach to tackle the ongoing challenges faced in drug discovery, such as the ever-expanding exploration space and the limitations of labeled data. Consequently, AL is increasingly gaining prominence in the field of drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, as well as molecular property prediction. Additionally, we discuss the challenges and prospects associated with the current applications of AL in drug discovery.
Affiliation(s)
- Lei Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Zhenran Zhou
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Xixi Yang
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Shaohua Shi
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
5
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including a random, greedy, uncertainty, diversity, combination of greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approach. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
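Three of the selection strategies named in this abstract (random, greedy, and uncertainty) can be sketched in a few lines. The abstract does not say how the scores are computed, so this sketch makes two assumptions: a higher predicted value means a more responsive cell line, and uncertainty is measured as the spread of predictions across a hypothetical model ensemble. The function name `select_cell_lines` and the example numbers are illustrative only.

```python
import random
import statistics


def select_cell_lines(ensemble_preds, batch_size, strategy, seed=0):
    """Choose which cell lines to screen next.

    ensemble_preds[k] lists each ensemble member's predicted response for
    cell line k (assumption: higher means more responsive).
    """
    indices = list(range(len(ensemble_preds)))
    if strategy == "random":
        return random.Random(seed).sample(indices, batch_size)
    if strategy == "greedy":
        # Exploit: take the cell lines with the highest mean prediction.
        def score(k):
            return statistics.mean(ensemble_preds[k])
    elif strategy == "uncertainty":
        # Explore: take the cell lines the ensemble disagrees on most.
        def score(k):
            return statistics.pstdev(ensemble_preds[k])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(indices, key=score, reverse=True)[:batch_size]


preds = [
    [0.90, 0.90, 0.90],  # confidently responsive
    [0.10, 0.80, 0.50],  # ensemble members disagree
    [0.20, 0.20, 0.20],  # confidently unresponsive
    [0.60, 0.70, 0.65],
]
greedy_pick = select_cell_lines(preds, 2, "greedy")
uncertain_pick = select_cell_lines(preds, 1, "uncertainty")
random_pick = select_cell_lines(preds, 2, "random")
```

A hybrid strategy of the kind the abstract mentions could, for example, combine the two scores or alternate between them across iterations; the exact combination rule would have to come from the paper itself.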
Affiliation(s)
- Priyanka Vasanthakumari
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Michael Ryan Weil
  - Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
- Rick L. Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
6
Lu AX, Moses AM. Using Dimensionality Reduction to Visualize Phenotypic Changes in High-Throughput Microscopy. Methods Mol Biol 2024; 2800:217-229. [PMID: 38709487 DOI: 10.1007/978-1-0716-3834-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
High-throughput microscopy has enabled screening of cell phenotypes at unprecedented scale. Systematic identification of cell phenotype changes (such as cell morphology and protein localization changes) is a major analysis goal. Because cell phenotypes are high-dimensional, unbiased approaches to detect and visualize the changes in phenotypes are still needed. Here, we suggest that changes in cellular phenotype can be visualized in reduced dimensionality representations of the image feature space. We describe a freely available analysis pipeline to visualize changes in protein localization in feature spaces obtained from deep learning. As an example, we use the pipeline to identify changes in subcellular localization after the yeast GFP collection was treated with hydroxyurea.
Affiliation(s)
- Alex X Lu
  - Microsoft Research New England, Cambridge, MA, USA
- Alan M Moses
  - Department of Cell & Systems Biology, University of Toronto, Toronto, Canada
7
Hashizume T, Ying BW. Challenges in developing cell culture media using machine learning. Biotechnol Adv 2024; 70:108293. [PMID: 37984683 DOI: 10.1016/j.biotechadv.2023.108293] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/01/2023] [Revised: 10/17/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract
Microbial and mammalian cells are widely used in the food, pharmaceutical, and medical industries. Developing or optimizing culture media is essential to improve cell culture performance as a critical technology in cell culture engineering. Methodologies for media optimization have been developed to a great extent, such as the approaches of one-factor-at-a-time (OFAT) and response surface methodology (RSM). The present review introduces the emerging machine learning (ML) technology in cell culture engineering by combining high-throughput experimental technologies to develop highly efficient and effective culture media. The commonly used ML algorithms and the successful applications of employing ML in medium optimization are summarized. This review highlights the benefits of ML-assisted medium development and guides the selection of the media optimization method appropriate for various cell culture purposes.
Affiliation(s)
- Takamasa Hashizume
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572 Ibaraki, Japan
- Bei-Wen Ying
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572 Ibaraki, Japan
8
Hashizume T, Ozawa Y, Ying BW. Employing active learning in the optimization of culture medium for mammalian cells. NPJ Syst Biol Appl 2023; 9:20. [PMID: 37253825 DOI: 10.1038/s41540-023-00284-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/15/2023] [Accepted: 05/18/2023] [Indexed: 06/01/2023]
Abstract
Medium optimization is a crucial step during cell culture for biopharmaceutics and regenerative medicine; however, this step remains challenging, as both media and cells are highly complex systems. Here, we addressed this issue by employing active learning. Specifically, we introduced machine learning to cell culture experiments to optimize culture medium. The cell line HeLa-S3 and the gradient-boosting decision tree algorithm were used to find optimized media as pilot studies. To acquire the training data, cell culture was performed in a large variety of medium combinations. The cellular NAD(P)H abundance, represented as A450, was used to indicate the goodness of culture media. In active learning, regular and time-saving modes were developed using culture data at 168 h and 96 h, respectively. Both modes successfully fine-tuned 29 components to generate a medium for improved cell culture. Intriguingly, the two modes provided different predictions for the concentrations of vitamins and amino acids, and a significant decrease was commonly predicted for fetal bovine serum (FBS) compared to the commercial medium. In addition, active learning-assisted medium optimization significantly increased the cellular concentration of NAD(P)H, an active chemical with a constant abundance in living cells. Our study demonstrated the efficiency and practicality of active learning for medium optimization and provided valuable information for employing machine learning technology in cell biology experiments.
Affiliation(s)
- Takamasa Hashizume
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
- Yuki Ozawa
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
- Bei-Wen Ying
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
9
Shields MD, Gurley K, Catarelli R, Chauhan M, Ojeda-Tuz M, Masters FJ. Active learning applied to automated physical systems increases the rate of discovery. Sci Rep 2023; 13:8402. [PMID: 37225752 DOI: 10.1038/s41598-023-35257-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 05/15/2023] [Indexed: 05/26/2023]
Abstract
Active machine learning is widely used in computational studies where repeated numerical simulations can be conducted on high performance computers without human intervention. But translation of these active learning methods to physical systems has proven more difficult and the accelerated pace of discoveries aided by these methods remains as yet unrealized. Through the presentation of a general active learning framework and its application to large-scale boundary layer wind tunnel experiments, we demonstrate that the active learning framework used so successfully in computational studies is directly applicable to the investigation of physical experimental systems and the corresponding improvements in the rate of discovery can be transformative. We specifically show that, for our wind tunnel experiments, we are able to achieve in approximately 300 experiments a learning objective that would be impossible using traditional methods.
Affiliation(s)
- Michael D Shields
  - Department of Civil and Systems Engineering, Johns Hopkins University, Baltimore, MD, 21212, USA
- Kurtis Gurley
  - Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
- Ryan Catarelli
  - Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
- Mohit Chauhan
  - Department of Civil and Systems Engineering, Johns Hopkins University, Baltimore, MD, 21212, USA
- Mariel Ojeda-Tuz
  - Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
- Forrest J Masters
  - Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
10
Pandi A, Diehl C, Yazdizadeh Kharrazi A, Scholz SA, Bobkova E, Faure L, Nattermann M, Adam D, Chapin N, Foroughijabbari Y, Moritz C, Paczia N, Cortina NS, Faulon JL, Erb TJ. A versatile active learning workflow for optimization of genetic and metabolic networks. Nat Commun 2022; 13:3876. [PMID: 35790733 PMCID: PMC9256728 DOI: 10.1038/s41467-022-31245-z] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 02/12/2022] [Accepted: 06/10/2022] [Indexed: 11/13/2022]
Abstract
Optimization of biological networks is often limited by wet lab labor and cost, and the lack of convenient computational tools. Here, we describe METIS, a versatile active machine learning workflow with a simple online interface for the data-driven optimization of biological targets with minimal experiments. We demonstrate our workflow for various applications, including cell-free transcription and translation, genetic circuits, and a 27-variable synthetic CO2-fixation cycle (CETCH cycle), improving these systems between one and two orders of magnitude. For the CETCH cycle, we explore 10^25 conditions with only 1,000 experiments to yield the most efficient CO2-fixation cascade described to date. Beyond optimization, our workflow also quantifies the relative importance of individual factors to the performance of a system, identifying unknown interactions and bottlenecks. Overall, our workflow opens the way for convenient optimization and prototyping of genetic and metabolic networks with customizable adjustments according to user experience, experimental setup, and laboratory facilities. Here, aimed at democratization and standardization, the authors describe METIS, a modular and versatile active machine learning workflow with a simple online interface for the optimization of biological target functions with minimal experimental datasets.
Affiliation(s)
- Amir Pandi
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Christoph Diehl
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Scott A Scholz
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Elizaveta Bobkova
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Léon Faure
  - Micalis Institute, INRAE, AgroParisTech, University of Paris-Saclay, Jouy-en-Josas, France
- Maren Nattermann
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- David Adam
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Nils Chapin
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Yeganeh Foroughijabbari
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Charles Moritz
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Nicole Paczia
  - Core Facility for Metabolomics and Small Molecule Mass Spectrometry, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Niña Socorro Cortina
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
  - LiVeritas Biosciences, Inc., 432N Canal St.; Ste. 20, South San Francisco, CA, 94080, USA
- Jean-Loup Faulon
  - Micalis Institute, INRAE, AgroParisTech, University of Paris-Saclay, Jouy-en-Josas, France
  - Genomique Metabolique, Genoscope, Institut Francois Jacob, CEA, CNRS, Univ Evry, University of Paris-Saclay, Evry, France
  - Manchester Institute of Biotechnology, SYNBIOCHEM center, School of Chemistry, The University of Manchester, Manchester, UK
- Tobias J Erb
  - Department of Biochemistry & Synthetic Metabolism, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
  - SYNMIKRO Center of Synthetic Microbiology, Marburg, Germany
11
Comparison of the Meta-Active Machine Learning Model Applied to Biological Data-Driven Experiments with Other Models. Journal of Healthcare Engineering 2021; 2021:8014850. [PMID: 34938423 PMCID: PMC8687783 DOI: 10.1155/2021/8014850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2021] [Revised: 10/11/2021] [Accepted: 11/14/2021] [Indexed: 11/30/2022]
Abstract
Currently, many methods for estimating the effects of conditions on a given biological target require either strong modelling assumptions or separate screens. Traditionally, covering many conditions and targets without performing all possible experiments has relied on data-driven experimentation or mathematical methods, especially conventional machine learning methods. However, these methods still cannot completely eliminate manual labelling. This paper presents a meta-active machine learning (MAML) method to address this problem. We compare the accuracy and running time of nine traditional machine learning methods. In addition, we compare the MAML method with a classical screening method and progressive experiments. The results show that applying this method yields the best experimental results on the current dataset.
12
Chen J, Hou J, Wong KC. Categorical Matrix Completion With Active Learning for High-Throughput Screening. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2261-2270. [PMID: 32203025 DOI: 10.1109/tcbb.2020.2982142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Recent advances in wet-lab automation enable high-throughput experiments to be conducted seamlessly. In particular, the exhaustive enumeration of all possible conditions is always involved in high-throughput screening. Nonetheless, such a screening strategy is unlikely to be optimal or cost-effective. By incorporating artificial intelligence, we design an open-source model based on categorical matrix completion and active machine learning to guide high-throughput screening experiments. Specifically, we narrow our scope to high-throughput screening for chemical compound effects on diverse protein sub-cellular locations. In the proposed model, we believe that exploration is more important than exploitation over the long run of high-throughput screening experiments. Therefore, we design several innovations to circumvent the existing limitations. In particular, categorical matrix completion is designed to accurately impute the missing experiments, while margin sampling is implemented for uncertainty estimation. The model is systematically tested on both simulated and real data. The simulation results indicate that our model can be robust to diverse scenarios, while the real data results demonstrate the wet-lab applicability of our model for high-throughput screening experiments. Lastly, we attribute the model's success to its exploration ability by revealing the related matrix ranks and distinct experiment coverage comparisons.
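Margin sampling, the uncertainty criterion named in this abstract, scores each unlabeled item by the gap between its two most probable classes and queries the items with the smallest gap. The sketch below shows that standard definition. The function names and the example probabilities are made up for illustration, and the paper's actual implementation on matrix-completion outputs may differ.

```python
def margin(probabilities):
    """Gap between the most likely and second most likely class."""
    top, runner_up = sorted(probabilities, reverse=True)[:2]
    return top - runner_up


def margin_sampling(class_probabilities, n_queries):
    """Return indices of the n_queries items with the smallest margin,
    i.e. the experiments the model is least sure how to classify."""
    order = sorted(
        range(len(class_probabilities)),
        key=lambda i: margin(class_probabilities[i]),
    )
    return order[:n_queries]


# Hypothetical predicted class probabilities for three unmeasured experiments.
probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # small margin
    [0.50, 0.49, 0.01],  # near tie between two classes, smallest margin
]
queries = margin_sampling(probs, 2)
```

Because a small margin marks items near a decision boundary, this criterion favors informative experiments over confirming what the model already predicts confidently.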
13
Sun H, Murphy RF. Evaluation of Categorical Matrix Completion Algorithms: Towards Improved Active Learning for Drug Discovery. Bioinformatics 2021; 37:3538-3545. [PMID: 33983377 PMCID: PMC8545350 DOI: 10.1093/bioinformatics/btab322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/28/2020] [Revised: 04/05/2021] [Accepted: 04/29/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION: High throughput and high content screening are extensively used to determine the effect of small molecule compounds and other potential therapeutics upon particular targets as part of the early drug development process. However, screening is typically used to find compounds that have a desired effect but not to identify potential undesirable side effects. This is because the size of the search space precludes measuring the potential effect of all compounds on all targets. Active machine learning has been proposed as a solution to this problem. RESULTS: In this article, we describe an improved imputation method, Impute By Committee, for completion of matrices containing categorical values. We compare this method to existing approaches in the context of modeling the effects of many compounds on many targets using latent similarities between compounds and conditions. We also compare these methods for the task of driving active learning in well-characterized settings for synthetic and real datasets. Our new approach performed the best overall both in the accuracy of matrix completion itself and in the number of experiments needed to train an accurate predictive model compared to random selection of experiments. We further improved upon the performance of our new method by developing an adaptive switching strategy for active learning that iteratively chooses between different matrix completion methods. AVAILABILITY: A Reproducible Research Archive containing all data and code will be made available upon acceptance at http://murphylab.cbd.cmu.edu/software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Huangqingbo Sun
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
- Robert F Murphy
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, 15213, USA
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, 15213, USA
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, 15213, USA
14
Brown J. Practical Chemogenomic Modeling and Molecule Discovery Strategies Unveiled by Active Learning. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11533-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
15
|
Wang X, Rai N, Merchel Piovesan Pereira B, Eetemadi A, Tagkopoulos I. Accelerated knowledge discovery from omics data by optimal experimental design. Nat Commun 2020; 11:5026. [PMID: 33024104 PMCID: PMC7538421 DOI: 10.1038/s41467-020-18785-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/31/2019] [Accepted: 08/27/2020] [Indexed: 12/15/2022] Open
Abstract
How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
Affiliation(s)
- Xiaokang Wang
- Department of Biomedical Engineering, University of California, Davis, CA, 95616, USA; Genome Center, University of California, Davis, CA, 95616, USA
| | - Navneet Rai
- Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
| | - Beatriz Merchel Piovesan Pereira
- Genome Center, University of California, Davis, CA, 95616, USA; Microbiology Graduate Group, University of California, Davis, CA, 95616, USA
| | - Ameen Eetemadi
- Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
| | - Ilias Tagkopoulos
- Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
| |
|
16
|
Reker D. Practical considerations for active machine learning in drug discovery. DRUG DISCOVERY TODAY. TECHNOLOGIES 2020; 32-33:73-79. [PMID: 33386097 DOI: 10.1016/j.ddtec.2020.06.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 04/15/2020] [Revised: 06/01/2020] [Accepted: 06/10/2020] [Indexed: 02/01/2023]
Abstract
Active machine learning enables the automated selection of the most valuable next experiments to improve predictive modelling and hasten active retrieval in drug discovery. Although a long established theoretical concept and introduced to drug discovery approximately 15 years ago, the deployment of active learning technology in the discovery pipelines across academia and industry remains slow. With the recent re-discovered enthusiasm for artificial intelligence as well as improved flexibility of laboratory automation, active learning is expected to surge and become a key technology for molecular optimizations. This review recapitulates key findings from previous active learning studies to highlight the challenges and opportunities of applying adaptive machine learning to drug discovery. Specifically, considerations regarding implementation, infrastructural integration, and expected benefits are discussed. By focusing on these practical aspects of active learning, this review aims at providing insights for scientists planning to implement active learning workflows in their discovery pipelines.
Affiliation(s)
- Daniel Reker
- Koch Institute for Integrative Cancer Research and MIT-IBM Watson AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Gastroenterology, Hepatology and Endoscopy, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
| |
|
17
|
Borkowski O, Koch M, Zettor A, Pandi A, Batista AC, Soudier P, Faulon JL. Large scale active-learning-guided exploration for in vitro protein production optimization. Nat Commun 2020; 11:1872. [PMID: 32312991 PMCID: PMC7170859 DOI: 10.1038/s41467-020-15798-5] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 02/06/2020] [Accepted: 03/24/2020] [Indexed: 12/21/2022] Open
Abstract
Lysate-based cell-free systems have become a major platform to study gene expression, but batch-to-batch variation makes protein production difficult to predict. Here we describe an active learning approach to explore a combinatorial space of ~4,000,000 cell-free buffer compositions, maximizing protein production and identifying critical parameters involved in cell-free productivity. We also provide a one-step method to achieve high quality predictions for protein production using minimal experimental effort regardless of the lysate quality.
Affiliation(s)
- Olivier Borkowski
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Mathilde Koch
- Micalis Institute, INRAE, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
| | - Agnès Zettor
- Chemogenomic and Biological Screening Core Facility, Institut Pasteur, Department of Structural Biology and Chemistry, Center for Technological Resources and Research (C2RT), 25/28 rue du Dr Roux, 75724, Paris Cedex 15, France
| | - Amir Pandi
- Micalis Institute, INRAE, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
| | | | - Paul Soudier
- Micalis Institute, INRAE, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
| | - Jean-Loup Faulon
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France; Micalis Institute, INRAE, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France; SYNBIOCHEM Center, Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, Manchester, UK
| |
|
18
|
Eyke NS, Green WH, Jensen KF. Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening. REACT CHEM ENG 2020. [DOI: 10.1039/d0re00232a] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 12/19/2022]
Abstract
Through iterative selection of maximally informative experiments, active learning renders exhaustive screening obsolete. Chosen experiments are used to train models that are accurate over the entire domain, thus reducing the experiment burden.
Affiliation(s)
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| |
|
19
|
Stephenson N, Shane E, Chase J, Rowland J, Ries D, Justice N, Zhang J, Chan L, Cao R. Survey of Machine Learning Techniques in Drug Discovery. Curr Drug Metab 2019; 20:185-193. [DOI: 10.2174/1389200219666180820112457] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 09/06/2017] [Revised: 01/01/2018] [Accepted: 03/19/2018] [Indexed: 12/19/2022]
Abstract
Background: Drug discovery, which is the process of discovering new candidate medications, is very important for pharmaceutical industries. At its current stage, discovering new drugs is still a very expensive and time-consuming process, requiring Phases I, II and III for clinical trials. Recently, machine learning techniques in Artificial Intelligence (AI), especially the deep learning techniques which allow a computational model to generate multiple layers, have been widely applied and achieved state-of-the-art performance in different fields, such as speech recognition, image classification, bioinformatics, etc. One very important application of these AI techniques is in the field of drug discovery. Methods: We did a large-scale literature search on existing scientific websites (e.g., ScienceDirect, arXiv) and startup companies to understand the current status of machine learning techniques in drug discovery. Results: Our experiments demonstrated that there are different patterns in machine learning fields and drug discovery fields. For example, keywords like prediction, brain, discovery, and treatment are usually in drug discovery fields. Also, the total number of papers published in drug discovery fields with machine learning techniques is increasing every year. Conclusion: The main focus of this survey is to understand the current status of machine learning techniques in the drug discovery field within both academic and industrial settings, and discuss its potential future applications. Several interesting patterns for machine learning techniques in drug discovery fields are discussed in this survey.
Affiliation(s)
- Natalie Stephenson
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - Emily Shane
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - Jessica Chase
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - Jason Rowland
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - David Ries
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - Nicola Justice
- Department of Mathematics, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - Jie Zhang
- Key Laboratory of Hebei Province for Plant Physiology and Molecular Pathology, College of Life Sciences, Hebei Agricultural University, Baoding, China
| | - Leong Chan
- School of Business, Pacific Lutheran University, Tacoma, WA 98447, United States
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
| |
|
20
|
Krittanawong C, Johnson KW, Hershman SG, Tang WW. Big data, artificial intelligence, and cardiovascular precision medicine. EXPERT REVIEW OF PRECISION MEDICINE AND DRUG DEVELOPMENT 2018. [DOI: 10.1080/23808993.2018.1528871] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Affiliation(s)
- Chayakrit Krittanawong
- Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kipp W. Johnson
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Steven G. Hershman
- Department of Medicine, Stanford University, Stanford, CA, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University, Stanford, CA, USA
| | - W.H. Wilson Tang
- Department of Cardiovascular Medicine, Heart and Vascular Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Cellular and Molecular Medicine, Lerner Research Institute, Cleveland, OH, USA
- Center for Clinical Genomics, Cleveland Clinic, Cleveland, OH, USA
| |
|
21
|
Leveridge M, Chung CW, Gross JW, Phelps CB, Green D. Integration of Lead Discovery Tactics and the Evolution of the Lead Discovery Toolbox. SLAS DISCOVERY 2018; 23:881-897. [PMID: 29874524 DOI: 10.1177/2472555218778503] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/26/2022]
Abstract
There has been much debate around the success rates of various screening strategies to identify starting points for drug discovery. Although high-throughput target-based and phenotypic screening has been the focus of this debate, techniques such as fragment screening, virtual screening, and DNA-encoded library screening are also increasingly reported as a source of new chemical equity. Here, we provide examples in which integration of more than one screening approach has improved the campaign outcome and discuss how strengths and weaknesses of various methods can be used to build a complementary toolbox of approaches, giving researchers the greatest probability of successfully identifying leads. Among others, we highlight case studies for receptor-interacting serine/threonine-protein kinase 1 and the bromo- and extra-terminal domain family of bromodomains. In each example, the unique insight or chemistries individual approaches provided are described, emphasizing the synergy of information obtained from the various tactics employed and the particular question each tactic was employed to answer. We conclude with a short prospective discussing how screening strategies are evolving, what this screening toolbox might look like in the future, how to maximize success through integration of multiple tactics, and scenarios that drive selection of one combination of tactics over another.
Affiliation(s)
- Melanie Leveridge
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
| | - Chun-Wa Chung
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
| | - Jeffrey W Gross
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Collegeville, PA, USA
| | - Christopher B Phelps
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Cambridge, MA, USA
| | - Darren Green
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
| |
|
22
|
Abstract
Aim: Computational chemogenomics models the compound–protein interaction space, typically for drug discovery, where existing methods predominantly either incorporate increasing numbers of bioactivity samples or focus on specific subfamilies of proteins and ligands. As an alternative to modeling entire large datasets at once, active learning adaptively incorporates a minimum of informative examples for modeling, yielding compact but high quality models. Results/methodology: We assessed active learning for protein/target family-wide chemogenomic modeling by replicate experiment. Results demonstrate that small yet highly predictive models can be extracted from only 10–25% of large bioactivity datasets, irrespective of molecule descriptors used. Conclusion: Chemogenomic active learning identifies small subsets of ligand–target interactions in a large screening database that lead to knowledge discovery and highly predictive models.
|
23
|
Gough A, Stern AM, Maier J, Lezon T, Shun TY, Chennubhotla C, Schurdak ME, Haney SA, Taylor DL. Biologically Relevant Heterogeneity: Metrics and Practical Insights. SLAS DISCOVERY 2017; 22:213-237. [PMID: 28231035 DOI: 10.1177/2472555216682725] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Indexed: 12/15/2022]
Abstract
Heterogeneity is a fundamental property of biological systems at all scales that must be addressed in a wide range of biomedical applications, including basic biomedical research, drug discovery, diagnostics, and the implementation of precision medicine. There are a number of published approaches to characterizing heterogeneity in cells in vitro and in tissue sections. However, there are no generally accepted approaches for the detection and quantitation of heterogeneity that can be applied in a relatively high-throughput workflow. This review and perspective emphasizes the experimental methods that capture multiplexed cell-level data, as well as the need for standard metrics of the spatial, temporal, and population components of heterogeneity. A recommendation is made for the adoption of a set of three heterogeneity indices that can be implemented in any high-throughput workflow to optimize the decision-making process. In addition, a pairwise mutual information method is suggested as an approach to characterizing the spatial features of heterogeneity, especially in tissue-based imaging. Furthermore, metrics for temporal heterogeneity are in the early stages of development. Example studies indicate that the analysis of functional phenotypic heterogeneity can be exploited to guide decisions in the interpretation of biomedical experiments, drug discovery, diagnostics, and the design of optimal therapeutic strategies for individual patients.
Affiliation(s)
- Albert Gough
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
| | - Andrew M Stern
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
| | - John Maier
- Department of Family Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Timothy Lezon
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
| | - Tong-Ying Shun
- University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
| | - Chakra Chennubhotla
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
| | - Mark E Schurdak
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA; University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
| | - Steven A Haney
- Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN, USA
| | - D Lansing Taylor
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA; University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
| |
|
24
|
Small Random Forest Models for Effective Chemogenomic Active Learning. JOURNAL OF COMPUTER AIDED CHEMISTRY 2017. [DOI: 10.2751/jcac.18.124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
|
25
|
Bougen-Zhukov N, Loh SY, Lee HK, Loo LH. Large-scale image-based screening and profiling of cellular phenotypes. Cytometry A 2016; 91:115-125. [PMID: 27434125 DOI: 10.1002/cyto.a.22909] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 12/30/2022]
Abstract
Cellular phenotypes are observable characteristics of cells resulting from the interactions of intrinsic and extrinsic chemical or biochemical factors. Image-based phenotypic screens under large numbers of basal or perturbed conditions can be used to study the influences of these factors on cellular phenotypes. Hundreds to thousands of phenotypic descriptors can also be quantified from the images of cells under each of these experimental conditions. Therefore, huge amounts of data can be generated, and the analysis of these data has become a major bottleneck in large-scale phenotypic screens. Here, we review current experimental and computational methods for large-scale image-based phenotypic screens. Our focus is on phenotypic profiling, a computational procedure for constructing quantitative and compact representations of cellular phenotypes based on the images collected in these screens. © 2016 International Society for Advancement of Cytometry.
Affiliation(s)
- Nicola Bougen-Zhukov
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
| | - Sheng Yang Loh
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
| | - Hwee Kuan Lee
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
| | - Lit-Hsin Loo
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore; Department of Pharmacology, School of Medicine, National University of Singapore, Singapore, 117600, Singapore
| |
|