1
|
Alsaedi S, Ogasawara M, Alarawi M, Gao X, Gojobori T. AI-powered precision medicine: utilizing genetic risk factor optimization to revolutionize healthcare. NAR Genom Bioinform 2025; 7:lqaf038. [PMID: 40330081 PMCID: PMC12051108 DOI: 10.1093/nargab/lqaf038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 02/11/2025] [Accepted: 04/17/2025] [Indexed: 05/08/2025] Open
Abstract
The convergence of artificial intelligence (AI) and biomedical data is transforming precision medicine by enabling the use of genetic risk factors (GRFs) for customized healthcare services based on individual needs. Although GRFs play an essential role in disease susceptibility, progression, and therapeutic outcomes, a gap exists in exploring their contribution to AI-powered precision medicine. This paper addresses this need by investigating the significance and potential of utilizing GRFs with AI in the medical field. We examine their applications, particularly emphasizing their impact on disease prediction, treatment personalization, and overall healthcare improvement. This review explores the application of AI algorithms to optimize the use of GRFs, aiming to advance precision medicine in disease screening, patient stratification, drug discovery, and understanding disease mechanisms. Through a variety of case studies and examples, we demonstrate the potential of incorporating GRFs facilitated by AI into medical practice, resulting in more precise diagnoses, targeted therapies, and improved patient outcomes. This review underscores the potential of GRFs, empowered by AI, to enhance precision medicine by improving diagnostic accuracy, treatment precision, and individualized healthcare solutions.
Affiliation(s)
- Sakhaa Alsaedi
  - Computer Science, Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - College of Computer Science and Engineering (CCSE), Taibah University, 42353 Madinah, Kingdom of Saudi Arabia
- Michihiro Ogasawara
  - Department of Internal Medicine and Rheumatology, Juntendo University, 113-8431 Tokyo, Japan
- Mohammed Alarawi
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Xin Gao
  - Computer Science, Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Takashi Gojobori
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Marine Open Innovation Institute (MaOI), 113-8431 Shizuoka, Japan
|
2
|
Guo F, Guan R, Li Y, Liu Q, Wang X, Yang C, Wang J. Foundation models in bioinformatics. Natl Sci Rev 2025; 12:nwaf028. [PMID: 40078374 PMCID: PMC11900445 DOI: 10.1093/nsr/nwaf028] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/30/2024] [Revised: 12/17/2024] [Accepted: 01/08/2025] [Indexed: 03/14/2025] Open
Abstract
With the adoption of foundation models (FMs), artificial intelligence (AI) has become increasingly significant in bioinformatics and has successfully addressed many historical challenges, such as pre-training frameworks, model evaluation and interpretability. FMs demonstrate notable proficiency in managing large-scale, unlabeled datasets, because experimental procedures are costly and labor intensive. In various downstream tasks, FMs have consistently achieved noteworthy results, demonstrating high levels of accuracy in representing biological entities. A new era in computational biology has been ushered in by the application of FMs, focusing on both general and specific biological issues. In this review, we introduce recent advancements in bioinformatics FMs employed in a variety of downstream tasks, including genomics, transcriptomics, proteomics, drug discovery and single-cell analysis. Our aim is to assist scientists in selecting appropriate FMs in bioinformatics, according to four model types: language FMs, vision FMs, graph FMs and multimodal FMs. In addition to understanding molecular landscapes, AI technology can establish the theoretical and practical foundation for continued innovation in molecular biology.
Affiliation(s)
- Fei Guo
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Xiangjiang Laboratory, Changsha 410083, China
- Renchu Guan
  - Key Laboratory for Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk 23529, USA
- Qi Liu
  - School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xiaowo Wang
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Can Yang
  - Department of Mathematics, State Key Laboratory of Molecular Neuroscience, and Big Data Bio-Intelligence Lab, The Hong Kong University of Science and Technology, Hong Kong, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Xiangjiang Laboratory, Changsha 410083, China
|
3
|
Tyagi N, Vahab N, Tyagi S. Genome language modeling (GLM): a beginner's cheat sheet. Biol Methods Protoc 2025; 10:bpaf022. [PMID: 40370585 PMCID: PMC12077296 DOI: 10.1093/biomethods/bpaf022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/17/2025] [Accepted: 03/23/2025] [Indexed: 05/16/2025] Open
Abstract
Integrating genomics with diverse data modalities has the potential to revolutionize personalized medicine. However, this integration poses significant challenges due to the fundamental differences in data types and structures. The vast size of the genome necessitates transformation into a condensed representation containing key biomarkers and relevant features to ensure interoperability with other modalities. This commentary explores both conventional and state-of-the-art approaches to genome language modeling (GLM), with a focus on representing and extracting meaningful features from genomic sequences. We focus on the latest trends of applying language modeling techniques on genomics sequence data, treating it as a text modality. Effective feature extraction is essential in enabling machine learning models to effectively analyze large genomic datasets, particularly within multimodal frameworks. We first provide a step-by-step guide to various genomic sequence preprocessing and tokenization techniques. Then we explore feature extraction methods for the transformation of tokens using frequency, embedding, and neural network-based approaches. In the end, we discuss machine learning (ML) applications in genomics, focusing on classification, regression, language processing algorithms, and multimodal integration. Additionally, we explore the role of GLM in functional annotation, emphasizing how advanced ML models, such as Bidirectional encoder representations from transformers, enhance the interpretation of genomic data. To the best of our knowledge, we compile the first end-to-end analytic guide to convert complex genomic data into biologically interpretable information using GLM, thereby facilitating the development of novel data-driven hypotheses.
Affiliation(s)
- Navya Tyagi
  - AI and Data Science, Indian Institute of Technology, Madras, Chennai 600036, Tamil Nadu, India
  - Amity Institute of Integrative Health Sciences, Amity University, Gurugram 122412, Haryana, India
- Naima Vahab
  - School of Computing Technologies, Royal Melbourne Institute of Technology (RMIT) University, 3001 Melbourne, Australia
- Sonika Tyagi
  - School of Computing Technologies, Royal Melbourne Institute of Technology (RMIT) University, 3001 Melbourne, Australia
|
4
|
Zablocki LI, Bugnon LA, Gerard M, Di Persia L, Stegmayer G, Milone DH. Comprehensive benchmarking of large language models for RNA secondary structure prediction. Brief Bioinform 2025; 26:bbaf137. [PMID: 40205851 PMCID: PMC11982019 DOI: 10.1093/bib/bbaf137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2024] [Revised: 01/31/2025] [Accepted: 02/27/2025] [Indexed: 04/11/2025] Open
Abstract
In recent years, inspired by the success of large language models (LLMs) for DNA and proteins, several LLMs for RNA have also been developed. These models take massive RNA datasets as inputs and learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. This is done under the hypothesis that obtaining high-quality RNA representations can enhance data-costly downstream tasks, such as the fundamental RNA secondary structure prediction problem. However, existing RNA-LLMs have not been evaluated for this task in a unified experimental setup. Since they are pretrained models, assessing their generalization capabilities on new structures is crucial. Nonetheless, this has been only partially addressed in the literature. In this work, we present a comprehensive experimental and comparative analysis of recently proposed pretrained RNA-LLMs. We evaluate the use of these representations for the secondary structure prediction task with a common deep learning architecture. The RNA-LLMs were assessed with increasing generalization difficulty on benchmark datasets. Results showed that two LLMs clearly outperform the other models, and revealed significant challenges for generalization in low-homology scenarios. Moreover, in this study we provide curated benchmark datasets of increasing complexity and a unified experimental setup for this scientific endeavor. Source code and the curated benchmark datasets are available in the repository: https://github.com/sinc-lab/rna-llm-folding/.
Affiliation(s)
- Luciano I Zablocki
  - Research Institute for Signals, Systems and Computational Intelligence, sinc (i), FICH-UNL/CONICET, Ruta Nacional Nº 168, km 472.4, Santa Fe (3000), Argentina
- Leandro A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence, sinc (i), FICH-UNL/CONICET, Ruta Nacional Nº 168, km 472.4, Santa Fe (3000), Argentina
- Matias Gerard
  - Research Institute for Signals, Systems and Computational Intelligence, sinc (i), FICH-UNL/CONICET, Ruta Nacional Nº 168, km 472.4, Santa Fe (3000), Argentina
- Leandro Di Persia
  - Research Institute for Signals, Systems and Computational Intelligence, sinc (i), FICH-UNL/CONICET, Ruta Nacional Nº 168, km 472.4, Santa Fe (3000), Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc (i), FICH-UNL/CONICET, Ruta Nacional Nº 168, km 472.4, Santa Fe (3000), Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc (i), FICH-UNL/CONICET, Ruta Nacional Nº 168, km 472.4, Santa Fe (3000), Argentina
|
5
|
Creux C, Zehraoui F, Radvanyi F, Tahi F. MMnc: multi-modal interpretable representation for non-coding RNA classification and class annotation. Bioinformatics 2025; 41:btaf051. [PMID: 39891346 PMCID: PMC11890286 DOI: 10.1093/bioinformatics/btaf051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 01/16/2025] [Accepted: 01/29/2025] [Indexed: 02/03/2025]
Abstract
Motivation: As the biological roles and disease implications of non-coding RNAs continue to emerge, the need to thoroughly characterize previously unexplored non-coding RNAs becomes increasingly urgent. These molecules hold potential as biomarkers and therapeutic targets. However, the vast and complex nature of non-coding RNA data presents a challenge. We introduce MMnc, an interpretable deep-learning approach designed to classify non-coding RNAs into functional groups. MMnc leverages multiple data sources, such as the sequence, secondary structure, and expression, using attention-based multi-modal data integration. This ensures the learning of meaningful representations while accounting for missing sources in some samples. Results: Our findings demonstrate that MMnc achieves high classification accuracy across diverse non-coding RNA classes. The method's modular architecture allows for the consideration of multiple types of modalities, whereas other tools consider two at most. MMnc is resilient to missing data, ensuring that all available information is effectively utilized. Importantly, the generated attention scores offer interpretable insights into the underlying patterns of the different non-coding RNA classes, potentially driving future non-coding RNA research and applications. Availability and implementation: Data and source code can be found at EvryRNA.ibisc.univ-evry.fr/EvryRNA/MMnc.
Affiliation(s)
- Constance Creux
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
  - Molecular Oncology, PSL Research University, CNRS, UMR 144, Institut Curie, Paris 75248, France
- Farida Zehraoui
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
- François Radvanyi
  - Molecular Oncology, PSL Research University, CNRS, UMR 144, Institut Curie, Paris 75248, France
- Fariza Tahi
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
|
6
|
Xu W, Li A, Zhao Y, Peng Y. Decoding the effects of mutation on protein interactions using machine learning. Biophysics Reviews 2025; 6:011307. [PMID: 40013003 PMCID: PMC11857871 DOI: 10.1063/5.0249920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Accepted: 01/14/2025] [Indexed: 02/28/2025]
Abstract
Accurately predicting mutation-caused binding free energy changes (ΔΔGs) on protein interactions is crucial for understanding how genetic variations affect interactions between proteins and other biomolecules, such as proteins, DNA/RNA, and ligands, which are vital for regulating numerous biological processes. Developing computational approaches with high accuracy and efficiency is critical for elucidating the mechanisms underlying various diseases, identifying potential biomarkers for early diagnosis, and developing targeted therapies. This review provides a comprehensive overview of recent advancements in predicting the impact of mutations on protein interactions across different interaction types, which are central to understanding biological processes and disease mechanisms, including cancer. We summarize recent progress in predictive approaches, including physicochemical-based, machine learning, and deep learning methods, evaluating the strengths and limitations of each. Additionally, we discuss the challenges related to the limitations of mutational data, including biases, data quality, and dataset size, and explore the difficulties in developing accurate prediction tools for mutation-induced effects on protein interactions. Finally, we discuss future directions for advancing these computational tools, highlighting the capabilities of advancing technologies, such as artificial intelligence to drive significant improvements in mutational effects prediction.
Affiliation(s)
- Wang Xu
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Anbang Li
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Yunjie Zhao
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Yunhui Peng
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
|
7
|
Fu J, Li H, Kang Y, Zhu H, Huang T, Li Z. DRFormer: A Benchmark Model for RNA Sequence Downstream Tasks. Genes (Basel) 2025; 16:284. [PMID: 40149436 PMCID: PMC11942477 DOI: 10.3390/genes16030284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2025] [Revised: 02/24/2025] [Accepted: 02/25/2025] [Indexed: 03/29/2025] Open
Abstract
Background/Objectives: RNA research is critical for understanding gene regulation, disease mechanisms, and therapeutic development. Constructing effective RNA benchmark models for accurate downstream analysis has become a significant research challenge. The objective of this study is to propose a robust benchmark model, DRFormer, for RNA sequence downstream tasks. Methods: The DRFormer model utilizes RNA sequences to construct novel vision features based on secondary structure and sequence distance. These features are pre-trained using the SWIN model to develop a SWIN-RNA submodel. This submodel is then integrated with an RNA sequence model to construct a multimodal model for downstream analysis. Results: We conducted experiments on various RNA downstream tasks. In the sequence classification task, the MCC reached 94.4%, surpassing the state-of-the-art RNAErnie model by 1.2%. In the protein-RNA interaction prediction, DRFormer achieved an MCC of 0.492, outperforming advanced models like BERT-RBP and PrismNet. In RNA secondary structure prediction, the F1 score was 0.690, exceeding the widely used SPOT-RNA model by 1%. Additionally, generalization experiments on DNA tasks yielded satisfactory results. Conclusions: DRFormer is the first RNA sequence downstream analysis model that leverages structural features to construct a vision model and integrates sequence and vision models in a multimodal manner. This approach yields excellent prediction and analysis results, making it a valuable contribution to RNA research.
Affiliation(s)
- Jianqi Fu
  - School of Information Engineering, Huzhou University, Huzhou 313000, China
- Haohao Li
  - College of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yanlei Kang
  - School of Information Engineering, Huzhou University, Huzhou 313000, China
- Hancan Zhu
  - School of Mathematics, Physics and Information, Shaoxing University, Shaoxing 312000, China
- Tiren Huang
  - College of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhong Li
  - School of Information Engineering, Huzhou University, Huzhou 313000, China
|
8
|
Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025] Open
Abstract
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential for transforming traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming and error prone. To enable large-scale RNA sequence analysis, empowering wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack basic foundations of RNA sequence analysis tasks. Considering the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold:
- It equips AI researchers with biological foundations of 47 distinct RNA sequence analysis tasks.
- It sets a stage for development of benchmark datasets related to 47 distinct RNA sequence analysis tasks by facilitating cruxes of 64 different biological databases.
- It presents word embedding and language model applications across 47 distinct RNA sequence analysis tasks.
- It streamlines the development of new predictors by providing a comprehensive survey of the performance values of 58 word embedding and 70 language model based predictive pipelines, as well as top-performing traditional sequence encoding based predictors and their performance, across 47 RNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
|
9
|
He Y, Huang F, Jiang X, Nie Y, Wang M, Wang J, Chen H. Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions. IEEE Rev Biomed Eng 2025; 18:172-191. [PMID: 39531565 DOI: 10.1109/rbme.2024.3496744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
Foundation models, trained on a diverse range of data and adaptable to a myriad of tasks, are advancing healthcare. They foster the development of healthcare artificial intelligence (AI) models tailored to the intricacies of the medical field, bridging the gap between limited AI models and the varied nature of healthcare practices. The advancement of healthcare foundation models (HFMs) brings forth tremendous potential to augment intelligent healthcare services across a broad spectrum of scenarios. However, despite the imminent widespread deployment of HFMs, there is currently a lack of clear understanding regarding their operation in the healthcare field, their existing challenges, and their future trajectory. To answer these critical inquiries, we present a comprehensive and in-depth examination that delves into the landscape of HFMs. It begins with a comprehensive overview of HFMs, encompassing their methods, data, and applications, to provide a quick understanding of the current progress. Subsequently, it delves into a thorough exploration of the challenges associated with data, algorithms, and computing infrastructures in constructing and widely applying foundation models in healthcare. Furthermore, this survey identifies promising directions for future development in this field. We believe that this survey will enhance the community's understanding of the current progress of HFMs and serve as a valuable source of guidance for future advancements in this domain.
|
10
|
Jin L, Zhou Y, Zhang S, Chen SJ. mRNA vaccine sequence and structure design and optimization: Advances and challenges. J Biol Chem 2025; 301:108015. [PMID: 39608721 PMCID: PMC11728972 DOI: 10.1016/j.jbc.2024.108015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/13/2024] [Accepted: 11/16/2024] [Indexed: 11/30/2024] Open
Abstract
Messenger RNA (mRNA) vaccines have emerged as a powerful tool against communicable diseases and cancers, as demonstrated by their huge success during the coronavirus disease 2019 (COVID-19) pandemic. Despite the outstanding achievements, mRNA vaccines still face challenges such as stringent storage requirements, insufficient antigen expression, and unexpected immune responses. Since the intrinsic properties of mRNA molecules significantly impact vaccine performance, optimizing mRNA design is crucial in preclinical development. In this review, we outline four key principles for optimal mRNA sequence design: enhancing ribosome loading and translation efficiency through untranslated region (UTR) optimization, improving translation efficiency via codon optimization, increasing structural stability by refining global RNA sequence and extending in-cell lifetime and expression fidelity by adjusting local RNA structures. We also explore recent advancements in computational models for designing and optimizing mRNA vaccine sequences following these principles. By integrating current mRNA knowledge, addressing challenges, and examining advanced computational methods, this review aims to promote the application of computational approaches in mRNA vaccine development and inspire novel solutions to existing obstacles.
Affiliation(s)
- Lei Jin
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Yuanzhe Zhou
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Sicheng Zhang
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Shi-Jie Chen
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
  - Department of Biochemistry, MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
|
11
|
Bernard C, Postic G, Ghannay S, Tahi F. RNA-TorsionBERT: leveraging language models for RNA 3D torsion angles prediction. Bioinformatics 2024; 41:btaf004. [PMID: 39775709 PMCID: PMC11758789 DOI: 10.1093/bioinformatics/btaf004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 12/11/2024] [Accepted: 01/07/2025] [Indexed: 01/11/2025] Open
Abstract
Motivation: Predicting the 3D structure of RNA is an ongoing challenge that has yet to be completely addressed despite continuous advancements. RNA 3D structures rely on distances between residues and base interactions, but also on backbone torsional angles. Knowing the torsional angles for each residue could help reconstruct its global folding, which is what we tackle in this work. This paper presents a novel approach for directly predicting RNA torsional angles from raw sequence data. Our method draws inspiration from the successful application of language models in various domains and adapts them to RNA. Results: We have developed a language-based model, RNA-TorsionBERT, incorporating better sequential interactions for predicting RNA torsional and pseudo-torsional angles from the sequence only. Through extensive benchmarking, we demonstrate that our method improves the prediction of torsional angles compared to state-of-the-art methods. In addition, by using our predictive model, we have inferred a torsion angle-dependent scoring function, called TB-MCQ, that replaces the true reference angles with our model's predictions. We show that it accurately evaluates the quality of near-native predicted structures, in terms of RNA backbone torsion angle values. Our work demonstrates promising results, suggesting the potential utility of language models in advancing RNA 3D structure prediction. Availability and implementation: Source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/RNA-TorsionBERT.
Affiliation(s)
- Clément Bernard
  - Université Paris Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
  - LISN, CNRS/Université Paris-Saclay, Orsay 91400, France
- Guillaume Postic
  - Université Paris Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
- Sahar Ghannay
  - LISN, CNRS/Université Paris-Saclay, Orsay 91400, France
- Fariza Tahi
  - Université Paris Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
|
12
|
Yu H, Yang H, Sun W, Yan Z, Yang X, Zhang H, Ding Y, Li K. An interpretable RNA foundation model for exploring functional RNA motifs in plants. Nat Mach Intell 2024; 6:1616-1625. [PMID: 39703563 PMCID: PMC11652376 DOI: 10.1038/s42256-024-00946-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Accepted: 11/05/2024] [Indexed: 12/21/2024]
Abstract
The complex 'language' of plant RNA encodes a vast array of biological regulatory elements that orchestrate crucial aspects of plant growth, development and adaptation to environmental stresses. Recent advancements in foundation models (FMs) have demonstrated their unprecedented potential to decipher complex 'language' in biology. In this study, we introduced PlantRNA-FM, a high-performance and interpretable RNA FM specifically designed for plants. PlantRNA-FM was pretrained on an extensive dataset, integrating RNA sequences and RNA structure information from 1,124 distinct plant species. PlantRNA-FM exhibits superior performance in plant-specific downstream tasks. PlantRNA-FM achieves an F1 score of 0.974 for genic region annotation, whereas the current best-performing model achieves 0.639. Our PlantRNA-FM is empowered by our interpretable framework that facilitates the identification of biologically functional RNA sequence and structure motifs, including both RNA secondary and tertiary structure motifs across transcriptomes. Through experimental validations, we revealed translation-associated RNA motifs in plants. Our PlantRNA-FM also highlighted the importance of the position information of these functional RNA motifs in genic regions. Taken together, our PlantRNA-FM facilitates the exploration of functional RNA motifs across the complexity of transcriptomes, empowering plant scientists with capabilities for programming RNA codes in plants.
Affiliation(s)
- Haopeng Yu
- Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, China
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, UK
- Heng Yang
- Department of Computer Science, University of Exeter, Exeter, UK
- Wenqing Sun
- Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, China
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, UK
- Zongyun Yan
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, UK
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Xiaofei Yang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- CAS-JIC Center of Excellence for Plant and Microbial Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Huakun Zhang
- Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, China
- Yiliang Ding
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, UK
- Ke Li
- Department of Computer Science, University of Exeter, Exeter, UK
- Alan Turing Institute, London, UK

13
Sarumi OA, Heider D. Large language models and their applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:3498-3505. [PMID: 39435343 PMCID: PMC11493188 DOI: 10.1016/j.csbj.2024.09.031] [Received: 07/31/2024] [Revised: 09/30/2024] [Accepted: 09/30/2024] [Indexed: 10/23/2024]
Abstract
Recent advancements in Natural Language Processing (NLP) have been significantly driven by the development of Large Language Models (LLMs), representing a substantial leap in language-based technology capabilities. These models, built on sophisticated deep learning architectures, typically transformers, are characterized by billions of parameters and extensive training data, enabling them to achieve high accuracy across various tasks. The transformer architecture of LLMs allows them to effectively handle context and sequential information, which is crucial for understanding and generating human language. Beyond traditional NLP applications, LLMs have shown significant promise in bioinformatics, transforming the field by addressing challenges associated with large and complex biological datasets. In genomics, proteomics, and personalized medicine, LLMs facilitate identifying patterns, predicting protein structures, or understanding genetic variations. This capability is crucial, e.g., for advancing drug discovery, where accurate prediction of molecular interactions is essential. This review discusses the current trends in LLMs research and their potential to revolutionize the field of bioinformatics and accelerate novel discoveries in the life sciences.
Affiliation(s)
- Oluwafemi A. Sarumi
- University of Münster, Institute of Medical Informatics, Albert-Schweitzer-Campus, Münster, 48149, Germany
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, Germany
- Dominik Heider
- University of Münster, Institute of Medical Informatics, Albert-Schweitzer-Campus, Münster, 48149, Germany
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, Germany

14
Chen CC, Chan YM, Jeong H. REDalign: accurate RNA structural alignment using residual encoder-decoder network. BMC Bioinformatics 2024; 25:346. [PMID: 39501155 PMCID: PMC11539752 DOI: 10.1186/s12859-024-05956-7] [Received: 05/15/2024] [Accepted: 10/11/2024] [Indexed: 11/08/2024]
Abstract
BACKGROUND: RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of $O(L^6)$ for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities.
RESULTS: In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency.
CONCLUSION: REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical Engineering, National Chiayi University, No.300 Xuefu Rd, Chiayi City, 600355, Taiwan
- Yi-Ming Chan
- MindtronicAI Co., 7 F., No. 218, Sec. 6, Roosevelt Road, Taipei, 11674, Taiwan
- Hyundoo Jeong
- Biomedical and Robotics Engineering, Incheon National University, 119 Academy-ro, Incheon, 22012, Yeonsu-gu, South Korea

15
Nahali S, Safari L, Khanteymoori A, Huang J. StructmRNA a BERT based model with dual level and conditional masking for mRNA representation. Sci Rep 2024; 14:26043. [PMID: 39472486 PMCID: PMC11522565 DOI: 10.1038/s41598-024-77172-5] [Received: 06/22/2024] [Accepted: 10/21/2024] [Indexed: 11/02/2024]
Abstract
In this study, we introduce StructmRNA, a new BERT-based model that was designed for the detailed analysis of mRNA sequences and structures. The success of DNABERT in understanding the intricate language of non-coding DNA with bidirectional encoder representations is extended to mRNA with StructmRNA. This new model uses a special dual-level masking technique that covers both sequence and structure, along with conditional masking. This enables StructmRNA to adeptly generate meaningful embeddings for mRNA sequences, even in the absence of explicit structural data, by capitalizing on the intricate sequence-structure correlations learned during extensive pre-training on vast datasets. Compared to well-known models like those in the Stanford OpenVaccine project, StructmRNA performs better in important tasks such as predicting RNA degradation. Thus, StructmRNA can inform better RNA-based treatments by predicting the secondary structures and biological functions of unseen mRNA sequences. The proficiency of this model is further confirmed by rigorous evaluations, revealing its unprecedented ability to generalize across various organisms and conditions, thereby marking a significant advance in the predictive analysis of mRNA for therapeutic design. With this work, we aim to set a new standard for mRNA analysis, contributing to the broader field of genomics and therapeutic development.
Affiliation(s)
- Sepideh Nahali
- Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Ontario, Canada
- Department of Computer Engineering, University of Zanjan, Zanjan, Iran
- Leila Safari
- Department of Computer Engineering, University of Zanjan, Zanjan, Iran
- Jimmy Huang
- Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Ontario, Canada

16
Zhao Y, Oono K, Takizawa H, Kotera M. GenerRNA: A generative pre-trained language model for de novo RNA design. PLoS One 2024; 19:e0310814. [PMID: 39352899 PMCID: PMC11444397 DOI: 10.1371/journal.pone.0310814] [Received: 05/16/2024] [Accepted: 09/08/2024] [Indexed: 10/04/2024]
Abstract
The design of RNA plays a crucial role in developing RNA vaccines, nucleic acid therapeutics, and innovative biotechnological tools. However, existing techniques frequently lack versatility across various tasks and are dependent on pre-defined secondary structure or other prior knowledge. To address these limitations, we introduce GenerRNA, a Transformer-based model inspired by the success of large language models (LLMs) in protein and molecule generation. GenerRNA is pre-trained on large-scale RNA sequences and capable of generating novel RNA sequences with stable secondary structures, while ensuring distinctiveness from existing sequences, thereby expanding our exploration of the RNA space. Moreover, GenerRNA can be fine-tuned on smaller, specialized datasets for specific subtasks, enabling the generation of RNAs with desired functionalities or properties without requiring any prior knowledge input. As a demonstration, we fine-tuned GenerRNA and successfully generated novel RNA sequences exhibiting high affinity for target proteins. Our work is the first application of a generative language model to RNA generation, presenting an innovative approach to RNA design.

17
Todhunter ME, Jubair S, Verma R, Saqe R, Shen K, Duffy B. Artificial intelligence and machine learning applications for cultured meat. Front Artif Intell 2024; 7:1424012. [PMID: 39381621 PMCID: PMC11460582 DOI: 10.3389/frai.2024.1424012] [Received: 04/26/2024] [Accepted: 08/21/2024] [Indexed: 10/10/2024]
Abstract
Cultured meat has the potential to provide a complementary meat industry with reduced environmental, ethical, and health impacts. However, major technological challenges remain, which require time- and resource-intensive research and development efforts. Machine learning has the potential to accelerate cultured meat technology by streamlining experiments, predicting optimal results, and reducing experimentation time and resources. However, the use of machine learning in cultured meat is in its infancy. This review covers the work available to date on the use of machine learning in cultured meat and explores future possibilities. We address four major areas of cultured meat research and development: establishing cell lines, cell culture media design, microscopy and image analysis, and bioprocessing and food processing optimization. In addition, we have included a survey of datasets relevant to cultured meat (CM) research. This review aims to provide the foundation necessary for both cultured meat and machine learning scientists to identify research opportunities at the intersection between cultured meat and machine learning.
Affiliation(s)
- Sheikh Jubair
- Alberta Machine Intelligence Institute, Edmonton, AB, Canada
- Ruchika Verma
- Alberta Machine Intelligence Institute, Edmonton, AB, Canada
- Rikard Saqe
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Kevin Shen
- Department of Mathematics, University of Waterloo, Waterloo, ON, Canada

18
Li S, Moayedpour S, Li R, Bailey M, Riahi S, Kogler-Anele L, Miladi M, Miner J, Pertuy F, Zheng D, Wang J, Balsubramani A, Tran K, Zacharia M, Wu M, Gu X, Clinton R, Asquith C, Skaleski J, Boeglin L, Chivukula S, Dias A, Strugnell T, Montoya FU, Agarwal V, Bar-Joseph Z, Jager S. CodonBERT large language model for mRNA vaccines. Genome Res 2024; 34:1027-1035. [PMID: 38951026 PMCID: PMC11368176 DOI: 10.1101/gr.278870.123] [Received: 12/15/2023] [Accepted: 06/25/2024] [Indexed: 07/03/2024]
Abstract
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.
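The two ideas in this abstract (codons as input units, and the very large number of mRNAs that encode the same peptide) can be illustrated with a short sketch. This is not the CodonBERT tokenizer; the function names `codon_tokens` and `count_synonymous_mrnas` are hypothetical, and the only domain fact assumed is the codon degeneracy of the standard genetic code.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter residue codes; stop codons are not included).
CODONS_PER_RESIDUE = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def codon_tokens(sequence: str) -> list[str]:
    """Split a coding sequence into non-overlapping codon tokens.

    Hypothetical helper: accepts DNA or RNA letters and normalizes to RNA.
    """
    rna = sequence.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def count_synonymous_mrnas(peptide: str) -> int:
    """Count distinct coding sequences that encode the given peptide."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


if __name__ == "__main__":
    print(codon_tokens("ATGCTGAGC"))      # ['AUG', 'CUG', 'AGC']
    print(count_synonymous_mrnas("MLS"))  # 1 * 6 * 6 = 36
```

Even a three-residue peptide such as Met-Leu-Ser has 36 synonymous coding sequences, and the count grows multiplicatively with length, which is why sequence selection is framed as an optimization problem.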
Affiliation(s)
- Sizhen Li
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Ruijiang Li
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Michael Bailey
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Saleh Riahi
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Milad Miladi
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Jacob Miner
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Fabien Pertuy
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Dinghai Zheng
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Jun Wang
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Khang Tran
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Minnie Zacharia
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Monica Wu
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Xiaobo Gu
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Ryan Clinton
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Carla Asquith
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Joseph Skaleski
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Lianne Boeglin
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Sudha Chivukula
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Anusha Dias
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Tod Strugnell
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Vikram Agarwal
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Ziv Bar-Joseph
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Sven Jager
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA

19
Wang J, Quan L, Jin Z, Wu H, Ma X, Wang X, Xie J, Pan D, Chen T, Wu T, Lyu Q. MultiModRLBP: A Deep Learning Approach for Multi-Modal RNA-Small Molecule Ligand Binding Sites Prediction. IEEE J Biomed Health Inform 2024; 28:4995-5006. [PMID: 38739505 DOI: 10.1109/jbhi.2024.3400521] [Indexed: 05/16/2024]
Abstract
This study aims to tackle the intricate challenge of predicting RNA-small molecule binding sites to explore the potential value in the field of RNA drug targets. To address this challenge, we propose the MultiModRLBP method, which integrates multi-modal features using deep learning algorithms. These features include 3D structural properties at the nucleotide base level of the RNA molecule, relational graphs based on overall RNA structure, and rich RNA semantic information. In our investigation, we gathered 851 interactions between RNA and small-molecule ligands from the RNAglib dataset and RLBind training set. Unlike conventional training sets, this collection broadened its scope by including RNA complexes that have the same RNA sequence but change their respective binding sites due to structural differences or the presence of different ligands. This enhancement enables the MultiModRLBP model to more accurately capture subtle changes at the structural level, ultimately improving its ability to discern nuances among similar RNA conformations. Furthermore, we evaluated MultiModRLBP on two classic test sets, Test18 and Test3, highlighting its performance disparities on small molecules based on metal and non-metal ions. Additionally, we conducted a structural sensitivity analysis on specific complex categories, considering RNA instances with varying degrees of structural changes and whether they share the same ligands. The research results indicate that MultiModRLBP outperforms the current state-of-the-art methods on multiple classic test sets, particularly excelling in predicting binding sites for non-metal ions and instances where the binding sites are widely distributed along the sequence. MultiModRLBP can also be used as a potential tool when the RNA structure is perturbed or the RNA experimental tertiary structure is not available. Most importantly, MultiModRLBP exhibits the capability to distinguish binding characteristics of RNA that are structurally diverse yet exhibit sequence similarity. These advancements hold promise in reducing the costs associated with the development of RNA-targeted drugs.

20
Alsaafin A, Tizhoosh HR. Harmonizing immune cell sequences for computational analysis with large language models. Biol Methods Protoc 2024; 9:bpae055. [PMID: 39290987 PMCID: PMC11407694 DOI: 10.1093/biomethods/bpae055] [Received: 04/26/2024] [Revised: 07/07/2024] [Accepted: 07/29/2024] [Indexed: 09/19/2024]
Abstract
We present SEQuence Weighted Alignment for Sorting and Harmonization (Seqwash), an algorithm designed to process sequencing profiles utilizing large language models. Seqwash harmonizes immune cell sequences into a unified representation, empowering LLMs to embed meaningful patterns while eliminating irrelevant information. Evaluations using immune cell sequencing data showcase Seqwash's efficacy in standardizing profiles, leading to improved feature quality and enhanced performance in both supervised and unsupervised downstream tasks for sequencing data.
Affiliation(s)
- Areej Alsaafin
- Department of Artificial Intelligence & Informatics, KIMIA Lab, Mayo Clinic, Rochester, MN, 55905, United States
- Hamid R Tizhoosh
- Department of Artificial Intelligence & Informatics, KIMIA Lab, Mayo Clinic, Rochester, MN, 55905, United States

21
Horvath A, Janapala Y, Woodward K, Mahmud S, Cleynen A, Gardiner E, Hannan R, Eyras E, Preiss T, Shirokikh N. Comprehensive translational profiling and STE AI uncover rapid control of protein biosynthesis during cell stress. Nucleic Acids Res 2024; 52:7925-7946. [PMID: 38721779 PMCID: PMC11260467 DOI: 10.1093/nar/gkae365] [Received: 01/09/2024] [Revised: 03/21/2024] [Accepted: 04/25/2024] [Indexed: 07/23/2024]
Abstract
Translational control is important in all life, but it remains a challenge to accurately quantify. When ribosomes translate messenger (m)RNA into proteins, they attach to the mRNA in series, forming poly(ribo)somes, and can co-localize. Here, we computationally model new types of co-localized ribosomal complexes on mRNA and identify them using enhanced translation complex profile sequencing (eTCP-seq) based on rapid in vivo crosslinking. We detect long disome footprints outside regions of non-random elongation stalls and show these are linked to translation initiation and protein biosynthesis rates. We subject footprints of disomes and other translation complexes to artificial intelligence (AI) analysis and construct a new, accurate and self-normalized measure of translation, termed stochastic translation efficiency (STE). We then apply STE to investigate rapid changes to mRNA translation in yeast undergoing glucose depletion. Importantly, we show that, well beyond tagging elongation stalls, footprints of co-localized ribosomes provide rich insight into translational mechanisms, polysome dynamics and topology. STE AI ranks cellular mRNAs by absolute translation rates under given conditions, can assist in identifying its control elements and will facilitate the development of next-generation synthetic biology designs and mRNA-based therapeutics.
Affiliation(s)
- Attila Horvath
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Yoshika Janapala
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Katrina Woodward
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Shafi Mahmud
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Alice Cleynen
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Institut Montpelliérain Alexander Grothendieck, Université de Montpellier, CNRS, Montpellier, France
- Elizabeth E Gardiner
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The National Platelet Research and Referral Centre, The Australian National University, Canberra, ACT 2601, Australia
- Ross D Hannan
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville 3010, Australia
- Peter MacCallum Cancer Centre, Melbourne 3000, Australia
- Department of Biochemistry and Molecular Biology, Monash University, Clayton 3800, Australia
- School of Biomedical Sciences, University of Queensland, St Lucia 4067, Australia
- Eduardo Eyras
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Centre for Computational Biomedical Sciences, The Australian National University, Canberra, ACT 2601, Australia
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
- Thomas Preiss
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia
- Nikolay E Shirokikh
- Division of Genome Sciences and Cancer, The John Curtin School of Medical Research, and The Shine-Dalgarno Centre for RNA Innovation, The Australian National University, Canberra, ACT 2601, Australia

22
Pham NT, Terrance AT, Jeon YJ, Rakkiyappan R, Manavalan B. ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning. Mol Ther Nucleic Acids 2024; 35:102192. [PMID: 38779332 PMCID: PMC11108997 DOI: 10.1016/j.omtn.2024.102192] [Received: 09/07/2023] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding gene expression regulation mechanisms. In this study, we have developed ac4C-AFL, a bioinformatics tool that precisely identifies ac4C sites from primary RNA sequences. In ac4C-AFL, we identified the optimal sequence length for model building and implemented an adaptive feature representation strategy that is capable of extracting the most representative features from RNA. To identify the most relevant features, we proposed a novel ensemble feature importance scoring strategy to rank features effectively. We then used this information to conduct the sequential forward search, which individually determines the optimal feature set from the 16 sequence-derived feature descriptors. Utilizing these optimal feature descriptors, we constructed 176 baseline models using 11 popular classifiers. The most efficient baseline models were identified using the two-step feature selection approach, whose predicted scores were integrated and trained with the appropriate classifier to develop the final prediction model. Our rigorous cross-validations and independent tests demonstrate that ac4C-AFL surpasses contemporary tools in predicting ac4C sites. Moreover, we have developed a publicly accessible web server at https://balalab-skku.org/ac4C-AFL/.
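The counts in this abstract fit together as 16 descriptors × 11 classifiers = 176 baseline models. The sketch below reproduces that grid and shows one common form of sequential forward search over an importance-ranked feature list. It is an illustration under stated assumptions, not the ac4C-AFL implementation: the placeholder names, the prefix-based search, and the toy scoring function are all hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

# Placeholder names only; the abstract does not list the descriptors or classifiers.
descriptors = [f"descriptor_{i}" for i in range(1, 17)]   # 16 feature descriptors
classifiers = [f"classifier_{j}" for j in range(1, 12)]   # 11 classifiers
baseline_grid = list(product(descriptors, classifiers))   # one baseline model per pair


def sequential_forward_search(
    ranked_features: Sequence[str],
    score: Callable[[list[str]], float],
) -> list[str]:
    """Add features in ranked order and keep the best-scoring prefix.

    Assumption: `ranked_features` is already sorted by importance and
    `score` returns a validation metric where higher is better.
    """
    best_subset: list[str] = []
    best_score = float("-inf")
    current: list[str] = []
    for feature in ranked_features:
        current.append(feature)
        value = score(current)
        if value > best_score:
            best_score, best_subset = value, list(current)
    return best_subset


def toy_score(subset: list[str]) -> float:
    """Toy metric: reward informative features, penalize subset size."""
    informative = {"f1", "f2"}
    return len(informative & set(subset)) - 0.1 * len(subset)


selected = sequential_forward_search(["f1", "f2", "f3", "f4"], toy_score)

if __name__ == "__main__":
    print(len(baseline_grid))  # 176
    print(selected)            # ['f1', 'f2']
```

In practice `score` would wrap a cross-validated classifier, and the search would be run once per descriptor so that each descriptor gets its own reduced feature set.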
Affiliation(s)
- Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Annie Terrina Terrance
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Young-Jun Jeon
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Rajan Rakkiyappan
- Department of Mathematics, Bharathiar University, Coimbatore, Tamil Nadu 641046, India
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea

23
Chu Y, Yu D, Li Y, Huang K, Shen Y, Cong L, Zhang J, Wang M. A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions. Nat Mach Intell 2024; 6:449-460. [PMID: 38855263 PMCID: PMC11155392 DOI: 10.1038/s42256-024-00823-9] [Received: 09/04/2023] [Accepted: 03/07/2024] [Indexed: 06/11/2024]
Abstract
The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for the 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best known benchmark by up to 5% for predicting the Mean Ribosome Loading, and by up to 8% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experimental results confirmed that our top designs achieved a 32.5% increase in protein production level relative to a well-established 5' UTR optimized for therapeutics.
Affiliation(s)
- Yanyi Chu
- Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Dan Yu
- RVAC Medicines, Waltham, MA 02451, USA
- Yupeng Li
- RVAC Medicines, Waltham, MA 02451, USA
- Kaixuan Huang
- Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA
- Yue Shen
- RVAC Medicines, Waltham, MA 02451, USA
- Le Cong
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Mengdi Wang
- Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA

24
Hara K, Iwano N, Fukunaga T, Hamada M. DeepRaccess: high-speed RNA accessibility prediction using deep learning. Front Bioinform 2023; 3:1275787. [PMID: 37881622 PMCID: PMC10597636 DOI: 10.3389/fbinf.2023.1275787] [Received: 08/10/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023]
Abstract
RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility based on deep learning methods. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved a speed-up of tens to hundreds of times in a GPU environment. The source code and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.
Affiliation(s)
- Kaisei Hara
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, Japan
- Natsuki Iwano
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
|
25
|
Wang Z, Liang S, Liu S, Meng Z, Wang J, Liang S. Sequence pre-training-based graph neural network for predicting lncRNA-miRNA associations. Brief Bioinform 2023; 24:bbad317. [PMID: 37651605 DOI: 10.1093/bib/bbad317] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/05/2023] [Revised: 06/28/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023]
Abstract
MicroRNAs (miRNAs) silence genes by binding to messenger RNAs, whereas long non-coding RNAs (lncRNAs) act as competitive endogenous RNAs (ceRNAs) that can relieve miRNA silencing effects and upregulate target gene expression. The ceRNA association between lncRNAs and miRNAs has been a research hotspot due to its medical importance, but it is challenging to verify experimentally. In this paper, we propose a novel deep learning scheme, i.e. sequence pre-training-based graph neural network (SPGNN), that combines pre-training and fine-tuning stages to predict lncRNA-miRNA associations from RNA sequences and the existing interactions represented as a graph. First, in the pre-training stage, we utilize a sequence-to-vector technique to generate pre-trained embeddings based on the sequences of all RNAs. In the fine-tuning stage, we use a graph neural network to learn node representations from the heterogeneous graph constructed using lncRNA-miRNA association information. We evaluate our proposed scheme SPGNN on our newly collected animal lncRNA-miRNA association dataset and demonstrate that combining the k-mer technique and Doc2vec model for pre-training with the Simple Graph Convolution Network for fine-tuning is effective in predicting lncRNA-miRNA associations. Our approach outperforms state-of-the-art baselines across various evaluation metrics. We also conduct an ablation study and hyperparameter analysis to verify the effectiveness of each component and parameter of our scheme. The complete code and dataset are available on GitHub: https://github.com/zixwang/SPGNN.
Affiliation(s)
- Zixiao Wang
  - Mohamed bin Zayed University of Artificial Intelligence, Masdar City, UAE
- Shiyang Liang
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an 710038, Shaanxi, China
- Siwei Liu
  - Mohamed bin Zayed University of Artificial Intelligence, Masdar City, UAE
- Jingjie Wang
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an 710038, Shaanxi, China
- Shangsong Liang
  - Mohamed bin Zayed University of Artificial Intelligence, Masdar City, UAE
|
26
|
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
  - School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
|
27
|
Dunkel H, Wehrmann H, Jensen LR, Kuss AW, Simm S. MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding. Int J Mol Sci 2023; 24:8884. [PMID: 37240230 PMCID: PMC10218863 DOI: 10.3390/ijms24108884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023]
Abstract
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, and classifying them is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the current best-performing tool, ncRDense, we achieved a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
Affiliation(s)
- Heiko Dunkel
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
- Henning Wehrmann
  - Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, 60438 Frankfurt am Main, Germany
- Lars R. Jensen
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Andreas W. Kuss
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Stefan Simm
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
|
28
|
Rios-Martinez C, Bhattacharya N, Amini AP, Crawford L, Yang KK. Deep self-supervised learning for biosynthetic gene cluster detection and product classification. PLoS Comput Biol 2023; 19:e1011162. [PMID: 37220151 PMCID: PMC10241353 DOI: 10.1371/journal.pcbi.1011162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 06/05/2023] [Accepted: 05/07/2023] [Indexed: 05/25/2023]
Abstract
Natural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). With advances in high-throughput sequencing, the number of complete microbial isolate genomes and metagenomes has increased, and a vast number of the BGCs they contain remain undiscovered. Here, we introduce a self-supervised learning approach designed to identify and characterize BGCs from such data. To do this, we represent BGCs as chains of functional protein domains and train a masked language model on these domains. We assess the ability of our approach to detect BGCs and characterize BGC properties in bacterial genomes. We also demonstrate that our model can learn meaningful representations of BGCs and their constituent domains, detect BGCs in microbial genomes, and predict BGC product classes. These results highlight self-supervised neural networks as a promising framework for improving BGC prediction and classification.
Affiliation(s)
- Carolina Rios-Martinez
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
  - Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Nicholas Bhattacharya
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
  - Department of Mathematics, University of California, Berkeley, Berkeley, California, United States of America
- Ava P. Amini
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Lorin Crawford
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Kevin K. Yang
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
|