1
Zhai J, Zhang Y, Zhang C, Yin X, Song M, Tang C, Ding P, Li Z, Ma C. deepTFBS: Improving within- and Cross-Species Prediction of Transcription Factor Binding Using Deep Multi-Task and Transfer Learning. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025:e03135. [PMID: 40411397 DOI: 10.1002/advs.202503135] [Received: 02/18/2025] [Revised: 04/24/2025] [Indexed: 05/26/2025]
Abstract
The precise prediction of transcription factor binding sites (TFBSs) is crucial to understanding gene regulation. In this study, deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar for accurately predicting TFBSs within and across plant species, is presented. Taking advantage of multi-task DL and transfer learning, deepTFBS is capable of leveraging the knowledge learned from large-scale TF binding profiles to enhance the prediction of TFBSs under small-sample training and cross-species prediction tasks. When tested using available information on 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA and DanQ, with a 244.49%, 49.15%, and 23.32% improvement of the area under the precision-recall curve (PRAUC), respectively. Further cross-species prediction of TFBS in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also utilize information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data availability is limited. A case study focusing on the WUSCHEL (WUS) transcription factor illustrated the potential use of deepTFBS in cross-species applications, in our example between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.
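The abstract reports PRAUC gains but does not say how the metric was computed. The following is a minimal sketch, assuming PRAUC is estimated by average precision and that the quoted percentages are relative improvements over each baseline; neither assumption is confirmed by the abstract. The function names and toy data are hypothetical and are not taken from the deepTFBS code.

```python
def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve as average precision.

    labels: list of 0/1 ground-truth values (1 = bound site).
    scores: list of predicted scores, higher = more likely bound.
    Tied scores are ranked in input order (a simplification).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def relative_improvement(new, baseline):
    """Percentage improvement of `new` over `baseline` (assumed definition)."""
    return 100.0 * (new - baseline) / baseline


# Toy data, purely illustrative.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_prauc = average_precision(toy_labels, toy_scores)
toy_gain = relative_improvement(0.6, 0.5)
```

Under this definition, a baseline PRAUC of 0.5 and a new PRAUC of 0.6 correspond to a 20% relative improvement, so a figure above 100% means the baseline score was more than doubled.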
Affiliation(s)
- Jingjing Zhai
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Yuzhou Zhang
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Chujun Zhang
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Xiaotong Yin
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Minggui Song
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Chenglong Tang
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Pengjun Ding
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Zenglin Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Chuang Ma
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
2
Appleton E, Tao J, Liu S, Glass C, Fonseca G, Church G. Machine-guided cell-fate engineering. Cell Rep 2025; 44:115726. [PMID: 40382774 DOI: 10.1016/j.celrep.2025.115726] [Received: 10/16/2024] [Revised: 03/06/2025] [Accepted: 04/30/2025] [Indexed: 05/20/2025] Open
Abstract
The creation of induced pluripotent stem cells (iPSCs) has enabled scientists to explore the function, mechanisms, and differentiation processes of many types of cells. One of the fastest and most efficient approaches is transcription factor (TF) over-expression. However, finding the right combination of TFs to over-express to differentiate iPSCs directly into other cell types is a difficult task. Here, we describe a machine-learning (ML) pipeline, called CellCartographer, that uses chromatin accessibility and transcriptomics data to design multiplex TF pooled-screening experiments for cell-type conversions that then may be iteratively refined. We validate this method by differentiating iPSCs into twelve cell types at low efficiency in preliminary screens and iteratively refine our TF combinations to achieve high-efficiency differentiation for six of these cell types in <6 days. Finally, we functionally characterize iPSC-derived cytotoxic T cells (iCytoTs), regulatory T cells (iTregs), type II astrocytes (iAstIIs), and hepatocytes (iHeps) to validate functionally accurate differentiation.
Affiliation(s)
- Evan Appleton
- Wyss Institute for Biologically Inspired Engineering at Harvard University, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
- Jenhan Tao
- Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA 92093, USA.
- Songlei Liu
- Wyss Institute for Biologically Inspired Engineering at Harvard University, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Christopher Glass
- Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA 92093, USA
- Gregory Fonseca
- Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, Montréal, QC H4A-3J1, Canada; Quantitative Life Sciences, McGill University, Montréal, QC H4A-3J1, Canada; Department of Medicine, Division of Experimental Medicine, McGill University, Montréal, QC H4A-3J1, Canada
- George Church
- Wyss Institute for Biologically Inspired Engineering at Harvard University, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
3
Wan F, Torres MDT, Guan C, de la Fuente-Nunez C. Tutorial: guidelines for the use of machine learning methods to mine genomes and proteomes for antibiotic discovery. Nat Protoc 2025:10.1038/s41596-025-01144-w. [PMID: 40369233 DOI: 10.1038/s41596-025-01144-w] [Received: 04/05/2024] [Accepted: 01/08/2025] [Indexed: 05/16/2025]
Abstract
Genomes and proteomes constitute a rich reservoir of molecular diversity. However, they have remained underexplored because of a lack of appropriate tools. In recent years, computational approaches have been developed to mine this unexplored biological information, or dark matter, accelerating the discovery of new antibiotic molecules. Such efforts have yielded a wide range of new molecules. These include peptides released via predicted proteolytic cleavage of larger proteins, termed 'encrypted peptides', which have been found to be widespread in nature. Molecules encoded by and translated from small open reading frames within genomic sequences have also been uncovered, further expanding the landscape of bioactive compounds. Here, we discuss computational approaches, including machine learning and artificial intelligence (AI) tools, which have been used to date to identify antimicrobial compounds, with a special emphasis on peptides. We also propose potential avenues for future exploration in this rapidly evolving field. Moreover, we provide an overview of the experimental methods commonly used to validate these computational predictions. We anticipate that efforts combining cutting-edge AI and experimental approaches for biological sequence mining will reveal new insights into host immunity and continue to accelerate discoveries in the fields of antibiotics and infectious diseases.
Affiliation(s)
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Marcelo D T Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Changge Guan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA.
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
4
Barbadilla-Martínez L, Klaassen N, van Steensel B, de Ridder J. Predicting gene expression from DNA sequence using deep learning models. Nat Rev Genet 2025:10.1038/s41576-025-00841-2. [PMID: 40360798 DOI: 10.1038/s41576-025-00841-2] [Accepted: 04/01/2025] [Indexed: 05/15/2025]
Abstract
Transcription of genes is regulated by DNA elements such as promoters and enhancers, the activity of which is in turn controlled by many transcription factors. Owing to the highly complex combinatorial logic involved, it has been difficult to construct computational models that predict gene activity from DNA sequence. Recent advances in deep learning techniques applied to data from epigenome mapping and high-throughput reporter assays have made substantial progress towards addressing this complexity. Such models can capture the regulatory grammar with remarkable accuracy and show great promise in predicting the effects of non-coding variants, uncovering detailed molecular mechanisms of gene regulation and designing synthetic regulatory elements for biotechnology. Here, we discuss the principles of these approaches, the types of training data sets that are available and the strengths and limitations of different approaches.
Affiliation(s)
- Lucía Barbadilla-Martínez
- Oncode Institute, Utrecht, The Netherlands
- Center for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands
- Noud Klaassen
- Oncode Institute, Utrecht, The Netherlands
- Division of Molecular Genetics, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bas van Steensel
- Oncode Institute, Utrecht, The Netherlands.
- Division of Molecular Genetics, Netherlands Cancer Institute, Amsterdam, The Netherlands.
- Jeroen de Ridder
- Oncode Institute, Utrecht, The Netherlands.
- Center for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands.
5
Teschendorff AE, Horvath S. Epigenetic ageing clocks: statistical methods and emerging computational challenges. Nat Rev Genet 2025; 26:350-368. [PMID: 39806006 DOI: 10.1038/s41576-024-00807-w] [Accepted: 11/20/2024] [Indexed: 01/16/2025]
Abstract
Over the past decade, epigenetic clocks have emerged as powerful machine learning tools, not only to estimate chronological and biological age but also to assess the efficacy of anti-ageing, cellular rejuvenation and disease-preventive interventions. However, many computational and statistical challenges remain that limit our understanding, interpretation and application of epigenetic clocks. Here, we review these computational challenges, focusing on interpretation, cell-type heterogeneity and emerging single-cell methods, aiming to provide guidelines for the rigorous construction of interpretable epigenetic clocks at cell-type and single-cell resolution.
Affiliation(s)
- Andrew E Teschendorff
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
6
Saparov A, Zech M. Big data and transformative bioinformatics in genomic diagnostics and beyond. Parkinsonism Relat Disord 2025; 134:107311. [PMID: 39924354 DOI: 10.1016/j.parkreldis.2025.107311] [Received: 12/11/2024] [Revised: 01/23/2025] [Accepted: 01/25/2025] [Indexed: 02/11/2025]
Abstract
The current era of high-throughput analysis-driven research offers invaluable insights into disease etiologies, accurate diagnostics, pathogenesis, and personalized therapy. In the field of movement disorders, investigators are facing an increasing growth in the volume of produced patient-derived datasets, providing substantial opportunities for precision medicine approaches based on extensive information accessibility and advanced annotation practices. Integrating data from multiple sources, including phenomics, genomics, and multi-omics, is crucial for comprehensively understanding different types of movement disorders. Here, we explore formats and analytics of big data generated for patients with movement disorders, including strategies to meaningfully share the data for optimized patient benefit. We review computational methods that are essential to accelerate the process of evaluating the increasing amounts of specialized data collected. Based on concrete examples, we highlight how bioinformatic approaches facilitate the translation of multidimensional biological information into clinically relevant knowledge. Moreover, we outline the feasibility of computer-aided therapeutic target evaluation, and we discuss the importance of expanding the focus of big data research to understudied phenotypes such as dystonia.
Affiliation(s)
- Alice Saparov
- Institute of Human Genetics, Technical University of Munich, School of Medicine and Health, Munich, Germany; Institute of Neurogenomics, Helmholtz Munich, Neuherberg, Germany; Institute for Advanced Study, Technical University of Munich, Garching, Germany
- Michael Zech
- Institute of Human Genetics, Technical University of Munich, School of Medicine and Health, Munich, Germany; Institute of Neurogenomics, Helmholtz Munich, Neuherberg, Germany; Institute for Advanced Study, Technical University of Munich, Garching, Germany.
7
Liu L, Qi W, Zhang N, Zhang J, Liu S, Wang H, Jiang L, Sun Y. Nutraceuticals for Gut-Brain Axis Health: A Novel Approach to Combat Malnutrition and Future Personalised Nutraceutical Interventions. Nutrients 2025; 17:1551. [PMID: 40362863 PMCID: PMC12073618 DOI: 10.3390/nu17091551] [Received: 03/26/2025] [Revised: 04/22/2025] [Accepted: 04/26/2025] [Indexed: 05/15/2025] Open
Abstract
The gut-brain axis (GBA) is a bidirectional communication network between the gastrointestinal tract and the brain, modulated by gut microbiota and related biomarkers. Malnutrition disrupts GBA homeostasis, exacerbating GBA dysfunction through gut dysbiosis, impaired neuroactive metabolite production, and systemic inflammation. Nutraceuticals, including probiotics, prebiotics, synbiotics, postbiotics, and paraprobiotics, offer a promising approach to improving GBA homeostasis by modulating the gut microbiota composition and related neuroactive metabolites. This review aims to elucidate the interplay between gut microbiota-derived biomarkers and GBA dysfunction in malnutrition and evaluate the potential of nutraceuticals in combating malnutrition. Furthermore, it explores the future of personalised nutraceutical interventions tailored to individual genetic and microbiome profiles, providing a targeted approach to optimise health outcomes. The integration of nutraceuticals into GBA health management could transform malnutrition treatment and improve cognitive and metabolic health.
Affiliation(s)
- Litai Liu
- Tourism & Cuisine College, Harbin University of Commerce, Harbin 150028, China; (L.L.); (W.Q.); (N.Z.); (S.L.)
- Department of Food and Nutritional Sciences, University of Reading, Reading RG6 6UR, UK
- Wen Qi
- Tourism & Cuisine College, Harbin University of Commerce, Harbin 150028, China; (L.L.); (W.Q.); (N.Z.); (S.L.)
- Na Zhang
- Tourism & Cuisine College, Harbin University of Commerce, Harbin 150028, China; (L.L.); (W.Q.); (N.Z.); (S.L.)
- Jinhao Zhang
- College of Food Science, Northeast Agricultural University, Harbin 150030, China; (J.Z.); (H.W.); (L.J.)
- Shen Liu
- Tourism & Cuisine College, Harbin University of Commerce, Harbin 150028, China; (L.L.); (W.Q.); (N.Z.); (S.L.)
- Huan Wang
- College of Food Science, Northeast Agricultural University, Harbin 150030, China; (J.Z.); (H.W.); (L.J.)
- Lianzhou Jiang
- College of Food Science, Northeast Agricultural University, Harbin 150030, China; (J.Z.); (H.W.); (L.J.)
- Ying Sun
- Tourism & Cuisine College, Harbin University of Commerce, Harbin 150028, China; (L.L.); (W.Q.); (N.Z.); (S.L.)
8
Li X, Xie C, Cheng L, Tong H, Bock R, Qian Q, Zhou W. The next Green Revolution: integrating crop architectype and physiotype. Trends Biotechnol 2025:S0167-7799(25)00129-5. [PMID: 40307093 DOI: 10.1016/j.tibtech.2025.04.002] [Received: 02/07/2025] [Revised: 03/28/2025] [Accepted: 04/01/2025] [Indexed: 05/02/2025]
Abstract
In the middle of the last century, the Green Revolution dramatically increased crop yields and transformed global agriculture. As current food production is increasingly challenged by the demands of the growing population, climate change, and environmental degradation, a new Green Revolution is urgently needed. This Review highlights recent progress in defining the morphological ideotypes of four major crops, and proposes essential physiological traits critical for crop improvement and environmental adaptation. We introduce two concepts: the 'architectype' representing optimized morphological features, and the 'physiotype' encompassing improved physiological traits. By integrating these concepts through advanced genomic technologies and precision management practices, the next Green Revolution could potentially enhance crop yields and resource use efficiency by over 20-30%, thereby ensuring sustainable food production.
Affiliation(s)
- Xia Li
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Chen Xie
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Cheng
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongning Tong
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ralph Bock
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Qian Qian
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wenbin Zhou
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
9
Binder N, Khavaran A, Sankowski R. Primer on machine learning applications in brain immunology. Frontiers in Bioinformatics 2025; 5:1554010. [PMID: 40313869 PMCID: PMC12043695 DOI: 10.3389/fbinf.2025.1554010] [Received: 01/07/2025] [Accepted: 03/24/2025] [Indexed: 05/03/2025] Open
Abstract
Single-cell and spatial technologies have transformed our understanding of brain immunology, providing unprecedented insights into immune cell heterogeneity and spatial organisation within the central nervous system. These methods have uncovered complex cellular interactions, rare cell populations, and the dynamic immune landscape in neurological disorders. This review highlights recent advances in single-cell "omics" data analysis and discusses their applicability for brain immunology. Traditional statistical techniques, adapted for single-cell omics, have been crucial in categorizing cell types and identifying gene signatures, overcoming challenges posed by increasingly complex datasets. We explore how machine learning, particularly deep learning methods like autoencoders and graph neural networks, is addressing these challenges by enhancing dimensionality reduction, data integration, and feature extraction. Newly developed foundation models present exciting opportunities for uncovering gene expression programs and predicting genetic perturbations. Focusing on brain development, we demonstrate how single-cell analyses have resolved immune cell heterogeneity, identified temporal maturation trajectories, and uncovered potential therapeutic links to various pathologies, including brain malignancies and neurodegeneration. The integration of single-cell and spatial omics has elucidated the intricate cellular interplay within the developing brain. This mini-review is intended for wet lab biologists at all career stages, offering a concise overview of the evolving landscape of single-cell omics in the age of widely available artificial intelligence.
Affiliation(s)
- Roman Sankowski
- Institute of Neuropathology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
10
Ferreira REP, Dórea JRR. Leveraging computer vision, large language models, and multimodal machine learning for optimal decision-making in dairy farming. J Dairy Sci 2025:S0022-0302(25)00211-5. [PMID: 40221039 DOI: 10.3168/jds.2024-25650] [Received: 08/31/2024] [Accepted: 03/06/2025] [Indexed: 04/14/2025]
Abstract
This article explores various applications of artificial intelligence technologies in dairy farming, including the use of computer vision systems (CVS) for animal identification, body condition score (BCS) and body shape analysis, and potential uses of LLMs in the dairy industry. Among recent advancements in precision livestock farming (PLF) tools, CVS have gained popularity as powerful solutions for individual animal monitoring. These systems can capture phenotypes from multiple animals simultaneously using a single device in an automated and non-intrusive manner. To match animals with their corresponding predicted phenotypes, these systems require individual animal identification, which can be achieved through external identification systems or computer vision-based animal identification algorithms. Additionally, modern natural language processing techniques, such as large language models (LLMs), offer opportunities for advanced data integration, including unstructured textual data. Furthermore, we discuss the challenges associated with integrating data from different sources and modalities - such as images, text, and tabular data - into multimodal machine learning systems for phenotype prediction, which also represents a key area of artificial intelligence application. Digital technologies such as CVS and LLMs have the potential to transform dairy farming. CVS can provide individual and objective assessments of animal health, while LLMs can integrate diverse data sources for phenotype prediction. While there is much potential ahead, these technologies offer significant opportunities for advancing animal health monitoring, farm management, and individual phenotyping.
Affiliation(s)
- Rafael E P Ferreira
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI 53706, USA
- João R R Dórea
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI 53706, USA; Department of Biological Systems Engineering, University of Wisconsin, Madison, WI 53706, USA.
11
Fan X, Chang T, Chen C, Hafner M, Wang Z. Analysis of RNA translation with a deep learning architecture provides new insight into translation control. Nucleic Acids Res 2025; 53:gkaf277. [PMID: 40219965 PMCID: PMC11992669 DOI: 10.1093/nar/gkaf277] [Received: 06/24/2024] [Revised: 02/20/2025] [Accepted: 04/01/2025] [Indexed: 04/14/2025] Open
Abstract
Accurate annotation of coding regions in RNAs is essential for understanding gene translation. We developed a deep neural network to directly predict and analyze translation initiation and termination sites from RNA sequences. Trained with human transcripts, our model learned hidden rules of translation control and achieved a near-perfect prediction of canonical translation sites across the entire human transcriptome. Surprisingly, this model revealed a new role of codon usage in regulating translation termination, which was experimentally validated. We also identified thousands of new open reading frames in mRNAs or lncRNAs, some of which were confirmed experimentally. The model trained with human mRNAs achieved high prediction accuracy of canonical translation sites in all eukaryotes and good prediction in polycistronic transcripts from prokaryotes or RNA viruses, suggesting a high degree of conservation in translation control. Collectively, we present TranslationAI (https://www.biosino.org/TranslationAI/), a general and efficient deep learning model for RNA translation that generates new insights into the complexity of translation regulation.
Affiliation(s)
- Xiaojuan Fan
- Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD 20814, United States
- Tiangen Chang
- Laboratory of Cancer Data Science, National Cancer Institute, Bethesda, MD 20814, United States
- Chuyun Chen
- Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Markus Hafner
- RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD 20814, United States
- Zefeng Wang
- Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- School of Life Science, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China
12
Nayak N, Mehrotra S, Karamchandani AN, Santelia D, Mehrotra R. Recent advances in designing synthetic plant regulatory modules. Frontiers in Plant Science 2025; 16:1567659. [PMID: 40241826 PMCID: PMC11999978 DOI: 10.3389/fpls.2025.1567659] [Received: 01/27/2025] [Accepted: 03/17/2025] [Indexed: 04/18/2025]
Abstract
Introducing novel functions in plants through synthetic multigene circuits requires strict transcriptional regulation. Currently, the use of natural regulatory modules in synthetic circuits is hindered by our limited knowledge of complex plant regulatory mechanisms, the paucity of characterized promoters, and the possibility of crosstalk with endogenous circuits. Synthetic regulatory modules can overcome these limitations. This article introduces an integrative de novo approach for designing plant synthetic promoters by utilizing the available online tools and databases. The recent achievements in designing and validating synthetic plant promoters, enhancers, transcription factors, and the challenges of establishing synthetic circuits in plants are also discussed.
Affiliation(s)
- Namitha Nayak
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
- Sandhya Mehrotra
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
- Diana Santelia
- Institute of Integrative Biology, ETH Zürich Universitätstrasse, Zürich, Switzerland
- Rajesh Mehrotra
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
|
13
|
Tyagi N, Vahab N, Tyagi S. Genome language modeling (GLM): a beginner's cheat sheet. Biol Methods Protoc 2025; 10:bpaf022. [PMID: 40370585 PMCID: PMC12077296 DOI: 10.1093/biomethods/bpaf022] [Received: 01/07/2025] [Revised: 02/17/2025] [Accepted: 03/23/2025] [Indexed: 05/16/2025]
Abstract
Integrating genomics with diverse data modalities has the potential to revolutionize personalized medicine. However, this integration poses significant challenges due to the fundamental differences in data types and structures. The vast size of the genome necessitates transformation into a condensed representation containing key biomarkers and relevant features to ensure interoperability with other modalities. This commentary explores both conventional and state-of-the-art approaches to genome language modeling (GLM), with a focus on representing and extracting meaningful features from genomic sequences. We focus on the latest trends of applying language modeling techniques on genomics sequence data, treating it as a text modality. Effective feature extraction is essential in enabling machine learning models to effectively analyze large genomic datasets, particularly within multimodal frameworks. We first provide a step-by-step guide to various genomic sequence preprocessing and tokenization techniques. Then we explore feature extraction methods for the transformation of tokens using frequency, embedding, and neural network-based approaches. In the end, we discuss machine learning (ML) applications in genomics, focusing on classification, regression, language processing algorithms, and multimodal integration. Additionally, we explore the role of GLM in functional annotation, emphasizing how advanced ML models, such as Bidirectional encoder representations from transformers, enhance the interpretation of genomic data. To the best of our knowledge, we compile the first end-to-end analytic guide to convert complex genomic data into biologically interpretable information using GLM, thereby facilitating the development of novel data-driven hypotheses.
Affiliation(s)
- Navya Tyagi
- AI and Data Science, Indian Institute of Technology, Madras, Chennai 600036, Tamil Nadu, India
- Amity Institute of Integrative Health Sciences, Amity University, Gurugram 122412, Haryana, India
- Naima Vahab
- School of Computing Technologies, Royal Melbourne Institute of Technology (RMIT) University, 3001 Melbourne, Australia
- Sonika Tyagi
- School of Computing Technologies, Royal Melbourne Institute of Technology (RMIT) University, 3001 Melbourne, Australia
|
14
|
Yang Q, Li M, Xiao Z, Feng Y, Lei L, Li S. A New Perspective on Precision Medicine: The Power of Digital Organoids. Biomater Res 2025; 29:0171. [PMID: 40129676 PMCID: PMC11931648 DOI: 10.34133/bmr.0171] [Received: 10/15/2024] [Revised: 02/21/2025] [Accepted: 03/04/2025] [Indexed: 03/26/2025]
Abstract
Precision medicine is a personalized medical model based on the individual's genome, phenotype, and lifestyle that provides tailored treatment plans for patients. In this context, tumor organoids, a 3-dimensional preclinical model based on patient-derived tumor cell self-organization, combined with digital analysis methods, such as high-throughput sequencing and image processing technology, can be used to analyze the genome, transcriptome, and cellular heterogeneity of tumors, so as to accurately track and assess the growth process, genetic characteristics, and drug responsiveness of tumor organoids, thereby facilitating the implementation of precision medicine. This interdisciplinary approach is expected to promote the innovation of cancer diagnosis and enhance personalized treatment. In this review, the characteristics and culture methods of tumor organoids are summarized, and the application of multi-omics, such as bioinformatics and artificial intelligence, and the digital methods of organoids in precision medicine research are discussed. Finally, this review explores the main causes and potential solutions for the bottleneck in the clinical translation of digital tumor organoids, proposes the prospects of multidisciplinary cooperation and clinical transformation to narrow the gap between laboratory and clinical settings, and provides references for research and development in this field.
Affiliation(s)
- Qian Yang
- Department of Otorhinolaryngology Head and Neck Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
- Mengmeng Li
- Department of Otorhinolaryngology Head and Neck Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
- Zian Xiao
- Department of Otorhinolaryngology Head and Neck Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
- Yekai Feng
- Department of Otorhinolaryngology Head and Neck Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
- Lanjie Lei
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Institute of Translational Medicine, Zhejiang Shuren University, Hangzhou 310015, Zhejiang, China
- Shisheng Li
- Department of Otorhinolaryngology Head and Neck Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
|
15
|
Hanna DR, Creswell ML, Terry RS, Vergamini LB, Sardiu M, Du HE, McMahon AK, Molina WR, Whiles BB. Bing chat for kidney stone management questions based on the AUA guidelines: a comparison of chatbot conversation style modes. World J Urol 2025; 43:151. [PMID: 40047903 DOI: 10.1007/s00345-025-05533-4] [Received: 09/04/2024] [Accepted: 02/25/2025] [Indexed: 05/13/2025]
Abstract
PURPOSE Artificial intelligence (AI) technology will inevitably permeate healthcare. Bing Chat is an AI chatbot with different conversation styles. We evaluated the answers from each of these response modes regarding management of nephrolithiasis. METHODS A total of 20 questions were created based on the AUA Surgical Management of Stones guidelines. Bing Chat's responses were evaluated across Precise, Balanced, and Creative conversation style chat modes by three physicians using the Brief DISCERN tool. Consensus scoring was employed to assess appropriateness, guideline adherence, empathy, recommendation for physician consultation, and inability to answer the inquiry. Responses were also assessed for their directness and the presence of superfluous information. Chat modes were compared using descriptive statistics as well as ANOVA, Chi-Squared tests, and Fisher exact tests. RESULTS The median Brief DISCERN scores in Precise, Balanced, and Creative modes were 22, 21, and 21, respectively. There was no significant difference in Brief DISCERN scores between the three chat modes (p = 0.68). Guideline adherence by chatbot conversation style was similar (p = 0.37), as was response appropriateness (p = 0.62), directly answering the question asked (p = 0.26) and providing a recommendation to consult with a healthcare provider (p = 0.07). Creative and Balanced modes outperformed Precise mode when evaluating response empathy. Creative mode was more likely to include superfluous information and less likely to answer the question. CONCLUSION In its current iteration, Bing Chat provides low-quality urologic healthcare information for nephrolithiasis queries, regardless of the conversation style utilized.
Affiliation(s)
- Daniel R Hanna
- Department of Urology, University of Kansas Health System, 4000 Cambridge Street, Mailstop 3016, Kansas City, KS, USA
- Michael L Creswell
- Department of Urology, University of Kansas Health System, 4000 Cambridge Street, Mailstop 3016, Kansas City, KS, USA
- Russell S Terry
- Department of Urology, University of Florida College of Medicine, Gainesville, FL, USA
- Lucas B Vergamini
- Department of Urology, University of Kansas Health System, 4000 Cambridge Street, Mailstop 3016, Kansas City, KS, USA
- Mihaela Sardiu
- Department of Biostatistics & Data Science, University of Kansas, Kansas City, KS, USA
- Holly E Du
- Department of Biostatistics & Data Science, University of Kansas, Kansas City, KS, USA
- Amber K McMahon
- Department of Urology, University of Kansas Health System, 4000 Cambridge Street, Mailstop 3016, Kansas City, KS, USA
- Wilson R Molina
- Department of Urology, University of Kansas Health System, 4000 Cambridge Street, Mailstop 3016, Kansas City, KS, USA
- Bristol B Whiles
- Department of Urology, University of Kansas Health System, 4000 Cambridge Street, Mailstop 3016, Kansas City, KS, USA.
|
16
|
Sefer E. DRGAT: Predicting Drug Responses Via Diffusion-Based Graph Attention Network. J Comput Biol 2025; 32:330-350. [PMID: 39639802 DOI: 10.1089/cmb.2024.0807] [Indexed: 12/07/2024]
Abstract
Accurately predicting drug response depending on a patient's genomic profile is critical for advancing personalized medicine. The rise of deep learning approaches, and especially of graph neural networks leveraging large-scale omics datasets, has been a key driver of research in this area. However, these biological datasets, which are typically high dimensional but have small sample sizes, present challenges such as overfitting and poor generalization in predictive models. To complicate matters, gene expression (GE) data must capture complex inter-gene relationships, exacerbating these issues. In this article, we tackle these challenges by introducing a drug response prediction method, called drug response graph attention network (DRGAT), which combines a denoising diffusion implicit model for data augmentation with a recently introduced graph attention network (GAT) with high-order neighbor propagation (HO-GATs) prediction module. Our proposed approach achieved almost 5% improvement in the area under receiver operating characteristic curve compared with state-of-the-art models for many of the studied drugs, indicating our method's reasonable generalization capabilities. Moreover, our experiments confirm the potential of diffusion-based generative models, a core component of our method, to mitigate the inherent limitations of omics datasets by effectively augmenting GE data.
Affiliation(s)
- Emre Sefer
- Artificial Intelligence and Data Engineering Department, Ozyegin University, Istanbul, Turkey
|
17
|
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2025; 26:171-190. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Jernej Ule
- The Francis Crick Institute, London, UK.
- UK Dementia Research Institute at King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK.
- National Institute of Chemistry, Ljubljana, Slovenia.
|
18
|
Schwehn PM, Falter-Braun P. Inferring protein from transcript abundances using convolutional neural networks. BioData Min 2025; 18:18. [PMID: 40016737 PMCID: PMC11866710 DOI: 10.1186/s13040-025-00434-z] [Received: 04/29/2024] [Accepted: 02/14/2025] [Indexed: 03/01/2025]
Abstract
BACKGROUND Although transcript abundance is often used as a proxy for protein abundance, it is an unreliable predictor. As proteins execute biological functions and their expression levels influence phenotypic outcomes, we developed a convolutional neural network (CNN) to predict protein abundances from mRNA abundances, protein sequence, and mRNA sequence in Homo sapiens (H. sapiens) and the reference plant Arabidopsis thaliana (A. thaliana). RESULTS After hyperparameter optimization and initial data exploration, we implemented distinct training modules for value-based and sequence-based data. By analyzing the learned weights, we revealed common and organism-specific sequence features that influence protein-to-mRNA ratios (PTRs), including known and putative sequence motifs. Adding condition-specific protein interaction information identified genes correlated with many PTRs but did not improve predictions, likely due to insufficient data. The integrated model predicted protein abundance on unseen genes with a coefficient of determination (r²) of 0.30 in H. sapiens and 0.32 in A. thaliana. CONCLUSIONS For H. sapiens, our model improves prediction performance by nearly 50% compared to previous sequence-based approaches, and for A. thaliana it represents the first model of its kind. The model's learned motifs recapitulate known regulatory elements, supporting its utility in systems-level and hypothesis-driven research approaches related to protein regulation.
Affiliation(s)
- Patrick Maximilian Schwehn
- Institute of Network Biology (INET), Molecular Targets and Therapies Center (MTTC), Helmholtz Munich, Neuherberg, Germany
- Pascal Falter-Braun
- Institute of Network Biology (INET), Molecular Targets and Therapies Center (MTTC), Helmholtz Munich, Neuherberg, Germany.
- Microbe-Host Interactions, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany.
|
19
|
Nilsson A, Meimetis N, Lauffenburger DA. Towards an interpretable deep learning model of cancer. NPJ Precis Oncol 2025; 9:46. [PMID: 39948231 PMCID: PMC11825879 DOI: 10.1038/s41698-025-00822-y] [Received: 09/26/2024] [Accepted: 01/27/2025] [Indexed: 02/16/2025]
Abstract
Cancer is a manifestation of dysfunctional cell states. It emerges from an interplay of intrinsic and extrinsic factors that disrupt cellular dynamics, including genetic and epigenetic alterations, as well as the tumor microenvironment. This complexity can make it challenging to infer molecular causes for treating the disease. This may be addressed by system-wide computer models of cells, as they allow rapid generation and testing of hypotheses that would be too slow or impossible to perform in the laboratory and clinic. However, so far, such models have been impeded by both experimental and computational limitations. In this perspective, we argue that they can now be achieved using deep learning algorithms to integrate omics data and prior knowledge of molecular networks. Such models would have many applications in precision oncology, e.g., for identifying drug targets and biomarkers, predicting resistance mechanisms and toxicity effects of drugs, or simulating cell-cell interactions in the microenvironment.
Affiliation(s)
- Avlant Nilsson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
- Department of Cell and Molecular Biology, SciLifeLab, Karolinska Institutet, Stockholm, Sweden
- Nikolaos Meimetis
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Douglas A Lauffenburger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
|
20
|
Li ZP, Du Z, Huang DS, Teschendorff AE. Interpretable deep learning of single-cell and epigenetic data reveals novel molecular insights in aging. Sci Rep 2025; 15:5048. [PMID: 39934290 PMCID: PMC11814351 DOI: 10.1038/s41598-025-89646-1] [Received: 11/15/2024] [Accepted: 02/06/2025] [Indexed: 02/13/2025]
Abstract
Deep learning (DL) and explainable artificial intelligence (XAI) have emerged as powerful machine-learning tools to identify complex predictive data patterns in a spatial or temporal domain. Here, we consider the application of DL and XAI to large omic datasets, in order to study biological aging at the molecular level. We develop an advanced multi-view graph-level representation learning (MGRL) framework that integrates prior biological network information, to build molecular aging clocks at cell-type resolution, which we subsequently interpret using XAI. We apply this framework to one of the largest single-cell transcriptomic datasets, encompassing over a million immune cells from 981 donors, revealing a ribosomal gene subnetwork whose expression correlates with age independently of cell type. Application of the same DL-XAI framework to DNA methylation data of sorted monocytes reveals an epigenetically deregulated inflammatory response pathway whose activity increases with age. We show that the ribosomal module and inflammatory pathways would not have been discovered had we used more standard machine-learning methods. In summary, the computational deep learning framework presented here illustrates how deep learning, when combined with explainable AI tools, can reveal novel biological insights into the complex process of aging.
Affiliation(s)
- Zhi-Peng Li
- Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, 315201, Zhejiang, China
- School of life sciences, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Zhaozhen Du
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- De-Shuang Huang
- Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, 315201, Zhejiang, China.
- Andrew E Teschendorff
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China.
|
21
|
Shahid A, Zahra A, Aslam S, Shamim A, Ali WR, Aslam B, Khan SH, Arshad MI. Appraisal of CRISPR Technology as an Innovative Screening to Therapeutic Toolkit for Genetic Disorders. Mol Biotechnol 2025:10.1007/s12033-025-01374-z. [PMID: 39894889 DOI: 10.1007/s12033-025-01374-z] [Received: 10/25/2024] [Accepted: 01/02/2025] [Indexed: 02/04/2025]
Abstract
The high frequency of genetic diseases compels the development of refined diagnostic and therapeutic systems. CRISPR is a precise genome editing tool that offers detection of genetic mutations with high sensitivity, specificity and flexibility for point-of-care testing in low-resource environments. Advancements in CRISPR have ushered in new hope for the detection of genetic diseases. This review aims to explore the recent advances in CRISPR for the detection and treatment of genetic disorders. It delves into advances in next-generation CRISPR diagnostics, such as nano-biosensors, digitalized CRISPR, and omics-integrated CRISPR technologies, that enhance detection limits and facilitate "lab-on-chip" technologies. Additionally, the therapeutic potential of CRISPR technologies is reviewed to evaluate their implementation potential for the treatment of hematological diseases (sickle cell anemia and β-thalassemia), HIV, cancer, cardiovascular diseases, and neurological disorders, etc. Emerging CRISPR therapeutic approaches such as base/epigenetic editing and stem cells for the development of foreseen CRISPR drugs are explored for the development of point-of-care testing. A combination of predictive models of artificial intelligence and machine learning with growing knowledge of genetic disorders is also discussed to understand their role in the acceleration of genetic detection. Ethical considerations are briefly discussed towards the end of the review. This review provides comprehensive insights into advances in CRISPR diagnostics/therapeutics, which are believed to pave the way for reliable, effective, and low-cost genetic testing.
Affiliation(s)
- Ayesha Shahid
- National Center for Genome Editing, Center for Advanced Studies/D-8 Research Center, University of Agriculture, Faisalabad, 38000, Pakistan
- Ambreen Zahra
- National Center for Genome Editing, Center for Advanced Studies/D-8 Research Center, University of Agriculture, Faisalabad, 38000, Pakistan
- Center for Agricultural Biochemistry and Biotechnology, University of Agriculture, Faisalabad, 38000, Pakistan
- Sabin Aslam
- National Center for Genome Editing, Center for Advanced Studies/D-8 Research Center, University of Agriculture, Faisalabad, 38000, Pakistan
- Amen Shamim
- National Center for Genome Editing, Center for Advanced Studies/D-8 Research Center, University of Agriculture, Faisalabad, 38000, Pakistan
- Department of Computer Science, University of Agriculture, Faisalabad, 38000, Pakistan
- Bilal Aslam
- Institute of Microbiology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Sultan Habibullah Khan
- National Center for Genome Editing, Center for Advanced Studies/D-8 Research Center, University of Agriculture, Faisalabad, 38000, Pakistan
- Center for Agricultural Biochemistry and Biotechnology, University of Agriculture, Faisalabad, 38000, Pakistan
- Muhammad Imran Arshad
- National Center for Genome Editing, Center for Advanced Studies/D-8 Research Center, University of Agriculture, Faisalabad, 38000, Pakistan.
- Institute of Microbiology, University of Agriculture Faisalabad, Pakistan Academy of Sciences (PAS), Faisalabad, 38000, Pakistan.
- Jiangsu University, Jiangsu, 212013, People's Republic of China.
|
22
|
Paganelli F, Poli A, Truocchio S, Martelli AM, Palumbo C, Lattanzi G, Chiarini F. At the nucleus of cancer: how the nuclear envelope controls tumor progression. MedComm (Beijing) 2025; 6:e70073. [PMID: 39866838 PMCID: PMC11758262 DOI: 10.1002/mco2.70073] [Received: 07/02/2024] [Revised: 12/09/2024] [Accepted: 12/12/2024] [Indexed: 01/28/2025]
Abstract
Historically considered downstream effects of tumorigenesis (arising from changes in DNA content or chromatin organization), nuclear alterations have long been seen as mere prognostic markers within a genome-centric model of cancer. However, recent findings have placed the nuclear envelope (NE) at the forefront of tumor progression, highlighting its active role in mediating cellular responses to mechanical forces. Despite significant progress, the precise interplay between NE components and cancer progression remains under debate. In this review, we provide a comprehensive and up-to-date overview of how changes in NE composition affect nuclear mechanics and facilitate malignant transformation, grounded in the latest molecular and functional studies. We also review recent research that uses advanced technologies, including artificial intelligence, to predict malignancy risk and treatment outcomes by analyzing nuclear morphology. Finally, we discuss how progress in understanding nuclear mechanics has paved the way for mechanotherapy, a promising cancer treatment approach that exploits the mechanical differences between cancerous and healthy cells. Shifting the perspective on NE alterations from mere diagnostic markers to potential therapeutic targets, this review calls for further investigation into the evolving role of the NE in cancer, highlighting the potential for innovative strategies to transform conventional cancer therapies.
Affiliation(s)
- Francesca Paganelli
- Department of Biomedical and Neuromotor Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy
- Alessandro Poli
- IFOM ETS - The AIRC Institute of Molecular Oncology, Milan, Italy
- Serena Truocchio
- Department of Biomedical and Neuromotor Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy
- Alberto M. Martelli
- Department of Biomedical and Neuromotor Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy
- Carla Palumbo
- Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Giovanna Lattanzi
- CNR Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza”, Unit of Bologna, Bologna, Italy
- IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Francesca Chiarini
- Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy
|
23
|
Dalla-Torre H, Gonzalez L, Mendoza-Revilla J, Lopez Carranza N, Grzywaczewski AH, Oteri F, Dallago C, Trop E, de Almeida BP, Sirelkhatim H, Richard G, Skwark M, Beguir K, Lopez M, Pierrot T. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat Methods 2025; 22:287-297. [PMID: 39609566 PMCID: PMC11810778 DOI: 10.1038/s41592-024-02523-z] [Received: 03/01/2023] [Accepted: 10/19/2024] [Indexed: 11/30/2024]
Abstract
The prediction of molecular phenotypes from DNA sequences remains a longstanding challenge in genomics, often driven by limited annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named Nucleotide Transformer, ranging from 50 million up to 2.5 billion parameters and integrating information from 3,202 human genomes and 850 genomes from diverse species. These transformer models yield context-specific representations of nucleotide sequences, which allow for accurate predictions even in low-data settings. We show that the developed models can be fine-tuned at low cost to solve a variety of genomics applications. Despite no supervision, the models learned to focus attention on key genomic elements and can be used to improve the prioritization of genetic variants. The training and application of foundational models in genomics provides a widely applicable approach for accurate molecular phenotype prediction from DNA sequence.
Affiliation(s)
- Christian Dallago
- Nvidia, Santa Clara, CA, USA
- Technical University of Munich, Munich, Germany
|
24
|
Hu Y, Horlbeck MA, Zhang R, Ma S, Shrestha R, Kartha VK, Duarte FM, Hock C, Savage RE, Labade A, Kletzien H, Meliki A, Castillo A, Durand NC, Mattei E, Anderson LJ, Tay T, Earl AS, Shoresh N, Epstein CB, Wagers AJ, Buenrostro JD. Multiscale footprints reveal the organization of cis-regulatory elements. Nature 2025; 638:779-786. [PMID: 39843737 PMCID: PMC11839466 DOI: 10.1038/s41586-024-08443-4] [Received: 10/12/2022] [Accepted: 11/22/2024] [Indexed: 01/24/2025]
Abstract
Cis-regulatory elements (CREs) control gene expression and are dynamic in their structure and function, reflecting changes in the composition of diverse effector proteins over time. However, methods for measuring the organization of effector proteins at CREs across the genome are limited, hampering efforts to connect CRE structure to their function in cell fate and disease. Here we developed PRINT, a computational method that identifies footprints of DNA-protein interactions from bulk and single-cell chromatin accessibility data across multiple scales of protein size. Using these multiscale footprints, we created the seq2PRINT framework, which uses deep learning to allow precise inference of transcription factor and nucleosome binding and interprets regulatory logic at CREs. Applying seq2PRINT to single-cell chromatin accessibility data from human bone marrow, we observe sequential establishment and widening of CREs centred on pioneer factors across haematopoiesis. We further discover age-associated alterations in the structure of CREs in murine haematopoietic stem cells, including widespread reduction of nucleosome footprints and gain of de novo identified Ets composite motifs. Collectively, we establish a method for obtaining rich insights into DNA-binding protein dynamics from chromatin accessibility data, and reveal the architecture of regulatory elements across differentiation and ageing.
Affiliation(s)
- Yan Hu
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Max A Horlbeck
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Ruochi Zhang
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sai Ma
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rojesh Shrestha
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Vinay K Kartha
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Fabiana M Duarte
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Conrad Hock
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Rachel E Savage
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Ajay Labade
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Heidi Kletzien
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Paul F. Glenn Center for the Biology of Aging, Harvard Medical School, Boston, MA, USA
- Alia Meliki
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Andrew Castillo
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Neva C Durand
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eugenio Mattei
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lauren J Anderson
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tristan Tay
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Andrew S Earl
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Noam Shoresh
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Charles B Epstein
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Amy J Wagers
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Paul F. Glenn Center for the Biology of Aging, Harvard Medical School, Boston, MA, USA
- Jason D Buenrostro
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
|
25
|
Generalized AI models for genomics applications. Nat Methods 2025; 22:231-232. [PMID: 39609569 DOI: 10.1038/s41592-024-02524-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
|
26
|
Li L, Chen Y, Xie H, Zheng P, Mu G, Li Q, Huang H, Shen Z. Machine Learning Model for Predicting Risk Factors of Prolonged Length of Hospital Stay in Patients with Aortic Dissection: a Retrospective Clinical Study. J Cardiovasc Transl Res 2025; 18:185-197. [PMID: 39388090 PMCID: PMC11885363 DOI: 10.1007/s12265-024-10565-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 09/04/2024] [Indexed: 10/12/2024]
Abstract
The length of hospital stay (LOS) is crucial for assessing medical service quality. This study aimed to develop machine learning models for predicting risk factors of prolonged LOS in patients with aortic dissection (AD). The data of 516 AD patients were obtained from the hospital's medical system, with 111 patients in the prolonged LOS (> 30 days) group based on three quarters of the LOS in the entire cohort. Given the screened variables and prediction models, the XGBoost model demonstrated superior predictive performance in identifying prolonged LOS, due to the highest area under the receiver operating characteristic curve, sensitivity, and F1-score in both subsets. The SHapley Additive exPlanation analysis indicated that high density lipoprotein cholesterol, alanine transaminase, systolic blood pressure, percentage of lymphocyte, and operation time were the top five risk factors associated with prolonged LOS. These findings have a guiding value for the clinical management of patients with AD.
Affiliation(s)
- Luo Li
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
- Yihuan Chen
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
- Hui Xie
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
- Peng Zheng
  - Department of Cardiology, School of Medicine, Zhongda Hospital, Southeast University, 87 Dingjiaqiao, Jiangsu, 210009, Nanjing, China
- Gaohang Mu
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
- Qian Li
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
- Haoyue Huang
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
- Zhenya Shen
  - Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Soochow University, Suzhou Medical College, Soochow University, 899 Pinghai Road, Jiangsu, 215123, Suzhou, China
|
27
|
Zhou H, Clark E, Guan D, Lagarrigue S, Fang L, Cheng H, Tuggle CK, Kapoor M, Wang Y, Giuffra E, Egidy G. Comparative Genomics and Epigenomics of Transcriptional Regulation. Annu Rev Anim Biosci 2025; 13:73-98. [PMID: 39565835 DOI: 10.1146/annurev-animal-111523-102217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2024]
Abstract
Transcriptional regulation in response to diverse physiological cues involves complicated biological processes. Recent initiatives that leverage whole genome sequencing and annotation of regulatory elements significantly contribute to our understanding of transcriptional gene regulation. Advances in the data sets available for comparative genomics and epigenomics can identify evolutionarily constrained regulatory variants and shed light on noncoding elements that influence transcription in different tissues and developmental stages across species. Most epigenomic data, however, are generated from healthy subjects at specific developmental stages. To bridge the genotype-phenotype gap, future research should focus on generating multidimensional epigenomic data under diverse physiological conditions. Farm animal species offer advantages in terms of feasibility, cost, and experimental design for such integrative analyses in comparison to humans. Deep learning modeling and cutting-edge technologies in sequencing and functional screening and validation also provide great promise for better understanding transcriptional regulation in this dynamic field.
Affiliation(s)
- Huaijun Zhou
  - Department of Animal Science, University of California, Davis, California, USA
- Emily Clark
  - The Roslin Institute, University of Edinburgh, Edinburgh, Midlothian, United Kingdom
- Dailu Guan
  - Department of Animal Science, University of California, Davis, California, USA
- Lingzhao Fang
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Hao Cheng
  - Department of Animal Science, University of California, Davis, California, USA
- Muskan Kapoor
  - Department of Animal Science, Iowa State University, Ames, Iowa, USA
- Ying Wang
  - Department of Animal Science, University of California, Davis, California, USA
- Giorgia Egidy
  - GABI, AgroParisTech, INRAE, Jouy-en-Josas, France
|
28
|
Crossa J, Martini JWR, Vitale P, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Runcie D, Cuevas J, Toledo F, Li H, De Vita P, Gerard G, Dreisigacker S, Crespo-Herrera L, Saint Pierre C, Bentley A, Lillemo M, Ortiz R, Montesinos-López OA, Montesinos-López A. Expanding genomic prediction in plant breeding: harnessing big data, machine learning, and advanced software. Trends in Plant Science 2025:S1360-1385(24)00345-5. [PMID: 39890501 DOI: 10.1016/j.tplants.2024.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 12/05/2024] [Accepted: 12/12/2024] [Indexed: 02/03/2025]
Abstract
With growing evidence that genomic selection (GS) improves genetic gains in plant breeding, it is timely to review the key factors that improve its efficiency. In this feature review, we focus on the statistical machine learning (ML) methods and software that are democratizing GS methodology. We outline the principles of genomic-enabled prediction and discuss how statistical ML tools enhance GS efficiency with big data. Additionally, we examine various statistical ML tools developed in recent years for predicting traits across continuous, binary, categorical, and count phenotypes. We highlight the unique advantages of deep learning (DL) models used in genomic prediction (GP). Finally, we review software developed to democratize the use of GP models and recent data management tools that support the adoption of GS methodology.
Affiliation(s)
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico; Colegio de Postgraduados, Montecillos, Edo. de México CP 56230, Mexico
- Paolo Vitale
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Daniel Runcie
  - Department of Plant Sciences at the University of California, Davis, CA, USA
- Jaime Cuevas
  - Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019, Mexico
- Fernando Toledo
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- H Li
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Pasquale De Vita
  - Research Center for Cereal and Industrial Crops (CREA-CI), CREA - Council for Agricultural Research and Economics, Foggia, Italy
- Guillermo Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Susanne Dreisigacker
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Leonardo Crespo-Herrera
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Alison Bentley
  - Australian National University, Research School of Biology, Canberra, Australia
- Morten Lillemo
  - Norwegian University of Life Science (NMBU), Department of Plant Science, Ås, Norway
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box 190 Sundsvagen 10, SE 23422 Lomma, Sweden
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico
|
29
|
Chernigovskaya M, Pavlović M, Kanduri C, Gielis S, Robert P, Scheffer L, Slabodkin A, Haff IH, Meysman P, Yaari G, Sandve GK, Greiff V. Simulation of adaptive immune receptors and repertoires with complex immune information to guide the development and benchmarking of AIRR machine learning. Nucleic Acids Res 2025; 53:gkaf025. [PMID: 39873270 PMCID: PMC11773363 DOI: 10.1093/nar/gkaf025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Accepted: 01/25/2025] [Indexed: 01/30/2025] Open
Abstract
Machine learning (ML) has shown great potential in the adaptive immune receptor repertoire (AIRR) field. However, there is a lack of large-scale ground-truth experimental AIRR data suitable for AIRR-ML-based disease diagnostics and therapeutics discovery. Simulated ground-truth AIRR data are required to complement the development and benchmarking of robust and interpretable AIRR-ML methods where experimental data is currently inaccessible or insufficient. The challenge for simulated data to be useful is incorporating key features observed in experimental repertoires. These features, such as antigen or disease-associated immune information, cause AIRR-ML problems to be challenging. Here, we introduce LIgO, a software suite, which simulates AIRR data for the development and benchmarking of AIRR-ML methods. LIgO incorporates different types of immune information both on the receptor and the repertoire level and preserves native-like generation probability distribution. Additionally, LIgO assists users in determining the computational feasibility of their simulations. We show two examples where LIgO supports the development and validation of AIRR-ML methods: (i) how individuals carrying out-of-distribution immune information impacts receptor-level prediction performance and (ii) how immune information co-occurring in the same AIRs impacts the performance of conventional receptor-level encoding and repertoire-level classification approaches. LIgO guides the advancement and assessment of interpretable AIRR-ML methods.
Affiliation(s)
- Maria Chernigovskaya
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, 0372, Norway
- Milena Pavlović
  - Department of Informatics, University of Oslo, Oslo, 0373, Norway
  - UiO:RealArt Convergence Environment, University of Oslo, Oslo, 0373, Norway
- Chakravarthi Kanduri
  - Department of Informatics, University of Oslo, Oslo, 0373, Norway
  - UiO:RealArt Convergence Environment, University of Oslo, Oslo, 0373, Norway
- Sofie Gielis
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, 2020, Belgium
- Philippe A Robert
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, 0372, Norway
  - Department of Biomedicine, University of Basel, Basel, 4031, Switzerland
- Lonneke Scheffer
  - Department of Informatics, University of Oslo, Oslo, 0373, Norway
- Andrei Slabodkin
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, 0372, Norway
- Pieter Meysman
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, 2020, Belgium
- Gur Yaari
  - Faculty of Engineering, Bar-Ilan University, Ramat Gan, 5290002, Israel
- Geir Kjetil Sandve
  - Department of Informatics, University of Oslo, Oslo, 0373, Norway
  - UiO:RealArt Convergence Environment, University of Oslo, Oslo, 0373, Norway
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, 0372, Norway
|
30
|
Zhou J, Weinberger DR, Han S. Deep learning predicts DNA methylation regulatory variants in specific brain cell types and enhances fine mapping for brain disorders. Science Advances 2025; 11:eadn1870. [PMID: 39742481 PMCID: PMC11691643 DOI: 10.1126/sciadv.adn1870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 11/18/2024] [Indexed: 01/03/2025]
Abstract
DNA methylation (DNAm) is essential for brain development and function and potentially mediates the effects of genetic risk variants underlying brain disorders. We present INTERACT, a transformer-based deep learning model to predict regulatory variants affecting DNAm levels in specific brain cell types, leveraging existing single-nucleus DNAm data from the human brain. We show that INTERACT accurately predicts cell type-specific DNAm profiles, achieving an average area under the receiver operating characteristic curve of 0.99 across cell types. Furthermore, INTERACT predicts cell type-specific DNAm regulatory variants, which reflect cellular context and enrich the heritability of brain-related traits in relevant cell types. We demonstrate that incorporating predicted variant effects and DNAm levels of CpG sites enhances the fine mapping for three brain disorders (schizophrenia, depression, and Alzheimer's disease) and facilitates mapping causal genes to particular cell types. Our study highlights the power of deep learning in identifying cell type-specific regulatory variants, which will enhance our understanding of the genetics of complex traits.
Affiliation(s)
- Jiyun Zhou
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21287, USA
- Daniel R. Weinberger
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21287, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
  - Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Shizhong Han
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21287, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
31
|
Hecker N, Kempynck N, Mauduit D, Abaffyová D, Vandepoel R, Dieltiens S, Borm L, Sarropoulos I, González-Blas CB, De Man J, Davie K, Leysen E, Vandensteen J, Moors R, Hulselmans G, Lim L, De Wit J, Christiaens V, Poovathingal S, Aerts S. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. Science 2025; 387:eadp3957. [PMID: 39946451 DOI: 10.1126/science.adp3957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Accepted: 11/26/2024] [Indexed: 04/23/2025]
Abstract
Combinations of transcription factors govern the identity of cell types, which is reflected by genomic enhancer codes. We used deep learning to characterize these enhancer codes and devised three metrics to compare cell types in the telencephalon across amniotes. To this end, we generated single-cell multiome and spatially resolved transcriptomics data of the chicken telencephalon. Enhancer codes of orthologous nonneuronal and γ-aminobutyric acid-mediated (GABAergic) cell types show a high degree of similarity across amniotes, whereas excitatory neurons of the mammalian neocortex and avian pallium exhibit varying degrees of similarity. Enhancer codes of avian mesopallial neurons are most similar to those of mammalian deep-layer neurons. With this study, we present generally applicable deep learning approaches to characterize and compare cell types on the basis of genomic regulatory sequences.
Affiliation(s)
- Nikolai Hecker
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Niklas Kempynck
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- David Mauduit
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Darina Abaffyová
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Roel Vandepoel
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Sam Dieltiens
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Lars Borm
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Ioannis Sarropoulos
  - Center for Molecular Biology of Heidelberg University, Heidelberg University, Heidelberg, Germany
- Carmen Bravo González-Blas
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Julie De Man
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Kristofer Davie
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Neurosciences, KU Leuven, Leuven, Belgium
- Elke Leysen
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Neurosciences, KU Leuven, Leuven, Belgium
- Jeroen Vandensteen
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Neurosciences, KU Leuven, Leuven, Belgium
- Rani Moors
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Neurosciences, KU Leuven, Leuven, Belgium
- Gert Hulselmans
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Lynette Lim
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Neurosciences, KU Leuven, Leuven, Belgium
- Joris De Wit
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Neurosciences, KU Leuven, Leuven, Belgium
- Valerie Christiaens
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Stein Aerts
  - Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
  - VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
|
32
|
Daoud A, Ben-Hur A. The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models. PLoS Comput Biol 2025; 21:e1012755. [PMID: 39792954 PMCID: PMC11756788 DOI: 10.1371/journal.pcbi.1012755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 01/23/2025] [Accepted: 12/30/2024] [Indexed: 01/12/2025] Open
Abstract
Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources. We argue that these models are the equivalent of foundation models in natural language processing in their utility, as they encode within them chromatin state in its different aspects, providing useful representations that allow quick deployment of accurate models of gene regulation. We demonstrate this premise by leveraging the recently created Sei model to develop simple, interpretable models of intron retention, and demonstrate their advantage over models based on the DNA language model DNABERT-2. Our work also demonstrates the impact of chromatin state on the regulation of intron retention. Using representations learned by Sei, our model is able to discover the involvement of transcription factors and chromatin marks in regulating intron retention, providing better accuracy than a recently published custom model developed for this purpose.
Affiliation(s)
- Ahmed Daoud
  - Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America
|
33
|
Bréhélin L. Advancing Regulatory Genomics With Machine Learning. Bioinform Biol Insights 2024; 18:11779322241249562. [PMID: 39735654 PMCID: PMC11672376 DOI: 10.1177/11779322241249562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 04/09/2024] [Indexed: 12/31/2024] Open
Abstract
In recent years, several machine learning (ML) approaches have been proposed to predict gene expression signal and chromatin features from the DNA sequence alone. These models are often used to deduce and, to some extent, assess putative new biological insights about gene regulation, and they have led to very interesting advances in regulatory genomics. This article reviews a selection of these methods, ranging from linear models to random forests, kernel methods, and more advanced deep learning models. Specifically, we detail the different techniques and strategies that can be used to extract new gene-regulation hypotheses from these models. Furthermore, because these putative insights need to be validated with wet-lab experiments, we emphasize that it is important to have a measure of confidence associated with the extracted hypotheses. We review the procedures that have been proposed to measure this confidence for the different types of ML models, and we discuss the fact that they do not provide the same kind of information.
|
34
|
Tian Y, Zhang J, Li Z, Wu K, Cao M, Lin J, Pradhan P, Lai S, Meng J, Fu B, Chen M, Lin H. Trade-offs among human, animal, and environmental health hinder the uniform progress of global One Health. iScience 2024; 27:111357. [PMID: 39650728 PMCID: PMC11625309 DOI: 10.1016/j.isci.2024.111357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 10/15/2024] [Accepted: 11/06/2024] [Indexed: 12/11/2024] Open
Abstract
The One Health (OH) approach, integrating aspects of human, animal, and environmental health, still lacks robustly quantified insights into its complex relationships. To fill this knowledge gap, we devised a comprehensive assessment scheme for OH to assess its progress, synergies, trade-offs, and priority targets. From 2000 to 2020, we find evidence for global progress toward OH, albeit uneven, with its average score rising from 61.6 to 65.5, driven primarily by better human health although environmental health lags. Despite synergies prevalent within and between the three health dimensions, over half of the world's countries, mainly low-income ones, still incur substantial trade-offs impeding OH's advancement, especially between animal and environmental health. Our in-depth analysis of synergy and trade-off networks reveals that maternal, newborn, and child health are critical synergistic targets, whereas biodiversity and land resources dominate trade-offs. We provide key information for the synergetic and uniform development of global OH and policymaking.
Affiliation(s)
- Ya Tian
- School of Geography and Environment, Jiangxi Normal University, Nanchang 330022, China
- Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
| | - Junze Zhang
- Key Laboratory of Regional and Urban Ecological Security, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Zonghan Li
- Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
| | - Kai Wu
- Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
| | - Min Cao
- Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
| | - Jian Lin
- Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
| | - Prajal Pradhan
- Integrated Research on Energy, Environment and Society (IREES), Energy and Sustainability Research Institute Groningen (ESRIG), University of Groningen, 9747 Groningen AG, the Netherlands
- Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, P.O. Box 60 12 03, 14412 Potsdam, Germany
| | - Shengjie Lai
- WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton SO17 1BJ, UK
| | - Jia Meng
- Department of Orthopedics, Nanjing Jinling Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing 210002, China
| | - Bojie Fu
- Key Laboratory of Regional and Urban Ecological Security, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Min Chen
- Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
| | - Hui Lin
- School of Geography and Environment, Jiangxi Normal University, Nanchang 330022, China
|
35
|
Jo T, Bice P, Nho K, Saykin AJ, the Alzheimer’s Disease Sequencing Project. LD-informed deep learning for Alzheimer's gene loci detection using WGS data. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.09.19.24313993. [PMID: 39371140 PMCID: PMC11451815 DOI: 10.1101/2024.09.19.24313993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
INTRODUCTION: The exponential growth of genomic datasets necessitates advanced analytical tools to effectively identify genetic loci from large-scale high-throughput sequencing data. This study presents Deep-Block, a multi-stage deep learning framework that incorporates biological knowledge into its AI architecture to identify genetic regions significantly associated with Alzheimer's disease (AD). The framework employs a three-stage approach: (1) genome segmentation based on linkage disequilibrium (LD) patterns, (2) selection of relevant LD blocks using sparse attention mechanisms, and (3) application of TabNet and Random Forest algorithms to quantify single nucleotide polymorphism (SNP) feature importance, thereby identifying genetic factors contributing to AD risk. METHODS: Deep-Block was applied to a large-scale whole genome sequencing (WGS) dataset from the Alzheimer's Disease Sequencing Project (ADSP), comprising 7,416 non-Hispanic white participants (3,150 cognitively normal older adults (CN), 4,266 AD). RESULTS: 30,218 LD blocks were identified and then ranked based on their relevance to Alzheimer's disease. Subsequently, Deep-Block identified novel SNPs within the top 1,500 LD blocks and confirmed previously known variants, including APOE rs429358 and rs769449. Expression Quantitative Trait Loci (eQTL) analysis across 13 brain regions provided functional evidence for the identified variants. The results were cross-validated against established AD-associated loci from the European Alzheimer's and Dementia Biobank (EADB) and the GWAS catalog. DISCUSSION: The Deep-Block framework effectively processes large-scale high-throughput sequencing data while preserving SNP interactions during dimensionality reduction, minimizing bias and information loss. The framework's findings are supported by tissue-specific eQTL evidence across brain regions, indicating the functional relevance of the identified variants. Additionally, the Deep-Block approach has identified both known and novel genetic variants, enhancing our understanding of the genetic architecture and demonstrating its potential for application in large-scale sequencing studies.
Affiliation(s)
- Taeho Jo
- Indiana Alzheimer Disease Research Center and Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Paula Bice
- Indiana Alzheimer Disease Research Center and Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Kwangsik Nho
- Indiana Alzheimer Disease Research Center and Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Andrew J. Saykin
- Indiana Alzheimer Disease Research Center and Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
36
|
Cong Z, Mukoma NJ, Yin Q, Zhu B, She L, Hsiang T, Zhang L, Jiang L, Liu X. NPtagM: A Tailoring Enzyme Genome Mining Toolkit and Its Application in Terpenoid P450s from Phytopathogenic Fungi. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2024; 72:27225-27234. [PMID: 39621301 DOI: 10.1021/acs.jafc.4c07653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024]
Abstract
Terpenoids derived from phytopathogenic fungi are major participants in interactions among microorganisms, plants, and animals. The modifications catalyzed by cytochrome P450s significantly influence the structural and bioactivity diversity of the terpenoids. To conduct genome mining of P450s in pathogenic fungi, in this study, we developed a new software tool called Natural Products Tailoring Enzymes Genome Mining (NPtagM). By optimizing the workflow and gene prediction software, NPtagM demonstrated a 3-fold increase in the number of predicted P450s and an 8-fold reduction in runtime compared to antiSMASH. We then used it to extract 1189 dereplicated terpenoid P450s from our in-house fungal genomes. Using a sequence similarity network analysis, we identified a family that potentially produced eremophilane-type sesquiterpenoids. Heterologous expression in Aspergillus oryzae resulted in the production of two new and four known eremophilanes. Our results highlight the potential of NPtagM in genome mining for tailoring enzymes from phytopathogenic fungi.
Affiliation(s)
- Zhanren Cong
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Njeru Joe Mukoma
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Qiang Yin
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Bin Zhu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Engineering Research Centre of Pharmaceutical Process Chemistry, Ministry of Education, and Laboratory of Pharmaceutical Crystal Engineering & Technology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Lingwei She
- Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, 100081 Beijing, China
| | - Tom Hsiang
- School of Environmental Sciences, University of Guelph, 50 Stone Road East, Guelph, Ontario N1G2W1, Canada
| | - Lixin Zhang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Lan Jiang
- Department of Cardiothoracic Surgery, Children's Hospital of Nanjing Medical University, Nanjing 210093, China
| | - Xueting Liu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
|
37
|
Nabi MT, Ali S, Mahmood Z, Khan MA, Alsenan S. A self-supervised deep-driven model for automatic weather classification from remote sensing images. INTERNATIONAL JOURNAL OF REMOTE SENSING 2024:1-26. [DOI: 10.1080/01431161.2024.2431184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 11/06/2024] [Indexed: 01/12/2025]
Affiliation(s)
- Mattia Tun Nabi
- Department of Robotics & Artificial Intelligence, School of Mechanical & Manufacturing Engineering (SMME), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- NUST-COVENTRY Human-Robot Interaction (NC-HRI) Laboratory, School of Mechanical and Manufacturing Engineering, National University of Sciences and Technology (NUST), Islamabad, Pakistan
| | - Sara Ali
- Department of Robotics & Artificial Intelligence, School of Mechanical & Manufacturing Engineering (SMME), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- NUST-COVENTRY Human-Robot Interaction (NC-HRI) Laboratory, School of Mechanical and Manufacturing Engineering, National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Intelligent Field Robotics Laboratory (IFRL), National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology (NUST), Islamabad, Pakistan
| | - Zahid Mahmood
- Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad, Pakistan
| | - Muhammad Attique Khan
- Department of AI, College of Computer Engineering and Science, Prince Mohammad Bin Fahd University, Dhahran, Saudi Arabia
| | - Shrooq Alsenan
- Information Systems Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
|
38
|
Li W, Zhang Z, Xie B, He Y, He K, Qiu H, Lu Z, Jiang C, Pan X, He Y, Hu W, Liu W, Que T, Hu Y. HiOmics: A cloud-based one-stop platform for the comprehensive analysis of large-scale omics data. Comput Struct Biotechnol J 2024; 23:659-668. [PMID: 38292471 PMCID: PMC10824657 DOI: 10.1016/j.csbj.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/01/2024] Open
Abstract
Comprehensively analyzing the vast amount of omics data generated by high-throughput sequencing technology is of utmost importance for scientists. In this context, we propose HiOmics, a cloud-based platform equipped with nearly 300 plugins designed for the comprehensive analysis and visualization of omics data. HiOmics utilizes the Element Plus framework to craft a user-friendly interface and harnesses Docker container technology to ensure the reliability and reproducibility of data analysis results. Furthermore, HiOmics employs the Workflow Description Language and Cromwell engine to construct workflows, ensuring the portability of data analysis and simplifying the examination of intricate data. Additionally, HiOmics includes DataCheck, a tool based on Golang, which verifies and converts data formats. Finally, by leveraging the object storage technology and batch computing capabilities of public cloud platforms, HiOmics enables the storage and processing of large-scale data while maintaining resource independence among users.
Affiliation(s)
- Wen Li
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Key Laboratory of Biological Molecular Medicine Research (Guangxi Medical University), Education Department of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
| | - Zhining Zhang
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Bo Xie
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
| | - Yunlin He
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Kangming He
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Hong Qiu
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Zhiwei Lu
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Chunlan Jiang
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Xuanyu Pan
- School of Basic Medicine, Guangxi Medical University, Nanning, Guangxi, China
| | - Yuxiao He
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
| | - Wenyu Hu
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
| | - Wenjian Liu
- Faculty of Data Science, City University of Macau, Macau, China
| | - Tengcheng Que
- Faculty of Data Science, City University of Macau, Macau, China
- Youjiang Medical University for Nationalities, Baise, Guangxi, China
- Guangxi Zhuang Autonomous Terrestrial Wildlife Rescue Research and Epidemic Diseases Monitoring Center, Nanning, Guangxi, China
| | - Yanling Hu
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Key Laboratory of Biological Molecular Medicine Research (Guangxi Medical University), Education Department of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Faculty of Data Science, City University of Macau, Macau, China
|
39
|
Li Z, Zhang Y, Peng B, Qin S, Zhang Q, Chen Y, Chen C, Bao Y, Zhu Y, Hong Y, Liu B, Liu Q, Xu L, Chen X, Ma X, Wang H, Xie L, Yao Y, Deng B, Li J, De B, Chen Y, Wang J, Li T, Liu R, Tang Z, Cao J, Zuo E, Mei C, Zhu F, Shao C, Wang G, Sun T, Wang N, Liu G, Ni JQ, Liu Y. A novel interpretable deep learning-based computational framework designed synthetic enhancers with broad cross-species activity. Nucleic Acids Res 2024; 52:13447-13468. [PMID: 39420601 PMCID: PMC11602155 DOI: 10.1093/nar/gkae912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 09/25/2024] [Accepted: 10/03/2024] [Indexed: 10/19/2024] Open
Abstract
Enhancers play a critical role in dynamically regulating spatial-temporal gene expression and establishing cell identity, underscoring the significance of designing them with specific properties for applications in biosynthetic engineering and gene therapy. Despite numerous high-throughput methods facilitating genome-wide enhancer identification, deciphering the sequence determinants of their activity remains challenging. Here, we present the DREAM (DNA cis-Regulatory Elements with controllable Activity design platforM) framework, a novel deep learning-based approach for synthetic enhancer design. Proficient in uncovering subtle and intricate patterns within extensive enhancer screening data, DREAM achieves cutting-edge sequence-based enhancer activity prediction and highlights critical sequence features implicating strong enhancer activity. Leveraging DREAM, we have engineered enhancers that surpass the potency of the strongest enhancer within the Drosophila genome by approximately 3.6-fold. Remarkably, these synthetic enhancers exhibited conserved functionality across species that diverged more than a billion years ago, indicating that DREAM was able to learn highly conserved enhancer regulatory grammar. Additionally, we designed silencers and cell line-specific enhancers using DREAM, demonstrating its versatility. Overall, our study not only introduces an interpretable approach for enhancer design but also lays out a general framework applicable to the design of other types of cis-regulatory elements.
Affiliation(s)
- Zhaohong Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Yuanyuan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Bo Peng
- Gene Regulatory Lab, School of Basic Medical Sciences, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- State Key Laboratory of Molecular Oncology, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
| | - Shenghua Qin
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Qian Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, NO.1 Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Yun Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Choulin Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Yongzhou Bao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Yuqi Zhu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
| | - Yi Hong
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
| | - Binghua Liu
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
| | - Qian Liu
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
| | - Lingna Xu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Xi Chen
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Xinhao Ma
- College of Grassland Agriculture, National Beef Cattle Improvement Center, College of Animal Science and Technology, Northwest A&F University, NO. 3 Taicheng Road, Yangling District, Yangling, Shaanxi 712100, China
| | - Hongyan Wang
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
| | - Long Xie
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Yilong Yao
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
| | - Biao Deng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Jiaying Li
- Department of Ophthalmology, Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Dongjiaomin lane No1, Dongcheng District, Beijing 100101, China
| | - Baojun De
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
| | - Yuting Chen
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
| | - Jing Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Tian Li
- College of JUNCAO Science and Ecology, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University (FAFU), NO.15 Shangxiadian Road, Cangshan District, Fuzhou 0350002, China
| | - Ranran Liu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road NO. 2, Haidian District, Beijing 100193, China
| | - Zhonglin Tang
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
| | - Junwei Cao
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
| | - Erwei Zuo
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Chugang Mei
- College of Grassland Agriculture, National Beef Cattle Improvement Center, College of Animal Science and Technology, Northwest A&F University, NO. 3 Taicheng Road, Yangling District, Yangling, Shaanxi 712100, China
| | - Fangjie Zhu
- College of JUNCAO Science and Ecology, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University (FAFU), NO.15 Shangxiadian Road, Cangshan District, Fuzhou 0350002, China
| | - Changwei Shao
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
| | - Guirong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
| | - Tongjun Sun
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
| | - Ningli Wang
- Department of Ophthalmology, Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Dongjiaomin lane No1, Dongcheng District, Beijing 100101, China
| | - Gang Liu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, NO.1 Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Jian-Quan Ni
- Gene Regulatory Lab, School of Basic Medical Sciences, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- State Key Laboratory of Molecular Oncology, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- SXMU-Tsinghua Collaborative Innovation Center for Frontier Medicine, Shanxi Medical University, NO. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China
| | - Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
|
40
|
Wang Y, Kong S, Zhou C, Wang Y, Zhang Y, Fang Y, Li G. A review of deep learning models for the prediction of chromatin interactions with DNA and epigenomic profiles. Brief Bioinform 2024; 26:bbae651. [PMID: 39708837 DOI: 10.1093/bib/bbae651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 10/29/2024] [Accepted: 12/03/2024] [Indexed: 12/23/2024] Open
Abstract
Advances in three-dimensional (3D) genomics have revealed the spatial characteristics of chromatin interactions in gene expression regulation, which is crucial for understanding molecular mechanisms in biological processes. High-throughput technologies like ChIA-PET, Hi-C, and their derivative methods have greatly enhanced our knowledge of 3D chromatin architecture. However, the chromatin interaction mechanisms remain largely unexplored. Deep learning, with its powerful feature extraction and pattern recognition capabilities, offers a promising approach for integrating multi-omics data to build accurate predictive models of chromatin interaction matrices. This review systematically summarizes recent advances in chromatin interaction matrix prediction models. We investigate the latest developments in methods that integrate DNA sequences and epigenetic signals. This article details various models, focusing on how one-dimensional (1D) information transforms into the 3D structure of chromatin interactions, and how the integration of different deep learning modules specifically affects model accuracy. Additionally, we discuss the critical role of DNA sequence information and epigenetic markers in shaping 3D genome interaction patterns. Finally, this review addresses the challenges in predicting chromatin interaction matrices, with the aim of improving the precise mapping between chromatin interaction matrices and DNA sequence and supporting the transformation and theoretical development of 3D genomics across biological systems.
Affiliation(s)
- Yunlong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
| | - Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
| | - Cong Zhou
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
| | - Yanfang Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), No. 2 West Yuanmingyuan Rd, Haidian District, Beijing 100193, China
| | - Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
- Sequencing Facility, Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD 21701, United States
| | - Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
| | - Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
41
Islam UI, Campelo dos Santos AL, Kanjilal R, Assis R. Learning genotype-phenotype associations from gaps in multi-species sequence alignments. Brief Bioinform 2024; 26:bbaf022. [PMID: 39976386 PMCID: PMC11840556 DOI: 10.1093/bib/bbaf022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 12/16/2024] [Accepted: 01/08/2025] [Indexed: 02/21/2025] Open
Abstract
Understanding the genetic basis of phenotypic variation is fundamental to biology. Here we introduce GAP, a novel machine learning framework for predicting binary phenotypes from gaps in multi-species sequence alignments. GAP employs a neural network to predict the presence or absence of phenotypes solely from alignment gaps, contrasting with existing tools that require additional and often inaccessible input data. GAP can be applied to three distinct problems: predicting phenotypes in species from known associated genomic regions, pinpointing positions within such regions that are important for predicting phenotypes, and extracting sets of candidate regions associated with phenotypes. We showcase the utility of GAP by exploiting the well-known association between the L-gulonolactone oxidase (Gulo) gene and vitamin C synthesis, demonstrating its perfect prediction accuracy in 34 vertebrates. This exceptional performance also applies more generally, with GAP achieving high accuracy and power on a large simulated dataset. Moreover, predictions of vitamin C synthesis in species with unknown status mirror their phylogenetic relationships, and positions with high predictive importance are consistent with those identified by previous studies. Last, a genome-wide application of GAP identifies many additional genes that may be associated with vitamin C synthesis, and analysis of these candidates uncovers functional enrichment for immunity, a widely recognized role of vitamin C. Hence, GAP represents a simple yet useful tool for predicting genotype-phenotype associations and addressing diverse evolutionary questions from data available in a broad range of study systems.
Affiliation(s)
- Uwaise Ibna Islam
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
| | - Andre Luiz Campelo dos Santos
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
| | - Ria Kanjilal
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
| | - Raquel Assis
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
- Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, FL 33431, United States
42
Liu Y, Zhong L, Yan B, Chen Z, Yu Y, Yu D, Qin J, Wang J. A self-attention-driven deep learning framework for inference of transcriptional gene regulatory networks. Brief Bioinform 2024; 26:bbae639. [PMID: 39679439 DOI: 10.1093/bib/bbae639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 10/15/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024] Open
Abstract
The interactions between transcription factors (TFs) and their target genes could provide a basis for constructing gene regulatory networks (GRNs) for a mechanistic understanding of various complex biological processes. It is highly desirable to infer TF-gene interactions (TGIs) from gene expression data, particularly single-cell transcriptomic data containing rich cell-to-cell variation, using deep learning technologies. Numerous models and software tools, including deep learning-based algorithms, have been designed to identify transcriptional regulatory relationships between TFs and their downstream genes. However, these methods do not significantly improve predictions of TGIs because of limitations in constructing the underlying interactive structures that link regulatory components. In this study, we introduce a deep learning framework, DeepTGI, that encodes gene expression profiles from single-cell and/or bulk transcriptomic data and predicts TGIs with high accuracy. Our approach could fuse the features extracted from an autoencoder with a self-attention mechanism and other networks, and could transform multihead attention modules to define representative features. Compared with other models and methods, DeepTGI shows a superior ability to identify more potential TGIs and to reconstruct GRNs and, therefore, could provide broader perspectives for discovering more biologically meaningful TGIs and for understanding transcriptional gene regulatory mechanisms.
Affiliation(s)
- Yong Liu
- College of Electronic Information, Guangxi Minzu University, 188 East University Road, Nanning, Guangxi, 530006, China
| | - Le Zhong
- College of Electronic Information, Guangxi Minzu University, 188 East University Road, Nanning, Guangxi, 530006, China
| | - Bin Yan
- Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
| | - Zhuobin Chen
- School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, 66 Gongchang Road, Shenzhen, Guangdong, 518107, China
| | - Yanjia Yu
- College of Electronic Information, Guangxi Minzu University, 188 East University Road, Nanning, Guangxi, 530006, China
| | - Dan Yu
- Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
| | - Jing Qin
- School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, 66 Gongchang Road, Shenzhen, Guangdong, 518107, China
| | - Junwen Wang
- Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
- Department of Quantitative Health Sciences, Center for Individualized Medicine, and Mayo Clinic Comprehensive Cancer Center, Mayo Clinic, 13400 E Shea Blvd, Scottsdale, AZ, 85259, United States
43
Wong KH, Rodriguez NA, Traylor-Knowles N. Exploring the Unknown: How Can We Improve Single-cell RNAseq Cell Type Annotations in Non-model Organisms? Integr Comp Biol 2024; 64:1291-1299. [PMID: 39013613 DOI: 10.1093/icb/icae112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 07/05/2024] [Accepted: 07/08/2024] [Indexed: 07/18/2024] Open
Abstract
Single-cell RNA sequencing (scRNAseq) is a powerful tool to describe cell types in multicellular organisms across the animal kingdom. In standard scRNAseq analysis pipelines, clusters of cells with similar transcriptional signatures are given cell type labels based on marker genes that infer specialized known characteristics. Since these analyses are designed for model organisms, such as humans and mice, problems arise when attempting to label cell types of distantly related, non-model species that have unique or divergent cell types. Consequently, this leads to limited discovery of novel species-specific cell types and potential mis-annotation of cell types in non-model species while using scRNAseq. To address this problem, we discuss recently published approaches that help annotate scRNAseq clusters for any non-model organism. We first suggest that annotating with an evolutionary context of cell lineages will aid in the discovery of novel cell types and provide a marker-free approach to compare cell types across distantly related species. Secondly, machine learning has greatly improved bioinformatic analyses, so we highlight some open-source programs that use reference-free approaches to annotate cell clusters. Lastly, we propose the use of unannotated genes as potential cell markers for non-model organisms, as many do not have fully annotated genomes and these data are often disregarded. Improving single-cell annotations will aid the discovery of novel cell types and enhance our understanding of non-model organisms at a cellular level. By unifying approaches to annotate cell types in non-model organisms, we can increase the confidence of cell annotation label transfer and the flexibility to discover novel cell types.
Affiliation(s)
- Kevin H Wong
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA, 33149
| | - Natalia Andrade Rodriguez
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA, 33149
| | - Nikki Traylor-Knowles
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA, 33149
44
Chen C, Quan J, Chen X, Yang T, Yu C, Ye S, Yang Y, Wu X, Jiang D, Weng Y. Explore key genes of Crohn's disease based on glycerophospholipid metabolism: A comprehensive analysis Utilizing Mendelian Randomization, Multi-Omics integration, Machine Learning, and SHAP methodology. Int Immunopharmacol 2024; 141:112905. [PMID: 39173401 DOI: 10.1016/j.intimp.2024.112905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Revised: 07/25/2024] [Accepted: 08/05/2024] [Indexed: 08/24/2024]
Abstract
BACKGROUND AND AIMS Crohn's disease (CD) is a chronic, complex inflammatory condition with increasing incidence and prevalence worldwide. However, the causes of CD remain incompletely understood. We identified CD-related metabolites, inflammatory factors, and key genes by Mendelian randomization (MR), multi-omics integration, machine learning (ML), and SHAP. METHODS We first performed a mediation MR analysis on 1400 serum metabolites, 91 inflammatory factors, and CD. We found that certain phospholipids are causally related to CD. In the scRNA-seq data, monocytes were categorized into high and low metabolism groups based on their glycerophospholipid metabolism scores. The differentially expressed genes of these two groups of cells were extracted, and transcription factor prediction, cell communication analysis, and GSEA were performed. After further screening of differentially expressed genes (FDR<0.05, log2FC>1), least absolute shrinkage and selection operator (LASSO) regression was performed to obtain hub genes. Models for hub genes were built using the CatBoost, XGBoost, and NGBoost methods. Further, we used the SHAP method to interpret the models and obtain the gene with the highest contribution to each model. Finally, qRT-PCR was used to verify the expression of these genes in the peripheral blood mononuclear cells (PBMC) of CD patients and healthy subjects. RESULTS MR results showed that 1-palmitoyl-2-stearoyl-gpc (16:0/18:0) levels, 1-stearoyl-2-arachidonoyl-GPI (18:0/20:4) levels, 1-arachidonoyl-gpc (20:4n6) levels, 1-palmitoyl-2-arachidonoyl-gpc (16:0/20:4n6) levels, and 1-arachidonoyl-GPE (20:4n6) levels were significantly associated with CD risk reduction (FDR<0.05), with CXCL9 acting as a mediator between these phospholipids and CD. The analysis identified 19 hub genes, with CatBoost, XGBoost, and NGBoost achieving AUCs of 0.91, 0.88, and 0.85, respectively. The SHAP methodology identified the three genes with the highest model contribution: G0S2, S100A8, and PLAUR. The qRT-PCR results showed that the expression levels of S100A8 (p = 0.0003), G0S2 (p < 0.0001), and PLAUR (p = 0.0141) in the PBMC of CD patients were higher than in healthy subjects. CONCLUSION MR findings suggest that certain phospholipids may lower CD risk. G0S2, S100A8, and PLAUR may be potential pathogenic genes in CD. These phospholipids and genes could serve as novel diagnostic and therapeutic targets for CD.
Affiliation(s)
- Changan Chen
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Juanhua Quan
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Xintian Chen
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Tingmei Yang
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Caiyuan Yu
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Shicai Ye
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Yuping Yang
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Xiu Wu
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Danxian Jiang
- Department of Medical Oncology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China.
| | - Yijie Weng
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China.
45
Korbel F, Eroshok E, Ohler U. Interpreting deep neural networks for the prediction of translation rates. BMC Genomics 2024; 25:1061. [PMID: 39522049 PMCID: PMC11549864 DOI: 10.1186/s12864-024-10925-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND The 5' untranslated region of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5' untranslated regions (5'UTRs) and translation levels. However, the underlying biological features, which drive model predictions, remain elusive. Uncovering sequence determinants predictive of translation output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR. RESULTS Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We reveal a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs) to influence model predictions. We show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR due to differences in the frequency of regulatory motifs compared to natural 5'UTRs. CONCLUSIONS Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.
Affiliation(s)
- Frederick Korbel
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany
| | - Ekaterina Eroshok
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany
- Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany
| | - Uwe Ohler
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany.
- Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany.
- Department of Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany.
46
Kihlman R, Launonen I, Sillanpää MJ, Waldmann P. Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes. G3 (BETHESDA, MD.) 2024; 14:jkae216. [PMID: 39250757 PMCID: PMC11540326 DOI: 10.1093/g3journal/jkae216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 09/03/2024] [Indexed: 09/11/2024]
Abstract
In genomics, the use of deep learning (DL) is rapidly growing, and DL has successfully demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. By analyzing allele sharing between individuals, one may calculate realized genomic relationships from single-nucleotide polymorphism (SNP) data rather than relying on known pedigree relationships under a polygenic model. The traditional approaches in genome-wide prediction (GWP) of quantitative phenotypes utilize genomic relationships in fixed global covariance modeling, possibly with some nonlinear kernel mapping (for example Gaussian processes). On the other hand, the DL approaches proposed so far for GWP fail to take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global convolutional neural network (GCN) and one local sub-sampling architecture (GCN-RS) that are specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions. The GCN-RS architecture is designed to further improve the GCN's performance by sub-sampling the graph to reduce the dimensionality of the input data. Through these graph convolutional layers, the GCN maps input genomic markers to their quantitative phenotype values. The graphs are constructed using an iterative nearest neighbor approach. Comparisons show that the GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor method on one simulated and three real datasets from wheat, mice and pigs, with a predictive improvement of 4.4% to 49.4% in terms of test mean squared error. This indicates that GCN-RS is a promising tool for genomic predictions in plants and animals. Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.
Affiliation(s)
- Ragini Kihlman
- Research Unit of Mathematical Sciences, University of Oulu, FI-90014 University of Oulu, Finland
| | - Ilkka Launonen
- Research Unit of Mathematical Sciences, University of Oulu, FI-90014 University of Oulu, Finland
| | - Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, FI-90014 University of Oulu, Finland
| | - Patrik Waldmann
- Research Unit of Mathematical Sciences, University of Oulu, FI-90014 University of Oulu, Finland
47
Chen CC, Chan YM, Jeong H. REDalign: accurate RNA structural alignment using residual encoder-decoder network. BMC Bioinformatics 2024; 25:346. [PMID: 39501155 PMCID: PMC11539752 DOI: 10.1186/s12859-024-05956-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 10/11/2024] [Indexed: 11/08/2024] Open
Abstract
BACKGROUND RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of O(L^6) for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities. RESULTS In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency. CONCLUSION REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical Engineering, National Chiayi University, No.300 Xuefu Rd, Chiayi City, 600355, Taiwan.
| | - Yi-Ming Chan
- MindtronicAI Co., 7 F., No. 218, Sec. 6, Roosevelt Road, Taipei, 11674, Taiwan
| | - Hyundoo Jeong
- Biomedical and Robotics Engineering, Incheon National University, 119 Academy-ro, Incheon, 22012, Yeonsu-gu, South Korea.
48
Wang S, Zhu Y, Zhou Z, Luo Y, Huang Y, Liu Y, Xu T. Integrated Ultrasound-Enrichment and Machine Learning in Colorimetric Lateral Flow Assay for Accurate and Sensitive Clinical Alzheimer's Biomarker Diagnosis. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2406196. [PMID: 39297315 PMCID: PMC11558096 DOI: 10.1002/advs.202406196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 09/05/2024] [Indexed: 11/14/2024]
Abstract
The colloidal gold nanoparticle (AuNP)-based colorimetric lateral flow assay (LFA) is one of the most promising analytical tools for point-of-care disease diagnosis. However, the low sensitivity and insufficient accuracy still limit its clinical application. In this work, a machine learning (ML)-optimized colorimetric LFA with ultrasound enrichment is developed to achieve the sensitive and accurate detection of tau proteins for early screening of Alzheimer's disease (AD). The LFA device is integrated with a portable ultrasonic actuator to rapidly enrich microparticles using ultrasound, which is essential for sample pre-enrichment to improve the sensitivity, followed by ML algorithms to classify and predict the enhanced colorimetric signals. The results of the undiluted serum sample testing show that the protocol enables efficient classification and accurate quantification of the AD biomarker tau protein concentration with an average classification accuracy of 98.11% and an average prediction accuracy of 99.99%, achieving a limit of detection (LOD) as sensitive as 10.30 pg mL⁻¹. Further point-of-care testing (POCT) of human plasma samples demonstrates the potential use of LFA in clinical trials. Such a reliable lateral flow immunosensor with high precision and superb sensing performance is expected to put LFA in perspective as an AD clinical diagnostic platform.
Affiliation(s)
- Shuqing Wang
- School of Biomedical Engineering, College of Chemistry and Environmental Engineering, The Institute for Advanced Study (IAS), Shenzhen University, Shenzhen, Guangdong 518060, P. R. China
| | - Yan Zhu
- School of Biomedical Engineering, College of Chemistry and Environmental Engineering, The Institute for Advanced Study (IAS), Shenzhen University, Shenzhen, Guangdong 518060, P. R. China
| | - Zhongzeng Zhou
- School of Biomedical Engineering, College of Chemistry and Environmental Engineering, The Institute for Advanced Study (IAS), Shenzhen University, Shenzhen, Guangdong 518060, P. R. China
| | - Yong Luo
- Beijing Key Laboratory for Bioengineering and Sensing Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
| | - Yan Huang
- Beijing Key Laboratory for Bioengineering and Sensing Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
| | - Yibiao Liu
- Longgang District Central Hospital of Shenzhen, Shenzhen, Guangdong 518116, P. R. China
| | - Tailin Xu
- School of Biomedical Engineering, College of Chemistry and Environmental Engineering, The Institute for Advanced Study (IAS), Shenzhen University, Shenzhen, Guangdong 518060, P. R. China
49
Moslemi C, Sækmose S, Larsen R, Brodersen T, Bay JT, Didriksen M, Nielsen KR, Bruun MT, Dowsett J, Dinh KM, Mikkelsen C, Hyvärinen K, Ritari J, Partanen J, Ullum H, Erikstrup C, Ostrowski SR, Olsson ML, Pedersen OB. A deep learning approach to prediction of blood group antigens from genomic data. Transfusion 2024; 64:2179-2195. [PMID: 39268576 DOI: 10.1111/trf.18013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 07/17/2024] [Accepted: 08/27/2024] [Indexed: 09/17/2024]
Abstract
BACKGROUND Deep learning methods are revolutionizing natural science. In this study, we aim to apply such techniques to develop blood type prediction models based on screening array genotyping platforms that are cheap to analyze and easily scalable. METHODS Combining existing blood types from blood banks and imputed screening array genotypes for ~111,000 Danish and 1168 Finnish blood donors, we used deep learning techniques to train and validate blood type prediction models for 36 antigens in 15 blood group systems. To account for missing genotypes, a denoising autoencoder was used as an initial step, followed by a convolutional neural network blood type classifier. RESULTS Two thirds of the trained blood type prediction models demonstrated an F1-accuracy above 99%. Models for antigens with low or high frequencies (for example, Cw), small training cohorts (for example, Cob), or very complicated genetic underpinnings (for example, RhD) proved more challenging for high-accuracy (>99%) DL modeling. However, in the Danish cohort only 4 out of 36 models (Cob, Cw, D-weak, Kpa) failed to achieve a prediction F1-accuracy above 97%. This high predictive performance was replicated in the Finnish cohort. DISCUSSION High accuracy in a variety of blood groups proves the viability of deep learning-based blood type prediction using array chip genotypes, even in blood groups with nontrivial genetic underpinnings. These techniques are suitable for aiding in identifying blood donors with rare blood types by greatly narrowing down the potential pool of candidate donors before clinical grade confirmation.
Affiliation(s)
- Camous Moslemi
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
- Institute of Science and Environment, Roskilde University, Roskilde, Denmark
| | - Susanne Sækmose
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
| | - Rune Larsen
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
| | - Thorsten Brodersen
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
| | - Jakob T Bay
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
| | - Maria Didriksen
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
| | - Kaspar R Nielsen
- Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark
| | - Mie T Bruun
- Department of Clinical Immunology, Odense University Hospital, Odense, Denmark
| | - Joseph Dowsett
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
| | - Khoa M Dinh
- Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
| | - Christina Mikkelsen
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
| | | | - Jarmo Ritari
- Finnish Red Cross Blood Service, Helsinki, Finland
| | | | | | - Christian Erikstrup
- Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
| | - Sisse R Ostrowski
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshopitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Martin L Olsson
- Department of Laboratory Medicine, Lund University, Lund, Sweden
- Department of Clinical Immunology and Transfusion, Office for Medical Services, Region Skåne, Sweden
| | - Ole B Pedersen
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| |
|
50
|
Benvenuti JL, Casa PL, Pessi de Abreu F, Martinez GS, de Avila E Silva S. From straight to curved: A historical perspective of DNA shape. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2024; 193:46-54. [PMID: 39260792 DOI: 10.1016/j.pbiomolbio.2024.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/30/2024] [Accepted: 09/04/2024] [Indexed: 09/13/2024]
Abstract
DNA is the macromolecule responsible for storing the genetic information of a cell, and it has intrinsic properties such as deformability, stability and curvature. DNA curvature plays an important role in gene transcription and, consequently, in the subsequent production of proteins, a fundamental cellular process. With recent advances in bioinformatics and theoretical biology, it has become possible to analyze and understand the involvement of DNA curvature as a discriminatory characteristic of gene-promoting regions. These regions act as sites where RNA polymerase (RNAp) binds to initiate transcription. This review aims to describe the formation of curvature and to highlight its importance in predicting promoters. Furthermore, this article discusses the potential of DNA curvature as a distinguishing feature for promoter prediction tools and outlines the calculation procedures that have been described by other researchers. This work may support further studies directed towards the enhancement of promoter prediction software.
Affiliation(s)
- Jean Lucas Benvenuti
- Universidade de Caxias do Sul. Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil
- Pedro Lenz Casa
- Universidade de Caxias do Sul. Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil
- Fernanda Pessi de Abreu
- Universidade de Caxias do Sul. Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil; Instituto de Biociências, Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
|