1
Shokor F, Croiseau P, Gangloff H, Saintilan R, Tribout T, Mary-Huard T, Cuyabano BCD. Deep learning and genomic best linear unbiased prediction integration: An approach to identify potential nonlinear genetic relationships between traits. J Dairy Sci 2025; 108:6174-6189. [PMID: 40252763 DOI: 10.3168/jds.2024-26057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Accepted: 03/24/2025] [Indexed: 04/21/2025]
Abstract
Genomic prediction (GP) aims to predict the breeding values of multiple complex traits, which the most widely used statistical methods assume to be multivariate normally distributed, thus imposing linear genetic relationships between traits. Although these methods are valuable for GP, they do not account for potential nonlinear genetic relationships between traits. For individual traits, this oversight may minimally affect prediction accuracy, but it can limit genetic progress when selection involves multiple traits. Deep learning (DL) offers a promising alternative for capturing nonlinear genetic relationships due to its ability to identify complex patterns without prior assumptions about the data structure. We proposed a novel hybrid model that combines DL and genomic best linear unbiased prediction (GBLUP), named DLGBLUP, which uses the output of the traditional GBLUP and enhances its predicted genetic values (PGV) by accounting for nonlinear genetic relationships between traits using DL. We simulated data with linear and nonlinear genetic relationships between traits in order to verify whether DLGBLUP was able to identify nonlinearity when present and avoid inducing it when absent. We found that DLGBLUP consistently provided more accurate PGV for traits simulated with strong nonlinear genetic relationships, accurately identifying these relationships. Over 7 generations of selection, greater genetic progress was achieved with PGV that accounted for nonlinear relationships (DLGBLUP) than with GBLUP. When applied to a real dataset from the French Holstein dairy cattle population, DLGBLUP detected nonlinear genetic relationships between pairs of traits, such as conception rate and protein content, and SCC and fat yield, although no significant increase in prediction accuracy was observed. The integration of DL into GP enabled the modeling of nonlinear genetic relationships between traits, a possibility not previously discussed, given the linear nature of GBLUP.
The detection of nonlinear genetic relationships between traits in the French Holstein population when using DLGBLUP indicates the presence of such relationships in real breeding data, suggesting that it may be relevant to further explore nonlinear relationships. This possibility of nonlinear genetic relationships between traits offers a different perspective into multitrait evaluations, with potential to further improve selection strategies in commercial livestock breeding programs. This is particularly relevant when integrating new traits into multitrait evaluations or incorporating new subpopulations, which may introduce different forms of nonlinearity. Finally, it is shown that DL can be used as a complement to the statistical methods deployed in routine genetic evaluations, rather than as an alternative, by enhancing their performance.
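The abstract does not spell out the GBLUP step that DLGBLUP takes as input, so the following is only a minimal single-trait sketch of a standard GBLUP under stated assumptions: a VanRaden-style genomic relationship matrix, a heritability `h2` treated as known, and the phenotypic mean used as the only fixed effect. The function names `vanraden_g` and `gblup` are hypothetical and are not taken from the paper; the deep learning stage is not reproduced here.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an (animals x SNPs) matrix coded 0/1/2.

    Assumes at least one SNP is polymorphic, otherwise the scaling is zero.
    """
    p = M.mean(axis=0) / 2.0              # observed allele frequencies
    Z = M - 2.0 * p                       # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(G, y, h2):
    """Predicted genetic values for one trait (illustrative simplification).

    h2 is an assumed, known heritability; the overall mean is estimated by the
    simple phenotypic average rather than by generalized least squares.
    """
    lam = (1.0 - h2) / h2                 # ratio of residual to genetic variance
    mu = y.mean()
    n = len(y)
    # g_hat = G (G + lam * I)^-1 (y - mu)
    g_hat = G @ np.linalg.solve(G + lam * np.eye(n), y - mu)
    return mu, g_hat
```

In a multitrait setting, the vector of predicted genetic values per animal would be computed for every trait, and it is these values that a second-stage model could then relate to one another nonlinearly.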
Affiliation(s)
- F Shokor
  - Eliance, 75012 Paris, France; Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France.
- P Croiseau
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- H Gangloff
  - Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA Paris-Saclay, 91120 Palaiseau, France
- R Saintilan
  - Eliance, 75012 Paris, France; Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- T Tribout
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- T Mary-Huard
  - Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA Paris-Saclay, 91120 Palaiseau, France; Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- B C D Cuyabano
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
2
Hu G, Zhou T, Zhou P, Yau SST. Novel natural vector with asymmetric covariance for classifying biological sequences. Gene 2025; 962:149532. [PMID: 40367998 DOI: 10.1016/j.gene.2025.149532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2025] [Revised: 04/07/2025] [Accepted: 04/23/2025] [Indexed: 05/16/2025]
Abstract
The genome sequences of organisms form a large and complex landscape, presenting a significant challenge in bioinformatics: how to utilize mathematical tools to describe and analyze this space effectively. The ability to compare relationships between different organisms depends on creating a rational mapping rule that can uniformly encode genome sequences of varying lengths as vectors in a measurable space. This mapping would enable researchers to apply modern mathematical and machine learning techniques to otherwise challenging genomic comparisons. The natural vector method has been proposed as a concise and effective approach to accomplish this. However, its various iterations have certain limitations. In response, we carefully analyze the strengths and weaknesses of these natural vector methods and propose an improved version, the asymmetric covariance natural vector method (ACNV). This new method incorporates k-mer information alongside covariance computations with asymmetric properties between base positions. We tested ACNV on microbial genome sequence datasets, including bacterial, fungal, and viral sequences, evaluating its performance in terms of classification accuracy and convex hull separation. The results demonstrate that ACNV effectively captures sequence characteristics, showcasing its robust sequence representation capabilities and highlighting its elegant geometric properties.
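To make the idea of encoding sequences of varying lengths as fixed-length vectors concrete, here is a minimal sketch of the classic 12-dimensional natural vector: per-nucleotide counts, mean positions, and normalized second central moments. It assumes 1-based positions and the alphabet `ACGT`. The k-mer and asymmetric covariance terms of ACNV are not specified in the abstract and are not reproduced; the function name `natural_vector` is hypothetical.

```python
def natural_vector(seq, alphabet="ACGT"):
    """Classic natural vector: [counts..., mean positions..., 2nd moments...]."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in alphabet:
        # 1-based positions at which this nucleotide occurs
        pos = [i + 1 for i, c in enumerate(seq) if c == base]
        n_k = len(pos)
        mu = sum(pos) / n_k if n_k else 0.0
        # second central moment, normalized by n_k * n
        d2 = sum((s - mu) ** 2 for s in pos) / (n_k * n) if n_k else 0.0
        counts.append(n_k)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments
```

Because every sequence maps to a vector of the same length, ordinary distance measures and classifiers can be applied directly to the result.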
Affiliation(s)
- Guoqing Hu
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), 101408, Beijing, China.
- Tao Zhou
  - Department of Mathematical Sciences, Tsinghua University, 100084, Beijing, China
- Piyu Zhou
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), 101408, Beijing, China; State Key Laboratory of Mathematical Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100190, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China
- Stephen Shing-Toung Yau
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), 101408, Beijing, China; Department of Mathematical Sciences, Tsinghua University, 100084, Beijing, China.
3
Zavorskas J, Edwards H, Marten MR, Harris S, Srivastava R. Generalizable Metamaterials Design Techniques Inspire Efficient Mycelial Materials Inverse Design. ACS Biomater Sci Eng 2025; 11:1897-1920. [PMID: 39898596 DOI: 10.1021/acsbiomaterials.4c01986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
Fungal mycelial materials can mimic numerous nonrenewable materials; they are even capable of outperforming certain materials in their own applications. Fungi's versatility makes mock leather, bricks, wood, foam, meats, and many other products possible. That said, there is currently a critical need to develop efficient mycelial materials design techniques. In mycelial materials, and the wider field of biomaterials, design is primarily limited to costly forward techniques. New mycelial materials could be developed faster and cheaper with robust inverse design techniques, which are not currently used within the field. However, computational inverse design techniques will not be tractable unless clear and concrete design parameters are defined for fungi, derived from genotype and bulk phenotype characteristics. Through mycelial materials case studies and a comprehensive review of metamaterials design techniques, we identify three critical needs that must be addressed to implement computational inverse design in mycelial materials. These critical needs are the following: 1) heuristic search/optimization algorithms, 2) efficient mathematical modeling, and 3) dimensionality reduction techniques. Metamaterials researchers already use many of these computational techniques, which can be adapted for mycelial materials inverse design. We then suggest mycelium-specific parameters as well as how to measure and use them. Ultimately, based on a review of metamaterials research and the current state of mycelial materials design, we synthesize a generalizable inverse design paradigm that can be applied to mycelial materials or related design fields.
Affiliation(s)
- Joseph Zavorskas
  - Department of Chemical and Biomolecular Engineering, University of Connecticut, 191 Auditorium Rd, U-3222, Storrs, Connecticut 06269, United States
- Harley Edwards
  - Department of Chemical, Biochemical, and Environmental Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland 21250, United States
- Mark R Marten
  - Department of Chemical, Biochemical, and Environmental Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland 21250, United States
- Steven Harris
  - Department of Plant Pathology, Entomology, and Microbiology, Iowa State University, 2213 Pammel Dr, Ames, Iowa 50011, United States
- Ranjan Srivastava
  - Department of Chemical and Biomolecular Engineering, University of Connecticut, 191 Auditorium Rd, U-3222, Storrs, Connecticut 06269, United States
4
Lee I, Wallace ZS, Wang Y, Park S, Nam H, Majithia AR, Ideker T. A genotype-phenotype transformer to assess and explain polygenic risk. bioRxiv 2025:2024.10.23.619940. [PMID: 40291728 PMCID: PMC12026415 DOI: 10.1101/2024.10.23.619940] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/30/2025]
Abstract
Genome-wide association studies have linked millions of genetic variants to biomedical phenotypes, but their utility has been limited by lack of mechanistic understanding and widespread epistatic interactions. Recently, Transformer models have emerged as a powerful machine learning architecture with potential to address these and other challenges. Accordingly, here we introduce the Genotype-to-Phenotype Transformer (G2PT), a framework for modeling hierarchical information flow among variants, genes, multigenic systems, and phenotypes. As proof-of-concept, we use G2PT to model the genetics of TG/HDL (triglycerides to high-density lipoprotein cholesterol), an indicator of metabolic health. G2PT predicts this trait via attention to 1,395 variants underlying at least 20 systems, including immune response and cholesterol transport, with accuracy exceeding state-of-the-art. It implicates 40 epistatic interactions, including epistasis between APOA4 and CETP in phospholipid transfer, a target pathway for cholesterol modification. This work positions hierarchical graph transformers as a next-generation approach to polygenic risk.
5
Zheng W, Ma W, Chen Z, Wang C, Sun T, Dong W, Zhang W, Zhang S, Tang Z, Li K, Zhao Y, Liu Y. DPImpute: A Genotype Imputation Framework for Ultra-Low Coverage Whole-Genome Sequencing and its Application in Genomic Selection. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2412482. [PMID: 40013759 PMCID: PMC12021046 DOI: 10.1002/advs.202412482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 02/05/2025] [Indexed: 02/28/2025]
Abstract
Whole-genome sequencing is pivotal for elucidating the complex relationships between genotype and phenotype. However, its widespread application is hindered by the high sequencing depth and large sample sizes required, especially for genomic selection (GS) reliant on precise phenotype prediction from high-density genotype data. To address this, DPImpute (Dual-Phase Impute) is developed: a two-step imputation pipeline enabling accurate whole-genome SNP genotyping under ultra-low coverage whole-genome sequencing (ulcWGS) depths, small testing sample sizes, and limited reference populations. DPImpute achieved 98.06% SNP imputation accuracy with minimal testing samples (≤10), reference samples (≤100), and an ultra-low sequencing depth of 0.3X, surpassing the accuracy of existing imputation methods. Moreover, this high accuracy is maintained across multi-ancestry human populations. Remarkably, DPImpute demonstrated accurate SNP imputation from low-coverage sequencing data from single blood cells and single blastocyst cells, highlighting its potential in embryo GS. To enhance the accessibility of DPImpute, a user-friendly web server (https://agdb.ecenr.com/DPImpute/home) and a Docker container for seamless implementation are developed. In summary, DPImpute can significantly expedite breeding programs through precise and cost-effective genotyping and serve as a valuable tool for diverse population genotyping, encompassing both human and animal studies.
Affiliation(s)
- Weigang Zheng
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Wenlong Ma
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Zhilong Chen
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Chao Wang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Tao Sun
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Wenjun Dong
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Wenjing Zhang
  - State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro‐Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Song Zhang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Zhonglin Tang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
- Kui Li
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yunxiang Zhao
  - Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, College of Animal Science and Technology, Guangxi University, Nanning 530004, China
- Yuwen Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi‐Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
6
Thapa K, Kinali M, Pei S, Luna A, Babur Ö. Strategies to include prior knowledge in omics analysis with deep neural networks. Patterns (New York, N.Y.) 2025; 6:101203. [PMID: 40182174 PMCID: PMC11963003 DOI: 10.1016/j.patter.2025.101203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2025]
Abstract
High-throughput molecular profiling technologies have revolutionized molecular biology research in the past decades. One important use of molecular data is to make predictions of phenotypes and other features of the organisms using machine learning algorithms. Deep learning models have become increasingly popular for this task due to their ability to learn complex non-linear patterns. Applying deep learning to molecular profiles, however, is challenging due to the very high dimensionality of the data and relatively small sample sizes, causing models to overfit. A solution is to incorporate biological prior knowledge to guide the learning algorithm for processing the functionally related input together. This helps regularize the models and improve their generalizability and interpretability. Here, we describe three major strategies proposed to use prior knowledge in deep learning models to make predictions based on molecular profiles. We review the related deep learning architectures, including the major ideas in relatively new graph neural networks.
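As an illustration of how prior knowledge can constrain a network so that functionally related inputs are processed together, the sketch below shows a dense layer whose weights are masked by a gene-to-pathway membership matrix. This is one commonly used construction; the abstract does not name its three strategies, so whether this matches any of them is an assumption. The function name `masked_dense` and the toy membership matrix are hypothetical.

```python
import numpy as np

def masked_dense(x, W, b, mask):
    """ReLU layer in which gene i feeds pathway j only if mask[i, j] == 1.

    x:    (samples, genes) molecular profile
    W:    (genes, pathways) trainable weights
    b:    (pathways,) bias
    mask: (genes, pathways) 0/1 prior-knowledge membership (assumed given)
    """
    return np.maximum(0.0, x @ (W * mask) + b)

# Toy membership: genes 0 and 1 belong to pathway 0, gene 2 to pathway 1.
toy_mask = np.array([[1.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])
```

Masking removes every weight that has no biological justification, which reduces the number of free parameters and lets each hidden unit be read as a pathway-level activity.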
Affiliation(s)
- Kisan Thapa
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Meric Kinali
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Shichao Pei
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Augustin Luna
  - Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
  - Computational Biology Branch, National Library of Medicine, NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Özgün Babur
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
7
Xi X, Li J, Jia J, Meng Q, Li C, Wang X, Wei L, Zhang X. A mechanism-informed deep neural network enables prioritization of regulators that drive cell state transitions. Nat Commun 2025; 16:1284. [PMID: 39900922 PMCID: PMC11790924 DOI: 10.1038/s41467-025-56475-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Accepted: 01/15/2025] [Indexed: 02/05/2025]
Abstract
Cells are regulated at multiple levels, from regulations of individual genes to interactions across multiple genes. Some recent neural network models can connect molecular changes to cellular phenotypes, but their design lacks modeling of regulatory mechanisms, limiting the decoding of regulations behind key cellular events, such as cell state transitions. Here, we present regX, a deep neural network incorporating both gene-level regulation and gene-gene interaction mechanisms, which enables prioritizing potential driver regulators of cell state transitions and providing mechanistic interpretations. Applied to single-cell multi-omics data on type 2 diabetes and hair follicle development, regX reliably prioritizes key transcription factors and candidate cis-regulatory elements that drive cell state transitions. Some regulators reveal potential new therapeutic targets, drug repurposing possibilities, and putative causal single nucleotide polymorphisms. This method to analyze single-cell multi-omics data demonstrates how the interpretable design of neural networks can better decode biological systems.
Affiliation(s)
- Xi Xi
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Jiaqi Li
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Jinmeng Jia
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Qiuchen Meng
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Chen Li
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Xiaowo Wang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Lei Wei
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Xuegong Zhang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China.
  - School of Life Sciences, Tsinghua University, Beijing, China.
8
Gromiha MM, Pandey M, Kulandaisamy A, Sharma D, Ridha F. Progress on the development of prediction tools for detecting disease causing mutations in proteins. Comput Biol Med 2025; 185:109510. [PMID: 39637461 DOI: 10.1016/j.compbiomed.2024.109510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2024] [Revised: 11/27/2024] [Accepted: 11/29/2024] [Indexed: 12/07/2024]
Abstract
Proteins are involved in a variety of functions in living organisms. The mutation of amino acid residues in a protein alters its structure, stability, binding, and function, with some mutations leading to diseases. Understanding the influence of mutations on protein structure and function helps to gain deep insights into the molecular mechanisms of diseases and to devise therapeutic strategies. Hence, several generic and disease-specific methods have been proposed to reveal the pathogenic effects of mutations. In this review, we focus on the development of prediction methods for identifying disease-causing mutations in proteins. We briefly outline the existing databases for disease-causing mutations, followed by a discussion on sequence- and structure-based features used for prediction. Further, we discuss computational tools based on machine learning, deep learning and large language models for detecting disease-causing mutations. Specifically, we emphasize the advances in predicting hotspots and mutations for targets involved in cancer, neurodegenerative and infectious diseases as well as in membrane proteins. The computational resources, including databases and algorithms for understanding/predicting the effect of mutations, will be listed. Moreover, limitations of existing methods and possible improvements will be discussed.
Affiliation(s)
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India.
- Medha Pandey
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- A Kulandaisamy
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Divya Sharma
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Fathima Ridha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
9
Zeng R, Li Z, Li J, Zhang Q. DNA promoter task-oriented dictionary mining and prediction model based on natural language technology. Sci Rep 2025; 15:153. [PMID: 39747934 PMCID: PMC11697570 DOI: 10.1038/s41598-024-84105-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 12/19/2024] [Indexed: 01/04/2025]
Abstract
Promoters are essential DNA sequences that initiate transcription and regulate gene expression. Precisely identifying promoter sites is crucial for deciphering gene expression patterns and the roles of gene regulatory networks. Recent advancements in bioinformatics have leveraged deep learning and natural language processing (NLP) to enhance promoter prediction accuracy. Techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and BERT models have been particularly impactful. However, current approaches often rely on arbitrary DNA sequence segmentation during BERT pre-training, which may not yield optimal results. To overcome this limitation, this article introduces a novel DNA sequence segmentation method. This approach develops a more refined dictionary for DNA sequences, utilizes it for BERT pre-training, and employs an Inception neural network as the foundational model. This BERT-Inception architecture captures information across multiple granularities. Experimental results show that the model improves the performance of several downstream tasks and introduces deep learning interpretability, providing new perspectives for interpreting and understanding DNA sequence information. The detailed source code is available at https://github.com/katouMegumiH/Promoter_BERT.
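For context on what fixed segmentation looks like in practice, the following sketch splits a DNA sequence into fixed-length k-mers and builds a frequency-ordered token dictionary. It illustrates only the baseline kind of segmentation that the abstract contrasts itself with; the paper's own dictionary mining method is not described in the abstract and is not reproduced. The names `kmer_tokens` and `build_vocab` and the parameters `k`, `stride`, and `min_count` are hypothetical.

```python
from collections import Counter

def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into fixed-length k-mers (overlapping if stride < k)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(seqs, k=3, stride=1, min_count=1):
    """Map each k-mer seen at least min_count times to an integer id.

    Ids are assigned by decreasing frequency, ties broken alphabetically.
    """
    counts = Counter(tok for s in seqs for tok in kmer_tokens(s, k, stride))
    kept = [t for t, c in counts.items() if c >= min_count]
    kept.sort(key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(kept)}
```

With `stride=1` the tokens overlap, while `stride=k` gives a non-overlapping split; neither choice uses any information about which substrings are biologically meaningful, which is the limitation a mined dictionary aims to address.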
Affiliation(s)
- Ruolei Zeng
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
- Zihan Li
  - National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China.
- Jialu Li
  - National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
- Qingchuan Zhang
  - National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China.
10
Arab A, Kashani B, Cordova-Delgado M, Scott EN, Alemi K, Trueman J, Groeneweg G, Chang WC, Loucks CM, Ross CJD, Carleton BC, Ester M. Machine learning model identifies genetic predictors of cisplatin-induced ototoxicity in CERS6 and TLR4. Comput Biol Med 2024; 183:109324. [PMID: 39488053 DOI: 10.1016/j.compbiomed.2024.109324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 10/20/2024] [Accepted: 10/22/2024] [Indexed: 11/04/2024]
Abstract
BACKGROUND: Cisplatin-induced ototoxicity remains a significant concern in pediatric cancer treatment due to its permanent impact on quality of life. Previously, genetic association analyses have been performed to detect genetic variants associated with this adverse reaction. METHODS: In this study, a combination of interpretable neural networks and Generative Adversarial Networks (GANs) was employed to identify genetic markers associated with cisplatin-induced ototoxicity. The applied method, BRI-Net, incorporates biological domain knowledge to define the network structure and employs adversarial training to learn an unbiased representation of the data, which is robust to known confounders. Leveraging genomic data from a cohort of 362 cisplatin-treated pediatric cancer patients recruited by the CPNDS (Canadian Pharmacogenomics Network for Drug Safety), this model revealed two statistically significant single nucleotide polymorphisms to be associated with cisplatin-induced ototoxicity. RESULTS: Two markers within the CERS6 (rs13022792, p-value: 3 × 10⁻⁴) and TLR4 (rs10759932, p-value: 7 × 10⁻⁴) genes were associated with this cisplatin-induced adverse reaction. CERS6, a ceramide synthase, contributes to elevated ceramide levels, a known initiator of apoptotic signals in mouse models of inner ear hair cells. TLR4, a pattern-recognition protein, initiates inflammation in response to cisplatin, and reduced TLR4 expression has been shown in murine hair cells to confer protection from ototoxicity. CONCLUSION: Overall, these findings provide a foundation for understanding the genetic landscape of cisplatin-induced ototoxicity, with implications for improving patient care and treatment outcomes.
Affiliation(s)
- Ali Arab
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Bahareh Kashani
  - Department of Experimental Medicine, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Erika N Scott
  - BC Children's Hospital Research Institute, Vancouver, BC, Canada; Division of Translational Therapeutics, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Kaveh Alemi
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Jessica Trueman
  - BC Children's Hospital Research Institute, Vancouver, BC, Canada; Division of Translational Therapeutics, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Gabriella Groeneweg
  - Division of Translational Therapeutics, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Pharmaceutical Outcomes Programme, BC Children's Hospital, Vancouver, BC, Canada
- Wan-Chun Chang
  - BC Children's Hospital Research Institute, Vancouver, BC, Canada; Division of Translational Therapeutics, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Catrina M Loucks
  - BC Children's Hospital Research Institute, Vancouver, BC, Canada; Division of Translational Therapeutics, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Department of Anesthesiology, Pharmacology & Therapeutics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Colin J D Ross
  - BC Children's Hospital Research Institute, Vancouver, BC, Canada; Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
- Bruce C Carleton
  - BC Children's Hospital Research Institute, Vancouver, BC, Canada; Division of Translational Therapeutics, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Pharmaceutical Outcomes Programme, BC Children's Hospital, Vancouver, BC, Canada
- Martin Ester
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
|
11
|
Ghose U, Sproviero W, Winchester L, Amin N, Zhu T, Newby D, Ulm BS, Papathanasiou A, Shi L, Liu Q, Fernandes M, Adams C, Albukhari A, Almansouri M, Choudhry H, van Duijn C, Nevado-Holgado A. Genome-wide association neural networks identify genes linked to family history of Alzheimer's disease. Brief Bioinform 2024; 26:bbae704. [PMID: 39775791 PMCID: PMC11707606 DOI: 10.1093/bib/bbae704] [Received: 09/23/2024] [Revised: 11/29/2024] [Accepted: 12/23/2024] [Indexed: 01/11/2025]
Abstract
Augmenting traditional genome-wide association studies (GWAS) with advanced machine learning algorithms can allow the detection of novel signals in available cohorts. We introduce "genome-wide association neural networks (GWANN)," a novel approach that uses neural networks (NNs) to perform a gene-level association study with family history of Alzheimer's disease (AD). In UK Biobank, we defined cases (n = 42 110) as those with AD or family history of AD and sampled an equal number of controls. The data were split into an 80:20 ratio of training and testing samples, and GWANN was trained on the former followed by identifying associated genes using its performance on the latter. Our method identified 18 genes to be associated with family history of AD. APOE, BIN1, SORL1, ADAM10, APH1B, and SPI1 have been identified by previous AD GWAS. Among the 12 new genes, PCDH9, NRG3, ROR1, LINGO2, SMYD3, and LRRC7 have been associated with neurofibrillary tangles or phosphorylated tau in previous studies. Furthermore, there is evidence for differential transcriptomic or proteomic expression between AD and healthy brains for 10 of the 12 new genes. A series of post hoc analyses resulted in a significantly enriched protein-protein interaction network (P-value < 1 × 10⁻¹⁶), and enrichment of relevant disease and biological pathways such as focal adhesion (P-value = 1 × 10⁻⁴), extracellular matrix organization (P-value = 1 × 10⁻⁴), Hippo signaling (P-value = 7 × 10⁻⁴), Alzheimer's disease (P-value = 3 × 10⁻⁴), and impaired cognition (P-value = 4 × 10⁻³). Applying NNs for GWAS illustrates their potential to complement existing algorithms and methods and enable the discovery of new associations without the need to expand existing cohorts.
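The sampling design described in this abstract (an equal number of controls, then an 80:20 train/test split) can be illustrated with a minimal sketch. This is not the GWANN code: the function name `balanced_split`, the identifier strings, and the choice to split each class separately (a stratified split) are assumptions made for illustration only.

```python
import random

def balanced_split(case_ids, control_pool, train_frac=0.8, seed=0):
    """Sample as many controls as cases, then split each class
    into train_frac : (1 - train_frac) training and testing parts.

    Splitting per class (stratification) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    controls = rng.sample(control_pool, k=len(case_ids))  # equal number of controls
    train, test = [], []
    for label, ids in ((1, list(case_ids)), (0, controls)):
        rng.shuffle(ids)
        cut = int(round(train_frac * len(ids)))
        train += [(sample_id, label) for sample_id in ids[:cut]]
        test += [(sample_id, label) for sample_id in ids[cut:]]
    return train, test

# Hypothetical identifiers, used only to exercise the function.
cases = [f"case_{i}" for i in range(100)]
pool = [f"ctrl_{i}" for i in range(1000)]
train, test = balanced_split(cases, pool)
print(len(train), len(test))
```

Keeping the class ratio identical in both parts means that test-set performance is not distorted by a change in case prevalence between training and testing.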
Affiliation(s)
- Upamanyu Ghose
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- William Sproviero
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Laura Winchester
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Najaf Amin
  - Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
- Taiyu Zhu
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Danielle Newby
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
  - Centre for Statistics in Medicine, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, United Kingdom
- Brittany S Ulm
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
  - Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
- Liu Shi
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
  - Department of Translational Medicine, Nxera Pharma UK Limited, Cambridge, United Kingdom
- Qiang Liu
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
  - School of Engineering Mathematics and Technology, University of Bristol, Ada Lovelace Building, Bristol, United Kingdom
- Marco Fernandes
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
  - School of Medicine, University of St Andrews, St Andrews, United Kingdom
- Cassandra Adams
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
  - Centre for Medicines Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Ashwag Albukhari
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
  - Biochemistry Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Majid Almansouri
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
  - Clinical Biochemistry Department, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Hani Choudhry
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
  - Biochemistry Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Cornelia van Duijn
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
  - Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
- Alejo Nevado-Holgado
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
  - King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
|
12
|
Islam UI, Campelo dos Santos AL, Kanjilal R, Assis R. Learning genotype-phenotype associations from gaps in multi-species sequence alignments. Brief Bioinform 2024; 26:bbaf022. [PMID: 39976386 PMCID: PMC11840556 DOI: 10.1093/bib/bbaf022] [Received: 04/12/2024] [Revised: 12/16/2024] [Accepted: 01/08/2025] [Indexed: 02/21/2025]
Abstract
Understanding the genetic basis of phenotypic variation is fundamental to biology. Here we introduce GAP, a novel machine learning framework for predicting binary phenotypes from gaps in multi-species sequence alignments. GAP employs a neural network to predict the presence or absence of phenotypes solely from alignment gaps, contrasting with existing tools that require additional and often inaccessible input data. GAP can be applied to three distinct problems: predicting phenotypes in species from known associated genomic regions, pinpointing positions within such regions that are important for predicting phenotypes, and extracting sets of candidate regions associated with phenotypes. We showcase the utility of GAP by exploiting the well-known association between the L-gulonolactone oxidase (Gulo) gene and vitamin C synthesis, demonstrating its perfect prediction accuracy in 34 vertebrates. This exceptional performance also applies more generally, with GAP achieving high accuracy and power on a large simulated dataset. Moreover, predictions of vitamin C synthesis in species with unknown status mirror their phylogenetic relationships, and positions with high predictive importance are consistent with those identified by previous studies. Last, a genome-wide application of GAP identifies many additional genes that may be associated with vitamin C synthesis, and analysis of these candidates uncovers functional enrichment for immunity, a widely recognized role of vitamin C. Hence, GAP represents a simple yet useful tool for predicting genotype-phenotype associations and addressing diverse evolutionary questions from data available in a broad range of study systems.
Affiliation(s)
- Uwaise Ibna Islam
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
- Andre Luiz Campelo dos Santos
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
- Ria Kanjilal
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
- Raquel Assis
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
  - Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, FL 33431, United States
|
13
|
He R, Fu J, Ren J, Pan W. Trait imputation enhances nonlinear genetic prediction for some traits. Genetics 2024; 228:iyae148. [PMID: 39255064 PMCID: PMC12098936 DOI: 10.1093/genetics/iyae148] [Received: 07/10/2024] [Revised: 08/29/2024] [Accepted: 09/03/2024] [Indexed: 09/12/2024]
Abstract
The expansive collection of genetic and phenotypic data within biobanks offers an unprecedented opportunity for biomedical research. However, the frequent occurrence of missing phenotypes presents a significant barrier to fully leveraging this potential. In our target application, on one hand, we have only a small and complete dataset with both genotypes and phenotypes to build a genetic prediction model, commonly called a polygenic (risk) score (PGS or PRS); on the other hand, we have a large dataset of genotypes (e.g. from a biobank) without the phenotype of interest. Our goal is to leverage the large dataset of genotypes (but without the phenotype) and a separate genome-wide association study summary dataset of the phenotype to impute the phenotypes, which are then used as an individual-level dataset, along with the small complete dataset, to build a nonlinear model as PGS. More specifically, we fitted nonlinear models to 7 imputed and observed phenotypes from the UK Biobank data. We then trained an ensemble model to integrate these models for each trait, resulting in higher R² values in prediction than using only the small complete (observed) dataset. Additionally, for 2 of the 7 traits, we observed that the nonlinear model trained with the imputed traits had higher R² than using the imputed traits directly as the PGS, while for the remaining 5 traits, no improvement was found. These findings demonstrate the potential of leveraging existing genetic data and accounting for nonlinear genetic relationships to improve prediction accuracy for some traits.
Affiliation(s)
- Ruoyu He
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
  - School of Statistics, University of Minnesota, Minneapolis, MN 55414, USA
- Jinwen Fu
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
  - School of Statistics, University of Minnesota, Minneapolis, MN 55414, USA
- Jingchen Ren
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
  - School of Statistics, University of Minnesota, Minneapolis, MN 55414, USA
- Wei Pan
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
|
14
|
Jo Y, Webster MJ, Kim S, Lee D. Interpretation of SNP combination effects on schizophrenia etiology based on stepwise deep learning with multi-precision data. Brief Funct Genomics 2024; 23:663-671. [PMID: 37738675 PMCID: PMC11428150 DOI: 10.1093/bfgp/elad041] [Received: 01/19/2023] [Revised: 07/17/2023] [Accepted: 08/22/2023] [Indexed: 09/24/2023]
Abstract
Schizophrenia genome-wide association studies (GWAS) have reported many genomic risk loci, but it is unclear how they affect schizophrenia susceptibility through interactions of multiple SNPs. We propose a stepwise deep learning technique with multi-precision data (SLEM) to explore the SNP combination effects on schizophrenia through intermediate molecular and cellular functions. The SLEM technique utilizes two levels of precision data for learning. It constructs initial backbone networks with a more precise but small amount of multilevel assay data. Then, it learns the strengths of intermediate interactions with the less precise but massive amount of GWAS data. The learned networks facilitate identifying effective SNP interactions from the intractably large space of all possible SNP combinations. We have shown that the extracted SNP combinations show higher accuracy than any single SNP and preserve the accuracy in an independent dataset. The learned networks also provide interpretations of molecular and cellular interactions of SNP combinations toward schizophrenia etiology.
Affiliation(s)
- Yousang Jo
  - Department of Bio and Brain Engineering, KAIST, Daejeon, South Korea
- Maree J Webster
  - Brain Research Laboratory, Stanley Medical Research Institute, Rockville, MD, USA
- Sanghyeon Kim
  - Brain Research Laboratory, Stanley Medical Research Institute, Rockville, MD, USA
- Doheon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon, South Korea
|
15
|
Xie Z, Weng L, He J, Feng X, Xu X, Ma Y, Bai P, Kong Q. PNNGS, a multi-convolutional parallel neural network for genomic selection. Front Plant Sci 2024; 15:1410596. [PMID: 39290743 PMCID: PMC11405342 DOI: 10.3389/fpls.2024.1410596] [Received: 04/01/2024] [Accepted: 08/19/2024] [Indexed: 09/19/2024]
Abstract
Genomic selection (GS) can accomplish breeding faster than phenotypic selection. Improving prediction accuracy is the key to promoting GS. To improve the GS prediction accuracy and stability, we introduce parallel convolution to deep learning for GS and call it a parallel neural network for genomic selection (PNNGS). In PNNGS, information passes through convolutions of different kernel sizes in parallel. The convolutions in each branch are connected with residuals. PNNGS is trained with four different Lp loss functions. Through experiments, the optimal number of parallel paths for rice, sunflower, wheat, and maize is found to be 4, 6, 4, and 3, respectively. Phenotype prediction is performed on 24 cases through ridge-regression best linear unbiased prediction (RRBLUP), random forests (RF), support vector regression (SVR), deep neural network genomic prediction (DNNGP), and PNNGS. Serial DNNGP and parallel PNNGS outperform the other three algorithms. On average, PNNGS prediction accuracy is 0.031 larger than DNNGP prediction accuracy, indicating that parallelism can improve the GS model. Plants are divided into clusters through principal component analysis (PCA) and K-means clustering algorithms. The sample sizes of different clusters vary greatly, indicating that the data are unbalanced. Through stratified sampling, the prediction stability and accuracy of PNNGS are improved. When the training samples are reduced in small clusters, the prediction accuracy of PNNGS decreases significantly. Increasing the sample size of small clusters is critical to improving the prediction accuracy of GS.
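The core architectural idea in this abstract, passing the same input through convolutions of different kernel sizes in parallel with a residual connection in each branch, can be sketched in plain Python. This is an illustrative toy, not the PNNGS implementation: the helper names `conv1d_same` and `parallel_block`, the fixed averaging kernels, and the choice to merge branches by averaging are all assumptions, since the abstract does not state how branches are combined.

```python
def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding so the output length equals len(x)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * (k - 1 - pad)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]

def parallel_block(x, kernels):
    """Apply one convolution per branch, add a residual (skip) connection
    in each branch, and average the branch outputs (averaging is an assumption)."""
    branches = []
    for kernel in kernels:
        y = conv1d_same(x, kernel)
        branches.append([xi + yi for xi, yi in zip(x, y)])  # residual connection
    return [sum(vals) / len(branches) for vals in zip(*branches)]

# Hypothetical genotype dosages (0/1/2) and three branches with kernel sizes 3, 5 and 7.
genotypes = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
kernels = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]
out = parallel_block(genotypes, kernels)
print(len(out))
```

In a trained network the kernel weights would be learned rather than fixed; the sketch only shows how branches with different receptive fields see the same marker vector and keep its length unchanged.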
Affiliation(s)
- Zhengchao Xie
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Lin Weng
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Jingjing He
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Xianzhong Feng
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- Xiaogang Xu
  - School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, China
- Yinxing Ma
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Panpan Bai
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Qihui Kong
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
|
16
|
van Hilten A, van Rooij J, Ikram MA, Niessen WJ, van Meurs JBJ, Roshchupkin GV. Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data. NPJ Syst Biol Appl 2024; 10:81. [PMID: 39095438 PMCID: PMC11297229 DOI: 10.1038/s41540-024-00405-w] [Received: 12/14/2023] [Accepted: 07/12/2024] [Indexed: 08/04/2024]
Abstract
Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N_total = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
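The cohort-wise cross-validation mentioned in this abstract can be sketched as a leave-one-cohort-out loop, in which each cohort is held out once for testing while the model is fitted on the others. Reading "cohort-wise" as leave-one-cohort-out is an assumption of this sketch, as are the function name and the cohort labels "A" to "D"; the abstract states only that four population cohorts were used.

```python
def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_indices, test_indices) once per cohort."""
    for held_out in sorted(set(cohort_labels)):
        train = [i for i, c in enumerate(cohort_labels) if c != held_out]
        test = [i for i, c in enumerate(cohort_labels) if c == held_out]
        yield held_out, train, test

# Hypothetical cohort membership for six samples.
cohorts = ["A", "A", "B", "C", "C", "D"]
folds = list(leave_one_cohort_out(cohorts))
for name, train_idx, test_idx in folds:
    print(name, train_idx, test_idx)
```

Holding out whole cohorts, rather than random samples, tests whether a model generalizes across cohort-specific effects such as batch or recruitment differences.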
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
- Jeroen van Rooij
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
- M Arfan Ikram
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Wiro J Niessen
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Joyce B J van Meurs
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Orthopaedics and Sports Medicine, Erasmus MC, Rotterdam, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
|
17
|
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024]
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
|
18
|
Ding X, Zhang L, Fan M, Li L. TME-NET: an interpretable deep neural network for predicting pan-cancer immune checkpoint inhibitor responses. Brief Bioinform 2024; 25:bbae410. [PMID: 39167797 PMCID: PMC11337220 DOI: 10.1093/bib/bbae410] [Received: 05/24/2024] [Revised: 07/17/2024] [Accepted: 08/02/2024] [Indexed: 08/23/2024]
Abstract
Immunotherapy with immune checkpoint inhibitors (ICIs) is increasingly used to treat various tumor types. Determining patient responses to ICIs presents a significant clinical challenge. Although components of the tumor microenvironment (TME) are used to predict patient outcomes, comprehensive assessments of the TME are frequently overlooked. Using a top-down approach, the TME was divided into five layers: outcome, immune role, cell, cellular component, and gene. Using this structure, a neural network called TME-NET was developed to predict responses to ICIs. Model parameter weights and cell ablation studies were used to investigate the influence of TME components. The model was developed and evaluated using a pan-cancer cohort of 948 patients across four cancer types, with Area Under the Curve (AUC) and accuracy as performance metrics. Results show that TME-NET surpasses established models such as support vector machine and k-nearest neighbors in AUC and accuracy. Visualization of model parameter weights showed that at the cellular layer, Th1 cells enhance immune responses, whereas myeloid-derived suppressor cells and M2 macrophages show strong immunosuppressive effects. Cell ablation studies further confirmed the impact of these cells. At the gene layer, the transcription factors STAT4 in Th1 cells and IRF4 in M2 macrophages significantly affect TME dynamics. Additionally, the cytokine-encoding genes IFNG from Th1 cells and ARG1 from M2 macrophages are crucial for modulating immune responses within the TME. Survival data from immunotherapy cohorts confirmed the prognostic ability of these markers, with p-values < 0.01. In summary, TME-NET performs well in predicting immunotherapy responses and offers interpretable insights into the immunotherapy process. It can be customized at https://immbal.shinyapps.io/TME-NET.
Affiliation(s)
- Xiaobao Ding
  - Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
  - Institute of Big Data and Artificial Intelligence in Medicine, School of Electronics and Information Engineering, Taizhou University, Taizhou 318000, Zhejiang, China
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
- Lin Zhang
  - Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- Ming Fan
  - Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- Lihua Li
  - Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
|
19
|
Wang X, Li F, Zhang Y, Imoto S, Shen HH, Li S, Guo Y, Yang J, Song J. Deep learning approaches for non-coding genetic variant effect prediction: current progress and future prospects. Brief Bioinform 2024; 25:bbae446. [PMID: 39276327 PMCID: PMC11401448 DOI: 10.1093/bib/bbae446] [Received: 04/02/2024] [Revised: 08/08/2024] [Accepted: 08/27/2024] [Indexed: 09/16/2024]
Abstract
Recent advancements in high-throughput sequencing technologies have significantly enhanced our ability to unravel the intricacies of gene regulatory processes. A critical challenge in this endeavor is the identification of variant effects, a key factor in comprehending the mechanisms underlying gene regulation. Non-coding variants, constituting over 90% of all variants, have garnered increasing attention in recent years. The exploration of gene variant impacts and regulatory mechanisms has spurred the development of various deep learning approaches, providing new insights into the global regulatory landscape through the analysis of extensive genetic data. Here, we provide a comprehensive overview of the development of non-coding variant models based on bulk and single-cell sequencing data, together with their model-based interpretation and downstream tasks. This review delineates the popular sequencing technologies for epigenetic profiling and deep learning approaches for discerning the effects of non-coding variants. Additionally, we summarize the limitations of current approaches in variant effect prediction research and outline opportunities for improvement. We anticipate that our study will offer a practical and useful guide for the bioinformatic community to further advance the unraveling of genetic variant effects.
Affiliation(s)
- Xiaoyu Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- Yiwen Zhang
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Seiya Imoto
  - Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Hsin-Hui Shen
  - Department of Materials Science and Engineering, Faculty of Engineering, Monash University, Clayton, VIC 3800, Australia
- Shanshan Li
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Jian Yang
  - School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310030, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
|
20
|
Martins D, Abbasi M, Egas C, Arrais JP. Detecting outliers in case-control cohorts for improving deep learning networks on Schizophrenia prediction. J Integr Bioinform 2024; 21:jib-2023-0042. [PMID: 39004922 PMCID: PMC11377398 DOI: 10.1515/jib-2023-0042] [Received: 10/18/2023] [Accepted: 06/06/2024] [Indexed: 07/16/2024]
Abstract
This study delves into the intricate genetic and clinical aspects of Schizophrenia (SCZ), a complex mental disorder with uncertain etiology. Deep Learning (DL) holds promise for analyzing large genomic datasets to uncover new risk factors. However, based on reports of non-negligible misdiagnosis rates for SCZ, case-control cohorts may contain outlying genetic profiles that hinder the performance of classification models. The research employed a case-control dataset sourced from the Swedish population. A gene-annotation-based DL architecture was developed and employed in two stages. First, the model was trained on the entire dataset to highlight differences between cases and controls. Then, samples likely to be misclassified were excluded, and the model was retrained on the refined dataset for performance evaluation. The results indicate that SCZ prevalence and misdiagnosis rates can affect case-control cohorts, potentially compromising future studies reliant on such datasets. However, by detecting and filtering outliers, the study demonstrates the feasibility of adapting DL methodologies to large-scale biological problems, producing results more aligned with existing heritability estimates for SCZ. This approach not only advances the comprehension of the genetic background of SCZ but also opens doors for adapting DL techniques in complex research for precision medicine in mental health.
Affiliation(s)
- Daniel Martins
- Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
- Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
| | - Maryam Abbasi
- Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal
- Research Centre for Natural Resources Environment and Society, Polytechnic Institute of Coimbra, Coimbra, Portugal
| | - Conceição Egas
- Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Biocant - Transfer Technology Association, Cantanhede, Portugal
| | - Joel P Arrais
- Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
|
21
|
Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024; 25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]
Abstract
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Affiliation(s)
- Xin Huang
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria.
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.
| | - Aigerim Rymbekova
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
| | - Olga Dolgova
- Integrative Genomics Laboratory, CIC bioGUNE - Centro de Investigación Cooperativa en Biociencias, Derio, Biscaya, Spain
| | - Oscar Lao
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain.
| | - Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria.
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.
|
22
|
Sigala RE, Lagou V, Shmeliov A, Atito S, Kouchaki S, Awais M, Prokopenko I, Mahdi A, Demirkan A. Machine Learning to Advance Human Genome-Wide Association Studies. Genes (Basel) 2023; 15:34. [PMID: 38254924 PMCID: PMC10815885 DOI: 10.3390/genes15010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024]
Abstract
Machine learning methods, including deep learning, reinforcement learning, and generative artificial intelligence, are revolutionising every area of our lives when data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems in a more efficient way. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.
Affiliation(s)
- Rafaella E. Sigala
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
| | - Vasiliki Lagou
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
| | - Aleksey Shmeliov
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
| | - Sara Atito
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
| | - Samaneh Kouchaki
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
| | - Muhammad Awais
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
| | - Inga Prokopenko
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
| | - Adam Mahdi
- Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, Oxfordshire, UK
| | - Ayse Demirkan
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
|
23
|
Chen SF, Loguercio S, Chen KY, Lee SE, Park JB, Liu S, Sadaei HJ, Torkamani A. Artificial Intelligence for Risk Assessment on Primary Prevention of Coronary Artery Disease. Current Cardiovascular Risk Reports 2023; 17:215-231. [DOI: 10.1007/s12170-023-00731-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 10/09/2023] [Indexed: 01/04/2025]
Abstract
Purpose of Review
Coronary artery disease (CAD) is a common and etiologically complex disease worldwide. Current guidelines for primary prevention, or the prevention of a first acute event, include relatively simple risk assessment and leave substantial room for improvement both for risk ascertainment and selection of prevention strategies. Here, we review how advances in big data and predictive modeling foreshadow a promising future of improved risk assessment and precision medicine for CAD.
Recent Findings
Artificial intelligence (AI) has improved the utility of high dimensional data, providing an opportunity to better understand the interplay between numerous CAD risk factors. Beyond applications of AI in cardiac imaging, the vanguard application of AI in healthcare, recent translational research is also revealing a promising path for AI in multi-modal risk prediction using standard biomarkers, genetic and other omics technologies, a variety of biosensors, and unstructured data from electronic health records (EHRs). However, gaps remain in clinical validation of AI models, most notably in the actionability of complex risk prediction for more precise therapeutic interventions.
Summary
The recent availability of nation-scale biobank datasets has provided a tremendous opportunity to richly characterize longitudinal health trajectories using health data collected at home, at laboratories, and through clinic visits. The ever-growing availability of deep genotype-phenotype data is poised to drive a transition from simple risk prediction algorithms to complex, “data-hungry,” AI models in clinical decision-making. While AI models provide the means to incorporate essentially all risk factors into comprehensive risk prediction frameworks, there remains a need to wrap these predictions in interpretable frameworks that map to our understanding of underlying biological mechanisms and associated personalized intervention. This review explores recent advances in the role of machine learning and AI in CAD primary prevention and highlights current strengths as well as limitations mediating potential future applications.
|
24
|
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023]
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Junyou Zhang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Zhaoshuo Liu
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Yingying Duan
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Chunyan Li
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
|
25
|
Li P, Wei J, Zhu Y. CellGO: a novel deep learning-based framework and webserver for cell-type-specific gene function interpretation. Brief Bioinform 2023; 25:bbad417. [PMID: 37995133 PMCID: PMC10790717 DOI: 10.1093/bib/bbad417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/09/2023] [Accepted: 10/29/2023] [Indexed: 11/25/2023]
Abstract
Interpreting the function of genes and gene sets identified from omics experiments remains a challenge, as current pathway analysis tools often fail to consider the critical biological context, such as tissue or cell-type specificity. To address this limitation, we introduced CellGO. CellGO tackles this challenge by leveraging the visible neural network (VNN) and single-cell gene expressions to mimic cell-type-specific signaling propagation along the Gene Ontology tree within a cell. This design enables a novel scoring system to calculate the cell-type-specific gene-pathway paired active scores, based on which, CellGO is able to identify cell-type-specific active pathways associated with single genes. In addition, by aggregating the activities of single genes, CellGO extends its capability to identify cell-type-specific active pathways for a given gene set. To enhance biological interpretation, CellGO offers additional features, including the identification of significantly active cell types and driver genes and community analysis of pathways. To validate its performance, CellGO was assessed using a gene set comprising mixed cell-type markers, confirming its ability to discern active pathways across distinct cell types. Subsequent benchmarking analyses demonstrated CellGO's superiority in effectively identifying cell types and their corresponding cell-type-specific pathways affected by gene knockouts, using either single genes or sets of genes differentially expressed between knockout and control samples. Moreover, CellGO demonstrated its ability to infer cell-type-specific pathogenesis for disease risk genes. Accessible as a Python package, CellGO also provides a user-friendly web interface, making it a versatile and accessible tool for researchers in the field.
Affiliation(s)
- Peilong Li
- State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science, Institutes of Brain Science and Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai 200032, China
| | - Junfeng Wei
- State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science, Institutes of Brain Science and Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai 200032, China
| | - Ying Zhu
- State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science, Institutes of Brain Science and Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai 200032, China
|
26
|
Kaynar G, Cakmakci D, Bund C, Todeschi J, Namer IJ, Cicek AE. PiDeeL: metabolic pathway-informed deep learning model for survival analysis and pathological classification of gliomas. Bioinformatics 2023; 39:btad684. [PMID: 37952175 PMCID: PMC10663986 DOI: 10.1093/bioinformatics/btad684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/19/2023] [Accepted: 11/10/2023] [Indexed: 11/14/2023]
Abstract
MOTIVATION Online assessment of tumor characteristics during surgery is important and has the potential to establish an intra-operative surgeon feedback mechanism. With the availability of such feedback, surgeons could decide to be more liberal or conservative regarding the resection of the tumor. While there are methods to perform metabolomics-based tumor pathology prediction, their model complexity and predictive performance are limited by the small dataset sizes. Furthermore, the information conveyed by the feedback provided on the tumor tissue could be improved both in terms of content and accuracy. RESULTS In this study, we propose a metabolic pathway-informed deep learning model (PiDeeL) to perform survival analysis and pathology assessment based on metabolite concentrations. We show that incorporating pathway information into the model architecture substantially reduces parameter complexity and achieves better survival analysis and pathological classification performance. With these design decisions, we show that PiDeeL improves tumor pathology prediction performance of the state-of-the-art in terms of the Area Under the ROC Curve by 3.38% and the Area Under the Precision-Recall Curve by 4.06%. Similarly, with respect to the time-dependent concordance index (c-index), PiDeeL achieves better survival analysis performance (improvement of 4.3%) when compared to the state-of-the-art. Moreover, we show that importance analyses performed on input metabolite features as well as pathway-specific neurons of PiDeeL provide insights into tumor metabolism. We foresee that the use of this model in the surgery room will help surgeons adjust the surgery plan on the fly and will result in better prognosis estimates tailored to surgical procedures. AVAILABILITY AND IMPLEMENTATION The code is released at https://github.com/ciceklab/PiDeeL. The data used in this study are released at https://zenodo.org/record/7228791.
Affiliation(s)
- Gun Kaynar
- Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey
| | - Doruk Cakmakci
- School of Computer Science, McGill University, Montreal, QC, H3A 0E9, Canada
| | - Caroline Bund
- MNMS Platform, University Hospitals of Strasbourg, Strasbourg 67098, France
- ICube, University of Strasbourg, CNRS UMR, 7357, Strasbourg 67000, France
- Department of Nuclear Medicine and Molecular Imaging, ICANS, Strasbourg 67000, France
| | - Julien Todeschi
- Department of Neurosurgery, University Hospitals of Strasbourg, Strasbourg, 67091, France
| | - Izzie Jacques Namer
- MNMS Platform, University Hospitals of Strasbourg, Strasbourg 67098, France
- ICube, University of Strasbourg, CNRS UMR, 7357, Strasbourg 67000, France
- Department of Nuclear Medicine and Molecular Imaging, ICANS, Strasbourg 67000, France
| | - A Ercument Cicek
- Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States
|
27
|
Esser-Skala W, Fortelny N. Reliable interpretability of biology-inspired deep neural networks. NPJ Syst Biol Appl 2023; 9:50. [PMID: 37816807 PMCID: PMC10564878 DOI: 10.1038/s41540-023-00310-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/22/2023] [Accepted: 09/15/2023] [Indexed: 10/12/2023]
Abstract
Deep neural networks display impressive performance but suffer from limited interpretability. Biology-inspired deep learning, where the architecture of the computational graph is based on biological knowledge, enables unique interpretability where real-world concepts are encoded in hidden nodes, which can be ranked by importance and thereby interpreted. In such models trained on single-cell transcriptomes, we previously demonstrated that node-level interpretations lack robustness upon repeated training and are influenced by biases in biological knowledge. Similar studies are missing for related models. Here, we test and extend our methodology for reliable interpretability in P-NET, a biology-inspired model trained on patient mutation data. We observe variability of interpretations and susceptibility to knowledge biases, and identify the network properties that drive interpretation biases. We further present an approach to control the robustness and biases of interpretations, which leads to more specific interpretations. In summary, our study reveals the broad importance of methods to ensure robust and bias-aware interpretability in biology-inspired deep learning.
Affiliation(s)
- Wolfgang Esser-Skala
- Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
| | - Nikolaus Fortelny
- Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria.
|
28
|
Verplaetse N, Passemiers A, Arany A, Moreau Y, Raimondi D. Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease. Genome Biol 2023; 24:224. [PMID: 37798735 PMCID: PMC10552306 DOI: 10.1186/s13059-023-03064-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 09/20/2023] [Indexed: 10/07/2023]
Abstract
BACKGROUND Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination ([Formula: see text]). Millions of variants are present in each individual while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects. RESULTS We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case-control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects present in the inflammatory bowel disease pathogenesis. CONCLUSIONS In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.
Affiliation(s)
- Nora Verplaetse
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
| | - Antoine Passemiers
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Adam Arany
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Yves Moreau
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Daniele Raimondi
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
|
29
|
Balnis J, Lauria EJM, Yucel R, Singer HA, Alisch RS, Jaitovich A. Peripheral Blood Omics and Other Multiplex-based Systems in Pulmonary and Critical Care Medicine. Am J Respir Cell Mol Biol 2023; 69:383-390. [PMID: 37379507 PMCID: PMC10557924 DOI: 10.1165/rcmb.2023-0153ps] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/28/2023] [Indexed: 06/30/2023]
Abstract
In recent years, the use of peripheral blood-derived big datasets in combination with machine learning technology has accelerated the understanding, prediction, and management of pulmonary and critical care conditions. The goal of this article is to provide readers with an introduction to the methods and applications of blood omics and other multiplex-based technologies in the pulmonary and critical care medicine setting to better appreciate the current literature in the field. To accomplish that, we provide essential concepts needed to rationalize this approach and introduce readers to the types of molecules that can be obtained from the circulating blood to generate big datasets; elaborate on the differences between bulk, sorted, and single-cell approaches; and describe the basic analytical pipelines required for clinical interpretation. Examples of peripheral blood-derived big datasets used in recent literature are presented, and limitations of that technology are highlighted to qualify both the current and future value of these methodologies.
Affiliation(s)
- Joseph Balnis
- Division of Pulmonary and Critical Care Medicine and
- Department of Molecular and Cellular Physiology, Albany Medical College, Albany, New York
| | - Eitel J. M. Lauria
- School of Computer Science and Mathematics, Marist College, Poughkeepsie, New York
| | - Recai Yucel
- Department of Epidemiology and Biostatistics, Temple University, Philadelphia, Pennsylvania; and
| | - Harold A. Singer
- Department of Molecular and Cellular Physiology, Albany Medical College, Albany, New York
| | - Reid S. Alisch
- Department of Neurological Surgery, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
| | - Ariel Jaitovich
- Division of Pulmonary and Critical Care Medicine and
- Department of Molecular and Cellular Physiology, Albany Medical College, Albany, New York
|
30
|
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the low sample sizes (34-286 subjects), high dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms with accuracy levels of 0.6-0.8 (AUC/ACC). Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Affiliation(s)
- Kai Ping Cheng
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
| | - Wan Xiang Shen
- Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
| | - Yu Yang Jiang
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
| | - Yan Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
| | - Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China.
| | - Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China; Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China.
|
31
|
Ghosh Roy G, Geard N, Verspoor K, He S. MPVNN: Mutated Pathway Visible Neural Network architecture for interpretable prediction of cancer-specific survival risk. Bioinformatics 2022; 38:5026-5032. [PMID: 36124954 DOI: 10.1093/bioinformatics/btac636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 08/04/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes with a lack of interpretability. More interpretable visible neural network architectures are designed using biological pathway knowledge. But they do not model how pathway structures can change for particular cancer types. RESULTS We propose a novel Mutated Pathway Visible Neural Network (MPVNN) architecture, designed using prior signaling pathway knowledge and random replacement of known pathway edges using gene mutation data simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction of MPVNN over other similar-sized NN and standard survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that is important in risk prediction for particular cancer types, is reliable. AVAILABILITY AND IMPLEMENTATION The data and code are available at https://github.com/gourabghoshroy/MPVNN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gourab Ghosh Roy
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK; School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
- Nicholas Geard
- School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
- Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia; School of Computing Technologies, RMIT University, Melbourne 3000, Australia
- Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
32
Baratta AM, Brandner AJ, Plasil SL, Rice RC, Farris SP. Advancements in Genomic and Behavioral Neuroscience Analysis for the Study of Normal and Pathological Brain Function. Front Mol Neurosci 2022; 15:905328. [PMID: 35813067 PMCID: PMC9259865 DOI: 10.3389/fnmol.2022.905328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Accepted: 06/06/2022] [Indexed: 11/16/2022] Open
Abstract
Psychiatric and neurological disorders are influenced by an undetermined number of genes and molecular pathways that may differ among afflicted individuals. Functionally testing and characterizing biological systems is essential to discovering the interrelationship among candidate genes and understanding the neurobiology of behavior. Recent advancements in genetic, genomic, and behavioral approaches are revolutionizing modern neuroscience. Although these tools are often used separately for independent experiments, combining these areas of research will provide a viable avenue for multidimensional studies on the brain. Herein we will briefly review some of the available tools that have been developed for characterizing novel cellular and animal models of human disease. A major challenge will be openly sharing resources and datasets to effectively integrate seemingly disparate types of information and how these systems impact human disorders. However, as these emerging technologies continue to be developed and adopted by the scientific community, they will bring about unprecedented opportunities in our understanding of molecular neuroscience and behavior.
Affiliation(s)
- Annalisa M. Baratta
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Adam J. Brandner
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Sonja L. Plasil
- Department of Pharmacology & Chemical Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Rachel C. Rice
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Sean P. Farris
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Anesthesiology and Perioperative Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
33
Divate M, Tyagi A, Richard DJ, Prasad PA, Gowda H, Nagaraj SH. Deep Learning-Based Pan-Cancer Classification Model Reveals Tissue-of-Origin Specific Gene Expression Signatures. Cancers (Basel) 2022; 14:cancers14051185. [PMID: 35267493 PMCID: PMC8909043 DOI: 10.3390/cancers14051185] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/12/2021] [Revised: 02/03/2022] [Accepted: 02/17/2022] [Indexed: 12/24/2022] Open
Abstract
Cancer tissue-of-origin specific biomarkers are needed for effective diagnosis, monitoring, and treatment of cancers. In this study, we analyzed transcriptomics data from 37 cancer types provided by The Cancer Genome Atlas (TCGA) to identify cancer tissue-of-origin specific gene expression signatures. We developed a deep neural network model to classify cancers based on gene expression data. The model achieved a predictive accuracy of >97% across cancer types indicating the presence of distinct cancer tissue-of-origin specific gene expression signatures. We interpreted the model using Shapley additive explanations to identify specific gene signatures that significantly contributed to cancer-type classification. We evaluated the model and the validity of gene signatures using an independent test data set from the International Cancer Genome Consortium. In conclusion, we present a robust neural network model for accurate classification of cancers based on gene expression data and also provide a list of gene signatures that are valuable for developing biomarker panels for determining cancer tissue-of-origin. These gene signatures serve as valuable biomarkers for determining tissue-of-origin for cancers of unknown primary.
Affiliation(s)
- Mayur Divate
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD 4059, Australia; (M.D.); (D.J.R.)
- Aayush Tyagi
- Indian Institute of Technology, IIT Delhi Main Rd., IIT Campus, Hauz Khas, New Delhi 110016, India; (A.T.); (P.A.P.)
- Derek J. Richard
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD 4059, Australia; (M.D.); (D.J.R.)
- Translational Research Institute, 37 Kent Street, Brisbane, QLD 4102, Australia
- Prathosh A. Prasad
- Indian Institute of Technology, IIT Delhi Main Rd., IIT Campus, Hauz Khas, New Delhi 110016, India; (A.T.); (P.A.P.)
- Department of Electrical Communication Engineering, Indian Institute of Science, Devasandra Layout, Bengaluru 560012, India
- Harsha Gowda
- QIMR Berghofer Medical Research Institute, 300 Herston Rd., Brisbane, QLD 4006, Australia
- Faculty of Health, Queensland University of Technology, Brisbane, QLD 4059, Australia
- Faculty of Medicine, The University of Queensland Mayne Medical School, 20 Weightman Street, Brisbane, QLD 4006, Australia
- Correspondence: (H.G.); (S.H.N.); Tel.: +61-733-620-452 (H.G.); +61-731-386-085 (S.H.N.)
- Shivashankar H. Nagaraj
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD 4059, Australia; (M.D.); (D.J.R.)
- Translational Research Institute, 37 Kent Street, Brisbane, QLD 4102, Australia
- Correspondence: (H.G.); (S.H.N.); Tel.: +61-733-620-452 (H.G.); +61-731-386-085 (S.H.N.)