1
Hsiao YC, Dutta A. Network Modeling and Control of Dynamic Disease Pathways, Review and Perspectives. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1211-1230. [PMID: 38498762 DOI: 10.1109/tcbb.2024.3378155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Dynamic disease pathways are combinations of complex dynamical processes among bio-molecules in a cell that lead to disease. Network modeling of disease pathways considers disease-related bio-molecules (e.g. DNA, RNA, transcription factors, enzymes, proteins, and metabolites) and their interactions (e.g. DNA methylation, histone modification, alternative splicing, and protein modification) to study disease progression and predict therapeutic responses. These bio-molecules and their interactions are the basic elements in the study of the misregulation of disease-related gene expression that leads to abnormal cellular responses. Gene regulatory networks, cell signaling networks, and metabolic networks are the three major types of intracellular networks for the study of the cellular responses elicited by extracellular signals. The disease-related cellular responses can be prevented or regulated by designing control strategies to manipulate these extracellular or other intracellular signals. The paper reviews the regulatory mechanisms, the dynamic models, and the control strategies for each intracellular network. The applications, limitations, and prospects for modeling and control are also discussed.
2
Li S, Yan B, Wu B, Su J, Lu J, Lam TW, Boheler KR, Poon ENY, Luo R. Integrated modeling framework reveals co-regulation of transcription factors, miRNAs and lncRNAs on cardiac developmental dynamics. Stem Cell Res Ther 2023; 14:247. [PMID: 37705079 PMCID: PMC10500942 DOI: 10.1186/s13287-023-03442-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 08/07/2023] [Indexed: 09/15/2023] Open
Abstract
AIMS Dissecting complex interactions among transcription factors (TFs), microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is central to understanding heart development and function. Although computational approaches and platforms have been described to infer relationships among regulatory factors and genes, current approaches do not adequately account for how highly diverse, interacting regulators that include noncoding RNAs (ncRNAs) control cardiac gene expression dynamics over time. METHODS To overcome this limitation, we devised an integrated framework, cardiac gene regulatory modeling (CGRM), that integrates the LogicTRN and regulatory component analysis bioinformatics modeling platforms to infer complex regulatory mechanisms. We then used CGRM to identify and compare the TF-ncRNA gene regulatory networks that govern early- and late-stage cardiomyocytes (CMs) generated by in vitro differentiation of human pluripotent stem cells (hPSC) and ventricular and atrial CMs isolated during in vivo human cardiac development. RESULTS Comparisons of in vitro versus in vivo derived CMs revealed conserved regulatory networks among TFs and ncRNAs in early cells that significantly diverged in late-stage cells. We report that cardiac genes ("heart targets") expressed in early-stage hPSC-CMs are primarily regulated by MESP1, miR-1, miR-23, and the lncRNAs NEAT1 and MALAT1, while GATA6, HAND2, miR-200c, NEAT1 and MALAT1 are critical for late hPSC-CMs. The inferred TF-miRNA-lncRNA networks regulating heart development and contraction were similar among early-stage CMs, among individual hPSC-CM datasets and between in vitro and in vivo samples. However, genes related to apoptosis, cell cycle and proliferation, and transmembrane transport showed a high degree of divergence between in vitro and in vivo derived late-stage CMs. Overall, late-stage, but not early-stage, CMs diverged greatly in the expression of "heart target" transcripts and their regulatory mechanisms.
CONCLUSIONS We find that hPSC-CMs are regulated in a cell-autonomous manner during early development that diverges significantly as a function of time when compared to in vivo derived CMs. These findings demonstrate the feasibility of using CGRM to reveal dynamic and complex transcriptional and posttranscriptional regulatory interactions that underlie cell-directed versus environment-dependent CM development. These results with in vitro versus in vivo derived CMs thus establish this approach for detailed analyses of heart disease and for the analysis of cell regulatory systems in other biomedical fields.
Affiliation(s)
- Shumin Li
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
- Bin Yan
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
  - State Key Laboratory of Pharmaceutical Biotechnology, The University of Hong Kong, Pokfulam, Hong Kong, China
- Binbin Wu
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong, China
  - Centre for Cardiovascular Genomics and Medicine, Lui Che Woo Institute of Innovative Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Junhao Su
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
- Jianliang Lu
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
- Tak-Wah Lam
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
- Kenneth R Boheler
  - The Division of Cardiology, Department of Medicine and The Whiting School of Engineering, Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, MD, 21205, USA
- Ellen Ngar-Yun Poon
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong, China
  - Centre for Cardiovascular Genomics and Medicine, Lui Che Woo Institute of Innovative Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong, China
  - Hong Kong Hub of Paediatric Excellence (HK HOPE), The Chinese University of Hong Kong, Kowloon Bay, Hong Kong, China
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
3
Pandey AK, Loscalzo J. Network medicine: an approach to complex kidney disease phenotypes. Nat Rev Nephrol 2023:10.1038/s41581-023-00705-0. [PMID: 37041415 DOI: 10.1038/s41581-023-00705-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 03/13/2023] [Indexed: 04/13/2023]
Abstract
Scientific reductionism has been the basis of disease classification and understanding for more than a century. However, the reductionist approach of characterizing diseases from a limited set of clinical observations and laboratory evaluations has proven insufficient in the face of an exponential growth in data generated from transcriptomics, proteomics, metabolomics and deep phenotyping. A new systematic method is necessary to organize these datasets and build new definitions of what constitutes a disease that incorporates both biological and environmental factors to more precisely describe the ever-growing complexity of phenotypes and their underlying molecular determinants. Network medicine provides such a conceptual framework to bridge these vast quantities of data while providing an individualized understanding of disease. The modern application of network medicine principles is yielding new insights into the pathobiology of chronic kidney diseases and renovascular disorders by expanding the understanding of pathogenic mediators, novel biomarkers and new options for renal therapeutics. These efforts affirm network medicine as a robust paradigm for elucidating new advances in the diagnosis and treatment of kidney disorders.
Affiliation(s)
- Arvind K Pandey
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
- Joseph Loscalzo
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
4
Guedj M, Swindle J, Hamon A, Hubert S, Desvaux E, Laplume J, Xuereb L, Lefebvre C, Haudry Y, Gabarroca C, Aussy A, Laigle L, Dupin-Roger I, Moingeon P. Industrializing AI-powered drug discovery: lessons learned from the Patrimony computing platform. Expert Opin Drug Discov 2022; 17:815-824. [PMID: 35786124 DOI: 10.1080/17460441.2022.2095368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
INTRODUCTION As a mid-size international pharmaceutical company, we launched four years ago a dedicated high-throughput computing platform supporting drug discovery. The platform, named "Patrimony", was built on the initial premise of capitalizing on our proprietary data while leveraging public data sources in order to foster a Computational Precision Medicine approach with the power of Artificial Intelligence. AREAS COVERED Specifically, Patrimony is designed to identify novel therapeutic target candidates. With several successful use cases in immuno-inflammatory diseases, and an ongoing extension to applications in oncology and neurology, we document how this industrial computational platform has had a transformational impact on our R&D, making it more competitive as well as more time- and cost-effective through a model-based, educated selection of therapeutic targets and drug candidates. EXPERT OPINION We report our achievements, but also our challenges, in implementing data access and governance processes, building up hardware and user interfaces, and acculturating scientists to use predictive models to inform decisions.
Affiliation(s)
- Mickaël Guedj
  - Servier, Research & Development, Suresnes Cedex, France
- Jack Swindle
  - Lincoln, Research & Development, Boulogne-Billancourt Cedex, France
- Antoine Hamon
  - Lincoln, Research & Development, Boulogne-Billancourt Cedex, France
- Sandra Hubert
  - Servier, Research & Development, Suresnes Cedex, France
- Emiko Desvaux
  - Servier, Research & Development, Suresnes Cedex, France
- Laura Xuereb
  - Servier, Research & Development, Suresnes Cedex, France
- Audrey Aussy
  - Servier, Research & Development, Suresnes Cedex, France
5
Liu S, You Y, Tong Z, Zhang L. Developing an Embedding, Koopman and Autoencoder Technologies-Based Multi-Omics Time Series Predictive Model (EKATP) for Systems Biology research. Front Genet 2021; 12:761629. [PMID: 34764986 PMCID: PMC8576451 DOI: 10.3389/fgene.2021.761629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 09/27/2021] [Indexed: 11/13/2022] Open
Abstract
It is very important for systems biologists to predict the state of the multi-omics time series for disease occurrence and health detection. However, it is difficult to make the prediction due to the high-dimensional, nonlinear and noisy characteristics of the multi-omics time series data. For this reason, this study innovatively proposes an Embedding, Koopman and Autoencoder technologies-based multi-omics time series predictive model (EKATP) to predict the future state of a high-dimensional nonlinear multi-omics time series. We evaluate this EKATP by using a genomics time series with chaotic behavior, a proteomics time series with oscillating behavior and a metabolomics time series with flow behavior. The computational experiments demonstrate that our proposed EKATP can substantially improve the accuracy, robustness and generalizability to predict the future state of a time series for multi-omics data.
Affiliation(s)
- Suran Liu
  - College of Computer Science, Sichuan University, Chengdu, China
- Yujie You
  - College of Computer Science, Sichuan University, Chengdu, China
- Zhaoqi Tong
  - College of Software Engineering, Sichuan University, Chengdu, China
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu, China
6
Sun X, Zhang J, Nie Q. Inferring latent temporal progression and regulatory networks from cross-sectional transcriptomic data of cancer samples. PLoS Comput Biol 2021; 17:e1008379. [PMID: 33667222 PMCID: PMC7968745 DOI: 10.1371/journal.pcbi.1008379] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/03/2020] [Revised: 03/17/2021] [Accepted: 02/15/2021] [Indexed: 12/19/2022] Open
Abstract
Unraveling molecular regulatory networks underlying disease progression is critically important for understanding disease mechanisms and identifying drug targets. The existing methods for inferring gene regulatory networks (GRNs) rely mainly on time-course gene expression data. However, most available omics data come from cross-sectional studies of cancer patients and often lack sufficient temporal information, which poses a key challenge for GRN inference. By quantifying the latent progression using a random walk-based manifold distance, we propose a latent-temporal progression-based Bayesian method, PROB, for inferring GRNs from the cross-sectional transcriptomic data of tumor samples. The robustness of PROB to measurement variability in the data is mathematically proved and numerically verified. Performance evaluation on real data indicates that PROB outperforms other methods in both pseudotime inference and GRN inference. Applications to bladder cancer and breast cancer demonstrate that our method is effective in identifying key regulators of cancer progression or drug targets. The identified ACSS1 is experimentally validated to promote epithelial-to-mesenchymal transition of bladder cancer cells, and the predicted FOXM1-target interactions are verified and are predictive of relapse in breast cancer. Our study suggests new effective approaches to modeling clinical transcriptomic data for characterizing cancer progression and facilitates the translation of regulatory network-based approaches into precision medicine.
Affiliation(s)
- Xiaoqiang Sun
  - Key Laboratory of Tropical Disease Control, Chinese Ministry of Education; Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, China
- Ji Zhang
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, China
- Qing Nie
  - Department of Mathematics and Department of Developmental & Cell Biology, NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, California, United States of America
7
Oh VKS, Li RW. Temporal Dynamic Methods for Bulk RNA-Seq Time Series Data. Genes (Basel) 2021; 12:352. [PMID: 33673721 PMCID: PMC7997275 DOI: 10.3390/genes12030352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/26/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 02/06/2023] Open
Abstract
Dynamic studies in time course experimental designs and clinical approaches have been widely used by the biomedical community. These applications are particularly relevant in stimuli-response models under environmental conditions, characterization of gradient biological processes in developmental biology, identification of therapeutic effects in clinical trials, disease progression models, cell-cycle, and circadian periodicity. Despite their feasibility and popularity, sophisticated dynamic methods have been benchmarked less often in large-scale comparative studies, in terms of statistical and computational rigor, than their static counterparts. To date, a number of novel methods for bulk RNA-Seq data have been developed for various time-dependent stimuli, circadian rhythms, cell lineage in differentiation, and disease progression. Here, we comprehensively review a key set of representative dynamic strategies and discuss current issues associated with the detection of dynamically changing genes. We also provide recommendations for future directions for studying non-periodic and periodic time course data, and meta-dynamic datasets.
Affiliation(s)
- Vera-Khlara S. Oh
  - Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
  - Department of Computer Science and Statistics, College of Natural Sciences, Jeju National University, Jeju City 63243, Korea
- Robert W. Li
  - Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
8
Zaborowski AB, Walther D. Determinants of correlated expression of transcription factors and their target genes. Nucleic Acids Res 2020; 48:11347-11369. [PMID: 33104784 PMCID: PMC7672440 DOI: 10.1093/nar/gkaa927] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/07/2020] [Revised: 10/01/2020] [Accepted: 10/06/2020] [Indexed: 11/14/2022] Open
Abstract
While transcription factors (TFs) are known to regulate the expression of their target genes (TGs), only a weak correlation of expression between TFs and their TGs has generally been observed. As lack of correlation could be caused by additional layers of regulation, the overall correlation distribution may hide the presence of a subset of regulatory TF-TG pairs with tight expression coupling. Using reported regulatory pairs in the plant Arabidopsis thaliana along with comprehensive gene expression information and testing a wide array of molecular features, we aimed to discern the molecular determinants of high expression correlation of TFs and their TGs. TF-family assignment, stress-response process involvement, short genomic distances of the TF-binding sites to the transcription start site of their TGs, few required protein-protein-interaction connections to establish physical interactions between the TF and polymerase-II, unambiguous TF-binding motifs, increased numbers of miRNA target-sites in TF-mRNAs, and a young evolutionary age of TGs were found particularly indicative of high TF-TG correlation. The modulating roles of post-transcriptional, post-translational processes, and epigenetic factors have been characterized as well. Our study reveals that regulatory pairs with high expression coupling are associated with specific molecular determinants.
Affiliation(s)
- Adam B Zaborowski
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Dirk Walther
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
9
Anguita-Ruiz A, Segura-Delgado A, Alcalá R, Aguilera CM, Alcalá-Fdez J. eXplainable Artificial Intelligence (XAI) for the identification of biologically relevant gene expression patterns in longitudinal human studies, insights from obesity research. PLoS Comput Biol 2020; 16:e1007792. [PMID: 32275707 PMCID: PMC7176286 DOI: 10.1371/journal.pcbi.1007792] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/26/2019] [Revised: 04/22/2020] [Accepted: 03/17/2020] [Indexed: 12/18/2022] Open
Abstract
To date, several machine learning approaches have been proposed for the dynamic modeling of temporal omics data. Although they have yielded impressive results in terms of model accuracy and predictive ability, most of these applications are based on "black-box" algorithms, and the research community has called for more interpretable models. The recent eXplainable Artificial Intelligence (XAI) revolution offers a solution to this issue, where rule-based approaches are highly suitable for explanatory purposes. The further integration of the data mining process with functional-annotation and pathway analyses is an additional way towards more explanatory and biologically sound models. In this paper, we present a novel rule-based XAI strategy (including pre-processing, knowledge-extraction and functional validation) for finding biologically relevant sequential patterns from longitudinal human gene expression data (GED). To illustrate the performance of our pipeline, we work on in vivo temporal GED collected within the course of a long-term dietary intervention in 57 subjects with obesity (GSE77962). As validation populations, we employ three independent datasets following the same experimental design. As a result, we validate primarily extracted gene patterns and demonstrate the quality of our strategy for mining biologically relevant gene-gene temporal relations. Our whole pipeline has been released as open-source software and could be easily extended to other human temporal GED applications.
Affiliation(s)
- Augusto Anguita-Ruiz
  - Department of Biochemistry and Molecular Biology II, Institute of Nutrition and Food Technology "José Mataix", Center of Biomedical Research, University of Granada, Granada, Spain
  - Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain
  - CIBEROBN (Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Alberto Segura-Delgado
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Rafael Alcalá
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Concepción M. Aguilera
  - Department of Biochemistry and Molecular Biology II, Institute of Nutrition and Food Technology "José Mataix", Center of Biomedical Research, University of Granada, Granada, Spain
  - Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain
  - CIBEROBN (Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Jesús Alcalá-Fdez
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
10
Chandereng T, Gitter A. Lag penalized weighted correlation for time series clustering. BMC Bioinformatics 2020; 21:21. [PMID: 31948388 PMCID: PMC6966853 DOI: 10.1186/s12859-019-3324-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/16/2019] [Accepted: 12/16/2019] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure. RESULTS We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically-motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies. CONCLUSIONS LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and CRAN under an MIT license.
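The idea described in this abstract (align two profiles at several lags, then down-weight the correlation by the size of the lag) can be illustrated with a minimal Python sketch. This is not the LPWC package or its exact formula: the function name `lag_penalized_corr`, the plain Pearson correlation, the Gaussian-style penalty `exp(-penalty * lag**2)`, and the parameters `max_lag` and `penalty` are all assumptions chosen for illustration.

```python
import numpy as np

def lag_penalized_corr(x, y, max_lag=2, penalty=0.1):
    """Return (best_score, best_lag) over integer lags in [-max_lag, max_lag].

    Hypothetical simplification: score = Pearson correlation of the
    overlapping samples multiplied by exp(-penalty * lag**2).
    A positive lag means y is delayed relative to x by `lag` samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_score, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        # Shift y by `lag` samples and keep only the overlapping part.
        if lag >= 0:
            xs, ys = x[: len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: len(y) + lag]
        # Skip overlaps that are too short or constant (correlation undefined).
        if len(xs) < 3 or xs.std() == 0 or ys.std() == 0:
            continue
        corr = np.corrcoef(xs, ys)[0, 1]
        # Larger shifts are penalized, so small lags are preferred.
        score = corr * np.exp(-penalty * lag ** 2)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag

# Toy example: y is the same impulse as x, delayed by one time point.
x = [0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 1, 3, 1, 0, 0]
score, lag = lag_penalized_corr(x, y)
print(lag, round(score, 3))
```

In this toy example the best alignment is found at a lag of one time point, and the perfect correlation at that lag is reduced by the penalty factor. A pairwise matrix of such scores could then be passed to a standard clustering routine.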
Affiliation(s)
- Thevaa Chandereng
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI USA
  - Morgridge Institute of Research, Madison, WI USA
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI USA
  - Morgridge Institute of Research, Madison, WI USA
11
García-Gutiérrez MS, Navarrete F, Sala F, Gasparyan A, Austrich-Olivares A, Manzanares J. Biomarkers in Psychiatry: Concept, Definition, Types and Relevance to the Clinical Reality. Front Psychiatry 2020; 11:432. [PMID: 32499729 PMCID: PMC7243207 DOI: 10.3389/fpsyt.2020.00432] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Received: 01/15/2020] [Accepted: 04/28/2020] [Indexed: 12/12/2022] Open
Abstract
In recent years, an extraordinary effort has been made to identify biomarkers as potential tools for improving prevention, diagnosis, drug response and drug development in psychiatric disorders. Unlike other diseases, mental illnesses are classified by diagnostic categories with a broad and varied list of symptoms. Consequently, patients diagnosed with the same psychiatric illness show great heterogeneity in their clinical presentation. This fact, together with the incomplete knowledge of the neurochemical alterations underlying mental disorders, contributes to the limited efficacy of current pharmacological options. In this respect, the identification of biomarkers in psychiatry is becoming essential to facilitate diagnosis through the development of markers that allow groups within the syndrome to be stratified, which in turn may lead to more focused treatment options. In order to shed light on this issue, this review summarizes the concept and types of biomarkers, including an operational definition for therapeutic development. In addition, the advances in this field are summarized and sorted into five categories: genetics, transcriptomics, proteomics, metabolomics, and epigenetics. While promising results have been achieved, there is a lack of biomarker investigations, especially related to treatment response in psychiatric conditions. This review closes with a conclusion on the future challenges that must be met to reach the goal of developing valid, reliable and broadly usable biomarkers for psychiatric disorders and their treatment. The identification of factors predicting treatment response will reduce trial-and-error switches of medications and facilitate the discovery of new effective treatments, a crucial step towards the establishment of more personalized medicine.
Affiliation(s)
- Maria Salud García-Gutiérrez
  - Instituto de Neurociencias, Universidad Miguel Hernández-CSIC, Alicante, Spain
  - Red Temática de Investigación Cooperativa en Salud (RETICS), Red de Trastornos Adictivos, Instituto de Salud Carlos III, MICINN and FEDER, Madrid, Spain
- Francisco Navarrete
  - Instituto de Neurociencias, Universidad Miguel Hernández-CSIC, Alicante, Spain
  - Red Temática de Investigación Cooperativa en Salud (RETICS), Red de Trastornos Adictivos, Instituto de Salud Carlos III, MICINN and FEDER, Madrid, Spain
- Francisco Sala
  - Instituto de Neurociencias, Universidad Miguel Hernández-CSIC, Alicante, Spain
- Ani Gasparyan
  - Instituto de Neurociencias, Universidad Miguel Hernández-CSIC, Alicante, Spain
  - Red Temática de Investigación Cooperativa en Salud (RETICS), Red de Trastornos Adictivos, Instituto de Salud Carlos III, MICINN and FEDER, Madrid, Spain
- Jorge Manzanares
  - Instituto de Neurociencias, Universidad Miguel Hernández-CSIC, Alicante, Spain
  - Red Temática de Investigación Cooperativa en Salud (RETICS), Red de Trastornos Adictivos, Instituto de Salud Carlos III, MICINN and FEDER, Madrid, Spain
12
Liang Y, Kelemen A, Kelemen A. Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0039. [PMID: 31077580 DOI: 10.1515/sagmb-2018-0039] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, various biological factors such as environmental confounders, and the inherent experimental and technical noise, compounded with the inadequacy of statistical tools, can lead to the misinterpretation of results, and subsequently to very different biology. In this paper, we investigate biomarker reproducibility issues potentially caused by differences between statistical methods with varied distribution assumptions or marker selection criteria, using mass spectrometry proteomic ovarian tumor data. We examine the relationship between effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compare the markers identified by statistical single-feature selection approaches with those from machine learning wrapper methods. The results reveal marked differences when selecting the protein markers with varied methods, with potential selection biases and false discoveries, which may be due to the small effects, different distribution assumptions, and p value type criteria versus prediction accuracies. Alternative solutions and other related issues are discussed in support of the reproducibility of findings for clinically actionable outcomes.
Affiliation(s)
- Yulan Liang
  - Department of Family and Community Health, University of Maryland, Baltimore, MD 21201-1579, USA
- Adam Kelemen
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Arpad Kelemen
  - Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201-1579, USA
13
Sherman TD, Kagohara LT, Cao R, Cheng R, Satriano M, Considine M, Krigsfeld G, Ranaweera R, Tang Y, Jablonski SA, Stein-O'Brien G, Gaykalova DA, Weiner LM, Chung CH, Fertig EJ. CancerInSilico: An R/Bioconductor package for combining mathematical and statistical modeling to simulate time course bulk and single cell gene expression data in cancer. PLoS Comput Biol 2019; 14:e1006935. [PMID: 31002670 PMCID: PMC6504085 DOI: 10.1371/journal.pcbi.1006935] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/04/2018] [Revised: 05/07/2019] [Accepted: 03/11/2019] [Indexed: 11/18/2022] Open
Abstract
Bioinformatics techniques to analyze time course bulk and single cell omics data are advancing. The absence of a known ground truth of the dynamics of molecular changes challenges benchmarking their performance on real data. Realistic simulated time-course datasets are essential to assess the performance of time course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. This package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use this package to simulate a gene expression data set and consequently benchmark analysis methods on this data set with a known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/
Affiliation(s)
- Thomas D. Sherman
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
- * E-mail: (TDS); (EJF)
| | - Luciane T. Kagohara
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
| | - Raymon Cao
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
| | - Raymond Cheng
- Science, Math and Computer Science Magnet Program, Poolesville High School, Poolesville, MD, United States of America
| | - Matthew Satriano
- Department of Mathematics, University of Waterloo, Waterloo, Ontario, Canada
| | - Michael Considine
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
| | - Gabriel Krigsfeld
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
| | | | - Yong Tang
- Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, United States of America
| | - Sandra A. Jablonski
- Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, United States of America
| | - Genevieve Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
- Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, United States of America
| | - Daria A. Gaykalova
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
| | - Louis M. Weiner
- Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, United States of America
| | | | - Elana J. Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- * E-mail: (TDS); (EJF)
| |
|
14
|
Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018; 34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]
Abstract
Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.
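To make the idea of recovering low-dimensional structure concrete, the following is a minimal sketch of one matrix factorization variant, non-negative matrix factorization with Lee-Seung multiplicative updates. The abstract does not single out this algorithm; the choice, the helper names, and the toy matrix are assumptions made for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix v (features x samples) as w times h.

    w (features x k) holds feature weights per pattern and h (k x samples)
    holds pattern activity per sample. Multiplicative updates keep both
    factors non-negative; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h: h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update w: w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v minus w times h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_a in zip(v, approx)
               for x, y in zip(row_v, row_a)) ** 0.5


# Hypothetical toy data: 4 features driven by 2 underlying patterns
# across 5 samples (for example, 5 time points).
TOY_V = matmul([[1, 0], [2, 0], [0, 1], [0, 3]],
               [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
```

Calling `nmf(TOY_V, k=2)` returns the two factors; comparing `frobenius_error` across values of `k` is one simple way to judge how many patterns the data support.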
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Raman Arora
- Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
| | - Aedin C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Alexander V Favorov
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
| | | | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
| | - Loyal A Goff
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Yifeng Li
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
| | - Aloune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
| | - Michael F Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
| | - Yanxun Xu
- Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| |
|
15
|
Time to build on good design: Resolving the temporal dynamics of gene regulatory networks. Proc Natl Acad Sci U S A 2018; 115:6325-6327. [PMID: 29871952 DOI: 10.1073/pnas.1807707115] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
|