1
|
drexml: A command line tool and Python package for drug repurposing. Comput Struct Biotechnol J 2024; 23:1129-1143. [PMID: 38510973 PMCID: PMC10950807 DOI: 10.1016/j.csbj.2024.02.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/27/2024] [Accepted: 02/27/2024] [Indexed: 03/22/2024] Open
Abstract
We introduce drexml, a command line tool and Python package for rational data-driven drug repurposing. The package employs machine learning and mechanistic signal transduction modeling to identify drug targets capable of regulating a particular disease. In addition, it employs explainability tools to contextualize potential drug targets within the functional landscape of the disease. The methodology is validated in Fanconi Anemia and Familial Melanoma, two distinct rare diseases where there is a pressing need for solutions. In the Fanconi Anemia case, the model successfully predicts previously validated repurposed drugs, while in the Familial Melanoma case, it identifies a promising set of drugs for further investigation.
|
2
|
Evidence of the association between increased use of direct oral anticoagulants and a reduction in the rate of atrial fibrillation-related stroke and major bleeding at the population level (2012-2019). Med Clin (Barc) 2024; 162:220-227. [PMID: 37989706 DOI: 10.1016/j.medcli.2023.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 11/23/2023]
Abstract
BACKGROUND The introduction of direct-acting oral anticoagulants (DOACs) has been shown to decrease atrial fibrillation (AF)-related stroke and bleeding rates in clinical studies, but there is no conclusive evidence about their effects at the population level. Our aim was to assess changes in AF-related stroke and major bleeding rates between 2012 and 2019 in Andalusia (Spain), and the association between DOAC use and event rates at the population level. METHODS All patients with an AF diagnosis from 2012 to 2019 were identified using the Andalusian Health Population Base, which provides clinical information on all Andalusian people. Annual ischemic and hemorrhagic stroke rates, major bleeding rates, and the antithrombotic treatments used were determined. Marginal hazard ratios (HR) were calculated for each treatment. RESULTS A total of 95,085 patients with an AF diagnosis were identified. Mean age was 76.1±10.2 years (49.7% women). An increase in the use of DOACs was observed throughout the study period in both males and females (p<0.001). The annual rate of ischemic stroke decreased by one third, while that of hemorrhagic stroke and major bleeding decreased 2-3-fold from 2012 to 2019. Marginal HR was lower than 0.50 for DOACs compared to vitamin K antagonists (VKA) for all ischemic or hemorrhagic events. CONCLUSIONS In this contemporary population-based study using clinical and administrative databases in Andalusia, a significant reduction in the incidence of AF-related ischemic and hemorrhagic stroke and major bleeding was observed between 2012 and 2019. The increased use of DOACs seems to be associated with this reduction.
|
3
|
The mechanistic functional landscape of retinitis pigmentosa: a machine learning-driven approach to therapeutic target discovery. J Transl Med 2024; 22:139. [PMID: 38321543 PMCID: PMC10848380 DOI: 10.1186/s12967-024-04911-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 01/20/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND Retinitis pigmentosa (RP) is the prevailing genetic cause of blindness in developed nations, with no effective treatments. In the pursuit of unraveling the intricate dynamics underlying this complex disease, mechanistic models emerge as a tool of proven efficiency, rooted in systems biology, to elucidate the interplay between RP genes and their mechanisms. The integration of mechanistic models and drug-target interactions under the umbrella of machine learning methodologies provides a multifaceted approach that can boost the discovery of novel therapeutic targets, facilitating further drug repurposing in RP. METHODS By mapping Retinitis Pigmentosa-related genes (obtained from Orphanet, OMIM and HPO databases) onto KEGG signaling pathways, a collection of signaling functional circuits encompassing Retinitis Pigmentosa molecular mechanisms was defined. Next, a mechanistic model of the so-defined disease map, where the effects of interventions can be simulated, was built. Then, an explainable multi-output random forest regressor was trained using normal tissue transcriptomic data to learn causal connections between targets of approved drugs from DrugBank and the functional circuits of the mechanistic disease map. The involvement of selected target genes was validated in rd10 mice, a murine model of Retinitis Pigmentosa. RESULTS A mechanistic functional map of Retinitis Pigmentosa was constructed, resulting in 226 functional circuits belonging to 40 KEGG signaling pathways. The method predicted 109 targets of approved drugs in use with a potential effect over circuits corresponding to nine hallmarks identified. Five of those targets were selected and experimentally validated in rd10 mice: Gabre, Gabra1 (GABARα1 protein), Slc12a5 (KCC2 protein), Grin1 (NR1 protein) and Glr2a. As a result, we provide a resource to evaluate the potential impact of drug target genes in Retinitis Pigmentosa.
CONCLUSIONS The possibility of building actionable disease models in combination with machine learning algorithms to learn causal drug-disease interactions opens new avenues for boosting drug discovery. Such mechanistically-based hypotheses can guide and accelerate experimental validation by prioritizing drug target candidates. In this work, a mechanistic model describing the functional disease map of Retinitis Pigmentosa was developed, identifying five promising therapeutic candidates targeted by approved drugs. Further experimental validation will demonstrate the efficiency of this approach for a systematic application to other rare diseases.
|
4
|
Real-world evidence with a retrospective cohort of 15,968 COVID-19 hospitalized patients suggests 21 new effective treatments. Virol J 2023; 20:226. [PMID: 37803348 PMCID: PMC10559601 DOI: 10.1186/s12985-023-02195-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 09/27/2023] [Indexed: 10/08/2023] Open
Abstract
PURPOSE Despite the extensive vaccination campaigns in many countries, COVID-19 is still a major worldwide health problem because of its associated morbidity and mortality. Therefore, finding efficient treatments as fast as possible is a pressing need. Drug repurposing constitutes a convenient alternative when the need for new drugs in an unexpected medical scenario is urgent, as is the case with COVID-19. METHODS Using data from a central registry of electronic health records (the Andalusian Population Health Database), the effect of drugs taken for other indications before hospitalization on patient outcomes, including survival and lymphocyte progression, was studied in a retrospective cohort of 15,968 individuals, comprising all COVID-19 patients hospitalized in Andalusia between January and November 2020. RESULTS Covariate-adjusted hazard ratios and analysis of lymphocyte progression curves support a significant association between consumption of 21 different drugs and better patient survival. Conversely, one drug, furosemide, was associated with a significant increase in patient mortality. CONCLUSIONS In this study we have taken advantage of the availability of a regional clinical database to study the effect of drugs, which patients were taking for other indications, on their survival. The large size of the database allowed us to control covariates effectively.
|
5
|
Functional Profiling of Soft Tissue Sarcoma Using Mechanistic Models. Int J Mol Sci 2023; 24:14732. [PMID: 37834179 PMCID: PMC10572617 DOI: 10.3390/ijms241914732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/08/2023] [Accepted: 09/26/2023] [Indexed: 10/15/2023] Open
Abstract
Soft tissue sarcoma is an umbrella term for a group of rare cancers that are difficult to treat. In addition to surgery, neoadjuvant chemotherapy has shown the potential to downstage tumors and prevent micrometastases. However, finding effective therapeutic targets remains a research challenge. Here, a previously developed computational approach called mechanistic models of signaling pathways has been employed to unravel the impact of observed changes at the gene expression level on the ultimate functional behavior of cells. In the context of such a mechanistic model, RNA-Seq counts sourced from the Recount3 resource, from The Cancer Genome Atlas (TCGA) Sarcoma project, and non-diseased sarcomagenic tissues from the Genotype-Tissue Expression (GTEx) project were utilized to investigate signal transduction activity through signaling pathways. This approach provides a precise view of the relationship between sarcoma patient survival and the signaling landscape in tumors and their environment. Despite the distinct regulatory alterations observed in each sarcoma subtype, this study identified 13 signaling circuits, or elementary sub-pathways triggering specific cell functions, present across all subtypes, belonging to eight signaling pathways, which served as predictors for patient survival. Additionally, nine signaling circuits from five signaling pathways that highlighted the modifications tumor samples underwent in comparison to normal tissues were found. These results describe the protective role of the immune system, suggesting an anti-tumorigenic effect in the tumor microenvironment, in the process of tumor cell detachment and migration, or the dysregulation of ion homeostasis. Also, the analysis of signaling circuit intermediary proteins suggests multiple strategies for therapy.
|
6
|
Crosstalk between Metabolite Production and Signaling Activity in Breast Cancer. Int J Mol Sci 2023; 24:7450. [PMID: 37108611 PMCID: PMC10138666 DOI: 10.3390/ijms24087450] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/06/2023] [Revised: 04/11/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023] Open
Abstract
The reprogramming of metabolism is a recognized cancer hallmark. It is well known that different signaling pathways regulate and orchestrate this reprogramming, which contributes to cancer initiation and development. However, evidence is accumulating that several metabolites could play a relevant role in regulating signaling pathways. To assess the potential role of metabolites in the regulation of signaling pathways, both metabolic and signaling pathway activities of Breast invasive Carcinoma (BRCA) have been modeled using mechanistic models. Gaussian Processes, powerful machine learning methods, were used in combination with SHapley Additive exPlanations (SHAP), a recent methodology that conveys causality, to obtain potential causal relationships between the production of metabolites and the regulation of signaling pathways. A total of 317 metabolites were found to have a strong impact on signaling circuits. The results presented here point to a crosstalk between signaling and metabolic pathways that is more complex than previously thought.
|
7
|
SigPrimedNet: A Signaling-Informed Neural Network for scRNA-seq Annotation of Known and Unknown Cell Types. Biology (Basel) 2023; 12:579. [PMID: 37106779 PMCID: PMC10135788 DOI: 10.3390/biology12040579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 03/04/2023] [Accepted: 04/08/2023] [Indexed: 04/29/2023]
Abstract
Single-cell RNA sequencing is increasing our understanding of the behavior of complex tissues or organs by providing unprecedented detail on the complex cell type landscape at the level of individual cells. Cell type definition and functional annotation are key steps to understanding the molecular processes behind the underlying cellular communication machinery. However, the exponential growth of scRNA-seq data has made the task of manually annotating cells unfeasible, due not only to the unparalleled resolution of the technology but also to the ever-increasing heterogeneity of the data. Many supervised and unsupervised methods have been proposed to automatically annotate cells. Supervised approaches for cell-type annotation outperform unsupervised methods except when new (unknown) cell types are present. Here, we introduce SigPrimedNet, an artificial neural network approach that leverages (i) efficient training by means of a sparsity-inducing signaling circuits-informed layer, (ii) feature representation learning through supervised training, and (iii) unknown cell-type identification by fitting an anomaly detection method on the learned representation. We show that SigPrimedNet can efficiently annotate known cell types while keeping a low false-positive rate for unseen cells across a set of publicly available datasets. In addition, the learned representation acts as a proxy for signaling circuit activity measurements, which provide useful estimations of the cell functionalities.
|
8
|
An SPM-Enriched Marine Oil Supplement Shifted Microglia Polarization toward M2, Ameliorating Retinal Degeneration in rd10 Mice. Antioxidants (Basel) 2022; 12:98. [PMID: 36670960 PMCID: PMC9855087 DOI: 10.3390/antiox12010098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Revised: 12/03/2022] [Accepted: 12/13/2022] [Indexed: 01/04/2023] Open
Abstract
Retinitis pigmentosa (RP) is the most common inherited retinal dystrophy causing progressive vision loss. It is accompanied by chronic and sustained inflammation, including M1 microglia activation. This study evaluated the effect of an essential fatty acid (EFA) supplement containing specialized pro-resolving mediators (SPMs) on retinal degeneration and microglia activation in rd10 mice, a model of RP, as well as on LPS-stimulated BV2 cells. The EFA supplement was orally administered to mice from postnatal day (P)9 to P18. At P18, the electrical activity of the retina was examined by electroretinography (ERG) and innate behavior in response to light was measured. Retinal degeneration was studied via histology, including the TUNEL assay and microglia immunolabeling. Microglia polarization (M1/M2) was assessed by flow cytometry, qPCR, ELISA and histology. Redox status was analyzed by measuring antioxidant enzymes and markers of oxidative damage. Interestingly, the EFA supplement ameliorated retinal dysfunction and degeneration by improving ERG recording and sensitivity to light, and reducing photoreceptor cell loss. The EFA supplement reduced inflammation and microglia activation, attenuating M1 markers as well as inducing a shift to the M2 phenotype in rd10 mouse retinas and LPS-stimulated BV2 cells. It also reduced oxidative stress markers of lipid peroxidation and carbonylation. These findings could open up new therapeutic opportunities based on resolving inflammation with oral supplementation with SPMs such as the EFA supplement.
|
9
|
Discovering potential interactions between rare diseases and COVID-19 by combining mechanistic models of viral infection with statistical modeling. Hum Mol Genet 2022; 31:2078-2089. [PMID: 35022696 PMCID: PMC9239744 DOI: 10.1093/hmg/ddac007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Revised: 12/30/2021] [Accepted: 01/10/2022] [Indexed: 11/28/2022] Open
Abstract
Recent studies have demonstrated a relevant role of host genetics in the prognosis of coronavirus disease 2019 (COVID-19). Most of the 7000 rare diseases described to date have a genetic component, typically highly penetrant. However, this vast spectrum of genetic variability remains unexplored with respect to possible interactions with COVID-19. Here, a mathematical mechanistic model of the COVID-19 molecular disease mechanism has been used to detect potential interactions between rare disease genes and the COVID-19 infection process and downstream consequences. Out of the 2518 disease genes analyzed, causative of 3854 rare diseases, a total of 254 genes have a direct effect on the COVID-19 molecular disease mechanism and 207 have an indirect effect revealed by a significant strong correlation. This remarkable potential of interaction occurs for >300 rare diseases. Mechanistic modeling of the COVID-19 disease map has allowed a holistic systematic analysis of the potential interactions between the loss of function in known rare disease genes and the pathological consequences of COVID-19 infection. The results identify links between disease genes and COVID-19 hallmarks and demonstrate the usefulness of the proposed approach for future preventive measures in some rare diseases.
|
10
|
Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data. BioData Min 2022; 15:1. [PMID: 34980200 PMCID: PMC8722116 DOI: 10.1186/s13040-021-00285-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/30/2021] [Accepted: 12/04/2021] [Indexed: 11/13/2022] Open
Abstract
Background Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of a lack of interpretability in the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data. Results In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of the scRNA-seq data. We have tested the proposed biologically-based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved to be comparable to other less interpretable DNN approaches constrained by using protein-protein interactions gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and to interpret the composition of human single cell datasets. Conclusions Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, results in an excellent alternative to learn a biologically meaningful representation of scRNA-seq data.
In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00285-4.
|
11
|
Real world evidence of calcifediol or vitamin D prescription and mortality rate of COVID-19 in a retrospective cohort of hospitalized Andalusian patients. Sci Rep 2021; 11:23380. [PMID: 34862422 PMCID: PMC8642445 DOI: 10.1038/s41598-021-02701-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/03/2021] [Accepted: 11/19/2021] [Indexed: 12/12/2022] Open
Abstract
COVID-19 is a major worldwide health problem because of acute respiratory distress syndrome and mortality. Several lines of evidence have suggested a relationship between the vitamin D endocrine system and severity of COVID-19. We present a survival study on a retrospective cohort of 15,968 patients, comprising all COVID-19 patients hospitalized in Andalusia between January and November 2020. Based on a central registry of electronic health records (the Andalusian Population Health Database, BPS), prescriptions of vitamin D or its metabolites within 15–30 days before hospitalization were recorded. The effect of prescription of vitamin D (metabolites) for other indications before hospitalization was studied with respect to patient survival. Kaplan–Meier survival curves and hazard ratios support an association between prescription of these metabolites and patient survival. This association was stronger for calcifediol (Hazard Ratio, HR = 0.67, with 95% confidence interval, CI, of [0.50–0.91]) than for cholecalciferol (HR = 0.75, with 95% CI of [0.61–0.91]) when prescribed 15 days prior to hospitalization. Although the relation is maintained, there is a general decrease of this effect when a longer period of 30 days prior to hospitalization is considered (calcifediol HR = 0.73, with 95% CI [0.57–0.95] and cholecalciferol HR = 0.88, with 95% CI [0.75–1.03]), suggesting that the association was stronger when the prescription was closer to the hospitalization.
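The reported hazard ratios and intervals can be examined with a little arithmetic. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log scale, which the abstract does not state; the helper name `log_hr_summary` is hypothetical. It recovers an approximate standard error of log(HR) from each interval and checks whether the interval excludes 1.

```python
import math

def log_hr_summary(hr, lower, upper, z_crit=1.96):
    """Approximate SE of log(HR), z statistic, and whether the CI excludes 1.

    Assumes a Wald-type 95% interval, symmetric on the log scale
    (an assumption; the abstract does not describe how CIs were built).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    excludes_one = not (lower <= 1.0 <= upper)
    return se, z, excludes_one

# Values as reported in the abstract: (HR, CI lower bound, CI upper bound).
reported = {
    "calcifediol, 15 days": (0.67, 0.50, 0.91),
    "cholecalciferol, 15 days": (0.75, 0.61, 0.91),
    "calcifediol, 30 days": (0.73, 0.57, 0.95),
    "cholecalciferol, 30 days": (0.88, 0.75, 1.03),
}

for label, (hr, lower, upper) in reported.items():
    se, z, excludes_one = log_hr_summary(hr, lower, upper)
    print(f"{label}: SE(log HR) ~ {se:.3f}, z ~ {z:.2f}, CI excludes 1: {excludes_one}")
```

Under this assumption, only the 30-day cholecalciferol interval includes 1, which is consistent with the abstract's remark that the effect weakens over the longer window.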
|
12
|
Highly accurate whole-genome imputation of SARS-CoV-2 from partial or low-quality sequences. Gigascience 2021; 10:giab078. [PMID: 34865008 PMCID: PMC8643610 DOI: 10.1093/gigascience/giab078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 10/26/2021] [Accepted: 11/12/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The current SARS-CoV-2 pandemic has emphasized the utility of viral whole-genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete and, therefore, useless sequences. Viral sequences evolve in the context of a complex phylogeny and different positions along the genome are in linkage disequilibrium. Therefore, an imputation method would be able to predict missing positions from the available sequencing data. RESULTS We have developed the impuSARS application, which takes advantage of the enormous number of SARS-CoV-2 genomes available, using a reference panel containing 239,301 sequences, to produce missing data imputation in viral genomes. ImpuSARS was tested in a wide range of conditions (continuous fragments, amplicons or sparse individual positions missing), showing great fidelity when reconstructing the original sequences and recovering the lineage with 100% precision for almost all the lineages, even in very poorly covered genomes (<20%). CONCLUSIONS Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would be otherwise discarded. ImpuSARS can be incorporated in any primary data processing pipeline for SARS-CoV-2 whole-genome sequencing.
|
13
|
MamaPred: A new and innovative approach to determine recurrence risk in HR+/HER2- early-stage breast cancer using HTG EdgeSeq technology. J Clin Oncol 2021. [DOI: 10.1200/jco.2021.39.15_suppl.558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
558 Background: Genomic platforms, such as Mammaprint (Agendia) (MP) and OncoType (Genomic Health) (OT), have been validated to determine the risk of relapse in therapeutic decision-making in early-stage hormone receptor positive (HR+), human epidermal growth factor receptor 2 (HER2) negative breast cancer (BC). Discordances in risk allocation between these platforms affect up to 30% of patients. This study aims to develop the MamaPred test to improve the diagnostic performance of recurrence risk in HR+/HER2- early-stage BC. Methods: A total of 606 HR+/HER2- early-stage BC previously tested with OT [n = 287; Low Risk (LR) = 165, Intermediate Risk (IR) = 103 and High Risk (HR) = 19] and MP (n = 319; LR = 217 and HR = 102) were included. A retrospective independent series of 144 HR+/HER2- early-stage BC [median follow-up: 10.53 years (range: 3.1-23.1 yrs); median age: 62.9 yrs (range: 33-89 yrs); systemic relapse: 10.5% (n = 15)] was used as validation set. The expression levels of 2560 cancer-related mRNAs were evaluated from one 5 μm thin-section of a FFPE block (15 mm² tumor area) using the Oncology Biomarker Panel (OBP) and the HTG EdgeSeq System (HTG Molecular Diagnostics, Inc.) and quantified by NGS on a NextSeq550 sequencer (Illumina). A predictive model was built from normalized and logarithmically transformed values (rescaled to [0, 1]) using as response a binary meta-variable constructed by taking the values -1 (for LR of MP and OT together with the OT IR) and 1 (for HR of MP and OT). Differential expression, GSEA and visualization were performed with the DESeq2, gage and pathview packages, respectively, in R v4.0.1. Results: MamaPred consists of a logistic regression classifier with an elastic net penalty (mix of L1 and L2 priors as regularizer) where the mixing parameter is optimized along with regularization strength by selecting the ones that minimize the area under the precision and recall curve over a validation split for each training fold.
Metrics of MamaPred were: balanced accuracy, 80.5%; Kappa, 0.562; specificity, 80.7%; and NPV, 91.4%. GSE analysis on differentially expressed genes (q < 0.1) showed four KEGG pathways overrepresented in HR (p < 0.05): adherens junction, tight junction, glutathione metabolism and focal adhesion; and two underrepresented: DNA replication (p = 0.0765) and pyrimidine metabolism (p = 0.086). The prognostic prediction of MamaPred was validated on the independent retrospective series, distant disease-free survival for HR and LR being 88.63% (95% CI: 78.72%-99.78%) and 98.1% (95% CI: 95.6%-100%) respectively (p = 0.00603). Correlation between the probabilities assigned to any given sample and its replicas was extremely high (r > 0.9, p < 1e-5). Conclusions: MamaPred identifies HR+/HER2- early-stage BC patients with high risk of distant relapse, improving the prognostic value of those studies that compare MP and OT, suggesting a more precise risk classification.
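The four metrics quoted above all derive from a 2x2 confusion matrix. As a minimal sketch of how each is defined, the function below computes them from true/false positive and negative counts. The counts in the example are hypothetical and chosen only for round numbers; they are not the MamaPred data and do not reproduce the reported values. The function name `binary_metrics` is likewise an illustrative choice.

```python
def binary_metrics(tp, fp, tn, fn):
    """Balanced accuracy, Cohen's kappa, specificity and NPV from a 2x2 table."""
    sensitivity = tp / (tp + fn)   # recall on the positive (high-risk) class
    specificity = tn / (tn + fp)   # recall on the negative (low-risk) class
    npv = tn / (tn + fn)           # negative predictive value
    balanced_accuracy = (sensitivity + specificity) / 2

    n = tp + fp + tn + fn
    observed = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1 - expected)

    return {
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
        "specificity": specificity,
        "npv": npv,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy and kappa are often reported together for imbalanced classes, such as a minority of high-risk patients, because plain accuracy can look high for a classifier that mostly predicts the majority class.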
|
14
|
Genome-scale mechanistic modeling of signaling pathways made easy: A bioconductor/cytoscape/web server framework for the analysis of omic data. Comput Struct Biotechnol J 2021; 19:2968-2978. [PMID: 34136096 PMCID: PMC8170118 DOI: 10.1016/j.csbj.2021.05.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/27/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 12/13/2022] Open
Abstract
Genome-scale mechanistic models of pathways are gaining importance for genomic data interpretation because they provide a natural link between genotype measurements (transcriptomics or genomics data) and the phenotype of the cell (its functional behavior). Moreover, mechanistic models can be used to predict the potential effect of interventions, including drug inhibitions. Here, we present the implementation of a mechanistic model of cell signaling for the interpretation of transcriptomic data as an R/Bioconductor package, a Cytoscape plugin and a web tool with enhanced functionality which includes building interpretable predictors, estimation of the effect of perturbations and assessment of the effect of mutations in complex scenarios.
|
15
|
Abstract
Here we present a web interface that implements a comprehensive mechanistic model of the SARS-CoV-2 disease map. In this framework, the detailed activity of the human signaling circuits related to the viral infection, from the entry and replication mechanisms to downstream consequences such as inflammation and antigenic response, can be inferred from gene expression experiments. Moreover, the effect of potential interventions, such as knock-downs or drug effects (currently the system models the effect of more than 8000 DrugBank drugs), can be studied. This freely available tool not only provides an unprecedentedly detailed view of the mechanisms of viral invasion and the consequences in the cell but also has the potential to become an invaluable asset in the search for efficient antiviral treatments.
|
16
|
CSVS, a crowdsourcing database of the Spanish population genetic variability. Nucleic Acids Res 2021; 49:D1130-D1137. [PMID: 32990755 PMCID: PMC7778906 DOI: 10.1093/nar/gkaa794] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/13/2020] [Revised: 09/08/2020] [Accepted: 09/10/2020] [Indexed: 01/01/2023] Open
Abstract
The knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has been revealed as a critical factor for the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. This database has been generated in a collaborative crowdsourcing effort collecting sequencing data produced by local genomic projects and for other purposes. Sequences have been grouped by ICD10 upper categories. A web interface allows querying the database removing one or more ICD10 categories. In this way, aggregated counts of allele frequencies of the pseudo-control Spanish population can be obtained for diseases belonging to the category removed. Interestingly, in addition to pseudo-control studies, some population studies can be made, as, for example, prevalence of pharmacogenomic variants, etc. In addition, this genomic data has been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. This is the first local repository of variability entirely produced by a crowdsourcing effort and constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network. CSVS can be accessed at: http://csvs.babelomics.org/.
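The pseudo-control idea described above, aggregating allele counts after removing one or more ICD10 categories, can be sketched in a few lines. This is an illustration of the aggregation logic only, not the CSVS web interface or its API; the data layout, the category labels and the function name `pseudo_control_frequency` are all hypothetical.

```python
def pseudo_control_frequency(counts_by_category, excluded):
    """Pooled alternate-allele frequency over all categories not excluded.

    counts_by_category maps a category label to a tuple
    (alternate_allele_count, total_allele_count) for a single variant.
    """
    alt = 0
    total = 0
    for category, (alt_count, allele_count) in counts_by_category.items():
        if category in excluded:
            continue  # drop samples grouped under the disease category of interest
        alt += alt_count
        total += allele_count
    if total == 0:
        raise ValueError("no alleles left after excluding categories")
    return alt / total

# Hypothetical counts for one variant, grouped by (made-up) category labels.
variant_counts = {
    "healthy": (3, 400),
    "circulatory": (5, 300),
    "nervous": (12, 300),
}

print(pseudo_control_frequency(variant_counts, excluded=set()))
print(pseudo_control_frequency(variant_counts, excluded={"nervous"}))
```

In this toy example the variant is enriched in one category, so excluding that category lowers the pooled frequency. That is the reason for removing the category under study before using the remaining samples as controls.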
|
17
|
SMN1 copy-number and sequence variant analysis from next-generation sequencing data. Hum Mutat 2020; 41:2073-2077. [PMID: 33058415 PMCID: PMC7756735 DOI: 10.1002/humu.24120] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/11/2020] [Revised: 08/17/2020] [Accepted: 09/17/2020] [Indexed: 12/31/2022]
Abstract
Spinal muscular atrophy (SMA) is a severe neuromuscular autosomal recessive disorder affecting 1/10,000 live births. Most SMA patients present homozygous deletion of SMN1, while the vast majority of SMA carriers present only a single SMN1 copy. The sequence similarity between SMN1 and SMN2 and the complexity of the SMN locus make the estimation of the SMN1 copy-number by next-generation sequencing (NGS) very difficult. Here, we present SMAca, the first Python tool to detect SMA carriers and estimate the absolute SMN1 copy-number using NGS data. Moreover, SMAca takes advantage of the knowledge of certain variants specific to SMN1 duplication to also identify silent carriers. This tool has been validated with a cohort of 326 samples from the Navarra 1000 Genomes Project (NAGEN1000). SMAca was developed with a focus on execution speed and easy installation. This combination makes it especially suitable for integration into production NGS pipelines. Source code and documentation are available at https://www.github.com/babelomics/SMAca.
|
18
|
Mechanistic models of signaling pathways deconvolute the glioblastoma single-cell functional landscape. NAR Cancer 2020; 2:zcaa011. [PMID: 34316686 PMCID: PMC8210212 DOI: 10.1093/narcan/zcaa011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/10/2020] [Revised: 06/08/2020] [Accepted: 06/11/2020] [Indexed: 02/07/2023] Open
Abstract
Single-cell RNA sequencing is revealing an unexpectedly large degree of heterogeneity in gene expression levels across cell populations. However, little is known about the functional consequences of this heterogeneity or the contribution of individual cell fate decisions to the collective behavior of the tissues these cells are part of. Here, we use mechanistic modeling of signaling circuits, which reveals a complex functional landscape at single-cell level. Different clusters of neoplastic glioblastoma cells have been defined according to their differences in signaling circuit activity profiles triggering specific cancer hallmarks, which suggest different functional strategies with distinct degrees of aggressiveness. Moreover, mechanistic modeling of the effects of targeted drug inhibitions at single-cell level revealed how, in some cells, the substitution of VEGFA, the target of bevacizumab, by other expressed proteins, such as PDGFD, KITLG and FGF2, keeps the VEGF pathway active and insensitive to VEGFA inhibition by the drug. Here, we describe for the first time mechanisms that individual cells use to avoid the effect of a targeted therapy, providing an explanation for the innate resistance to the treatment displayed by some cells. Our results suggest that mechanistic modeling could become an important asset for the definition of personalized therapeutic interventions.
|
19
|
Antibiotic resistance and metabolic profiles as functional biomarkers that accurately predict the geographic origin of city metagenomics samples. Biol Direct 2019; 14:15. [PMID: 31429791 PMCID: PMC6701120 DOI: 10.1186/s13062-019-0246-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/10/2018] [Accepted: 08/06/2019] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND The availability of hundreds of city microbiome profiles allows the development of increasingly accurate predictors of the origin of a sample based on its microbiota composition. Typical microbiome studies involve the analysis of bacterial abundance profiles. RESULTS Here we use a transformation of the conventional bacterial strain or gene abundance profiles to functional profiles that account for bacterial metabolism and other cell functionalities. These profiles are used as features for city classification in a machine learning algorithm that allows the extraction of the most relevant features for the classification. CONCLUSIONS We demonstrate here that the use of functional profiles not only accurately predicts the most likely origin of a sample but also provides an interesting functional point of view of the biogeography of the microbiota. Interestingly, we show how cities can be classified based on the observed profile of antibiotic resistances. REVIEWERS Open peer review: Reviewed by Jin Zhuang Dou, Jing Zhou, Torsten Semmler and Eran Elhaik.
|
20
|
Exploring the druggable space around the Fanconi anemia pathway using machine learning and mechanistic models. BMC Bioinformatics 2019; 20:370. [PMID: 31266445 PMCID: PMC6604281 DOI: 10.1186/s12859-019-2969-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/06/2019] [Accepted: 06/25/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND In spite of the abundance of genomic data, predictive models that describe phenotypes as a function of gene expression or mutations are difficult to obtain because they are affected by the curse of dimensionality, given the imbalance between the number of samples and the number of candidate genes. This is especially acute in scenarios in which samples are difficult to obtain, as is the case for rare diseases. RESULTS The application of multi-output regression machine learning methodologies to predict the potential effect of external proteins on the signaling circuits that trigger Fanconi anemia-related cell functionalities, inferred with a mechanistic model, allowed us to detect over 20 potential therapeutic targets. CONCLUSIONS The use of artificial intelligence methods for the prediction of potentially causal relationships between proteins of interest and cell activities associated with disease-related phenotypes opens promising avenues for the systematic search for new targets in rare diseases.
|