1. Castañeda-Partida L, Ocadiz-Delgado R, Sánchez-López JM, García-Villa E, Peñaloza-González JG, Velázquez-Aviña MM, Torres-Nava JR, Martín-Trejo JA, Solís-Labastida K, Guerra-Castillo FX, Bekker-Méndez VC, Rosales-García VH, Romero-Rodríguez D, Mojica-Espinoza R, Mendez-Tenorio A, Ramírez-Calzada CA, Álvarez-Ríos E, Mejía-Aranguré JM, Gariglio P. Global expression profiling of CD10+/CD19+ pre-B lymphoblasts from Hispanic B-ALL patients correlates with comparative TARGET database analysis. Discov Oncol 2022; 13:28. [PMID: 35445848 PMCID: PMC9023642 DOI: 10.1007/s12672-022-00480-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Accepted: 03/16/2022] [Indexed: 11/29/2022]
Abstract
Mexico City has one of the highest incidences of acute lymphoblastic leukemia (ALL) globally, and patients there show low survival and high relapse rates. To gain more insight into the molecular features of B-ALL in Mexican children, we used a fluorescence-activated cell sorting protocol to isolate CD10+/CD19+ precursor B lymphoblasts from four bone marrow (BM) and nine peripheral blood (PB) samples of B-ALL patients. The global gene expression profile (BM vs PB) revealed 136 differentially expressed genes: 62 were upregulated (45.6%) and 74 were downregulated (54.4%). Pearson's correlation coefficient was calculated to determine the similarity between pre-B lymphoblast populations. We selected 26 highly significant genes and validated 21 by RT-qPCR (CNN3, STON2, CALN1, RUNX2, GADD45A, CDC45, CDC20, PLK1, AIDA, HCK, LY86, GPR65, PIK3CG, LILRB2, IL7R, TCL1A, DOCK1, HIST1H3G, PTPN14, CD72, and NT5E). Two analyses were run: a gene set enrichment analysis of the total expression matrix and an ingenuity pathway analysis of the 136 differentially expressed genes. Both showed that the cell cycle was altered in the bone marrow, with four overexpressed genes (PLK1, CDC20, CDC45, and GADD45A). They also showed low expression of IL7R and PIK3CG, which are involved in B cell differentiation. A comparative bioinformatics analysis used 15 bone marrow and 10 peripheral blood samples from Hispanic B-ALL patients, collected by the TARGET program. It corroborated the observed genes, except for PIK3CG. We conclude that the Mexican and Hispanic B-ALL patients studied present common driver alterations and histotype-specific mutations. These could facilitate risk stratification and diagnostic accuracy and serve as potential therapeutic targets.
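The percentages above follow directly from the gene counts. Pearson's r is the usual similarity measure between two expression profiles. The sketch below assumes plain Python lists of normalized expression values. The BM and PB profile values are invented placeholders, and the sketch does not reproduce the study's normalization or software.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)


def percent_split(up, down):
    """Percentages of up- and downregulated genes among all DEGs, to one decimal."""
    total = up + down
    return round(100 * up / total, 1), round(100 * down / total, 1)


# Counts reported in the abstract: 62 up and 74 down out of 136 DEGs.
print(percent_split(62, 74))  # (45.6, 54.4)

# Hypothetical log-expression values for the same five genes in a BM and a PB population.
bm_profile = [5.1, 7.3, 2.2, 9.0, 4.4]
pb_profile = [4.8, 7.0, 2.9, 8.5, 4.1]
print(f"r = {pearson_r(bm_profile, pb_profile):.3f}")
```

An r close to 1 means the two populations rank and scale the genes similarly. A real comparison would run over thousands of genes rather than five.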
Affiliation(s)
- Laura Castañeda-Partida
- Laboratorio de Genética Toxicológica, Biología. Facultad de Estudios Profesionales Iztacala (FESI), Universidad Nacional Autónoma de México (UNAM), Tlalnepantla, Estado de México, Mexico
- Rodolfo Ocadiz-Delgado
- Laboratorio de Oncología Molecular, Departamento de Genética y Biología Molecular. Centro de Investigación y de Estudios Avanzados (Cinvestav), Ciudad de México, Mexico
- Enrique García-Villa
- Laboratorio de Oncología Molecular, Departamento de Genética y Biología Molecular. Centro de Investigación y de Estudios Avanzados (Cinvestav), Ciudad de México, Mexico
- Jorge Alfonso Martín-Trejo
- Servicio de Hematología, Hospital de Pediatría. Centro Médico Nacional (CMN) "Siglo XXI", Instituto Mexicano del Seguro Social (IMSS), Mexico City, Mexico
- Karina Solís-Labastida
- Servicio de Hematología, Hospital de Pediatría. Centro Médico Nacional (CMN) "Siglo XXI", Instituto Mexicano del Seguro Social (IMSS), Mexico City, Mexico
- Francisco Xavier Guerra-Castillo
- Unidad de Investigación Médica en Inmunología e Infectología, Hospital de Infectología "Dr. Daniel Mendez Hernández", "La Raza", IMSS, Mexico City, Mexico
- Vilma Carolina Bekker-Méndez
- Unidad de Investigación Médica en Inmunología e Infectología, Hospital de Infectología "Dr. Daniel Mendez Hernández", "La Raza", IMSS, Mexico City, Mexico
- Víctor Hugo Rosales-García
- Laboratorio de Citometría de Flujo, Laboratorios Nacionales de Servicios Experimentales, Centro de Investigación y de Estudios Avanzados (Cinvestav), Mexico City, Mexico
- Dámaris Romero-Rodríguez
- Unidad de Citometría, Instituto Nacional de Enfermedades Respiratorias (INER), Mexico City, Mexico
- Raúl Mojica-Espinoza
- Unidad de Genotipificación y Análisis de Expresión, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, Mexico
- Alfonso Mendez-Tenorio
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica. Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional. MX, Mexico City, Mexico
- Crystel A Ramírez-Calzada
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica. Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional. MX, Mexico City, Mexico
- Elízabeth Álvarez-Ríos
- Laboratorio de Oncología Molecular, Departamento de Genética y Biología Molecular. Centro de Investigación y de Estudios Avanzados (Cinvestav), Ciudad de México, Mexico
- Juan Manuel Mejía-Aranguré
- Unidad de Investigación Médica en Epidemiología Clínica, UMAE Hospital de Pediatría. Centro Médico Nacional (CMN) "Siglo XXI", Instituto Mexicano del Seguro Social (IMSS), Mexico City, Mexico
- Coordinación de Investigación en Salud, Instituto Mexicano del Seguro Social (IMSS), Mexico City, Mexico
- Facultad de Medicina, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Laboratorio de Genómica del Cáncer, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
- Patricio Gariglio
- Laboratorio de Oncología Molecular, Departamento de Genética y Biología Molecular. Centro de Investigación y de Estudios Avanzados (Cinvestav), Ciudad de México, Mexico
2. Prajapati B, Fatima M, Fatma M, Maddhesiya P, Arora H, Naskar T, Devasenapathy S, Seth P, Sinha S. Temporal transcriptome analysis of neuronal commitment reveals the preeminent role of the divergent lncRNA biotype and a critical candidate gene during differentiation. Cell Death Discov 2020; 6:28. [PMID: 32351715 PMCID: PMC7181654 DOI: 10.1038/s41420-020-0263-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/11/2019] [Revised: 03/19/2020] [Accepted: 04/02/2020] [Indexed: 02/08/2023]
Abstract
lncRNA genes can be "genic" or "intergenic", and genic lncRNAs can be further divided into six biotypes. We ran a genome-wide analysis of a publicly available data set on corticogenesis. It showed that the divergent lncRNA (XH) biotype was the most prominent during neural commitment; in this biotype, the lncRNA and the coding gene run in opposite directions in a head-to-head arrangement. Within this biotype, a coding gene/divergent RNA pair formed a major hub during neuronal differentiation. The pair consists of the BASP1 gene and the uncharacterized RNA loc285696 (hereafter referred to as BASP1-AS1). Experimental validation during in vitro differentiation of human neural progenitor cells (hNPCs) showed that BASP1-AS1 regulates the expression of its adjacent coding gene, BASP1. Both transcripts increased sharply on the first day of neuronal differentiation of hNPCs and fell steadily thereafter, reaching very low levels in differentiated neurons. BASP1-AS1 RNA and the BASP1 gene formed a molecular complex that also included the transcription factor TCF12. TCF12 is coded by the DYX1 locus, which is associated with inherited dyslexia and neurodevelopmental defects. Knockdown of BASP1-AS1, BASP1, or TCF12 impaired the neuronal differentiation of hNPCs and also increased cell proliferation. The impairment showed as a reduction in DCX- and TUJ1-positive cells and as reduced neurite length. A common set of critical genes was affected by the three molecules in the complex. Our study thus identified the role of the XH biotype and a novel mediator of neuronal differentiation: the complex of BASP1-AS1, BASP1, and TCF12. It also linked a neuronal differentiation pathway to inherited dyslexia.
Affiliation(s)
- Mahar Fatima
- National Brain Research Centre, Manesar, Gurgaon, Haryana, India
- Mena Fatma
- National Brain Research Centre, Manesar, Gurgaon, Haryana, India
- Himali Arora
- National Brain Research Centre, Manesar, Gurgaon, Haryana, India
- Teesta Naskar
- National Brain Research Centre, Manesar, Gurgaon, Haryana, India
- Pankaj Seth
- National Brain Research Centre, Manesar, Gurgaon, Haryana, India
- Subrata Sinha
- National Brain Research Centre, Manesar, Gurgaon, Haryana, India
- Department of Biochemistry, All India Institute of Medical Sciences, New Delhi, 110029, India
3. Ramos PIP, Arge LWP, Lima NCB, Fukutani KF, de Queiroz ATL. Leveraging User-Friendly Network Approaches to Extract Knowledge From High-Throughput Omics Datasets. Front Genet 2019; 10:1120. [PMID: 31798629 PMCID: PMC6863976 DOI: 10.3389/fgene.2019.01120] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/29/2019] [Accepted: 10/16/2019] [Indexed: 11/13/2022]
Abstract
Recent technological advances for the acquisition of multi-omics data have allowed an unprecedented understanding of the complex intricacies of biological systems. In parallel, a myriad of computational analysis techniques and bioinformatics tools have been developed, with many efforts directed towards the creation and interpretation of networks from this data. In this review, we begin by examining key network concepts and terminology. Then, computational tools that allow for their construction and analysis from high-throughput omics datasets are presented. We focus on the study of functional relationships such as co-expression, protein-protein interactions, and regulatory interactions that are particularly amenable to modeling using the framework of networks. We envisage that many potential users of these analytical strategies may not be completely literate in programming languages and code adaptation, and for this reason, emphasis is given to tools' user-friendliness, including plugins for the widely adopted Cytoscape software, an open-source, cross-platform tool for network analysis, visualization, and data integration.
Affiliation(s)
- Pablo Ivan Pereira Ramos
- Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Brazil
- Luis Willian Pacheco Arge
- Laboratório de Genética Molecular e Biotecnologia Vegetal, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Kiyoshi F. Fukutani
- Multinational Organization Network Sponsoring Translational and Epidemiological Research (MONSTER) Initiative, Fundação José Silveira, Salvador, Brazil
- Artur Trancoso L. de Queiroz
- Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Brazil
4. Wei H, Lou Q, Xu K, Yan M, Xia H, Ma X, Yu X, Luo L. Alternative splicing complexity contributes to genetic improvement of drought resistance in the rice maintainer HuHan2B. Sci Rep 2017; 7:11686. [PMID: 28916800 PMCID: PMC5601427 DOI: 10.1038/s41598-017-12020-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 01/31/2017] [Accepted: 09/01/2017] [Indexed: 12/19/2022]
Abstract
Water-saving and drought-resistance rice (WDR) breeding practices have greatly increased grain yield and drought resistance. To study the genetic basis of adaptation to drought, transcriptome sequences from the WDR maintainer line HuHan2B and the recurrent parent HanFengB were analyzed for alternative splicing (AS) complexity. Intron retention, the dominant AS type, accounted for 42% of the observed AS events. Differential expression analysis revealed that transcripts were preferentially expressed in different varieties and conditions. Gene ontology predictions pointed to different functions for the drought-induced transcripts in each line. In HuHan2B they were significantly enriched in genes involved in transcription regulation, chloroplast components, and response to abiotic stimulus. In HanFengB they were primarily enriched in developmental processes for reproduction. The regulatory network of transcription factors was driven by cohorts of transcript splicing targets. Because of AS complexity, this network showed more diverse regulatory relationships than our previous findings. Moreover, several genes were validated to accumulate novel splicing transcripts in a drought-induced manner. Together, these results suggest that HuHan2B and HanFengB share similar AS features. However, a subset of genes with increased levels of AS, involved in transcription regulatory networks, may add a further level of control. Through breeding, this control may contribute to the genetic improvement of drought resistance in the rice maintainer HuHan2B.
Affiliation(s)
- Haibin Wei
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Qiaojun Lou
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Kai Xu
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Ming Yan
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Hui Xia
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Xiaosong Ma
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Lijun Luo
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
5. Shi R, Wang JP, Lin YC, Li Q, Sun YH, Chen H, Sederoff RR, Chiang VL. Tissue and cell-type co-expression networks of transcription factors and wood component genes in Populus trichocarpa. Planta 2017; 245:927-938. [PMID: 28083709 DOI: 10.1007/s00425-016-2640-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/18/2016] [Accepted: 12/09/2016] [Indexed: 05/21/2023]
Abstract
Co-expression networks based on transcriptomes of major Populus trichocarpa tissues and specific cell types suggest that transcription factors redundantly control cell wall component biosynthetic genes in wood formation. We analyzed the transcriptomes of five Populus trichocarpa tissues (xylem, phloem, shoot, leaf, and root) and two wood-forming cell types (fiber and vessel). From these we assembled gene co-expression subnetworks associated with wood formation. We identified 165 transcription factors (TFs) that showed xylem-, fiber-, and vessel-specific expression. Of these 165 TFs, 101 co-expressed (correlation coefficient r > 0.7) with the 45 secondary cell wall cellulose, hemicellulose, and lignin biosynthetic genes. Each cell wall component gene co-expressed with 34 TFs on average, suggesting redundant control of cell wall component gene expression. Co-expression analysis split the 101 TFs and the 45 cell wall component genes into two distinct groups each (groups 1 and 2), based on their co-expression patterns. The group 1 TFs (44 members) are predominantly xylem- and fiber-specific. All of them are highly positively co-expressed with the group 1 cell wall component genes (30 members), suggesting their roles as major wood formation regulators. Group 1 TFs include a lateral organ boundary domain gene (LBD) with the highest number of positively correlated cell wall component genes (36) and TFs (47). The group 2 TFs have 57 members, including 14 vessel-specific TFs, and are generally less correlated with the cell wall component genes. An exception is a vessel-specific basic helix-loop-helix (bHLH) gene that correlates negatively with 20 cell wall component genes and may function as a key transcriptional suppressor. The co-expression networks revealed here suggest a well-structured transcriptional homeostasis for cell wall component biosynthesis during wood formation.
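The thresholding step described above links a TF and a cell wall gene when their expression profiles correlate above r = 0.7. It then counts how many TFs attach to each gene. The sketch below is a minimal illustration of that idea, not the authors' pipeline. The TF and gene names and the expression values are hypothetical. Keeping strong negative correlations (|r| > 0.7) is an assumption added here so that a suppressor-like pattern, such as the bHLH case, would also appear.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation between two expression profiles measured in the same samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm


def coexpression_edges(tfs, targets, threshold=0.7):
    """Link each TF to each target gene whose |r| exceeds the threshold.

    Using |r| rather than only r > threshold is an assumption of this sketch,
    so that strong negative correlations are kept as edges too.
    """
    edges = []
    for tf, tf_profile in tfs.items():
        for gene, gene_profile in targets.items():
            r = pearson_r(tf_profile, gene_profile)
            if abs(r) > threshold:
                edges.append((tf, gene, r))
    return edges


def mean_tfs_per_target(edges, targets):
    """Average number of TFs linked to each target gene (unlinked genes count as 0)."""
    counts = {gene: 0 for gene in targets}
    for _tf, gene, _r in edges:
        counts[gene] += 1
    return sum(counts.values()) / len(counts)


# Hypothetical expression across five tissues: xylem, phloem, shoot, leaf, root.
tfs = {
    "TF_A": [9.0, 2.0, 1.5, 1.0, 1.2],  # xylem-enriched
    "TF_B": [1.0, 3.0, 2.0, 8.0, 2.5],  # leaf-enriched
}
targets = {
    "wall_gene_1": [8.5, 1.8, 1.2, 0.9, 1.0],
    "wall_gene_2": [7.9, 2.5, 1.9, 1.1, 1.4],
}

edges = coexpression_edges(tfs, targets)
for tf, gene, r in edges:
    print(f"{tf} -> {gene}: r = {r:.2f}")
print("mean TFs per target:", mean_tfs_per_target(edges, targets))
```

In this toy example only the xylem-enriched TF passes the threshold, so each wall gene has one linked TF.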
Affiliation(s)
- Rui Shi
- Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- Mountain Horticultural Crops Research and Extension Center, Department of Horticulture, North Carolina State University, Mills River, NC, 28759, USA
- Jack P Wang
- Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, 150040, China
- Ying-Chung Lin
- Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, 150040, China
- Department of Life Sciences, College of Life Science, National Taiwan University, Taipei, 10617, Taiwan
- Quanzi Li
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, 100091, China
- Ying-Hsuan Sun
- Department of Forestry, National Chung Hsing University, Taichung, 40227, Taiwan
- Hao Chen
- Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- Ronald R Sederoff
- Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- Vincent L Chiang
- Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, 150040, China
- Department of Forest Biomaterials, North Carolina State University, Raleigh, NC, 27695, USA
6. Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK, Graim K, Bivol A, Wang H, Zhu F, Afsari B, Danilova LV, Favorov AV, Lee WS, Taylor D, Hu CW, Long BL, Noren DP, Bisberg AJ, Mills GB, Gray JW, Kellen M, Norman T, Friend S, Qutub AA, Fertig EJ, Guan Y, Song M, Stuart JM, Spellman PT, Koeppl H, Stolovitzky G, Saez-Rodriguez J, Mukherjee S; HPN-DREAM Consortium. Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods 2016; 13:310-8. [PMID: 26901648 DOI: 10.1038/nmeth.3773] [Citation(s) in RCA: 174] [Impact Index Per Article: 21.8] [Received: 02/26/2015] [Accepted: 01/21/2016] [Indexed: 01/08/2023]
Abstract
The HPN-DREAM community challenge assessed the ability of computational methods to infer causal molecular networks, focusing specifically on the task of inferring causal protein signaling networks in cancer cell lines. It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
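The abstract says networks were scored for causal validity against unseen interventional data but does not give the scoring formula. One common way to score ranked edge predictions against a binary gold standard is the area under the ROC curve. The sketch below assumes that metric purely for illustration; it is not claimed to be the challenge's scoring code. The node names P1 to P4, the edge confidences, and the "validated" edge set are all hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise count (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical confidences for directed edges in one submitted network.
predicted = {
    ("P1", "P2"): 0.9,
    ("P2", "P3"): 0.8,
    ("P4", "P3"): 0.4,
    ("P3", "P4"): 0.2,
}
# Hypothetical gold standard: edges supported by held-out interventional data.
validated = {("P1", "P2"), ("P2", "P3")}

edges = sorted(predicted)
score = auroc([predicted[e] for e in edges], [e in validated for e in edges])
print(f"AUROC = {score:.2f}")  # AUROC = 1.00
```

A value of 1.0 means every validated edge outranks every unvalidated one, and 0.5 is what random confidences would give on average. The pairwise loop is quadratic, which is fine for a sketch; a rank-based formula scales better.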
7. Avilés-Jiménez F, Guitron A, Segura-López F, Méndez-Tenorio A, Iwai S, Hernández-Guerrero A, Torres J. Microbiota studies in the bile duct strongly suggest a role for Helicobacter pylori in extrahepatic cholangiocarcinoma. Clin Microbiol Infect 2015; 22:178.e11-178.e22. [PMID: 26493848 DOI: 10.1016/j.cmi.2015.10.008] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 08/27/2015] [Revised: 10/01/2015] [Accepted: 10/05/2015] [Indexed: 02/07/2023]
Abstract
Biliary tract cancer, or extrahepatic cholangiocarcinoma (ECCA), is the sixth most common cause of cancer in the gastrointestinal tract in Western countries. We aimed to characterize the microbiota and its predicted associated functions in the biliary tract in ECCA and in benign biliary pathology (BBP). Samples for DNA extraction were taken by endoscopic cholangiopancreatography from 100 patients with ECCA and 100 patients with BBP. Ten patients with ECCA and ten with BBP were selected for microbiota studies targeting the V4 region of the 16S rRNA gene, sequenced on the Illumina platform. Microbiota analyses included sample-to-sample distance metrics, ordination/clustering, and prediction of functions. The 100 ECCA and 100 BBP samples were tested for Nesterenkonia sp. and for the Helicobacter pylori cagA and vacA genes. The phylum Proteobacteria dominated all samples (60.4% on average). Ordination multicomponent analyses showed significant microbiota separation between ECCA and BBP (p 0.010). Analyses of 4002 operational taxonomic units with presence variation in at least one category also showed a separation of ECCA from BBP. Among these, Nesterenkonia decreased, whereas Methylophilaceae, Fusobacterium, Prevotella, Actinomyces, Novosphingobium, and H. pylori increased in ECCA. Predicted associated functions showed increased abundance of H. pylori virulence genes in ECCA. The cagA and vacA genes were confirmed by PCR in ECCA and BBP samples. This is the first microbiota report in ECCA and BBP to show significant changes in microbial composition. Bacteria unusual in the human flora were found: Methylophilaceae and Nesterenkonia are reported in hypersaline soils, and Mesorhizobium is a nitrogen-fixing bacterium. Enrichment of virulence genes confirms previous studies suggesting that H. pylori might be associated with ECCA.
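The abstract lists "sample-to-sample distance metrics" without naming them. Bray-Curtis dissimilarity is one widely used choice for OTU count tables. The sketch below assumes it purely for illustration; the study may have used other metrics, such as UniFrac. The sample names and OTU counts are invented, and the sketch computes only the pairwise distances that an ordination step would take as input.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity sum|u_i - v_i| / sum(u_i + v_i); lies in [0, 1] for counts."""
    if len(u) != len(v):
        raise ValueError("both samples must list counts for the same OTUs")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(samples):
    """All pairwise Bray-Curtis dissimilarities, keyed by (sample, sample)."""
    names = sorted(samples)
    return {(a, b): bray_curtis(samples[a], samples[b]) for a in names for b in names}


# Hypothetical OTU counts (same four OTUs, same order) for three biliary samples.
samples = {
    "ECCA_1": [120, 30, 0, 15],
    "ECCA_2": [100, 45, 5, 10],
    "BBP_1": [20, 10, 80, 60],
}

dm = distance_matrix(samples)
for (a, b), d in dm.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {d:.3f}")
```

Here the two hypothetical ECCA samples are much closer to each other than either is to the BBP sample. Ordination methods such as PCoA turn that kind of pattern into visible clusters.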
Affiliation(s)
- F Avilés-Jiménez
- Unidad de Investigación en Enfermedades Infecciosas, UMAE Pediatría, IMSS, Mexico
- A Guitron
- Departamento de Endoscopía Digestiva, UMAE 71 IMSS, Coahuila, Mexico
- F Segura-López
- Departamento de Anestesiología, UMAE 71 IMSS, Coahuila, Mexico
- A Méndez-Tenorio
- Laboratorio de Biotecnología y Bioinformática Genómica, ENCB, Instituto Politécnico Nacional, Mexico
- S Iwai
- Second Genome, South San Francisco, CA, USA
- J Torres
- Unidad de Investigación en Enfermedades Infecciosas, UMAE Pediatría, IMSS, Mexico
8. Folch-Fortuny A, Villaverde AF, Ferrer A, Banga JR. Enabling network inference methods to handle missing data and outliers. BMC Bioinformatics 2015; 16:283. [PMID: 26335628 PMCID: PMC4559359 DOI: 10.1186/s12859-015-0717-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/03/2015] [Accepted: 08/24/2015] [Indexed: 12/20/2022]
Abstract
Background: The inference of complex networks from data is a challenging problem in the biological sciences and in a wide range of other disciplines, such as chemistry, technology, economics, and sociology. The quantity and quality of the data greatly affect the results. Many methodologies have been developed for this task, but they seldom account for missing data or for outlier detection and correction. Both issues need to be properly addressed before network inference.

Results: Here we present an approach based on multivariate projection to latent structures that (i) handles missing data and (ii) detects and corrects outliers. The method, called trimmed scores regression (TSR), enables network inference methods to analyse incomplete datasets by imputing the missing values coherently with the latent data structure. It also replaces faulty values in a dataset with proper estimates. We provide an implementation of this approach and show how it can be integrated with any network inference method as a preliminary data curation step. We demonstrate this functionality with MIDER, a state-of-the-art network inference method based on mutual information distance and entropy reduction.

Conclusion: The methodology presented here enables network inference methods to analyse a large number of incomplete and faulty datasets that could not previously be analysed reliably. Our comparative studies show the superiority of TSR over other missing data approaches used by practitioners. The method also allows outlier detection and correction.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0717-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Abel Folch-Fortuny
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain
- Alejandro F Villaverde
- BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208, Spain
- Centre of Biological Engineering, Universidade do Minho, Campus de Gualtar, Braga, 4710-057, Portugal
- Department of Systems and Control Engineering, Universidade de Vigo, Rua Maxwell, Vigo, 36310, Spain
- Alberto Ferrer
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain
- Julio R Banga
- BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208, Spain