1
Tadi AA, Alhadidi D, Rueda L. PPPCT: Privacy-Preserving framework for Parallel Clustering Transcriptomics data. Comput Biol Med 2024; 173:108351. [PMID: 38520921 DOI: 10.1016/j.compbiomed.2024.108351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 03/18/2024] [Accepted: 03/18/2024] [Indexed: 03/25/2024]
Abstract
Single-cell transcriptomics data provides crucial insights into patients' health, yet poses significant privacy concerns. Genomic data privacy attacks can have deep implications, encompassing not only the patients' health information but also extending widely to compromise their families'. Moreover, the permanence of leaked data exacerbates the challenges, making retraction an impossibility. While extensive efforts have been directed towards clustering single-cell transcriptomics data, addressing critical challenges, especially in the realm of privacy, remains pivotal. This paper introduces an efficient, fast, privacy-preserving approach for clustering single-cell RNA-sequencing (scRNA-seq) datasets. The key contributions include ensuring data privacy, achieving high-quality clustering, accommodating the high dimensionality inherent in the datasets, and maintaining reasonable computation time for large-scale datasets. Our proposed approach utilizes the map-reduce scheme to parallelize clustering, addressing intensive calculation challenges. Intel Software Guard eXtension (SGX) processors are used to ensure the security of sensitive code and data during processing. Additionally, the approach incorporates a logarithm transformation as a preprocessing step, employs non-negative matrix factorization for dimensionality reduction, and utilizes parallel k-means for clustering. The approach fully leverages the computing capabilities of all processing resources within a secure private cloud environment. Experimental results demonstrate the efficacy of our approach in preserving patient privacy while surpassing state-of-the-art methods in both clustering quality and computation time. Our method consistently achieves a minimum of 7% higher Adjusted Rand Index (ARI) than existing approaches, contingent on dataset size. Additionally, due to parallel computations and dimensionality reduction, our approach exhibits efficiency, converging to very good results in less than 10 seconds for a scRNA-seq dataset with 5000 genes and 6000 cells when prioritizing privacy and under two seconds without privacy considerations. Availability and implementation: code and datasets are available at https://github.com/University-of-Windsor/PPPCT.
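The map-reduce formulation of k-means mentioned in the abstract can be illustrated with a minimal sketch. This is not the PPPCT implementation: it omits the SGX enclave, the logarithm transformation, and the NMF step, and all function names and the toy data are hypothetical. Each partition is mapped to per-cluster partial sums and counts, and the reduce step merges them to update the centroids.

```python
from functools import reduce

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))

def map_partition(partition, centroids):
    """Map step: per-cluster coordinate sums and counts for one data partition."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        k = nearest(point, centroids)
        counts[k] += 1
        for d, value in enumerate(point):
            sums[k][d] += value
    return sums, counts

def combine(a, b):
    """Reduce step: merge the partial sums and counts of two partitions."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts

def kmeans_mapreduce(partitions, centroids, iterations=10):
    """Run a fixed number of k-means updates over partitioned data."""
    for _ in range(iterations):
        # Each map call is independent, so the calls could run on separate workers.
        partials = [map_partition(p, centroids) for p in partitions]
        sums, counts = reduce(combine, partials)
        centroids = [
            [s / n for s in row] if n else old  # an empty cluster keeps its centroid
            for row, n, old in zip(sums, counts, centroids)
        ]
    return centroids

# Hypothetical toy data: two partitions of 2-D points forming two groups.
partitions = [
    [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]],
    [[1.0, 0.0], [10.0, 11.0], [11.0, 10.0]],
]
centroids = kmeans_mapreduce(partitions, [[0.0, 0.0], [10.0, 10.0]])
```

Only the partial sums and counts cross partition boundaries, which is what makes the update step easy to distribute.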
Affiliation(s)
- Ali Abbasi Tadi
- University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada.
- Dima Alhadidi
- University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
- Luis Rueda
- University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
2
Vasighizaker A, Hora S, Zeng R, Rueda L. SEGCECO: Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication. Brief Bioinform 2024; 25:bbae160. [PMID: 38605638 PMCID: PMC11009470 DOI: 10.1093/bib/bbae160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/30/2024] [Accepted: 03/13/2024] [Indexed: 04/13/2024] Open
Abstract
Recent advances in single-cell RNA sequencing technology have eased analyses of signaling networks of cells. Recently, cell-cell interaction has been studied based on various link prediction approaches on graph-structured data. These approaches have assumptions about the likelihood of node interaction, thus showing high performance for only some specific networks. Subgraph-based methods have solved this problem and outperformed other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profile of individual cells. High-dimensional and sparse single-cell RNA-seq data make converting the data into a graphical format a daunting task. We successfully overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell communication network is built using a cell-cell similarity matrix which is learned from gene expression data. We performed experiments on six datasets extracted from human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches, and the state-of-the-art method for link prediction, WLNM, with 0.99 ROC and 99% prediction accuracy. The datasets can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133 and the code is publicly available at GitHub https://github.com/sheenahora/SEGCECO and Code Ocean https://codeocean.com/capsule/8244724/tree.
Affiliation(s)
- Sheena Hora
- Software Development Department, Amazon, USA
- Raymond Zeng
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON, Canada
3
Yagin FH, Alkhateeb A, Colak C, Azzeh M, Yagin B, Rueda L. A Fecal-Microbial-Extracellular-Vesicles-Based Metabolomics Machine Learning Framework and Biomarker Discovery for Predicting Colorectal Cancer Patients. Metabolites 2023; 13:metabo13050589. [PMID: 37233630 DOI: 10.3390/metabo13050589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 04/20/2023] [Accepted: 04/21/2023] [Indexed: 05/27/2023] Open
Abstract
Colorectal cancer (CRC) is one of the most common and lethal diseases among all types of cancer, and metabolites play a significant role in the development of this complex disease. This study aimed to identify potential biomarkers and targets in the diagnosis and treatment of CRC using high-throughput metabolomics. Metabolite data extracted from the feces of CRC patients and healthy volunteers were normalized using median normalization and Pareto scaling for multivariate analysis. Univariate ROC analysis, the t-test, and analysis of fold changes (FCs) were applied to identify biomarker candidate metabolites in CRC patients. Only metabolites that overlapped the two different statistical approaches (false-discovery-rate-corrected p-value < 0.05 and AUC > 0.70) were considered in the further analysis. Multivariate analysis was performed with biomarker candidate metabolites based on linear support vector machines (SVM), partial least squares discriminant analysis (PLS-DA), and random forests (RF). The model identified five biomarker candidate metabolites that were significantly differentially expressed (adjusted p-value < 0.05) in CRC patients compared to healthy controls. The metabolites were succinic acid, aminoisobutyric acid, butyric acid, isoleucine, and leucine. Aminoisobutyric acid was the metabolite with the highest discriminatory potential in CRC, with an AUC equal to 0.806 (95% CI = 0.700-0.897), and was down-regulated in CRC patients. The SVM model showed the most substantial discrimination capacity for the five metabolites selected in the CRC screening, with an AUC of 0.985 (95% CI: 0.94-1).
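The candidate filter described in the abstract (FDR-corrected p-value < 0.05 and AUC > 0.70) can be sketched as follows. This is an illustrative sketch only: it assumes the Benjamini-Hochberg procedure for the false-discovery-rate correction, which the abstract does not name, and the metabolite names and numbers are hypothetical rather than taken from the study.

```python
def auc(cases, controls):
    """Univariate ROC AUC: probability that a case value exceeds a control value.

    Ties count as 0.5. For a down-regulated marker the result falls below 0.5,
    so a caller may prefer max(a, 1 - a) as a direction-free score.
    """
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in controls)
    return wins / (len(cases) * len(controls))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_candidates(names, pvalues, aucs, alpha=0.05, min_auc=0.70):
    """Keep features that pass both the adjusted p-value and the AUC threshold."""
    adjusted = benjamini_hochberg(pvalues)
    return [n for n, q, a in zip(names, adjusted, aucs) if q < alpha and a > min_auc]

# Hypothetical inputs: four features with raw p-values and univariate AUCs.
names = ["m1", "m2", "m3", "m4"]
pvalues = [0.001, 0.02, 0.04, 0.30]
aucs = [0.90, 0.65, 0.80, 0.55]
selected = select_candidates(names, pvalues, aucs)
```

Requiring both criteria is stricter than either alone: in the toy data, one feature passes the p-value cut but not the AUC cut, and another passes the AUC cut but loses significance after adjustment.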
Affiliation(s)
- Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280 Malatya, Turkey
- Abedalrhman Alkhateeb
- Software Engineering Department, King Hussein School of Computing Science, Princess Sumaya University for Technology, Amman P.O. Box 1438, Jordan
- Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280 Malatya, Turkey
- Mohammad Azzeh
- Data Science Department, King Hussein School of Computing Science, Princess Sumaya University for Technology, Amman P.O. Box 1438, Jordan
- Burak Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280 Malatya, Turkey
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
4
Cyrenne JB, Kanjeekal SM, Porter L, Cavallo-Medved D, Fifield BA, Rueda L, Atikukke G, Alkhateeb A, El-Gohary Y. Comprehensive gene sequencing to identify progression predictors to muscle-invasive bladder cancer. J Clin Oncol 2023. [DOI: 10.1200/jco.2023.41.6_suppl.570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023] Open
Abstract
570 Background: Over 8900 Canadians are diagnosed with bladder cancer every year, ranking it the fifth most frequent cancer. It can manifest as either non-muscle invasive bladder cancer (NMIBC) or muscle invasive bladder cancer (MIBC). The majority of patients initially receive a diagnosis of NMIBC, although high-grade NMIBC has a 50–70% recurrence rate and a 10–30% chance of progressing to MIBC. The transition from high-grade NMIBC to MIBC is poorly understood, and there are currently no accurate biomarkers that predict disease progression. We propose a comprehensive molecular characterization to pinpoint specific copy number alterations (CNAs) related to either MIBC or NMIBC, and hence define the molecular development. Methods: This study analyzed a public dataset from MSKCC and 30 bladder cancer patient samples from Windsor Regional Hospital, both containing NMIBC and MIBC samples. Comprehensive gene sequencing was performed, and CNAs were obtained in over 500 common tumour gene panels. Results: Preliminary data from this study found MIBC may be predicted with 91% accuracy and 95% precision using CNA values of TP53, DDR2 and MLL2. In particular, MIBC correlates with gain of DDR2 or MLL2. Importantly, it has been demonstrated that high expression of DDR2 is associated with a worse prognosis. A panel of bladder carcinoma cell lines was used to validate the DDR2 findings. DDR2 values were quantified across the panels and corresponded with proliferation and invasiveness of cell lines. DDR2 was examined as a possible therapeutic target. Conclusions: Taken together, these findings provide insight into the pathogenesis of muscle invasion in bladder cancer. The potential to identify "genomic triggers" for the transition was facilitated by creating a genetic profile at these two stages.
Affiliation(s)
- Lisa Porter
- Schulich School of Medicine, Windsor Campus, Western University, Windsor, ON, Canada
- Dora Cavallo-Medved
- Schulich School of Medicine, Windsor Campus, Western University, Windsor, ON, Canada
- Bre-Anne Fifield
- Schulich School of Medicine, Windsor Campus, Western University, Windsor, ON, Canada
- Luis Rueda
- Schulich School of Medicine, Windsor Campus, Western University, Windsor, ON, Canada
- Abedalrhman Alkhateeb
- Schulich School of Medicine, Windsor Campus, Western University, Windsor, ON, Canada
5
Modak S, Abdel-Raheem E, Rueda L. Applications of Deep Learning in Disease Diagnosis of Chest Radiographs: A Survey on Materials and Methods. Biomedical Engineering Advances 2023. [DOI: 10.1016/j.bea.2023.100076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023] Open
6
Naik M, Rueda L, Vasighizaker A. Identification of Enriched Regions in ChIP-Seq Data via a Linear-Time Multi-Level Thresholding Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2842-2850. [PMID: 34398762 DOI: 10.1109/tcbb.2021.3104734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Chromatin immunoprecipitation (ChIP-Seq) has emerged as a superior alternative to microarray technology as it provides higher resolution, less noise, greater coverage and wider dynamic range. While ChIP-Seq enables probing of DNA-protein interaction over the entire genome, it requires the use of sophisticated tools to recognize hidden patterns and extract meaningful data. Over the years, various attempts have resulted in several algorithms making use of different heuristics to accurately determine individual peaks corresponding to unique DNA-protein interactions. However, finding all the significant peaks with high accuracy in a reasonable time is still a challenge. In this work, we propose the use of a multi-level thresholding algorithm, which we call LinMLTBS, to identify the enriched regions in ChIP-Seq data. Although various suboptimal heuristics have been proposed for multi-level thresholding, we emphasize the use of an algorithm capable of obtaining an optimal solution while maintaining linear-time complexity. Testing various algorithms on various ENCODE project datasets shows that our approach attains higher accuracy relative to previously proposed peak finders while retaining a reasonable processing speed.
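To make the idea of optimal multi-level thresholding concrete, the sketch below splits a histogram into contiguous classes by dynamic programming. It is not LinMLTBS: it runs in quadratic time per class rather than linear time, and it assumes total within-class variance as the criterion, which the abstract does not specify. The function name and the toy histogram are hypothetical.

```python
def optimal_thresholds(hist, classes):
    """Split bins 0..n-1 into `classes` contiguous groups with minimal total
    within-class sum of squares; return the first bin index of each group
    after the first one."""
    n = len(hist)
    # Prefix sums of weight, weight*x and weight*x^2 give O(1) range costs.
    w = [0.0] * (n + 1)
    wx = [0.0] * (n + 1)
    wxx = [0.0] * (n + 1)
    for i, h in enumerate(hist):
        w[i + 1] = w[i] + h
        wx[i + 1] = wx[i] + h * i
        wxx[i + 1] = wxx[i] + h * i * i

    def cost(a, b):
        """Within-class sum of squares for bins a..b-1."""
        weight = w[b] - w[a]
        if weight == 0:
            return 0.0
        s = wx[b] - wx[a]
        return (wxx[b] - wxx[a]) - s * s / weight

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(classes + 1)]
    cut = [[0] * (n + 1) for _ in range(classes + 1)]
    best[0][0] = 0.0
    for k in range(1, classes + 1):
        for b in range(k, n + 1):
            for a in range(k - 1, b):
                c = best[k - 1][a] + cost(a, b)
                if c < best[k][b]:  # ties keep the lowest cut position
                    best[k][b], cut[k][b] = c, a

    # Walk back through the stored cut points, last class first.
    thresholds, b = [], n
    for k in range(classes, 1, -1):
        b = cut[k][b]
        thresholds.append(b)
    return thresholds[::-1]

# Hypothetical signal histogram with three separated groups of non-empty bins.
hist = [5, 6, 0, 0, 7, 8, 0, 0, 0, 9, 9]
thresholds = optimal_thresholds(hist, 3)
```

Because every candidate cut is examined, the result is optimal for the chosen criterion, at the price of a running time that grows quadratically with the number of bins.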
7
Firoozbakht F, Rezaeian I, Rueda L, Ngom A. Computationally repurposing drugs for breast cancer subtypes using a network-based approach. BMC Bioinformatics 2022; 23:143. [PMID: 35443626 PMCID: PMC9020161 DOI: 10.1186/s12859-022-04662-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2021] [Accepted: 03/30/2022] [Indexed: 11/22/2022] Open
Abstract
‘De novo’ drug discovery is costly, slow, and high-risk. Repurposing known drugs for treatment of other diseases offers a fast, low-cost, low-risk and highly efficient route toward the development of efficacious treatments. The emergence of large-scale heterogeneous biomolecular networks, molecular, chemical and bioactivity data, and genomic and phenotypic data of pharmacological compounds is enabling the development of a new area of drug repurposing called ‘in silico’ drug repurposing, i.e., computational drug repurposing (CDR). The aim of CDR is to discover new indications for an existing drug (drug-centric) or to identify effective drugs for a disease (disease-centric). Both drug-centric and disease-centric approaches share the challenge of assessing either the similarity or the connections between drugs and diseases. However, traditional CDR is fraught with many challenges due to the underlying complex pharmacology and biology of diseases, genes, and drugs, as well as the complexity of their associations. As such, capturing highly non-linear associations among drugs, genes, and diseases has been challenging for most existing CDR methods. We propose a network-based integration approach that can best capture knowledge (and complex relationships) contained within and between drugs, genes and disease data. A network-based machine learning approach is applied thereafter, using the extracted knowledge and relationships to identify single drugs and pairs of approved or experimental drugs with potential therapeutic effects on different breast cancer subtypes. Further clinical analysis is needed to confirm the therapeutic effects of the identified drugs on each breast cancer subtype.
Affiliation(s)
- Forough Firoozbakht
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, Canada
- Iman Rezaeian
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, Canada; Rocket Innovation Studio, 156 Chatham St W, Windsor, ON, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, Canada.
- Alioune Ngom
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, Canada
8
Vasighizaker A, Danda S, Rueda L. Discovering cell types using manifold learning and enhanced visualization of single-cell RNA-Seq data. Sci Rep 2022; 12:120. [PMID: 34996927 PMCID: PMC8742092 DOI: 10.1038/s41598-021-03613-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/15/2021] [Accepted: 12/07/2021] [Indexed: 01/03/2023] Open
Abstract
Identifying relevant disease modules such as target cell types is a significant step for studying diseases. High-throughput single-cell RNA-Seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Computational techniques such as clustering are the most suitable approach in scRNA-seq data analysis when the cell types have not been well-characterized. These techniques can be used to identify a group of genes that belong to a specific cell type based on their similar gene expression patterns. However, due to the sparsity and high dimensionality of scRNA-seq data, classical clustering methods are not efficient. Therefore, the use of non-linear dimensionality reduction techniques to improve clustering results is crucial. We introduce a method that identifies representative clusters of different cell types by combining non-linear dimensionality reduction techniques and clustering algorithms. We assess the impact of different dimensionality reduction techniques combined with the clustering of thirteen publicly available scRNA-seq datasets of different tissues, sizes, and technologies. We further performed gene set enrichment analysis to evaluate the proposed method's performance. Our results show that modified locally linear embedding combined with independent component analysis yields overall the best performance relative to the existing unsupervised methods across different datasets.
Affiliation(s)
- Saiteja Danda
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON, Canada.
9
Stokes K, Nunes M, Trombley C, Flôres DEFL, Wu G, Taleb Z, Alkhateeb A, Banskota S, Harris C, Love OP, Khan WI, Rueda L, Hogenesch JB, Karpowicz P. The Circadian Clock Gene, Bmal1, Regulates Intestinal Stem Cell Signaling and Represses Tumor Initiation. Cell Mol Gastroenterol Hepatol 2021; 12:1847-1872.e0. [PMID: 34534703 PMCID: PMC8591196 DOI: 10.1016/j.jcmgh.2021.08.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 04/06/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND & AIMS Circadian rhythms are daily physiological oscillations driven by the circadian clock: a 24-hour transcriptional timekeeper that regulates hormones, inflammation, and metabolism. Circadian rhythms are known to be important for health, but whether their loss contributes to colorectal cancer is not known. We tested the nonredundant clock gene Bmal1 in intestinal homeostasis and tumorigenesis, using the Apcmin model of colorectal cancer. METHODS Bmal1 mutant, epithelium-conditional Bmal1 mutant, and photoperiod (day/night cycle) disrupted mice bearing the Apcmin allele were assessed for tumorigenesis. Tumors and normal nontransformed tissue were characterized. Intestinal organoids were assessed for circadian transcription rhythms by RNA sequencing, and in vivo and organoid assays were used to test Bmal1-dependent proliferation and self-renewal. RESULTS Loss of Bmal1 or circadian photoperiod increases tumor initiation. In the intestinal epithelium the clock regulates transcripts involved in regeneration and intestinal stem cell signaling. Tumors have no self-autonomous clock function and only weak clock function in vivo. Apcmin clock-disrupted tumors show high Yes-associated protein 1 (Hippo signaling) activity but show low Wnt (Wingless and Int-1) activity. Intestinal organoid assays show that loss of Bmal1 increases self-renewal in a Yes-associated protein 1-dependent manner. CONCLUSIONS Bmal1 regulates intestinal stem cell pathways, including Hippo signaling, and the loss of circadian rhythms potentiates tumor initiation. Transcript profiling: GEO accession number: GSE157357.
Affiliation(s)
- Kyle Stokes
- Department of Biomedical Sciences, Windsor, Ontario, Canada
- Malika Nunes
- Department of Biomedical Sciences, Windsor, Ontario, Canada
- Danilo E F L Flôres
- Division of Human Genetics and Immunobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Gang Wu
- Division of Human Genetics and Immunobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Zainab Taleb
- Department of Biomedical Sciences, Windsor, Ontario, Canada
- Suhrid Banskota
- Department of Pathology and Molecular Medicine, Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, Ontario, Canada
- Chris Harris
- Department of Integrative Biology, University of Windsor, Windsor, Ontario, Canada
- Oliver P Love
- Department of Integrative Biology, University of Windsor, Windsor, Ontario, Canada
- Waliul I Khan
- Department of Pathology and Molecular Medicine, Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, Ontario, Canada
- Luis Rueda
- School of Computer Science, Windsor, Ontario, Canada
- John B Hogenesch
- Division of Human Genetics and Immunobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
10
11
Fatima N, Rueda L. iSOM-GSN: an integrative approach for transforming multi-omic data into gene similarity networks via self-organizing maps. Bioinformatics 2021; 36:4248-4254. [PMID: 32407457 DOI: 10.1093/bioinformatics/btaa500] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/18/2019] [Revised: 04/27/2020] [Accepted: 05/07/2020] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION One of the main challenges in applying graph convolutional neural networks (CNNs) on gene-interaction data is the lack of understanding of the vector space to which they belong, and also the inherent difficulties involved in representing those interactions on a significantly lower dimension, viz., Euclidean spaces. The challenge becomes more prevalent when dealing with various types of heterogeneous data. We introduce a systematic, generalized method, called iSOM-GSN, used to transform 'multi-omic' data with higher dimensions onto a 2D grid. Afterwards, we apply a CNN to predict disease states of various types. Based on the idea of Kohonen's self-organizing map, we generate a 2D grid for each sample for a given set of genes that represent a gene similarity network. RESULTS We have tested the model to predict breast and prostate cancer using gene expression, DNA methylation and copy number alteration. Prediction accuracies in the 94-98% range were obtained for tumor stages of breast cancer and calculated Gleason scores of prostate cancer with just 14 input genes for both cases. The scheme not only outputs nearly perfect classification accuracy, but also provides an enhanced scheme for representation learning, visualization, dimensionality reduction and interpretation of multi-omic data. AVAILABILITY AND IMPLEMENTATION The source code and sample data are available via a GitHub project at https://github.com/NaziaFatima/iSOM_GSN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nazia Fatima
- School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
12
Atikukke G, Alkhateeb A, Porter L, Fifield B, Cavallo-Medved D, Facca J, Elfiki T, Elkeilani A, Rueda L, Misra S. P-370 Comprehensive targeted genomic profiling and comparative genomic analysis to identify molecular mechanisms driving cancer progression in young-onset sporadic colorectal cancer. Ann Oncol 2020. [DOI: 10.1016/j.annonc.2020.04.452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022] Open
13
Jubair S, Alkhateeb A, Tabl AA, Rueda L, Ngom A. A novel approach to identify subtype-specific network biomarkers of breast cancer survivability. Netw Model Anal Health Inform Bioinform 2020. [DOI: 10.1007/s13721-020-00249-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/20/2023]
14
Hamzeh O, Alkhateeb A, Zheng J, Kandalam S, Rueda L. Prediction of tumor location in prostate cancer tissue using a machine learning system on gene expression data. BMC Bioinformatics 2020; 21:78. [PMID: 32164523 PMCID: PMC7068980 DOI: 10.1186/s12859-020-3345-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Finding the tumor location in the prostate is an essential pathological step for prostate cancer diagnosis and treatment. The location of the tumor - the laterality - can be unilateral (the tumor affects one side of the prostate) or bilateral (both sides). Nevertheless, the tumor can be overestimated or underestimated by standard screening methods. In this work, a combination of efficient machine learning methods for feature selection and classification is proposed to analyze gene activity and select genes as relevant biomarkers for different laterality samples. RESULTS A data set that consists of 450 samples was used in this study. The samples were divided into three laterality classes (left, right, bilateral). The aim of this work is to understand the genomic activity in each class and find relevant genes as indicators for each class with nearly 99% accuracy. The system identified groups of differentially expressed genes (RTN1, HLA-DMB, MRI1) that are able to differentiate samples among the three classes. CONCLUSION The proposed method was able to detect sets of genes that can identify different laterality classes. The resulting genes are found to be strongly correlated with disease progression. HLA-DMB and EIF4G2, which are in the set of genes that can detect left laterality, were reported earlier to be in the same pathway, called the Allograft rejection SuperPath.
Affiliation(s)
- Osama Hamzeh
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4 ON Canada
- Abedalrhman Alkhateeb
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4 ON Canada
- Julia Zheng
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4 ON Canada
- Srinath Kandalam
- Department of Biomedical Sciences, University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4 ON Canada
- Luis Rueda
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4 ON Canada
15
Alkhateeb A, Atikukke G, Porter L, Fifield BA, Cavallo-Medved D, Facca J, El-Gohary Y, Zhang T, Hamzeh O, Rueda L, Kanjeekal SM. Comprehensive targeted gene profiling to determine the genomic signature likely to drive progression of high-grade nonmuscle invasive bladder cancer to muscle invasive bladder cancer. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.6_suppl.568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
568 Background: Bladder cancer is the fifth most common cancer and eighth leading cause of cancer-related death in North America. It can present as non-muscle invasive bladder cancer (NMIBC) and/or muscle invasive bladder cancer (MIBC). Although genomic profiling studies have established that low-grade NMIBC and MIBC are genetically distinct, high-grade NMIBC can recur and progress to MIBC [Knowles, M.A. and C.D. Hurst, 2015]. Low-grade, non-invasive bladder cancers are characterized by activating mutations in fibroblast growth factor receptor 3 (FGFR3), HRAS or other pathways of receptor kinase activation. High-grade disease, which often becomes invasive, is characterized by inactivation of TP53 and Rb pathways [Kim, J., et al.]. Finding a subtype of invasive carcinoma with FGFR3 mutation may suggest an alternate pathway by which low-grade, non-invasive pathology could transform into invasive disease [Knowles, M.A. and C.D. Hurst, 2015]. Methods: In this study, using a total of 30 bladder cancer (NMIBC and MIBC) patient samples from Windsor Regional Hospital Cancer Program, we performed comprehensive targeted gene sequencing to identify single nucleotide variants, small insertions/deletions, copy number variants and splice variants in a panel of over 500 common tumor genes. Results: Preliminary data from our study correlate with the previously published mutation landscape for NMIBC and MIBC, and include mutations in EGFR, FGFR3, FGFR4, PIK3CA, CDK6, ALK, JAK, as well as RET. While mutations in AKT1, BRCA1, CCND1, ERBB2, FGFR1, FGFR2, HRAS, and MET appear to be prevalent in NMIBC, mutations in IDH1 and MAP2K2 appear to be more common in MIBC. Three of the samples used in the study are from patients who progressed from high-grade NMIBC to MIBC. Conclusions: For these patients, genomic profiling was performed at both stages, which provides a unique ability to identify the potential “genomic triggers” for the transition.
Affiliation(s)
- Tianmin Zhang
- Schulich School of Medicine & Dentistry, Western University, London, ON, Canada
- Luis Rueda
- University of Windsor, Windsor, ON, Canada
16
Hamzeh O, Alkhateeb A, Zheng JZ, Kandalam S, Leung C, Atikukke G, Cavallo-Medved D, Palanisamy N, Rueda L. A Hierarchical Machine Learning Model to Discover Gleason Grade-Specific Biomarkers in Prostate Cancer. Diagnostics (Basel) 2019; 9:diagnostics9040219. [PMID: 31835700 PMCID: PMC6963340 DOI: 10.3390/diagnostics9040219] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/08/2019] [Revised: 11/25/2019] [Accepted: 12/01/2019] [Indexed: 12/31/2022] Open
Abstract
(1) Background: Prostate cancer is one of the most common cancers affecting men in North America and worldwide. The Gleason score is a pathological grading system to examine the potential aggressiveness of the disease in the prostate tissue. Advancements in computing and next-generation sequencing technology now allow us to study the genomic profiles of patients in association with their different Gleason scores more accurately and effectively. (2) Methods: In this study, we used a novel machine learning method to analyse gene expression of prostate tumours with different Gleason scores, and identify potential genetic biomarkers for each Gleason group. We obtained a publicly-available RNA-Seq dataset of a cohort of 104 prostate cancer patients from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository, and categorised patients based on their Gleason scores to create a hierarchy of disease progression. A hierarchical model with standard classifiers in different Gleason groups, also known as nodes, was developed to identify and predict nodes based on their mRNA or gene expression. In each node, patient samples were analysed via class imbalance and hybrid feature selection techniques to build the prediction model. The outcome from analysis of each node was a set of genes that could differentiate each Gleason group from the remaining groups. To validate the proposed method, the set of identified genes was used to classify a second dataset of 499 prostate cancer patients collected from cBioportal. (3) Results: The overall accuracy of applying this novel method to the first dataset was 93.3%; the method was further validated to have 87% accuracy using the second dataset. This method also identified genes that were not previously reported as potential biomarkers for specific Gleason groups. In particular, PIAS3 was identified as a potential biomarker for Gleason score 4 + 3 = 7, and UBE2V2 for Gleason score 6.
(4) Insight: Previous reports show that the genes predicted by this newly proposed method strongly correlate with prostate cancer development and progression. Furthermore, pathway analysis shows that both PIAS3 and UBE2V2 share similar protein interaction pathways, the JAK/STAT signaling process.
Affiliation(s)
- Osama Hamzeh
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
- Abedalrhman Alkhateeb
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
- Correspondence: (A.A.); (N.P.); (L.R.); Tel.: +1-519-253-0000 (ext. 3793) (A.A.); +1-313-874-6396 (N.P.); +1-519-253-0000 (ext. 3002) (L.R.)
- Julia Zhuoran Zheng
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
- Srinath Kandalam
- Department of Biomedical Sciences, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
- Crystal Leung
- Schulich School of Medicine and Dentistry, Western University, 1151 Richmond St, London, ON N6A 5C1, Canada
- Dora Cavallo-Medved
- Department of Biomedical Sciences, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
- Nallasivam Palanisamy
- Department of Urology, Henry Ford Health System, One Ford Place, Detroit, MI 48202, USA
- Luis Rueda
- School of Computer Science, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
|
17
|
Tabl AA, Alkhateeb A, ElMaraghy W, Rueda L, Ngom A. A Machine Learning Approach for Identifying Gene Biomarkers Guiding the Treatment of Breast Cancer. Front Genet 2019; 10:256. [PMID: 30972106 PMCID: PMC6446069 DOI: 10.3389/fgene.2019.00256] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 10/25/2018] [Accepted: 03/08/2019] [Indexed: 12/12/2022]
Abstract
Genomic profiles among different breast cancer survivors who received similar treatment may provide clues about the key biological processes involved in the cells and help in finding the right treatment. More specifically, such profiling may help personalize the treatment based on the patients’ gene expression. In this paper, we present a hierarchical machine learning system that predicts the 5-year survivability of patients who underwent a specific therapy. The classes are built on the combination of two parts: the survivability information and the given therapy. The survivability part defines whether the patient survived the 5-year interval or is deceased, while the therapy part denotes the therapy taken during that interval (hormone therapy, radiotherapy, or surgery); together these form six classes. The model classifies one class vs. the rest at each node, so the tree-based model has five nodes. The model is trained using a set of standard classifiers based on a comprehensive study dataset that includes genomic profiles and clinical information of 347 patients. A combination of feature selection methods and a prediction method is applied at each node to identify the genes that can predict the class at that node; the identified genes for each class may serve as potential biomarkers for that class’s treatment for better survivability. The results show that the model identifies the classes with high performance measurements. An exhaustive analysis based on relevant literature shows that some of the potential biomarkers are strongly related to breast cancer survivability and cancer in general.
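The relationship between six classes and five one-vs-rest nodes can be illustrated with a small sketch. This is not the authors' implementation: the class ordering, the `build_cascade` helper, and the toy node classifiers are all assumptions made for illustration, and real nodes would be trained classifiers operating on gene expression features.

```python
from itertools import product

# Six classes: survival status x therapy, as described in the abstract.
# The ordering here is an assumption; the abstract does not specify it.
CLASSES = [f"{status}/{therapy}"
           for status, therapy in product(("survived", "deceased"),
                                          ("hormone", "radiotherapy", "surgery"))]

def build_cascade(class_order, node_classifiers):
    """Chain one-vs-rest nodes; the last class is assigned by elimination."""
    if len(node_classifiers) != len(class_order) - 1:
        raise ValueError("need exactly one node per class except the last")

    def predict(sample):
        for label, is_member in zip(class_order, node_classifiers):
            if is_member(sample):   # this node claims the sample
                return label
        return class_order[-1]      # no node fired: only one class remains
    return predict

# Hypothetical stand-in classifiers: node i fires when the toy sample equals i.
nodes = [lambda x, i=i: x == i for i in range(len(CLASSES) - 1)]
predict = build_cascade(CLASSES, nodes)
```

Because the final class is whatever remains once every other class has been ruled out, a cascade over six classes needs only five binary decisions.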
Affiliation(s)
- Ashraf Abou Tabl
- Department of Mechanical, Automotive and Materials Engineering, University of Windsor, Windsor, ON, Canada
- Waguih ElMaraghy
- Department of Mechanical, Automotive and Materials Engineering, University of Windsor, Windsor, ON, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
|
18
|
Alkhateeb A, Rezaeian I, Singireddy S, Cavallo-Medved D, Porter LA, Rueda L. Transcriptomics Signature from Next-Generation Sequencing Data Reveals New Transcriptomic Biomarkers Related to Prostate Cancer. Cancer Inform 2019; 18:1176935119835522. [PMID: 30890858 PMCID: PMC6416685 DOI: 10.1177/1176935119835522] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 01/22/2019] [Accepted: 01/23/2019] [Indexed: 12/11/2022]
Abstract
Prostate cancer is one of the most common types of cancer among Canadian men. Next-generation sequencing using RNA-Seq provides large amounts of data that may reveal novel and informative biomarkers. We introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have tremendous value in guiding treatment decisions. Analysis of normal versus malignant prostate cancer data sets indicates differential expression of the genes HEATR5B, DDC, and GABPB1-AS1 as potential prostate cancer biomarkers. Our study also supports PTGFR, NREP, SCARNA22, DOCK9, FLVCR2, IK2F3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease.
Affiliation(s)
- Iman Rezaeian
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Siva Singireddy
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Dora Cavallo-Medved
- Department of Biological Sciences, University of Windsor, Windsor, ON, Canada
- Lisa A Porter
- Department of Biological Sciences, University of Windsor, Windsor, ON, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON, Canada
|
19
|
Li Y, Maleki M, Carruthers NJ, Stemmer PM, Ngom A, Rueda L. The predictive performance of short-linear motif features in the prediction of calmodulin-binding proteins. BMC Bioinformatics 2018; 19:410. [PMID: 30453876 PMCID: PMC6245490 DOI: 10.1186/s12859-018-2378-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 04/21/2023]
Abstract
Background: The prediction of calmodulin-binding (CaM-binding) proteins plays a very important role in the fields of biology and biochemistry, because the calmodulin protein binds and regulates a multitude of protein targets affecting different cellular processes. Computational methods that can accurately identify CaM-binding proteins and CaM-binding domains would accelerate research in calcium signaling and calmodulin function. Short-linear motifs (SLiMs), on the other hand, have been effectively used as features for analyzing protein-protein interactions, though their properties have not been utilized in the prediction of CaM-binding proteins. Results: We propose a new method for the prediction of CaM-binding proteins based on both the total and average scores of known and new SLiMs in protein sequences, using a new scoring method called sliding window scoring (SWS) to produce features for the prediction module. A dataset of 194 manually curated human CaM-binding proteins and 193 mitochondrial proteins has been obtained and used for testing the proposed model. The motif generation tool, Multiple EM for Motif Elucidation (MEME), has been used to obtain new motifs from each of the positive and negative datasets individually (the SM approach) and from the combined negative and positive datasets (the CM approach). Moreover, the wrapper criterion with random forest for feature selection (FS) has been applied, followed by classification using different algorithms such as k-nearest neighbors (k-NN), support vector machines (SVM), naive Bayes (NB) and random forest (RF). Conclusions: Our proposed method shows very good prediction results and demonstrates how information contained in SLiMs is highly relevant in predicting CaM-binding proteins. Further, three new CaM-binding motifs have been computationally selected and biologically validated in this study, and can be used for predicting CaM-binding proteins.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2378-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yixun Li
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Mina Maleki
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Paul M Stemmer
- Inst. of Env. Health Sci., Wayne State University, Detroit, MI, USA
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
|
20
|
Alvarez-Berastegui D, Coll J, Rueda L, Stobart B, Morey G, Navarro O, Aparicio-González A, Grau AM, Reñones O. Multiscale seascape habitat of necto-benthic littoral species, application to the study of the dusky grouper habitat shift throughout ontogeny. Mar Environ Res 2018; 142:21-31. [PMID: 30253919 DOI: 10.1016/j.marenvres.2018.09.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/03/2018] [Revised: 07/19/2018] [Accepted: 09/02/2018] [Indexed: 06/08/2023]
Abstract
Describing the spatial patterns of benthic coastal habitats and investigating how those patterns affect the ecology of inhabiting species is a main objective of seascape ecology. Within this emerging discipline, spatial scale is a principal topic. Different spatial scales inform on different characteristics of the habitat, and therefore the relation between species and their habitats is better defined when observed at multiple levels of spatial scale. Here we apply a multiscale seascape approach to investigate the habitat preferences of juvenile and adult individuals of dusky grouper (Epinephelus marginatus) in a Mediterranean marine protected area. Results show that the information obtained at different spatial scales is complementary, improving our capability to identify the preferred habitats and how they change throughout ontogeny. These results show the relevance of implementing multiscale seascape ecology approaches to investigate species-habitat relationships and to improve management and conservation of necto-benthic endangered top predators.
Affiliation(s)
- D Alvarez-Berastegui
- ICTS-SOCIB, Balearic Islands Coastal Observing and Forecasting System, Palma de Mallorca, Illes Balears, Spain
- J Coll
- TRAGASATEC, Palma de Mallorca, Illes Balears, Spain
- L Rueda
- Instituto Español de Oceanografía, Centro Oceanográfico de Baleares, Palma de Mallorca, Illes Balears, Spain
- B Stobart
- South Australian Research and Development Institute SARDI, Fisheries, Port Lincoln Marine Science Centre, PO Box 1783, Port Lincoln, 5606, South Australia, Australia
- G Morey
- TRAGASATEC, Palma de Mallorca, Illes Balears, Spain
- O Navarro
- Serveis de Millora Agrària i Pesquera, Conselleria de Medi Ambient, Agricultura i Pesca, Govern de les Illes Balears, Palma de Mallorca, Illes Balears, Spain
- A Aparicio-González
- Instituto Español de Oceanografía, Centro Oceanográfico de Baleares, Palma de Mallorca, Illes Balears, Spain
- A M Grau
- Direccio General de Pesca i Medi Mari. Conselleria de Medi Ambient, Agricultura i Pesca, Govern de les Illes Balears, Foners 10, 07006, Palma de Mallorca, Illes Balears, Spain
- O Reñones
- Instituto Español de Oceanografía, Centro Oceanográfico de Baleares, Palma de Mallorca, Illes Balears, Spain
|
21
|
Tabl AA, Alkhateeb A, Pham HQ, Rueda L, ElMaraghy W, Ngom A. A Novel Approach for Identifying Relevant Genes for Breast Cancer Survivability on Specific Therapies. Evol Bioinform Online 2018; 14:1176934318790266. [PMID: 30116102 PMCID: PMC6088467 DOI: 10.1177/1176934318790266] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/31/2018] [Accepted: 06/21/2018] [Indexed: 12/17/2022]
Abstract
Analyzing the genetic activity of breast cancer survival for a specific type of therapy provides a better understanding of the body's response to the treatment and helps select the best course of action, while leading to the design of drugs based on gene activity. In this work, we use supervised and nonsupervised machine learning methods to deal with a multiclass classification problem in which we label the samples based on the combination of the 5-year survivability and treatment; we focus on hormone therapy, radiotherapy, and surgery. The proposed nonsupervised hierarchical models are created to find the highest separability between combinations of the classes. The supervised model consists of a combination of feature selection techniques and efficient classifiers used to find a potential set of biomarker genes specific to response to therapy. The results show that different models achieve different performance scores, with accuracies ranging from 80.9% to 100%. We have investigated the roles of many biomarkers through the literature and found that some of the discriminative genes in the computational model, such as ZC3H11A, VAX2, MAF1, and ZFP91, are related to breast cancer and other types of cancer.
Affiliation(s)
- Ashraf Abou Tabl
- Department of Mechanical, Automotive and Materials Engineering (MAME), University of Windsor, Windsor, ON, Canada
- Huy Quang Pham
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Waguih ElMaraghy
- Department of Mechanical, Automotive and Materials Engineering (MAME), University of Windsor, Windsor, ON, Canada
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
|
22
|
Firoozbakht F, Rezaeian I, D'agnillo M, Porter L, Rueda L, Ngom A. An Integrative Approach for Identifying Network Biomarkers of Breast Cancer Subtypes Using Genomic, Interactomic, and Transcriptomic Data. J Comput Biol 2017. [DOI: 10.1089/cmb.2017.0010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Affiliation(s)
- Iman Rezaeian
- School of Computer Science, University of Windsor, Windsor, Canada
- Michele D'agnillo
- Department of Biological Sciences, University of Windsor, Windsor, Canada
- Lisa Porter
- Department of Biological Sciences, University of Windsor, Windsor, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, Canada
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, Canada
|
23
|
Abstract
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score, and also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. The Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
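The two-pass idea described in the abstract (score each read by its number of unique k-mers, then drop reads whose z-score falls below a threshold) can be sketched as follows. This is a simplified illustration, not the Zseq implementation: the function names, the default `k`, the way ambiguous bases are handled (k-mers containing `N` are simply not counted), and the omission of the GC-content factor are all assumptions.

```python
from statistics import mean, pstdev

def unique_kmer_score(read: str, k: int = 4) -> int:
    """Count distinct k-mers in a read, ignoring k-mers with ambiguous bases (N)."""
    kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)

def filter_reads(reads, k: int = 4, z_threshold: float = 0.0):
    """Keep reads whose complexity z-score is at least z_threshold."""
    # First pass: score every read.
    scores = [unique_kmer_score(r, k) for r in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:                 # all reads equally complex: nothing to filter
        return list(reads)
    # Second pass: filter by z-score.
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= z_threshold]

reads = ["ACGTACGTTGCA", "AAAAAAAAAAAA", "ACGTTGCAGGCT", "ACGNNNNNNTGC"]
kept = filter_reads(reads, k=4, z_threshold=0.0)
```

In this toy input, the homopolymer read and the read dominated by `N` score far below the mean and are removed, while the two diverse reads are kept.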
Affiliation(s)
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, Canada
|
24
|
Cirigliano V, Ordoñez E, Rueda L, Syngelaki A, Nicolaides KH. Performance of the neoBona test: a new paired-end massively parallel shotgun sequencing approach for cell-free DNA-based aneuploidy screening. Ultrasound Obstet Gynecol 2017; 49:460-464. [PMID: 27981672 PMCID: PMC5396344 DOI: 10.1002/uog.17386] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/16/2016] [Revised: 11/29/2016] [Accepted: 12/09/2016] [Indexed: 06/06/2023]
Abstract
OBJECTIVE To assess the performance of screening for fetal trisomies 21, 18 and 13 by cell-free (cf) DNA analysis of maternal blood using a new method based on paired-end massively parallel shotgun sequencing (MPSS). METHODS This was a blinded study of plasma samples (1mL) obtained from 1000 women undergoing screening for fetal trisomies 21, 18 and 13 at 11-13 weeks' gestation. The study included 50 cases with confirmed fetal trisomy 21, 30 with trisomy 18, 10 with trisomy 13 and 910 unaffected pregnancies. Paired-end MPSS with the neoBona® test allowed simultaneous assessment of fetal fraction, cfDNA fragment size distribution and chromosome counting, which were integrated into a new analysis algorithm to calculate trisomy likelihood ratios (t-score) for each chromosome of interest. Each sample was classified as trisomic or unaffected using chromosome-specific cut-offs set at t-score values of 1.5 for trisomy 21 and 3.0 for trisomies 18 and 13. RESULTS Valid results were provided for 988 (98.8%) cases; 12 (1.2%) samples, from nine euploid and three trisomy 21 pregnancies, did not pass quality-control criteria and were excluded from further analysis. All 47 cases of trisomy 21, all 10 of trisomy 13, 29 of 30 with trisomy 18 and all 901 unaffected cases were classified correctly. Median fetal fraction was 10.5% (range, 0.3-33.8%) and trisomic and unaffected cases with low fetal fractions of < 1% were identified correctly. CONCLUSIONS This novel method for cfDNA analysis of maternal plasma, which utilizes paired-end MPSS, can provide accurate prediction of fetal trisomies. Use of a new multicomponent t-score removes the need to reject samples with fetal fraction < 4%, which potentially extends the benefits of non-invasive prenatal cfDNA analysis to a larger proportion of pregnancies. © 2016 Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.
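As a worked check of the reported results, the per-condition detection rates can be recomputed from the counts given in the abstract. The snippet below is only arithmetic on those published counts; the variable and function names are illustrative.

```python
# (correctly classified, total with a valid result), taken from the abstract
results = {
    "trisomy 21": (47, 47),
    "trisomy 18": (29, 30),
    "trisomy 13": (10, 10),
    "unaffected": (901, 901),
}

def rate(correct: int, total: int) -> float:
    """Percentage of cases classified correctly."""
    return 100.0 * correct / total

# Detection rate per trisomy, and specificity among unaffected pregnancies.
detection = {name: rate(*counts)
             for name, counts in results.items() if name != "unaffected"}
specificity = rate(*results["unaffected"])

# Sanity check against the 988 valid results reported.
total_valid = sum(total for _, total in results.values())
```

The totals sum to the 988 valid results reported, and the single missed trisomy 18 case corresponds to a detection rate of about 96.7% for that condition.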
Affiliation(s)
- V. Cirigliano
- Department of Molecular Genetics, Labco Diagnostics, SYNLAB Group, Barcelona, Spain
- E. Ordoñez
- Department of Molecular Genetics, Labco Diagnostics, SYNLAB Group, Barcelona, Spain
- L. Rueda
- Department of Molecular Genetics, Labco Diagnostics, SYNLAB Group, Barcelona, Spain
- A. Syngelaki
- Harris Birthright Research Centre for Fetal Medicine, King's College Hospital, London, UK
- K. H. Nicolaides
- Harris Birthright Research Centre for Fetal Medicine, King's College Hospital, London, UK
|
25
|
Li Y, Maleki M, Carruthers NJ, Rueda L, Stemmer PM, Ngom A. Prediction of Calmodulin-Binding Proteins Using Short-Linear Motifs. Bioinformatics and Biomedical Engineering 2017. [DOI: 10.1007/978-3-319-56154-7_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
|
26
|
Mucaki EJ, Baranova K, Pham HQ, Rezaeian I, Angelov D, Ngom A, Rueda L, Rogan PK. Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Biochemically-inspired Machine Learning. F1000Res 2016. [PMID: 28620450 PMCID: PMC5461908 DOI: 10.12688/f1000research.9417.3] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 01/19/2023]
Abstract
Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. Minimum redundancy maximum relevance feature selection of a paclitaxel-based SVM classifier based on expression of genes BCL2L1, BBC3, FGF2, FN1, and TWIST1 was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using a gene signature (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B) predicted >3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT. These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies.
Affiliation(s)
- Eliseos J Mucaki
- Department of Biochemistry, University of Western Ontario, London, Canada
- Katherina Baranova
- Department of Biochemistry, University of Western Ontario, London, Canada
- Huy Q Pham
- School of Computer Science, University of Windsor, Windsor, Canada
- Iman Rezaeian
- School of Computer Science, University of Windsor, Windsor, Canada
- Dimo Angelov
- Department of Computer Science, University of Western Ontario, London, Canada
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, Canada
- Peter K Rogan
- Department of Biochemistry, University of Western Ontario, London, Canada; Department of Computer Science, University of Western Ontario, London, Canada; CytoGnomix Inc, London, Canada
|
27
|
Rezaeian I, Tavakoli A, Cavallo-Medved D, Porter LA, Rueda L. A novel model used to detect differential splice junctions as biomarkers in prostate cancer from RNA-Seq data. J Biomed Inform 2016; 60:422-30. [PMID: 26992567 DOI: 10.1016/j.jbi.2016.03.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/20/2015] [Revised: 02/10/2016] [Accepted: 03/15/2016] [Indexed: 11/15/2022]
Abstract
BACKGROUND In cancer, alternative RNA splicing represents one mechanism for flexible gene regulation, whereby protein isoforms can be created to promote cell growth, division and survival. Detecting novel splice junctions in the cancer transcriptome may reveal pathways driving tumorigenic events. In this regard, RNA-Seq, a high-throughput sequencing technology, has expanded the study of cancer transcriptomics in the areas of gene expression, chimeric events and alternative splicing in search of novel biomarkers for the disease. RESULTS In this study, we propose a new two-dimensional peak finding method for detecting differential splice junctions in prostate cancer using RNA-Seq data. We have designed an integrative process that involves a new two-dimensional peak finding algorithm to combine junctions and then remove irrelevant introns across different samples within a population. We have also designed a scoring mechanism to select the most common junctions. CONCLUSIONS Our computational analysis on three independent datasets collected from patients diagnosed with prostate cancer reveals a small subset of junctions that may potentially serve as biomarkers for prostate cancer. AVAILABILITY The pipeline, along with its corresponding algorithms, is available upon request.
Affiliation(s)
- Iman Rezaeian
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, Ontario N9B 3P4, Canada
- Ahmad Tavakoli
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, Ontario N9B 3P4, Canada
- Dora Cavallo-Medved
- Department of Biological Sciences, University of Windsor, 401 Sunset Ave., Windsor, Ontario N9B 3P4, Canada
- Lisa A Porter
- Department of Biological Sciences, University of Windsor, 401 Sunset Ave., Windsor, Ontario N9B 3P4, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, Ontario N9B 3P4, Canada
|
28
|
Echeto LF, Sposetti V, Childs G, Aguilar ML, Behar-Horenstein LS, Rueda L, Nimmo A. Evaluation of Team-Based Learning and Traditional Instruction in Teaching Removable Partial Denture Concepts. J Dent Educ 2015; 79:1040-1048. [PMID: 26329028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The aim of this study was to evaluate the effectiveness of team-based learning (TBL) methodology on dental students' retention of knowledge regarding removable partial denture (RPD) treatment. The process of learning RPD treatment requires that students first acquire foundational knowledge and then use critical thinking skills to apply that knowledge to a variety of clinical situations. The traditional approach to teaching, characterized by a reliance on lectures, is not the most effective method for learning clinical applications. To address the limitations of that approach, the teaching methodology of the RPD preclinical course at the University of Florida was changed to TBL, which has been shown to motivate student learning and improve clinical performance. A written examination was constructed to compare the impact of TBL with that of traditional teaching regarding students' retention of knowledge and their ability to evaluate, diagnose, and treatment plan a partially edentulous patient with an RPD prosthesis. Students taught using traditional and TBL methods took the same examination. The response rate (those who completed the examination) for the class of 2013 (traditional method) was 94% (79 students of 84); for the class of 2014 (TBL method), it was 95% (78 students of 82). The results showed that students who learned RPD with TBL scored higher on the examination than those who learned RPD with traditional methods. Compared to the students taught with the traditional method, the TBL students' proportion of passing grades was statistically significantly higher (p=0.002), and 23.7% more TBL students passed the examination. The mean score for the TBL class (0.758) compared to the conventional class (0.700) was statistically significant with a large effect size, also demonstrating the practical significance of the findings. The results of the study suggest that TBL methodology is a promising approach to teaching RPD with successful outcomes.
Affiliation(s)
- Luisa F Echeto
- Dr. Echeto is Clinical Associate Professor and Director, Division of Prosthodontics and Predoctoral Prosthodontics Program, Department of Restorative Dental Sciences, University of Florida College of Dentistry; Dr. Sposetti is Associate Dean for Education and Associate Professor, Division of Prosthodontics, Department of Restorative Dental Sciences, University of Florida College of Dentistry; Ms. Childs is Associate in Dentistry and Director of Curriculum and Instruction, University of Florida College of Dentistry; Dr. Aguilar is Clinical Assistant Professor, Division of Prosthodontics, Department of Restorative Dental Sciences, University of Florida College of Dentistry; Dr. Behar-Horenstein is Distinguished Teaching Scholar and Professor, Department of Educational Administration and Policy, University of Florida School of Human Development and Organizational Studies in Education and Affiliate Professor, Community Dentistry and Behavioral Science, University of Florida College of Dentistry; Dr. Rueda is Clinical Associate Professor, Division of Prosthodontics, Department of Restorative Dental Sciences, University of Florida College of Dentistry; and Dr. Nimmo is Professor, Division of Prosthodontics, Department of Restorative Dental Sciences, University of Florida College of Dentistry.
- Venita Sposetti
- Gail Childs
- Maria L Aguilar
- Linda S Behar-Horenstein
- Luis Rueda
- Arthur Nimmo
|
29
|
Echeto LF, Sposetti V, Childs G, Aguilar ML, Behar-Horenstein LS, Rueda L, Nimmo A. Evaluation of Team-Based Learning and Traditional Instruction in Teaching Removable Partial Denture Concepts. J Dent Educ 2015. [DOI: 10.1002/j.0022-0337.2015.79.9.tb05997.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/11/2022]
Affiliation(s)
- Luisa F. Echeto
- Division of Prosthodontics and Predoctoral Prosthodontics Program; Department of Restorative Dental Sciences; University of Florida College of Dentistry
- Venita Sposetti
- Division of Prosthodontics; Department of Restorative Dental Sciences; University of Florida College of Dentistry
- Maria L. Aguilar
- Division of Prosthodontics; Department of Restorative Dental Sciences; University of Florida College of Dentistry
- Linda S. Behar-Horenstein
- Department of Educational Administration and Policy; University of Florida School of Human Development and Organizational Studies in Education and Affiliate Professor; Community Dentistry and Behavioral Science; University of Florida College of Dentistry
- Luis Rueda
- Division of Prosthodontics; Department of Restorative Dental Sciences; University of Florida College of Dentistry
- Arthur Nimmo
- Division of Prosthodontics; Department of Restorative Dental Sciences; University of Florida College of Dentistry
|
30
|
|
31
|
Abstract
Genome-wide profiling of DNA-binding proteins using ChIP-Seq has emerged as an alternative to ChIP-chip methods. ChIP-Seq technology offers many advantages over ChIP-chip arrays, including but not limited to less noise, higher resolution, and more coverage. Several algorithms have been developed to take advantage of these abilities and find enriched regions by analyzing ChIP-Seq data. However, the complexity of analyzing various patterns of ChIP-Seq signals still calls for new algorithms. Most current algorithms use various heuristics to detect regions accurately. However, despite the many formulations available, it is still difficult to accurately determine individual peaks corresponding to each binding event. We developed Constrained Multi-level Thresholding (CMT), an algorithm used to detect enriched regions on ChIP-Seq data. CMT employs a constraint-based module that can target regions within a specific range. We show that CMT has higher accuracy in detecting enriched regions (peaks) by objectively assessing its performance relative to other previously proposed peak finders. This is shown by testing three algorithms on the well-known FoxA1 dataset, four transcription factors (with a total of six antibodies) for Drosophila melanogaster, and the H3K4ac antibody dataset.
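The idea of a constraint that targets regions within a specific size range can be illustrated with a minimal sketch. This is not the CMT algorithm itself: it assumes a single fixed threshold on a per-position coverage signal, and the function name `enriched_regions` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def enriched_regions(
    coverage: Sequence[float],
    threshold: float,
    min_len: int,
    max_len: int,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where coverage >= threshold
    and the run length lies within [min_len, max_len]."""
    regions: List[Tuple[int, int]] = []
    start = None  # start index of the run currently being scanned
    for i, c in enumerate(coverage):
        if c >= threshold and start is None:
            start = i  # a new candidate region begins
        elif c < threshold and start is not None:
            # the run ended at i; keep it only if it satisfies the constraint
            if min_len <= i - start <= max_len:
                regions.append((start, i))
            start = None
    # handle a run that extends to the end of the signal
    if start is not None and min_len <= len(coverage) - start <= max_len:
        regions.append((start, len(coverage)))
    return regions


signal = [0, 1, 5, 6, 7, 1, 0, 9, 0, 4, 4, 4, 4, 4, 4]
print(enriched_regions(signal, threshold=4, min_len=2, max_len=5))
```

In this toy signal the single-position spike and the six-position plateau are both rejected by the length constraint, so only the middle run survives.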
Affiliation(s)
- Iman Rezaeian
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
|
32
|
Abstract
BACKGROUND MicroRNAs are a class of small RNAs, about 20 nt long, which regulate cellular processes in animals and plants. Identifying microRNAs is one of the most important tasks in gene regulation studies. The main features used for identifying these tiny molecules are those in hairpin secondary structures of pre-microRNA. RESULTS A new classifier is employed to identify precursor microRNAs from both pseudo hairpins and other non-coding RNAs. This classifier achieves a geometric mean Gm = 92.20% with just three features and 92.91% with seven features. CONCLUSION This study shows that linear dimensionality reduction combined with explicit feature mapping, namely miLDR-EM, achieves high performance in classification of microRNAs from other sequences. Also, explicitly mapping data onto a high dimensional space could be a useful alternative to kernel-based methods for large datasets with a small number of features. Moreover, we demonstrate that microRNAs can be accurately identified by just using three properties that involve minimum free energy.
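Two quantities in this abstract lend themselves to a short sketch. The first assumes that Gm denotes the geometric mean of sensitivity and specificity, a common definition for imbalanced classification that the abstract does not spell out. The second shows one possible explicit feature map (all degree-2 products); the actual mapping used by miLDR-EM is not given here, so `quadratic_map` is purely illustrative.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Sequence


def geometric_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed meaning of Gm)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def quadratic_map(x: Sequence[float]) -> List[float]:
    """Explicit degree-2 map: the original features followed by all
    pairwise products, including squares."""
    products = [a * b for a, b in combinations_with_replacement(x, 2)]
    return list(x) + products


# Three input features become 3 + 6 = 9 explicit features.
print(quadratic_map([1.0, 2.0, 3.0]))
print(geometric_mean(tp=90, fn=10, tn=95, fp=5))
```

With only a handful of input features the mapped space stays small, which is the setting in which an explicit map can stand in for a kernel.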
|
33
|
Abstract
BACKGROUND Prediction and analysis of protein-protein interactions (PPI) and specifically types of PPIs is an important problem in life science research because of the fundamental roles of PPIs in many biological processes in living cells. In addition, electrostatic interactions are important in understanding inter-molecular interactions, since they are long-range, and because of their influence in charged molecules. This is the main motivation for using electrostatic energy for prediction of PPI types. RESULTS We propose a prediction model to analyze protein interaction types, namely obligate and non-obligate, using electrostatic energy values as properties. The prediction approach uses electrostatic energy values for pairs of atoms and amino acids present in interfaces where the interaction occurs. The main features of the complexes are found and then the prediction is performed via several state-of-the-art classification techniques, including linear dimensionality reduction (LDR), support vector machine (SVM), naive Bayes (NB) and k-nearest neighbor (k-NN). For an in-depth analysis of classification results, some other experiments were performed by varying the distance cutoffs between atom pairs of interacting chains, ranging from 5 Å to 13 Å. Moreover, several feature selection algorithms including gain ratio (GR), information gain (IG), chi-square (Chi2) and minimum redundancy maximum relevance (mRMR) are applied on the available datasets to obtain more discriminative pairs of atom types and amino acid types as features for prediction. CONCLUSIONS Our results on two well-known datasets of obligate and non-obligate complexes confirm that electrostatic energy is an important property to predict obligate and non-obligate protein interaction types on the basis of all the experimental results, achieving accuracies of over 98%. Furthermore, a comparison performed by changing the distance cutoff demonstrates that the best values for prediction of PPI types using electrostatic energy range from 9 Å to 12 Å, which shows that electrostatic interactions are long-range and cover a broader area in the interface. In addition, the results on using feature selection before prediction confirm that (a) a few pairs of atoms and amino acids are appropriate for prediction, and (b) prediction performance can be improved by eliminating irrelevant and noisy features and selecting the most discriminative ones.
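The role of the distance cutoff can be sketched as follows. This is only an illustration of how a cutoff selects atom pairs across two chains; it does not compute electrostatic energy, and the coordinates, the function name `interface_pairs`, and the brute-force search are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def interface_pairs(
    chain_a: Sequence[Point],
    chain_b: Sequence[Point],
    cutoff: float,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of atoms from the two chains whose
    Euclidean distance is at most `cutoff` (in angstroms)."""
    pairs: List[Tuple[int, int]] = []
    for i, p in enumerate(chain_a):
        for j, q in enumerate(chain_b):
            if math.dist(p, q) <= cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical coordinates, in angstroms.
chain_a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
chain_b = [(3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]

# A larger cutoff admits more atom pairs into the interface.
for cutoff in (5.5, 10.5):
    print(cutoff, interface_pairs(chain_a, chain_b, cutoff))
```

Widening the cutoff enlarges the set of pairs from which features are derived, which is why the choice of cutoff affects the prediction results reported above.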
Affiliation(s)
- Mina Maleki
- School of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, Ontario, N9B 3P4, Canada
- Gokul Vasudev
- School of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, Ontario, N9B 3P4, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, Ontario, N9B 3P4, Canada
|
34
|
Rueda L, Saralegui A, Fernández d'Arlas B, Zhou Q, Berglund LA, Corcuera MA, Mondragon I, Eceiza A. Cellulose nanocrystals/polyurethane nanocomposites. Study from the viewpoint of microphase separated structure. Carbohydr Polym 2012; 92:751-7. [PMID: 23218363 DOI: 10.1016/j.carbpol.2012.09.093] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Received: 01/12/2012] [Revised: 09/19/2012] [Accepted: 09/28/2012] [Indexed: 11/17/2022]
Abstract
Cellulose nanocrystals (CNC) successfully obtained from microcrystalline cellulose (MCC) were dispersed in a thermoplastic polyurethane as matrix. Nanocomposites containing 1.5, 5, 10 and 30 wt% CNC were prepared by solvent casting procedure and properties of the resulting films were evaluated from the viewpoint of polyurethane microphase separated structure, soft and hard domains. CNC were effectively dispersed in the segmented thermoplastic elastomeric polyurethane (STPUE) matrix due to the favorable matrix-nanocrystals interactions through hydrogen bonding. Cellulose nanocrystals interacted with both soft and hard segments, enhancing stiffness and stability versus temperature of the nanocomposites. Thermal and mechanical properties of STPUE/CNC nanocomposites have been associated to the generated morphologies investigated by AFM images.
Affiliation(s)
- L Rueda
- Materials+Technologies Group, Department of Chemical and Environmental Engineering, Polytechnic School, University of Basque Country, Pza. Europa 1, 20018 Donostia-San Sebastián, Spain
|
35
|
Aziz MM, Maleki M, Rueda L, Raza M, Banerjee S. Prediction of biological protein-protein interactions using atom-type and amino acid properties. Proteomics 2011; 11:3802-10. [DOI: 10.1002/pmic.201100186] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 04/12/2011] [Revised: 05/25/2011] [Accepted: 05/30/2011] [Indexed: 11/10/2022]
|
36
|
Abstract
BACKGROUND Processing cDNA microarray images is a crucial step in gene expression analysis, since any errors in early stages affect subsequent steps, leading to possibly erroneous biological conclusions. When processing the underlying images, accurately separating the sub-grids and spots is extremely important for subsequent steps that include segmentation, quantification, normalization and clustering. RESULTS We propose a parameterless and fully automatic approach that first detects the sub-grids given the entire microarray image, and then detects the locations of the spots in each sub-grid. The approach first detects and corrects rotations in the images by applying an affine transformation, followed by a polynomial-time optimal multi-level thresholding algorithm used to find the positions of the sub-grids in the image and the positions of the spots in each sub-grid. Additionally, a new validity index is proposed in order to find the correct number of sub-grids in the image, and the correct number of spots in each sub-grid. Moreover, a refinement procedure is used to correct possible misalignments and increase the accuracy of the method. CONCLUSIONS Extensive experiments on real-life microarray images and a comparison to other methods show that the proposed method performs these tasks fully automatically and with a very high degree of accuracy. Moreover, unlike previous methods, the proposed approach can be used on various types of microarray images with different resolutions and spot sizes and does not need any parameter to be adjusted.
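A polynomial-time optimal multi-level thresholding can be sketched with dynamic programming over a one-dimensional histogram. The version below assumes a within-class squared-deviation criterion and runs in O(k n^2) time; the criterion, the function name `optimal_thresholds`, and the toy profile are assumptions for illustration, not the formulation used in the paper.

```python
from typing import List, Sequence


def optimal_thresholds(hist: Sequence[float], k: int) -> List[int]:
    """Split bins 0..n-1 into k contiguous classes that minimise the total
    within-class weighted squared deviation of the bin index.
    Returns the start indices of classes 2..k (k - 1 cut points)."""
    n = len(hist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of bins")

    # Prefix sums of weight, weight * index and weight * index^2.
    w = [0.0] * (n + 1)
    s = [0.0] * (n + 1)
    q = [0.0] * (n + 1)
    for i, h in enumerate(hist):
        w[i + 1] = w[i] + h
        s[i + 1] = s[i] + h * i
        q[i + 1] = q[i] + h * i * i

    def cost(a: int, b: int) -> float:
        """Weighted squared deviation of bins a..b-1 around their mean."""
        weight = w[b] - w[a]
        if weight == 0:
            return 0.0
        total = s[b] - s[a]
        return (q[b] - q[a]) - total * total / weight

    inf = float("inf")
    # best[c][b]: minimal cost of covering bins 0..b-1 with c classes.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for b in range(c, n + 1):
            for a in range(c - 1, b):
                candidate = best[c - 1][a] + cost(a, b)
                if candidate < best[c][b]:
                    best[c][b] = candidate
                    back[c][b] = a

    # Walk the back-pointers to recover the cut points.
    cuts: List[int] = []
    b = n
    for c in range(k, 1, -1):
        b = back[c][b]
        cuts.append(b)
    return cuts[::-1]


# Toy intensity profile with three separated groups of mass.
profile = [5, 5, 0, 0, 0, 4, 4, 0, 0, 0, 0, 6, 6]
print(optimal_thresholds(profile, 3))
```

Applied to row or column intensity sums of an image, cut points of this kind fall in the empty gaps between groups, which is the intuition behind using thresholding to locate sub-grids and spots.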
Affiliation(s)
- Luis Rueda
- School of Computer Science, University of Windsor, Ontario, Canada.
|
37
|
Rueda L, Wong F, Cooper M, Clark A. Cast metal bases as an economical alternative for the severely resorbed mandible. Gen Dent 2011; 59:e63-e66. [PMID: 21903510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Resorption of the alveolar ridge is a common problem in edentulous patients and can compromise the stability and function of dentures. Resorption and its consequences can be minimized when strategically placed implants are used; however, this option is financially out of reach for many patients. The article discusses a more cost-effective alternative (metal-based dentures) for patients with ridge resorption. In certain environments, like a dental school, where patients are looking for solutions to their dental problems at a reasonable price, cast metal bases can be a feasible economical alternative for edentulous patients. Both cases presented here demonstrated a significant improvement in stability, phonation, and mastication.
Affiliation(s)
- Luis Rueda
- Department of Prosthodontics, University of Florida College of Dentistry, Gainesville, USA
|
38
|
Rojas D, Rueda L, Ngom A, Hurrutia H, Carcamo G. Image segmentation of biofilm structures using optimal multi-level thresholding. Int J Data Min Bioinform 2011; 5:266-86. [DOI: 10.1504/ijdmb.2011.040384] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
|
39
|
Corcuera M, Rueda L, Fernandez d’Arlas B, Arbelaiz A, Marieta C, Mondragon I, Eceiza A. Microstructure and properties of polyurethanes derived from castor oil. Polym Degrad Stab 2010. [DOI: 10.1016/j.polymdegradstab.2010.03.001] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.2] [Indexed: 10/19/2022]
|
40
|
Ngom A, Rueda L, Wang L, Gras R. Selection based heuristics for the non-unique oligonucleotide probe selection problem in microarray design. Pattern Recognit Lett 2010. [DOI: 10.1016/j.patrec.2010.04.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
41
|
Subhani N, Rueda L, Ngom A, Burden CJ. Multiple gene expression profile alignment for microarray time-series data clustering. Bioinformatics 2010; 26:2281-8. [DOI: 10.1093/bioinformatics/btq422] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
|
42
|
Rueda L, Bari A, Ngom A. Clustering Time-Series Gene Expression Data with Unequal Time Intervals. Transactions on Computational Systems Biology X 2008. [DOI: 10.1007/978-3-540-92273-5_6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
|
43
|
Rueda L, Oommen BJ. Stochastic automata-based estimators for adaptively compressing files with nonstationary distributions. IEEE Trans Syst Man Cybern B Cybern 2006; 36:1196-200. [PMID: 17036824 DOI: 10.1109/tsmcb.2006.872256] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
This correspondence shows that learning automata techniques, which have been useful in developing weak estimators, can be applied to data compression applications in which the data distributions are nonstationary. The adaptive coding scheme utilizes stochastic learning-based weak estimation techniques to adaptively update the probabilities of the source symbols, and this is done without resorting to maximum likelihood, Bayesian, or sliding-window methods. The authors have incorporated the estimator in the adaptive Fano coding scheme and in an adaptive entropy-based scheme that "resembles" the well-known arithmetic coding. The empirical results for both of these adaptive methods were obtained on real-life files that possess a fair degree of nonstationarity. From these results, it can be seen that the proposed schemes compress nearly 10% more than their respective adaptive methods that use maximum-likelihood estimator-based estimates.
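An adaptive probability update of this kind can be sketched as a multiplicative rule: each time a symbol is seen, the other symbols' probabilities shrink by a factor and the observed symbol absorbs the freed mass. The rule below follows the general form of stochastic learning weak estimators; the abstract does not state the update, so the rule, the learning parameter `lam`, and the function name `slwe_update` are assumptions for illustration.

```python
from typing import List, Sequence


def slwe_update(probs: Sequence[float], symbol: int, lam: float = 0.9) -> List[float]:
    """Shrink every non-observed probability by `lam` and give the
    remaining mass to the observed symbol, so the estimates still sum to 1."""
    updated = [lam * p for p in probs]
    others = sum(updated[j] for j in range(len(probs)) if j != symbol)
    updated[symbol] = 1.0 - others
    return updated


# The source switches from mostly symbol 0 to mostly symbol 1;
# the estimate tracks the change without keeping counts or a window.
estimate = [0.5, 0.5]
for observed in [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]:
    estimate = slwe_update(estimate, observed, lam=0.8)
print(estimate)
```

Because old observations are discounted geometrically, the estimate forgets the earlier regime quickly, which is the property that matters for nonstationary sources.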
|
44
|
Abstract
Image and statistical analysis are two important stages of cDNA microarrays. Within image analysis, gridding is necessary to accurately identify the location of each spot while extracting spot intensities from the microarray images, and automating this procedure permits high-throughput analysis. Due to the deficiencies of the equipment used to print the arrays, rotations, misalignments, high contamination with noise and artifacts, and the enormous amount of data generated, solving the gridding problem by means of an automatic system is not trivial. Existing techniques to solve the automatic grid segmentation problem cover only limited aspects of this challenging problem and require the user to specify the size of the spots, the number of rows and columns in the grid, and boundary conditions. In this paper, a hill-climbing automatic gridding and spot quantification technique is proposed which takes a microarray image (or a subgrid) as input and makes no assumptions about the size of the spots, rows, and columns in the grid. The proposed method is based on a hill-climbing approach that utilizes different objective functions. The method has been found to effectively detect the grids on microarray images drawn from databases from GEO and the Stanford genomic laboratories.
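Hill climbing in its simplest form can be sketched on a one-dimensional intensity profile, such as the column sums of an image. The abstract does not give the objective functions used, so the profile, the neighbourhood of one step left or right, and the function name `hill_climb` are assumptions chosen only to show the search pattern.

```python
from typing import Sequence


def hill_climb(profile: Sequence[float], start: int) -> int:
    """Move from `start` to the better neighbouring index until neither
    neighbour improves the value; return the local maximum reached."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(profile)]
        if not neighbours:
            return i  # single-element profile
        best = max(neighbours, key=lambda j: profile[j])
        if profile[best] <= profile[i]:
            return i  # no neighbour is strictly better
        i = best


# Hypothetical column-sum profile with two bright spots.
profile = [1, 3, 7, 4, 2, 5, 9, 6]
print(hill_climb(profile, start=0), hill_climb(profile, start=4))
```

Different starting points converge to different local maxima, so in practice the result depends on how the search is seeded and on the objective being climbed.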
Affiliation(s)
- Luis Rueda
- Department of Computer Science, University of Concepcion, Edmundo Larenas 215, Concepcion, VIII Region, Chile.
|
45
|
Abstract
Following the invention of microarrays in 1994, the development and applications of this technology have grown exponentially. The numerous applications of microarray technology include clinical diagnosis and treatment, drug design and discovery, tumour detection, and environmental health research. One of the key issues in the experimental approaches utilising microarrays is to extract quantitative information from the spots, which represent genes in a given experiment. For this process, the initial stages are important and they influence future steps in the analysis. Identifying the spots and separating the background from the foreground is a fundamental problem in DNA microarray data analysis. In this review, we present an overview of state-of-the-art methods for microarray image segmentation. We discuss the foundations of the circle-shaped approach, adaptive shape segmentation, histogram-based methods and the recently introduced clustering-based techniques. We analytically show that clustering-based techniques are equivalent to the one-dimensional, standard k-means clustering algorithm that utilises the Euclidean distance.
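The one-dimensional k-means procedure mentioned at the end of this abstract can be sketched on pixel intensities, with k = 2 separating background from foreground. The initialisation at evenly spaced values between the minimum and maximum, the function name `kmeans_1d`, and the sample intensities are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(
    values: Sequence[float], k: int = 2, iters: int = 100
) -> Tuple[List[float], List[int]]:
    """Standard k-means on scalar values with squared Euclidean distance.
    Requires k >= 2. Returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the value range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for each value.
        labels = [
            min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


# Hypothetical spot region: three dark background pixels, three bright ones.
pixels = [10, 12, 11, 200, 210, 205]
print(kmeans_1d(pixels, k=2))
```

With k = 2 the label of each pixel directly gives the background/foreground split for a spot region.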
Affiliation(s)
- Li Qin
- IBM Canada Ltd, Markham, Ontario, Canada
|
46
|
Pineda-Fernández A, Rueda L, Huang D, Nur J, Jaramillo J. Laser in situ Keratomileusis for Hyperopia and Hyperopic Astigmatism With the Nidek EC-5000 Excimer Laser. J Refract Surg 2001; 17:670-5. [PMID: 11758985 DOI: 10.3928/1081-597x-20011101-06] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
Abstract
PURPOSE We evaluated the efficacy, predictability, stability, and safety of laser in situ keratomileusis (LASIK) for hyperopia and hyperopic astigmatism. METHODS A retrospective study was performed for 92 eyes of 62 consecutive patients to evaluate uncorrected (UCVA) and best spectacle-corrected visual acuity (BSCVA) and manifest refraction before and 3 and 6 months after LASIK (Moria LSK-ONE microkeratome, Nidek EC-5000 excimer laser). Eyes were divided into groups: Group 1 (low hyperopia) for spherical correction of +1.00 to +3.00 D (22 eyes), Group 2 (low hyperopic astigmatism) for toric correction with spherical equivalent refraction of +1.00 to +3.00 D (18 eyes), Group 3 (moderate hyperopia) for spherical correction of +3.25 to +6.00 D (10 eyes), and Group 4 (moderate hyperopic astigmatism) for toric correction with spherical equivalent refraction between +3.25 and +6.00 D (18 eyes). RESULTS At 3 and 6 months after LASIK, 68 eyes (73.9%) were available for follow-up examination. Percentage of eyes with a spherical equivalent refraction within ±0.50 D of emmetropia for Group 1 was 54.5% (12 eyes); Group 2, 50% (9 eyes); Group 3, 40% (4 eyes), and Group 4, 38.8% (7 eyes). UCVA ≥20/20 in Group 1 was 14% and in Groups 2, 3, and 4, 0%. One eye (5.5%) lost two lines of BSCVA. CONCLUSION LASIK with the Moria LSK-ONE microkeratome and the Nidek EC-5000 excimer laser reduced low and moderate hyperopia and was within ±0.50 D of target outcome in approximately 50% of eyes. Undercorrection was evident in all groups. The procedure was safe.
|
47
|
Botello AV, Diaz G, Rueda L, Villanueva SF. Organochlorine compounds in oysters and sediments from coastal lagoons of the Gulf of Mexico. Bull Environ Contam Toxicol 1994; 53:238-245. [PMID: 8086706 DOI: 10.1007/bf00192039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 05/22/2023]
Affiliation(s)
- A V Botello
- Institute for Marine and Limnological Sciences, National Autonomous University of Mexico, Mexico City, D.F
|
48
|
Londoño F, Muvdi F, Giraldo F, Rueda L, Caputo A. [Familial actinic prurigo]. Arch Argent Dermatol 1966; 16:290-307. [PMID: 5999271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2023]
|