1
Ai D, Chen L, Xie J, Cheng L, Zhang F, Luan Y, Li Y, Hou S, Sun F, Xia LC. Identifying local associations in biological time series: algorithms, statistical significance, and applications. Brief Bioinform 2023; 24:bbad390. [PMID: 37930023 DOI: 10.1093/bib/bbad390] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2023] [Revised: 08/21/2023] [Accepted: 09/14/2023] [Indexed: 11/07/2023] Open
Abstract
Local associations refer to spatial-temporal correlations that emerge from the biological realm, such as time-dependent gene co-expression or seasonal interactions between microbes. One can reveal the intricate dynamics and inherent interactions of biological systems by examining the biological time series data for these associations. To accomplish this goal, local similarity analysis algorithms and statistical methods that facilitate the local alignment of time series and assess the significance of the resulting alignments have been developed. Although these algorithms were initially devised for gene expression analysis from microarrays, they have been adapted and accelerated for multi-omics next generation sequencing datasets, achieving high scientific impact. In this review, we present an overview of the historical developments and recent advances for local similarity analysis algorithms, their statistical properties, and real applications in analyzing biological time series data. The benchmark data and analysis scripts used in this review are freely available at http://github.com/labxscut/lsareview.
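As an illustration of the local alignment idea described in this abstract, the following is a minimal sketch of one common dynamic-programming formulation of a local similarity score. It is not taken from the review or its repository: the function name `local_similarity`, the `max_delay` parameter, and the z-score normalization are assumptions made for this example.

```python
import numpy as np

def local_similarity(x, y, max_delay=0):
    """Signed local similarity score between two equal-length series.

    Hypothetical helper for illustration. Both series are z-scored, then the
    best-scoring contiguous aligned segment is found by dynamic programming,
    allowing the two series to be offset by at most `max_delay` time points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D series of equal length")
    if x.std() == 0 or y.std() == 0:
        raise ValueError("constant series cannot be normalized")
    n = len(x)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()

    # pos tracks positively associated segments, neg negatively associated ones.
    pos = np.zeros((n + 1, n + 1))
    neg = np.zeros((n + 1, n + 1))
    for i in range(1, n + 1):
        lo = max(1, i - max_delay)
        hi = min(n, i + max_delay)
        for j in range(lo, hi + 1):
            prod = x[i - 1] * y[j - 1]
            # Resetting to zero lets a new local segment start anywhere.
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)

    best_pos, best_neg = pos.max(), neg.max()
    score = max(best_pos, best_neg) / n
    return score if best_pos >= best_neg else -score
```

Under these assumptions, a series compared with itself scores 1 and a series compared with its negation scores -1. Assessing the statistical significance of such a score (for example by permutation) is a separate step that this sketch does not cover.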
Affiliation(s)
- Dongmei Ai
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Lulu Chen
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Jiemin Xie
  - Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Longwei Cheng
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Fang Zhang
  - Shenwan Hongyuan Securities Co. Ltd., Shanghai 200031, China
- Yihui Luan
  - School of Mathematics, Shandong University, Jinan 250100, China
- Yang Li
  - Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Shengwei Hou
  - Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, California, 90007, USA
- Li Charlie Xia
  - Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
2
Zainal-Abidin RA, Harun S, Vengatharajuloo V, Tamizi AA, Samsulrizal NH. Gene Co-Expression Network Tools and Databases for Crop Improvement. Plants (Basel, Switzerland) 2022; 11:1625. [PMID: 35807577 PMCID: PMC9269215 DOI: 10.3390/plants11131625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2022] [Revised: 06/05/2022] [Accepted: 06/05/2022] [Indexed: 06/15/2023]
Abstract
Transcriptomics has grown significantly as a functional genomics tool for understanding the expression of biological systems. The generated transcriptomics data can be utilised to produce a gene co-expression network, one of the essential downstream omics data analyses. To date, several gene co-expression network databases that store correlation values, expression profiles, gene names and gene descriptions have been developed. Although these resources remain scattered across the Internet, such databases complement each other and support efficient growth in the functional genomics area. This review presents the features of the most recent gene co-expression network databases for crops and summarises the present status of the tools that are widely used for constructing gene co-expression networks. The highlights of gene co-expression network databases and the tools presented here will pave the way for a robust interpretation of biologically relevant information. With this effort, researchers will be able to explore and utilise gene co-expression network databases for crop improvement.
Affiliation(s)
- Rabiatul-Adawiah Zainal-Abidin
  - Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute (MARDI), Serdang 43400, Selangor, Malaysia
- Sarahani Harun
  - Centre for Bioinformatics Research, Institute of Systems Biology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia
- Vinothienii Vengatharajuloo
  - Centre for Bioinformatics Research, Institute of Systems Biology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia
- Amin-Asyraf Tamizi
  - Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute (MARDI), Serdang 43400, Selangor, Malaysia
  - Department of Plant Science, Kulliyyah of Science, International Islamic Universiti Malaysia (IIUM), Jalan Sultan Ahmad Shah, Bandar Indera Mahkota, Kuantan 25200, Pahang, Malaysia
- Nurul Hidayah Samsulrizal
  - Department of Plant Science, Kulliyyah of Science, International Islamic Universiti Malaysia (IIUM), Jalan Sultan Ahmad Shah, Bandar Indera Mahkota, Kuantan 25200, Pahang, Malaysia
3
Arcsine laws for random walks generated from random permutations with applications to genomics. J Appl Probab 2021. [DOI: 10.1017/jpr.2021.14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
A classical result for the simple symmetric random walk with 2n steps is that the number of steps above the origin, the time of the last visit to the origin, and the time of the maximum height all have exactly the same distribution and converge, when scaled, to the arcsine law. Motivated by applications in genomics, we study the distributions of these statistics for the non-Markovian random walk generated from the ascents and descents of a uniform random permutation and a Mallows(q) permutation and show that they have the same asymptotic distributions as for the simple random walk. We also give an unexpected conjecture, along with numerical evidence and a partial proof in special cases, for the result that the number of steps above the origin by step 2n for the uniform permutation generated walk has exactly the same discrete arcsine distribution as for the simple random walk, even though the other statistics for these walks have very different laws. We also give explicit error bounds to the limit theorems using Stein’s method for the arcsine distribution, as well as functional central limit theorems and a strong embedding of the Mallows(q) permutation which is of independent interest.
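The construction in this abstract can be explored numerically. The sketch below is an illustration, not code from the paper: it builds the walk from the ascents and descents of a permutation, counts the steps above the origin, and computes the classical discrete arcsine probabilities for the simple random walk. All function names are hypothetical, and the convention that a step counts as "above the origin" when its segment lies above the axis is an assumption.

```python
import math
import random

def discrete_arcsine_pmf(n):
    """P(2k of the 2n steps lie above the origin), k = 0..n, for the simple
    symmetric random walk: C(2k, k) * C(2n - 2k, n - k) / 4**n."""
    return [math.comb(2 * k, k) * math.comb(2 * (n - k), n - k) / 4 ** n
            for k in range(n + 1)]

def permutation_steps(perm):
    """+1 for each ascent and -1 for each descent between neighbouring entries."""
    return [1 if a < b else -1 for a, b in zip(perm, perm[1:])]

def steps_above_origin(steps):
    """Count steps whose segment lies above the axis (assumed convention)."""
    position, count = 0, 0
    for s in steps:
        new_position = position + s
        if position > 0 or new_position > 0:
            count += 1
        position = new_position
    return count

def simulate_steps_above(n, trials, seed=0):
    """Empirical distribution of the count (indexed by count // 2) for walks
    of 2n steps built from uniform random permutations of 2n + 1 items."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        perm = list(range(2 * n + 1))
        rng.shuffle(perm)
        above = steps_above_origin(permutation_steps(perm))
        counts[above // 2] += 1  # the count is always even for 2n steps
    return [c / trials for c in counts]
```

Comparing `simulate_steps_above(n, trials)` with `discrete_arcsine_pmf(n)` for small `n` gives a quick numerical look at the conjecture mentioned above; such a comparison is only suggestive and is no substitute for the paper's analysis.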
4
Kandel A, Dhillon SK, Prabaharan CB, Fatnin Binti Hisham S, Rajamanickam K, Napper S, Chidambaram SB, Essa MM, Yang J, Sakharkar MK. Identifying kinase targets of PPARγ in human breast cancer. J Drug Target 2021; 29:660-668. [PMID: 33496213 DOI: 10.1080/1061186x.2021.1877719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
Breast cancer is the most common cancer in women. Despite advances in screening women for genetic predisposition to breast cancer and risk stratification, a majority of women carriers remain undetected until they become affected. Thus, there is a need to develop a cost-effective, rapid, sensitive and non-invasive early-stage diagnostic method. Kinases are involved in all fundamental cellular processes and mutations in kinases have been reported as drivers of cancer. PPARγ is a ligand-activated transcription factor that plays important roles in cell proliferation and metabolism. However, the complete set of kinases modulated by PPARγ is still unknown. In this study, we identified human kinases that are potential PPARγ targets and evaluated their differential expression and gene pair correlations in the human breast cancer patient dataset TCGA-BRCA. We further confirmed the findings in the human breast cancer cell lines MCF7 and SK-BR-3 using a kinome array. We observed that gene pair correlations are lost in tumours as compared to healthy controls and could be used as a supplementary strategy for diagnosis and prognosis of breast cancer.
Affiliation(s)
- Anish Kandel
  - Drug Discovery and Development Research Group, College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, Canada
- Sarinder Kaur Dhillon
  - Faculty of Science, Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Chandra Bose Prabaharan
  - Drug Discovery and Development Research Group, College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, Canada
- Karthic Rajamanickam
  - Drug Discovery and Development Research Group, College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, Canada
- Scott Napper
  - Vaccine and Infectious Disease Organization-International Vaccine Research Centre, University of Saskatchewan, Saskatoon, Canada
  - Department of Biochemistry, College of Medicine, University of Saskatchewan, Saskatoon, Canada
- Saravana Babu Chidambaram
  - Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research (JSSAHER), Mysuru, India
- Musthafa Mohamed Essa
  - Ageing and Dementia Research Group, Sultan Qaboos University, Muscat, Oman
  - Department of Food Science and Nutrition, College of Agricultural and Marine Sciences, Sultan Qaboos University, Muscat, Oman
- Jian Yang
  - Drug Discovery and Development Research Group, College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, Canada
- Meena Kishore Sakharkar
  - Drug Discovery and Development Research Group, College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, Canada
5
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
6
Jiang P, Chamberlain CS, Vanderby R, Thomson JA, Stewart R. TimeMeter assesses temporal gene expression similarity and identifies differentially progressing genes. Nucleic Acids Res 2020; 48:e51. [PMID: 32123905 PMCID: PMC7229845 DOI: 10.1093/nar/gkaa142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/17/2019] [Revised: 02/03/2020] [Accepted: 02/26/2020] [Indexed: 01/02/2023] Open
Abstract
Comparative time series transcriptome analysis is a powerful tool to study development, evolution, aging, disease progression and cancer prognosis. We develop TimeMeter, a statistical method and tool to assess temporal gene expression similarity, and identify differentially progressing genes where one pattern is more temporally advanced than the other. We apply TimeMeter to several datasets, and show that TimeMeter is capable of characterizing complicated temporal gene expression associations. Interestingly, we find: (i) the measurement of differential progression provides a novel feature in addition to pattern similarity that can characterize early developmental divergence between two species; (ii) genes exhibiting similar temporal patterns between human and mouse during neural differentiation are under strong negative (purifying) selection during evolution; (iii) analysis of genes with similar temporal patterns in mouse digit regeneration and axolotl blastema differentiation reveals common gene groups for appendage regeneration with potential implications in regenerative medicine.
Affiliation(s)
- Peng Jiang
  - Regenerative Biology Laboratory, Morgridge Institute for Research, Madison, WI 53707, USA
- Connie S Chamberlain
  - Department of Orthopedics and Rehabilitation, University of Wisconsin, Madison, WI 53706, USA
- Ray Vanderby
  - Department of Orthopedics and Rehabilitation, University of Wisconsin, Madison, WI 53706, USA
  - Department of Biomedical Engineering, University of Wisconsin, Madison, WI 53706, USA
- James A Thomson
  - Regenerative Biology Laboratory, Morgridge Institute for Research, Madison, WI 53707, USA
  - Department of Molecular, Cellular and Developmental Biology, University of California, Santa Barbara, CA 93106, USA
- Ron Stewart
  - Regenerative Biology Laboratory, Morgridge Institute for Research, Madison, WI 53707, USA
7
Wada T, Fukumori K, Tanaka T, Fiori S. Anisotropic Gaussian kernel adaptive filtering by Lie-group dictionary learning. PLoS One 2020; 15:e0237654. [PMID: 32797071 PMCID: PMC7428144 DOI: 10.1371/journal.pone.0237654] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/11/2020] [Accepted: 07/30/2020] [Indexed: 11/23/2022] Open
Abstract
The present paper proposes a novel kernel adaptive filtering algorithm, where each Gaussian kernel is parameterized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. In fact, different from conventional kernel adaptive systems, the proposed filter is structured as a superposition of non-isotropic Gaussian kernels, whose non-isotropy makes the filter more flexible. The adaptation algorithm will search for optimal parameters in a wider parameter space. This generalization calls for special treatment of parameters that have a geometric structure. In fact, the main contribution of this paper is to establish update rules for precision matrices on the Lie group of SPD matrices in order to ensure their symmetry and positive-definiteness. The parameters of this filter are adapted on the basis of a least-squares criterion to minimize the filtering error, together with an ℓ1-type regularization criterion to avoid overfitting and to prevent the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method.
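To make the role of the SPD precision matrix concrete, here is a minimal sketch of an anisotropic Gaussian kernel and of one generic way to update a precision matrix so that it stays symmetric positive definite. This is not the update rule established in the paper: the kernel normalization (no factor of 1/2), the log-domain update, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def anisotropic_gaussian(x, center, precision):
    """Kernel value exp(-(x - c)^T P (x - c)) for an SPD precision matrix P.

    With P = I / (2 * sigma**2) this reduces to an isotropic Gaussian kernel
    of scalar width sigma (normalization is an assumption of this sketch).
    """
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(np.exp(-d @ precision @ d))

def sym_expm(s):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T

def sym_logm(p):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(p)
    return (v * np.log(w)) @ v.T

def spd_update(precision, gradient, step):
    """Take a gradient step in the matrix-logarithm domain and map back.

    The exponential of a symmetric matrix is always SPD, so the result stays
    symmetric positive definite for any step size. Illustrative only.
    """
    sym_grad = 0.5 * (gradient + gradient.T)  # keep only the symmetric part
    return sym_expm(sym_logm(precision) - step * sym_grad)
```

A plain additive update `P - step * G` can leave the set of SPD matrices when the step is large, which is the kind of problem a geometry-aware update is meant to avoid.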
Affiliation(s)
- Tomoya Wada
  - Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Koganei-shi, Tokyo, Japan
- Kosuke Fukumori
  - Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Koganei-shi, Tokyo, Japan
- Toshihisa Tanaka
  - Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Koganei-shi, Tokyo, Japan
- Simone Fiori
  - Università Politecnica delle Marche, Ancona, Italy
8
Abstract
Rapid advances in genomic technologies have led to a wealth of diverse data, from which novel discoveries can be gleaned through the application of robust statistical and computational methods. Here, we describe GeneFishing, a semisupervised computational approach to reconstruct context-specific portraits of biological processes by leveraging gene-gene coexpression information. GeneFishing incorporates multiple high-dimensional statistical ideas, including dimensionality reduction, clustering, subsampling, and results aggregation, to produce robust results. To illustrate the power of our method, we applied it using 21 genes involved in cholesterol metabolism as "bait" to "fish out" (or identify) genes not previously identified as being connected to cholesterol metabolism. Using simulation and real datasets, we found that the results obtained through GeneFishing were more interesting for our study than those provided by related gene prioritization methods. In particular, application of GeneFishing to the GTEx liver RNA sequencing (RNAseq) data not only reidentified many known cholesterol-related genes, but also pointed to glyoxalase I (GLO1) as a gene implicated in cholesterol metabolism. In a follow-up experiment, we found that GLO1 knockdown in human hepatoma cell lines increased levels of cellular cholesterol ester, validating a role for GLO1 in cholesterol metabolism. In addition, we performed pantissue analysis by applying GeneFishing on various tissues and identified many potential tissue-specific cholesterol metabolism-related genes. GeneFishing appears to be a powerful tool for identifying related components of complex biological systems and may be used across a wide range of applications.