1. Castanho EN, Aidos H, Madeira SC. Biclustering data analysis: a comprehensive survey. Brief Bioinform 2024; 25:bbae342. [PMID: 39007596 PMCID: PMC11247412 DOI: 10.1093/bib/bbae342] [Received: 11/28/2023] [Revised: 05/16/2024] [Accepted: 07/01/2024]
Abstract
Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved effective in bioinformatics because it produces local rather than global models. It has evolved from a key technique in gene expression data analysis into one of the most widely used approaches for pattern discovery and the identification of biological modules, in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering to other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, the survey provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
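As a concrete illustration of the bicluster and evaluation-measure concepts in the abstract above, the sketch below scores a bicluster (a row subset I and a column subset J) with the mean squared residue of Cheng and Church, one classic homogeneity measure: H(I, J) = (1 / (|I||J|)) Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. It is a minimal NumPy sketch, not the survey's own definitions or code; the function name `mean_squared_residue` and the toy matrix are hypothetical.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """Cheng and Church mean squared residue of the submatrix data[rows, cols].

    A value of 0 means the bicluster follows a perfect additive
    (shifting) pattern: a_ij = mu + r_i + c_j.
    """
    sub = np.asarray(data, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)  # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)  # a_Ij
    overall = sub.mean()                         # a_IJ
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())


# Hypothetical toy 4x5 matrix with an additive bicluster planted
# in rows {0, 2} and columns {1, 3, 4}.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
base = np.array([1.0, 3.0, 2.0])
X[np.ix_([0, 2], [1, 3, 4])] = np.vstack([base, base + 5.0])  # row 2 = row 0 shifted by 5

print(mean_squared_residue(X, [0, 2], [1, 3, 4]))              # 0 for the planted bicluster
print(mean_squared_residue(X, [0, 1, 2, 3], [0, 1, 2, 3, 4]))  # positive for the noisy full matrix
```

Lower values indicate more homogeneous biclusters under an additive model; other measures in the literature target other pattern types.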
Affiliation(s)
- Eduardo N Castanho
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Helena Aidos
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Sara C Madeira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal

2. Scharl T, Grün B. A clustering procedure for three-way RNA sequencing data using data transformations and matrix-variate Gaussian mixture models. BMC Bioinformatics 2024; 25:90. [PMID: 38429687 PMCID: PMC10905927 DOI: 10.1186/s12859-024-05717-6] [Received: 09/22/2023] [Accepted: 02/21/2024]
Abstract
RNA sequencing of time-course experiments results in three-way count data where the dimensions are the genes, the time points and the biological units. Clustering RNA-seq data makes it possible to extract groups of genes that are co-expressed over time. After standardisation, the normalised counts of individual genes across time points and biological units have properties similar to those of compositional data. We propose the following procedure to suitably cluster three-way RNA-seq data: (1) pre-process the RNA-seq data by calculating the normalised expression profiles, (2) transform the data using the additive log ratio transform to map the composition in the D-part Aitchison simplex to a (D − 1)-dimensional Euclidean vector, (3) cluster the transformed RNA-seq data using matrix-variate Gaussian mixture models and (4) assess the quality of the overall cluster solution and of individual clusters based on cluster separation in the transformed space using density-based silhouette information and on compactness of the cluster in the original space using cluster maps as a suitable visualisation. The proposed procedure is illustrated on RNA-seq data from fission yeast and results are also compared to an analogous two-way approach after flattening out the biological units.
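Steps (1) and (2) can be sketched as follows, assuming NumPy. Each gene's counts across the D time points are divided by their total to give a composition, and the additive log-ratio (ALR) transform then maps it to D − 1 coordinates, y_k = log(x_k / x_D). Taking the last time point as the reference part and adding a pseudocount of 0.5 to avoid log(0) are assumptions of this sketch, not details taken from the paper; the helper names `expression_profile` and `alr` are hypothetical.

```python
import numpy as np


def expression_profile(counts, pseudocount=0.5):
    """Turn counts over D time points (last axis) into compositions summing to 1.

    The pseudocount is an assumption of this sketch; it avoids log(0) later.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    return counts / counts.sum(axis=-1, keepdims=True)


def alr(composition, ref=-1):
    """Additive log-ratio transform along the last axis: D parts -> D - 1 coordinates.

    Every non-reference part is divided by the reference part before taking logs.
    """
    comp = np.asarray(composition, dtype=float)
    idx = ref % comp.shape[-1]                # normalise a negative reference index
    ref_part = comp[..., idx:idx + 1]         # keep the axis for broadcasting
    others = np.delete(comp, idx, axis=-1)
    return np.log(others / ref_part)


# One gene in one biological unit, D = 4 time points (made-up counts).
profile = expression_profile([10, 30, 50, 10])
coords = alr(profile)
print(coords.shape)  # (3,)
```

Because both helpers operate on the last axis, the same calls also work on a genes × units × time-points array, yielding one (D − 1)-vector per gene and unit, which is the shape a matrix-variate model expects.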
Affiliation(s)
- Theresa Scharl
- Institute of Statistics, University of Natural Resources and Life Sciences, Vienna, Austria
- Bettina Grün
- Institute for Statistics and Mathematics, Vienna University of Economics and Business, Vienna, Austria

3. Ai D, Chen L, Xie J, Cheng L, Zhang F, Luan Y, Li Y, Hou S, Sun F, Xia LC. Identifying local associations in biological time series: algorithms, statistical significance, and applications. Brief Bioinform 2023; 24:bbad390. [PMID: 37930023 DOI: 10.1093/bib/bbad390] [Received: 05/25/2023] [Revised: 08/21/2023] [Accepted: 09/14/2023]
Abstract
Local associations are spatio-temporal correlations that arise in biological systems, such as time-dependent gene co-expression or seasonal interactions between microbes. Examining biological time series data for these associations can reveal the intricate dynamics and inherent interactions of biological systems. To accomplish this goal, local similarity analysis algorithms and statistical methods that facilitate the local alignment of time series and assess the significance of the resulting alignments have been developed. Although these algorithms were initially devised for gene expression analysis from microarrays, they have been adapted and accelerated for multi-omics next-generation sequencing datasets, achieving high scientific impact. In this review, we present an overview of the historical developments and recent advances in local similarity analysis algorithms, their statistical properties, and real applications in analyzing biological time series data. The benchmark data and analysis scripts used in this review are freely available at http://github.com/labxscut/lsareview.
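The core of local similarity analysis (LSA), as originally formulated, is a local-alignment-style dynamic program: P_{i,j} = max(0, P_{i−1,j−1} + x_i·y_j) accumulates positive co-variation, N_{i,j} = max(0, N_{i−1,j−1} − x_i·y_j) accumulates negative co-variation, and only cells with |i − j| ≤ D (the maximum delay) are filled. The minimal sketch below assumes NumPy and plain z-score normalisation (LSA tools typically apply a rank-based normal transform instead); it omits the significance testing discussed in the review and is not the code from the review's repository.

```python
import numpy as np


def local_similarity(x, y, max_delay=0):
    """Local similarity (LS) score between two equal-length, non-constant series.

    Returns (score, sign): the best accumulated local co-variation divided by
    the series length, and +1 / -1 for a positive / negative association.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # z-score normalisation (an assumption; a rank-based normal transform is common)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    P = np.zeros((n + 1, n + 1))  # positive-association scores
    N = np.zeros((n + 1, n + 1))  # negative-association scores
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:  # time shift larger than allowed
                continue
            prod = x[i - 1] * y[j - 1]
            # Reset at zero, as in local sequence alignment
            P[i, j] = max(0.0, P[i - 1, j - 1] + prod)
            N[i, j] = max(0.0, N[i - 1, j - 1] - prod)
    best_pos, best_neg = P.max(), N.max()
    if best_pos >= best_neg:
        return best_pos / n, 1
    return best_neg / n, -1


t = np.arange(20)
a = np.sin(t / 2.0)
print(local_similarity(a, -a))                         # score close to 1, sign -1
print(local_similarity(a, np.roll(a, 2), max_delay=2))
```

Allowing a larger `max_delay` can only keep or raise the score, since every alignment permitted at a smaller delay is still considered.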
Affiliation(s)
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Lulu Chen
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Jiemin Xie
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Longwei Cheng
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Fang Zhang
- Shenwan Hongyuan Securities Co. Ltd., Shanghai 200031, China
- Yihui Luan
- School of Mathematics, Shandong University, Jinan 250100, China
- Yang Li
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, California, 90007, USA
- Li Charlie Xia
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China

4. Han J, Yang Y, Yang X, Wang D, Wang X, Sun P. Exploring air pollution characteristics from spatio-temporal perspective: A case study of the top 10 urban agglomerations in China. Environ Res 2023; 224:115512. [PMID: 36804315 DOI: 10.1016/j.envres.2023.115512] [Received: 12/01/2022] [Revised: 01/20/2023] [Accepted: 02/14/2023]
Abstract
Air pollution has become a global public health risk factor as rapid urbanization advances. To observe the air pollution situation, air monitoring stations have been established in many cities, which record six air pollutants. Previous studies have identified cities exhibiting similar air pollution characteristics by combining principal component analysis (PCA) with cluster analysis (CA). However, spatial and temporal effects were neglected. In this paper, we focus on the combination of GTWPCA and STCA, which fully incorporates spatio-temporal effects. It is then applied to air pollution data from the top 10 urban agglomerations in China during 2016-2021. Key experimental findings include: 1. GTWPCA provides a more detailed interpretation of local variation than PCA. 2. Compared with CA, STCA highlights the coupling effect in the spatial and temporal dimensions. 3. The combination of GTWPCA and STCA captures similar air pollution characteristics from spatio-temporal perspectives, which has the potential to help environmental authorities take further action to control air pollution.
Affiliation(s)
- Jiakuan Han
- School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, 222005, China
- Yi Yang
- School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, 222005, China
- Xiaoyue Yang
- School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, 222005, China
- Dongchao Wang
- College of Geography and Environment, Shandong Normal University, Jinan, 250000, China
- Xiaolong Wang
- School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, 222005, China
- Pengqi Sun
- School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, 222005, China

5. Soares DF, Henriques R, Gromicho M, de Carvalho M, Madeira SC. Triclustering-based classification of longitudinal data for prognostic prediction: targeting relevant clinical endpoints in amyotrophic lateral sclerosis. Sci Rep 2023; 13:6182. [PMID: 37061549 PMCID: PMC10105751 DOI: 10.1038/s41598-023-33223-x] [Received: 11/25/2022] [Accepted: 04/10/2023]
Abstract
This work proposes a new class of explainable prognostic models for longitudinal data classification using triclusters. A new temporally constrained triclustering algorithm, termed TCtriCluster, is proposed to comprehensively find informative temporal patterns common to a subset of patients in a subset of features (triclusters), and to use them as discriminative features within a state-of-the-art classifier with guarantees of interpretability. The proposed approach further complements prediction with model explainability, revealing the clinically relevant disease progression patterns that underlie the prognoses and describing the features used for classification. The proposed methodology is applied to the Portuguese Amyotrophic Lateral Sclerosis (ALS) cohort (N = 1321), providing the first comprehensive assessment of the prognostic limits of five notable clinical endpoints: need for non-invasive ventilation (NIV); need for an auxiliary communication device; need for percutaneous endoscopic gastrostomy (PEG); need for a caregiver; and need for a wheelchair. Triclustering-based predictors outperform state-of-the-art alternatives, predicting the need for an auxiliary communication device (within 180 days) and the need for PEG (within 90 days) with an AUC above 90%. The approach was validated in clinical practice, supporting healthcare professionals in understanding the link between the highly heterogeneous patterns of ALS disease progression and the prognosis.
Affiliation(s)
- Diogo F Soares
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Rui Henriques
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Marta Gromicho
- Instituto de Medicina Molecular and Instituto de Fisiologia, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Mamede de Carvalho
- Instituto de Medicina Molecular and Instituto de Fisiologia, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Sara C Madeira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal

6. Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species. Genes (Basel) 2022; 13:genes13111982. [DOI: 10.3390/genes13111982] [Received: 06/25/2022] [Revised: 09/20/2022] [Accepted: 09/23/2022]
Abstract
Although several biclustering algorithms have been studied, few are used for cross-pattern identification across species in multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect patterns shared both across integrated omics data and between species. Bi-EB addresses a critical clinical translational question with a bioinformatics strategy: how modules of genotype variation associated with phenotype can be identified from cancer cell screening data, and how these findings can be translated directly to a cancer patient subpopulation. An empirical Bayesian probabilistic interpretation and a ratio strategy are proposed in Bi-EB, for the first time, to detect pairwise regulation patterns among species and variations across multiple omics at the gene level, such as proteins and mRNA. An expectation–maximization (EM) algorithm is used to separate the foreground co-occurring variations from the background noise by adjusting two parameters: the bicluster membership probability threshold Ac and the bicluster average probability p. Three simulation experiments and two analyses of real mRNA and protein data from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE) verify that Bi-EB can significantly improve clustering recovery and relevance accuracy, outperforming seven other biclustering methods (Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC) with a recovery score of 0.98 and a relevance score of 0.99. In addition, Bi-EB is used to determine the shared causality patterns of mRNA to protein between patients and cancer cells in the TCGA and CCLE breast cancer data.
The clinically well-known treatment-target protein module, comprising estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2, is identified consistently with its mRNA expression variations in the luminal-like subtype. Ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, were found for the first time to maintain high mRNA–protein concordance in both breast cancer patients and cell lines of the basal-like subtype. Bi-EB provides a useful biclustering analysis tool for discovering cross-patterns hidden across multiple data matrices (omics) and species. Implementing Bi-EB in the clinical setting will have a direct impact on translational research guided by cancer cell screening.

7. Castanho EN, Aidos H, Madeira SC. Biclustering fMRI time series: a comparative study. BMC Bioinformatics 2022; 23:192. [PMID: 35606701 PMCID: PMC9126639 DOI: 10.1186/s12859-022-04733-8] [Received: 04/27/2021] [Accepted: 05/13/2022]
Abstract
BACKGROUND The effectiveness of biclustering, the simultaneous clustering of rows and columns in a data matrix, has been shown in gene expression data analysis, and several researchers recognize its potential in other research areas. Nevertheless, the last two decades have witnessed the development of a significant number of biclustering algorithms targeting gene expression data analysis, alongside a lack of consistent studies exploring the capacities of biclustering outside this traditional application domain. RESULTS This work evaluates the potential use of biclustering in fMRI time series data, targeting the Region × Time dimensions, by comparing seven state-of-the-art biclustering algorithms and three traditional clustering algorithms on artificial and real data. It further proposes a methodology for biclustering evaluation beyond gene expression data analysis. The results, which discuss the use of different search strategies on both artificial and real fMRI time series, showed the superiority of exhaustive biclustering approaches, which obtained the most homogeneous biclusters. However, their high computational costs are a challenge, and further work is needed for the efficient use of biclustering in fMRI data analysis. CONCLUSIONS This work pinpoints avenues for the use of biclustering in spatio-temporal data analysis, in particular in neuroscience applications. The proposed evaluation methodology showed evidence of the effectiveness of biclustering in finding local patterns in fMRI time series data. Further work is needed regarding scalability to promote its application in real scenarios.
Affiliation(s)
- Helena Aidos
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Sara C. Madeira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal

8. Luo Y, Zhang AR. Tensor clustering with planted structures: Statistical optimality and computational limits. Ann Stat 2022. [DOI: 10.1214/21-aos2123]
Affiliation(s)
- Yuetian Luo
- Department of Statistics, University of Wisconsin-Madison
- Anru R. Zhang
- Department of Statistics, University of Wisconsin-Madison

9. Lobo J, Henriques R, Madeira SC. G-Tric: generating three-way synthetic datasets with triclustering solutions. BMC Bioinformatics 2021; 22:16. [PMID: 33413095 PMCID: PMC7789692 DOI: 10.1186/s12859-020-03925-4] [Received: 08/16/2020] [Accepted: 12/07/2020]
Abstract
BACKGROUND Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data without a known ground truth, which limits the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. RESULTS G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. CONCLUSIONS Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses.
A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties was generated and made available, highlighting G-Tric's potential to advance triclustering state-of-the-art by easing the process of evaluating the quality of new triclustering approaches.
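Why planted ground truth helps can be shown with a small example: the sketch below generates a three-way array with one additive tricluster placed on chosen observation, feature, and context subsets, and returns those subsets alongside the data so that any triclustering output can be scored extrinsically. It is a hypothetical illustration assuming NumPy, a standard normal background, and an additive pattern; `plant_tricluster` is not part of G-Tric, whose actual options (data types, pattern types, overlap, missing values, noise, errors) are far richer.

```python
import numpy as np


def plant_tricluster(shape, obs, feats, ctxs, rng, noise=0.1):
    """Hypothetical generator: a 3-way array (observations x features x contexts)
    with one additive tricluster planted on the given index subsets.

    Returns the data and the ground-truth index sets.
    """
    data = rng.normal(size=shape)  # background distribution (assumed standard normal)
    # Additive pattern: base value + observation, feature and context effects
    base = 5.0
    o_eff = rng.normal(size=len(obs))[:, None, None]
    f_eff = rng.normal(size=len(feats))[None, :, None]
    c_eff = rng.normal(size=len(ctxs))[None, None, :]
    block = base + o_eff + f_eff + c_eff
    block = block + rng.normal(scale=noise, size=block.shape)  # controlled noise level
    data[np.ix_(obs, feats, ctxs)] = block
    truth = {"observations": list(obs), "features": list(feats), "contexts": list(ctxs)}
    return data, truth


rng = np.random.default_rng(42)
X, truth = plant_tricluster((20, 10, 4), obs=[1, 5, 7], feats=[0, 2, 3, 9], ctxs=[1, 2], rng=rng)
print(X.shape, truth)
```

Setting `noise=0.0` yields a perfectly additive planted block, which is convenient for checking that an algorithm recovers the exact index sets before testing it under harder conditions.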
Affiliation(s)
- João Lobo
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal
- Rui Henriques
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1900-001, Lisbon, Portugal
- Sara C Madeira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal

10. Sanford JA, Nogiec CD, Lindholm ME, Adkins JN, Amar D, Dasari S, Drugan JK, Fernández FM, Radom-Aizik S, Schenk S, Snyder MP, Tracy RP, Vanderboom P, Trappe S, Walsh MJ, Adkins JN, Amar D, Dasari S, Drugan JK, Evans CR, Fernandez FM, Li Y, Lindholm ME, Nogiec CD, Radom-Aizik S, Sanford JA, Schenk S, Snyder MP, Tomlinson L, Tracy RP, Trappe S, Vanderboom P, Walsh MJ, Lee Alekel D, Bekirov I, Boyce AT, Boyington J, Fleg JL, Joseph LJ, Laughlin MR, Maruvada P, Morris SA, McGowan JA, Nierras C, Pai V, Peterson C, Ramos E, Roary MC, Williams JP, Xia A, Cornell E, Rooney J, Miller ME, Ambrosius WT, Rushing S, Stowe CL, Jack Rejeski W, Nicklas BJ, Pahor M, Lu CJ, Trappe T, Chambers T, Raue U, Lester B, Bergman BC, Bessesen DH, Jankowski CM, Kohrt WM, Melanson EL, Moreau KL, Schauer IE, Schwartz RS, Kraus WE, Slentz CA, Huffman KM, Johnson JL, Willis LH, Kelly L, Houmard JA, Dubis G, Broskey N, Goodpaster BH, Sparks LM, Coen PM, Cooper DM, Haddad F, Rankinen T, Ravussin E, Johannsen N, Harris M, Jakicic JM, Newman AB, Forman DD, Kershaw E, Rogers RJ, Nindl BC, Page LC, Stefanovic-Racic M, Barr SL, Rasmussen BB, Moro T, Paddon-Jones D, Volpi E, Spratt H, Musi N, Espinoza S, Patel D, Serra M, Gelfond J, Burns A, Bamman MM, Buford TW, Cutter GR, Bodine SC, Esser K, Farrar RP, Goodyear LJ, Hirshman MF, Albertson BG, Qian WJ, Piehowski P, Gritsenko MA, Monore ME, Petyuk VA, McDermott JE, Hansen JN, Hutchison C, Moore S, Gaul DA, Clish CB, Avila-Pacheco J, Dennis C, Kellis M, Carr S, Jean-Beltran PM, Keshishian H, Mani D, Clauser K, Krug K, Mundorff C, Pearce C, Ivanova AA, Ortlund EA, Maner-Smith K, Uppal K, Zhang T, Sealfon SC, Zaslavsky E, Nair V, Li S, Jain N, Ge Y, Sun Y, Nudelman G, Ruf-zamojski F, Smith G, Pincas N, Rubenstein A, Anne Amper M, Seenarine N, Lappalainen T, Lanza IR, Sreekumaran Nair K, Klaus K, Montgomery SB, Smith KS, Gay NR, Zhao B, Hung CJ, Zebarjadi N, Balliu B, Fresard L, Burant CF, Li JZ, Kachman M, Soni T, Raskind AB, Gerszten R, Robbins J, Ilkayeva O, Muehlbauer MJ, Newgard CB, Ashley EA, Wheeler MT, Jimenez-Morales D, Raja A, Dalton KP, Zhen J, Suk Kim Y, Christle JW, Marwaha S, Chin ET, Hershman SG, Hastie T, Tibshirani R, Rivas MA. Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise. Cell 2020; 181:1464-1474. [PMID: 32589957 PMCID: PMC8800485 DOI: 10.1016/j.cell.2020.06.004] [Received: 03/05/2020] [Revised: 05/19/2020] [Accepted: 06/01/2020]
Abstract
Exercise provides a robust physiological stimulus that evokes cross-talk among multiple tissues and that, when repeated regularly (i.e., training), improves physiological capacity, benefits numerous organ systems, and decreases the risk of premature mortality. However, a gap remains in identifying the detailed molecular signals induced by exercise that benefit health and prevent disease. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) was established to address this gap and generate a molecular map of exercise. Preclinical and clinical studies will examine the systemic effects of endurance and resistance exercise across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. From this multi-omic and bioinformatic analysis, a molecular map of exercise will be established. Altogether, MoTrPAC will provide a public database that is expected to enhance our understanding of the health benefits of exercise and to provide insight into how physical activity mitigates disease.

11. Kakati T, Ahmed HA, Bhattacharyya DK, Kalita JK. THD-Tricluster: A robust triclustering technique and its application in condition specific change analysis in HIV-1 progression data. Comput Biol Chem 2018; 75:154-167. [PMID: 29787933 DOI: 10.1016/j.compbiolchem.2018.05.007] [Received: 10/27/2015] [Revised: 03/20/2018] [Accepted: 05/06/2018]
Abstract
Developing a cost-effective and robust triclustering algorithm that can identify triclusters of high biological significance in the gene-sample-time (GST) domain is a challenging task. Although most existing triclustering algorithms can detect shifting and scaling patterns in isolation, they cannot handle co-occurring shifting-and-scaling patterns. This paper attempts to address this issue. It introduces a robust triclustering algorithm called THD-Tricluster to identify triclusters over the GST domain. In addition to being validated on several benchmark datasets, the proposed THD-Tricluster algorithm was applied to HIV-1 progression data to identify disease-specific genes. THD-Tricluster identified 38 genes most responsible for this deadly disease, including GATA3, EGR1, JUN, ELF1, AGFG1, AGFG2, CX3CR1, CXCL12, CCR5, CCR2, and many others. The results are validated using GeneCards and other established results.
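To illustrate what a co-occurring shifting-and-scaling pattern looks like, each gene profile in the toy gene × time slice below follows x_ij = s_i·b_j + t_i, combining a per-gene scaling factor s_i and shift t_i applied to a shared base profile b_j. The sketch (NumPy, made-up values) shows that such rows remain perfectly correlated with the base profile while an additive, shifting-only residue is clearly non-zero. It illustrates the pattern type only and is not THD-Tricluster's method.

```python
import numpy as np

# Hypothetical toy slice: 4 genes over 6 time points for one sample.
base = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 2.5])  # shared base profile b_j
scale = np.array([1.0, 2.0, 0.5, 3.0])           # scaling factors s_i (all > 0)
shift = np.array([0.0, -1.0, 4.0, 2.0])          # shifting factors t_i
block = scale[:, None] * base[None, :] + shift[:, None]  # x_ij = s_i * b_j + t_i

# Pure shifting (s_i = 1) and pure scaling (t_i = 0) are special cases.
# With both present, every row is still perfectly correlated with the base profile:
corr = [np.corrcoef(row, base)[0, 1] for row in block]
print(np.round(corr, 6))  # all 1.0

# ...but an additive (shifting-only) residue, as used by mean squared residue, is not zero.
residue = (block - block.mean(axis=1, keepdims=True)
           - block.mean(axis=0, keepdims=True) + block.mean())
print(float((residue ** 2).mean()) > 0)  # True
```

A measure tuned to one pattern type can therefore miss the mixed case, which is the gap the abstract describes.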
Affiliation(s)
- Tulika Kakati
- Department of Computer Science and Engineering, School of Engineering, Tezpur University, Napaam, Sonitpur, Assam 784028, India
- Hasin A Ahmed
- Department of Computer Science and Engineering, School of Engineering, Tezpur University, Napaam, Sonitpur, Assam 784028, India
- Dhruba K Bhattacharyya
- Department of Computer Science and Engineering, School of Engineering, Tezpur University, Napaam, Sonitpur, Assam 784028, India
- Jugal K Kalita
- Department of Computer Science, University of Colorado, Colorado Springs, USA

12. Balasubramanian A, Wang J, Prabhakaran B. Discovering Multidimensional Motifs in Physiological Signals for Personalized Healthcare. IEEE J Sel Top Signal Process 2016; 10:832-841. [PMID: 28191269 PMCID: PMC5298205 DOI: 10.1109/jstsp.2016.2543679]
Abstract
Personalized diagnosis and therapy require monitoring patient activity using various body sensors. Sensor data generated during personalized exercises or tasks may be too specific or inadequate to be evaluated using supervised methods such as classification. We propose multidimensional motif (MDM) discovery as a means for patient activity monitoring, since such motifs can capture repeating patterns across multiple dimensions of the data and can serve as conformance indicators. Previous studies on mining MDMs have proposed approaches that cannot process multiple dimensions concurrently, which limits their utility in online scenarios. In this paper, we propose an efficient real-time approach to MDM discovery in time series data generated by body sensors, for monitoring the performance of patients during therapy. We present two alternative models for MDMs, based on motif co-occurrences and on temporal ordering among motifs across multiple dimensions, with a detailed formulation of the proposed concepts. The proposed method uses an efficient hashing-based record to enable speedy update and retrieval of motif sets and identification of MDMs. Performance evaluation using synthetic and real body sensor data in unsupervised motif discovery tasks shows that the approach is effective for (a) concurrent processing of multidimensional time series information suitable for real-time applications, (b) finding unknown naturally occurring patterns with minimal delay, and
Affiliation(s)
- Arvind Balasubramanian
- Department of Computer Science, The University of Texas at Dallas, Richardson, TX, 75080 USA
- Jun Wang
- Department of Bioengineering, The University of Texas at Dallas, Richardson, TX, 75080 USA