1
Li Y, Song M. Exact model-free function inference using uniform marginal counts for null population. Bioinformatics 2025; 41:btaf121. [PMID: 40111834 PMCID: PMC11972114 DOI: 10.1093/bioinformatics/btaf121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 11/14/2024] [Accepted: 03/18/2025] [Indexed: 03/22/2025] Open
Abstract
MOTIVATION Recognizing cause-effect relationships is a fundamental inquiry in science. However, current causal inference methods often focus on directionality but not statistical significance. A ramification is chance patterns of uneven marginal distributions achieving a perfect directionality score. RESULTS To overcome such issues, we design the uniform exact function test with continuity correction (UEFTC) to detect functional dependency between two discrete random variables. The null hypothesis is that the two variables are statistically independent. Unlike related tests whose null populations use observed marginals, we define the null population by an embedded uniform square. We also present a fast algorithm to accomplish the test. On datasets with ground truth, the UEFTC exhibits accurate directionality, low biases, and robust statistical behavior relative to alternatives. We found nonmonotonic response by gene TCB2 to beta-estradiol dosage in engineered yeast strains. In the human duodenum with environmental enteric dysfunction, we discovered pathology-dependent anti-co-methylated CpG sites in the vicinity of genes POU2AF1 and LSP1; such activity represents orchestrated methylation and demethylation along the same gene, unreported previously. The UEFTC has much improved effectiveness in exact model-free function inference for data-driven knowledge discovery. AVAILABILITY AND IMPLEMENTATION An open-source R package "UniExactFunTest" implementing the presented uniform exact function tests is available via CRAN at doi: 10.32614/CRAN.package.UniExactFunTest. Computer code to reproduce figures can be found in supplementary file "UEFTC-main.zip."
Affiliation(s)
- Yiyi Li
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, United States
- Mingzhou Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, United States
- Molecular Biology and Interdisciplinary Life Sciences Graduate Program, New Mexico State University, Las Cruces, NM 88003, United States
2
Patil AR, Schug J, Liu C, Lahori D, Descamps HC, Naji A, Kaestner KH, Faryabi RB, Vahedi G. Modeling type 1 diabetes progression using machine learning and single-cell transcriptomic measurements in human islets. Cell Rep Med 2024; 5:101535. [PMID: 38677282 PMCID: PMC11148720 DOI: 10.1016/j.xcrm.2024.101535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 01/22/2024] [Accepted: 04/07/2024] [Indexed: 04/29/2024]
Abstract
Type 1 diabetes (T1D) is a chronic condition in which beta cells are destroyed by immune cells. Despite progress in immunotherapies that could delay T1D onset, early detection of autoimmunity remains challenging. Here, we evaluate the utility of machine learning for early prediction of T1D using single-cell analysis of islets. Using gradient-boosting algorithms, we model changes in gene expression of single cells from pancreatic tissues in T1D and non-diabetic organ donors. We assess if mathematical modeling could predict the likelihood of T1D development in non-diabetic autoantibody-positive donors. While most autoantibody-positive donors are predicted to be non-diabetic, select donors with unique gene signatures are classified as T1D. Our strategy also reveals a shared gene signature in distinct T1D-associated models across cell types, suggesting a common effect of the disease on transcriptional outputs of these cells. Our study establishes a precedent for using machine learning in early detection of T1D.
Affiliation(s)
- Abhijeet R Patil
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Jonathan Schug
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Chengyang Liu
- Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Deeksha Lahori
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Hélène C Descamps
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Ali Naji
- Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Klaus H Kaestner
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Robert B Faryabi
- Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Abramson Family Cancer Research Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Golnaz Vahedi
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Abramson Family Cancer Research Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
3
Gendall P, Gendall K, Branston JR, Edwards R, Wilson N, Hoek J. Going 'Super Value' in New Zealand: cigarette pricing strategies during a period of sustained annual excise tax increases. Tob Control 2024; 33:240-246. [PMID: 36008127 DOI: 10.1136/tc-2021-057232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 08/09/2022] [Indexed: 11/04/2022]
Abstract
BACKGROUND Between 2010 and 2020, the New Zealand (NZ) Government increased tobacco excise tax by inflation plus 10% each year. We reviewed market structure changes and examined whether NZ tobacco companies shifted excise tax increases to maintain the affordability of lower priced cigarette brands. METHODS We cluster-analysed market data that tobacco companies supply to the NZ Ministry of Health, created four price partitions and examined the size and share of these over time. For each partition, we analysed cigarette brand numbers and market share, calculated the volume-weighted real stick price for each year and compared this price across different price partitions. We calculated the net real retail price (price before tax) for each price partition and compared these prices before and after plain packaging took effect. RESULTS The number and market share of Super Value and Budget brands increased, while those of Everyday and Premium brands decreased. Differences between the price of Premium and Super Value brands increased, as did the net retail price difference for these partitions. Following plain packaging's implementation, Super Value brand numbers more than doubled; contrary to industry predictions, the price difference between these and higher priced brands did not narrow. CONCLUSIONS Between 2010 and 2020, NZ tobacco companies introduced more Super Value cigarette brands and shifted excise tax increases to reduce the impact these had on low-priced brands. Setting a minimum retail price for cigarettes could curtail tobacco companies' ability to undermine tobacco taxation policies designed to reduce smoking.
Affiliation(s)
- Philip Gendall
- Public Health, University of Otago Wellington, Wellington, New Zealand
- Richard Edwards
- Public Health, University of Otago Wellington, Wellington, New Zealand
- Nick Wilson
- Public Health, University of Otago Wellington, Wellington, New Zealand
- Janet Hoek
- Public Health, University of Otago Wellington, Wellington, New Zealand
4
Chen Y, Debnath T, Cai A, Song M. Circular Silhouette and a Fast Algorithm. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:14038-14044. [PMID: 37651497 DOI: 10.1109/tpami.2023.3310495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Circular data clustering has recently been solved exactly in sub-quadratic time. However, the solution requires a given number of clusters; methods for choosing this number on linear data are inapplicable to circular data. To fill this gap, we introduce the circular silhouette to measure cluster quality and a fast algorithm to calculate the average silhouette width. The algorithm runs in linear time to the number of points on sorted data, instead of quadratic time by the silhouette definition. Empirically, it is over 3000 times faster than by silhouette definition on 1,000,000 circular data points in five clusters. On simulated datasets, the algorithm returned correct numbers of clusters. We identified clusters on round genomes of human mitochondria and bacteria. On sunspot activity data, we found changed solar-cycle patterns over the past two centuries. Using the circular silhouette not only eliminates the subjective selection of number of clusters, but is also scalable to big circular and periodic data abundant in science, engineering, and medicine.
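The quadratic-time baseline that this abstract contrasts against can be sketched as below. This is only an illustration under stated assumptions: it assumes the circular silhouette follows the standard silhouette formula with the shortest arc length as the distance, which the abstract does not spell out, and it does not reproduce the paper's linear-time algorithm. The function names are hypothetical.

```python
import math


def circular_distance(a, b, circumference=2 * math.pi):
    """Shortest arc length between two positions on a circle."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def average_silhouette_width(points, labels, circumference=2 * math.pi):
    """Naive O(N^2) average silhouette width with circular distance.

    Assumption: standard silhouette s = (b - a) / max(a, b), where a is the
    mean distance to the point's own cluster and b is the smallest mean
    distance to any other cluster.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from point i to each cluster, excluding itself.
        mean_dist = {}
        for c in clusters:
            dists = [
                circular_distance(p, q, circumference)
                for j, (q, lab) in enumerate(zip(points, labels))
                if lab == c and j != i
            ]
            mean_dist[c] = sum(dists) / len(dists) if dists else None
        a = mean_dist[own]
        if a is None:
            # Singleton cluster: silhouette is conventionally 0.
            continue
        b = min(v for c, v in mean_dist.items() if c != own and v is not None)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Positions in radians; 6.2 sits next to 0.0 once the circle wraps around.
points = [0.0, 0.1, 6.2, 3.0, 3.1, 3.2]
good = average_silhouette_width(points, [0, 0, 0, 1, 1, 1])
bad = average_silhouette_width(points, [0, 0, 1, 1, 1, 1])
```

Because every point is compared with every other point, the cost grows quadratically with the number of points, which is the bottleneck the abstract says the fast algorithm avoids.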
5
Javaid A, Frost HR. STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring. PLoS Comput Biol 2023; 19:e1011413. [PMID: 37603589 PMCID: PMC10470905 DOI: 10.1371/journal.pcbi.1011413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 08/31/2023] [Accepted: 08/07/2023] [Indexed: 08/23/2023] Open
Abstract
The accurate estimation of cell surface receptor abundance for single cell transcriptomics data is important for the tasks of cell type and phenotype categorization and cell-cell interaction quantification. We previously developed an unsupervised receptor abundance estimation technique named SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) to address the challenges associated with accurate abundance estimation. In that paper, we concluded that SPECK results in improved concordance with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data relative to comparative unsupervised abundance estimation techniques using only single-cell RNA-sequencing (scRNA-seq) data. In this paper, we outline a new supervised receptor abundance estimation method called STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding) that leverages associations learned from joint scRNA-seq/CITE-seq training data and a thresholded gene set scoring mechanism to estimate receptor abundance for scRNA-seq target data. We evaluate STREAK relative to both unsupervised and supervised receptor abundance estimation techniques using two evaluation approaches on six joint scRNA-seq/CITE-seq datasets that represent four human and mouse tissue types. We conclude that STREAK outperforms other abundance estimation strategies and provides a more biologically interpretable and transparent statistical model.
Affiliation(s)
- Azka Javaid
- Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America
- Hildreth Robert Frost
- Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America
6
Javaid A, Frost HR. SPECK: an unsupervised learning approach for cell surface receptor abundance estimation for single-cell RNA-sequencing data. BIOINFORMATICS ADVANCES 2023; 3:vbad073. [PMID: 37359727 PMCID: PMC10290233 DOI: 10.1093/bioadv/vbad073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2022] [Revised: 05/23/2023] [Accepted: 06/12/2023] [Indexed: 06/28/2023]
Abstract
Summary The rapid development of single-cell transcriptomics has revolutionized the study of complex tissues. Single-cell RNA-sequencing (scRNA-seq) can profile tens-of-thousands of dissociated cells from a tissue sample, enabling researchers to identify cell types, phenotypes and interactions that control tissue structure and function. A key requirement of these applications is the accurate estimation of cell surface protein abundance. Although technologies to directly quantify surface proteins are available, these data are uncommon and limited to proteins with available antibodies. While supervised methods that are trained on Cellular Indexing of Transcriptomes and Epitopes by Sequencing data can provide the best performance, these training data are limited by available antibodies and may not exist for the tissue under investigation. In the absence of protein measurements, researchers must estimate receptor abundance from scRNA-seq data. Therefore, we developed a new unsupervised method for receptor abundance estimation using scRNA-seq data called SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) and primarily evaluated its performance against unsupervised approaches for at least 25 human receptors and multiple tissue types. This analysis reveals that techniques based on a thresholded reduced rank reconstruction of scRNA-seq data are effective for receptor abundance estimation, with SPECK providing the best overall performance. Availability and implementation SPECK is freely available at https://CRAN.R-project.org/package=SPECK. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Azka Javaid
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH 03755, USA
- H Robert Frost
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH 03755, USA
7
Zhang R, Xin R, Seltzer M, Rudin C. Optimal Sparse Regression Trees. PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE 2023; 37:11270-11279. [PMID: 38650922 PMCID: PMC11034802 DOI: 10.1609/aaai.v37i9.26334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering algorithm on one dimensional data. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly-correlated features.
8
Detecting genetic epistasis by differential departure from independence. Mol Genet Genomics 2022; 297:911-924. [PMID: 35606612 DOI: 10.1007/s00438-022-01893-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 03/27/2022] [Indexed: 10/18/2022]
Abstract
Contrary to prior beliefs that epistasis is rare, advances in genomics suggest otherwise. Current practice often filters out genomic loci with low variant counts before detecting epistasis. We argue that this practice is far from optimal because it can throw away strong epistatic patterns. Instead, we present the compensated Sharma-Song test to infer genetic epistasis in genome-wide association studies by differential departure from independence. The test does not require a minimum number of replicates for each variant. We also introduce algorithms to simulate epistatic patterns that differentially depart from independence. Using two simulators, the test performed comparably to the original Sharma-Song test when variant frequencies at a locus are marginally uniform; encouragingly, it has a marked advantage over alternatives when variant frequencies are marginally nonuniform. The test further revealed uniquely clean epistatic variants associated with chicken abdominal fat content that are not prioritized by other methods. Genes involved in the largest numbers of inferred epistatic interactions between single nucleotide polymorphisms (SNPs) belong to pathways known for obesity regulation; many top SNPs are located on chromosome 20 and in intergenic regions. By measuring differential departure from independence, the compensated Sharma-Song test offers a practical choice for studying epistasis that is robust to nonuniform genetic variant frequencies.
9
Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112412073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/06/2023]
Abstract
Over the past couple of years, machine learning methods, especially outlier detection methods, have become established in the cybersecurity field to detect network-based anomalies rooted in novel attack patterns. However, the ubiquity of massive continuously generated data streams poses an enormous challenge to efficient detection schemes and demands fast, memory-constrained online algorithms that are capable of dealing with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. State-of-the-art work focuses either on unsupervised feature selection for data streams or on (offline) outlier detection. Substantial requirements to combine both fields are derived and compared with existing approaches. The comprehensive review reveals a research gap in unsupervised feature selection for the improvement of outlier detection methods in data streams. Thus, a novel algorithm for Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD, is proposed, which performs unsupervised feature selection for the purpose of outlier detection on streaming data. Furthermore, it is able to determine the number of top-performing features by clustering their score values. A generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms has been derived. Extensive experiments have shown that a promising feature selection mechanism for streaming data is not applicable in the field of outlier detection. Moreover, UFSSOD, as an online-capable algorithm, yields comparable results to a state-of-the-art offline method trimmed for outlier detection.
10
Debnath T, Song M. Fast Optimal Circular Clustering and Applications on Round Genomes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2061-2071. [PMID: 33945485 DOI: 10.1109/tcbb.2021.3077573] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/12/2023]
Abstract
Round genomes are found in bacteria, plant chloroplasts, and mitochondria. Genetic or epigenetic marks can present biologically interesting clusters along a circular genome. The circular data clustering problem groups N points on a circle into K clusters to minimize the within-cluster sum of squared distances. Repeatedly applying the K-means algorithm takes quadratic time, impractical for large circular datasets. To overcome this issue, we developed a reproducible fast optimal circular clustering (FOCC) algorithm of worst-case O(KN log^2 N) time. The core is a fast optimal framed clustering algorithm, which we designed by integrating two divide-and-conquer and one bracket dynamic programming strategies. The algorithm is optimal based on a property of monotonic increasing cluster borders over frames on linearized data. On clustering 50,000 circular data points, FOCC outruns brute-force or heuristic circular clustering by three orders of magnitude in time. We produced clusters of CpG sites and genes along three round genomes, exhibiting higher quality than heuristic clustering. More broadly, the presented subquadratic-time algorithms offer the fastest known solution to not only framed and circular clustering, but also angular, periodical, and looped clustering. We implemented these algorithms in the R package 'OptCirClust' (https://CRAN.R-project.org/package=OptCirClust).
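The clustering objective stated in this abstract can be made concrete with a brute-force sketch. This is not the FOCC algorithm and not the OptCirClust API. It assumes that squared distances are measured to each cluster mean on a linearized frame, tries every cut point on the circle, and solves each frame with a textbook dynamic program. Both function names are hypothetical.

```python
def optimal_linear_ssq(sorted_xs, k):
    """Minimum within-cluster sum of squares for sorted 1-D data split into
    k contiguous clusters, by dynamic programming (O(k * n^2))."""
    n = len(sorted_xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of points")
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in sorted_xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):
        # Sum of squared deviations from the mean of sorted_xs[i:j].
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            dp[c][j] = min(dp[c - 1][i] + cost(i, j) for i in range(c - 1, j))
    return dp[k][n]


def brute_force_circular_ssq(points, k, circumference):
    """Try every cut point: linearize the circle there, then cluster the frame."""
    pts = sorted(p % circumference for p in points)
    best = float("inf")
    for start in range(len(pts)):
        # Points before the cut are shifted by one full turn so the frame
        # stays sorted and clusters may wrap around the origin.
        frame = pts[start:] + [p + circumference for p in pts[:start]]
        best = min(best, optimal_linear_ssq(frame, k))
    return best


circumference = 6.283185307179586  # one full turn in radians
points = [0.1, 0.2, 6.2, 3.0, 3.1]
circular = brute_force_circular_ssq(points, 2, circumference)
linear = optimal_linear_ssq(sorted(points), 2)
```

In this toy input the point at 6.2 joins the cluster near 0 only when the circle is allowed to wrap, so the circular objective is far smaller than the linear one. Repeating the dynamic program once per cut point is what makes this baseline slow on large inputs.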
11
Mitra S, Pinch M, Kandel Y, Li Y, Rodriguez SD, Hansen IA. Olfaction-Related Gene Expression in the Antennae of Female Mosquitoes From Common Aedes aegypti Laboratory Strains. Front Physiol 2021; 12:668236. [PMID: 34497531 PMCID: PMC8419471 DOI: 10.3389/fphys.2021.668236] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/15/2021] [Accepted: 08/02/2021] [Indexed: 11/17/2022] Open
Abstract
Adult female mosquitoes rely on olfactory cues like carbon dioxide and other small molecules to find vertebrate hosts to acquire blood. The molecular physiology of the mosquito olfactory system is critical for their host preferences. Many laboratory strains of the yellow fever mosquito Aedes aegypti have been established since the late 19th century. These strains have been used for most molecular studies in this species. Some earlier comparative studies have identified significant physiological differences between different laboratory strains. In this study, we used a Y-tube olfactometer to determine the attraction of females of seven different strains of Ae. aegypti to a human host: UGAL, Rockefeller, Liverpool, Costa Rica, Puerto Rico, and two odorant receptor co-receptor (Orco) mutants Orco2 and Orco16. We performed RNA-seq using antennae of Rockefeller, Liverpool, Costa Rica, and Puerto Rico females. Our results showed that female Aedes aegypti from the Puerto Rico strain had significantly reduced attraction rates toward human hosts compared to all other strains. RNA-seq analyses of the antenna transcriptomes of Rockefeller, Liverpool, Costa Rica, and Puerto Rico strains revealed distinct differences in gene expression between the four strains, but conservation in gene expression patterns of known human-sensing genes. However, we identified several olfaction-related genes that significantly vary between strains, including receptors with significantly different expression in mosquitoes from the Puerto Rico strain and the other strains.
Affiliation(s)
- Soumi Mitra
- Department of Biology, New Mexico State University, Las Cruces, NM, United States
- Matthew Pinch
- Department of Biology, New Mexico State University, Las Cruces, NM, United States
- Yashoda Kandel
- Department of Biology, New Mexico State University, Las Cruces, NM, United States
- Yiyi Li
- Department of Computer Science, New Mexico State University, Las Cruces, NM, United States
- Stacy D Rodriguez
- Department of Biology, New Mexico State University, Las Cruces, NM, United States
- Immo A Hansen
- Department of Biology, New Mexico State University, Las Cruces, NM, United States
12
Sharma R, Kumar S, Song M. Fundamental gene network rewiring at the second order within and across mammalian systems. Bioinformatics 2021; 37:3293-3301. [PMID: 33950233 DOI: 10.1093/bioinformatics/btab240] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/23/2020] [Revised: 02/24/2021] [Accepted: 04/09/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Genetic or epigenetic events can rewire molecular networks to induce extraordinary phenotypical divergences. Among the many network rewiring approaches, no model-free statistical methods can differentiate gene-gene pattern changes not attributed to marginal changes. This may obscure fundamental rewiring from superficial changes. RESULTS Here we introduce a model-free Sharma-Song test to determine if patterns differ in the second order, meaning that the deviation of the joint distribution from the product of marginal distributions is unequal across conditions. We prove an asymptotic chi-squared null distribution for the test statistic. Simulation studies demonstrate its advantage over alternative methods in detecting second-order differential patterns. Applying the test on three independent mammalian developmental transcriptome datasets, we report a lower frequency of co-expression network rewiring between human and mouse for the same tissue group than the frequency of rewiring between tissue groups within the same species. We also find second-order differential patterns between microRNA promoters and genes contrasting cerebellum and liver development in mice. These patterns are enriched in the spliceosome pathway regulating tissue specificity. Complementary to previous mammalian comparative studies mostly driven by first-order effects, our findings contribute an understanding of system-wide second-order gene network rewiring within and across mammalian systems. Second-order differential patterns constitute evidence for fundamentally rewired biological circuitry due to evolution, environment, or disease. AVAILABILITY The generic Sharma-Song test is available from the R package 'DiffXTables' at https://cran.r-project.org/package=DiffXTables. Other code and data are described in Methods. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
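The quantity this abstract describes, the deviation of a joint distribution from the product of its marginals, can be illustrated with a small sketch. This is not the Sharma-Song test statistic and not the DiffXTables API. It only computes the per-cell deviation for one contingency table so that two conditions can be compared by eye. The function name and the toy tables are hypothetical.

```python
def independence_deviation(table):
    """Per-cell difference between the joint relative frequency and the
    product of the row and column marginal frequencies."""
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(table[r][c] for r in range(n_rows)) / total
             for c in range(n_cols)]
    return [
        [table[r][c] / total - row_p[r] * col_p[c] for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Toy 2x2 contingency tables for the same gene pair under two conditions.
condition_a = [[10, 0], [0, 10]]  # strongly dependent pattern
condition_b = [[5, 5], [5, 5]]    # independent pattern, same marginals

dev_a = independence_deviation(condition_a)
dev_b = independence_deviation(condition_b)
```

Both toy tables have identical marginals, yet their deviation matrices differ, which is the kind of change the abstract calls second order. A first-order change would instead show up in the marginals themselves.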
Affiliation(s)
- Ruby Sharma
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Sajal Kumar
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Mingzhou Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA; Molecular Biology and Interdisciplinary Life Science Graduate Program, New Mexico State University, Las Cruces, NM 88003, USA
13
Surface boulder banding indicates Martian debris-covered glaciers formed over multiple glaciations. Proc Natl Acad Sci U S A 2021; 118:2015971118. [PMID: 33468681 PMCID: PMC7848752 DOI: 10.1073/pnas.2015971118] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022] Open
Abstract
Significant debate exists over whether the global population of Martian debris-covered glacier deposits formed continuously over the past 300 to 800 Ma, or whether they formed during punctuated episodes of ice accumulation during obliquity maxima (lasting ∼10–100 ka). We show that, like ancient debris-covered glaciers on Earth, boulder banding on Martian glacial deposits indicates multiple episodes of ice accumulation and advance. In our analysis, glacial periods are followed by ice removal from the glacier accumulation zone, forming debris bands. We report a median of five to six glacial/interglacial transitions recorded on Martian debris-covered glaciers, suggesting the cadence of glaciation on Mars is set by orbital forcing over tens to hundreds of Ma, not individual ∼120 ka obliquity cycles. Glacial landforms, including lobate debris aprons, are a global water ice reservoir on Mars preserving ice from past periods when high orbital obliquity permitted nonpolar ice accumulation. Numerous studies have noted morphological similarities between lobate debris aprons and terrestrial debris-covered glaciers, an interpretation supported by radar observations. On Earth and Mars, these landforms consist of a core of flowing ice covered by a rocky lag. Terrestrial debris-covered glaciers advance in response to climate forcing driven by obliquity-paced changes to ice mass balance. However, on Mars, it is not known whether glacial landforms emplaced over the past 300 to 800 Ma formed during a single, long deposition event or during multiple glaciations. Here, we show that boulders atop 45 lobate debris aprons exhibit no evidence of monotonic comminution but are clustered into bands that become more numerous with increasing latitude, debris apron length, and pole-facing flow orientation. Boulder bands are prominent at glacier headwalls, consistent with debris accumulation during the current Martian interglacial.
Terrestrial glacier boulder bands occur near flow discontinuities caused by obliquity-driven hiatuses in ice accumulation, forming internal debris layers. By analogy, we suggest that Martian lobate debris aprons experienced multiple cycles of ice deposition, followed by ice destabilization in the accumulation zone, leading to boulder-dominated lenses and subsequent ice deposition and continued flow. Correlation between latitude and boulder clustering suggests that ice mass-balance works across global scales on Mars. Lobate debris aprons may preserve ice spanning multiple glacial/interglacial cycles, extending Mars climate records back hundreds of millions of years.
|
14
|
Shi MJ, Meng XY, Fontugne J, Chen CL, Radvanyi F, Bernard-Pierrot I. Identification of new driver and passenger mutations within APOBEC-induced hotspot mutations in bladder cancer. Genome Med 2020; 12:85. [PMID: 32988402 PMCID: PMC7646471 DOI: 10.1186/s13073-020-00781-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/05/2020] [Accepted: 09/11/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND APOBEC-driven mutagenesis and functional positive selection of mutated genes may synergistically drive the higher frequency of some hotspot driver mutations compared to other mutations within the same gene, as we reported for FGFR3 S249C. Only a few APOBEC-associated driver hotspot mutations have been identified in bladder cancer (BCa). Here, we systematically looked for and characterised APOBEC-associated hotspots in BCa. METHODS We analysed 602 published exome-sequenced BCas, for some of which gene expression data were also available. APOBEC-associated hotspots were identified by motif-mapping, mutation signature fitting and APOBEC-mediated mutagenesis comparison. Joint analysis of DNA hairpin stability and gene expression was performed to predict driver or passenger hotspots. Aryl hydrocarbon receptor (AhR) activity was calculated based on the expression of its target genes. Effects of AhR knockout/inhibition on BCa cell viability were analysed. RESULTS We established a panel of 44 APOBEC-associated hotspot mutations in BCa, which accounted for about half of the hotspot mutations. Fourteen of them overlapped with the hotspots found in other cancer types with high APOBEC activity. They mostly occurred in the DNA lagging-strand templates and the loop of DNA hairpins. APOBEC-associated hotspots systematically showed a higher prevalence than the other mutations within each APOBEC-target gene, independently of their functional impact. A combined analysis of DNA loop stability and gene expression made it possible to distinguish known passenger from known driver hotspot mutations in BCa, including loss-of-function mutations affecting tumour suppressor genes, and to predict new candidate drivers, such as AHR Q383H. We further characterised AHR Q383H as an activating driver mutation associated with high AhR activity in luminal tumours. High AhR activity was also found in tumours presenting amplifications of AHR and its co-receptor ARNT. We finally showed that BCa cells presenting those different genetic alterations were sensitive to AhR inhibition. CONCLUSIONS Our study identified novel potential drivers within APOBEC-associated hotspot mutations in BCa, reinforcing the importance of APOBEC mutagenesis in BCa. It could allow a better understanding of BCa biology and aetiology and have clinical implications, such as the use of AhR as a potential therapeutic target. Our results also challenge the dogma that all hotspot mutations are drivers and mostly gain-of-function mutations affecting oncogenes.
Affiliation(s)
- Ming-Jun Shi
  - Department of Urology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
  - Institut Curie, CNRS, UMR144, Molecular Oncology team, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
  - Paris-Saclay University, Paris, France
- Xiang-Yu Meng
  - Institut Curie, CNRS, UMR144, Molecular Oncology team, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
  - Paris-Saclay University, Paris, France
  - Department of Urology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Jacqueline Fontugne
  - Institut Curie, CNRS, UMR144, Molecular Oncology team, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
  - Paris-Saclay University, Paris, France
- Chun-Long Chen
  - Institut Curie, CNRS, UMR3244, PSL Research University, Paris, France
  - Sorbonne Université, Paris, France
- François Radvanyi
  - Institut Curie, CNRS, UMR144, Molecular Oncology team, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
- Isabelle Bernard-Pierrot
  - Institut Curie, CNRS, UMR144, Molecular Oncology team, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
|