1
|
Establishment of novel ferroptosis-related prognostic subtypes correlating with immune dysfunction in prostate cancer patients. Heliyon 2024; 10:e23495. [PMID: 38187257 PMCID: PMC10770465 DOI: 10.1016/j.heliyon.2023.e23495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 09/19/2023] [Accepted: 12/05/2023] [Indexed: 01/09/2024] Open
Abstract
Background We aimed to identify two new prognostic subtypes and create a predictive index for prostate cancer (PCa) patients based on a ferroptosis database. Methods The nonnegative matrix factorization approach was used to identify molecular subtypes. After identifying the two molecular subtypes, we investigated the differences between cluster 1 and cluster 2 in terms of clinical features, functional pathways, tumour stemness, tumour heterogeneity, gene mutation and tumour immune microenvironment score. Colony formation and flow cytometry assays were performed. Results The two-cluster stratification obtained with nonnegative matrix factorization was closely associated with biochemical recurrence (BCR)-free survival, which was validated in three other datasets. Furthermore, multivariate Cox regression analysis revealed that this classification was an independent risk factor for patients with PCa. Ribosome, aminoacyl tRNA production, oxidative phosphorylation, and Parkinson's disease-related pathways were highly enriched in cluster 1. Compared with cluster 2, patients in cluster 1 exhibited significantly reduced CD4+ T cells, CD8+ T cells, neutrophils, dendritic cells and tumour immune microenvironment scores. Only HHLA2 was more abundant in cluster 1. Moreover, we found that P4HB downregulation significantly inhibited colony formation and contributed to apoptosis in the C4-2B and DU145 cell lines. Conclusions We discovered two new prognostic subtypes associated with immunological dysfunction in PCa patients based on ferroptosis-related genes and found that P4HB downregulation significantly inhibited colony formation and contributed to apoptosis in PCa cell lines.
|
2
|
Dual regularized subspace learning using adaptive graph learning and rank constraint: Unsupervised feature selection on gene expression microarray datasets. Comput Biol Med 2023; 167:107659. [PMID: 37950946 DOI: 10.1016/j.compbiomed.2023.107659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/13/2023] [Accepted: 10/31/2023] [Indexed: 11/13/2023]
Abstract
High-dimensional problems have increasingly drawn attention in gene selection and analysis. Compounding the difficulty, the number of features in a microarray gene dataset usually exceeds the number of samples, which leads to an ill-posed, underdetermined system of equations. Redundant features in high-dimensional data cause poor performance and high computational time for learning algorithms. Feature selection is a noteworthy pre-processing method for ameliorating the curse of dimensionality, with the aim of preserving maximum-relevancy, minimum-redundancy information. Likewise, unsupervised feature selection has become important because collecting labels for data is expensive. In this paper, we develop a novel robust unsupervised feature selection method that selects a discriminative subset of features for unlabeled data based on rank-constrained and dual-regularized nonnegative matrix factorization. The major focus of the proposed technique is to discard redundant features while keeping the informative ones. The proposed technique consists of nonnegative matrix factorization to decompose the data into a feature weight matrix and a representation matrix, an inner product norm as regularization for both matrices, adaptive structure learning to preserve local information, and the Schatten-p norm as a rank constraint. To demonstrate the effectiveness of the proposed method, numerical studies are conducted on six benchmark microarray datasets. The results show that the proposed technique outperforms eight state-of-the-art unsupervised feature selection techniques in terms of clustering accuracy and normalized mutual information.
|
3
|
Subspace learning using structure learning and non-convex regularization: Hybrid technique with mushroom reproduction optimization in gene selection. Comput Biol Med 2023; 164:107309. [PMID: 37536092 DOI: 10.1016/j.compbiomed.2023.107309] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/24/2023] [Revised: 07/26/2023] [Accepted: 07/28/2023] [Indexed: 08/05/2023]
Abstract
Gene selection, as a high-dimensional problem, has drawn considerable attention in machine learning and computational biology over the past decade. For gene selection in cancer datasets, feature selection techniques of different types have been developed, differing in strategy (filter, wrapper and embedded) and in label information (supervised, unsupervised, and semi-supervised). However, hybrid feature selection can still improve performance. In this paper, we propose a hybrid feature selection method based on filter and wrapper strategies. In the filter phase, we develop an unsupervised feature selection method based on non-convex regularized non-negative matrix factorization and structure learning, which we call NCNMFSL. In the wrapper phase, mushroom reproduction optimization (MRO) is leveraged for the first time to obtain the most informative feature subset. In this hybrid method, irrelevant features are filtered out by NCNMFSL, and the most discriminative features are selected by MRO. To show the effectiveness and proficiency of the proposed method, numerical experiments are conducted on the Breast, Heart, Colon, Leukemia, Prostate, Tox-171 and GLI-85 benchmark datasets. SVM and decision tree classifiers are used to analyze the proposed technique, and the top accuracies are 0.97, 0.84, 0.98, 0.95, 0.98, 0.87 and 0.85 for Breast, Heart, Colon, Leukemia, Prostate, Tox-171 and GLI-85, respectively. The computational results show the effectiveness of the proposed method in comparison with state-of-the-art feature selection techniques.
|
4
|
Structural brain networks in schizophrenia based on nonnegative matrix factorization. Psychiatry Res Neuroimaging 2023; 334:111690. [PMID: 37480705 DOI: 10.1016/j.pscychresns.2023.111690] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/26/2022] [Revised: 06/11/2023] [Accepted: 07/18/2023] [Indexed: 07/24/2023]
Abstract
Schizophrenia is a severe mental disease with significant morphometric reductions in gray matter volume and cortical thickness in a variety of brain regions. However, most studies have focused only on voxel-level alterations in specific cerebral regions and ignored the spatial relationship between voxels. In the present study, we used a novel, data-driven technique, nonnegative matrix factorization (NMF), to group voxels with similar information into a network, and studied the structural covariance at the network level in schizophrenia. Our sample included 36 patients with schizophrenia and 21 healthy controls. Compared with healthy controls, patients with schizophrenia showed significant gray matter volume reductions in six structural covariance networks (dorsal striatum, thalamus, hippocampus-parahippocampus, supplementary motor area-fusiform, middle/inferior temporal network, frontal-parietal-occipital network). Our findings confirmed the assumption of a disturbance in the cortical-subcortical circuit in schizophrenia and suggested that NMF is a useful multivariate method for identifying brain networks, which provides a new perspective for studying the neural mechanism of schizophrenia.
|
5
|
Membrane tension-mediated stiff and soft tumor subtypes closely associated with prognosis for prostate cancer patients. Eur J Med Res 2023; 28:172. [PMID: 37179366 PMCID: PMC10182623 DOI: 10.1186/s40001-023-01132-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 05/02/2023] [Indexed: 05/15/2023] Open
Abstract
BACKGROUND Prostate cancer (PCa) is usually considered a cold tumor. Malignancy is associated with changes in cell mechanics that contribute to the extensive cell deformation required for metastatic dissemination. Thus, we established stiff and soft tumor subtypes for PCa patients from the perspective of membrane tension. METHODS The nonnegative matrix factorization algorithm was used to identify molecular subtypes. We completed the analyses using R 3.6.3 and suitable packages. RESULTS We constructed stiff and soft tumor subtypes using eight membrane tension-related genes through lasso regression and nonnegative matrix factorization analyses. We found that patients in the stiff subtype were more prone to biochemical recurrence than those in the soft subtype (HR 16.18; p < 0.001), which was externally validated in three other cohorts. The top ten mutated genes between the stiff and soft subtypes were DNAH, NYNRIN, PTCHD4, WNK1, ARFGEF1, HRAS, ARHGEF2, MYOM1, ITGB6 and CPS1. E2F targets, base excision repair and the Notch signaling pathway were highly enriched in the stiff subtype. The stiff subtype had significantly higher TMB and follicular helper T cell levels than the soft subtype, as well as higher CTLA4, CD276, CD47 and TNFRSF25. CONCLUSIONS From the perspective of cell membrane tension, we found that stiff and soft tumor subtypes were closely associated with BCR-free survival for PCa patients, which might be important for future research in the field of PCa.
|
6
|
Post COVID-19 pandemic recovery of intracity human mobility in Wuhan: Spatiotemporal characteristic and driving mechanism. Travel Behaviour & Society 2023; 31:37-48. [PMID: 36405767 PMCID: PMC9650583 DOI: 10.1016/j.tbs.2022.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2021] [Revised: 09/27/2022] [Accepted: 11/07/2022] [Indexed: 06/16/2023]
Abstract
After successfully inhibiting the first wave of COVID-19 transmission through a city lockdown, Wuhan implemented a series of policies to gradually lift restrictions and restore daily activities. Existing studies mainly focus on intercity recovery from a macroscopic view. How does intracity mobility return to normal? Is the recovery process consistent among different subareas, and what factors affect the post-pandemic recovery? To answer these questions, we compiled the policies adopted during the Wuhan resumption and collected long-term mobility big data in 1105 traffic analysis zones (TAZs) to construct an observation matrix (A). We then used the nonnegative matrix factorization (NMF) method to approximate A as the product of two condensed matrices (WH). The column vectors of the W matrix were visualized as five typical recovery curves to reveal the temporal change. The row vectors of the H matrix were visualized to identify the spatial distribution of each recovery type, and were analyzed with variables of population, GDP, land use, and key facilities to explain the driving mechanisms of recovery. We found that the "staggered time" policies implemented in Wuhan effectively staggered the peak mobility of several recovery types ("staggered peak"). In addition, different TAZs had heterogeneous response intensities to these policies ("staggered area"), which were closely related to land uses and key facilities. The creative policies taken by Wuhan highlight the wisdom of public health crisis management, and could provide an empirical reference for the adjustment of post-pandemic intervention measures in other cities.
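The factorization step described in this abstract (A approximated by WH, with columns of W read as temporal curves and rows of H as spatial loadings) can be sketched as follows. This is a minimal, dependency-free illustration using Lee-Seung multiplicative updates; the 6-day by 4-zone toy matrix, the rank of 2, the iteration count and every function name are assumptions made for the example and are not taken from the study, which may have used a different solver.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*X)]

def nmf(A, rank, iters=1000, seed=0, eps=1e-9):
    """Approximate nonnegative A (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random start so no entry is stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return W, H

def frobenius_error(A, W, H):
    """Frobenius norm of the residual A - WH."""
    R = matmul(W, H)
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr)) ** 0.5

# Hypothetical data: rows are days, columns are zones. Two made-up recovery
# curves (fast and slow) are mixed with made-up zone loadings.
fast = [0.2, 0.6, 1.0, 1.0, 1.0, 1.0]
slow = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0]
load_fast = [1.0, 0.8, 0.0, 0.1]
load_slow = [0.0, 0.2, 1.0, 0.9]
A = [[f * lf + s * ls for lf, ls in zip(load_fast, load_slow)]
     for f, s in zip(fast, slow)]

W, H = nmf(A, rank=2)
# Each column of W is a temporal curve; each row of H says how strongly
# each zone follows that curve.
print("residual:", frobenius_error(A, W, H))
```

NMF solutions are not unique (columns of W can be rescaled against rows of H, and their order is arbitrary), so in practice the recovered curves are usually normalized before they are compared or plotted.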
|
7
|
An improved nonnegative matrix factorization with the imputation method model for pollution source apportionment during rainstorm events. Journal of Environmental Management 2023; 328:116888. [PMID: 36516713 DOI: 10.1016/j.jenvman.2022.116888] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 11/11/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Data scarcity caused by extreme conditions during storms makes pollution source apportionment difficult. This study integrated nonnegative matrix factorization with the imputation method (NMF-IM) to fill in missing data (NAs) and conduct source apportionment. A total of 367 river samples and 35 runoff samples were taken from the Banqiao and Nanfei River basins located in Hefei, China, during four rainfall events from June to August 2020. Sixteen indicators were quantified and used for source diagnostics using NMF-IM. The results showed that total phosphorus (TP) had higher concentrations and more violent fluctuations than total nitrogen (TN) in river samples taken during rainfall. NMF-IM was shown to approximately recover the value distribution of NAs. The source profiles and contribution rates calculated by NMF-IM with NAs were close to the original results calculated by NMF without NAs, with a root mean square error of less than 2.3% and differences of less than 9.5%. Including multiple forms of nitrogen and phosphorus indicators helps reach reasonable source diagnostics results. At least four indicators were needed to reach the same contribution rates as the 16-indicator diagnostics. The two good indicator combinations are (1) nitrate (NO3-N), nitrite (NO2-N), ammonia nitrogen (NH3-N), and total suspended solids (TSS), and (2) NO3-N, NO2-N, phosphorus (PO4-P), and TSS. The pollution source contributions changed with the antecedent dry periods (ADPs) of rain events. Treated tailwater and untreated sewage were major sources, contributing more than 80% of the total pollution of the rainstorm events with short ADPs. Dust wash became the dominant contributor after 60 min and contributed 36% of the total pollution of rainstorm events with long ADPs. The average source contribution rates for rainfall events in the Banqiao River were treated tailwater (41%) > untreated sewage (27%) > dust wash (19%) > other sources (16%). The pollution source diagnostics results were verified to be reasonable by simulation using tested run-off data and literature results.
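The abstract does not spell out how NMF-IM handles the missing entries, so the following is only a generic sketch of one common approach, masked (weighted) NMF: missing cells are excluded from the multiplicative updates and afterwards filled with the reconstructed value. The toy matrix, the rank, and the names `masked_nmf` and `impute` are assumptions for illustration, not the authors' implementation.

```python
import random

def masked_nmf(A, rank, iters=200, seed=0, eps=1e-9):
    """Fit A ~ W H using observed entries only; None marks a missing value."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    M = [[0.0 if v is None else 1.0 for v in row] for row in A]       # 1 = observed
    X = [[0.0 if v is None else float(v) for v in row] for row in A]  # NAs zeroed
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]

    def reconstruct():
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(n)]
                for i in range(m)]

    for _ in range(iters):
        # Update H with W fixed; masked cells contribute nothing to either sum.
        R = reconstruct()
        for k in range(rank):
            for j in range(n):
                num = sum(W[i][k] * M[i][j] * X[i][j] for i in range(m))
                den = sum(W[i][k] * M[i][j] * R[i][j] for i in range(m))
                H[k][j] *= num / (den + eps)
        # Update W with the new H fixed.
        R = reconstruct()
        for i in range(m):
            for k in range(rank):
                num = sum(M[i][j] * X[i][j] * H[k][j] for j in range(n))
                den = sum(M[i][j] * R[i][j] * H[k][j] for j in range(n))
                W[i][k] *= num / (den + eps)
    return W, H

def impute(A, W, H):
    """Return a copy of A in which each None is replaced by the (WH) entry."""
    rank = len(H)
    return [[sum(W[i][k] * H[k][j] for k in range(rank)) if v is None else v
             for j, v in enumerate(row)]
            for i, row in enumerate(A)]

# Hypothetical samples-by-indicators table with one missing measurement.
# The complete table would be the rank-1 product of [1, 2, 3] and [2, 1, 4].
A_missing = [
    [2, 1, 4],
    [4, 2, None],
    [6, 3, 12],
]
W, H = masked_nmf(A_missing, rank=1)
filled = impute(A_missing, W, H)
print(filled[1][2])
```

Because the toy table is exactly rank 1, the hidden cell is fully determined by its row and column neighbours; real monitoring data are noisy and higher rank, so imputed values there are estimates whose quality depends on the chosen rank and on how much of the table is observed.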
|
8
|
Analysis of Mutational Signatures Using the mutSignatures R Library. Methods Mol Biol 2023; 2684:45-57. [PMID: 37410227 DOI: 10.1007/978-1-0716-3291-8_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
Accumulation of somatic mutations is a hallmark of cancer. Defects in DNA metabolism and DNA repair and exposure to mutagens may result in characteristic nonrandom profiles of DNA mutations, also known as mutational signatures. Resolving mutational signatures can help identify genetic instability processes active in human cancer samples, and there is an expectation that this information might be exploited in the future for drug discovery and personalized treatment. Here we show how to analyze bladder cancer mutation data using mutSignatures, an open-source R-based computational framework aimed at investigating DNA mutational signatures. We illustrate the typical steps of a mutational signature analysis. We start by importing and pre-processing mutation data from a list of Variant Call Format (VCF) files. Next, we show how to perform de novo mutational signature extraction and how to determine the activity of previously resolved mutational signatures, including Catalogue of Somatic Mutations In Cancer (COSMIC) signatures. Finally, we provide insights into parameter selection, algorithm tuning, and data visualization. Overall, the chapter guides the reader through all steps of a mutational signature analysis using R and mutSignatures, software that may help gather insights into genetic instability and cancer biology.
|
9
|
Geometric structure guided model and algorithms for complete deconvolution of gene expression data. Foundations of Data Science (Springfield, Mo.) 2022; 4:441-466. [PMID: 38250319 PMCID: PMC10798655 DOI: 10.3934/fods.2022013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/23/2024]
Abstract
Complete deconvolution analysis of bulk RNA-seq data is important for distinguishing whether the differences in disease-associated gene expression profiles (GEPs) between tissues of patients and normal controls are due to changes in the cellular composition of tissue samples or to GEP changes in specific cells. One of the major techniques for performing complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is a well-known strongly ill-posed problem, so a direct application of NMF to RNA-seq data suffers severe difficulties in the interpretability of solutions. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms to improve the solution identifiability of deconvoluting bulk RNA-seq data. In our approach, we combine the biological concept of marker genes with the solvability conditions of NMF theory, and develop a geometric-structure-guided optimization model. In this strategy, the geometric structure of bulk tissue data is first explored by spectral clustering. Then, the identified marker gene information is integrated as solvability constraints, while the overall correlation graph is used as manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, showing that solution interpretability and accuracy are significantly improved.
|
10
|
Four Distinct Subtypes of Alzheimer's Disease Based on Resting-State Connectivity Biomarkers. Biol Psychiatry 2022; 93:759-769. [PMID: 36137824 DOI: 10.1016/j.biopsych.2022.06.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/13/2022] [Revised: 05/19/2022] [Accepted: 06/13/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Alzheimer's disease (AD) is a neurodegenerative disorder with significant heterogeneity. Different AD phenotypes may be associated with specific brain network changes. Uncovering disease heterogeneity by using functional networks could provide insights into precise diagnoses. METHODS We investigated the subtypes of AD using nonnegative matrix factorization clustering on the previously identified 216 resting-state functional connectivities that differed between AD and normal control subjects. We conducted the analysis using a discovery dataset (n = 809) and a validated dataset (n = 291). Next, we grouped individuals with mild cognitive impairment according to the model obtained in the AD groups. Finally, the clinical measures and brain structural characteristics were compared among the subtypes to assess their relationship with differences in the functional network. RESULTS Individuals with AD were clustered into 4 subtypes reproducibly, which included those with 1) diffuse and mild functional connectivity disruption (subtype 1), 2) predominantly decreased connectivity in the default mode network accompanied by an increase in the prefrontal circuit (subtype 2), 3) predominantly decreased connectivity in the anterior cingulate cortex accompanied by an increase in prefrontal cortex connectivity (subtype 3), and 4) predominantly decreased connectivity in the basal ganglia accompanied by an increase in prefrontal cortex connectivity (subtype 4). In addition to these differences in functional connectivity, differences between the AD subtypes were found in cognition, structural measures, and cognitive decline patterns. CONCLUSIONS These comprehensive results offer new insights that may advance precision medicine for AD and facilitate strategies for future clinical trials.
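The abstract says individuals with mild cognitive impairment were grouped "according to the model obtained in the AD groups" without giving the procedure. One plausible reading, sketched here purely as an assumption, is to keep the learned NMF basis fixed, solve for nonnegative coefficients of each new sample, and assign the subtype with the largest coefficient. The basis values, feature vectors and the names `project` and `assign_subtype` are hypothetical.

```python
def project(W, x, iters=200, eps=1e-9):
    """Nonnegative coefficients h minimizing ||x - W h|| for a fixed basis W.

    W is features x k (list of rows); x is one nonnegative feature vector.
    Uses the multiplicative update h <- h * (W^T x) / (W^T W h).
    """
    n_feat, k = len(W), len(W[0])
    Wtx = [sum(W[i][c] * x[i] for i in range(n_feat)) for c in range(k)]
    WtW = [[sum(W[i][a] * W[i][b] for i in range(n_feat)) for b in range(k)]
           for a in range(k)]
    h = [1.0] * k
    for _ in range(iters):
        den = [sum(WtW[a][b] * h[b] for b in range(k)) for a in range(k)]
        h = [h[a] * Wtx[a] / (den[a] + eps) for a in range(k)]
    return h

def assign_subtype(W, x):
    """Index of the basis component with the largest coefficient for x."""
    h = project(W, x)
    return max(range(len(h)), key=h.__getitem__)

# Hypothetical basis: 4 connectivity features, 2 subtypes learned elsewhere.
W_fixed = [
    [1.0, 0.1],
    [0.9, 0.0],
    [0.1, 1.0],
    [0.0, 0.8],
]
new_subjects = [
    [0.9, 1.1, 0.1, 0.0],   # loads mostly on the first two features
    [0.0, 0.2, 1.0, 0.8],   # loads mostly on the last two features
]
labels = [assign_subtype(W_fixed, x) for x in new_subjects]
print(labels)
```

This hard assignment ignores how close the two coefficients are; a study would normally also check assignment stability, and signed connectivity features would need to be shifted or split into positive and negative parts before NMF applies.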
|
11
|
Blind source separation of inspiration and expiration in respiratory sEMG signals. Physiol Meas 2022; 43. [PMID: 35709716 DOI: 10.1088/1361-6579/ac799c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 06/16/2022] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Surface electromyography (sEMG) is a noninvasive option for monitoring respiratory effort in ventilated patients. However, respiratory sEMG signals are affected by crosstalk and cardiac activity. This work addresses the blind source separation (BSS) of inspiratory and expiratory electrical activity in single- or two-channel recordings. The main contribution of the presented methodology is its applicability to the addressed muscles and the number of available channels. APPROACH We propose a two-step procedure consisting of a single-channel cardiac artifact removal algorithm, followed by a single- or multi-channel BSS stage. First, cardiac components are removed in the wavelet domain. Subsequently, a nonnegative matrix factorization (NMF) algorithm is applied to the envelopes of the resulting wavelet bands. The NMF is initialized based on simultaneous standard pneumatic measurements of the ventilated patient. MAIN RESULTS The proposed estimation scheme is applied to twelve clinical datasets and simulated sEMG signals of the respiratory system. The results on the clinical datasets are validated based on expert annotations using invasive pneumatic measurements. In the simulation, three measures evaluate the separation success: The distortion and the correlation to the known ground truth and the inspiratory-to-expiratory signal power ratio. We find an improvement across all SNRs, recruitment patterns, and channel configurations. Moreover, our results indicate that the initialization strategy replaces the manual matching of sources after the BSS. SIGNIFICANCE The proposed separation algorithm facilitates the interpretation of respiratory sEMG signals. In crosstalk affected measurements, the developed method may help clinicians distinguish between inspiratory effort and other muscle activities using only noninvasive measurements.
|
12
|
JSNMF enables effective and accurate integrative analysis of single-cell multiomics data. Brief Bioinform 2022; 23:6563185. [PMID: 35380624 DOI: 10.1093/bib/bbac105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2021] [Revised: 02/25/2022] [Accepted: 03/02/2022] [Indexed: 01/18/2023] Open
Abstract
Single-cell multiomics technologies provide an unprecedented opportunity to study cellular heterogeneity across different layers of transcriptional regulation. However, the datasets generated by these technologies tend to have high levels of noise, making data analysis challenging. Here, we propose jointly semi-orthogonal nonnegative matrix factorization (JSNMF), a versatile toolkit for the integrative analysis of transcriptomic and epigenomic data profiled from the same cell. JSNMF enables data visualization and clustering of the cells and also facilitates downstream analysis, including the characterization of markers and functional pathway enrichment analysis. The core of JSNMF is an unsupervised method based on jointly semi-orthogonal nonnegative matrix factorization, which assumes different latent variables for the two molecular modalities and integrates the transcriptomic and epigenomic information with consensus graph fusion. This better handles the distinct characteristics and noise levels of the different molecular modalities in single-cell multiomics data. We applied JSNMF to single-cell multiomics datasets from different tissues and different technologies. The results demonstrate the superior performance of JSNMF in clustering and data visualization of the cells. JSNMF also allows joint analysis of multiple single-cell multiomics experiments and of single-cell multiomics data with more than two modalities profiled on the same cell. JSNMF also provides rich biological insight into the markers, cell-type-specific region-gene associations and the functions of the identified cell subpopulations.
|
13
|
Muscle synergies of multi-directional postural control in astronauts on Earth after a long-term stay in space. J Neurophysiol 2022; 127:1230-1239. [PMID: 35353615 DOI: 10.1152/jn.00232.2021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
Movements of the human biological system have adapted to the physical environment under the 1-g gravitational force on Earth. However, the effects of microgravity in space on the underlying functional neuromuscular control behaviors remain poorly understood. Here, we aimed to elucidate the effects of prolonged exposure to a microgravity environment on the functional coordination of multiple muscle activities. The activities of 16 lower limb muscles of 5 astronauts who stayed in space for at least 3 months were recorded while they maintained multidirectional postural control during bipedal standing. The coordinated activation patterns of groups of muscles, i.e., muscle synergies, were estimated from the muscle activation datasets using a factorization algorithm. The experiments were repeated a total of 5 times for each astronaut, once before and 4 times after spaceflight. The compositions of muscle synergies were altered, with a constant number of synergies, after long-term exposure to microgravity, and the extent of the changes was correlated with the severity of the deficits in postural stability. Furthermore, the muscle synergies extracted 3 months after the return were similar in their activation profile but not in their muscle composition compared with those extracted in the preflight condition. These results suggest that the modularity in the neuromuscular system became reorganized to adapt to the microgravity environment and then possibly reoptimized to the new sensorimotor environment after the astronauts were re-exposed to a gravitational force. It is expected that muscle synergies can be used as physiological markers of the status of astronauts with gravity-dependent change.
|
14
|
Spectrogram decomposition of ultrasonic guided waves for cortical thickness assessment using basis learning. Ultrasonics 2022; 120:106665. [PMID: 34968990 DOI: 10.1016/j.ultras.2021.106665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/08/2020] [Revised: 10/12/2021] [Accepted: 12/09/2021] [Indexed: 06/14/2023]
Abstract
Due to their multimode and dispersive nature, ultrasonic guided waves (UGWs) usually consist of overlapped wave packets, which makes accurate bone characterization challenging. To overcome this obstacle, a classic idea is to separate individual modes and to extract the corresponding dispersion curves. Reported single-channel mode separation algorithms mainly focused on offering a time-frequency representation (TFR) in which the energy distributions of individual modes were apart from each other. However, such approaches are still limited to identifying modes that do not overlap significantly in the time-frequency domain. In this study, a spectrogram decomposition technique was developed that combines generalized separable nonnegative matrix factorization (GS-NMF) with adaptive basis learning, aiming at automatic mode extraction under severe overlapping and low signal-to-noise ratio (SNR). The extracted modes were further used for cortical thickness estimation. The method was verified using broadband simulated and experimental datasets. Experiments were conducted on a bone-mimicking plate and bovine cortical bone plates. For simulated data, the relative errors between extracted and theoretical dispersion curves are 1.33% (SNR = ∞), 1.43% (SNR = 10 dB) and 0.88% (SNR = 5 dB). The root-mean-square errors of the estimated thickness for the 3.10 mm-thick bone-mimicking plate and the 3.83 mm- and 4.00 mm-thick bovine cortical bone plates are 0.039 mm, 0.049 mm, and 0.052 mm, respectively. The proposed method is shown to be capable of separating multimodal UGWs even under significant overlap and low SNR conditions, further facilitating UGW-based cortical thickness assessment.
|
15
|
Source apportionment of atmospheric particle number concentrations with wide size range by nonnegative matrix factorization (NMF). Environmental Pollution (Barking, Essex : 1987) 2021; 289:117846. [PMID: 34330013 DOI: 10.1016/j.envpol.2021.117846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2021] [Revised: 07/05/2021] [Accepted: 07/24/2021] [Indexed: 06/13/2023]
Abstract
Quantifying the sources of atmospheric particles is essential to air quality control but remains challenging, especially for the source apportionment of particles based on number concentration with wide size range. Here, particle number concentrations (PNC) with size range 19-20,000 nm involving four modes Nucleation, Aitken, Accumulation, and Coarse are used to do source apportionment of PNC at the Guangdong Atmospheric Supersite (Heshan) during July-October 2015 by nonnegative matrix factorization (NMF) with 6 factors. For July 2015, separated source apportionments for three different size ranges from collocated instruments nano scanning mobility particle sizer (NSMPS), SMPS, and aerodynamic particle sizer (APS) and for two different size ranges (below and above 100 nm) show similar quantitative source information with that for the one whole size range. The mean absolute difference of contribution percentages of total particle number concentrations (TPNC) based on 5 unique apportioned sources is 5.6 % (4.3-7.6 %) for the instrument segregated apportionment and 4.2 % (0-5.3 %) for the size range segregated apportionment respectively, relative to the one whole apportionment. Moreover, the contribution percentages of TPNC are close to the weighted sum of contribution percentages of all size bins, with a mean absolute difference of 1.1 % (0-3.4 %). In both these two aspects, the consistency among different technical paths proves the matrix factorization by NMF is practically desirable and the simplicity of reducing some steps or calculations saves time. Besides, dust can be identified with the wide size range including larger than 3000 nm. Six apportioned sources in the 4 months are Accumulation (32.4 %), Nucleation (20.0 %), Aitken (15.2 %), traffic (14.6 %), dust (10.6 %), and Coarse (7.1 %). 
Therefore, NMF is a promising tool for PNC source apportionment over a wide size range; conducting the apportionment over the whole size range in a single matrix factorization and using the single TPNC contribution percentage are both feasible.
|
16
|
Analysis of fibroblast genes selected by NMF to reveal the potential crosstalk between ulcerative colitis and colorectal cancer. Exp Mol Pathol 2021; 123:104713. [PMID: 34666047 DOI: 10.1016/j.yexmp.2021.104713] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/19/2021] [Revised: 06/30/2021] [Accepted: 10/12/2021] [Indexed: 12/24/2022]
Abstract
Patients with ulcerative colitis (UC) have an increased risk of developing colorectal cancer (CRC). The extent of this risk rises with increasing age, duration of symptoms, severity of inflammation, and dysplasia. CRC is a complex multi-stage process, and UC-associated CRC represents 2% of all colon cancers. To clarify some aspects of the evolution of UC towards CRC, we characterized the phenotype of fibroblasts present in the mucosa of subjects affected by UC to verify whether they can contribute to the genesis of a microenvironment favorable to tumor transformation. The fibroblast phenotype was obtained by transcriptome analysis using a novel framework based on Nonnegative Matrix Factorization (NMF), which automatically extracts a limited number of genes from the fibroblast gene expression profiles of patients with UC and CRC. These genes may be considered possible candidates for generating a microenvironment permissive to the evolution of the disease under study.
|
17
|
Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization for Single-Cell RNA-seq Analysis. Interdiscip Sci 2021; 14:45-54. [PMID: 34231183 DOI: 10.1007/s12539-021-00457-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Revised: 06/24/2021] [Accepted: 06/27/2021] [Indexed: 10/20/2022]
Abstract
In traditional sequencing techniques, the different functions of cells and the different roles they play in differentiation are often ignored. With the advancement of single-cell RNA sequencing (scRNA-seq) techniques, scientists can measure gene expression at the single-cell level, which helps to reveal the heterogeneity hidden among cells. One of the most powerful ways to find heterogeneity is to use unsupervised clustering to separate subpopulations. In this paper, we propose a novel clustering method, Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization (SDCNMF), that simultaneously imposes similarity and dissimilarity constraints on low-dimensional representations. SDCNMF considers both the similarity of closer cells and the dissimilarity of cells that are farther away. It not only keeps similar cells close together in the low-dimensional space but also pushes dissimilar cells away from each other. We test the validity of our proposed method on five scRNA-seq datasets. Clustering results show that SDCNMF is better than the other compared methods, and the gene markers we find are consistent with previous studies. Therefore, we conclude that SDCNMF is effective in scRNA-seq data analysis.
|
18
|
Identification of immune subtypes of cervical squamous cell carcinoma predicting prognosis and immunotherapy responses. J Transl Med 2021; 19:222. [PMID: 34030694 PMCID: PMC8142504 DOI: 10.1186/s12967-021-02894-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/28/2021] [Accepted: 05/17/2021] [Indexed: 12/22/2022] Open
Abstract
Background The main limitation of current immune checkpoint inhibitors (ICIs) in the treatment of cervical cancer is that they benefit only a minority of patients. The study aims to develop a classification system to identify immune subtypes of cervical squamous cell carcinoma (SCC), thereby helping to screen candidates who may respond to ICIs. Methods A real-world cervical SCC cohort of 36 samples was analyzed. We used a nonnegative matrix factorization (NMF) algorithm to separate different expression patterns of immune-related genes (IRGs). The immune characteristics, potential immune biomarkers, and somatic mutations were compared. Two independent data sets containing 555 samples were used for validation. Results Two subtypes with different immunophenotypes were identified. Patients in sub1 showed favorable progression-free survival (PFS) and overall survival (OS) in the training and validation cohorts. Sub1 was strongly associated with increased immune cell abundance, more enriched immune activation pathways, and higher somatic mutation burden. The sub1 group was also more sensitive to ICIs, while patients in the sub2 group were more likely to fail to respond to ICIs but exhibited GPCR pathway activity. Finally, an 83-gene classifier was constructed for cervical SCC classification. Conclusion This study establishes a new classification to further understand the immunological diversity of cervical SCC and to assist in the selection of candidates for immunotherapy. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-021-02894-3.
|
19
|
Unsupervised phenotyping of sepsis using nonnegative matrix factorization of temporal trends from a multivariate panel of physiological measurements. BMC Med Inform Decis Mak 2021; 21:95. [PMID: 33836745 PMCID: PMC8033653 DOI: 10.1186/s12911-021-01460-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/18/2021] [Accepted: 03/01/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Sepsis is a highly lethal and heterogeneous disease. Utilization of an unsupervised method may identify novel clinical phenotypes that lead to targeted therapies and improved care. METHODS Our objective was to derive clinically relevant sepsis phenotypes from a multivariate panel of physiological data using subgraph-augmented nonnegative matrix factorization. We utilized data from the Medical Information Mart for Intensive Care III database of patients who were admitted to the intensive care unit with sepsis. The extracted data contained patient demographics, physiological records, sequential organ failure assessment scores, and comorbidities. We applied frequent subgraph mining to extract subgraphs from physiological time series and performed nonnegative matrix factorization over the subgraphs to derive patient clusters as phenotypes. Finally, we profiled these phenotypes based on demographics, physiological patterns, disease trajectories, comorbidities and outcomes, and performed functional validation of their clinical implications. RESULTS We analyzed a cohort of 5782 patients, derived three novel phenotypes of distinct clinical characteristics and demonstrated their prognostic implications on patient outcome. Subgroup 1 included relatively less severe/deadly patients (30-day mortality, 17%) and was the smallest-in-size group (n = 1218, 21%). It was characterized by old age (mean age, 73 years), a male majority (male-to-female ratio, 59-to-41), and complex chronic conditions. Subgroup 2 included the most severe/deadliest patients (30-day mortality, 28%) and was the second-in-size group (n = 2036, 35%). It was characterized by a male majority (male-to-female ratio, 60-to-40), severe organ dysfunction or failure compounded by a wide range of comorbidities, and uniquely high incidences of coagulopathy and liver disease. Subgroup 3 included the least severe/deadly patients (30-day mortality, 10%) and was the largest group (n = 2528, 44%). 
It was characterized by low age (mean age, 60 years), a balanced gender ratio (male-to-female ratio, 50-to-50), the least complicated conditions, and a uniquely high incidence of neurologic disease. These phenotypes were validated to be prognostic factors of mortality for sepsis patients. CONCLUSIONS Our results suggest that these phenotypes can be used to develop targeted therapies based on phenotypic heterogeneity and algorithms designed for monitoring, validating and intervening clinical decisions for sepsis patients.
|
20
|
A statistical framework for non-negative matrix factorization based on generalized dual divergence. Neural Netw 2021; 140:309-324. [PMID: 33892302 DOI: 10.1016/j.neunet.2021.03.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/02/2019] [Revised: 01/11/2021] [Accepted: 03/12/2021] [Indexed: 11/30/2022]
Abstract
A statistical framework for non-negative matrix factorization based on generalized dual Kullback-Leibler divergence, which includes members of the exponential family of models, is proposed. A family of algorithms is developed using this framework, including under sparsity constraints, and its convergence proven using the Expectation-Maximization algorithm. The framework generalizes some existing methods for different noise structures and contrasts with the recently developed quasi-likelihood approach, thus providing a useful alternative for non-negative matrix factorization. A measure to evaluate the goodness-of-fit of the resulting factorization is described. The performance of the proposed methods is evaluated extensively using real life and simulated data and their utility in unsupervised and semi-supervised learning is illustrated using an application in cancer genomics. This framework can be viewed from the perspective of reinforcement learning, and can be adapted to incorporate discriminant functions and multi-layered neural networks within a deep learning paradigm.
|
21
|
Reaction rate ambiguities for perturbed spectroscopic data: Theory and implementation. Anal Chim Acta 2020; 1137:170-180. [PMID: 33153600 DOI: 10.1016/j.aca.2020.08.055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2020] [Revised: 08/25/2020] [Accepted: 08/27/2020] [Indexed: 10/23/2022]
Abstract
The analysis of reaction systems and their kinetic modeling is important for both exploratory research and process design. Multivariate curve resolution (MCR) methods are state-of-the-art tools for the analysis of spectral series, but are also affected by an unavoidable solution ambiguity that impacts the obtained concentration profiles, spectra and model parameters. These uncertainties depend on the underlying model and the magnitude of the measurement perturbations. We present a general theoretical approach together with a computational method for the analysis of the solution ambiguity underlying arbitrary kinetic models. The main idea is to determine all those model parameters for which the corresponding pure component factorizations satisfy all given constraints within small error tolerances. This makes it possible to determine bands of concentration profiles and spectra that reflect the underlying ambiguity and circumscribe the potential reliability of MCR solutions. False conclusions on the uniqueness of a solution can be prevented. The procedure can be applied as a post-processing step to MCR methods such as MCR-ALS, ReactLab or others. The Matlab program code is freely accessible and includes not only the proposed ambiguity analysis but also an MCR hard-modeling approach. Application studies are presented for two experimental data sets, namely for UV/Vis spectra on the relaxation of a photoexcited state of benzophenone and for Raman spectra on an aldehyde formation process.
|
22
|
Assessment of nonnegative matrix factorization algorithms for electroencephalography spectral analysis. Biomed Eng Online 2020; 19:61. [PMID: 32736630 PMCID: PMC7393858 DOI: 10.1186/s12938-020-00796-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/28/2020] [Accepted: 06/09/2020] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Nonnegative matrix factorization (NMF) has been successfully used for electroencephalography (EEG) spectral analysis. Since NMF was proposed in the 1990s, many adaptive algorithms have been developed. However, their performance in EEG data analysis has not been fully compared. Here, we compare four NMF algorithms on simulated data in terms of accuracy of estimation, stability (repeatability of the results) and time complexity. In practical applications of NMF algorithms, stability plays an important role, so it was emphasized in the comparison. A hierarchical clustering algorithm was implemented to evaluate the stability of the NMF algorithms. RESULTS In a comprehensive simulation-based analysis of fit, stability, accuracy of estimation and time complexity, the hierarchical alternating least squares (HALS) low-rank NMF algorithm (lraNMF_HALS) outperformed the other three NMF algorithms. When lraNMF_HALS was applied to real resting-state EEG data, stable and interpretable features were extracted. CONCLUSION Based on the results of the assessment, we recommend lraNMF_HALS, which provides the most accurate and robust estimation.
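For readers unfamiliar with HALS, the following is a minimal sketch of the plain hierarchical alternating least squares update for X ≈ WH under a Frobenius loss. It is not the lraNMF_HALS variant assessed in the paper (which adds a low-rank approximation step); the function name `nmf_hals`, the random initialization, and the `eps` floor are assumptions made for illustration.

```python
import numpy as np

def nmf_hals(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Plain HALS sketch: X ~= W @ H with W, H >= 0 (Frobenius loss).

    Hypothetical helper for illustration; not the paper's lraNMF_HALS.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Update the rows of H one at a time, holding W fixed.
        WtX, WtW = W.T @ X, W.T @ W
        for k in range(rank):
            num = WtX[k] - WtW[k] @ H + WtW[k, k] * H[k]
            H[k] = np.maximum(num / max(WtW[k, k], eps), eps)
        # Update the columns of W one at a time, holding H fixed.
        XHt, HHt = X @ H.T, H @ H.T
        for k in range(rank):
            num = XHt[:, k] - W @ HHt[:, k] + HHt[k, k] * W[:, k]
            W[:, k] = np.maximum(num / max(HHt[k, k], eps), eps)
    return W, H
```

Each inner update solves a one-component least-squares problem in closed form and then clips the result to stay nonnegative. Stability in the sense of the abstract could be probed by rerunning the function with different `seed` values and comparing the resulting components.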
|
23
|
Nonnegative matrix factorization for the identification of pressure ulcer risks from seating interface pressures in people with spinal cord injury. Med Biol Eng Comput 2019; 58:227-237. [PMID: 31832862 DOI: 10.1007/s11517-019-02081-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/18/2019] [Accepted: 11/12/2019] [Indexed: 10/25/2022]
Abstract
The purpose of this study was to predict and visualize pressure ulcer risks by using a novel approach of extracting computational features from seating interface pressures in people with spinal cord injury (SCI). In conventional clinical practice, seating interface pressure assessments rely on descriptive statistics of pressure magnitude. In this study, rank-2 nonnegative matrix factorization (NMF) was applied to the seating interface pressure maps during loading and pressure-relieving conditions in 16 people with SCI. The NMF basis images were used for visual interpretation and computational prediction of pressure ulcer risks. The two NMF basis images encapsulated pressure concentration and pressure dispersion, respectively. The first basis converged on the ischial tuberosity under both seating conditions, whereas the second basis converged anterior to the ischial tuberosity during loading and converged on the coccyx during unloading. The classification yielded 81.25% overall accuracy. In general, higher ulceration risk was associated with higher and lower activations of the first and second bases, respectively. The NMF pipeline yielded promising performance. Basis visualization affirmed the importance of lower ischial pressure and higher distribution dispersion while also revealing that clinical practice may currently be underestimating the importance of coccygeal pressure in response to pressure-relieving activities.
|
24
|
Virtual methylome dissection facilitated by single-cell analyses. Epigenetics Chromatin 2019; 12:66. [PMID: 31711526 PMCID: PMC6844058 DOI: 10.1186/s13072-019-0310-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/02/2019] [Accepted: 10/21/2019] [Indexed: 12/31/2022] Open
Abstract
Background Numerous cell types can be identified within plant tissues and animal organs, and the epigenetic modifications underlying such enormous cellular heterogeneity are just beginning to be understood. It remains a challenge to infer cellular composition using DNA methylomes generated for mixed cell populations. Here, we propose a semi-reference-free procedure to perform virtual methylome dissection using the nonnegative matrix factorization (NMF) algorithm. Results In the pipeline that we implemented to predict cell-subtype percentages, putative cell-type-specific methylated (pCSM) loci were first determined according to their DNA methylation patterns in bulk methylomes and clustered into groups based on their correlations in methylation profiles. A representative set of pCSM loci was then chosen to decompose target methylomes into multiple latent DNA methylation components (LMCs). To test the performance of this pipeline, we made use of single-cell brain methylomes to create synthetic methylomes of known cell composition. Compared with highly variable CpG sites, pCSM loci achieved a higher prediction accuracy in the virtual methylome dissection of synthetic methylomes. In addition, pCSM loci were shown to be good predictors of the cell type of the sorted brain cells. The software package developed in this study is available in the GitHub repository (https://github.com/Gavin-Yinld). Conclusions We anticipate that the pipeline implemented in this study will be an innovative and valuable tool for the decoding of cellular heterogeneity.
|
25
|
Comparison of muscle synergies extracted from both legs during cycling at different mechanical conditions. AUSTRALASIAN PHYSICAL & ENGINEERING SCIENCES IN MEDICINE 2019; 42:827-838. [PMID: 31161596 DOI: 10.1007/s13246-019-00767-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/08/2018] [Accepted: 05/24/2019] [Indexed: 12/11/2022]
Abstract
Muscle synergies are hypothesized to be the building blocks with which the central nervous system (CNS) generates movement. According to this hypothesis, the CNS reduces the complexity of motor control by combining a small number of muscle synergies. The aim of this work is to investigate the similarity of muscle synergies during cycling across various mechanical conditions. Twenty healthy subjects performed three 6-min cycling tasks over a range of rotational speeds (40, 50, and 60 rpm) and resistant torques (3, 5, and 7 N/m). Surface electromyography (sEMG) signals were recorded during pedaling from eight muscles of the right and left legs. We extracted four synchronous muscle synergies using the non-negative matrix factorization (NMF) method. The mean and standard deviation of the goodness of signal reconstruction (R²) across all subjects was 0.9898 ± 0.0535. We investigated the functional roles of the muscles of both legs during cycling by synchronous muscle synergy extraction, and we compared the muscle synergies extracted from all subjects in all mechanical conditions. The overall mean and standard deviation of the similarity of synergy vectors across all subjects and mechanical conditions was 0.8788 ± 0.0709. We found a high degree of similarity among the sets of synchronous muscle synergies across mechanical conditions and across subjects. Our results demonstrate that different subjects under different mechanical conditions use the same motor control strategies for cycling, despite inter-individual variability of muscle patterns.
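The extraction and comparison steps described above can be sketched as follows. This is an illustrative sketch only: the abstract does not state which NMF update rule, which definition of R², or which similarity measure was used, so the Lee-Seung multiplicative updates, the mean-centered R², and the cosine similarity below are all assumptions, as are the function names. The data matrix `X` is assumed to hold nonnegative sEMG envelopes with one row per muscle and one column per time sample.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for X ~= W @ H (Frobenius loss).

    Columns of W are synergy vectors (muscle weightings);
    rows of H are the corresponding activation time courses.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def reconstruction_r2(X, W, H):
    """Assumed goodness-of-fit: 1 - SSE / SST, with SST about the grand mean."""
    resid = X - W @ H
    return 1.0 - np.sum(resid ** 2) / np.sum((X - X.mean()) ** 2)

def cosine_similarity(u, v):
    """Assumed similarity measure between two synergy vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

With eight muscles and four synergies, `W` would have shape (8, 4). Synergies from two conditions could then be compared column by column with `cosine_similarity`, after matching columns, since NMF returns components in arbitrary order.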
|
26
|
Low-rank network signatures in the triple network separate schizophrenia and major depressive disorder. NEUROIMAGE-CLINICAL 2019; 22:101725. [PMID: 30798168 PMCID: PMC6389685 DOI: 10.1016/j.nicl.2019.101725] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/27/2018] [Revised: 01/30/2019] [Accepted: 02/18/2019] [Indexed: 02/05/2023]
Abstract
Brain imaging studies have revealed that functional and structural brain connectivity in the so-called triple network (i.e., default mode network (DMN), salience network (SN) and central executive network (CEN)) are consistently altered in schizophrenia. However, similar changes have also been found in patients with major depressive disorder, prompting the question of specific triple network signatures for the two disorders. In this study, we proposed Supervised Convex Nonnegative Matrix Factorization (SCNMF) to extract distributed multi-modal brain patterns. These patterns distinguish schizophrenia and major depressive disorder in a latent low-dimensional space of the triple brain network. Specifically, 21 patients of schizophrenia and 25 patients of major depressive disorder were assessed by T1-weighted, diffusion-weighted, and resting-state functional MRIs. Individual structural and functional connectivity networks, based on pre-defined regions of the triple network were constructed, respectively. Afterwards, SCNMF was employed to extract the discriminative patterns. Experiments indicate that SCNMF allows extracting the low-rank discriminative patterns between the two disorders, achieving a classification accuracy of 82.6% based on the extracted functional and structural abnormalities with support vector machine. Experimental results show the specific brain patterns for schizophrenia and major depressive disorder that are multi-modal, complex, and distributed in the triple network. Parts of the prefrontal cortex including superior frontal gyri showed variation between patients with schizophrenia and major depression due to structural properties. In terms of functional properties, the middle cingulate cortex, inferior parietal lobule, and cingulate cortex were the most discriminative regions. Specific changes in SZP and MDD are complex but subtle, and distributed in triple networks. Low-rank network signatures on multi-modal data well separate SZP and MDD. 
Group-specific latent disrupted patterns are uncovered with SCNMF.
|
27
|
Supervised Nonnegative Matrix Factorization to Predict ICU Mortality Risk. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2019; 2018:1189-1194. [PMID: 31360595 DOI: 10.1109/bibm.2018.8621403] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/02/2023]
Abstract
ICU mortality risk prediction is a difficult yet important task. On the one hand, because the temporal data collected are complex, it is difficult to identify effective features and interpret them easily; on the other hand, good prediction can help clinicians take timely action to prevent mortality. These correspond to the problems of interpretability and accuracy. Most existing methods lack interpretability, but Subgraph Augmented Nonnegative Matrix Factorization (SANMF) has recently been applied successfully to time series data and provides a way to interpret the features well. We therefore adopted this approach as the backbone for analyzing the patient data. One limitation of the original SANMF method is its poor predictive ability, owing to its unsupervised nature. To address this problem, we propose a supervised SANMF algorithm that integrates the logistic regression loss function into the NMF framework, and we solve it with an alternating optimization procedure. We used simulated data to verify the effectiveness of this method, then applied it to ICU mortality risk prediction and demonstrated its superiority over other conventional supervised NMF methods.
|
28
|
A nonnegative matrix factorization algorithm based on a discrete-time projection neural network. Neural Netw 2018; 103:63-71. [PMID: 29642020 DOI: 10.1016/j.neunet.2018.03.003] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 06/13/2017] [Revised: 12/18/2017] [Accepted: 03/06/2018] [Indexed: 11/19/2022]
Abstract
This paper presents an algorithm for nonnegative matrix factorization based on a biconvex optimization formulation. First, a discrete-time projection neural network is introduced. An upper bound of its step size is derived to guarantee the stability of the neural network. Then, an algorithm is proposed based on the discrete-time projection neural network and a backtracking step-size adaptation. The proposed algorithm is proven to be able to reduce the objective function value iteratively until attaining a partial optimum of the formulated biconvex optimization problem. Experimental results based on various data sets are presented to substantiate the efficacy of the algorithm.
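The general idea of combining a projection onto the nonnegative orthant with a backtracking step size can be sketched as below. This is not the paper's discrete-time projection neural network, whose exact update and step-size bound are not given in the abstract; it is a generic alternating projected-gradient scheme with an Armijo-type test, and all function names and parameter defaults are assumptions chosen for illustration.

```python
import numpy as np

def objective(X, W, H):
    """Frobenius objective 0.5 * ||X - W H||^2."""
    return 0.5 * np.linalg.norm(X - W @ H) ** 2

def projected_step(X, W, H, step=1.0, shrink=0.5, sigma=1e-4, max_backtracks=50):
    """One projected-gradient step on W with backtracking; H is held fixed."""
    grad = (W @ H - X) @ H.T
    f_old = objective(X, W, H)
    for _ in range(max_backtracks):
        # Gradient step followed by projection onto W >= 0.
        W_new = np.maximum(W - step * grad, 0.0)
        # Armijo-type sufficient-decrease test along the projection arc.
        if objective(X, W_new, H) <= f_old + sigma * np.sum(grad * (W_new - W)):
            return W_new
        step *= shrink
    return W  # no acceptable step found; keep the current iterate

def nmf_projected_gradient(X, rank, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    history = [objective(X, W, H)]
    for _ in range(n_iter):
        W = projected_step(X, W, H)
        # The H-subproblem is the W-subproblem of the transposed factorization.
        H = projected_step(X.T, H.T, W.T).T
        history.append(objective(X, W, H))
    return W, H, history
```

Because the problem is convex in `W` for fixed `H` and in `H` for fixed `W` (the biconvex structure the abstract refers to), each accepted step cannot increase the objective, so `history` is non-increasing.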
|
29
|
Predicting and understanding comprehensive drug-drug interactions via semi-nonnegative matrix factorization. BMC SYSTEMS BIOLOGY 2018; 12:14. [PMID: 29671393 PMCID: PMC5907306 DOI: 10.1186/s12918-018-0532-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 11/12/2022]
Abstract
Background Drug-drug interactions (DDIs) can cause unexpected and even adverse drug reactions. It is important to identify DDIs before drugs reach the market. However, preclinical identification of DDIs requires much money and time. Computational approaches have shown that they can predict potential DDIs on a large scale by utilizing pre-market drug properties (e.g. chemical structure). Nevertheless, none of them can predict the two comprehensive types of DDIs, enhancive and degressive, which respectively increase and decrease the behaviors of the interacting drugs. There is also a lack of systematic analysis of the structural relationship among known DDIs. Revealing such a relationship is very important because it helps to explain how DDIs occur. Both the prediction of comprehensive DDIs and the discovery of the structural relationship among them provide important guidance for co-prescription. Results In this work, treating a set of comprehensive DDIs as a signed network, we design a novel model (DDINMF) for the prediction of enhancive and degressive DDIs based on semi-nonnegative matrix factorization. Encouragingly, DDINMF performs both conventional DDI prediction (AUROC = 0.872 and AUPR = 0.605) and comprehensive DDI prediction (AUROC = 0.796 and AUPR = 0.579). Compared with two state-of-the-art approaches, DDINMF shows its superiority. Finally, representing DDIs as a binary network and a signed network respectively, an analysis based on NMF reveals crucial knowledge hidden among DDIs. Conclusions Our approach is able to predict not only conventional binary DDIs but also comprehensive DDIs. 
More importantly, it reveals several key points about the DDI network: (1) both binary and signed networks show fairly clear clusters, in which both drug degree and the difference between positive degree and negative degree show significant distribution; (2) the drugs having large degrees tend to have a larger difference between positive degree and negative degree; (3) though the binary DDI network contains no information about enhancive and degressive DDIs at all, it implies some of their relationship in the comprehensive DDI matrix; (4) the occurrence of signs indicating enhancive and degressive DDIs is not random because the comprehensive DDI network is equipped with a structural balance.
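Semi-nonnegative matrix factorization suits signed data such as a matrix with +1 for enhancive and -1 for degressive interactions, because only one factor is constrained to be nonnegative. The sketch below uses the widely cited alternating scheme of Ding, Li and Jordan (a least-squares step for the unconstrained factor and a multiplicative step for the nonnegative one). It is not the DDINMF model itself, whose details are not given in the abstract; the function names and defaults are assumptions for illustration.

```python
import numpy as np

def _pos(A):
    """Elementwise positive part of A."""
    return (np.abs(A) + A) / 2.0

def _neg(A):
    """Elementwise negative part of A, returned as nonnegative values."""
    return (np.abs(A) - A) / 2.0

def semi_nmf(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Semi-NMF sketch: X ~= F @ G.T with G >= 0 and F unconstrained in sign."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], rank)) + 0.1
    history = []
    for _ in range(n_iter):
        # F-step: unconstrained least squares given G.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # G-step: multiplicative update that keeps G nonnegative.
        XtF = X.T @ F
        FtF = F.T @ F
        numer = _pos(XtF) + G @ _neg(FtF)
        denom = _neg(XtF) + G @ _pos(FtF)
        G = G * np.sqrt(numer / (denom + eps))
        history.append(np.linalg.norm(X - F @ G.T) ** 2)
    return F, G, history
```

For a signed drug-by-drug matrix `X`, the rows of `G` give a nonnegative, cluster-like membership for each drug, while `F` is free to carry the sign information.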
|
30
|
Hierarchical community detection via rank-2 symmetric nonnegative matrix factorization. COMPUTATIONAL SOCIAL NETWORKS 2017; 4:7. [PMID: 29266136 PMCID: PMC5732610 DOI: 10.1186/s40649-017-0043-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/19/2017] [Accepted: 08/24/2017] [Indexed: 11/26/2022]
Abstract
Background Community discovery is an important task for revealing structures in large networks. The massive size of contemporary social networks poses a tremendous challenge to the scalability of traditional graph clustering algorithms and the evaluation of discovered communities. Methods We propose a divide-and-conquer strategy to discover hierarchical community structure, nonoverlapping within each level. Our algorithm is based on the highly efficient rank-2 symmetric nonnegative matrix factorization. We solve several implementation challenges to boost its efficiency on modern computer architectures, specifically for very sparse adjacency matrices that represent a wide range of social networks. Conclusions Empirical results have shown that our algorithm has competitive overall efficiency and leading performance in minimizing the average normalized cut, and that the nonoverlapping communities found by our algorithm recover the ground-truth communities better than state-of-the-art algorithms for overlapping community detection. In addition, we present a new dataset of the DBLP computer science bibliography network with richer meta-data and verifiable ground-truth knowledge, which can foster future research in community finding and interpretation of communities in large networks.
|
31
|
*K-means and cluster models for cancer signatures. BIOMOLECULAR DETECTION AND QUANTIFICATION 2017; 13:7-31. [PMID: 29021969 PMCID: PMC5634820 DOI: 10.1016/j.bdq.2017.07.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/13/2017] [Revised: 07/18/2017] [Accepted: 07/18/2017] [Indexed: 01/03/2023]
Abstract
We present *K-means clustering algorithm and source code by expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means’ computational cost is a fraction of NMF’s. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
|
32
|
Nonnegative matrix factorization and sparse representation for the automated detection of periodic limb movements in sleep. Med Biol Eng Comput 2016; 54:1641-1654. [PMID: 26872678 DOI: 10.1007/s11517-015-1444-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/14/2015] [Accepted: 12/14/2015] [Indexed: 10/22/2022]
Abstract
Stroke is a leading cause of death and disability in adults, and incurs a significant economic burden to society. Periodic limb movements (PLMs) in sleep are repetitive movements involving the great toe, ankle, and hip. Evolving evidence suggests that PLMs may be associated with high blood pressure and stroke, but this relationship remains underexplored. Several issues limit the study of PLMs including the need to manually score them, which is time-consuming and costly. For this reason, we developed a novel automated method for nocturnal PLM detection, which was shown to be correlated with (a) the manually scored PLM index on polysomnography, and (b) white matter hyperintensities on brain imaging, which have been demonstrated to be associated with PLMs. Our proposed algorithm consists of three main stages: (1) representing the signal in the time-frequency plane using time-frequency matrices (TFM), (2) applying K-nonnegative matrix factorization technique to decompose the TFM matrix into its significant components, and (3) applying kernel sparse representation for classification (KSRC) to the decomposed signal. Our approach was applied to a dataset that consisted of 65 subjects who underwent polysomnography. An overall classification accuracy of 97% was achieved for discrimination of the aforementioned signals, demonstrating the potential of the presented method.
|
33
|
On the ambiguity of the reaction rate constants in multivariate curve resolution for reversible first-order reaction systems. Anal Chim Acta 2016; 927:21-34. [PMID: 27237834 DOI: 10.1016/j.aca.2016.04.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/29/2015] [Revised: 02/22/2016] [Accepted: 04/05/2016] [Indexed: 11/28/2022]
Abstract
If for a chemical reaction with a known reaction mechanism the concentration profiles are accessible only for certain species, e.g. only for the main product, then often the reaction rate constants cannot uniquely be determined from the concentration data. This is a well-known fact which includes the so-called slow-fast ambiguity. This work combines the question of unique or non-unique reaction rate constants with factor analytic methods of chemometrics. The idea is to reduce the rotational ambiguity of pure component factorizations by considering only those concentration factors which are possible solutions of the kinetic equations for a properly adapted set of reaction rate constants. The resulting set of reaction rate constants corresponds to those solutions of the rate equations which appear as feasible factors in a pure component factorization. The new analysis of the ambiguity of reaction rate constants extends recent research activities on the Area of Feasible Solutions (AFS). The consistency with a given chemical reaction scheme is shown to be a valuable tool in order to reduce the AFS. The new methods are applied to model and experimental data.
|
34
|
Abstract
An asymmetric one-mode data matrix has rows and columns that correspond to the same set of objects. However, the roles of the objects frequently differ for the rows and the columns. For example, in a visual alphabetic confusion matrix from an experimental psychology study, both the rows and columns pertain to letters of the alphabet. Yet the rows correspond to the presented stimulus letter, whereas the columns refer to the letter provided as the response. Other examples abound in psychology, including applications related to interpersonal interactions (friendship, trust, information sharing) in social and developmental psychology, brand switching in consumer psychology, journal citation analysis in any discipline (including quantitative psychology), and free association tasks in any subarea of psychology. When seeking to establish a partition of the objects in such applications, it is overly restrictive to require the partitions of the row and column objects to be identical, or even the numbers of clusters for the row and column objects to be the same. This suggests the need for a biclustering approach that simultaneously establishes separate partitions of the row and column objects. We present and compare several approaches for the biclustering of one-mode matrices using data sets from the empirical literature. A suite of MATLAB m-files for implementing the procedures is provided as a Web supplement with this article.
|
35
|
Motor imagery classification via combinatory decomposition of ERP and ERSP using sparse nonnegative matrix factorization. J Neurosci Methods 2015; 249:41-9. [PMID: 25845481 DOI: 10.1016/j.jneumeth.2015.03.031] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 10/21/2014] [Revised: 03/26/2015] [Accepted: 03/27/2015] [Indexed: 11/28/2022]
Abstract
BACKGROUND Brain activities can be measured by devices such as EEG, MEG, and MRI in terms of electric or magnetic signals, which provide information from three domains, i.e., time, frequency and space. Combinatory analysis of these features could help to improve the classification performance on brain activities. NMF (nonnegative matrix factorization) has been widely applied in pattern extraction tasks (e.g., face recognition, gene data analysis), where it can provide a physically meaningful explanation of the data. However, brain signals also take negative values, so only spectral features have been employed in existing NMF studies for brain-computer interfaces. In addition, sparsity is an intrinsic characteristic of electric signals. NEW METHOD To incorporate a sparsity constraint and enable analysis of time-domain features using NMF, a new solution for motor imagery classification is developed, which combinatorially analyzes the ERP (event-related potential, time domain) and ERSP (event-related spectral perturbation, frequency domain) features via a modified mixed alternating least squares based NMF method (MALS-NMF for short). RESULTS Extensive experiments have verified the effectiveness of the proposed method. The results also showed that imposing a sparsity constraint on the coefficient matrix in ERP factorization and on the basis matrix in ERSP factorization could further improve the algorithm's performance. COMPARISON WITH EXISTING METHODS Comparisons with eight other representative methods have further verified the superiority of the proposed method. CONCLUSIONS The MALS-NMF method is an effective solution for motor imagery classification and sheds new light on the field of brain dynamics pattern analysis.
|
36
|
Convex nonnegative matrix factorization with manifold regularization. Neural Netw 2014; 63:94-103. [PMID: 25523040 DOI: 10.1016/j.neunet.2014.11.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/09/2013] [Revised: 11/04/2014] [Accepted: 11/18/2014] [Indexed: 11/18/2022]
Abstract
Nonnegative Matrix Factorization (NMF) has been extensively applied in many areas, including computer vision, pattern recognition, text mining, and signal processing. However, nonnegative entries are usually required for the data matrix in NMF, which limits its application. Besides, while the basis and encoding vectors obtained by NMF can represent the original data in low dimension, the representations do not always reflect the intrinsic geometric structure embedded in the data. Motivated by manifold learning and Convex NMF (CNMF), we propose a novel matrix factorization method called Graph Regularized and Convex Nonnegative Matrix Factorization (GCNMF) by introducing a graph regularized term into CNMF. The proposed matrix factorization technique not only inherits the intrinsic low-dimensional manifold structure, but also allows the processing of mixed-sign data matrix. Clustering experiments on nonnegative and mixed-sign real-world data sets are conducted to demonstrate the effectiveness of the proposed method.
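One plausible way to write the kind of objective this abstract describes is sketched below. It combines the standard Convex NMF form, in which the basis is constrained to nonnegative combinations of the data columns, with a graph Laplacian penalty of the type used in graph-regularized NMF. The symbols are assumptions for illustration and are not taken from the abstract: X is the (possibly mixed-sign) data matrix with one column per sample, W and G are nonnegative factors, S is a sample affinity matrix, D is its diagonal degree matrix, and \lambda weights the regularizer. The paper's exact formulation may differ.

```latex
\min_{W \ge 0,\; G \ge 0} \;
  \lVert X - X W G^{\top} \rVert_F^{2}
  \;+\; \lambda \, \operatorname{Tr}\!\left( G^{\top} L \, G \right),
\qquad L = D - S
```

Because only W and G are constrained to be nonnegative, X itself may contain negative entries, which is what permits mixed-sign data in this family of models.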
|
37
|
A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing. Mach Learn 2014; 99:137-163. [PMID: 25821345 DOI: 10.1007/s10994-014-5470-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ WH. It has been shown to yield a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Rényi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
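As a concrete reference point for the multiplicative updates mentioned in this abstract, the following sketch implements the classical updates for the generalized Kullback-Leibler divergence, the objective associated with a Poisson likelihood. It is not the generalized Rényi-divergence algorithm proposed in the paper. The function names, the toy matrix, the random initialization, and the small constant `eps` are all illustrative assumptions.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kl_divergence(v, wh, eps=1e-12):
    """Generalized KL divergence: sum of V*log(V/WH) - V + WH."""
    total = 0.0
    for row_v, row_wh in zip(v, wh):
        for x, y in zip(row_v, row_wh):
            if x > 0:
                total += x * math.log(x / (y + eps))
            total += y - x
    return total


def nmf_kl(v, rank, iters=200, seed=0, eps=1e-12):
    """Factor V (n x m) as W (n x rank) times H (rank x m), all nonnegative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(iters):
        wh = matmul(w, h)
        history.append(kl_divergence(v, wh))
        # Update H with W fixed; every entry uses the same current WH.
        for k in range(rank):
            col_sum = sum(w[i][k] for i in range(n)) + eps
            for j in range(m):
                num = sum(w[i][k] * v[i][j] / (wh[i][j] + eps)
                          for i in range(n))
                h[k][j] *= num / col_sum
        # Update W with the new H fixed.
        wh = matmul(w, h)
        for k in range(rank):
            row_sum = sum(h[k][j] for j in range(m)) + eps
            for i in range(n):
                num = sum(h[k][j] * v[i][j] / (wh[i][j] + eps)
                          for j in range(m))
                w[i][k] *= num / row_sum
    history.append(kl_divergence(v, matmul(w, h)))
    return w, h, history


# Hypothetical term-by-document count matrix with two evident topics.
V = [[5, 3, 0], [4, 2, 0], [0, 1, 4], [0, 2, 5]]
W, H, history = nmf_kl(V, rank=2)
```

Because each update multiplies an entry by a nonnegative ratio, W and H stay nonnegative without any projection step, and `history` records the divergence before each iteration so its decrease can be inspected.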
|
38
|
CloudNMF: a MapReduce implementation of nonnegative matrix factorization for large-scale biological datasets. Genomics Proteomics & Bioinformatics 2013; 12:48-51. [PMID: 23933456 PMCID: PMC4411332 DOI: 10.1016/j.gpb.2013.06.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 05/23/2013] [Revised: 06/21/2013] [Accepted: 06/26/2013] [Indexed: 12/03/2022]
Abstract
In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data as well as to interpret them, and has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrated that CloudNMF is scalable and can be used to deal with huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at http://admis.fudan.edu.cn/projects/CloudNMF.html.
|
39
|
Improved Convolutive and Under-Determined Blind Audio Source Separation with MRF Smoothing. Cognit Comput 2012; 5:493-503. [PMID: 24348879 PMCID: PMC3855489 DOI: 10.1007/s12559-012-9185-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/22/2012] [Accepted: 08/17/2012] [Indexed: 11/26/2022]
Abstract
Convolutive and under-determined blind audio source separation from noisy recordings is a challenging problem. Several computational strategies have been proposed to address this problem. This study is concerned with several modifications to the expectation-maximization-based algorithm, which iteratively estimates the mixing and source parameters. This strategy assumes that any entry in each source spectrogram is modeled using superimposed Gaussian components, which are mutually and individually independent across frequency and time bins. In our approach, we relax this independence assumption by considering a locally smooth temporal and frequency structure in the power source spectrograms. Local smoothness is enforced by incorporating a Gibbs prior in the complete data likelihood function, which models the interactions between neighboring spectrogram bins using a Markov random field. Simulations using audio files derived from the stereo audio source separation evaluation campaign 2008 demonstrate high efficiency with the proposed improvement.
|