1
|
Data-driven decomposition and staging of flortaucipir uptake in Alzheimer's disease. Alzheimers Dement 2024. [PMID: 38683905 DOI: 10.1002/alz.13769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 05/02/2024]
Abstract
INTRODUCTION Previous approaches pursuing in vivo staging of tau pathology in Alzheimer's disease (AD) have typically relied on neuropathologically defined criteria. In using predefined systems, these studies may miss spatial deposition patterns which are informative of disease progression. METHODS We selected discovery (n = 418) and replication (n = 132) cohorts with flortaucipir imaging. Non-negative matrix factorization (NMF) was applied to learn tau covariance patterns and develop a tau staging system. Flortaucipir components were also validated by comparison with amyloid burden, gray matter loss, and the expression of AD-related genes. RESULTS We found eight flortaucipir covariance patterns which were reproducible and overlapped with relevant gene expression maps. Tau stages were associated with AD severity as indexed by dementia status and neuropsychological performance. Comparisons of flortaucipir uptake with amyloid and atrophy also supported our model of tau progression. DISCUSSION Data-driven decomposition of flortaucipir uptake provides a novel framework for tau staging which complements existing systems. HIGHLIGHTS NMF reveals patterns of tau deposition in AD. Data-driven staging of flortaucipir tracks AD severity. Learned flortaucipir patterns overlap with AD-related gene expression.
|
2
|
Analysis and modelling of profiles to understand fractionation processes for contaminations with polychlorinated biphenyls observed in fish. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 920:170925. [PMID: 38360309 DOI: 10.1016/j.scitotenv.2024.170925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/24/2024] [Accepted: 02/10/2024] [Indexed: 02/17/2024]
Abstract
Polychlorinated biphenyls (PCB) continue both to spread into the environment and to bioaccumulate, originating from primary urban and industrial sources as well as from secondary sources such as soils and the oceans. Fractions of congeners in PCB mixtures, i.e. PCB profiles, can be used as fingerprints to trace contamination pathways from sources to sinks because PCB mixtures fractionate during transport due to congener-specific phase changes and degradation. Using a statistical analysis of a total of 8584 PCB profiles with seven congeners (CB28, CB52, CB101, CB118, CB138, CB153, CB180) for contaminated fish from two international datasets, together with a modelling of profiles, two major fractionation processes related to distinct contamination pathways were identified: (1) A relative enrichment of lighter congeners (CB28, CB52, CB101) in seawater fish due to predominantly atmospheric transport, whereas freshwater and some coastal fish had higher fractions of heavier congeners (CB138, CB153) because they were mainly contaminated by particle-sorbed PCB from surface runoff. (2) A temperature-driven fractionation that tended to affect congeners with a medium molecular weight (CB118) as well as the heaviest congeners (CB180), a process which was conceptually associated with transport of PCB from secondary sources. Specifically, medium-chlorinated PCB are sufficiently volatile and persistent for preferential transport into cooler waters. In warmer climates, only the highest-chlorinated congeners are persistent enough to ultimately accumulate in fish. Our analysis and modelling provide a starting point for the development of systems that trace the sources of PCB contamination observed in fish better than before.
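As a minimal sketch of what a PCB profile is in the sense used above, the following normalizes seven congener concentrations to fractions of their sum. The sample concentrations and the function names `pcb_profile` and `light_fraction` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
# The seven congeners named in the abstract.
CONGENERS = ["CB28", "CB52", "CB101", "CB118", "CB138", "CB153", "CB180"]


def pcb_profile(concentrations):
    """Return each congener's fraction of the summed concentration."""
    total = sum(concentrations[c] for c in CONGENERS)
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return {c: concentrations[c] / total for c in CONGENERS}


def light_fraction(profile):
    """Summed fraction of the lighter congeners (CB28, CB52, CB101)."""
    return profile["CB28"] + profile["CB52"] + profile["CB101"]


# Hypothetical concentrations for one fish sample (illustrative values only).
sample = {"CB28": 1.0, "CB52": 2.0, "CB101": 3.0, "CB118": 4.0,
          "CB138": 8.0, "CB153": 10.0, "CB180": 4.0}

profile = pcb_profile(sample)
print(profile)
print("light fraction:", light_fraction(profile))
```

Because a profile is a set of fractions that sum to one, two samples with very different total contamination can still be compared on their congener pattern alone.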
|
3
|
Inferring Drug Set and Identifying the Mechanism of Drugs for PC3. Int J Mol Sci 2024; 25:765. [PMID: 38255837 PMCID: PMC10815650 DOI: 10.3390/ijms25020765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 12/24/2023] [Accepted: 01/05/2024] [Indexed: 01/24/2024] Open
Abstract
Drug repurposing is a strategy for discovering new applications of existing drugs for use in various diseases. Despite the use of structured networks in drug research, it is still unclear how drugs interact with one another or with genes. Prostate adenocarcinoma is the second leading cause of cancer mortality in the United States, with an estimated incidence of 288,300 new cases and 34,700 deaths in 2023. In our study, we used integrative information from genes, pathways, and drugs for machine learning methods such as clustering, feature selection, and enrichment pathway analysis. We investigated how drugs affect one another and how drugs affect genes in the human prostate cancer cell line PC3, which was derived from bone metastases of grade IV prostate cancer. Finally, we identified significant drug interactions within or between clusters, such as estradiol-rosiglitazone, estradiol-diclofenac, troglitazone-rosiglitazone, celecoxib-rofecoxib, celecoxib-diclofenac, and sodium phenylbutyrate-valproic acid.
|
4
|
Surface-Enhanced Raman Spectroscopy-Based Detection of Micro-RNA Biomarkers for Biomedical Diagnosis Using a Comparative Study of Interpretable Machine Learning Algorithms. APPLIED SPECTROSCOPY 2024; 78:84-98. [PMID: 37908079 DOI: 10.1177/00037028231209053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Surface-enhanced Raman spectroscopy (SERS) has wide diagnostic applications due to narrow spectral features that allow multiplex analysis. We have previously developed a multiplexed, SERS-based nanosensor for micro-RNA (miRNA) detection called the inverse molecular sentinel (iMS). Machine learning (ML) algorithms have been increasingly adopted for spectral analysis due to their ability to discover underlying patterns and relationships within large and complex data sets. However, the high dimensionality of SERS data poses a challenge for traditional ML techniques, which can be prone to overfitting and poor generalization. Non-negative matrix factorization (NMF) reduces the dimensionality of SERS data while preserving information content. In this paper, we compared the performance of ML methods, including convolutional neural network (CNN), support vector regression, and extreme gradient boosting, each with and without NMF, for spectral unmixing of four-way multiplexed SERS spectra from iMS assays used for miRNA detection. CNN achieved high accuracy in spectral unmixing. Incorporating NMF before CNN drastically decreased memory and training demands without sacrificing model performance on SERS spectral unmixing. Additionally, models were interpreted using gradient class activation maps and partial dependency plots to understand predictions. These models were used to analyze clinical SERS data from single-plexed iMS in RNA extracted from 17 endoscopic tissue biopsies. CNN and CNN-NMF, trained on multiplexed data, performed most accurately, with RMSE_label = 0.101 and 9.68 × 10⁻², respectively. We demonstrated that CNN-based ML shows great promise in spectral unmixing of multiplexed SERS spectra, and we characterized the effect of dimensionality reduction on performance and training speed.
|
5
|
NMF Clustering: Accessible NMF-based Clustering Utilizing GPU Acceleration. JOURNAL OF BIOINFORMATICS AND SYSTEMS BIOLOGY : OPEN ACCESS 2023; 6:379-383. [PMID: 38390437 PMCID: PMC10883375 DOI: 10.26502/jbsb.5107072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024]
Abstract
Non-negative Matrix Factorization (NMF) is an algorithm that can reduce high-dimensional datasets of tens of thousands of genes to a handful of metagenes, which are biologically easier to interpret. Application of NMF to gene expression data has been limited by its computationally intensive nature, which hinders its use on large datasets such as single-cell RNA sequencing (scRNA-seq) count matrices. We have implemented NMF-based clustering to run on high-performance GPU compute nodes using CuPy, a GPU-backed Python library, and the Message Passing Interface (MPI). This reduces the computation time by up to three orders of magnitude and makes NMF Clustering analysis of large RNA-Seq and scRNA-seq datasets practical. We have made the method freely available through the GenePattern gateway, which provides free public access to hundreds of tools for the analysis and visualization of multiple 'omic data types. Its web-based interface gives easy access to these tools and allows the creation of multi-step analysis pipelines on high-performance computing (HPC) clusters that enable reproducible in silico research for non-programmers.
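To make the idea of reducing genes to metagenes concrete, here is a minimal CPU-only sketch of NMF using the classic multiplicative update rules for the Frobenius norm. It is not the GPU implementation described above (no CuPy or MPI is used), the toy matrix is invented, and assigning each sample to the metagene with the largest coefficient is an assumed clustering convention rather than something stated in the abstract.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for rv, rwh in zip(V, WH)
               for v, wh in zip(rv, rwh)) ** 0.5


def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (genes x samples) into W (genes x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors


# Invented toy expression matrix: 5 genes (rows) x 4 samples (columns).
V = [[5, 5, 0, 0],
     [4, 4, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 6, 6],
     [1, 1, 1, 1]]

W, H, errors = nmf(V, k=2)
# Assumed convention: each sample joins the metagene with the largest weight.
labels = [max(range(2), key=lambda r: H[r][j]) for j in range(len(V[0]))]
print("error before/after:", errors[0], errors[-1])
print("sample labels:", labels)
```

Each iteration is dominated by dense matrix products, which is the kind of workload that benefits from GPU acceleration on large count matrices.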
|
6
|
Four functional profiles for fibre and mucin metabolism in the human gut microbiome. MICROBIOME 2023; 11:231. [PMID: 37858269 PMCID: PMC10588041 DOI: 10.1186/s40168-023-01667-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 09/07/2023] [Indexed: 10/21/2023]
Abstract
BACKGROUND With the emergence of metagenomic data, multiple links between the gut microbiome and host health have been shown. Deciphering these complex interactions requires advanced analysis methods focusing on the functions of the microbial ecosystem. Although host- or diet-derived fibres are the most abundant nutrients available in the gut, the presence of distinct functional traits regarding fibre and mucin hydrolysis, fermentation and hydrogenotrophic processes has never been investigated. RESULTS After manually selecting 91 KEGG orthologies and 33 glycoside hydrolases, further aggregated into 101 functional descriptors representative of fibre and mucin degradation pathways in the gut microbiome, we used nonnegative matrix factorization to mine metagenomic datasets. Four distinct metabolic profiles were identified on a training set of 1153 samples, thoroughly validated on a large database of 2571 unseen samples from 5 external metagenomic cohorts and confirmed with metatranscriptomic data. Profiles 1 and 2 are the main contributors to the fibre-degradation-related metagenome: they show contrasting involvement in fibre degradation and sugar metabolism and are differentially linked to dysbiosis, metabolic disease and inflammation. Profile 1 takes over profile 2 in healthy samples, and an imbalance of these profiles characterizes dysbiotic samples. Furthermore, a high-fibre diet favours a healthy balance between profiles 1 and 2. Profile 3 takes over profile 2 during Crohn's disease, inducing functional reorientations towards unusual metabolism such as fucose and H2S degradation or propionate, acetone and butanediol production. Profile 4 gathers under-represented functions, like methanogenesis. Two taxonomic make-ups of the profiles were investigated, using the covariation of either 203 prevalent genomes or metagenomic species, both providing consistent results in line with their functional characteristics. This taxonomic characterization showed that profiles 1 and 2 were mainly composed of bacteria from the phyla Bacteroidetes and Firmicutes, respectively, while profile 3 is representative of Proteobacteria and profile 4 of methanogens. CONCLUSIONS Integrating anaerobic microbiology knowledge with statistical learning can narrow down the metagenomic analysis to investigate functional profiles. Applying this approach to fibre degradation in the gut yielded 4 distinct functional profiles that can be easily monitored as markers of diet, dysbiosis, inflammation and disease.
|
7
|
Restoring natural upper limb movement through a wrist prosthetic module for partial hand amputees. J Neuroeng Rehabil 2023; 20:135. [PMID: 37798778 PMCID: PMC10552222 DOI: 10.1186/s12984-023-01259-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 09/21/2023] [Indexed: 10/07/2023] Open
Abstract
BACKGROUND Most partial hand amputees experience limited wrist movement. Limited rotational wrist movement degrades the natural upper limb movement associated with hand use and the usability of the prosthetic hand, and it may cause secondary damage to the musculoskeletal system through overuse of the affected upper limb in repetitive compensatory movement patterns. Nevertheless, partial hand prostheses have generally been proposed without rotational wrist movement, because patients have various hand shapes and the prosthetic hand must be attached within a narrow space. METHODS We hypothesized that partial hand amputees using a prosthetic hand with a wrist rotation module would show muscle synergies and motion patterns of natural upper limb movement comparable to those of a control group. To validate the proposed prototype design with the wrist rotation module and verify our hypothesis, we compared a control group with partial hand amputees wearing hand prostheses, both with and without the wrist rotation module prototype. The study comprised muscle synergy analysis through non-negative matrix factorization (NMF) of surface electromyography (sEMG) signals and motion analysis using a motion capture system during a reach-to-grasp task. Additionally, we assessed the usability of the prototype design for partial hand amputees using the Jebsen-Taylor hand function test (JHFT). RESULTS The number of muscle synergies identified through NMF was 3 both for the control group and for amputees using a hand prosthesis with a wrist rotation module. In the motion analysis, a statistically significant difference was observed between the control group and the prosthetic hand without the wrist rotation module, indicating the presence of compensatory movements when using a prosthetic hand lacking this module. Furthermore, among the amputees, the JHFT showed a greater improvement in total score with the prosthetic hand equipped with a wrist rotation module than with the prosthetic hand without it. CONCLUSION Integrating a wrist rotation module into prosthetic hand designs for partial hand amputees restores natural upper limb movement patterns, reduces compensatory movements, and prevents secondary musculoskeletal damage. This highlights the importance of this module in enhancing overall functionality and quality of life.
|
8
|
A transcriptome study of p53-pathway related prognostic gene signature set in bladder cancer. Heliyon 2023; 9:e21058. [PMID: 37876438 PMCID: PMC10590981 DOI: 10.1016/j.heliyon.2023.e21058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 10/12/2023] [Accepted: 10/13/2023] [Indexed: 10/26/2023] Open
Abstract
The p53 pathway is important in tumorigenesis. However, no study has specifically investigated the role of p53 pathway genes in bladder cancer (BLCA). In this study, transcriptomics data of muscle-invasive bladder cancer patients (n = 411) from The Cancer Genome Atlas (TCGA) were investigated. Using the hallmark p53 pathway gene set, non-negative matrix factorization (NMF) analysis identified two subtypes (C1 and C2). Clinical, survival, and immunological analyses were performed to validate the distinct characteristics of the subtypes. Pathway enrichment analysis showed that subtype C1, with poor prognosis, was enriched in genes of immunity-related pathways, whereas subtype C2, with better prognosis, was enriched in genes of the steroid synthesis and drug metabolism pathways. A signature gene set consisting of MDGA2, GNLY, GGT2, UGT2B4, DLX1, and DSC1 was created, followed by a risk model. Their expression was analyzed in RNA extracted from the blood and matched tumor tissues of BLCA patients (n = 10). DSC1 expression differed significantly (p = 0.005) between the blood and tumor tissues in our BLCA samples. Contrary to the usual normal bladder tissue to blood ratio, DLX1 expression was lower (p = 0.02734) in tumor tissues than in blood. As the first study of a p53 pathway-related signature gene set in bladder cancer, this work potentially has a substantial impact on the development of biomarkers for BLCA.
|
9
|
Music emotion representation based on non-negative matrix factorization algorithm and user label information. PeerJ Comput Sci 2023; 9:e1590. [PMID: 37810354 PMCID: PMC10557512 DOI: 10.7717/peerj-cs.1590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 08/24/2023] [Indexed: 10/10/2023]
Abstract
Music emotion representation learning forms the foundation of user emotion recognition, addressing the challenges posed by the vast volume of digital music data and the scarcity of emotion annotation data. This article introduces a novel music emotion representation model, leveraging the nonnegative matrix factorization algorithm (NMF) to derive emotional embeddings of music by utilizing user-generated listening lists and emotional labels. This approach facilitates emotion recognition by positioning music within the emotional space. Furthermore, a dedicated music emotion recognition algorithm is formulated, alongside the proposal of a user emotion recognition model, which employs similarity-weighted calculations to obtain user emotion representations. Experimental findings demonstrate the method's convergence after a mere 400 iterations, yielding a remarkable 47.62% increase in F1 value across all emotion classes. In practical testing scenarios, the comprehensive accuracy rate of user emotion recognition attains an impressive 52.7%, effectively discerning emotions within seven emotion categories and accurately identifying users' emotional states.
|
10
|
Block-Active ADMM to Minimize NMF with Bregman Divergences. SENSORS (BASEL, SWITZERLAND) 2023; 23:7229. [PMID: 37631765 PMCID: PMC10459034 DOI: 10.3390/s23167229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 08/10/2023] [Accepted: 08/16/2023] [Indexed: 08/27/2023]
Abstract
Over the last ten years, there has been significant interest in employing nonnegative matrix factorization (NMF) to reduce dimensionality and thereby enable more efficient clustering analysis in machine learning. This technique has been applied in various image processing applications within the fields of computer vision and sensor-based systems. Many algorithms exist to solve the NMF problem. Among these, the alternating direction method of multipliers (ADMM) and its variants are among the most popular methods used in practice. In this paper, we propose a block-active ADMM method to solve the NMF problem with general Bregman divergences. The subproblems in the ADMM are solved iteratively by a block-coordinate-descent-type (BCD-type) method. In particular, each block is chosen directly based on the stationary condition. As a result, we are able to use far fewer auxiliary variables, and the proposed algorithm converges faster than previously proposed algorithms. From the theoretical point of view, the proposed algorithm is proved to converge to a stationary point sublinearly. We also conduct a series of numerical experiments to demonstrate the superiority of the proposed algorithm.
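For orientation, the problem class named above can be written out as follows. This is the textbook definition of a Bregman divergence and a generic NMF objective built on it; the symbols X, W, H and φ, and the argument order of the divergence, are assumptions made here and may differ from the paper's own notation.

```latex
% Bregman divergence generated by a strictly convex, differentiable function \phi
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle

% NMF with a Bregman divergence as the fitting term
\min_{W \ge 0,\; H \ge 0} \; D_\phi(X,\, WH)

% Two familiar special cases
\phi(x) = \tfrac{1}{2}\|x\|_2^2
  \;\Rightarrow\; D_\phi(x, y) = \tfrac{1}{2}\|x - y\|_2^2
\phi(x) = \sum_i \left( x_i \log x_i - x_i \right)
  \;\Rightarrow\; D_\phi(x, y) = \sum_i \left( x_i \log \frac{x_i}{y_i} - x_i + y_i \right)
```

The first special case recovers the usual least-squares NMF objective, and the second is the generalized Kullback-Leibler divergence often used for count data.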
|
11
|
A new gene-scoring method for uncovering novel glaucoma-related genes using non-negative matrix factorization based on RNA-seq data. Front Genet 2023; 14:1204909. [PMID: 37377596 PMCID: PMC10292752 DOI: 10.3389/fgene.2023.1204909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/30/2023] [Indexed: 06/29/2023] Open
Abstract
Early diagnosis and treatment of glaucoma are challenging. The discovery of glaucoma biomarkers based on gene expression data could potentially provide new insights for early diagnosis, monitoring, and treatment options of glaucoma. Non-negative Matrix Factorization (NMF) has been widely used in numerous transcriptome data analyses in order to identify subtypes and biomarkers of different diseases; however, its application in glaucoma biomarker discovery has not been previously reported. Our study applied NMF to extract latent representations of RNA-seq data from BXD mouse strains and sorted the genes based on a novel gene scoring method. The enrichment ratio of the glaucoma-reference genes, extracted from multiple relevant resources, was compared using both the classical differentially expressed gene (DEG) analysis and NMF methods. The complete pipeline was validated using an independent RNA-seq dataset. Findings showed our NMF method significantly improved the enrichment detection of glaucoma genes. The application of NMF with the scoring method showed great promise in the identification of marker genes for glaucoma.
|
12
|
Scalable Orthonormal Projective NMF via Diversified Stochastic Optimization. INFORMATION PROCESSING IN MEDICAL IMAGING : PROCEEDINGS OF THE ... CONFERENCE 2023; 13939:497-508. [PMID: 37969113 PMCID: PMC10642358 DOI: 10.1007/978-3-031-34048-2_38] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2023]
Abstract
The increasing availability of large-scale neuroimaging initiatives opens exciting opportunities for discovery science of human brain structure and function. Data-driven techniques, such as Orthonormal Projective Non-negative Matrix Factorization (opNMF), are well positioned to explore multivariate relationships in big data towards uncovering brain organization. opNMF enjoys advantageous interpretability and reproducibility compared to commonly used matrix factorization methods like Principal Component Analysis (PCA) and Independent Component Analysis (ICA), which has led to its wide adoption in clinical computational neuroscience. However, applying opNMF in large-scale cohort studies is hindered by its limited scalability, a consequence of its computational complexity. In this work, we address the computational challenges of opNMF using a stochastic optimization approach that learns over mini-batches of the data. Additionally, we diversify the stochastic batches via repulsive point processes, which reduce redundancy in the mini-batches and in turn lead to lower variance in the updates. We validated our framework on gray matter tissue density maps estimated from 1000 subjects from the Open Access Series of Imaging Studies (OASIS) dataset. We demonstrated that operations over mini-batches of data yield a significant reduction in computational cost. Importantly, we showed that our novel optimization does not compromise the accuracy or interpretability of factors when compared to standard opNMF. The proposed model enables new investigations of brain structure using big neuroimaging data that could improve our understanding of brain structure in health and disease.
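As a reading aid, the opNMF problem is commonly stated as below. This is the formulation usually given for orthonormal projective NMF, written with assumed symbols (X for the non-negative data matrix with one column per subject, W for the component matrix, B for a mini-batch of column indices); the abstract itself does not give the objective, and the paper's stochastic variant may differ in detail.

```latex
% Standard opNMF: the data are projected onto non-negative, orthonormal components
\min_{W} \; \left\| X - W W^{\top} X \right\|_F^2
\quad \text{subject to} \quad W \ge 0, \;\; W^{\top} W = I

% Mini-batch surrogate: the same loss on a subset of columns (subjects) B
\min_{W} \; \left\| X_{B} - W W^{\top} X_{B} \right\|_F^2
\quad \text{subject to} \quad W \ge 0, \;\; W^{\top} W = I
```

Evaluating the loss on a subset of subjects replaces products involving all columns of X with products involving only the batch, which is where the reduction in computational cost would come from.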
|
13
|
Reconstruction of Raman Spectra of Biochemical Mixtures Using Group and Basis Restricted Non-Negative Matrix Factorization. APPLIED SPECTROSCOPY 2023:37028231169971. [PMID: 37097829 DOI: 10.1177/00037028231169971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Raman spectroscopy is a useful tool for obtaining biochemical information from biological samples. However, interpretation of Raman spectroscopy data in order to draw meaningful conclusions related to the biochemical make up of cells and tissues is often difficult and could be misleading if care is not taken in the deconstruction of the spectral data. Our group has previously demonstrated the implementation of a group- and basis-restricted non-negative matrix factorization (GBR-NMF) framework as an alternative to more widely used dimensionality reduction techniques such as principal component analysis (PCA) for the deconstruction of Raman spectroscopy data as related to radiation response monitoring in both cellular and tissue data. While this method provides better biological interpretability of the Raman spectroscopy data, there are some important factors which must be considered in order to provide the most robust GBR-NMF model. We here evaluate and compare the accuracy of a GBR-NMF model in the reconstruction of three mixture solutions of known concentrations. The factors assessed include the effect of solid versus solutions bases spectra, the number of unconstrained components used in the model, the tolerance of different signal to noise thresholds, and how different groups of biochemicals compare to each other. The robustness of the model was assessed by how well the relative concentration of each individual biochemical in the solution mixture is reflected in the GBR-NMF scores obtained. We also evaluated how well the model can reconstruct original data, both with and without the inclusion of an unconstrained component. Overall, we found that solid bases spectra were generally comparable to solution bases spectra in the GBR-NMF model for all groups of biochemicals. The model was found to be relatively tolerant of high levels of noise in the mixture solutions using solid bases spectra. 
Additionally, the inclusion of an unconstrained component did not have a significant effect on the deconstruction, on the condition that all biochemicals in the mixture were included as bases chemicals in the model. We also report that some groups of biochemicals achieve a more accurate deconstruction using GBR-NMF than others, likely due to similarity in the individual bases spectra.
|
14
|
Study subnetwork developing pattern of autism children by non-negative matrix factorization. Comput Biol Med 2023; 158:106816. [PMID: 37003070 DOI: 10.1016/j.compbiomed.2023.106816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Revised: 03/08/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023]
Abstract
BACKGROUND Because autism is a developmental disorder, the brain networks of autistic children show abnormal patterns compared with those of typically developing children. The differences between the groups are not stable because the children are still developing. One option is therefore to study the differences in developmental trajectories between autistic and typically developing children by investigating the changes within each group separately. Related studies have examined brain network development by analyzing the relationship between network indices of the entire brain network or its subnetworks and cognitive development scores. METHODS Non-negative matrix factorization (NMF), a matrix decomposition algorithm, was applied to decompose the association matrices of brain networks, which yields subnetworks in an unsupervised way. The association matrices of autistic and control children were estimated from their magnetoencephalography data. NMF was applied to these matrices to obtain subnetworks common to both groups. We then calculated the expression of each subnetwork in each child's brain network using two indices, energy and entropy, and investigated the relationship between this expression and the cognitive and development indices. RESULTS We found that a subnetwork with a left-lateralized pattern in the α band showed different expression tendencies in the two groups. The expression indices were correlated with cognitive indices in opposite directions in the autism and control groups. In the γ band, a subnetwork with strong connections in the right hemisphere showed a negative correlation between the expression indices and development indices in the autism group. CONCLUSION The NMF algorithm can effectively decompose brain networks into meaningful subnetworks. The α-band subnetwork findings confirm the abnormal lateralization in autistic children reported in related studies. We assume that the decreased expression of this subnetwork may relate to mirror neuron dysfunction. The decreased expression of the γ subnetwork in autism may be related to the weakening of high-frequency neurons in the neurotrophic competition.
|
15
|
High-throughput genetic clustering of type 2 diabetes loci reveals heterogeneous mechanistic pathways of metabolic disease. Diabetologia 2023; 66:495-507. [PMID: 36538063 PMCID: PMC10108373 DOI: 10.1007/s00125-022-05848-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 08/05/2022] [Accepted: 10/28/2022] [Indexed: 12/24/2022]
Abstract
AIMS/HYPOTHESIS Type 2 diabetes is highly polygenic and influenced by multiple biological pathways. Rapid expansion in the number of type 2 diabetes loci can be leveraged to identify such pathways. METHODS We developed a high-throughput pipeline to enable clustering of type 2 diabetes loci based on variant-trait associations. Our pipeline extracted summary statistics from genome-wide association studies (GWAS) for type 2 diabetes and related traits to generate a matrix of 323 variants × 64 trait associations and applied Bayesian non-negative matrix factorisation (bNMF) to identify genetic components of type 2 diabetes. Epigenomic enrichment analysis was performed in 28 cell types and single pancreatic cells. We generated cluster-specific polygenic scores and performed regression analysis in an independent cohort (N=25,419) to assess for clinical relevance. RESULTS We identified ten clusters of genetic loci, recapturing the five from our prior analysis as well as novel clusters related to beta cell dysfunction, pronounced insulin secretion, and levels of alkaline phosphatase, lipoprotein A and sex hormone-binding globulin. Four clusters related to mechanisms of insulin deficiency, five to insulin resistance and one had an unclear mechanism. The clusters displayed tissue-specific epigenomic enrichment, notably with the two beta cell clusters differentially enriched in functional and stressed pancreatic beta cell states. Additionally, cluster-specific polygenic scores were differentially associated with patient clinical characteristics and outcomes. The pipeline was applied to coronary artery disease and chronic kidney disease, identifying multiple overlapping clusters with type 2 diabetes. CONCLUSIONS/INTERPRETATION Our approach stratifies type 2 diabetes loci into physiologically interpretable genetic clusters associated with distinct tissues and clinical outcomes. 
The pipeline allows for efficient updating as additional GWAS become available and can be readily applied to other conditions, facilitating clinical translation of GWAS findings. Software to perform this clustering pipeline is freely available.
|
16
|
Systematic single-cell dissecting reveals heterogeneous oncofetal reprogramming in the tumor microenvironment of gastric cancer. Hum Cell 2023; 36:689-701. [PMID: 36662371 DOI: 10.1007/s13577-023-00856-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Accepted: 01/05/2023] [Indexed: 01/21/2023]
Abstract
Oncofetal reprogramming of the tumor microenvironment is clinically relevant. This study applied the non-negative matrix factorization (NMF) algorithm to single-cell RNA sequencing data of gastric cancer (GC) based on embryonic stem genes. Pseudotime analysis, cell-cell interaction analysis, and SCENIC analysis revealed that cancer-associated fibroblasts (CAFs), tumor-associated endothelial cells (TECs), and tumor-associated macrophages (TAMs) have different oncofetal reprogramming that affects cell function, enhances intercellular communication, and activates multiple transcription factors in these cells. Furthermore, based on the signatures of the newly defined oncofetal cell subtypes and expression profiles of large cohorts in GC patients, we determined that GJA1 + TEC-C2, IFITM1 + CAF-C3, PODXL + TEC-C1, SFRP2 + CAF-C2, and SRSF7 + CAF-C1 are crucial prognostic factors for GC patients and predictors of immune checkpoint blockade in GC. Cell subtypes were validated by immunohistochemical methods. Our novel, profound, and systematic analysis of the oncofetal reprogramming of GC may facilitate the development of improved drugs for treating GC.
|
17
|
Single-cell dissection reveals the role of DNA damage response patterns in tumor microenvironment components contributing to colorectal cancer progression and immunotherapy. Genes Cells 2023; 28:348-363. [PMID: 36811212 DOI: 10.1111/gtc.13017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/17/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023]
Abstract
Colorectal cancer (CRC) is one of the leading malignant cancers. DNA damage response (DDR), referring to the molecular process of DNA damage, is emerging as a promising field in targeted cancer therapy. However, the engagement of DDR in the remodeling of the tumor microenvironment is rarely studied. In this study, by sequential nonnegative matrix factorization (NMF) algorithm, pseudotime analysis, cell-cell interaction analysis, and SCENIC analysis, we have shown that DDR genes demonstrate various patterns among different cell types in CRC TME (tumor microenvironment), especially in epithelial cells, cancer-associated fibroblasts, CD8+ T cells, tumor-associated macrophages, which enhance the intensity of intercellular communication and transcription factor activation. Furthermore, based on the newly identified DDR-related TME signatures, cell subtypes including MNAT+CD8+T_cells-C5, POLR2E+Mac-C10, HMGB2+Epi-C4, HMGB1+Mac-C11, PER1+Mac-C5, PER1+CD8+T_cells-C1, POLR2A+Mac-C1, TDG+Epi-C5, TDG+CD8+T_cells-C8 are determined as critical prognostic factors for CRC patients and predictors of immune checkpoint blockade (ICB) therapy efficacy in two public CRC cohorts, TCGA-COAD and GSE39582. Our novel and systematic analysis on the level of the single-cell analysis has revealed the unique role of DDR in remodeling CRC TME for the first time, facilitating the prediction of prognosis and guidance of personalized ICB regimens in CRC.
|
18
|
MDSR-NMF: Multiple deconstruction single reconstruction deep neural network model for non-negative matrix factorization. NETWORK (BRISTOL, ENGLAND) 2023; 34:306-342. [PMID: 37818635 DOI: 10.1080/0954898x.2023.2257773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 08/31/2023] [Indexed: 10/12/2023]
Abstract
Dimension reduction is one of the most sought-after strategies to cope with high-dimensional ever-expanding datasets. To address this, a novel deep-learning architecture has been designed with multiple deconstruction and single reconstruction layers for non-negative matrix factorization aimed at low-rank approximation. This design ensures that the reconstructed input matrix has a unique pair of factor matrices. The two-stage approach, namely, pretraining and stacking, aids in the robustness of the architecture. The sigmoid function has been adjusted so that it fulfils the non-negativity criteria and also helps to alleviate the data-loss problem. The Xavier initialization technique aids in the solution of the exploding or vanishing gradient problem. The objective function includes a regularizer that ensures the best possible approximation of the input matrix. The superior performance of MDSR-NMF over six well-known dimension reduction methods has been demonstrated extensively using five datasets for classification and clustering. Computational complexity and convergence analysis have also been presented to establish the model.
|
19
|
Imaging genetic association analysis of triple-negative breast cancer based on the integration of prior sample information. Front Genet 2023; 14:1090847. [PMID: 36911413 PMCID: PMC9992804 DOI: 10.3389/fgene.2023.1090847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 02/10/2023] [Indexed: 02/25/2023] Open
Abstract
Triple-negative breast cancer (TNBC) is one of the more aggressive subtypes of breast cancer. The prognosis of TNBC patients remains poor. Therefore, there is still a need to identify novel biomarkers to improve the prognosis and treatment of TNBC patients. Research in recent years has shown that the effective use and integration of information in genomic data and image data will contribute to the prediction and prognosis of diseases. Considering that imaging genetics can deeply study the influence of microscopic genetic variation on disease phenotype, this paper proposes a sample prior information-induced multidimensional combined non-negative matrix factorization (SPID-MDJNMF) algorithm to integrate whole-slide image (WSI), mRNA expression, and miRNA expression data. The algorithm effectively fuses high-dimensional data of three modalities through various constraints. In addition, this paper constructs an undirected graph between samples, uses an adjacency matrix to constrain the similarity, and embeds the clinical stage information of patients in the algorithm so that the algorithm can identify the co-expression patterns of samples with different labels. We performed univariate and multivariate Cox regression analysis on the mRNAs and miRNAs in the screened co-expression modules to construct a TNBC-related prognostic model. Finally, we constructed prognostic models for 2-mRNAs (IL12RB2 and CNIH2) and 2-miRNAs (miR-203a-3p and miR-148b-3p), respectively. The prognostic model can predict the survival time of TNBC patients with high accuracy. In conclusion, our proposed SPID-MDJNMF algorithm can efficiently integrate image and genomic data. Furthermore, we evaluated the prognostic value of mRNAs and miRNAs screened by the SPID-MDJNMF algorithm in TNBC, which may provide promising targets for the prognosis of TNBC patients.
|
20
|
Prognostic Role of Unfolded Protein Response-Related Genes in Hepatocellular Carcinoma. Curr Protein Pept Sci 2023; 24:666-683. [PMID: 37587817 DOI: 10.2174/1389203724666230816090504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 04/25/2023] [Accepted: 05/08/2023] [Indexed: 08/18/2023]
Abstract
AIMS To reveal the prognostic role of unfolded protein response (UPR)-related genes in hepatocellular carcinoma (HCC). BACKGROUND Hepatocellular carcinoma is a genetically heterogeneous tumor, and the prediction of its prognosis remains a challenge. Studies elucidating the molecular mechanisms of UPR have rapidly increased. However, the UPR molecular subtype characteristics of the related genes in HCC progression have yet to be thoroughly studied. OBJECTIVE Conducting a comprehensive assessment of the prognostic signature of genes related to the UPR in patients with HCC can advance our understanding of the cellular processes contributing to the progression of HCC and offer innovative strategies in precise therapy. METHODS Based on the gene expression profiles associated with UPR in HCC, we explored the molecular subtypes mediated by UPR-related genes and constructed a UPR-related gene signature that could precisely predict the prognosis for HCC. RESULTS Using microarray data of HCC patients, differentially expressed UPR-related genes (DEGs) were discovered in malignancies and normal tissues. The HCC was classified into two molecular subtypes by the NMF algorithm based on DEGs modification of the UPR. Moreover, we developed a UPR-related model for predicting HCC patients' prognosis. The robustness of the UPR-related model was confirmed in external validation. Moreover, we analyzed immune responses in different risk groups. Analysis of immune functions revealed that Treg, Macrophages, aDCs, and MHC class-I were significantly up-regulated in high-risk HCC. At the same time, cytolytic activity and type I and II IFN response were higher in a low-risk subgroup. CONCLUSION This study identified two UPR molecular subtypes of HCC and developed a ten-gene HCC prognostic signature model (EXTL3, PPP2R5B, ZBTB17, CCT3, CCT4, CCT5, GRPEL2, HSP90AA1, PDRG1, and STC2), which can robustly forecast the progression of HCC.
|
21
|
Statistical Methods for Integrative Clustering of Multi-omics Data. Methods Mol Biol 2023; 2629:73-93. [PMID: 36929074 PMCID: PMC10950392 DOI: 10.1007/978-1-0716-2986-4_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Cancers are heterogeneous diseases caused by accumulated mutations or abnormal alterations at multi-levels of biological processes including genomics, epigenomics, transcriptomics, and proteomics. There is a great clinical interest in identifying cancer molecular subtypes for disease prognosis and personalized medicine. Integrative clustering is a powerful unsupervised learning method that has been increasingly used to identify cancer molecular subtypes using multi-omics data including somatic mutations, DNA copy numbers, DNA methylation, and gene expression. Integrative clustering methods are generally classified into model-based or nonparametric approaches. In this chapter, we will give an overview of the frequently used model-based methods, including iCluster, iClusterPlus, and iClusterBayes, and the nonparametric method, integrative nonnegative matrix factorization (intNMF). We will use the integrative analyses of uveal melanoma and lower-grade glioma to illustrate these representative methods. Finally, we will discuss the strengths and limitations of these representative methods and give suggestions for performing integrative analyses of cancer multi-omics data in practice.
|
22
|
Towards Automated Classification of Zooplankton Using Combination of Laser Spectral Techniques and Advanced Chemometrics. SENSORS (BASEL, SWITZERLAND) 2022; 22:8234. [PMID: 36365928 PMCID: PMC9657760 DOI: 10.3390/s22218234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 10/17/2022] [Accepted: 10/25/2022] [Indexed: 06/16/2023]
Abstract
Zooplankton identification has been the subject of many studies. They are mainly based on the analysis of photographs (computer vision). However, spectroscopic techniques can be a good alternative due to the valuable additional information that they provide. We tested the performance of several chemometric techniques (principal component analysis (PCA), non-negative matrix factorisation (NMF), and common dimensions and specific weights analysis (CCSWA or ComDim)) for the unsupervised classification of zooplankton species based on their spectra. The spectra were obtained using laser-induced breakdown spectroscopy (LIBS) and Raman spectroscopy. It was convenient to assess the discriminative power in terms of silhouette metrics (Sil). The LIBS data were substantially more useful for the task than the Raman spectra, although the best results were achieved for the combined LIBS + Raman dataset (best Sil = 0.67). Although NMF (Sil = 0.63) and ComDim (Sil = 0.39) gave interesting information in the loadings, PCA was generally enough for the discrimination based on the score graphs. Distinguishing between Calanoida and Euphausiacea crustaceans and Limacina helicina sea snails has proved possible, probably because of their different mineral compositions. Conversely, arrow worms (Parasagitta elegans) usually fell into the same class as Calanoida despite the differences in their Raman spectra.
|
23
|
Application of non-negative matrix factorization in oncology: one approach for establishing precision medicine. Brief Bioinform 2022; 23:6628783. [PMID: 35788277 PMCID: PMC9294421 DOI: 10.1093/bib/bbac246] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/01/2022] [Revised: 05/06/2022] [Accepted: 05/25/2022] [Indexed: 12/19/2022] Open
Abstract
The increase in the expectations of artificial intelligence (AI) technology has led to machine learning technology being actively used in the medical field. Non-negative matrix factorization (NMF) is a machine learning technique used for image analysis, speech recognition, and language processing; recently, it is being applied to medical research. Precision medicine, wherein important information is extracted from large-scale medical data to provide optimal medical care for every individual, is considered important in medical policies globally, and the application of machine learning techniques to this end is being handled in several ways. NMF is also introduced differently because of the characteristics of its algorithms. In this review, the importance of NMF in the field of medicine, with a focus on the field of oncology, is described by explaining the mathematical science of NMF and the characteristics of the algorithm, providing examples of how NMF can be used to establish precision medicine, and presenting the challenges of NMF. Finally, the direction regarding the effective use of NMF in the field of oncology is also discussed.
|
24
|
Biomarkers in asthma in the context of atopic dermatitis in young children. Pediatr Allergy Immunol 2022; 33:e13823. [PMID: 35871461 PMCID: PMC9544684 DOI: 10.1111/pai.13823] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/27/2022] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 12/22/2022]
Abstract
BACKGROUND Diverse pathways stemming from a history of atopic dermatitis (AD) might modulate different biomarkers associated with the development of asthma. Biomarkers associated with AD and asthma separately have been investigated, but none have characterized a combined AD+asthma phenotype. We investigated the clinical and molecular characteristics associated with an AD+asthma phenotype compared with AD, asthma and controls. METHODS From a prospective birth cohort and the outpatient allergy clinic, we included four groups of 6-12-year-old children: (1) healthy controls (2) previous, current, or present AD without asthma, (3) previous, current, or present AD and current asthma and (4) current asthma without AD. We performed clinical examinations and interviews and measured serum IgE, natural moisturizing factors (NMF), and plasma cytokine levels. RESULTS We found an increased number of IgE sensitizations in AD+asthma, prominent after stratifying for food allergens (p < .05). Pro-Th2 cytokines CCL18, TSLP, and Eotaxin-3 were elevated in AD+asthma, though not significantly higher than asthma, and elevated in asthma compared with controls. NMF levels were decreased in AD compared with asthma and control groups (p = .019, p < .001, respectively). NMF levels correlated negatively to sensitization (p = .026), though nonsignificant with only the patient groups. CONCLUSION Our results indicate that Th2 cytokines and increased number of sensitizations are associated with AD + asthma phenotypes compared with AD alone and that skin barrier impairment as well as decreased airway epithelial integrity may play a role in sensitization and immune modulation. Our findings suggest candidate biomarkers that should be further explored for their functional roles and prognostic potential.
|
25
|
Confocal Raman spectroscopy is suitable to assess hair cleansing-derived skin dryness on human scalp. Skin Res Technol 2022; 28:577-581. [PMID: 35638406 PMCID: PMC9907629 DOI: 10.1111/srt.13157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Accepted: 03/09/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND The purpose of this pilot study was to provide information about the washout-dependent depletion of important skin components in the horny layer of the scalp. They were taken as markers for scalp drying effects of cosmetic cleansing products and were measured directly in vivo. METHOD In vivo confocal Raman spectroscopy was used to measure the depletion of the total natural moisturizing factor (total NMF) and some of its components (urea and lactic acid) as well as a fraction of stratum corneum lipids, after repeated washing with a standard shampoo on the human scalp. RESULTS The measurements showed a reduction in the amount of NMF and lipids of the stratum corneum caused by repeated shampooing. CONCLUSION Confocal Raman spectroscopy is an innovative technology that can be used successfully in vivo on the hairy scalp. The loss of the most important skin components caused by hair washing can be quantified directly with this technology. The method is valuable to support the development of cosmetic cleansing products, as it is suitable to directly compare the effects of different product candidates on the human scalp in a most realistic way.
|
26
|
Revisiting the Roles of Filaggrin in Atopic Dermatitis. Int J Mol Sci 2022; 23:5318. [PMID: 35628125 PMCID: PMC9140947 DOI: 10.3390/ijms23105318] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 04/15/2022] [Revised: 05/03/2022] [Accepted: 05/06/2022] [Indexed: 12/31/2022] Open
Abstract
The discovery in 2006 that loss-of-function mutations in the filaggrin gene (FLG) cause ichthyosis vulgaris and can predispose to atopic dermatitis (AD) galvanized the dermatology research community and shed new light on a skin protein that was first identified in 1981. However, although outstanding work has uncovered several key functions of filaggrin in epidermal homeostasis, a comprehensive understanding of how filaggrin deficiency contributes to AD is still incomplete, including details of the upstream factors that lead to the reduced amounts of filaggrin, regardless of genotype. In this review, we re-evaluate data focusing on the roles of filaggrin in the epidermis, as well as in AD. Filaggrin is important for alignment of keratin intermediate filaments, control of keratinocyte shape, and maintenance of epidermal texture via production of water-retaining molecules. Moreover, filaggrin deficiency leads to cellular abnormalities in keratinocytes and induces subtle epidermal barrier impairment that is sufficient to facilitate the ingress of certain exogenous molecules into the epidermis. However, although FLG null mutations regulate skin moisture in non-lesional AD skin, filaggrin deficiency per se does not lead to the neutralization of skin surface pH or to excessive transepidermal water loss in atopic skin. Separating facts from chaff regarding the functions of filaggrin in the epidermis is necessary for the design of efficacious therapies to treat dry and atopic skin.
|
27
|
A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. FRONTIERS IN SOCIOLOGY 2022; 7:886498. [PMID: 35602001 PMCID: PMC9120935 DOI: 10.3389/fsoc.2022.886498] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 02/28/2022] [Accepted: 04/19/2022] [Indexed: 05/28/2023]
Abstract
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. In order to bridge the developing field of computational science and empirical social research, this study aims to evaluate the performance of four topic modeling techniques; namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the performance of different algorithms concerning their strengths and weaknesses in a social science context. Based on certain details during the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.
|
28
|
Component spectra extraction and quantitative analysis for preservative mixtures by combining terahertz spectroscopy and machine learning. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 271:120908. [PMID: 35077979 DOI: 10.1016/j.saa.2022.120908] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/16/2021] [Revised: 01/09/2022] [Accepted: 01/13/2022] [Indexed: 06/14/2023]
Abstract
Preservatives are universally used in synergistic combination to enhance antimicrobial effect. Identifying the composition and quantifying the components of preservatives are crucial steps in quality monitoring to guarantee merchandise safety. In this work, the three most common preservatives, sorbic acid, potassium sorbate and sodium benzoate, are deliberately mixed in pairs with different mass ratios, which are supposed to be the "unknown" multicomponent systems and measured by terahertz (THz) time-domain spectroscopy. Subsequently, three major challenges are addressed by machine learning methods in this work. The singular value decomposition (SVD) effectively obtains the number of components in mixed preservatives. Then, the component spectra are successfully extracted by non-negative matrix factorization (NMF) and self-modeling mixture analysis (SMMA), and they match well with the measured THz spectra of pure reagents. Moreover, the support vector machine for regression (SVR) builds an underlying model of the target components and simultaneously identifies the content of each individual component in validation mixtures with coefficient of determination R² = 0.989. By taking advantage of the fingerprint-based THz technique and machine learning methods, our approach has demonstrated great potential to serve as a useful strategy for detecting preservative mixtures in practical applications.
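The first step described in this abstract, using SVD to find the number of components in a set of mixture spectra, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic Gaussian "spectra", the helper names `estimate_n_components` and `make_synthetic_mixtures`, and the relative threshold of 1% are hypothetical and are not taken from the paper, which does not state its selection rule in the abstract.

```python
import numpy as np

def estimate_n_components(spectra, rel_threshold=0.01):
    """Count singular values larger than rel_threshold times the largest one.

    The 1% cut-off is an illustrative assumption, not the paper's criterion.
    """
    s = np.linalg.svd(spectra, compute_uv=False)  # singular values, descending
    return int(np.sum(s > rel_threshold * s[0]))

def make_synthetic_mixtures(n_mixtures=12, n_points=200, noise=1e-3, seed=0):
    """Build noisy linear mixtures of three well-separated Gaussian bands."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)
    centers = [0.25, 0.5, 0.75]
    # Each row is one hypothetical pure-component spectrum.
    pure = np.stack([np.exp(-((axis - c) / 0.05) ** 2) for c in centers])
    # Random mass fractions that sum to one for every mixture.
    fractions = rng.dirichlet(np.ones(3), size=n_mixtures)
    mixtures = fractions @ pure
    mixtures = mixtures + noise * rng.standard_normal(mixtures.shape)
    return mixtures, pure, fractions

mixtures, pure, fractions = make_synthetic_mixtures()
n_found = estimate_n_components(mixtures)
```

With low noise, the singular values of the mixture matrix split into a few large values (one per independent component) and a noise floor, which is why a simple threshold can separate them in this toy setting. Real THz data would likely need a more careful criterion.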
|
29
|
Raman Research on Bleomycin-Induced DNA Strand Breaks and Repair Processes in Living Cells. Int J Mol Sci 2022; 23:3524. [PMID: 35408885 PMCID: PMC8998246 DOI: 10.3390/ijms23073524] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/22/2022] [Revised: 03/20/2022] [Accepted: 03/22/2022] [Indexed: 01/27/2023] Open
Abstract
As many as several thousand DNA lesions are induced in one cell within one day. DNA damage may lead to mutations, formation of chromosomal aberrations, or cellular death. A particularly cytotoxic type of DNA damage is single- and double-strand breaks (SSBs and DSBs, respectively). In this work, we followed DNA conformational transitions induced by the disruption of the DNA backbone. Conformational changes of chromatin in living cells were induced by bleomycin (BLM), an anticancer drug, which generates SSBs and DSBs. Raman micro-spectroscopy enabled the observation of chemical changes at the level of a single cell and the collection of hyperspectral images of molecular structure and composition with sub-micrometer resolution. We applied multivariate data analysis methods to extract key information from registered data, particularly to probe DNA conformational changes. The applied methodology enabled tracking of the conformational transition from B-DNA to A-DNA upon cellular response to BLM treatment. Additionally, increased expression of proteins within the cell nucleus resulting from the activation of repair processes was demonstrated. The ongoing DNA repair process under the BLM action was also confirmed with confocal laser scanning fluorescent microscopy.
|
30
|
Integration of Single-Cell RNA Sequencing and Bulk RNA Sequencing Data to Establish and Validate a Prognostic Model for Patients With Lung Adenocarcinoma. Front Genet 2022; 13:833797. [PMID: 35154287 PMCID: PMC8829512 DOI: 10.3389/fgene.2022.833797] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/12/2021] [Accepted: 01/14/2022] [Indexed: 12/27/2022] Open
Abstract
Background: Lung adenocarcinoma (LUAD) remains a lethal disease worldwide, with numerous studies exploring its potential prognostic markers using traditional RNA sequencing (RNA-seq) data. However, it cannot detect the exact cellular and molecular changes in tumor cells. This study aimed to construct a prognostic model for LUAD using single-cell RNA-seq (scRNA-seq) and traditional RNA-seq data. Methods: Bulk RNA-seq data were downloaded from The Cancer Genome Atlas (TCGA) database. LUAD scRNA-seq data were acquired from Gene Expression Omnibus (GEO) database. The uniform manifold approximation and projection (UMAP) was used for dimensionality reduction and cluster identification. Weighted Gene Correlation Network Analysis (WGCNA) was utilized to identify key modules and differentially expressed genes (DEGs). The non-negative Matrix Factorization (NMF) algorithm was used to identify different subtypes based on DEGs. The Cox regression analysis was used to develop the prognostic model. The characteristics of mutation landscape, immune status, and immune checkpoint inhibitors (ICIs) related genes between different risk groups were also investigated. Results: scRNA-seq data of four samples were integrated to identify 13 clusters and 9 cell types. After applying differential analysis, NK cells, bladder epithelial cells, and bronchial epithelial cells were identified as significant cell types. Overall, 329 DEGs were selected for prognostic model construction through differential analysis and WGCNA. Besides, NMF identified two clusters based on DEGs in the TCGA cohort, with distinct prognosis and immune characteristics being observed. We developed a prognostic model based on the expression levels of six DEGs. A higher risk score was significantly correlated with poor survival outcomes but was associated with a more frequent TP53 mutation rate, higher tumor mutation burden (TMB), and up-regulation of PD-L1. 
Two independent external validation cohorts were also adopted to verify our results, with consistent results observed in them. Conclusion: This study constructed and validated a prognostic model for LUAD by integrating 10× scRNA-seq and bulk RNA-seq data. Besides, we observed two distinct subtypes in this population, with different prognosis and immune characteristics.
|
31
|
A proof-of-concept study utilising 2D NMR spectrometry for in situ characterisation and quantitation of key biomarkers and actives in tape stripped ex vivo human skin. Talanta 2022; 237:122980. [PMID: 34736701 DOI: 10.1016/j.talanta.2021.122980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Revised: 10/11/2021] [Accepted: 10/17/2021] [Indexed: 11/17/2022]
Abstract
The development of a semi-automated and rapid analytical technique for dermatological analysis has become a key aim of many medical and commercial entities through greater awareness of people to skin health and its importance in the 21st century. We present a proof-of-concept methodology demonstrating the use of validated non-destructive, in-situ nuclear magnetic resonance (NMR) spectroscopy techniques for characterisation and quantitation of natural moisturising factor (NMF) compounds and actives from topical formulations. This quantitation is crucial for appropriate diagnosis of atopic dermatitis severity due to its association with reduced NMF abundance. This study is the first to combine diffusion NMR, semi-automated quantitation and ex-vivo skin samples to measure NMF and permeation of actives. We have shown that diffusion NMR allows for resolution between formulation components through determination of self-diffusion coefficients. We also demonstrate how the metabolomics software Chenomx™ can be used to identify and quantitate individual NMF components. We show comparable results to previous literature on NMF layers in the skin, alongside reinforcing findings on permeation enhancers and heat effects on transdermal delivery of actives and formulation components. The presented methodology has shown great potential as an effective non-destructive, fast and versatile technique for dermatological analysis of physiology and actives, with future hardware and software developments in NMR making the future of dermatological analysis via NMR very promising.
|
32
|
Deep Unfolding for Non-Negative Matrix Factorization with Application to Mutational Signature Analysis. J Comput Biol 2022; 29:45-55. [PMID: 34986029 DOI: 10.1089/cmb.2021.0438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Non-negative matrix factorization (NMF) is a fundamental matrix decomposition technique that is used primarily for dimensionality reduction and is increasing in popularity in the biological domain. Although finding a unique NMF is generally not possible, there are various iterative algorithms for NMF optimization that converge to locally optimal solutions. Such techniques can also serve as a starting point for deep learning methods that unroll the algorithmic iterations into layers of a deep network. In this study, we develop unfolded deep networks for NMF and several regularized variants in both a supervised and an unsupervised setting. We apply our method to various mutation data sets to reconstruct their underlying mutational signatures and their exposures. We demonstrate the increased accuracy of our approach over standard formulations in analyzing simulated and real mutation data.
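The iterative NMF algorithms mentioned in this abstract can be illustrated with the classic multiplicative update rules for the Frobenius objective. The sketch below is a minimal pure-Python illustration, not the unfolded network from the paper; the function name `nmf_multiplicative`, the toy matrix `v`, and all parameter defaults are assumptions made for this example. Each pass through the loop is the kind of step that a deep-unfolding approach would turn into one network layer.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf_multiplicative(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate V (n x m, non-negative) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Hypothetical toy data: a 4 x 5 matrix built from two non-negative factors.
true_w = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
true_h = [[1.0, 2.0, 0.0, 1.0, 3.0], [0.0, 1.0, 2.0, 1.0, 0.5]]
v = matmul(true_w, true_h)
w, h, errors = nmf_multiplicative(v, rank=2)
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative throughout, and the result depends on the random start, which reflects the abstract's point that a unique factorization is generally not available.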
|
33
|
Machine learning revealed molecular classification of colorectal cancer with negative lymph node metastasis. Biomarkers 2021; 27:86-94. [PMID: 34894932 DOI: 10.1080/1354750x.2021.2016971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Purpose: Accurate preoperative staging directly affects the treatment decision of patients with rectal cancer. However, our understanding of the immune subclasses of CRC without lymph node metastasis is still incomplete. Materials and methods: Here, we first analyzed the subclasses of CRC without lymph node metastasis on the Cancer Genome Atlas (TCGA) and verified its stability in the GSE39582 dataset. Four immune subclasses (C1-C4) were identified and verified by non-negative matrix factorization (NMF) of gene expression profiles. Then, ICI scores of six genes were constructed to characterize subclasses. Results: There were significant differences in metabolic and progression-associated signatures, immune characteristics, and clinical characteristics among subclasses. C3 represented a good prognosis with high TMB. C4 showed unique immune characteristics. We believe that C3 is the initial stage of CRC. After the C1 and C2 stages, it progresses to the C4 stage, and finally, lymph node metastasis occurs. Conclusions: This work may help to provide a basis for immunotherapy decision-making in early CRC and may guide personalized methods of cancer immunotherapy.
|
34
|
Quantitative evaluation for fluid components on 2D NMR spectrum using Blind Source Separation. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2021; 332:107079. [PMID: 34638086 DOI: 10.1016/j.jmr.2021.107079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/30/2021] [Revised: 09/10/2021] [Accepted: 09/13/2021] [Indexed: 06/13/2023]
Abstract
During oil and gas exploration, it is difficult to quantitatively evaluate fluid components and accurately calculate the saturation of different fluids because fluid components overlap on the 2D NMR spectrum. In this paper, Blind Source Separation (BSS) is proposed to separate fluid components, which utilizes the statistical independence of fluid signals on the 2D NMR spectrum. Fast Independent Component Analysis (FastICA) is applied to the inverted NMR spectra over an entire logged interval to obtain the residual information used to determine the number of fluid components. Based on the determined number of fluid components, nonnegative matrix factorization (NMF) is used to obtain the features of fluid components on the NMR spectrum, and the 2D NMR spectrum is divided into different regions. The overlapping regions are classified by distance, or by distance and T1/T2, to obtain the modified NMR spectrum. Through T2-D and T1-T2 numerical simulation, the fluid saturations calculated by the proposed method and by NMF are compared to verify the effectiveness of the proposed method. The results showed that the proposed method can be used to determine the number of fluid components effectively, and the calculated fluid saturations are more accurate than those obtained by NMF.
|
35
|
Classification modeling method for hyperspectral stamp-pad ink data based on one-dimensional convolutional neural network. J Forensic Sci 2021; 67:550-561. [PMID: 34617278 DOI: 10.1111/1556-4029.14909] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2021] [Revised: 08/25/2021] [Accepted: 09/21/2021] [Indexed: 11/28/2022]
Abstract
In the questioned document, the examination of stamp-pad ink is crucial scientific evidence to discern the difference between genuine and forged documents. In this study, a new method for rapid and non-destructive identification of types of stamp-pad inks by combining hyperspectral imaging (HSI) technology and deep learning was developed. Twenty stamp-pad inks of different brands and models were collected and numbered in turn, and then, each of them was sealed six times repeatedly on the A4 printing paper for the test. After that, the hyperspectral imager was used to collect the hyperspectral images and the reflectance spectral data were obtained after pixel fusion. Principal component analysis (PCA) and non-negative matrix factorization (NMF) were used to deal with the dataset, but visual results were not good. Then, back propagation neural network (BPNN) and one-dimensional convolutional neural network (1D-CNN) were constructed and their merits and drawbacks were compared. The final loss function of the BPNN of training set and validation set was stable at 0.27 and 0.42, and the classification accuracy of the training set and validation set reached 90.02% and 83.99%, respectively. Compared with the BPNN, the 1D-CNN had better stability and efficiency for the classification. The loss function of the training set and validation set was as low as 0.068 and 0.075, and the final classification accuracy reached 98.30% and 97.94%, respectively. Therefore, the combination of hyperspectral imaging technology and 1D-CNN represents a potentially simple, non-destructive, and rapid method for stamp-pad inks detection and classification.
|
36
|
Extraction of natural moisturizing factor from the stratum corneum and its implication on skin molecular mobility. J Colloid Interface Sci 2021; 604:480-491. [PMID: 34273783 DOI: 10.1016/j.jcis.2021.07.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2021] [Revised: 07/01/2021] [Accepted: 07/02/2021] [Indexed: 12/12/2022]
Abstract
The natural moisturizing factor (NMF) is a mixture of small water-soluble compounds present in the upper layer of the skin, stratum corneum (SC). Soaking of SC in water leads to extraction of the NMF molecules, which may influence the SC molecular properties and lead to brittle and dry skin. In this study, we investigate how the molecular dynamics in SC lipid and protein components are affected by the removal of the NMF compounds. We then explore whether the changes in SC components caused by NMF removal can be reversed by a subsequent addition of one single NMF component: urea, pyrrolidone carboxylic acid (PCA) or potassium lactate. Samples of intact SC were investigated using NMR, X-ray diffraction, infrared spectroscopy and sorption microbalance. It is shown that the removal of NMF leads to reduced molecular mobility in keratin filaments and SC lipids compared to untreated SC. When the complex NMF mixture is replaced by one single NMF component, the molecular mobility in both keratin filaments and lipids is regained. From this we propose a general relation between the molecular mobility in SC and the amount of polar solutes which does not appear specific to the precise chemical identity of the NMF compounds.
|
37
|
Immune Subtypes Based on Immune-Related lncRNA: Differential Prognostic Mechanism of Pancreatic Cancer. Front Cell Dev Biol 2021; 9:698296. [PMID: 34307375 PMCID: PMC8292792 DOI: 10.3389/fcell.2021.698296] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Accepted: 06/17/2021] [Indexed: 01/05/2023] Open
Abstract
Pancreatic cancer is one of the tumors with the highest degree of malignancy and the worst prognosis. To date, immunotherapy has become an effective means to improve the prognosis of patients with pancreatic cancer. Long non-coding RNAs (lncRNAs) have also been associated with the immune response. However, the role of immune-related lncRNAs in the immune response of pancreatic cancer remains unclear. In this study, we identified immune-related lncRNA pairs through a new combinatorial algorithm, and then clustered and deeply analyzed the immune characteristics and functional differences between subtypes. Subsequently, the prognostic model of 3 candidate lncRNA pairs was determined by multivariate COX analysis. The results showed significant prognostic differences between the C1 and C2 subtypes, which may be due to the differential infiltration of CTL and NK cells and the activation of tumor-related pathways. The prognostic model of the 3 lncRNA pairs (AC244035.1_vs._AC063926.1, AC066612.1_vs._AC090124.1, and AC244035.1_vs._LINC01885) was established, which exhibits stable and effective prognostic prediction performance. These 3 lncRNA pairs may regulate the anti-tumor effect of immune cells through ion channel pathways. In conclusion, our research demonstrated the panoramic differences in immune characteristics between subtypes and stable prognostic models, and identified new potential targets for immunotherapy.
|
38
|
Effects of petrolatum, a petrolatum depositing body wash and a regular body wash on biomarkers and biophysical properties of the stratum corneum. Int J Cosmet Sci 2021; 43:218-224. [PMID: 33336384 DOI: 10.1111/ics.12684] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2020] [Revised: 12/08/2020] [Accepted: 12/15/2020] [Indexed: 11/27/2022]
Abstract
BACKGROUND An important trend in the personal care industry involves the development of body wash products that not only clean the skin without damage but deposit conditioning ingredients to improve skin barrier function. OBJECTIVE The objective of this study was to develop skin biomarker measures to quantify the treatment effects of body wash products. METHODS We employed analysis of structural proteins (keratin 1,10,11 and involucrin), a natural moisturizing factor (pyrrolidone carboxylic acid) and an inflammatory mediator (IL-1ra/IL-1α) from adhesive discs with dry skin grading, TEWL and capacitance measurements to compare the effects of direct application of petrolatum, a high petrolatum depositing body wash, and a regular body wash on dry leg skin in a standard leg-wash treatment protocol. RESULTS High depositing body wash and petrolatum had positive effects on stratum corneum barrier function as judged by biomarker analysis, biophysical measurements and skin grading compared to the regular body wash product. CONCLUSIONS The results clearly indicate that a combination of biomarker and biophysical property measurements is effective for determining the skin benefits of moisturizing body wash products.
|
39
|
Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach. J Med Internet Res 2020; 22:e22609. [PMID: 33207310 PMCID: PMC7725497 DOI: 10.2196/22609] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Revised: 10/10/2020] [Accepted: 11/16/2020] [Indexed: 01/20/2023] Open
Abstract
Background The massive scale of social media platforms requires an automatic solution for detecting hate speech. These automatic solutions will help reduce the need for manual analysis of content. Most previous literature has cast the hate speech detection problem as a supervised text classification task using classical machine learning methods or, more recently, deep learning methods. However, work investigating this problem in Arabic cyberspace is still limited compared to the published work on English text. Objective This study aims to identify hate speech related to the COVID-19 pandemic posted by Twitter users in the Arab region and to discover the main issues discussed in tweets containing hate speech. Methods We used the ArCOV-19 dataset, an ongoing collection of Arabic tweets related to COVID-19, starting from January 27, 2020. Tweets were analyzed for hate speech using a pretrained convolutional neural network (CNN) model; each tweet was given a score between 0 and 1, with 1 being the most hateful text. We also used nonnegative matrix factorization to discover the main issues and topics discussed in hate tweets. Results The analysis of hate speech in Twitter data in the Arab region identified that the number of non–hate tweets greatly exceeded the number of hate tweets, where the percentage of hate tweets among COVID-19 related tweets was 3.2% (11,743/547,554). The analysis also revealed that the majority of hate tweets (8385/11,743, 71.4%) contained a low level of hate based on the score provided by the CNN. This study identified Saudi Arabia as the Arab country from which the most COVID-19 hate tweets originated during the pandemic. Furthermore, we showed that the largest number of hate tweets appeared during the time period of March 1-30, 2020, representing 51.9% of all hate tweets (6095/11,743). 
Contrary to what was anticipated, in the Arab region, it was found that the spread of COVID-19–related hate speech on Twitter was weakly related with the dissemination of the pandemic based on the Pearson correlation coefficient (r=0.1982, P=.50). The study also identified the commonly discussed topics in hate tweets during the pandemic. Analysis of the 7 extracted topics showed that 6 of the 7 identified topics were related to hate speech against China and Iran. Arab users also discussed topics related to political conflicts in the Arab region during the COVID-19 pandemic. Conclusions The COVID-19 pandemic poses serious public health challenges to nations worldwide. During the COVID-19 pandemic, frequent use of social media can contribute to the spread of hate speech. Hate speech on the web can have a negative impact on society, and hate speech may have a direct correlation with real hate crimes, which increases the threat associated with being targeted by hate speech and abusive language. This study is the first to analyze hate speech in the context of Arabic COVID-19–related tweets in the Arab region.
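The Pearson correlation coefficient reported in this abstract measures the linear association between two series. A minimal sketch of the calculation follows; the function name `pearson_r` and the two short series are hypothetical values invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily counts (NOT from the study): new cases and hate tweets.
new_cases = [10, 12, 15, 20, 26, 30, 41]
hate_tweets = [5, 9, 4, 8, 6, 10, 7]
r = pearson_r(new_cases, hate_tweets)
```

A value of r near 0 indicates a weak linear relationship, which is how the abstract interprets its reported coefficient.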
|
40
|
Genetic-Based Hypertension Subtype Identification Using Informative SNPs. Genes (Basel) 2020; 11:genes11111265. [PMID: 33121163 PMCID: PMC7693873 DOI: 10.3390/genes11111265] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022] Open
Abstract
In this work, we proposed a process to select informative genetic variants for identifying clinically meaningful subtypes of hypertensive patients. We studied 575 African American (AA) and 612 Caucasian hypertensive participants enrolled in the Hypertension Genetic Epidemiology Network (HyperGEN) study and analyzed each race-based group separately. All study participants underwent GWAS (Genome-Wide Association Studies) and echocardiography. We applied a variety of statistical methods and filtering criteria, including generalized linear models, F statistics, burden tests, deleterious variant filtering, and others to select the most informative hypertension-related genetic variants. We performed an unsupervised learning algorithm non-negative matrix factorization (NMF) to identify hypertension subtypes with similar genetic characteristics. Kruskal–Wallis tests were used to demonstrate the clinical meaningfulness of genetic-based hypertension subtypes. Two subgroups were identified for both African American and Caucasian HyperGEN participants. In both AAs and Caucasians, indices of cardiac mechanics differed significantly by hypertension subtypes. African Americans tend to have more genetic variants compared to Caucasians; therefore, using genetic information to distinguish the disease subtypes for this group of people is relatively challenging, but we were able to identify two subtypes whose cardiac mechanics have statistically different distributions using the proposed process. The research gives a promising direction in using statistical methods to select genetic information and identify subgroups of diseases, which may inform the development and trial of novel targeted therapies.
|
41
|
Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species. Cell Syst 2020; 8:395-411.e8. [PMID: 31121116 DOI: 10.1016/j.cels.2019.04.004] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2018] [Revised: 01/24/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remains a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Application of developing mouse retina to scRNA-Seq reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time-course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine-cell type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity.
|
42
|
Muscle Synergy of the Underwater Undulatory Swimming in Elite Male Swimmers. Front Sports Act Living 2020; 2:62. [PMID: 33345053 PMCID: PMC7739797 DOI: 10.3389/fspor.2020.00062] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2020] [Accepted: 05/04/2020] [Indexed: 11/23/2022] Open
Abstract
Improving the performance of underwater undulatory swimming (UUS) improves swimming time, so it is important to identify the pattern of muscle coordination in swimmers with fast UUS. This study aimed to identify muscular coordination in the trunk and lower limb during UUS in elite swimmers. Nine swimmers (aged 20 ± 2 years; height, 1.74 ± 0.03 m; weight, 73.0 ± 4.4 kg) participated in this study. Measurements were taken by electromyography of eight muscles: rectus abdominis (RA), internal oblique (IO), rectus femoris (RF), erector spinae (ES), multifidus (MF), tibialis anterior (TA), biceps femoris (BF), and gastrocnemius (GS). For evaluation of muscle coordination, “muscle synergy” and “activation coefficient” were calculated using non-negative matrix factorization from electromyographic data. Kick frequency, kick amplitude, swim velocity, and kinematics of the pelvis were also calculated. Kick cycle was divided into two kick phases: downward kick (from the highest toe vertical coordinate to the lowest point) and upward kick (from the lowest point to the highest point). Kick frequency, kick amplitude, and swimming velocity were 1.9 ± 0.3 Hz, 0.45 ± 0.6 m, and 1.8 ± 0.2 m·s⁻¹, respectively. The maximum backward pelvic tilt was 94.4 ± 4.5° and the minimum (forward) was 90.8 ± 5.7°. Three muscle synergies were extracted from each swimmer during UUS: those involved in the transition from upward kick to downward kick (Synergy 1), downward kick (Synergy 2), and upward kick (Synergy 3). Synergy 1 involved mainly the RF, IO, and RA, which were activated during the turn from the upward to the downward phase. Synergy 2 involved mainly the MF, ES, and TA in the downward kick. Synergy 3 corresponded to the coordination of the BF and GS, which were active in the upward kick. In UUS by elite swimmers, both the upward kick and downward kick followed the trunk muscles involved in the pelvic forward–backward tilt movement, and lower limb muscles were activated. 
Muscle coordination based on pelvic forward-backward tilt during UUS is expected to contribute to the coaching field for elite swimmer development.
|
43
|
Monitor Ionizing Radiation-Induced Cellular Responses with Raman Spectroscopy, Non-Negative Matrix Factorization, and Non-Negative Least Squares. APPLIED SPECTROSCOPY 2020; 74:701-711. [PMID: 32098482 DOI: 10.1177/0003702820906221] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Radiation therapy (RT) is one of the most commonly prescribed cancer treatments. New tools that can accurately monitor and evaluate individual patient responses would be a major advantage and lend to the implementation of personalized treatment plans. In this study, Raman spectroscopy (RS) was applied to examine radiation-induced cellular responses in H460, MCF7, and LNCaP cancer cell lines across different dose levels and times post-irradiation. Previous Raman data analysis was conducted using principal component analysis (PCA), which showed the ability to extract biological information of glycogen. In the current studies, the use of non-negative matrix factorization (NMF) allowed for the discovery of multiplexed biological information, specifically uncovering glycogen-like and lipid-like component bases. The corresponding scores of glycogen and previously unidentified lipids revealed the content variations of these two chemicals in the cellular data. The NMF decomposed glycogen and lipid-like bases were able to separate the cancer cell lines into radiosensitive and radioresistant groups. A further lipid phenotype investigation was also attempted by applying non-negative least squares (NNLS) to the lipid-like bases decomposed individually from three cell lines. Qualitative differences found in lipid weights for each lipid-like basis suggest the lipid phenotype differences in the three tested cancer cell lines. Collectively, this study demonstrates that the application of NMF and NNLS on RS data analysis to monitor ionizing radiation-induced cellular responses can yield multiplexed biological information on bio-response to RT not revealed by conventional chemometric approaches.
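The non-negative least squares (NNLS) step described in this abstract fits a measured spectrum as a non-negative weighted sum of reference spectra. The sketch below uses a simple multiplicative-update solver, which is valid only when the reference matrix and the measurement are themselves non-negative; it is a swapped-in illustrative technique and not necessarily the solver used in the study. The function name `nnls_multiplicative` and the toy spectra are assumptions made for this example.

```python
def nnls_multiplicative(a, b, n_iter=500, eps=1e-12):
    """Minimise ||A x - b||^2 subject to x >= 0, for non-negative A and b.

    A is a list of rows (channels x references); b is a list of channel values.
    """
    n_rows, n_cols = len(a), len(a[0])
    # Precompute A^T b and A^T A once; both are non-negative here.
    atb = [sum(a[i][k] * b[i] for i in range(n_rows)) for k in range(n_cols)]
    ata = [[sum(a[i][k] * a[i][l] for i in range(n_rows)) for l in range(n_cols)]
           for k in range(n_cols)]
    x = [1.0] * n_cols  # strictly positive start
    for _ in range(n_iter):
        atax = [sum(ata[k][l] * x[l] for l in range(n_cols)) for k in range(n_cols)]
        # x <- x * (A^T b) / (A^T A x), element-wise; keeps x non-negative
        x = [x[k] * atb[k] / (atax[k] + eps) for k in range(n_cols)]
    return x


# Hypothetical reference spectra: 4 channels, 2 reference components (columns).
references = [[1.0, 0.0], [0.5, 0.3], [0.0, 1.0], [0.2, 0.4]]
# Hypothetical measurement mixing the two references with weights 2.0 and 0.5.
measured = [2.0 * r0 + 0.5 * r1 for r0, r1 in references]
weights = nnls_multiplicative(references, measured)
```

The recovered `weights` play the role of the per-component weights that the abstract compares across cell lines.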
|
44
|
An NMF-Based Methodology for Selecting Biomarkers in the Landscape of Genes of Heterogeneous Cancer-Associated Fibroblast Populations. Bioinform Biol Insights 2020; 14:1177932220906827. [PMID: 32425511 PMCID: PMC7218276 DOI: 10.1177/1177932220906827] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2019] [Accepted: 01/22/2020] [Indexed: 01/27/2023] Open
Abstract
The rapid development of high-performance technologies has greatly promoted studies of molecular oncology producing large amounts of data. Even if these data are publicly available, they need to be processed and studied to extract information useful to better understand mechanisms of pathogenesis of complex diseases, such as tumors. In this article, we illustrate a procedure for mining biologically meaningful biomarkers from microarray datasets of different tumor histotypes. The proposed methodology automatically identifies a subset of potentially informative genes from microarray data matrices that differ in both the number of rows (genes) and the number of columns (patients). The methodology integrates the nonnegative matrix factorization method and a functional enrichment analysis web tool with a properly designed gene extraction procedure to allow the analysis of omics input data with different row sizes. The proposed methodology has been used to mine microarray data of solid tumors of different embryonic origin to verify the presence of common genes characterizing the heterogeneity of cancer-associated fibroblasts. These automatically extracted biomarkers could be used to suggest appropriate therapies to inactivate the state of active fibroblasts, thus avoiding their action on tumor progression.
|
45
|
Automatic microscopic image analysis by moving window local Fourier Transform and Machine Learning. Micron 2019; 130:102800. [PMID: 31855656 DOI: 10.1016/j.micron.2019.102800] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2019] [Revised: 12/09/2019] [Accepted: 12/10/2019] [Indexed: 11/25/2022]
Abstract
Analysis of microscope images is a tedious work which requires patience and time, usually done manually by the microscopist after data collection. The results obtained in such a way might be biased by the human who performed the analysis. Here we introduce an approach of automatic image analysis, which is based on locally applied Fourier Transform and Machine Learning methods. In this approach, a whole image is scanned by a local moving window with defined size and the 2D Fourier Transform is calculated for each window. Then, all the Local Fourier Transforms are fed into Machine Learning processing. Firstly, a number of components in the data is estimated from Principal Component Analysis (PCA) Scree Plot performed on the data. Secondly, the data are decomposed blindly by Non-Negative Matrix Factorization (NMF) into interpretable spatial maps (loadings) and corresponding Fourier Transforms (factors). As a result, the microscopic image is analyzed and the features on the image are automatically discovered, based on the local changes in Fourier Transform, without human bias. The user selects only a size and movement of the scanning local window which defines the final analysis resolution. This automatic approach was successfully applied to analysis of various microscopic images with and without local periodicity i.e. atomically resolved High Angle Annular Dark Field (HAADF) Scanning Transmission Electron Microscopy (STEM) image of Au nanoisland of fcc and Au hcp phases, Scanning Tunneling Microscopy (STM) image of Au-induced reconstruction on Ge(001) surface, Scanning Electron Microscopy (SEM) image of metallic nanoclusters grown on GaSb surface, and Fluorescence microscopy image of HeLa cell line of cervical cancer. 
The proposed approach could be used to automatically analyze the local structure of microscopic images within a time of about a minute for a single image on a modern desktop/notebook computer and it is freely available as a Python analysis notebook and Python program for batch processing.
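The moving-window step described in this abstract can be sketched as follows: slide a square window over the image, take the magnitude of the 2D Fourier transform of each patch, and stack the flattened magnitudes into a non-negative matrix suitable for NMF. This is a minimal pure-Python illustration using a direct DFT, not the authors' released notebook; the names `dft2_magnitude` and `local_fourier_features`, the window size, and the step are assumptions made for this example.

```python
import cmath


def dft2_magnitude(patch):
    """Magnitude of the 2D discrete Fourier transform of a small patch."""
    n, m = len(patch), len(patch[0])
    out = []
    for u in range(n):
        row = []
        for v in range(m):
            s = sum(
                patch[y][x] * cmath.exp(-2j * cmath.pi * (u * y / n + v * x / m))
                for y in range(n)
                for x in range(m)
            )
            row.append(abs(s))
        out.append(row)
    return out


def local_fourier_features(image, window, step):
    """One row per window position, holding the flattened DFT magnitudes."""
    rows = []
    for top in range(0, len(image) - window + 1, step):
        for left in range(0, len(image[0]) - window + 1, step):
            patch = [r[left:left + window] for r in image[top:top + window]]
            mag = dft2_magnitude(patch)
            rows.append([val for r in mag for val in r])
    return rows


# Hypothetical 8 x 8 constant image, scanned with a 4 x 4 window and step 2.
image = [[1.0] * 8 for _ in range(8)]
features = local_fourier_features(image, window=4, step=2)
```

As the abstract notes, the window size and step are the only user choices, and they set the spatial resolution of the resulting maps. For real images a fast FFT routine would replace the direct DFT used here.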
|
46
|
Approaches for identifying PM 2.5 source types and source areas at a remote background site of South China in spring. THE SCIENCE OF THE TOTAL ENVIRONMENT 2019; 691:1320-1327. [PMID: 31466211 DOI: 10.1016/j.scitotenv.2019.07.178] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/28/2019] [Revised: 07/11/2019] [Accepted: 07/12/2019] [Indexed: 06/10/2023]
Abstract
The receptor model is an effective and widely used tool for analyzing the sources of PM2.5, and its development and improvement remain an active and challenging focus of research. In this study, approaches to source analysis are applied and compared. The PM2.5 samples were collected in spring of 2015 at a remote background site of Weizhou, South China and were analyzed for water-soluble ions, trace metals, and sugars. The 28 measured species were introduced into the positive matrix factorization (PMF) model and a non-negative matrix factorization (NMF) model for inter-comparison of PM2.5 prediction. Results showed that the NMF model is a more robust tool for identifying source types and for source apportionment in the case of a small sample size (n = 31). In NMF, four source variants were obtained as dust (15.6%), biomass combustion (11.8%), secondary formation (17.6%), and coal combustion (54.9%), corresponding to four main source areas. These were Southeast Asia, South China Sea, Taiwan Strait, as well as Pearl River Delta, respectively. The areas were distinguished based on hybrid receptor models, potential source contribution function (PSCF) and concentration weighted trajectory (CWT), by introducing the daily loadings of each source factor from the NMF method. These model results were highly consistent with categorized chemical characteristics of PM2.5, suggesting that NMF linking with hybrid receptor models provides valuable implications for exploring source types and source areas of PM2.5. Meanwhile, biomass combustion and coal combustion comparably contributed to the high PM2.5 concentrations indicating control strategy in South China in spring.
|
47
|
Supervised cross-fusion method: a new triplet approach to fuse thermal, radar, and optical satellite data for land use classification. ENVIRONMENTAL MONITORING AND ASSESSMENT 2019; 191:481. [PMID: 31273539 DOI: 10.1007/s10661-019-7621-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/16/2019] [Accepted: 06/25/2019] [Indexed: 06/09/2023]
Abstract
This study presents a new fusion method, namely the supervised cross-fusion method, to improve the capability of fused thermal, radar, and optical images for classification. The proposed cross-fusion method is a combination of pixel-based and supervised feature-based fusion of thermal, radar, and optical data. The pixel-based fusion was applied to fuse optical data of Sentinel-2 and Landsat 8. According to the correlation coefficient (CR) and signal-to-noise ratio (SNR), the wavelet method gave the best results among the pixel-based fusion methods used. Considering spectral and spatial information preservation, CR of the wavelet method is 0.97 and 0.96, respectively. The supervised feature-based fusion method is a fusion of the best output of the pixel-based fusion level, land surface temperature (LST) data, and the Sentinel-1 radar image using a supervised approach. The supervised approach is a supervised feature selection and learning of the inputs based on the linear discriminant analysis and sparse regularization (LDASR) algorithm. In the present study, non-negative matrix factorization (NMF) was utilized for feature extraction. A comparison of the obtained results with a state-of-the-art fusion method indicated a higher classification accuracy for our proposed method. The rotation forest (RoF) classification results improved by 25% and the support vector machine (SVM) results improved by 31%. The results showed that the proposed method classified and separated well the four main classes of settlements, barren land, river, and river bank, and even the bridges over the river. Also, the number of pixels left unclassified by SVM is very low compared to other classification methods and can be neglected. The study results showed that LST calculated using thermal data had positive effects on improving the classification results. 
By comparing the results of supervised cross-fusion without using LST data to the proposed method results, SVM and RoF classifiers showed 38% and 7% of classification improvement, respectively.
|
48
|
Orthogonal joint sparse NMF for microarray data analysis. J Math Biol 2019; 79:223-247. [PMID: 31004215 DOI: 10.1007/s00285-019-01355-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2018] [Revised: 03/29/2019] [Indexed: 12/20/2022]
Abstract
The 3D microarrays, generally known as gene-sample-time microarrays, couple the information on different time points collected by 2D microarrays that measure gene expression levels among different samples. Their analysis is useful in several biomedical applications, such as monitoring dose or drug treatment responses of patients over time in pharmacogenomics studies. Many statistical and data analysis tools have been used to extract useful information. In particular, nonnegative matrix factorization (NMF), with its natural nonnegativity constraints, has demonstrated its ability to extract from 2D microarrays relevant information on specific genes involved in a particular biological process. In this paper, we propose a new NMF model, namely Orthogonal Joint Sparse NMF, to extract relevant information from 3D microarrays containing the time evolution of a 2D microarray, by adding constraints that enforce important biological properties useful for further biological analysis. We develop multiplicative update rules that decrease the objective function monotonically, and compare our approach to state-of-the-art NMF algorithms on both synthetic and real data sets.
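As an illustration of what multiplicative update rules look like, the following is a minimal sketch of the standard (Lee and Seung) updates for plain NMF with a squared Frobenius objective. It is not the Orthogonal Joint Sparse NMF model of the paper, whose additional constraints are not given in the abstract; the function names, the small `eps` safeguard, and the toy matrix are assumptions made for this example only.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=50, seed=0, eps=1e-12):
    """Plain NMF by multiplicative updates (illustrative, unconstrained)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation keeps every factor non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

# Hypothetical toy "expression" matrix (rows: genes, columns: samples).
v = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 1, 4]]
w, h, errors = nmf(v, rank=2)
```

The list `errors` records the objective after each iteration, so the monotone decrease claimed for this family of update rules can be inspected directly.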
|
49
|
Structured Sparse Spectral Transforms and Structural Measures for Voice Conversion. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2018; 26:2267-2276. [PMID: 31984214 PMCID: PMC6980218 DOI: 10.1109/taslp.2018.2860682] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
We investigate a structured sparse spectral transform method for voice conversion (VC) that performs frequency warping and spectral shaping simultaneously on high-dimensional (D) STRAIGHT spectra. Learning a large transform matrix for high-D data often results in an overfit matrix with low sparsity, which leads to muffled speech in VC. We address this problem by using the frequency-warping characteristic of a source-target speaker pair to define a region of support (ROS) in a transform matrix, and further optimize it by nonnegative matrix factorization (NMF) to obtain a structured sparse transform. We also investigate structural measures of spectral and temporal covariance and variance at different scales for assessing VC speech quality. Our experiments on the ARCTIC dataset with 12 speaker pairs show that embedding the ROS in spectral transforms offers flexibility in tradeoffs between spectral distortion and structure preservation, and that the structural measures provide quantitatively reasonable results on converted speech. Our subjective listening tests show that the proposed VC method achieves a mean opinion score of "very good" relative to natural speech and that, in comparison with three other VC methods, it is the most preferred in naturalness and in voice similarity to target speakers.
|
50
|
Exploiting MEDLINE for gene molecular function prediction via NMF based multi-label classification. J Biomed Inform 2018; 86:160-166. [PMID: 30130573 DOI: 10.1016/j.jbi.2018.08.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/18/2017] [Revised: 08/13/2018] [Accepted: 08/17/2018] [Indexed: 11/25/2022]
Abstract
Gene ontology (GO) provides a representation of terms and categories used to describe genes and their molecular functions, cellular components and biological processes. GO has been the standard for describing the functions of specific genes in different model organisms. GO annotation, or the tagging of genes with GO terms, has mostly been a manual and time-consuming curation process. Although many automated approaches have been proposed for annotation, few have utilized knowledge available in the literature. In this manuscript, we describe the development and evaluation of an innovative predictive system to automatically assign molecular functions (GO terms) to genes using the biomedical literature. Because a gene can be associated with multiple molecular functions, we posed GO molecular function annotation as a multi-label classification problem with several classes. We used non-negative matrix factorization (NMF) for feature reduction and then classified the genes. To address the multi-label aspect of the data, we used the binary-relevance method. Although we experimented with several classifiers, the combination of binary relevance and a K-nearest neighbor (KNN) classifier performed best. Our evaluation on the UniProtKB/Swiss-Prot dataset showed a best F1-measure of 0.84.
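The binary-relevance idea mentioned in this abstract can be sketched as follows: a multi-label problem is split into one independent binary problem per label, and here each binary problem is answered by a K-nearest neighbor vote. This is a minimal illustration and not the authors' system; the function names, the value of k, and the toy feature vectors (standing in for already reduced features such as NMF coefficients) are assumptions.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict one binary label by majority vote among the k nearest points."""
    # Rank training points by squared Euclidean distance to the query.
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = sum(train_y[i] for i in by_distance[:k])
    return 1 if 2 * votes > k else 0

def binary_relevance_predict(train_x, train_labels, query, k=3):
    """Binary relevance: run one independent KNN classifier per label.

    train_labels[i] is a list of 0/1 flags, one flag per label.
    """
    n_labels = len(train_labels[0])
    return [
        knn_predict(train_x, [row[j] for row in train_labels], query, k)
        for j in range(n_labels)
    ]

# Hypothetical reduced features for six genes and two candidate functions.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
prediction = binary_relevance_predict(train_x, train_labels, [0.2, 0.2])
```

Because each label is predicted separately, a gene can receive zero, one, or several labels, which is the property that makes binary relevance suitable for multi-label annotation.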
|