1
Prasad V, van Sloun RJG, Vilanova A, Pezzotti N. ProactiV: Studying Deep Learning Model Behavior Under Input Transformations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5651-5665. [PMID: 37535493 DOI: 10.1109/tvcg.2023.3301722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Deep learning (DL) models have shown performance benefits across many applications, from classification to image-to-image translation. However, low interpretability often leads to unexpected model behavior once models are deployed in the real world. Usually, this unexpected behavior arises because the training data domain does not reflect the deployment data domain. Identifying a model's breaking points under input conditions and domain shifts, i.e., input transformations, is essential to improving models. Although visual analytics (VA) has shown promise in studying the behavior of model outputs under continually varying inputs, existing methods mainly focus on per-class or instance-level analysis. We aim to generalize beyond classification to tasks where classes do not exist and to provide a global view of model behavior under co-occurring input transformations. We present a DL model-agnostic VA method (ProactiV) to help model developers proactively study output behavior under input transformations to identify and verify breaking points. ProactiV relies on a proposed input optimization method that determines the changes to a given transformed input needed to achieve the desired output. The data from this optimization process allows the study of global and local model behavior under input transformations at scale. Additionally, the optimization method provides insights into the input characteristics that result in desired outputs and helps recognize model biases. We highlight how ProactiV effectively supports studying model behavior with example classification and image-to-image translation tasks.
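The idea of input optimization (searching for the change to a transformed input that yields a desired output) can be illustrated with a minimal gradient-descent sketch. This is not ProactiV's method: it assumes a toy linear model `W @ x`, a squared-error loss on the output, and a hypothetical proximity weight `lam` that keeps the optimized input close to the transformed starting input `x0`.

```python
import numpy as np


def optimize_input(W, x0, y_target, lam=0.1, lr=0.05, steps=2000):
    """Gradient descent on the input x (the model weights stay fixed).

    Minimizes ||W @ x - y_target||^2 + lam * ||x - x0||^2, where W is a toy
    linear "model" and x0 is the transformed input we start from.
    """
    x0 = np.array(x0, dtype=float)
    x = x0.copy()
    for _ in range(steps):
        residual = W @ x - y_target                  # output error
        grad = 2.0 * W.T @ residual + 2.0 * lam * (x - x0)
        x -= lr * grad                               # update the input only
    return x


W = np.array([[1.0, 0.0], [0.0, 2.0]])
x0 = np.array([0.5, -0.5])          # transformed input
y_target = np.array([1.0, 1.0])     # desired output
x_opt = optimize_input(W, x0, y_target)
delta = x_opt - x0                  # change needed to reach the desired output
```

Because this toy objective is quadratic, the result can be checked against the closed-form solution of (W^T W + lam*I) x = W^T y_target + lam*x0; for a real DL model the input gradient would instead come from automatic differentiation.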
2
Zhong F, Xue M, Zhang J, Zhang F, Ban R, Deussen O, Wang Y. Force-Directed Graph Layouts Revisited: A New Force Based on the T-Distribution. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:3650-3663. [PMID: 37021999 DOI: 10.1109/tvcg.2023.3238821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
In this article, we propose the t-FDP model, a force-directed placement method based on a novel bounded short-range force (t-force) defined by Student's t-distribution. Our formulation is flexible, exerts limited repulsive forces on nearby nodes, and can be adapted separately in its short- and long-range effects. Using such forces in force-directed graph layouts yields better neighborhood preservation than current methods, while maintaining low stress errors. Our efficient implementation using a Fast Fourier Transform is one order of magnitude faster than state-of-the-art methods and two orders faster on the GPU, enabling us to perform parameter tuning by globally and locally adjusting the t-force in real time for complex graphs. We demonstrate the quality of our approach by numerical evaluation against state-of-the-art approaches and extensions for interactive exploration.
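To see why a Student's-t-shaped force stays bounded at short range, the sketch below compares a hypothetical t-style repulsion of magnitude r / (1 + r^2), which peaks at 1/2 when r = 1 and falls to zero as r approaches 0, with the classic 1/r electrical repulsion, which grows without bound. The exact t-force of t-FDP (including its separate short- and long-range parameters) and its FFT acceleration are not reproduced here; this is a naive O(n^2) illustration under those assumptions.

```python
import numpy as np


def t_repulsion(r):
    """Hypothetical bounded repulsion magnitude r / (1 + r^2)."""
    r = np.asarray(r, dtype=float)
    return r / (1.0 + r ** 2)


def inverse_repulsion(r, eps=1e-12):
    """Classic electrical-style repulsion 1 / r (unbounded as r -> 0)."""
    r = np.asarray(r, dtype=float)
    return 1.0 / np.maximum(r, eps)


def repulsive_displacements(pos):
    """Net t-style repulsion on each node, computed naively over all pairs.

    For a pair at distance r, r / (1 + r^2) times the unit vector equals
    diff / (1 + r^2), so coincident nodes (diff = 0) contribute nothing
    instead of causing a division by zero.
    """
    pos = np.asarray(pos, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]     # (n, n, dim): node i minus node j
    r2 = np.sum(diff ** 2, axis=-1)              # squared pairwise distances
    return np.sum(diff / (1.0 + r2)[..., None], axis=1)


positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
disp = repulsive_displacements(positions)
```

A layout loop would add attractive forces along edges and move each node by a small step along its net force; only the bounded repulsive part is shown here.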
3
Liu W, Tang JW, Mou JY, Lyu JW, Di YW, Liao YL, Luo YF, Li ZK, Wu X, Wang L. Rapid discrimination of Shigella spp. and Escherichia coli via label-free surface enhanced Raman spectroscopy coupled with machine learning algorithms. Front Microbiol 2023; 14:1101357. [PMID: 36970678 PMCID: PMC10030586 DOI: 10.3389/fmicb.2023.1101357] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/17/2022] [Accepted: 02/20/2023] [Indexed: 03/11/2023]
Abstract
Shigella and enterotoxigenic Escherichia coli (ETEC) are major bacterial pathogens of diarrheal disease, the second leading cause of childhood mortality globally. It is well known that Shigella spp. and E. coli are very closely related and share many characteristics. Evolutionarily, Shigella spp. are positioned within the phylogenetic tree of E. coli. Therefore, discriminating Shigella spp. from E. coli is very difficult. Many methods have been developed to differentiate the two, including but not limited to biochemical tests, nucleic acid amplification, and mass spectrometry. However, these methods suffer from high false positive rates and complicated operating procedures, which calls for novel methods for the accurate and rapid identification of Shigella spp. and E. coli. As a low-cost and non-invasive method, surface enhanced Raman spectroscopy (SERS) is currently under intensive study for its diagnostic potential for bacterial pathogens, and its application to bacterial discrimination merits further investigation. In this study, we focused on clinically isolated E. coli strains and Shigella species (spp.), that is, S. dysenteriae, S. boydii, S. flexneri, and S. sonnei, from which SERS spectra were generated and characteristic peaks for Shigella spp. and E. coli were identified, revealing unique molecular components in the two bacterial groups. Further comparative analysis of machine learning algorithms showed that the Convolutional Neural Network (CNN) achieved the best performance and robustness in bacterial discrimination when compared with Random Forest (RF) and Support Vector Machine (SVM) algorithms. Taken together, this study confirmed that SERS paired with machine learning could achieve high accuracy in discriminating Shigella spp. from E. coli, which supports its potential application for diarrheal disease prevention and control in clinical settings.
Affiliation(s)
- Wei Liu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Jia-Wei Tang
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Jing-Yi Mou
- The First School of Clinical Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Jing-Wen Lyu
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Yu-Wei Di
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Ya-Long Liao
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Yan-Fei Luo
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Zheng-Kang Li
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- *Correspondence: Zheng-Kang Li
- Xiang Wu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- *Correspondence: Xiang Wu
- Liang Wang
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- *Correspondence: Liang Wang
4
Warchol S, Krueger R, Nirmal AJ, Gaglia G, Jessup J, Ritch CC, Hoffer J, Muhlich J, Burger ML, Jacks T, Santagata S, Sorger PK, Pfister H. Visinity: Visual Spatial Neighborhood Analysis for Multiplexed Tissue Imaging Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:106-116. [PMID: 36170403 PMCID: PMC10043053 DOI: 10.1109/tvcg.2022.3209378] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
New highly multiplexed imaging technologies have enabled the study of tissues in unprecedented detail. These methods are increasingly being applied to understand how cancer cells and the immune response change during tumor development, progression, and metastasis, as well as following treatment. Yet, existing analysis approaches focus on investigating small tissue samples on a per-cell basis, not taking into account the spatial proximity of cells, which indicates cell-cell interaction and specific biological processes in the larger cancer microenvironment. We present Visinity, a scalable visual analytics system to analyze cell interaction patterns across cohorts of whole-slide multiplexed tissue images. Our approach is based on a fast regional neighborhood computation, leveraging unsupervised learning to quantify, compare, and group cells by their surrounding cellular neighborhood. These neighborhoods can be visually analyzed in an exploratory and confirmatory workflow. Users can explore spatial patterns present across tissues through a scalable image viewer and coordinated views highlighting the neighborhood composition and spatial arrangements of cells. To verify or refine existing hypotheses, users can query for specific patterns to determine their presence and statistical significance. Findings can be interactively annotated, ranked, and compared in the form of small multiples. In two case studies with biomedical experts, we demonstrate that Visinity can identify common biological processes within a human tonsil and uncover novel white blood cell networks and immune-tumor interactions.
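A regional neighborhood for each cell can be summarized as the fraction of each cell type within a fixed radius. The sketch below is a brute-force, hypothetical version of that idea (the radius, the fraction-based encoding, and the function name are assumptions, not Visinity's implementation); a system that scales to whole slides would use a spatial index rather than the O(n^2) distance matrix used here. The resulting vectors could then be clustered with any unsupervised method to group cells by neighborhood.

```python
import numpy as np


def neighborhood_composition(xy, cell_types, n_types, radius):
    """Fraction of each cell type among neighbors within `radius` of each cell.

    xy: (n, 2) cell centroids; cell_types: (n,) integer labels in [0, n_types).
    Returns an (n, n_types) array; rows for isolated cells are all zeros.
    """
    xy = np.asarray(xy, dtype=float)
    d2 = np.sum((xy[:, None, :] - xy[None, :, :]) ** 2, axis=-1)
    within = d2 <= radius ** 2
    np.fill_diagonal(within, False)                  # exclude the cell itself
    onehot = np.eye(n_types)[np.asarray(cell_types)]
    counts = within.astype(float) @ onehot           # neighbor counts per type
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


xy = [[0, 0], [1, 0], [0, 1], [10, 10]]
types = [0, 1, 1, 0]
comp = neighborhood_composition(xy, types, n_types=2, radius=1.5)
```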
5
Goh HA, Ho CK, Abas FS. Front-end deep learning web apps development and deployment: a review. APPL INTELL 2022; 53:15923-15945. [PMID: 36466774 PMCID: PMC9709375 DOI: 10.1007/s10489-022-04278-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/17/2022] [Indexed: 12/03/2022]
Abstract
Machine learning and deep learning models are commonly developed using programming languages such as Python, C++, or R and deployed as web apps delivered from a back-end server or as mobile apps installed from an app store. Recently, however, front-end technologies and JavaScript libraries, such as TensorFlow.js, have been introduced to make machine learning more accessible to researchers and end-users. Using JavaScript, TensorFlow.js can define, train, and run new or existing pre-trained machine learning models entirely in the browser on the client side, which improves the user experience through interaction while preserving privacy. Deep learning models deployed in front-end browsers must be small, have fast inference, and ideally be interactive in real time. Therefore, the emphasis in development and deployment is different. This paper aims to review the development and deployment of these deep learning web apps to raise awareness of recent advancements and to encourage more researchers to take advantage of this technology in their own work. First, the rationale behind the deployment stack (front-end, JavaScript, and TensorFlow.js) is discussed. Then, the development approach for obtaining deep learning models that are optimized and suitable for front-end deployment is described. The article also presents current web applications, divided into seven categories, to show the potential of deep learning on the front end. These include web apps for deep learning playgrounds, pose detection and gesture tracking, music and art creation, expression detection and facial recognition, video segmentation, image and signal analysis, healthcare diagnosis, recognition, and identification.
Affiliation(s)
- Hock-Ann Goh
- Faculty of Engineering and Technology, Multimedia University, Jalan Ayer Keroh Lama, Bukit Beruang, 75450 Melaka, Malaysia
- Chin-Kuan Ho
- Asia Pacific University of Technology and Innovation, Jalan Teknologi 5, Technology Park Malaysia, 57000 Kuala Lumpur, Malaysia
- Fazly Salleh Abas
- Faculty of Engineering and Technology, Multimedia University, Jalan Ayer Keroh Lama, Bukit Beruang, 75450 Melaka, Malaysia
6
Cai Y, He Q, Liu W, Liang Q, Peng B, Li J, Zhang W, Kang F, Hong Q, Yan Y, Peng J, Xu Z, Bai N. Comprehensive analysis of the potential cuproptosis-related biomarker LIAS that regulates prognosis and immunotherapy of pan-cancers. Front Oncol 2022; 12:952129. [PMID: 35982953 PMCID: PMC9379260 DOI: 10.3389/fonc.2022.952129] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 06/07/2022] [Accepted: 07/11/2022] [Indexed: 12/14/2022]
Abstract
Lipoic acid synthetase (LIAS) has been demonstrated to play a crucial role in the progression of cancer. Exploring the underlying mechanisms and biological functions of LIAS could provide therapeutic guidance for cancer treatment. Our study explored the expression levels and prognostic values of LIAS in pan-cancer through several bioinformatics platforms, including TIMER2.0, Gene Expression Profiling Interactive Analysis, version 2 (GEPIA2.0), and the Human Protein Atlas (HPA). We found that high LIAS expression was associated with good prognosis in patients with kidney renal clear cell carcinoma (KIRC), rectum adenocarcinoma (READ), breast cancer, and ovarian cancer. Conversely, high LIAS expression was associated with unfavorable prognosis in lung cancer patients. In addition, the genetic alterations, methylation levels, and immune-related characteristics of LIAS in pan-cancer were evaluated. To elucidate the underlying molecular mechanism of LIAS, we performed single-cell sequencing analysis, which indicated that LIAS expression was related to hypoxia, angiogenesis, and DNA repair. Thus, these comprehensive pan-cancer analyses indicate that LIAS could be potentially significant in the progression of various cancers. Moreover, LIAS expression could predict the efficacy of immunotherapy in cancer patients.
Affiliation(s)
- Yuan Cai
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, China
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- Qingchun He
- Department of Emergency, Xiangya Hospital, Central South University, Changsha, China
- Department of Emergency, Xiangya Changde Hospital, Changde, China
- Wei Liu
- Department of Orthopedic Surgery, The Second Hospital University of South China, Hengyang, China
- Qiuju Liang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, China
- Bi Peng
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, China
- Jianbo Li
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- Wenqin Zhang
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- Fanhua Kang
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- Qianhui Hong
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- Yuanliang Yan
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, China
- Jinwu Peng
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, China
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Zhijie Xu
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, China
- Department of Pathology, Xiangya Changde Hospital, Changde, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Ning Bai
- Department of General Surgery, Xiangya Hospital, Central South University, Changsha, China
7
Research on E-Commerce Database Marketing Based on Machine Learning Algorithm. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:7973446. [PMID: 35814538 PMCID: PMC9259266 DOI: 10.1155/2022/7973446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 05/16/2022] [Indexed: 11/18/2022]
Abstract
The evolution from simple commercial relations to today's complex online transactions not only highlights the progress of science and technology but also reflects the diversification of marketing methods and means. Database marketing has been favored by more and more marketers for its low cost and high efficiency and has become the “rookie” of marketing in recent years. However, as a predictive tool and a bridge to the market, database marketing tends to be applied after only simple data analysis in unpredictable markets and in practice. In contrast, database marketing combined with machine learning algorithms has remained an underexplored area in the marketing field. Therefore, this paper takes e-commerce as the research object and carries out database marketing research based on machine learning algorithms in four stages: theoretical preparation, status analysis, model construction, and results application. First, the connotation, advantages, and specific operating procedures of database marketing are discussed. At the same time, four well-established machine learning algorithms, namely logistic regression, random forest, support vector machine, and gradient boosted decision tree (GBDT), are selected and their basic principles are introduced, laying a theoretical foundation for the model training chapter. Second, the paper analyzes the current situation of e-commerce in terms of the distribution of marketing objects, the proportion of marketing channels, and the composition of marketing methods, and identifies new marketing ideas based on the main problems in current database marketing using machine learning algorithms. Third, based on these marketing ideas, data are acquired and processed, and positive and negative samples are defined. At the same time, the four machine learning algorithms are used to combine features from the perspectives of consumers, stores, and the relationship between consumers and stores.
Finally, by feeding the prediction sample into the trained model for testing, consumers whose predicted scores fall between 80 and 99 are selected as the model-predicted crowd to be targeted in the market, and it is proposed that e-commerce should mainly adopt the model-prediction method of database marketing. On the one hand, machine learning algorithms can address the uneven distribution of marketing objects; on the other hand, they can effectively prevent the loss of potential consumers. In addition, strategies are proposed for optimizing other database marketing methods and using them to assist model prediction to improve marketing effectiveness.
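The final selection step can be sketched as a simple score-band filter. This assumes, hypothetically, that each consumer's score is a model probability scaled to 0-100 and truncated to an integer, and that the 80-99 band is inclusive; the abstract specifies neither, and the function name is illustrative.

```python
import numpy as np


def select_target_crowd(consumer_ids, probabilities, low=80, high=99):
    """Return IDs whose scaled score falls in [low, high] (inclusive, assumed)."""
    scores = np.floor(np.asarray(probabilities, dtype=float) * 100)  # 0-100 scale
    mask = (scores >= low) & (scores <= high)
    return [cid for cid, keep in zip(consumer_ids, mask) if keep]


ids = ["c1", "c2", "c3", "c4"]
probs = [0.42, 0.85, 0.995, 1.0]
targets = select_target_crowd(ids, probs)
```

Here the upper bound of 99 excludes consumers scored at 100, matching the stated band.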
8
Zhou Z, Zu X, Wang Y, Lelieveldt BPF, Tao Q. Deep Recursive Embedding for High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:1237-1248. [PMID: 34699363 DOI: 10.1109/tvcg.2021.3122388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Embedding high-dimensional data onto a low-dimensional manifold is of both theoretical and practical value. In this article, we propose to combine deep neural networks (DNN) with mathematics-guided embedding rules for high-dimensional data embedding. We introduce a generic deep embedding network (DEN) framework, which is able to learn a parametric mapping from high-dimensional space to low-dimensional space, guided by well-established objectives such as Kullback-Leibler (KL) divergence minimization. We further propose a recursive strategy, called deep recursive embedding (DRE), to make use of the latent data representations for boosted embedding performance. We demonstrate the flexibility of DRE with different architectures and loss functions and benchmark our method against the two most popular embedding methods, namely t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). The proposed DRE method can map out-of-sample data and scale to extremely large datasets. Experiments on a range of public datasets demonstrated improved embedding performance in terms of local and global structure preservation, compared with other state-of-the-art embedding methods. Code is available at https://github.com/tao-aimi/DeepRecursiveEmbedding.
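The KL-divergence objective referred to above is the one used by t-SNE: high-dimensional affinities P are matched by low-dimensional Student-t affinities Q computed from the embedding Y. The sketch below evaluates that standard objective with NumPy; it is a generic illustration, not DRE's network, recursive strategy, or exact loss, and the matrix P is a hypothetical symmetric affinity matrix that sums to 1.

```python
import numpy as np


def student_t_affinities(Y):
    """Low-dimensional affinities q_ij proportional to (1 + ||y_i - y_j||^2)^-1."""
    Y = np.asarray(Y, dtype=float)
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)            # q_ii is defined as 0
    return W / W.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij), skipping p_ij = 0 terms."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


Y = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
Q = student_t_affinities(Y)
P = np.array([[0.0, 0.4, 0.1], [0.4, 0.0, 0.0], [0.1, 0.0, 0.0]])  # sums to 1
loss = kl_divergence(P, Q)
```

In a parametric approach such as the one described, Y would be the output of a network and this loss would be minimized with respect to the network weights rather than the coordinates directly.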
9
Koolstra K, Börnert P, Lelieveldt BPF, Webb A, Dzyubachyk O. Stochastic neighbor embedding as a tool for visualizing the encoding capability of magnetic resonance fingerprinting dictionaries. MAGNETIC RESONANCE MATERIALS IN PHYSICS BIOLOGY AND MEDICINE 2021; 35:223-234. [PMID: 34687369 PMCID: PMC8995272 DOI: 10.1007/s10334-021-00963-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 09/08/2021] [Accepted: 09/23/2021] [Indexed: 11/28/2022]
Abstract
Objective: To visualize the encoding capability of magnetic resonance fingerprinting (MRF) dictionaries. Materials and methods: High-dimensional MRF dictionaries were simulated and embedded into a lower-dimensional space using t-distributed stochastic neighbor embedding (t-SNE). The embeddings were visualized via colors as a surrogate for location in low-dimensional space. First, we illustrate this technique on three different MRF sequences. We then compare the resulting embeddings and color-coded dictionary maps to those obtained with a singular value decomposition (SVD) dimensionality reduction technique. We validate the t-SNE approach against existing quantitative measures of encoding capability based on the Euclidean distance. Finally, we use t-SNE to visualize MRF sequences resulting from an MRF sequence optimization algorithm. Results: t-SNE showed clear differences between the color-coded dictionary maps of the three MRF sequences, whereas SVD showed smaller differences between sequences. These findings were confirmed by quantitative measures of encoding. t-SNE also visualized differences in encoding capability between successive iterations of an MRF sequence optimization algorithm. Discussion: This visualization approach enables comparison of the encoding capability of different MRF sequences and can be used as a confirmation tool in MRF sequence optimization. Supplementary Information: The online version contains supplementary material available at 10.1007/s10334-021-00963-8.
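One way to use embedding coordinates as a color surrogate for location is to map a three-dimensional t-SNE embedding of the dictionary entries onto RGB values. The sketch below assumes exactly that; the 3-D embedding, the per-axis min-max scaling, and the helper name are all assumptions rather than the authors' published procedure, and the embedding itself could come from any t-SNE implementation.

```python
import numpy as np


def embedding_to_rgb(Y):
    """Min-max scale each embedding axis to [0, 1] and read the axes as R, G, B.

    Y: (n_entries, 3) embedding coordinates. Constant axes map to 0.
    """
    Y = np.asarray(Y, dtype=float)
    lo = Y.min(axis=0)
    span = Y.max(axis=0) - lo
    span = np.where(span > 0, span, 1.0)    # avoid dividing by zero
    return (Y - lo) / span


Y = np.array([[0.0, -2.0, 5.0], [2.0, 2.0, 5.0], [1.0, 0.0, 5.0]])
colors = embedding_to_rgb(Y)
```

Under this scheme, dictionary entries that land close together in the embedding receive similar colors, so large uniformly colored regions in a color-coded dictionary map would suggest entries that a sequence encodes less distinctly.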
Affiliation(s)
- Kirsten Koolstra
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Peter Börnert
- C. J. Gorter Center for High Field MRI, Department of Radiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Philips Research Hamburg, Röntgenstrasse 24, 22335, Hamburg, Germany
- Boudewijn P F Lelieveldt
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Intelligent Systems Department, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
- Andrew Webb
- C. J. Gorter Center for High Field MRI, Department of Radiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Oleh Dzyubachyk
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Electron Microscopy Facility, Department of Cell and Chemical Biology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
10
Kang B, Seok C, Lee J. MOLGENGO: Finding Novel Molecules with Desired Electronic Properties by Capitalizing on Their Global Optimization. ACS OMEGA 2021; 6:27454-27465. [PMID: 34693166 PMCID: PMC8529683 DOI: 10.1021/acsomega.1c04347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/13/2021] [Accepted: 09/17/2021] [Indexed: 06/13/2023]
Abstract
The discovery of novel and favorable fluorophores is critical for many chemical and biological studies. High-resolution biological imaging requires fluorophores with diverse colors and high quantum yields. The maximum oscillator strength of a molecule and its corresponding absorption wavelength are closely related to the quantum yield and the emission spectrum of a fluorophore, respectively. Thus, the core step in designing favorable fluorophore molecules is to optimize the desired electronic transition properties of molecules. Here, we present MOLGENGO, a new molecular property optimization algorithm, to discover novel and favorable fluorophores with machine learning and global optimization. This study reports novel molecules from MOLGENGO with high oscillator strength and absorption wavelengths close to 200, 400, and 600 nm. The molecules obtained from MOLGENGO simulations are potential candidates for new fluorophore frameworks.
Affiliation(s)
- Beomchang Kang
- Department of Chemistry, Seoul National University, 08826 Seoul, Republic of Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, 08826 Seoul, Republic of Korea
- Juyong Lee
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, 24341 Chuncheon, Republic of Korea
11
Aevermann B, Zhang Y, Novotny M, Keshk M, Bakken T, Miller J, Hodge R, Lelieveldt B, Lein E, Scheuermann RH. A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing. Genome Res 2021; 31:1767-1780. [PMID: 34088715 PMCID: PMC8494219 DOI: 10.1101/gr.275569.121] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 03/27/2021] [Accepted: 05/24/2021] [Indexed: 11/24/2022]
Abstract
Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently underway. As a result, it is critical that the transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by their molecular drivers. Here, we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the nonlinear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that optimally capture the cell type identity represented in complete scRNA-seq transcriptional profiles. The marker genes selected provide an expression barcode that serves as both a useful tool for downstream biological investigation and the necessary and sufficient characteristics for semantic cell type definition. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and noncoding RNAs in neuronal cell type identity.
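A binary expression score rewards genes that are "on" in a target cluster and "off" elsewhere. The sketch below is one plausible form of such a score, based on per-cluster median expression; it is a hypothetical illustration and not necessarily NS-Forest's exact definition, which should be taken from the paper or its code. In a full pipeline, candidate genes would first be ranked by random forest feature importance and then re-ranked by a score of this kind.

```python
import numpy as np


def binary_expression_score(expr, labels, gene, target):
    """Hypothetical on/off specificity score in [0, 1] for one gene.

    expr: (n_cells, n_genes) expression matrix; labels: (n_cells,) cluster IDs.
    Averages max(0, 1 - median_other / median_target) over non-target clusters.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    medians = {c: np.median(expr[labels == c, gene]) for c in np.unique(labels)}
    m_target = medians[target]
    if m_target <= 0:
        return 0.0                          # gene is not "on" in the target
    terms = [max(0.0, 1.0 - m / m_target)
             for c, m in medians.items() if c != target]
    return float(np.mean(terms)) if terms else 0.0


expr = np.array([[5, 1], [6, 1], [0, 1], [0, 1], [0, 1], [0, 1]])
labels = np.array([0, 0, 1, 1, 2, 2])
marker = binary_expression_score(expr, labels, gene=0, target=0)
housekeeping = binary_expression_score(expr, labels, gene=1, target=0)
```

A gene expressed only in the target cluster scores 1, while a uniformly expressed gene scores 0, which is the on/off contrast that makes a marker useful in an expression barcode.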
Affiliation(s)
- Yun Zhang
- J. Craig Venter Institute, La Jolla, California 92037, USA
- Mark Novotny
- J. Craig Venter Institute, La Jolla, California 92037, USA
- Mohamed Keshk
- J. Craig Venter Institute, La Jolla, California 92037, USA
- Trygve Bakken
- Allen Institute for Brain Science, Seattle, Washington 98109, USA
- Jeremy Miller
- Allen Institute for Brain Science, Seattle, Washington 98109, USA
- Rebecca Hodge
- Allen Institute for Brain Science, Seattle, Washington 98109, USA
- Boudewijn Lelieveldt
- Department of Radiology, Leiden University Medical Center, 2300 Leiden, The Netherlands
- Department of Intelligent Systems, Delft University of Technology, 2628 Delft, The Netherlands
- Ed Lein
- Allen Institute for Brain Science, Seattle, Washington 98109, USA
- Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, California 92037, USA
- University of California San Diego, La Jolla, California 92093, USA
- La Jolla Institute for Immunology, La Jolla, California 92037, USA
12
SFE-GACN: A novel unknown attack detection under insufficient data via intra categories generation in embedding space. Comput Secur 2021. [DOI: 10.1016/j.cose.2021.102262] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
13
Latent Dirichlet Allocation and t-Distributed Stochastic Neighbor Embedding Enhance Scientific Reading Comprehension of Articles Related to Enterprise Architecture. AI 2021. [DOI: 10.3390/ai2020011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
As the amount of scientific information steadily increases, it is crucial to improve fast-reading comprehension. To grasp many scientific articles in a short period, artificial intelligence becomes essential. This paper aims to apply artificial intelligence methodologies to examine broad topics such as enterprise architecture in scientific articles. Analyzing abstracts with latent Dirichlet allocation or inverse document frequency appears to be more beneficial than exploring full texts. Furthermore, we demonstrate that t-distributed stochastic neighbor embedding is well suited to exploring the degree of connectivity to neighboring topics, such as complexity theory. Artificial intelligence produces results that are similar to those obtained by manual reading. Our full-text study confirms enterprise architecture trends such as sustainability and modeling languages.
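Inverse document frequency down-weights terms that occur in many abstracts, so shared vocabulary contributes little while distinctive terms stand out. A minimal TF-IDF sketch in plain Python follows, using the common idf(t) = log(N / df(t)) variant; other smoothed variants exist, and this choice, the tokenized toy abstracts, and the function name are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weights per document, with idf(t) = log(N / df(t)).

    docs: list of non-empty token lists. Returns a list of {term: weight}
    dicts, where tf is the term count divided by the document length.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return weights


abstracts = [
    ["enterprise", "architecture", "modeling"],
    ["enterprise", "architecture", "sustainability"],
    ["complexity", "theory", "architecture"],
]
w = tf_idf(abstracts)
```

A term such as "architecture" that appears in every abstract receives zero weight, while terms unique to one abstract receive the largest weight.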
14
Häkkinen A, Koiranen J, Casado J, Kaipio K, Lehtonen O, Petrucci E, Hynninen J, Hietanen S, Carpén O, Pasquini L, Biffoni M, Lehtonen R, Hautaniemi S. qSNE: quadratic rate t-SNE optimizer with automatic parameter tuning for large datasets. Bioinformatics 2021; 36:5086-5092. [PMID: 32663244 PMCID: PMC7755412 DOI: 10.1093/bioinformatics/btaa637] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/31/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 01/04/2023]
Abstract
Motivation: Non-parametric dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE), are the most frequently used methods in the exploratory analysis of single-cell datasets. Current implementations scale poorly to massive datasets and often require downsampling or interpolative approximations, which can leave less-frequent populations undiscovered and much information unexploited. Results: We implemented a fast t-SNE package, qSNE, which uses a quasi-Newton optimizer, allowing a quadratic convergence rate, and an automatic perplexity (level of detail) optimizer. Our results show that these improvements make qSNE significantly faster than regular t-SNE packages and enable full analysis of large datasets, such as mass cytometry data, without downsampling. Availability and implementation: Source code and documentation are openly available at https://bitbucket.org/anthakki/qsne/. Supplementary information: Supplementary data are available at Bioinformatics online.
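Perplexity sets the effective number of neighbors each point considers. Standard t-SNE implementations calibrate a per-point Gaussian precision beta by bisection so that the conditional distribution over neighbors has entropy log(perplexity). The sketch below shows that classic calibration step for a single point; it is not qSNE's quasi-Newton optimizer or its automatic perplexity selection, and the tolerance, iteration limit, and function names are assumptions.

```python
import numpy as np


def conditional_probs(dist2_row, beta):
    """Gaussian conditionals p_j|i proportional to exp(-beta * d_ij^2)."""
    d = np.asarray(dist2_row, dtype=float)
    p = np.exp(-beta * (d - d.min()))          # shift for numerical stability
    return p / p.sum()


def calibrate_beta(dist2_row, perplexity, tol=1e-5, max_iter=100):
    """Bisection on beta so that entropy(p), in nats, matches log(perplexity)."""
    target = np.log(perplexity)
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = conditional_probs(dist2_row, beta)
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target) < tol:
            break
        if entropy > target:                   # too flat: sharpen the kernel
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:                                  # too peaked: widen the kernel
            hi = beta
            beta = (beta + lo) / 2.0
    return beta, conditional_probs(dist2_row, beta)


d2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # squared distances to 5 neighbors
beta, p = calibrate_beta(d2, perplexity=3.0)
```

The achievable perplexity for a point ranges from 1 (all mass on the nearest neighbor) up to the number of neighbors (uniform weights), so the target must lie in that range; repeating this search for every point is part of what makes exact t-SNE costly on massive datasets.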
Affiliation(s)
- Antti Häkkinen
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Juha Koiranen
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Julia Casado
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Katja Kaipio
  - Research Center for Cancer, Infections and Immunity, Institute of Biomedicine, University of Turku, Turku 20014, Finland
- Oskari Lehtonen
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Eleonora Petrucci
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome 00161, Italy
- Johanna Hynninen
  - Department of Obstetrics and Gynecology, University of Turku and Turku University Hospital, Turku 20521, Finland
- Sakari Hietanen
  - Department of Obstetrics and Gynecology, University of Turku and Turku University Hospital, Turku 20521, Finland
- Olli Carpén
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
  - Research Center for Cancer, Infections and Immunity, Institute of Biomedicine, University of Turku, Turku 20014, Finland
  - Department of Pathology, University of Helsinki and HUSLAB, Helsinki University Hospital, Helsinki 00014, Finland
- Luca Pasquini
  - Major Equipments and Core Facilities, Istituto Superiore di Sanità, Rome 00161, Italy
- Mauro Biffoni
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome 00161, Italy
- Rainer Lehtonen
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Sampsa Hautaniemi
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
15
Zhu M, Chen W, Hu Y, Hou Y, Liu L, Zhang K. DRGraph: An Efficient Graph Layout Algorithm for Large-scale Graphs by Dimensionality Reduction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1666-1676. [PMID: 33275582 DOI: 10.1109/tvcg.2020.3030447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Efficient layout of large-scale graphs remains a challenging problem: force-directed and dimensionality reduction-based methods suffer from high overhead for computing graph distances and gradients. In this paper, we present a new graph layout algorithm, called DRGraph, that enhances the nonlinear dimensionality reduction process with three schemes: approximating graph distances by means of a sparse distance matrix, estimating the gradient with the negative sampling technique, and accelerating the optimization process through a multi-level layout scheme. DRGraph achieves linear complexity in both computation and memory consumption and scales to graphs with millions of nodes. Experimental results and comparisons with state-of-the-art graph layout methods demonstrate that DRGraph can generate visually comparable layouts with a faster running time and a lower memory requirement.
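Negative sampling replaces the all-pairs repulsive term of a force-directed or t-SNE-style layout with a few randomly drawn nodes per edge update, so one pass over the edges costs roughly O(|E| * m) for m samples instead of O(|V|^2). The following pure-Python sketch shows that idea only, under assumed choices: a linear pull along each edge and a Cauchy-style 1/(1 + d^2) weighted push against sampled nodes. The update rule, learning rate, and the name `layout_negative_sampling` are hypothetical and are not DRGraph's actual gradient, which additionally relies on a sparse distance matrix and a multi-level scheme.

```python
import math
import random


def layout_negative_sampling(n_nodes, edges, n_epochs=200, n_neg=5, lr=0.05, seed=0):
    """Toy 2-D layout: attract edge endpoints, and repel each edge's source
    from n_neg randomly sampled nodes instead of from every node."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_nodes)]
    for _ in range(n_epochs):
        for i, j in edges:
            # Attraction: move both endpoints toward each other.
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            pos[i][0] += lr * dx
            pos[i][1] += lr * dy
            pos[j][0] -= lr * dx
            pos[j][1] -= lr * dy
            # Repulsion: push node i away from a few random "negative" nodes.
            for _ in range(n_neg):
                s = rng.randrange(n_nodes)
                if s == i:
                    continue  # never repel a node from itself
                rx = pos[i][0] - pos[s][0]
                ry = pos[i][1] - pos[s][1]
                w = 1.0 / (1.0 + rx * rx + ry * ry)  # bounded, heavy-tailed kernel
                pos[i][0] += lr * w * rx
                pos[i][1] += lr * w * ry
    return pos


# Hypothetical graph: two disjoint edges on four nodes.
positions = layout_negative_sampling(4, [(0, 1), (2, 3)])
print(positions)
```

With `n_neg=0` the repulsive term disappears and each connected component collapses to a single point, which illustrates why some estimate of the repulsive gradient is required; sampling it keeps that estimate cheap.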
16
Abdelaal T, de Raadt P, Lelieveldt BPF, Reinders MJT, Mahfouz A. SCHNEL: scalable clustering of high dimensional single-cell data. Bioinformatics 2020; 36:i849-i856. [PMID: 33381821 DOI: 10.1093/bioinformatics/btaa816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/07/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Single-cell data measure multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step toward further biological understanding and is usually performed by clustering these data. Dimensionality reduction-based clustering tools are either not scalable to large datasets containing millions of cells or not fully automated, requiring an initial manual estimate of the number of clusters. Graph clustering tools provide automated and reliable clustering for single-cell data but scale poorly to large datasets. RESULTS We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data into a hierarchy of datasets containing subsets of data points that follow the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. On seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data and produced meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as to the popular machine learning benchmark dataset MNIST. AVAILABILITY AND IMPLEMENTATION Implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
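The abstract describes clustering a reduced level of a data hierarchy and thereby making graph clustering scalable. The toy sketch below imitates only that two-level idea under heavily simplified assumptions: landmarks are a random subsample (not SCHNEL's manifold-following hierarchy), the landmark graph is a k-nearest-neighbor graph clustered by its connected components (a stand-in for a real graph clustering method), and every point inherits the cluster of its nearest landmark. All function names and data are hypothetical.

```python
import math
import random


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph as a list of adjacency sets."""
    adj = [set() for _ in points]
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        for j in sorted(others, key=lambda j: math.dist(p, points[j]))[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Label each node with the index of its connected component."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def two_level_cluster(points, n_landmarks, k=5, seed=0):
    """Cluster a landmark subset, then propagate labels to all points."""
    rng = random.Random(seed)
    landmarks = [points[i] for i in rng.sample(range(len(points)), n_landmarks)]
    # Coarse level: build and cluster a graph over the landmarks only.
    lm_labels = connected_components(knn_graph(landmarks, k))
    # Fine level: each point takes the label of its nearest landmark.
    return [
        lm_labels[min(range(n_landmarks), key=lambda m: math.dist(p, landmarks[m]))]
        for p in points
    ]


# Hypothetical data: two well-separated 5x5 grids of points.
blob_a = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
blob_b = [(100 + x, 100 + y) for x, y in blob_a]
labels = two_level_cluster(blob_a + blob_b, n_landmarks=30, k=3)
print(labels)
```

Only the landmark graph is ever built, so the expensive graph step scales with the number of landmarks rather than with the number of cells; the full points are touched only in the cheap nearest-landmark assignment. SCHNEL's actual hierarchy construction and clustering differ from this toy.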
Affiliation(s)
- Tamim Abdelaal
  - Delft Bioinformatics Lab, Delft University of Technology, 2628 XE Delft, The Netherlands
  - Leiden Computational Biology Center
- Boudewijn P F Lelieveldt
  - Delft Bioinformatics Lab, Delft University of Technology, 2628 XE Delft, The Netherlands
  - Leiden Computational Biology Center
- Marcel J T Reinders
  - Delft Bioinformatics Lab, Delft University of Technology, 2628 XE Delft, The Netherlands
  - Leiden Computational Biology Center
  - Department of Human Genetics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Ahmed Mahfouz
  - Delft Bioinformatics Lab, Delft University of Technology, 2628 XE Delft, The Netherlands
  - Leiden Computational Biology Center
  - Department of Human Genetics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands