1
|
ACP-ESM2: The prediction of anticancer peptides based on pre-trained classifier. Comput Biol Chem 2024; 110:108091. [PMID: 38735271 DOI: 10.1016/j.compbiolchem.2024.108091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/07/2024] [Accepted: 04/29/2024] [Indexed: 05/14/2024]
Abstract
Anticancer peptides (ACPs) are protein molecules with anti-cancer activity that can inhibit cancer cell growth and survival. Traditional classification approaches for ACPs are expensive and time-consuming. This paper proposes a pre-trained classifier model, ESM2-GRU, to make ACP prediction easier, improve understanding of the structural and functional differences among anti-cancer peptides, and support the design of more effective anti-cancer treatment strategies. The model consists of the ESM2 pre-trained model, a bidirectional GRU recurrent neural network, and a fully connected layer. ACP sequences are first fed into the ESM2 model; its output is expanded in dimension and passed to the bidirectional GRU, and the fully connected layer then produces the final output. Experimental validation demonstrates that the ESM2-GRU model greatly improves classification performance on the benchmark dataset ACP606, with AUC, ACC, and MCC values of 0.975, 0.852, and 0.738, respectively. This predictive performance helps to identify specific types of anti-cancer peptides, improving their targeting and selectivity and thereby furthering the development of tailored medicines and treatments.
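For readers unfamiliar with the reported metrics, the following minimal sketch shows how ACC and MCC are computed from a binary confusion matrix. The function name and the counts in the example are hypothetical and are not taken from the ACP606 experiments.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) for binary counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    # MCC denominator: geometric mean of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts for 100 peptides (50 ACP, 50 non-ACP).
acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC for peptide classifiers.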
|
2
|
CNN-GMM approach to identifying data distribution shifts in forgeries caused by noise: a step towards resolving the deepfake problem. PeerJ Comput Sci 2024; 10:e1991. [PMID: 38660187 PMCID: PMC11042019 DOI: 10.7717/peerj-cs.1991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
Recently, there have been notable advancements in video editing software. These advancements have allowed novices or those without access to advanced computer technology to generate videos that are visually indistinguishable from real ones to the human observer. Therefore, the application of deepfake technology has the potential to expand the scope of identity theft, which poses a significant risk and a formidable challenge to global security. The development of an effective approach for detecting fake videos is necessary. Here, we introduce a novel methodology that employs a convolutional neural network (CNN) and Gaussian mixture model (GMM) to effectively differentiate between fake and real images or videos. The proposed methodology presents a novel CNN-GMM architecture in which the fully connected (FC) layer in the CNN is replaced with a customized GMM fully connected layer. The GMM layer utilizes a weighted set of Gaussian probability density functions (PDFs) to represent the distribution of data frequencies in both real and fake images. This representation indicates that there is a shift in the distribution of the manipulated images due to added noise. The CNN-GMM model demonstrates the ability to accurately identify variations resulting from different types of deepfakes within the probability distribution. It achieves a high level of classification accuracy, reaching up to 100% in training accuracy and up to 96% in validation accuracy. Although the ratio of the genuine class to the counterfeit class was 16.6% to 83.4%, the CNN-GMM model achieved high recall, accuracy, and F-score when classifying the minority genuine class.
|
3
|
Enhanced lung cancer detection: Integrating improved random walker segmentation with artificial neural network and random forest classifier. Heliyon 2024; 10:e29032. [PMID: 38617949 PMCID: PMC11015404 DOI: 10.1016/j.heliyon.2024.e29032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 03/22/2024] [Accepted: 03/28/2024] [Indexed: 04/16/2024] Open
Abstract
Background Medical image segmentation is a vital yet difficult job because of the multimodality of the acquired images. It is difficult to locate the affected area before it spreads. Methods This research makes use of several machine learning tools, including an artificial neural network and a random forest classifier, to increase the reliability of pulmonary nodule classification. Anisotropic diffusion filtering is initially used to remove noise from an image. After that, a modified random walk method is used to obtain the region of interest inside the lung parenchyma. Next, features corresponding to the consistency of the image segments are extracted using texture-based feature extraction for pulmonary nodules. The final stage is to identify and classify the pulmonary nodules using a classifier algorithm. Results The studies employ cross-validation to demonstrate the validity of the diagnosis framework. The proposed method is tested using CT scan data provided by the Lung Image Database Consortium. A random forest classifier showed a 99.6 percent accuracy rate for detecting lung cancer, compared to an artificial neural network's 94.8 percent accuracy rate. Conclusions Current research is now primarily concerned with identifying lung nodules and classifying them as benign or malignant. The diagnostic potential of machine learning and image processing approaches is enormous for the categorization of lung cancer.
|
4
|
Head and neck cancer of unknown primary: unveiling primary tumor sites through machine learning on DNA methylation profiles. Clin Epigenetics 2024; 16:47. [PMID: 38528631 PMCID: PMC10964705 DOI: 10.1186/s13148-024-01657-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 03/13/2024] [Indexed: 03/27/2024] Open
Abstract
BACKGROUND The unknown tissue of origin in head and neck cancer of unknown primary (hnCUP) leads to invasive diagnostic procedures and unspecific and potentially inefficient treatment options for patients. The most common histologic subtype, squamous cell carcinoma, can stem from various primary tumor sites, including the oral cavity, oropharynx, larynx, head and neck skin, lungs, and esophagus. DNA methylation profiles are highly tissue-specific and have been successfully used to classify tissue origin. We therefore developed a support vector machine (SVM) classifier trained with publicly available DNA methylation profiles of commonly cervically metastasizing squamous cell carcinomas (n = 1103) in order to identify the primary tissue of origin of our own cohort of squamous cell hnCUP patients' samples (n = 28). Methylation analysis was performed with Infinium MethylationEPIC v1.0 BeadChip by Illumina. RESULTS The SVM algorithm achieved the highest overall accuracy of the tested classifiers, with 87%. Squamous cell hnCUP samples on DNA methylation level resembled squamous cell carcinomas commonly metastasizing into cervical lymph nodes. The most frequently predicted cancer localization was the oral cavity in 11 cases (39%), followed by the oropharynx and larynx (both 7, 25%), skin (2, 7%), and esophagus (1, 4%). These frequencies agree with the expected distribution of lymph node metastases in epidemiological studies. CONCLUSIONS On DNA methylation level, hnCUP is comparable to primary tumor tissue cancer types that commonly metastasize to cervical lymph nodes. Our SVM-based classifier can accurately predict these cancers' tissues of origin and could significantly reduce the invasiveness of hnCUP diagnostics and enable a more precise therapy after clinical validation.
|
5
|
Clinical practice of sepsis-induced immunosuppression: Current immunotherapy and future options. Chin J Traumatol 2024; 27:63-70. [PMID: 38040590 DOI: 10.1016/j.cjtee.2023.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 08/07/2023] [Accepted: 08/17/2023] [Indexed: 12/03/2023] Open
Abstract
Sepsis is a potentially fatal condition characterized by the failure of one or more organs due to a disordered host response to infection. The development of sepsis is closely linked to immune dysfunction. As a result, immunotherapy has gained traction as a promising approach to sepsis treatment, as it holds the potential to reverse immunosuppression and restore immune balance, thereby improving the prognosis of septic patients. However, due to the highly heterogeneous nature of sepsis, it is crucial to carefully select the appropriate patient population for immunotherapy. This review summarizes the current and evolved treatments for sepsis-induced immunosuppression to enhance clinicians' understanding and practical application of immunotherapy in the management of sepsis.
|
6
|
Improved predictive diagnosis of diabetic macular edema based on hybrid models: An observational study. Comput Biol Med 2024; 170:107979. [PMID: 38219645 DOI: 10.1016/j.compbiomed.2024.107979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 12/11/2023] [Accepted: 01/08/2024] [Indexed: 01/16/2024]
Abstract
Diabetic Macular Edema (DME) is the most common sight-threatening complication of type 2 diabetes. Optical Coherence Tomography (OCT) is the most useful imaging technique to diagnose, follow up, and evaluate treatments for DME. However, OCT exams and devices are expensive and not available in all clinics in low- and middle-income countries. Our primary goal was therefore to develop an alternative method to OCT for DME diagnosis by introducing spectral information derived from spontaneous electroretinogram (ERG) signals, either as a single input or combined with fundus imaging, which is much more widely available. Baseline ERGs were recorded in 233 patients and transformed into scalograms and spectrograms via Wavelet and Fourier transforms, respectively. Using transfer learning, distinct Convolutional Neural Networks (CNN) were trained as classifiers for DME using OCT, scalogram, spectrogram, and eye fundus images. Input data were randomly split into training and test sets with a proportion of 80 %-20 %, respectively. The top performers for each input type were selected, OpticNet-71 for OCT, DenseNet-201 for eye fundus, and non-evoked ERG-derived scalograms, to generate a combined model by assigning different weights for each of the selected models. Model validation was performed using a dataset alien to the training phase of the models. None of the models powered by mock ERG-derived input performed well. In contrast, hybrid models showed better results, in particular, the model powered by eye fundus combined with mock ERG-derived information with a 91 % AUC and 86 % F1-score, and the model powered by OCT and mock ERG-derived scalogram images with a 93 % AUC and 89 % F1-score. These data show that the spontaneous ERG-derived input adds predictive value to the fundus- and OCT-based models to diagnose DME, except for the sensitivity of the OCT model which remains the same.
The inclusion of mock ERG signals, which have recently been shown to take only 5 min to record in daylight conditions, therefore represents a potential improvement over existing OCT-based models, as well as a reliable and cost-effective alternative when combined with the fundus, especially in underserved areas, to predict DME.
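The combined model described above assigns a different weight to each selected model. A minimal sketch of one common way to do this, a weighted average of per-model probabilities, is shown here; the abstract does not state the exact combination rule, and the model names, weights, and probabilities are hypothetical.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model DME probabilities using a normalized weighted average."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(weights[name] * probabilities[name] for name in weights) / total


# Hypothetical per-model outputs for one patient and hypothetical weights.
probs = {"fundus": 0.8, "erg_scalogram": 0.6}
weights = {"fundus": 0.7, "erg_scalogram": 0.3}
combined = weighted_ensemble(probs, weights)
```

Normalizing by the weight sum keeps the result a valid probability even when the weights are not chosen to sum to 1.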
|
7
|
Fractal Analysis in Clinical Neurosciences: An Overview. ADVANCES IN NEUROBIOLOGY 2024; 36:261-271. [PMID: 38468037 DOI: 10.1007/978-3-031-47606-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Over the last years, fractals have entered into the realms of clinical neurosciences. The whole brain and its components (i.e., neurons and astrocytes) have been studied as fractal objects, and even more relevant, the fractal-based quantification of the geometrical complexity of histopathological and neuroradiological images as well as neurophysiopathological time series has suggested the existence of a gradient in the pattern representation of neurological diseases. Computational fractal-based parameters have been suggested as potential diagnostic and prognostic biomarkers in different brain diseases, including brain tumors, neurodegeneration, epilepsy, demyelinating diseases, cerebrovascular malformations, and psychiatric disorders as well. This chapter and the entire third section of this book are focused on practical applications of computational fractal-based analysis into the clinical neurosciences, namely, neurology and neuropsychiatry, neuroradiology and neurosurgery, neuropathology, neuro-oncology and neurorehabilitation, neuro-ophthalmology, and cognitive neurosciences, with special emphasis on the translation of the fractal dimension and other fractal parameters as clinical biomarkers useful from bench to bedside.
|
8
|
Computer-aided autism diagnosis using visual attention models and eye-tracking: replication and improvement proposal. BMC Med Inform Decis Mak 2023; 23:285. [PMID: 38098001 PMCID: PMC10722824 DOI: 10.1186/s12911-023-02389-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 12/04/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND Autism Spectrum Disorder (ASD) diagnosis can be aided by approaches based on eye-tracking signals. Recently, the feasibility of building Visual Attention Models (VAMs) from features extracted from visual stimuli and their use for classifying cases and controls has been demonstrated using Neural Networks and Support Vector Machines. The present work has three aims: 1) to evaluate whether the trained classifier from the previous study was general enough to classify new samples with a new stimulus; 2) to replicate the previous approach to train a new classifier with a new dataset; 3) to evaluate the performance of classifiers obtained by a new classification algorithm (Random Forest) using the previous and the current datasets. METHODS The previous approach was replicated with a new stimulus and a new sample, 44 from the Typical Development group and 33 from the ASD group. After the replication, a Random Forest classifier was tested as a substitute for the Neural Networks algorithm. RESULTS The test with the trained classifier reached an AUC of 0.56, suggesting that the trained classifier requires retraining of the VAMs when changing the stimulus. The replication results reached an AUC of 0.71, indicating the potential of generalization of the approach for aiding ASD diagnosis, as long as the stimulus is similar to the one originally proposed. The results achieved with Random Forest were superior to those achieved with the original approach, with an average AUC of 0.95 for the previous dataset and 0.74 for the new dataset. CONCLUSION In summary, the results of the replication experiment were satisfactory, which suggests the robustness of the approach and the feasibility of VAM-based approaches to aid in ASD diagnosis. The proposed method change improved the classification performance. Some limitations are discussed and additional studies are encouraged to test other conditions and scenarios.
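The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the function name, labels, and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: 1 = ASD group, 0 = Typical Development group.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

An AUC near 0.5, such as the 0.56 reported for the unretrained classifier, means the scores rank cases above controls only slightly more often than chance.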
|
9
|
Effects of water stress and fertilizer stress on maize growth and spectral identification of different stresses. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2023; 297:122703. [PMID: 37060655 DOI: 10.1016/j.saa.2023.122703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 03/30/2023] [Accepted: 04/01/2023] [Indexed: 05/14/2023]
Abstract
Water stress and fertilizer stress have a significant impact on the growth and yield of maize. In order to improve the timeliness and accuracy of irrigation and fertilizer application, it is crucial to monitor water stress and fertilizer stress rapidly and accurately. This would help in conserving water and fertilizer resources and ensuring a stable maize yield. To this end, pot experiments were set up to explore the growth differences and photosynthetic properties of maize under water stress and fertilizer stress. The hyperspectral technology was used to construct the spectral indexes that can distinguish stress types, and the classification algorithm was combined to identify stress types. The research has shown that the plant height, basal diameter, leaf area, and photosynthetic properties of maize decreased with an increase in drought stress. However, rewatering could compensate for drought stress. Furthermore, fertilizer stress also affected water uptake by plants, and high nitrogen stress had a significant negative effect on the growth of maize plants. We employed a combination of spectral indexes and the support vector machine (SVM) classification algorithm in a stepwise manner to identify stress types. Using the training dataset, we constructed six classifiers for distinguishing stress types, including the SVM classifier, K-nearest neighbor (KNN) classifier, naive Bayes (NB) classifier, decision tree (DT) classifier, random forest (RF) classifier, and AdaBoost classifier. Our results showed that the RF and AdaBoost classifiers obtained stable results in stress type differentiation, achieving accurate identification of unstressed, water stressed, and fertilizer stressed maize plants. This is expected to provide a solid basis and reference for monitoring crop stress types in agricultural fields.
|
10
|
Identification and assessment of differentially expressed necroptosis long non-coding RNAs associated with periodontitis in human. BMC Oral Health 2023; 23:632. [PMID: 37667236 PMCID: PMC10478209 DOI: 10.1186/s12903-023-03308-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 08/13/2023] [Indexed: 09/06/2023] Open
Abstract
BACKGROUND Periodontitis is the most common oral disease and is closely related to immune infiltration in the periodontal microenvironment and its poor prognosis is related to the complex immune response. The progression of periodontitis is closely related to necroptosis, but there is still no systematic study of long non-coding RNA (lncRNA) associated with necroptosis for diagnosis and treatment of periodontitis. MATERIAL AND METHODS Transcriptome data and clinical data of periodontitis and healthy populations were obtained from the Gene Expression Omnibus (GEO) database, and necroptosis-related genes were obtained from previously published literature. FactoMineR package in R was used to perform principal component analysis (PCA) for obtaining the necroptosis-related lncRNAs. The core necroptosis-related lncRNAs were screened by the Linear Models for Microarray Data (limma) package in R, PCA principal component analysis and lasso algorithm. These lncRNAs were then used to construct a classifier for periodontitis with logistic regression. The receiver operating characteristic (ROC) curve was used to evaluate the sensitivity and specificity of the model. The CIBERSORT method and ssGSEA algorithm were used to estimate the immune infiltration and immune pathway activation of periodontitis. Spearman's correlation analysis was used to further verify the correlation between core genes and periodontitis immune microenvironment. The expression level of core genes in human periodontal ligament cells (hPDLCs) was detected by RT-qPCR. RESULTS A total of 10 core necroptosis-related lncRNAs (10-lncRNAs) were identified, including EPB41L4A-AS1, FAM30A, LINC01004, MALAT1, MIAT, OSER1-DT, PCOLCE-AS1, RNF144A-AS1, CARMN, and LINC00582. The classifier for periodontitis was successfully constructed. The Area Under the Curve (AUC) was 0.952, which suggested that the model had good predictive performance. 
The correlation analysis of 10-lncRNAs and periodontitis immune microenvironment showed that 10-lncRNAs had an impact on the immune infiltration of periodontitis. Notably, the RT-qPCR results showed that the expression level of the 10-lncRNAs obtained was consistent with the chip analysis results. CONCLUSIONS The 10-lncRNAs identified from the GEO dataset had a significant impact on the immune infiltration of periodontitis and the classifier based on 10-lncRNAs had good detection efficiency for periodontitis, which provided a new target for diagnosis and treatment of periodontitis.
|
11
|
Bayesian network model structure based on binary evolutionary algorithm. PeerJ Comput Sci 2023; 9:e1466. [PMID: 37547397 PMCID: PMC10403175 DOI: 10.7717/peerj-cs.1466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 06/08/2023] [Indexed: 08/08/2023]
Abstract
With the continuous development of new technologies, the scale of training data is also expanding. Machine learning algorithms are gradually beginning to be studied and applied in places where the scale of data is relatively large. Because current structure learning algorithms focus only on identifying dependencies and ignore the direction of those dependencies, samples with multiple labels fail to be assigned to categories. Multiple labels need to be classified using techniques such as machine learning and then applied to solve the problem. With more training data available, it is very meaningful to explore structure extensions that identify the dependencies between attributes and take into account the direction of dependencies. This article studies Bayesian network structure learning, analyzes the shortcomings of traditional algorithms, and applies a binary evolutionary algorithm in which a randomized algorithm generates the initial population. In the optimization process, the algorithm uses a Bayesian network to do a local search and uses a depth-first algorithm to break loops. Finally, it finds a network structure with a higher score. In the simulation experiment, the classic data sets, ALARM and INSURANCE, are introduced to verify the effectiveness of the algorithm. Compared with NOTEARS and the Expectation-Maximization (EM) algorithm, the weight evaluation index of this article was 4.5% and 7.3% better, respectively. The clustering effect was improved by 13.5% and 15.2%. The smallest error and the highest accuracy are also better than those of the other schemes. The discussion of Bayesian reasoning in this article has very important theoretical and practical significance. This article further improves the Bayesian network structure and optimizes the performance of the classifier, which plays a very important role in promoting the expansion of the network structure and provides innovative thinking.
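A Bayesian network must be a directed acyclic graph, so candidate structures produced during search need their loops removed. The abstract only says that a depth-first algorithm is used for this; the sketch below shows one standard way to do it, dropping every back edge found during a depth-first traversal. The function name and the choice of which edge to drop are assumptions, not the article's procedure.

```python
def break_cycles(nodes, edges):
    """Return the edges that remain after dropping DFS back edges.

    Removing every back edge of a depth-first traversal leaves a directed
    acyclic graph. Recursion depth grows with path length, so this sketch
    suits small networks.
    """
    adjacency = {node: [] for node in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in nodes}
    kept = []

    def visit(node):
        state[node] = in_progress
        for child in adjacency[node]:
            if state[child] == in_progress:
                continue  # back edge: it would close a loop, so drop it
            kept.append((node, child))
            if state[child] == unvisited:
                visit(child)
        state[node] = done

    for node in nodes:
        if state[node] == unvisited:
            visit(node)
    return kept


# Hypothetical three-variable structure containing the loop A -> B -> C -> A.
dag_edges = break_cycles(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")])
```

In a score-based search, a practical variant would pick the edge to drop by its contribution to the network score rather than by traversal order.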
|
12
|
Lung Cancer Detection from CT Images: Modified Adaptive Threshold Segmentation with Support Vector Machines and Artificial Neural Network Classifier. Curr Med Imaging 2023; 20:CMIR-EPUB-132897. [PMID: 37449711 DOI: 10.2174/1573405620666230714110914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 05/12/2023] [Accepted: 06/13/2023] [Indexed: 07/18/2023]
Abstract
OBJECTIVE The objective of the research is to implement an advanced modified threshold segmentation and classification model for early and accurate detection of lung cancer from CT images. METHODS The authors propose modified adaptive threshold segmentation as the segmentation approach for cancer detection, combined with a Support Vector Machines (SVM) classifier and an Artificial Neural Network (ANN) classifier. Here, Lung Image Database Consortium (LIDC) datasets, a collection of CT scans, are used as the input images to validate the performance of the suggested technique. RESULTS Both quantitative and qualitative analyses are used to assess the segmentation performance of the proposed algorithm. Both the ANN and SVM classifiers used in the suggested technique for lung cancer diagnosis achieve state-of-the-art levels of accuracy, with the former achieving a 96.3% detection rate and the latter 97% accuracy. CONCLUSION This innovation may have a major impact on the worldwide lung cancer rate due to its ability to detect lung tumors in their earliest stages, when they are most amenable to being avoided and treated. This method is useful because it provides more information and facilitates quick, precise decision-making for doctors diagnosing lung cancer in their patients.
|
13
|
[Molecular diagnosis of hand eczema]. DERMATOLOGIE (HEIDELBERG, GERMANY) 2023:10.1007/s00105-023-05148-z. [PMID: 37272967 DOI: 10.1007/s00105-023-05148-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/29/2023] [Indexed: 06/06/2023]
Abstract
BACKGROUND Because hand eczema is a diagnostic challenge even for experienced dermatologists, a correct diagnosis is essential to ensure success of specific therapies. OBJECTIVES Prerequisites for successful molecular diagnostics in general and in hand eczema in particular are discussed. MATERIALS AND METHODS Basic research and opinion statement on new developments in molecular diagnostics are considered with a special focus on hand eczema. RESULTS The first molecular classifier to distinguish psoriasis from (hand) eczema signature has been introduced as CE-marked in vitro diagnostics (CE-IVD); many more biomarkers associated with diagnostics, theranostics, or natural course of the disease are currently being investigated. CONCLUSIONS Diagnosis of hand eczema will be supported by molecular diagnostics in the near future; we are at the beginning of the molecular era in dermatology.
|
14
|
The sub-molecular characterization identification for cervical cancer. Heliyon 2023; 9:e16873. [PMID: 37484385 PMCID: PMC10360967 DOI: 10.1016/j.heliyon.2023.e16873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/28/2023] [Accepted: 05/31/2023] [Indexed: 07/25/2023] Open
Abstract
Background The efficacy of therapy in cervical cancer (CESC) is limited by high molecular heterogeneity. Thus, the sub-molecular characterization remains to be explored for personalizing the treatment of CESC patients. Methods Datasets with 741 CESC patients were obtained from TCGA and GEO databases. The NMF algorithm, random forest algorithm, and multivariate Cox analysis were utilized to construct a classifier for defining the sub-molecular characterization. Then, the biological characteristics, genomic variations, prognosis, and immune landscape in molecular subtypes were explored. The significance of classifier genes was validated by quantitative Real-Time PCR, cell transfection, cell colony formation assay, wound healing assay, cell proliferation assay, and Western blot. Results The CESC patients were classified into two subtypes. The high classifier-score patients, who showed significant differences in ECM-receptor interaction, PI3K-Akt signaling pathway, and MAPK signaling pathway, had a poorer prognosis in OS (p < 0.001), DFI (p = 0.016), PFI (p < 0.001) and DSS (p < 0.001), along with high M0 macrophage and resting mast cell infiltration and low HLA family gene expression. Moreover, the constructed classifier has high discriminative accuracy in the tumor/normal groups (AUC: 0.993), the tumor/CIN1-CIN3 groups (AUC: 0.963), and normal/CIN1-CIN3 groups (AUC: 0.962), and the total prediction performance is better than currently published signatures in CESC (C-index: 0.763). The combined prediction performance further indicated that the Nomogram (AUC = 0.837) is superior to the classifier (AUC = 0.835) and Stage (AUC = 0.568), and the C-index of calibration curves is 0.784. The potential biological function of classifier genes indicated that silencing GALNT2 inhibited the cancer cells' proliferation, migration, and colony formation; conversely, the cancer cells' proliferation, migration, and colony formation were increased after the upregulation of GALNT2.
The Epithelial-Mesenchymal Transition experiment showed that GALNT2 knockdown might reduce the levels of Snail and Vimentin proteins and increase E-cadherin; conversely, the levels of Snail and Vimentin proteins were increased and E-cadherin was reduced by GALNT2 upregulation. Conclusion The classifier we constructed may help improve our understanding of subtype characteristics and provide a new strategy for developing CESC therapeutics. Remarkably, GALNT2 may be an option to directly target drivers in CESC cancer therapy.
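The C-index reported above measures how well a risk score orders patients by survival time. As a rough illustration, here is a sketch of Harrell's concordance index under the usual convention that a pair is comparable only when the shorter follow-up ended in an observed event. The function name and the toy data are hypothetical; the abstract does not specify which C-index variant was used.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs whose risk order matches survival.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk score (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: the first patient is censored, so only one pair is comparable.
c_index = concordance_index([1, 2, 3], [0, 1, 1], [1.0, 3.0, 2.0])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.763 and 0.784 in context.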
|
15
|
Identification and classification of pathology and artifacts for human intracranial cognitive research. Neuroimage 2023; 270:119961. [PMID: 36848970 PMCID: PMC10461234 DOI: 10.1016/j.neuroimage.2023.119961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 02/17/2023] [Accepted: 02/20/2023] [Indexed: 02/27/2023] Open
Abstract
Intracranial electroencephalography (iEEG) presents a unique opportunity to extend human neuroscientific understanding. However, iEEG is typically collected from patients diagnosed with focal drug-resistant epilepsy (DRE) and contains transient bursts of pathological activity. This activity disrupts performance on cognitive tasks and can distort findings from human neurophysiology studies. In addition to manual marking by a trained expert, numerous IED detectors have been developed to identify these pathological events. Even so, the versatility and usefulness of these detectors are limited by training on small datasets, incomplete performance metrics, and lack of generalizability to iEEG. Here, we employed a large annotated public iEEG dataset from two institutions to train a random forest classifier (RFC) to classify data segments as either 'non-cerebral artifact' (n = 73,902), 'pathological activity' (n = 67,797), or 'physiological activity' (n = 151,290). We found our model performed with an accuracy of 0.941, specificity of 0.950, sensitivity of 0.908, precision of 0.911, and F1 score of 0.910, averaged across all three event types. We extended the generalizability of our model to continuous bipolar data collected in a task-state at a different institution with a lower sampling rate and found our model performed with an accuracy of 0.789, specificity of 0.806, and sensitivity of 0.742, averaged across all three event types. Additionally, we created a custom graphical user interface to implement our classifier and enhance usability.
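Metrics "averaged across all three event types" are commonly obtained by treating each class one-vs-rest and then taking the unweighted mean. The sketch below assumes that macro-averaging scheme, which the abstract does not confirm; the function name and the confusion matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Macro-averaged sensitivity, specificity, and precision.

    confusion[t][p] is the number of segments of true class t predicted as p.
    Assumes every class occurs and is predicted at least once.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    sensitivity, specificity, precision = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n)) - tp
        tn = total - tp - fn - fp
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))
        precision.append(tp / (tp + fp))

    def mean(values):
        return sum(values) / len(values)

    return {
        "sensitivity": mean(sensitivity),
        "specificity": mean(specificity),
        "precision": mean(precision),
    }


# Hypothetical 3x3 matrix: artifact, pathological, physiological (10 each).
metrics = macro_metrics([[8, 1, 1], [1, 8, 1], [1, 1, 8]])
```

Because the three classes in the study differ in size, macro-averaging gives each event type equal influence regardless of how many segments it contains.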
|
16
|
Mobile app for targeted selective treatment of haemonchosis in sheep. Vet Parasitol 2023; 316:109902. [PMID: 36871499 DOI: 10.1016/j.vetpar.2023.109902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 12/22/2022] [Accepted: 02/27/2023] [Indexed: 03/06/2023]
Abstract
Livestock is an important part of many countries' gross domestic product, and sanitary control impacts herd management costs. To contribute to incorporating new technologies into this economic chain, this work presents a mobile application that supports treatment decisions against parasitic infection by Haemonchus contortus in small ruminants. Based on the Android system, the proposed software is a semi-automated computer-aided procedure to assist Famacha© pre-trained farmers in applying anthelmintic treatment. It mimics the two-class decision procedure performed by the veterinarian with the help of the Famacha© card. The embedded cell phone camera was employed to acquire an image of the ocular conjunctival mucosa, from which the animal is classified as healthy or anemic. Two machine-learning strategies were assessed, resulting in an accuracy of 83% for a neural network and 87% for a support vector machine (SVM). The SVM classifier was embedded into the app and made available for evaluation. This work is particularly interesting to small property owners from regions with difficult access or restrictions on obtaining continuous post-training technical guidance to use the Famacha© method effectively.
|
17
|
Tellu - an object-detector algorithm for automatic classification of intestinal organoids. Dis Model Mech 2023; 16:297124. [PMID: 36804687 PMCID: PMC10067441 DOI: 10.1242/dmm.049756] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/12/2022] [Accepted: 02/07/2023] [Indexed: 02/22/2023] Open
Abstract
Intestinal epithelial organoids recapitulate many of the in vivo features of the intestinal epithelium, thus representing excellent research models. Morphology of the organoids based on light-microscopy images is used as a proxy to assess the biological state of the intestinal epithelium. Currently, organoid classification is manual and, therefore, subjective and time consuming, hampering large-scale quantitative analyses. Here, we describe Tellu, an object-detector algorithm trained to classify cultured intestinal organoids. Tellu was trained by manual annotation of >20,000 intestinal organoids to identify cystic non-budding organoids, early organoids, late organoids and spheroids. Tellu can also be used to quantify the relative organoid size, and can classify intestinal organoids into these four subclasses with accuracy comparable to that of trained scientists but is significantly faster and without bias. Tellu is provided as an open, user-friendly online tool to benefit the increasing number of investigations using organoids through fast and unbiased organoid morphology and size analysis.
|
18
|
Establishment of a schizophrenia classifier based on peripheral blood signatures and investigation of pathogenic miRNA-mRNA regulation. J Psychiatr Res 2023; 159:172-184. [PMID: 36738648 DOI: 10.1016/j.jpsychires.2023.01.035] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/17/2022] [Revised: 01/04/2023] [Accepted: 01/26/2023] [Indexed: 01/30/2023]
Abstract
To date, the diagnosis of schizophrenia (SCZ) mainly relies on patients' or guardians' self-reports and clinical observation, and the pathogenesis of SCZ remains elusive. In this study, we sought to develop a reliable classifier for diagnosing SCZ patients and provide clues to the etiology and pathogenesis of SCZ. Based on the high-throughput sequencing analysis of the peripheral blood miRNA expression profile and weighted gene co-expression network analysis (WGCNA) in our previous study, we selected eleven hub miRNAs for validation by qRT-PCR in 51 SCZ patients and 51 controls. miR-939-5p, miR-4732-3p, let-7d-3p, and miR-142-3p were confirmed to be significantly up-regulated, and miR-30e-3p and miR-23a-3p were down-regulated in SCZ patients. miR-30e-3p, which showed the most considerable fold change and statistical significance, was selected for target validation. We first performed bioinformatics prediction followed by qRT-PCR and verified the up-regulation of potential target mRNAs (ABI1, NMT1, HMGB1) expression. Next, we found that the expression level of ABI1 was significantly up-regulated in SH-SY5Y cells transfected with miR-30e-3p mimics. Lastly, we conducted a luciferase assay in 293T cells confirming that miR-30e-3p could directly bind to the 3' untranslated region (3'-UTR) of ABI1, revealing that miR-30e-3p might play a role in the polymerization of neuronal actin and the reconstruction of the cytoskeleton via the downstream regulation of ABI1. In addition, we constructed a classifier using a series of bioinformatics algorithms and evaluated its diagnostic performance. The classifier consisting of both miRNAs and mRNAs appears to discriminate SCZ better than individual miRNAs or mRNAs.
|
19
|
Predicting pre-service teachers' computational thinking skills using machine learning classifiers. EDUCATION AND INFORMATION TECHNOLOGIES 2023; 28:1-21. [PMID: 36846494 PMCID: PMC9939859 DOI: 10.1007/s10639-023-11642-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Computational thinking (CT) skills of pre-service teachers have been explored extensively, but the effectiveness of CT training has yielded mixed results in previous studies. Thus, it is necessary to identify patterns in the relationships between predictors of CT and CT skills to further support CT development. This study developed an online CT training environment as well as compared and contrasted the predictive capacity of four supervised machine learning algorithms in classifying the CT skills of pre-service teachers using log data and survey data. First, the results show that Decision Tree outperformed K-Nearest Neighbors, Logistic Regression, and Naive Bayes in predicting pre-service teachers' CT skills. Second, the participants' time spent on CT training, prior CT skills, and perceptions of difficulty regarding the learning content were the top three important predictors in this model.
|
20
|
Sarcoma classification by DNA methylation profiling in clinical everyday life: the Charité experience. Clin Epigenetics 2022; 14:149. [PMID: 36380356 PMCID: PMC9667620 DOI: 10.1186/s13148-022-01365-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 10/25/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Sarcomas are a heterogeneous group of rare malignant tumors with more than 100 subtypes. Accurate diagnosis remains challenging due to a lack of characteristic molecular or histomorphological hallmarks. A DNA methylation-based tumor profiling classifier for sarcomas (known as sarcoma classifier) from the German Cancer Research Center (Deutsches Krebsforschungszentrum) is now employed in selected cases to guide tumor classification and treatment decisions at our institution. Data on the usage of the classifier in daily clinical routine are lacking. METHODS In this single-center experience, we describe the clinical course of five sarcoma cases undergoing thorough pathological and reference pathological examination as well as DNA methylation-based profiling and their impact on subsequent treatment decisions. We collected data on the clinical course, DNA methylation analysis, histopathology, radiological imaging, and next-generation sequencing. RESULTS Five clinical cases involving DNA methylation-based profiling in 2021 at our institution were included. All patients' DNA methylation profiles were successfully matched to a methylation profile cluster of the sarcoma classifier's dataset. In three patients, the classifier reassured diagnosis or aided in finding the correct diagnosis in light of contradictory data and differential diagnoses. In two patients with intracranial tumors, the classifier changed the diagnosis to a novel diagnostic tumor group. CONCLUSIONS The sarcoma classifier is a valuable diagnostic tool that should be used after comprehensive clinical and histopathological evaluation. It may help to reassure the histopathological diagnosis or indicate the need for thorough reassessment in cases where it contradicts previous findings. However, certain limitations (non-classifiable cases, misclassifications, unclear degree of sample purity for analysis and others) currently preclude wide clinical application. 
The current sarcoma classifier is therefore not yet ready for a broad clinical routine. With further refinements, this promising tool may be implemented in daily clinical practice in selected cases.
|
21
|
An unsupervised image segmentation algorithm for coronary angiography. BioData Min 2022; 15:27. [PMID: 36271448 PMCID: PMC9587570 DOI: 10.1186/s13040-022-00313-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 10/05/2022] [Indexed: 12/02/2022] Open
Abstract
Computer visual systems can rapidly obtain a large amount of data and automatically process them with ease. These characteristics constitute advantages for the application of such systems in the automatic analysis of medical images, as well as in processing technology. The precision of image segmentation, which plays a critical role in computer visual systems, directly affects the quality of processing results. Coronary angiographs feature various background colors, complex patterns, and blurry edges. The image areas containing blood vessels cannot be precisely segmented through regular methods. Therefore, this study proposed an unsupervised learning algorithm that uses regional parameter expansion (RPE). This method was derived from the flood fill algorithm, which can effectively segment image areas containing blood vessels despite a complex background or uneven light and shadow. An optimal cover tree (OCT) algorithm was proposed for the establishment of coronary arteries and the estimation of vessel diameter. Through the region growing method, spanning trees were used to record the cover length of adjacent connections, thereby establishing vessel paths, and the length can be used to track changes in vessel diameter.
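The abstract says that RPE was derived from the flood fill algorithm and relies on region growing. The sketch below shows only that generic starting point, a breadth-first flood fill that grows a region from a seed pixel while neighbouring intensities stay within a tolerance. It is not the authors' RPE or OCT method; the function name, the 4-connectivity choice and the fixed tolerance are assumptions made for illustration.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels 4-connected to `seed` whose
    intensity differs from the seed intensity by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region:
                continue
            if abs(image[nr][nc] - seed_value) <= tolerance:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 3x3 grayscale patch: bright pixels stand in for a vessel.
toy_image = [
    [10, 10, 200],
    [10, 180, 190],
    [10, 10, 10],
]
toy_region = region_grow(toy_image, seed=(0, 2), tolerance=30)
```

A fixed tolerance around the seed value is the simplest acceptance rule; a method meant to cope with uneven light and shadow, as the abstract describes, would presumably adapt that rule locally.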
|
22
|
Performance of EHR classifiers for patient eligibility in a clinical trial of precision screening. Contemp Clin Trials 2022; 121:106926. [PMID: 36115637 DOI: 10.1016/j.cct.2022.106926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 09/07/2022] [Accepted: 09/09/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND Validated computable eligibility criteria use real-world data and facilitate the conduct of clinical trials. The Genomic Medicine at VA (GenoVA) Study is a pragmatic trial of polygenic risk score testing enrolling patients without known diagnoses of 6 common diseases: atrial fibrillation, coronary artery disease, type 2 diabetes, breast cancer, colorectal cancer, and prostate cancer. We describe the validation of computable disease classifiers as eligibility criteria and their performance in the first 16 months of trial enrollment. METHODS We identified well-performing published computable classifiers for the 6 target diseases and validated these in the target population using blinded physician review. If needed, classifiers were refined and then underwent a subsequent round of blinded review until true positive and true negative rates ≥80% were achieved. The optimized classifiers were then implemented as pre-screening exclusion criteria; telephone screens enabled an assessment of their real-world negative predictive value (NPV-RW). RESULTS Published classifiers for type 2 diabetes and breast and prostate cancer achieved desired performance in blinded chart review without modification; the classifier for atrial fibrillation required two rounds of refinement before achieving desired performance. Among the 1077 potential participants screened in the first 16 months of enrollment, NPV-RW of the classifiers ranged from 98.4% for coronary artery disease to 99.9% for colorectal cancer. Performance did not differ by gender or race/ethnicity. CONCLUSIONS Computable disease classifiers can serve as efficient and accurate pre-screening classifiers for clinical trials, although performance will depend on the trial objectives and diseases under study.
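The validation criterion (true positive and true negative rates of at least 80%) and the real-world negative predictive value can be written down directly from the four cells of a 2x2 table. The sketch below uses the standard definitions; the function names and all counts are hypothetical and are not data from the GenoVA Study.

```python
def true_positive_rate(tp, fn):
    """Share of truly diseased patients that the classifier flags."""
    return tp / (tp + fn)


def true_negative_rate(tn, fp):
    """Share of truly disease-free patients that the classifier clears."""
    return tn / (tn + fp)


def negative_predictive_value(tn, fn):
    """Share of classifier-negative patients who are truly disease-free."""
    return tn / (tn + fn)


def meets_validation_target(tp, fn, tn, fp, threshold=0.80):
    """Check the chart-review criterion described in the abstract:
    both rates must reach the threshold."""
    return (true_positive_rate(tp, fn) >= threshold
            and true_negative_rate(tn, fp) >= threshold)


# Hypothetical chart-review counts for one classifier.
toy_ok = meets_validation_target(tp=45, fn=5, tn=42, fp=8)
# Hypothetical telephone-screen outcome among classifier-negative patients.
toy_npv = negative_predictive_value(tn=984, fn=16)
```

Note that NPV depends on disease prevalence in the screened population, which is one reason the real-world value can be high even when the true negative rate is only moderately above the 80% target.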
|
23
|
Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Comput Appl 2022; 34:19751-19790. [PMID: 36060097 PMCID: PMC9424068 DOI: 10.1007/s00521-022-07705-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/19/2022] [Accepted: 08/02/2022] [Indexed: 11/24/2022]
Abstract
Selecting relevant feature subsets is vital in machine learning, and multiclass feature selection is harder to perform since most classifications are binary. The feature selection problem aims at reducing the dimension of the feature set while maintaining model accuracy. Datasets can be classified using various methods. Nevertheless, metaheuristic algorithms attract substantial attention for solving different optimization problems. For this reason, this paper presents a systematic survey of the literature on solving multiclass feature selection problems with metaheuristic algorithms, which can help classifiers select optimal or near-optimal features faster and more accurately. Metaheuristic algorithms are presented in four primary behavior-based categories, i.e., evolutionary-based, swarm-intelligence-based, physics-based, and human-based, even though some works in the literature present further categories. Further, lists of metaheuristic algorithms are given for each of these categories. Only articles from 2000 to 2022 on metaheuristic algorithms used for multiclass feature selection problems were reviewed, with respect to their categories and detailed descriptions. We considered some application areas of the metaheuristic algorithms applied to multiclass feature selection, along with their variations. Popular multiclass classifiers for feature selection were also examined. Moreover, we presented the challenges of metaheuristic algorithms for feature selection and identified gaps for further research.
|
24
|
scDLC: a deep learning framework to classify large sample single-cell RNA-seq data. BMC Genomics 2022; 23:504. [PMID: 35831808 PMCID: PMC9281153 DOI: 10.1186/s12864-022-08715-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/03/2021] [Accepted: 06/21/2022] [Indexed: 11/10/2022] Open
Abstract
Background Using single-cell RNA sequencing (scRNA-seq) data to diagnose disease is an effective technique in medical research. Several statistical methods have been developed for the classification of RNA sequencing (RNA-seq) data, including Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). Nevertheless, few existing methods perform well for large sample scRNA-seq data, in particular when the distribution assumption is also violated. Results We propose a deep learning classifier (scDLC) for large sample scRNA-seq data, based on long short-term memory recurrent neural networks (LSTMs). Our new scDLC does not require prior knowledge of the data distribution; instead, it takes into account the dependency of the most outstanding feature genes in the LSTM model. An LSTM is a special recurrent neural network that can learn long-term dependencies of a sequence. Conclusions Simulation studies show that our new scDLC performs consistently better than the existing methods in a wide range of settings with large sample sizes. Four real scRNA-seq datasets are also analyzed, and the results agree with the simulations in that our new scDLC always performs the best. The code named “scDLC” is publicly available at https://github.com/scDLC-code/code. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08715-1.
|
25
|
A comprehensive tool for rapid and accurate prediction of disease using DNA sequence classifier. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING 2022; 14:1-17. [PMID: 35789598 PMCID: PMC9243743 DOI: 10.1007/s12652-022-04099-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
In the current pandemic, the coronavirus is spreading very fast and can jump from one human to another. In addition, there are millions of viruses, for example Ebola and SARS, that are equally deadly and can spread as fast as the coronavirus because of the mobility and globalization of the population. Earlier identification of these viruses can prevent outbreaks such as the one we are currently facing and can help in designing drugs earlier. Identification of disease at an early stage can be achieved through DNA sequence classification, as DNA carries most of the genetic information about organisms. This is why the classification of DNA sequences plays an important role in computational biology. This paper presents a solution in which samples collected from NCBI are used for the classification of DNA sequences. DNA sequence classification in turn gives the patterns of various diseases; these patterns are then compared with the samples of a newly infected person and can help in the earlier identification of disease. However, feature extraction remains a major issue. In this paper, a machine learning-based classifier and a new technique for extracting features from DNA sequences based on a hot vector matrix are proposed. In the hot vector representation of the DNA sequence, each pair of the word is represented using a binary matrix which represents the position of each nucleotide in the DNA sequence. The resultant matrix is then given as input to a traditional CNN for feature extraction. The results of the proposed method have been compared with five well-known classifiers, namely convolutional neural network (CNN), support vector machine (SVM), K-nearest neighbor (KNN), decision trees, and recurrent neural network (RNN), on several parameters including precision rate and accuracy. The results show that the proposed method gives an accuracy of 93.9%, the highest among the compared classifiers.
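The abstract does not spell out the exact layout of its "hot vector matrix". Assuming it is close to the standard one-hot encoding of nucleotides, a minimal sketch looks as follows: each position in the sequence becomes a binary row with a single 1 in the column of its nucleotide. The alphabet order and the all-zero row for unknown symbols are illustrative choices, not details from the paper.

```python
NUCLEOTIDES = "ACGT"  # assumed column order


def one_hot_encode(sequence):
    """Encode a DNA string as a list of binary rows, one row per position.

    Symbols outside A/C/G/T (for example the ambiguity code N) map to an
    all-zero row.
    """
    column = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in column:
            row[column[base]] = 1
        matrix.append(row)
    return matrix


toy_matrix = one_hot_encode("acgtn")
```

The resulting matrix has one row per sequence position and four columns, which is the kind of fixed-width binary input a convolutional network can consume after padding or truncating sequences to a common length.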
|
26
|
MRI-based Whole-Tumor Radiomics to Classify the Types of Pediatric Posterior Fossa Brain Tumor. Neurochirurgie 2022; 68:601-607. [PMID: 35667473 DOI: 10.1016/j.neuchi.2022.05.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/16/2021] [Revised: 03/23/2022] [Accepted: 05/06/2022] [Indexed: 10/18/2022]
Abstract
BACKGROUND Differential diagnosis between medulloblastoma (MB), ependymoma (EP) and astrocytoma (PA) is important due to differing medical treatment strategies and predicted survival. The aim of this study was to investigate non-invasive MRI-based radiomic analysis of whole tumors to classify the histologic tumor types of pediatric posterior fossa brain tumor and improve the accuracy of discrimination, using a random forest classifier. METHODS MRI images of 99 patients, with 59 MBs, 13 EPs and 27 PAs histologically confirmed by surgery and pathology before treatment, were included in this retrospective study. Registration was performed between the three sequences, and high-throughput features were extracted from manually segmented tumors on MR images of each case. The forest-based feature selection method was adopted to select the top ten significant features. Finally, the results were compared and analyzed according to the classification. RESULTS The top ten contributing wavelet features according to the classifier all came from the ADC sequence. The random forest classifier achieved 100% accuracy on the training data and a best validation accuracy of 0.938, with sensitivity = 1.000, 0.948 and 0.808 and specificity = 0.952, 0.926 and 1.000 for EP, MB and PA, respectively. CONCLUSION A random forest classifier based on the ADC sequence of the whole tumor provides more quantitative information than T1WI and T2WI in differentiating pediatric posterior fossa brain tumors. In particular, the histogram percentile value showed great superiority, which added diagnostic value in pediatric neuro-oncology.
|
27
|
Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer. BMC Bioinformatics 2022; 23:153. [PMID: 35484501 PMCID: PMC9052461 DOI: 10.1186/s12859-022-04678-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/11/2022] [Accepted: 04/11/2022] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND As many complex omics data have been generated during the last two decades, the dimensionality reduction problem has been a challenging issue in better mining such data. Omics data typically consist of many features. Accordingly, many feature selection algorithms have been developed. The performance of those feature selection methods often varies by specific data, making the discovery and interpretation of results challenging. METHODS AND RESULTS In this study, we performed a comprehensive comparative study of five widely used supervised feature selection methods (mRMR, INMIFS, DFS, SVM-RFE-CBR and VWMRmR) for multi-omics datasets. Specifically, we used five representative datasets: gene expression (Exp), exon expression (ExpExon), DNA methylation (hMethyl27), copy number variation (Gistic2), and pathway activity dataset (Paradigm IPLs) from a multi-omics study of acute myeloid leukemia (LAML) from The Cancer Genome Atlas (TCGA). The feature subsets selected by the five feature selection algorithms are assessed using three evaluation criteria: (1) classification accuracy (Acc), (2) representation entropy (RE) and (3) redundancy rate (RR). Four different classifiers, viz., C4.5, NaiveBayes, KNN, and AdaBoost, were used to measure the classification accuracy (Acc) for each selected feature subset. The VWMRmR algorithm obtains the best Acc for three datasets (ExpExon, hMethyl27 and Paradigm IPLs). The VWMRmR algorithm offers the best RR (obtained using normalized mutual information) for three datasets (Exp, Gistic2 and Paradigm IPLs), while it gives the best RR (obtained using Pearson correlation coefficient) for two datasets (Gistic2 and Paradigm IPLs). It also obtains the best RE for three datasets (Exp, Gistic2 and Paradigm IPLs). Overall, the VWMRmR algorithm yields the best performance on all three evaluation criteria for the majority of the datasets.
In addition, we identified signature genes using supervised learning collected from the overlapped top feature set among five feature selection methods. We obtained a 7-gene signature (ZMIZ1, ENG, FGFR1, PAWR, KRT17, MPO and LAT2) for EXP, a 9-gene signature for ExpExon, a 7-gene signature for hMethyl27, one single-gene signature (PIK3CG) for Gistic2 and a 3-gene signature for Paradigm IPLs. CONCLUSION We performed a comprehensive comparison of the performance evaluation of five well-known feature selection methods for mining features from various high-dimensional datasets. We identified signature genes using supervised learning for the specific omic data for the disease. The study will help incorporate higher order dependencies among features.
|
28
|
KCNQ1 and lymphovascular invasion are key features in a prognostic classifier for stage II and III colon cancer. BMC Cancer 2022; 22:372. [PMID: 35395779 PMCID: PMC8991490 DOI: 10.1186/s12885-022-09473-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2021] [Accepted: 03/24/2022] [Indexed: 12/15/2022] Open
Abstract
Background The risk of recurrence after resection of a stage II or III colon cancer, and therefore qualification for adjuvant chemotherapy (ACT), is traditionally based on clinicopathological parameters. However, the parameters used in clinical practice are not able to accurately identify all patients with or without minimal residual disease. Some patients considered ‘low-risk’ do develop recurrence (undertreatment), whilst other patients receiving ACT might not have developed recurrence at all (overtreatment). We previously analysed tumour tissue expression of 28 protein biomarkers that might improve identification of patients at risk of recurrence. In the present study we aimed to build a prognostic classifier based on these 28 biomarkers and clinicopathological parameters. Methods Classification and regression tree (CART) analysis was used to build a prognostic classifier based on a well-described cohort of 386 patients with stage II and III colon cancer. Separate classifiers were built for patients who were or were not treated with ACT. Routine clinicopathological parameters and tumour tissue immunohistochemistry data were included, available for 28 proteins previously published. Classification trees were pruned until lowest misclassification error was obtained. Survival of the identified subgroups was analysed, and robustness of the selected CART variables was assessed by random forest analysis (1000 trees). Results In patients not treated with ACT, prognosis was estimated best based on expression of KCNQ1. Poor disease-free survival (DFS) was observed in those with loss of expression of KCNQ1 (HR = 3.38 (95% CI 2.12 – 5.40); p < 0.001). In patients treated with ACT, key prognostic factors were lymphovascular invasion (LVI) and expression of KCNQ1. Patients with LVI showed poorest DFS, whilst patients without LVI and high expression of KCNQ1 showed most favourable survival (HR = 7.50 (95% CI 3.57 – 15.74); p < 0.001).
Patients without LVI and loss of expression of KCNQ1 had intermediate survival (HR = 3.91 (95% CI 1.76 – 8.72); p = 0.001). Conclusion KCNQ1 and LVI were identified as key features in prognostic classifiers for disease-free survival in stage II and III colon cancer patients. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-022-09473-9.
|
29
|
Using an Automated Speech Recognition Approach to Differentiate Between Normal and Aspirating Swallowing Sounds Recorded from Digital Cervical Auscultation in Children. Dysphagia 2022; 37:1482-1492. [PMID: 35092488 PMCID: PMC9643257 DOI: 10.1007/s00455-022-10410-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/22/2021] [Accepted: 01/19/2022] [Indexed: 12/16/2022]
Abstract
Use of machine learning to accurately detect aspirating swallowing sounds in children is an evolving field. Previously reported classifiers for the detection of aspirating swallowing sounds in children have reported sensitivities between 79 and 89%. This study aimed to investigate the accuracy of using an automatic speaker recognition approach to differentiate between normal and aspirating swallowing sounds recorded from digital cervical auscultation in children. We analysed 106 normal swallows from 23 healthy children (median 13 months; 52.1% male) and 18 aspirating swallows from 18 children (median 10.5 months; 61.1% male) who underwent concurrent videofluoroscopic swallow studies with digital cervical auscultation. All swallowing sounds were on thin fluids. A support vector machine classifier with a polynomial kernel was trained on feature vectors that comprised the mean and standard deviation of spectral subband centroids extracted from each swallowing sound in the training set. The trained support vector machine was then used to classify swallowing sounds in the test set. We found high accuracy in the differentiation of aspirating and normal swallowing sounds with 98% overall accuracy. Sensitivity for the detection of aspiration and normal swallowing sounds were 89% and 100%, respectively. There were consistent differences in time, power spectral density and spectral subband centroid features between aspirating and normal swallowing sounds in children. This study provides preliminary research evidence that aspirating and normal swallowing sounds in children can be differentiated accurately using machine learning techniques.
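The feature vectors described above combine the mean and standard deviation of spectral subband centroids. A subband centroid is the power-weighted mean frequency within one band of the spectrum. The sketch below computes it with a naive DFT so that it needs only the standard library. The equal-width bands, the frame handling and all function names are assumptions, since the abstract does not give these details, and the SVM training step is left out.

```python
import cmath
import math
import statistics


def power_spectrum(frame):
    """Power of each non-negative frequency bin, via a naive O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def subband_centroids(frame, sample_rate, n_bands):
    """Power-weighted mean frequency in each of n_bands equal-width bands."""
    spectrum = power_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(spectrum))]
    band_size = len(spectrum) // n_bands
    centroids = []
    for b in range(n_bands):
        lo = b * band_size
        hi = len(spectrum) if b == n_bands - 1 else lo + band_size
        power = sum(spectrum[lo:hi])
        if power == 0:
            # Empty band: fall back to the band midpoint.
            centroids.append((freqs[lo] + freqs[hi - 1]) / 2)
        else:
            weighted = sum(f * p for f, p in zip(freqs[lo:hi], spectrum[lo:hi]))
            centroids.append(weighted / power)
    return centroids


def feature_vector(frames, sample_rate, n_bands):
    """Mean and population standard deviation of each centroid across frames."""
    per_frame = [subband_centroids(f, sample_rate, n_bands) for f in frames]
    columns = list(zip(*per_frame))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means + stds


# Hypothetical test signal: a 2 Hz sine sampled at 16 Hz, 16 samples per frame.
toy_frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
toy_centroids = subband_centroids(toy_frame, sample_rate=16, n_bands=2)
toy_features = feature_vector([toy_frame, toy_frame], sample_rate=16, n_bands=2)
```

Vectors built this way have a fixed length of twice the number of bands regardless of how long each recording is, which is what makes them usable as input to a kernel classifier such as the polynomial-kernel SVM named in the abstract.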
|
30
|
An intelligent cyber security phishing detection system using deep learning techniques. CLUSTER COMPUTING 2022; 25:3819-3828. [PMID: 35602317 PMCID: PMC9107003 DOI: 10.1007/s10586-022-03604-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 04/20/2022] [Accepted: 04/22/2022] [Indexed: 05/13/2023]
Abstract
Recently, phishing attacks have become one of the most prominent social engineering attacks faced by public internet users, governments, and businesses. In response to this threat, this paper gives an overview of what machine learning is and of the techniques phishers use to trick gullible users across different types of phishing attacks. Based on our survey, phishing emails are the most effective attack on the targeted sectors and users, which we also compare. More effective phishing detection technology is therefore needed to curb the threat of phishing emails, which has grown at an alarming rate in recent years. We discuss machine learning techniques and other technical solutions that have been proposed to mitigate phishing, as well as the awareness knowledge users need to detect phishing scams and avoid being duped by them. In this work, we propose a detection model based on machine learning techniques: the dataset is split to train the detection model, and the results are validated on the test data. The model captures inherent characteristics of the email text and other features to classify emails as phishing or non-phishing, and it is evaluated on three different data sets. Comparing them, we found that the more features were used, the more accurate and efficient the results were. The best accuracy, obtained with a boosted decision tree, was 0.88, 1.00, and 0.97 on the three data sets, respectively.
|
31
|
Supervised Methods for Biomarker Detection from Microarray Experiments. Methods Mol Biol 2022; 2401:101-120. [PMID: 34902125 DOI: 10.1007/978-1-0716-1839-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations from both the biological and computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling, and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.
|
32
|
Expanding the taxonomic range in the fecal metagenome. BMC Bioinformatics 2021; 22:312. [PMID: 34107881 PMCID: PMC8188691 DOI: 10.1186/s12859-021-04212-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Accepted: 05/20/2021] [Indexed: 11/10/2022] Open
Abstract
Background Except for bacteria, the taxonomic diversity of the human fecal metagenome has not been widely studied, despite the potential importance of viruses and eukaryotes. Widely used bioinformatic tools contain limited numbers of non-bacterial species in their databases compared to available genomic sequences and their methodologies do not favour classification of rare sequences which may represent only a small fraction of their parent genome. In seeking to optimise identification of non-bacterial species, we evaluated five widely-used metagenome classifier programs (BURST, Kraken2, Centrifuge, MetaPhlAn2 and CCMetagen) for their ability to correctly assign and count simulations of bacterial, viral and eukaryotic DNA sequence reads, including the effect of taxonomic order of analysis of bacteria, viruses and eukaryotes and the effect of sequencing depth. Results We found that the precision of metagenome classifiers varied significantly between programs and between taxonomic groups. When classifying viruses and eukaryotes, ordering the analysis such that bacteria were classified first significantly improved classification precision. Increasing sequencing depth decreased classification precision and did not improve recall of rare species. Conclusions Choice of metagenome classifier program can have a marked effect on results with respect to precision of species assignment in different taxonomic groups. The order of taxonomic classification can markedly improve precision. Increasing sequencing depth can decrease classification precision and yields diminishing returns in probability of species detection. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04212-6.
|
33
|
Detection of COVID-19 in X-ray images by classification of bag of visual words using neural networks. Biomed Signal Process Control 2021; 68:102750. [PMID: 34007303 PMCID: PMC8120450 DOI: 10.1016/j.bspc.2021.102750] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/03/2020] [Revised: 03/01/2021] [Accepted: 05/09/2021] [Indexed: 11/30/2022]
Abstract
Coronavirus disease 2019 (COVID-19) was classified as a pandemic by the World Health Organization in March 2020. Given that this novel virus most notably affects the human respiratory system, early detection may help prevent severe lung damage, save lives, and help prevent further disease spread. Given the constraints on the healthcare facilities and staff, the role of artificial intelligence for automatic diagnosis is critical. The automatic diagnosis of COVID-19 based on medical images is, however, not straightforward. Due to the novelty of the disease, available X-ray datasets are very limited. Furthermore, there is a significant similarity between COVID-19 X-rays and other lung infections. In this paper, these challenges are addressed by proposing an approach consisting of a bag of visual words and a neural network classifier. The proposed method can classify X-ray chest images into non-COVID-19 and COVID-19 with high performance. Three public datasets are used to evaluate the proposed approach. Our best accuracy on the first, second, and third datasets is 96.1, 99.84, and 98 percent. Since detection of COVID-19 is important, sensitivity is used as a criterion. The proposed method’s best sensitivities are 90.32, 99.65, and 91 percent on these datasets, respectively. The experimental results show that extracting features with the bag of visual words results in better classification accuracy than the state-of-the-art techniques.
|
34
|
Ortho_Sim_Loc: Essential protein prediction using orthology and priority-based similarity approach. Comput Biol Chem 2021; 92:107503. [PMID: 33962168 DOI: 10.1016/j.compbiolchem.2021.107503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2020] [Revised: 04/02/2021] [Accepted: 04/21/2021] [Indexed: 10/21/2022]
Abstract
Proteins are the essential macromolecules of living organisms, but not all proteins can be considered essential in every relevant study. The essentiality of a protein is therefore often determined by computational methods rather than biological experiments, which saves both time and effort. Researchers have already proposed various computational approaches that successfully select essential proteins with different biological significance. Most of the experimental approaches return more false negative outcomes than others. To maintain prediction accuracy, a novel methodology, "Ortho_Sim_Loc", is proposed that combines orthology, similarity (using clustering and priority-based GO annotation) and subcellular localization. Ortho_Sim_Loc can predict functionally enriched sets of essential proteins. The predicted results are validated against existing methods such as different centrality measures and LIDC. The validation results show that Ortho_Sim_Loc performs better than the other existing computational approaches.
|
35
|
VB 10, a new blood biomarker for differential diagnosis and recovery monitoring of acute viral and bacterial infections. EBioMedicine 2021; 67:103352. [PMID: 33906069 PMCID: PMC8099739 DOI: 10.1016/j.ebiom.2021.103352] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/04/2020] [Revised: 04/04/2021] [Accepted: 04/07/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Precise differential diagnosis between acute viral and bacterial infections is important to enable appropriate therapy, avoid unnecessary antibiotic prescriptions and optimize the use of hospital resources. A systems view of host response to infections provides opportunities for discovering sensitive and robust molecular diagnostics. METHODS We combine blood transcriptomes from six independent datasets (n = 756) with a knowledge-based human protein-protein interaction network, identify subnetworks capturing host response to each infection class, and derive common response cores separately for viral and bacterial infections. We subject the subnetworks to a series of computational filters to identify a parsimonious gene panel and a standalone diagnostic score that can be applied to individual samples. We rigorously validate the panel and the diagnostic score in a wide range of publicly available datasets and in a newly developed Bangalore-Viral Bacterial (BL-VB) cohort. FINDINGS We discover a 10-gene blood-based biomarker panel (Panel-VB) that demonstrates high predictive performance to distinguish viral from bacterial infections, with a weighted mean AUROC of 0.97 (95% CI: 0.96-0.99) in eleven independent datasets (n = 898). We devise a new stand-alone patient-wise score (VB10) based on the panel, which shows high diagnostic accuracy with a weighted mean AUROC of 0.94 (95% CI: 0.91-0.98) in 2996 patient samples from 56 public datasets from 19 different countries. Further, we evaluate VB10 in a newly generated South Indian (BL-VB, n = 56) cohort and find 97% accuracy in the confirmed cases of viral and bacterial infections. We find that VB10 (a) is capable of accurately identifying the infection class in culture-negative indeterminate cases, (b) reflects recovery status, and (c) is applicable across different age groups, covering a wide spectrum of acute bacterial and viral infections, including uncharacterized pathogens. We tested our VB10 score on publicly available COVID-19 data and found that it detected viral infection in patient samples. INTERPRETATION Our results point to the promise of VB10 as a diagnostic test for precise diagnosis of acute infections and monitoring recovery status. We expect that it will provide clinical decision support for antibiotic prescriptions and thereby aid in antibiotic stewardship efforts. FUNDING Grand Challenges India, Biotechnology Industry Research Assistance Council (BIRAC), Department of Biotechnology, Govt. of India.
|
36
|
Predicting rifampicin resistance mutations in bacterial RNA polymerase subunit beta based on majority consensus. BMC Bioinformatics 2021; 22:210. [PMID: 33888055 PMCID: PMC8063314 DOI: 10.1186/s12859-021-04137-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/12/2020] [Accepted: 04/16/2021] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND Mutations in an enzyme target are one of the most common mechanisms whereby antibiotic resistance arises. Identification of the resistance mutations in bacteria is essential for understanding the structural basis of antibiotic resistance and for the design of new drugs. However, the traditionally used experimental approaches to identify resistance mutations were usually labor-intensive and costly. RESULTS We present a machine learning (ML)-based classifier for predicting rifampicin (Rif) resistance mutations in bacterial RNA Polymerase subunit β (RpoB). A total of 186 mutations were gathered from the literature for developing the classifier, using 80% of the data as the training set and the rest as the test set. The features of the mutated RpoB and their binding energies with Rif were calculated through computational methods and used as the mutation attributes for modeling. Classifiers based on five ML algorithms, i.e. decision tree, k nearest neighbors, naïve Bayes, probabilistic neural network and support vector machine, were first built, and a majority consensus (MC) approach was then used to obtain a new classifier based on the classifications of the five individual ML algorithms. The MC classifier comprehensively improved the predictive performance, with accuracy, F-measure and AUC of 0.78, 0.83 and 0.81 for the training set and 0.84, 0.87 and 0.83 for the test set, respectively. CONCLUSION The MC classifier provides an alternative methodology for rapid identification of resistance mutations in bacteria, which may help with early detection of antibiotic resistance and new drug discovery.
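The majority consensus step in this abstract can be sketched as a plain vote over the labels of the five individual classifiers. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `majority_consensus`, the label strings "R" and "S", and the example votes are all hypothetical.

```python
from collections import Counter

def majority_consensus(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote.

    predictions_per_model: list of lists, one inner list per classifier,
    each holding that classifier's predicted label for every sample.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all classifiers must predict the same samples")
    consensus = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus

# Hypothetical votes from five classifiers on three mutations
# ("R" = resistant, "S" = susceptible); the values are made up.
votes = [
    ["R", "S", "R"],  # decision tree
    ["R", "S", "S"],  # k nearest neighbors
    ["S", "S", "R"],  # naive Bayes
    ["R", "R", "R"],  # probabilistic neural network
    ["R", "S", "S"],  # support vector machine
]
print(majority_consensus(votes))
```

With an odd number of voters and two classes, as here, a tie cannot occur, so no tie-breaking rule is needed.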
|
37
|
A review of microscopic analysis of blood cells for disease detection with AI perspective. PeerJ Comput Sci 2021; 7:e460. [PMID: 33981834 PMCID: PMC8080427 DOI: 10.7717/peerj-cs.460] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/04/2020] [Accepted: 03/06/2021] [Indexed: 05/07/2023]
Abstract
BACKGROUND Any contamination in the human body can prompt changes in blood cell morphology and in various cell parameters. Microscopic images of blood cells are examined to recognize contamination inside the body and to anticipate diseases and abnormalities. Appropriate segmentation of these cells makes the detection of a disease more accurate and robust. Microscopic blood cell analysis is a critical activity in pathological analysis. It supports the investigation of the appropriate disease through accurate detection followed by classification of abnormalities, which plays an essential role in the diagnosis of various disorders, treatment planning, and assessment of treatment outcomes. METHODOLOGY This paper surveys the different areas where microscopic imaging of blood cells is used for disease detection. Research papers in this area were obtained from a popular search engine, Google Scholar. The articles were searched considering the basics of blood, such as its composition, followed by staining of blood, which is mandatory before microscopic analysis. Different methods for classification and segmentation of blood cells are reviewed. Microscopic analysis using image processing, computer vision and machine learning is the main focus of the analysis and the review. The methodologies employed by different researchers for blood cell analysis in terms of these algorithms are the key point of the review. RESULTS Different methodologies used for microscopic analysis of blood cells are analyzed and compared according to different performance measures, and the conclusion is drawn from this extensive review. CONCLUSION Researchers employ different machine learning and deep learning algorithms for segmentation of blood cell components and disease detection based on microscopic analysis. There is scope for improvement in terms of different performance evaluation parameters. Different bio-inspired optimization algorithms can be used for improvement. Explainable AI can analyze the features of an AI-implemented system and will make the system more trusted and commercially suitable.
|
38
|
Genetic factors increase the identification efficiency of predictive models for dyslipidaemia: a prospective cohort study. Lipids Health Dis 2021; 20:11. [PMID: 33579296 PMCID: PMC7881493 DOI: 10.1186/s12944-021-01439-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/15/2020] [Accepted: 01/27/2021] [Indexed: 11/10/2022] Open
Abstract
Background Few studies have developed risk models for dyslipidaemia, especially for rural populations. Furthermore, the performance of genetic factors in predicting dyslipidaemia has not been explored. The purpose of this study is to develop and evaluate prediction models with and without genetic factors for dyslipidaemia in rural populations. Methods A total of 3596 individuals from the Henan Rural Cohort Study were included in this study. All individuals were divided into a training set and a testing set at a ratio of 7:3. The conventional models and conventional+GRS (genetic risk score) models were developed with Cox regression, artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM) classifiers in the training set. The area under the receiver operating characteristic curve (AUC), net reclassification index (NRI), and integrated discrimination index (IDI) were used to assess the discrimination ability of the models, and the calibration curve was used to show calibration ability in the testing set. Results Compared to the lowest quartile of GRS, the hazard ratio (HR) (95% confidence interval (CI)) of individuals in the highest quartile of GRS was 1.23 (1.07, 1.41) in the total population. Age, family history of diabetes, physical activity, body mass index (BMI), triglycerides (TGs), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C) were used to develop the conventional models, and the AUCs of the Cox, ANN, RF, and GBM classifiers were 0.702 (0.673, 0.729), 0.736 (0.708, 0.762), 0.787 (0.762, 0.811), and 0.816 (0.792, 0.839), respectively. After adding GRS, the AUCs increased by 0.005, 0.018, 0.023, and 0.015 with the Cox, ANN, RF, and GBM classifiers, respectively. The corresponding NRI and IDI were 25.6, 7.8, 14.1, and 18.1% and 2.3, 1.0, 2.5, and 1.8%, respectively. Conclusion Genetic factors could improve the predictive ability of the dyslipidaemia risk model, suggesting that genetic information could be provided as a potential predictor to screen for clinical dyslipidaemia. Trial registration The Henan Rural Cohort Study has been registered at the Chinese Clinical Trial Register. (Trial registration: ChiCTR-OOC-15006699. Registered 6 July 2015 - Retrospectively registered). Supplementary Information The online version contains supplementary material available at 10.1186/s12944-021-01439-3.
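Several abstracts in this list, including this one, report the AUC as their discrimination measure. As a reminder of what the number means, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. The function name `auc_from_scores` and the four example scores are illustrative assumptions and are not data from the study.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 outcomes; scores: predicted risk per individual.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Made-up example: two negatives followed by two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(labels, scores))  # 3 of 4 pairs are ordered correctly: 0.75
```

The quadratic loop is fine for illustration; for cohorts of thousands of individuals a rank-based computation would be the usual choice.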
|
39
|
Exploring differences for motor imagery using Teager energy operator-based EEG microstate analyses. J Integr Neurosci 2021; 20:411-417. [PMID: 34258941 DOI: 10.31083/j.jin2002042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/21/2020] [Revised: 02/03/2021] [Accepted: 03/02/2021] [Indexed: 11/06/2022] Open
Abstract
In this paper, the differences between two motor imagery tasks are captured through microstate parameters (occurrence, duration, coverage, and mean spatial correlation (Mspatcorr)) derived from a novel method based on electroencephalogram microstates and the Teager energy operator. The results show that the microstate parameters differ significantly between the two tasks (P < 0.05, paired t-test). Furthermore, these microstate parameters are utilized as features. A support vector machine classifies the two tasks with a mean accuracy of 93.93%, which is superior to the other methods.
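For readers unfamiliar with the Teager energy operator named in this abstract, the sketch below implements its standard discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. How the authors combine it with microstate analysis is not described in the abstract, so this shows only the operator itself; the function name `teager_energy` and the test signal are assumptions made for illustration.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure sinusoid A*cos(omega*n) the operator returns the constant
# A**2 * sin(omega)**2, which is why it tracks amplitude and frequency jointly.
omega = 0.3
signal = [math.cos(omega * n) for n in range(50)]
energy = teager_energy(signal)
print(energy[0], math.sin(omega) ** 2)
```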
|
40
|
Reliability of single-subject neural activation patterns in speech production tasks. BRAIN AND LANGUAGE 2021; 212:104881. [PMID: 33278802 PMCID: PMC7781091 DOI: 10.1016/j.bandl.2020.104881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/09/2020] [Revised: 09/25/2020] [Accepted: 11/06/2020] [Indexed: 06/12/2023]
Abstract
Speech neuroimaging research targeting individual speakers could help elucidate differences that may be crucial to understanding speech disorders. However, this research necessitates reliable brain activation across multiple speech production sessions. In the present study, we evaluated the reliability of speech-related brain activity measured by functional magnetic resonance imaging data from twenty neuro-typical subjects who participated in two experiments involving reading aloud simple speech stimuli. Using traditional methods like the Dice and intraclass correlation coefficients, we found that most individuals displayed moderate to high reliability. We also found that a novel machine-learning subject classifier could identify these individuals by their speech activation patterns with 97% accuracy from among a dataset of seventy-five subjects. These results suggest that single-subject speech research would yield valid results and that investigations into the reliability of speech activation in people with speech disorders are warranted.
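The Dice coefficient mentioned above measures the overlap between two sets, here the voxels active in each of two sessions: 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration; the function name `dice_coefficient`, the voxel indices, and the choice to return 1.0 when both sets are empty are assumptions, not details taken from the study.

```python
def dice_coefficient(active_a, active_b):
    """Dice overlap between two collections of active voxel indices."""
    a, b = set(active_a), set(active_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty maps agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical suprathreshold voxel indices from two scanning sessions.
session_1 = {1, 2, 3, 4}
session_2 = {3, 4, 5, 6}
print(dice_coefficient(session_1, session_2))  # 2 * 2 / (4 + 4) = 0.5
```

A value of 1 means the two activation maps are identical and 0 means they share no active voxels.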
|
41
|
Development and utility assessment of a machine learning bloodstream infection classifier in pediatric patients receiving cancer treatments. BMC Cancer 2020; 20:1103. [PMID: 33187484 PMCID: PMC7666525 DOI: 10.1186/s12885-020-07618-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Accepted: 11/06/2020] [Indexed: 11/29/2022] Open
Abstract
Background Objectives were to build a machine learning algorithm to identify bloodstream infection (BSI) among pediatric patients with cancer and hematopoietic stem cell transplantation (HSCT) recipients, and to compare this approach with presence of neutropenia to identify BSI. Methods We included patients 0–18 years of age at cancer diagnosis or HSCT between January 2009 and November 2018. Eligible blood cultures were those with no previous blood culture (regardless of result) within 7 days. The primary outcome was BSI. Four machine learning algorithms were used: elastic net, support vector machine and two implementations of gradient boosting machine (GBM and XGBoost). Model training and evaluation were performed using temporally disjoint training (60%), validation (20%) and test (20%) sets. The best model was compared to neutropenia alone in the test set. Results Of 11,183 eligible blood cultures, 624 (5.6%) were positive. The best model in the validation set was GBM, which achieved an area-under-the-receiver-operator-curve (AUROC) of 0.74 in the test set. Among the 2236 in the test set, the number of false positives and specificity of GBM vs. neutropenia were 508 vs. 592 and 0.76 vs. 0.72 respectively. Among 139 test set BSIs, six (4.3%) non-neutropenic patients were identified by GBM. All received antibiotics prior to culture result availability. Conclusions We developed a machine learning algorithm to classify BSI. GBM achieved an AUROC of 0.74 and identified 4.3% additional true cases in the test set. The machine learning algorithm did not perform substantially better than using presence of neutropenia alone to predict BSI. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-020-07618-2.
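The specificities quoted in this abstract can be reproduced from the reported counts. The short check below assumes that every test-set culture that is not one of the 139 BSIs is a true negative (2236 - 139 = 2097); that reading is an inference from the abstract, not something it states explicitly, and the function name `specificity` is illustrative.

```python
# Counts taken from the abstract.
test_cultures = 2236
bsi_positive = 139
negatives = test_cultures - bsi_positive  # assumed: all remaining cultures are negative

def specificity(false_positives, true_negative_pool):
    """Specificity = true negatives / all actual negatives."""
    return (true_negative_pool - false_positives) / true_negative_pool

spec_gbm = specificity(508, negatives)          # GBM: 508 false positives
spec_neutropenia = specificity(592, negatives)  # neutropenia rule: 592 false positives

print(round(spec_gbm, 2), round(spec_neutropenia, 2))  # 0.76 0.72
print(round(100 * 6 / bsi_positive, 1))                # 4.3 (% of BSIs found only by GBM)
```

Under that assumption the arithmetic matches the reported 0.76 vs. 0.72 and the 4.3% figure.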
|
42
|
External validation of putative biomarkers in eutopic endometrium of women with endometriosis using NanoString technology. J Assist Reprod Genet 2020; 37:2981-2987. [PMID: 33033989 DOI: 10.1007/s10815-020-01965-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/04/2020] [Accepted: 10/04/2020] [Indexed: 11/24/2022] Open
Abstract
PURPOSE To combine different independent endometrial markers to classify the presence of endometriosis. METHODS Endometrial biopsies were obtained from 109 women with endometriosis as well as 110 control women. Nine candidate biomarkers independent of cycle phase were selected from the literature and NanoString was performed. We compared differentially expressed genes between groups and generated generalized linear models to find a classifier for the disease. RESULTS Generalized linear models correctly detected 68% of women with endometriosis (combining deep infiltrating and ovarian endometriosis). However, we were not able to distinguish between individual types of endometriosis compared to controls. From the 9 tested genes, FOS, MMP7, and MMP11 seem to be important for disease classification, and FOS was the most over-expressed gene in endometriosis. CONCLUSION(S) Although generalized linear models may allow identification of endometriosis, we did not obtain perfect classification with the selected gene candidates.
|
43
|
Linear predictive coding distinguishes spectral EEG features of Parkinson's disease. Parkinsonism Relat Disord 2020; 79:79-85. [PMID: 32891924 DOI: 10.1016/j.parkreldis.2020.08.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 06/15/2020] [Revised: 08/02/2020] [Accepted: 08/03/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVE We have developed and validated a novel EEG-based signal processing approach to distinguish PD and control patients: Linear-predictive-coding EEG Algorithm for PD (LEAPD). This method efficiently encodes EEG time series into features that can detect PD in a computationally fast manner amenable to real time applications. METHODS We included a total of 41 PD patients and 41 demographically-matched controls from New Mexico and Iowa. Data for all participants from New Mexico (27 PD patients and 27 controls) were used to evaluate in-sample LEAPD performance, with extensive cross-validation. Participants from Iowa (14 PD patients and 14 controls) were used for out-of-sample tests. Our method utilized data from six EEG leads, with recordings as short as 2 min. RESULTS For the in-sample dataset, LEAPD differentiated PD patients from controls with 85.3 ± 0.1% diagnostic accuracy, 93.3 ± 0.5% area under the receiver operating characteristics curve (AUC), 87.9 ± 0.9% sensitivity, and 82.7 ± 1.1% specificity, with multiple cross-validations. After head-to-head comparison with state-of-the-art methods using our dataset, LEAPD showed a 13% increase in accuracy and a 15.5% increase in AUC. When the trained classifier was applied to a distinct out-of-sample dataset, LEAPD showed reliable performance with 85.7% diagnostic accuracy, 85.2% AUC, 85.7% sensitivity, and 85.7% specificity. No statistically significant effect of levodopa-ON and levodopa-OFF sessions was found. CONCLUSION We describe LEAPD, an efficient algorithm that is suitable for real time application, captures spectral EEG features using few parameters, and reliably differentiates PD patients from demographically-matched controls.
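Linear predictive coding, the building block named in this abstract, models each sample as a weighted sum of the preceding samples and keeps only those few weights as a compact description of the signal's spectrum. The sketch below is a generic autocorrelation-method LPC solved with the Levinson-Durbin recursion; it is not the LEAPD algorithm, whose details the abstract does not give, and the function names and toy signal are assumptions.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(x, order):
    """Return (a, err) such that x[n] is approximated by
    sum(a[k - 1] * x[n - k] for k in 1..order), using Levinson-Durbin."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)   # a[0] is unused; a[1..order] are predictor taps
    err = r[0]                # prediction error energy at order 0
    for m in range(1, order + 1):
        if err <= 0.0:
            break             # all-zero or perfectly predicted signal
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        reflection = acc / err
        updated = a[:]
        updated[m] = reflection
        for k in range(1, m):
            updated[k] = a[k] - reflection * a[m - k]
        a = updated
        err *= 1.0 - reflection * reflection
    return a[1:], err

# Toy signal: a decaying exponential, where each sample is half the previous one.
signal = [0.5 ** n for n in range(60)]
coeffs, residual = lpc_coefficients(signal, order=2)
print(coeffs)  # first tap close to 0.5, second tap close to 0
```

For the toy signal a single tap already explains the data, so the second coefficient comes out near zero; an EEG segment would need a higher order.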
|
44
|
Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns. BMC Bioinformatics 2020; 21:317. [PMID: 32689977 PMCID: PMC7370432 DOI: 10.1186/s12859-020-03621-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 06/19/2020] [Indexed: 12/11/2022] Open
Abstract
Background The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. Results In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods. Conclusion PREPRINT compares favorably with the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and it often performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid the genome interpretation in functional genomics and clinical studies.
|
45
|
Machine learning-based identification of radiofrequency electromagnetic radiation (RF-EMR) effect on brain morphology: a preliminary study. Med Biol Eng Comput 2020; 58:1751-1765. [PMID: 32483764 DOI: 10.1007/s11517-020-02198-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/15/2019] [Accepted: 05/22/2020] [Indexed: 11/28/2022]
Abstract
The brains of humans and other organisms are affected by electromagnetic field (EMF) radiation emanating from cell phones and mobile towers. Prolonged exposure to EMF radiation may cause neurological changes in the brain, which in turn may bring about chemical as well as morphological changes. Conventionally, the effect of EMF radiation on the brain is identified using cellular-level analysis. In the present work, an automatic image processing-based approach is used in which geometric features extracted from the segmented brain region are analyzed to identify the effect of EMF radiation on the morphology of the brain, using Drosophila as a specimen. A genetic algorithm-based evolutionary feature selection algorithm is used to select an optimal set of geometrical features, which, when fed to the machine learning classifiers, results in their optimal performance. The best classification accuracy was obtained with the neural network using an optimally selected subset of geometrical features. A statistical test was also performed to show that the increase in classifier performance after feature selection is statistically significant. This machine learning-based study indicates that the microscopic brain images of EMF-exposed and non-exposed Drosophila can be discriminated. Graphical abstract Proposed Methodology for identification of radiofrequency electromagnetic radiation (RF-EMR) effect on the morphology of brain of Drosophila.
|
46
|
Predicting potential adverse events using safety data from marketed drugs. BMC Bioinformatics 2020; 21:163. [PMID: 32349656 PMCID: PMC7191698 DOI: 10.1186/s12859-020-3509-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2019] [Accepted: 04/22/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND While clinical trials are considered the gold standard for detecting adverse events, often these trials are not sufficiently powered to detect difficult to observe adverse events. We developed a preliminary approach to predict 135 adverse events using post-market safety data from marketed drugs. Adverse event information available from FDA product labels and scientific literature for drugs that have the same activity at one or more of the same targets, structural and target similarities, and the duration of post market experience were used as features for a classifier algorithm. The proposed method was studied using 54 drugs and a probabilistic approach of performance evaluation using bootstrapping with 10,000 iterations. RESULTS Out of 135 adverse events, 53 had high probability of having high positive predictive value. Cross validation showed that 32% of the model-predicted safety label changes occurred within four to nine years of approval (median: six years). CONCLUSIONS This approach predicts 53 serious adverse events with high positive predictive values where well-characterized target-event relationships exist. Adverse events with well-defined target-event associations were better predicted compared to adverse events that may be idiosyncratic or related to secondary target effects that were poorly captured. Further enhancement of this model with additional features, such as target prediction and drug binding data, may increase accuracy.
|
47
|
Abstract
Clinical trials in the era of precision cancer medicine aim to identify and validate biomarker signatures which can guide the assignment of individually optimal treatments to patients. In this article, we propose a group sequential randomized phase II design, which updates the biomarker signature as the trial goes on, utilizes enrichment strategies for patient selection, and uses Bayesian response-adaptive randomization for treatment assignment. To evaluate the performance of the new design, in addition to the commonly considered criteria of type I error and power, we propose four new criteria measuring the benefits and losses for individuals both inside and outside of the clinical trial. Compared with designs with equal randomization, the proposed design gives trial participants a better chance to receive their personalized optimal treatments and thus results in a higher response rate on the trial. This design increases the chance to discover a successful new drug by an adaptive enrichment strategy, i.e., identification and selective enrollment of a subset of patients who are sensitive to the experimental therapies. Simulation studies demonstrate these advantages of the proposed design. It is illustrated by an example based on an actual clinical trial in non-small-cell lung cancer.
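To make the idea of Bayesian response-adaptive randomization concrete, the sketch below allocates patients in proportion to the posterior probability that each arm has the highest response rate, using a Beta-Bernoulli model with uniform priors. This is a common textbook formulation offered as an assumption; the abstract does not specify the model, and the biomarker updating and enrichment steps of the proposed design are not represented. The function name and interim counts are hypothetical.

```python
import random

def allocation_probability(successes, failures, draws=5_000, seed=0):
    """Monte Carlo estimate of P(arm has the highest response rate).

    Each arm's response rate has a Beta(1 + successes, 1 + failures)
    posterior, i.e. a uniform Beta(1, 1) prior updated with binary outcomes.
    """
    rng = random.Random(seed)
    arms = len(successes)
    wins = [0] * arms
    for _ in range(draws):
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a])
                   for a in range(arms)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Hypothetical interim data: arm 0 has 15/20 responders, arm 1 has 5/20.
probs = allocation_probability([15, 5], [5, 15])
print(probs)

# The next patient is assigned to an arm with these probabilities.
next_arm = random.Random(1).choices(range(len(probs)), weights=probs)[0]
print(next_arm)
```

Under equal randomization each arm would receive half of the patients regardless of interim results; here the allocation shifts toward the arm that currently looks better.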
|
48
|
Evaluation of a methylation classifier for predicting pre-cancer lesion among women with abnormal results between HPV16/18 and cytology. Clin Epigenetics 2020; 12:57. [PMID: 32317020 PMCID: PMC7175486 DOI: 10.1186/s13148-020-00849-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/08/2019] [Accepted: 04/08/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Although HPV testing and cytology detection are successful for cervical screening in China, additional procedures are urgently required to avoid misdiagnosis and overtreatment. In this multicenter study, we collected cervical samples during screening in clinics. A total of 588 women with HPV16/18+ and/or cytology result ≥HSIL+ (high-grade squamous intraepithelial lesion or worse) were referred to colposcopy for pathological diagnosis. Methylation of S5 was quantified by pyrosequencing. RESULTS: The S5 classifier separates women with ≥HSIL+ from CONCLUSION: The S5 classifier with high sensitivity and specificity provided increasing diagnostic information for women with HPV16/18+ and/or cytology results and could reduce the numerous unnecessary colposcopy referrals and avoid overtreatment.
|
49
|
A novel hybrid approach for automated detection of retinal detachment using ultrasound images. Comput Biol Med 2020; 120:103704. [PMID: 32250849 DOI: 10.1016/j.compbiomed.2020.103704] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/19/2020] [Revised: 03/12/2020] [Accepted: 03/12/2020] [Indexed: 12/18/2022]
Abstract
Retinal detachment (RD) is an ocular emergency that needs quick intervention to prevent permanent vision loss. In general, ophthalmologists use ocular ultrasound to support their judgment in detecting RD in eyes with media opacities that preclude retinal evaluation. However, the quality of ultrasound (US) images may be degraded by noise, and other retinal conditions may cause membranous echoes. All of these can influence the accuracy of diagnosis. To overcome these problems, we propose an automated system to detect RD using texton, higher order spectral (HOS) cumulants, and locality sensitive discriminant analysis (LSDA) techniques. Our method is able to classify posterior vitreous detachment and RD using a support vector machine classifier with a highest accuracy of 99.13%. Our system is ready to be tested with more diverse ultrasound images and to aid ophthalmologists in arriving at a more accurate diagnosis.
|
50
|
Circulating basophil count as a prognostic marker of tumor aggressiveness and survival outcomes in colorectal cancer. Clin Transl Med 2020; 9:6. [PMID: 32037496 PMCID: PMC7008108 DOI: 10.1186/s40169-019-0255-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 11/01/2019] [Accepted: 12/26/2019] [Indexed: 12/31/2022] Open
Abstract
Background: Accumulating evidence has demonstrated immune/inflammation-related implications of basophils in affecting the tissue microenvironment surrounding a tumor, and this study aimed to elucidate the clinical value of the serum basophil count. Methods: Between December 2007 and September 2013, 1029 patients diagnosed with stage I–III CRC at Fudan University Shanghai Cancer Center who met the essential criteria were identified. The Kaplan–Meier method was used to construct the survival curves. Several Cox proportional hazard models were constructed to assess the prognostic factors. A simple predictor (CB classifier) was generated by combining the serum basophil count and the serum carcinoembryonic antigen (CEA) level, which has long been accepted as the most important and reliable prognostic factor in CRC. Results: A preoperative basophil count < 0.025 × 10^9/L was strongly associated with higher T stage, higher N stage, venous invasion, perineural invasion, elevated serum CEA level, and thus poor survival (P < 0.05). Moreover, multivariate Cox analysis showed that patients with a low preoperative basophil count had an evidently poorer DFS [hazard ratio (HR) = 2.197, 95% CI 1.868–2.585]. Conclusions: As a common immune/inflammation-related biomarker available from the routine blood examination, a low preoperative serum basophil count was associated with aggressive biology and indicated evidently poor survival. The preoperative serum basophil count would be a useful and simple marker for the management of CRC patients.
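For readers unfamiliar with the Kaplan–Meier method mentioned above, the following is a minimal sketch of the product-limit estimator. The function name and the follow-up data are hypothetical and unrelated to the study cohort; the Cox models are not covered here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up durations (any consistent unit)
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical follow-up times in months; 0 marks a censored patient.
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the patient censored at 8 months adds no point.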
|