1
Ettetuani B, Chahboune R, Moussa A. Adjustment of p-value expression to ontology using machine learning for genetic prediction, prioritization, interaction, and its validation in glomerular disease. Front Genet 2023; 14:1215232. [PMID: 37900183 PMCID: PMC10603191 DOI: 10.3389/fgene.2023.1215232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 08/28/2023] [Indexed: 10/31/2023] Open
Abstract
The results of gene expression analysis based on p-value can be extracted and sorted by their absolute statistical significance and then applied to multiple similarity scores of their gene ontology (GO) terms to promote the combination and adjustment of these scores as essential predictive tasks for understanding biological/clinical pathways. The latter allows the possibility to assess whether certain aspects of gene function may be associated with other varieties of genes, to evaluate regulation, and to link them into networks that prioritize candidate genes for classification by applying machine learning techniques. We then detect significant genetic interactions based on our algorithm to validate the results. Finally, based on specifically selected tissues according to their normalized gene expression and frequencies of occurrence from their different biological and clinical inputs, a reported classification of genes under the subject category has validated the abstract (glomerular diseases) as a case study.
Affiliation(s)
- Boutaina Ettetuani
  - Systems and Data Engineering Team, National School of Applied Sciences, Abdelmalek Essaadi University, Tétouan, Morocco
- Rajaa Chahboune
  - Life and Health Sciences Team, Faculty of Medicine and Pharmacy, Abdelmalek Essaadi University, Tétouan, Morocco
- Ahmed Moussa
  - Systems and Data Engineering Team, National School of Applied Sciences, Abdelmalek Essaadi University, Tétouan, Morocco
2
Shi Y, Wang H, Yao X, Li J, Liu J, Chen Y, Liu L, Xu J. Machine learning prediction models for different stages of non-small cell lung cancer based on tongue and tumor marker: a pilot study. BMC Med Inform Decis Mak 2023; 23:197. [PMID: 37773123 PMCID: PMC10542664 DOI: 10.1186/s12911-023-02266-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 08/17/2023] [Indexed: 09/30/2023] Open
Abstract
OBJECTIVE To analyze the tongue feature of NSCLC at different stages, as well as the correlation between tongue feature and tumor marker, and investigate the feasibility of establishing prediction models for NSCLC at different stages based on tongue feature and tumor marker. METHODS Tongue images were collected from non-advanced NSCLC patients (n = 109) and advanced NSCLC patients (n = 110) and analyzed to obtain tongue feature, and the correlation between tongue feature and tumor marker was analyzed in different stages of NSCLC. On this basis, six classifiers (decision tree, logistic regression, SVM, random forest, naive Bayes, and neural network) were used to establish prediction models for different stages of NSCLC based on tongue feature and tumor marker. RESULTS There were statistically significant differences in tongue feature between the non-advanced and advanced NSCLC groups. In the advanced NSCLC group, the number of indexes with statistically significant correlations between tongue feature and tumor marker was significantly higher than in the non-advanced NSCLC group, and the correlations were stronger. Among the machine learning methods, support vector machine (SVM), decision tree, and logistic regression performed poorly in models for different stages of NSCLC. Neural network, random forest, and naive Bayes had better classification efficiency for the data set of tongue feature, tumor marker, and baseline. The models' classification accuracies were 0.767 ± 0.081, 0.718 ± 0.062, and 0.688 ± 0.070, respectively, and the AUCs were 0.793 ± 0.086, 0.779 ± 0.075, and 0.771 ± 0.072, respectively. CONCLUSIONS There were statistically significant differences in tongue feature between different stages of NSCLC, with advanced NSCLC tongue feature being more closely correlated with tumor marker.
Due to the limited information, single data sources including baseline, tongue feature, and tumor marker cannot be used to identify the different stages of NSCLC in this pilot study. In addition to the logistic regression method, other machine learning methods, based on tumor marker and baseline data sets, can effectively improve the differential diagnosis efficiency of different stages of NSCLC by adding tongue image data, which requires further verification based on large sample studies in the future.
Affiliation(s)
- Yulin Shi
  - The Office of Academic Affairs, Shanghai, 201203, China
- Hao Wang
  - College of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Xinghua Yao
  - College of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Jun Li
  - College of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Jiayi Liu
  - College of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Yuan Chen
  - Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
- Lingshuang Liu
  - Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
- Jiatuo Xu
  - College of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
3
A Study of Logistic Regression for Fatigue Classification Based on Data of Tongue and Pulse. Evidence-Based Complementary and Alternative Medicine 2022; 2022:2454678. [PMID: 35287309 PMCID: PMC8917949 DOI: 10.1155/2022/2454678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Accepted: 01/05/2022] [Indexed: 12/02/2022]
Abstract
Methods The Tongue and Face Diagnosis Analysis-1 instrument and Pulse Diagnosis Analysis-1 instrument were used to collect the tongue image and sphygmogram of the subhealth fatigue population (n = 252) and disease fatigue population (n = 1160), and we mainly analyzed the tongue and pulse characteristics and constructed the classification model by using the logistic regression method. Results The results showed that subhealth fatigue people and disease fatigue people had different characteristics of tongue and pulse, and the logistic regression model based on tongue and pulse data had a good classification effect. The accuracies of models of healthy controls and subhealth fatigue, subhealth fatigue and disease fatigue, and healthy controls and disease fatigue were 68.29%, 81.18%, and 84.73%, and the AUC was 0.698, 0.882, and 0.924, respectively. Conclusion This study provided a new noninvasive method for the fatigue diagnosis from the perspective of objective tongue and pulse data, and the modern tongue diagnosis and pulse diagnosis have good application prospects.
4
Shi Y, Yao X, Xu J, Hu X, Tu L, Lan F, Cui J, Cui L, Huang J, Li J, Bi Z, Li J. A New Approach of Fatigue Classification Based on Data of Tongue and Pulse With Machine Learning. Front Physiol 2022; 12:708742. [PMID: 35197858 PMCID: PMC8859319 DOI: 10.3389/fphys.2021.708742] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/12/2021] [Accepted: 11/03/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Fatigue is a common and subjective symptom, which is associated with many diseases and suboptimal health status. A reliable and evidence-based approach is lacking to distinguish disease fatigue and non-disease fatigue. This study aimed to establish a method for early differential diagnosis of fatigue, which can be used to distinguish disease fatigue from non-disease fatigue, and to investigate the feasibility of characterizing fatigue states in a view of tongue and pulse data analysis. METHODS Tongue and Face Diagnosis Analysis-1 (TFDA-1) instrument and Pulse Diagnosis Analysis-1 (PDA-1) instrument were used to collect tongue and pulse data. Four machine learning models were used to perform classification experiments of disease fatigue vs. non-disease fatigue. RESULTS The results showed that all the four classifiers over "Tongue & Pulse" joint data showed better performances than those only over tongue data or only over pulse data. The model accuracy rates based on logistic regression, support vector machine, random forest, and neural network were (85.51 ± 1.87)%, (83.78 ± 4.39)%, (83.27 ± 3.48)% and (85.82 ± 3.01)%, and with Area Under Curve estimates of 0.9160 ± 0.0136, 0.9106 ± 0.0365, 0.8959 ± 0.0254 and 0.9239 ± 0.0174, respectively. CONCLUSION This study proposed and validated an innovative, non-invasive differential diagnosis approach. Results suggest that it is feasible to characterize disease fatigue and non-disease fatigue by using objective tongue data and pulse data.
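The abstract above reports accuracy and Area Under Curve as mean ± standard deviation across repeated experiments. As a rough illustration of what those quantities are, the sketch below computes AUC as a pairwise ranking probability and summarizes repeated runs as mean and sample standard deviation. The function names and all numbers are hypothetical and are not taken from the study; the authors' actual evaluation pipeline is not described in this listing.

```python
import statistics


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_sd(values):
    """Summarize repeated runs as (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


# Toy data (hypothetical): 1 = disease fatigue, 0 = non-disease fatigue.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)

# Hypothetical per-run accuracies, summarized the way the abstract reports them.
run_mean, run_sd = mean_sd([0.8, 0.9, 1.0])
```

The pairwise loop is quadratic in the number of samples, which is fine for an illustration; a rank-based formula gives the same value more efficiently on large data.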
Affiliation(s)
- Yulin Shi
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Xinghua Yao
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jiatuo Xu
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Xiaojuan Hu
  - Shanghai Innovation Center of TCM Health Service, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Liping Tu
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Fang Lan
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Ji Cui
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Longtao Cui
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jingbin Huang
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jun Li
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Zijuan Bi
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jiacai Li
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
5
A New Method for Syndrome Classification of Non-Small-Cell Lung Cancer Based on Data of Tongue and Pulse with Machine Learning. BioMed Research International 2021; 2021:1337558. [PMID: 34423031 PMCID: PMC8373490 DOI: 10.1155/2021/1337558] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/20/2021] [Revised: 07/12/2021] [Accepted: 07/23/2021] [Indexed: 12/18/2022]
Abstract
Objective To explore the data characteristics of tongue and pulse of non-small-cell lung cancer with Qi deficiency syndrome and Yin deficiency syndrome, establish syndrome classification model based on data of tongue and pulse by using machine learning methods, and evaluate the feasibility of syndrome classification based on data of tongue and pulse. Methods We collected tongue and pulse of non-small-cell lung cancer patients with Qi deficiency syndrome (n = 163), patients with Yin deficiency syndrome (n = 174), and healthy controls (n = 185) using intelligent tongue diagnosis analysis instrument and pulse diagnosis analysis instrument, respectively. We described the characteristics and examined the correlation of data of tongue and pulse. Four machine learning methods, namely, random forest, logistic regression, support vector machine, and neural network, were used to establish the classification models based on symptom, tongue and pulse, and symptom and tongue and pulse, respectively. Results Significant difference indices of tongue diagnosis between Qi deficiency syndrome and Yin deficiency syndrome were TB-a, TB-S, TB-Cr, TC-a, TC-S, TC-Cr, perAll, and the tongue coating texture indices including TC-CON, TC-ASM, TC-MEAN, and TC-ENT. Significant difference indices of pulse diagnosis were t4 and t5. The classification performance of each model based on different datasets was as follows: tongue and pulse < symptom < symptom and tongue and pulse. The neural network model had a better classification performance for symptom and tongue and pulse datasets, with an area under the ROC curves and accuracy rate which were 0.9401 and 0.8806. Conclusions It was feasible to use tongue data and pulse data as one of the objective diagnostic basis in Qi deficiency syndrome and Yin deficiency syndrome of non-small-cell lung cancer.
6
Chase Huizar C, Raphael I, Forsthuber TG. Genomic, proteomic, and systems biology approaches in biomarker discovery for multiple sclerosis. Cell Immunol 2020; 358:104219. [PMID: 33039896 PMCID: PMC7927152 DOI: 10.1016/j.cellimm.2020.104219] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/12/2020] [Revised: 09/13/2020] [Accepted: 09/16/2020] [Indexed: 12/12/2022]
Abstract
Multiple sclerosis (MS) is a neuroinflammatory disorder characterized by autoimmune-mediated inflammatory lesions in CNS leading to myelin damage and axonal loss. MS is a heterogenous disease with variable and unpredictable disease course. Due to its complex nature, MS is difficult to diagnose and responses to specific treatments may vary between individuals. Therefore, there is an indisputable need for biomarkers for early diagnosis, prediction of disease exacerbations, monitoring the progression of disease, and for measuring responses to therapy. Genomic and proteomic studies have sought to understand the molecular basis of MS and find biomarker candidates. Advances in next-generation sequencing and mass-spectrometry techniques have yielded an unprecedented amount of genomic and proteomic data; yet, translation of the results into the clinic has been underwhelming. This has prompted the development of novel data science techniques for exploring these large datasets to identify biologically relevant relationships and ultimately point towards useful biomarkers. Herein we discuss optimization of omics study designs, advances in the generation of omics data, and systems biology approaches aimed at improving biomarker discovery and translation to the clinic for MS.
Affiliation(s)
- Carol Chase Huizar
  - Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
- Itay Raphael
  - Department of Neurological Surgery, University of Pittsburgh, UPMC Children's Hospital, Pittsburgh, PA, USA
- Thomas G Forsthuber
  - Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
7
Greither T, Schumacher J, Dejung M, Behre HM, Zischler H, Butter F, Herlyn H. Fertility Relevance Probability Analysis Shortlists Genetic Markers for Male Fertility Impairment. Cytogenet Genome Res 2020; 160:506-522. [PMID: 33238277 DOI: 10.1159/000511117] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/06/2020] [Accepted: 06/26/2020] [Indexed: 12/27/2022] Open
Abstract
Impairment of male fertility is one of the major public health issues worldwide. Nevertheless, genetic causes of male sub- and infertility can often only be suspected due to the lack of reliable and easy-to-use routine tests. Yet, the development of a marker panel is complicated by the large quantity of potentially predictive markers. Actually, hundreds or even thousands of genes could have fertility relevance. Thus, a systematic method enabling a selection of the most predictive markers out of the many candidates is required. As a criterion for marker selection, we derived a gene-specific score, which we refer to as fertility relevance probability (FRP). For this purpose, we first categorized 2,753 testis-expressed genes as either candidate markers or non-candidates, according to phenotypes in male knockout mice. In a parallel approach, 2,502 genes were classified as candidate markers or non-candidates based on phenotypes in men. Subsequently, we conducted logistic regression analyses with evolutionary rates of genes (dN/dS), transcription levels in testis relative to other organs, and connectivity of the encoded proteins in a protein-protein interaction network as covariates. In confirmation of the procedure, FRP values showed the expected pattern, thus being overall higher in genes with known relevance for fertility than in their counterparts without corresponding evidence. In addition, higher FRP values corresponded with an increased dysregulation of protein abundance in spermatozoa of 37 men with normal and 38 men with impaired fertility. Present analyses resulted in a ranking of genes according to their probable predictive power as candidate markers for male fertility impairment. Thus, AKAP4, TNP1, DAZL, BRDT, DMRT1, SPO11, ZPBP, HORMAD1, and SMC1B are prime candidates toward a marker panel for male fertility impairment. Additional candidate markers are DDX4, SHCBP1L, CCDC155, ODF1, DMRTB1, ASZ1, BOLL, FKBP6, SLC25A31, PRSS21, and RNF17. 
FRP inference additionally provides clues for potential new markers, thereunder TEX37 and POU4F2. The results of our logistic regression analyses are freely available at the PreFer Genes website (https://prefer-genes.uni-mainz.de/).
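The abstract describes a gene-specific probability obtained from logistic regression on three covariates (evolutionary rate dN/dS, relative testis transcription level, and protein-protein interaction connectivity), followed by a ranking of genes. The sketch below shows only the general shape of such a scoring and ranking step. The coefficients, intercept, covariate values, and gene labels are all invented for illustration; they are not the fitted values or data behind the published FRP scores.

```python
import math


def logistic(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relevance_probability(covariates, coefficients, intercept):
    """Probability from a logistic model: sigmoid(intercept + sum(coef * x))."""
    z = intercept + sum(coefficients[name] * x for name, x in covariates.items())
    return logistic(z)


# Hypothetical coefficients; signs and magnitudes are made up.
COEFFS = {"dn_ds": 1.2, "testis_specificity": 2.0, "ppi_connectivity": -0.3}
INTERCEPT = -2.0

# Hypothetical genes with made-up covariate values.
genes = {
    "gene_A": {"dn_ds": 0.5, "testis_specificity": 0.9, "ppi_connectivity": 1.0},
    "gene_B": {"dn_ds": 0.1, "testis_specificity": 0.2, "ppi_connectivity": 3.0},
    "gene_C": {"dn_ds": 0.3, "testis_specificity": 0.6, "ppi_connectivity": 2.0},
}

scores = {g: relevance_probability(x, COEFFS, INTERCEPT) for g, x in genes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # highest probability first
```

In a real analysis the coefficients would be estimated from the labeled candidate and non-candidate genes rather than set by hand.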
Affiliation(s)
- Thomas Greither
  - Center for Reproductive Medicine and Andrology, Martin Luther University Halle-Wittenberg, Halle, Germany
- Julia Schumacher
  - Anthropology, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, Mainz, Germany
- Mario Dejung
  - Quantitative Proteomics, Institute of Molecular Biology (IMB) Mainz, Mainz, Germany
- Hermann M Behre
  - Center for Reproductive Medicine and Andrology, Martin Luther University Halle-Wittenberg, Halle, Germany
- Hans Zischler
  - Anthropology, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, Mainz, Germany
- Falk Butter
  - Quantitative Proteomics, Institute of Molecular Biology (IMB) Mainz, Mainz, Germany
- Holger Herlyn
  - Anthropology, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, Mainz, Germany
8
Functional connectome-based biomarkers predict chronic codeine-containing cough syrup dependent. J Psychiatr Res 2020; 130:333-341. [PMID: 32889355 DOI: 10.1016/j.jpsychires.2020.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/01/2020] [Revised: 07/26/2020] [Accepted: 08/03/2020] [Indexed: 12/30/2022]
Abstract
PURPOSE Codeine-containing cough syrup (CCS) is considered among the most popular drugs of abuse in adolescents worldwide. Accurate prediction and identification of CCS dependent (CCSD) users are crucial. This study aimed to identify a brain-connectome-based predictor of CCSD using a machine learning model based on a ten-fold cross-validation logistic regression (LR) classifier. METHODS 40 CCSD users and 40 healthy control (HC) subjects underwent functional magnetic resonance imaging to construct weight functional networks. Partial correlation analysis was used to analyze relations between abnormal network metrics and clinical characteristics (BIS total scores, CCS abuse duration, and mean CCS dose) in CCSD. A ten-fold cross-validation LR classifier was used to classify CCSD users and HC subjects. RESULTS The CCSD group showed significantly abnormal nodes and connections in the right posterior cingulate, right middle insula, bilateral prefrontal cortex, parietal lobe, temporal lobe, occipital lobe, and cerebellum. Furthermore, higher characteristic path length and lower clustering coefficient (Cp), global efficiency, and local efficiency (Eloc) were observed in the global topologies in CCSD. The abnormal global properties (Cp and Eloc) and node properties of the prefrontal cortex were significantly correlated with clinical characteristics (BIS-11 scores, CCS abuse duration) in CCSD. The LR classifier models demonstrated accuracy, sensitivity, specificity, precision, and AUC of 82.5%, 82.5%, 82.5%, 76.8%, and 82.5%. CONCLUSIONS These data demonstrate that abnormal functional connectome may be closely linked to clinical characteristics in CCSD. Functional connectome-based biomarkers can be a powerful tool for personalized diagnosis of CCSD in the future.
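The abstract reports accuracy, sensitivity, specificity, and precision from a ten-fold cross-validated classifier on 40 + 40 subjects. As a minimal sketch of the bookkeeping involved, the code below defines those four metrics from confusion-matrix counts and generates plain ten-fold index splits. The counts used here are hypothetical, chosen only to show how a figure such as 82.5% can arise with 40 subjects per group; they are not the study's confusion matrix, and fold-averaged figures like those reported need not equal values computed from pooled counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive class = CCSD)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.
    No shuffling or stratification is done here; real analyses usually do both."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical pooled counts for 40 patients and 40 controls.
example = binary_metrics(tp=33, fp=7, tn=33, fn=7)

# Ten folds over 80 subjects: each test fold holds 8, each training set 72.
folds = list(k_fold_indices(80, k=10))
```

Each subject appears in exactly one test fold, so every prediction is made by a model that did not see that subject during training.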
9
Zhang H, Li SJ, Zhang H, Yang ZY, Ren YQ, Xia LY, Liang Y. Meta-Analysis Based on Nonconvex Regularization. Sci Rep 2020; 10:5755. [PMID: 32238826 PMCID: PMC7113298 DOI: 10.1038/s41598-020-62473-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/22/2019] [Accepted: 03/06/2020] [Indexed: 01/10/2023] Open
Abstract
The widespread application of high-throughput sequencing technology has produced a large number of publicly available gene expression datasets. However, because gene expression datasets are characterized by small sample size, high dimensionality, and high noise, applying biostatistics and machine learning methods to them is challenging; for example, important biomarkers show low reproducibility across studies. Meta-analysis is an effective approach to deal with these problems, but the current methods have some limitations. In this paper, we propose meta-analysis methods based on three nonconvex regularization methods: L1/2 regularization (meta-Half), Minimax Concave Penalty regularization (meta-MCP), and Smoothly Clipped Absolute Deviation regularization (meta-SCAD). The three nonconvex regularization methods are effective approaches for variable selection developed in recent years. Through the hierarchical decomposition of coefficients, our methods not only maintain the flexibility of variable selection and improve the efficiency of selecting important biomarkers, but also summarize and synthesize scientific evidence from multiple studies to consider the relationship between different datasets. We give efficient algorithms and theoretical properties for our methods. Furthermore, we apply our methods to simulation data and three publicly available lung cancer gene expression datasets, and compare their performance with state-of-the-art methods. Our methods perform well in simulation studies, and the analysis results on the three publicly available lung cancer gene expression datasets are clinically meaningful. Our methods can also be extended to other areas where datasets are heterogeneous.
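For readers unfamiliar with the three penalties named in the abstract, the sketch below evaluates their commonly used single-coefficient forms: the L1/2 penalty, the Minimax Concave Penalty, and the Smoothly Clipped Absolute Deviation penalty. This is an assumption-laden illustration of the standard penalty functions only. The function names and default tuning values (gamma = 3.0, a = 3.7) are choices made here, and the paper's hierarchical decomposition of coefficients across studies is not reproduced.

```python
import math


def half_penalty(beta, lam):
    """L1/2 penalty: lam * |beta|^(1/2)."""
    return lam * math.sqrt(abs(beta))


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax Concave Penalty (gamma > 1): lasso-like near zero,
    then flat at gamma * lam^2 / 2 once |beta| exceeds gamma * lam."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (a > 2): linear up to lam, quadratic blend up to a * lam,
    then flat at (a + 1) * lam^2 / 2."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b * b - lam * lam) / (2.0 * (a - 1.0))
    return 0.5 * (a + 1.0) * lam * lam


# Compare the penalties with the lasso penalty lam * |beta| on a few values.
lam = 1.0
for beta in (0.5, 2.0, 10.0):
    print(beta, lam * abs(beta), half_penalty(beta, lam),
          mcp_penalty(beta, lam), scad_penalty(beta, lam))
```

Unlike the lasso penalty, which keeps growing with the coefficient, MCP and SCAD level off for large coefficients, so strong signals are shrunk less while small coefficients are still pushed to zero.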
Affiliation(s)
- Hui Zhang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau
- Shou-Jiang Li
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau
- Hai Zhang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau
  - School of Mathematics, Northwest University, 710127, Xi'an, China
- Zi-Yi Yang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau
- Yan-Qiong Ren
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau
- Liang-Yong Xia
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau
- Yong Liang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau